Linear Regression
Problems
practice
 electricenergy.txt
In the United States, electric energy is measured in kilowatt hours and purchased with dollars. This data set came from 12 months of electric bills for a New York City apartment in the early years of the Twenty First Century. Plot a graph of cost vs. energy consumed and determine the equation of the best fit straight line.
 Explain the significance of the coefficients m, b, and r^{2}.
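As a minimal sketch of what a spreadsheet or calculator does when it reports m, b, and r^{2}, the least-squares formulas can be written out directly. The function name linear_fit and the numbers below are assumptions made for illustration; they are not the contents of electricenergy.txt.

```python
def linear_fit(xs, ys):
    """Return slope m, intercept b, and r^2 for the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross deviations about the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx                    # slope
    b = y_bar - m * x_bar            # intercept
    r2 = sxy ** 2 / (sxx * syy)      # coefficient of determination
    return m, b, r2


# Made-up numbers, NOT the data in electricenergy.txt:
# a cost that is exactly 0.20 dollars per kWh plus a 5 dollar fixed charge.
energy_kwh = [100.0, 200.0, 300.0, 400.0]
cost_dollars = [25.0, 45.0, 65.0, 85.0]

m, b, r2 = linear_fit(energy_kwh, cost_dollars)
print(m, b, r2)
```

With data like these, m would carry units of dollars per kilowatt hour, b would carry units of dollars, and r^{2} would be unitless.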
 dashworldrecords.txt
The text file referenced above has data on the world records for the 100 m dash. The data are broken up into four groups:
 men's electronically timed world records
 men's hand-timed world records
 women's electronically timed world records
 women's hand-timed world records
 Perform a linear regression on both men's and women's world record times as a function of the year the record was set.
 Explain the significance of the numerical results.
 Make an interesting prediction.

vostok.txt
Snow rarely gets a chance to melt in Antarctica, even in the summer when the sun never sets. In the interior of the continent, the temperature of the air hasn't been above the freezing point of water in any significant way for the last 900,000 years. The snow that falls there accumulates and accumulates and accumulates until it compresses into rock solid ice — up to 4.5 km thick in some regions. Since the snow that falls is originally fluffy with air, the ice that eventually forms still holds remnants of this air — very, very old air. By examining the isotopic composition of the gases in carefully extracted ice cores we can learn things about the climate of the past. By extension we might also be able to predict some things about the climate of the future.
Columns:
 Age of air (years before present)
 Temperature anomaly with respect to the mean recent time value (°C)
 Carbon dioxide concentration (ppm)
 Dust concentration (ppm)
Questions…
 CO_{2}
 Construct a set of overlapping time series graphs for CO_{2} concentration and temperature anomaly.
 Construct a scatter plot of temperature anomaly vs. CO_{2} concentration.
 How are atmospheric carbon dioxide concentration and temperature anomaly related?
 What temperature anomaly might one expect given current atmospheric CO_{2} levels?
 anscombedata.txt
This collection of four hypothetical data sets in one table was created by F.J. Anscombe in 1973 for use as a teaching tool. The data don't correspond to any real experiment. They are just a bunch of numbers with a peculiar behavior. Identify this peculiarity by calculating the coefficients m, b, and r for each of the four data sets, then look at each graph with your eyes and employ your brain to make a judgment. Is linear regression the right tool for analyzing this data? If not, why not and what should be done instead? The columns should be paired up in the following manner…
 X and Y_{1}
 X and Y_{2}
 X and Y_{3}
 X_{4} and Y_{4}
 standardatmosphere.txt
This text file provides standard meteorological data for the Earth's atmosphere as a function of altitude above sea level.
 Find the transformation that will relate the pressure to altitude with a linear equation.
 Write the nonlinear equation that results.
statistical
Do the fit
 For each of the following data sets…
 determine the equation of the best fit straight line(s) and
 explain the significance of the coefficients m, b, and r^{2}.
 brakingdistance.txt
In this road test, braking distances were measured for different cars traveling at 60 mph and 80 mph. Graph these distances against one another. Source: Road Test Summary. Road & Track (July 1998): 186–87.
 satellitefailures.txt
Satellites in low earth orbit (LEO) operate between 250 and 1500 km above the ground. Because Earth's atmosphere extends hundreds of miles into space, LEOs eventually experience enough friction that they fall back to earth and burn up. The accompanying text file gives the number of low earth orbit satellites that reentered the Earth's atmosphere and the number of sunspots for each year since 1969. Graph the number of reentered satellites vs. the number of sunspots. Source: NASA Goddard Space Flight Center.
 soap.txt
The weight of the soap in a bathroom shower was recorded almost every day for about a month. Graph the mass of this soap as a function of time.
 standardatmosphere.txt
This text file provides standard meteorological data for the Earth's atmosphere as a function of altitude above sea level. Graph temperature as a function of altitude for the tropospheric portion of the atmosphere from sea level to 11 km. (Do not analyze the entire data set. The atmosphere above 11 km behaves much differently.)
 toaster.txt
The duration of the toast cycle was measured for different light-dark settings of a two-slot electric bread toaster. Graph cycle time as a function of light-dark setting for this toaster when it held one and two slices of bread.
 wavelengthoflight.txt
In this experiment the wavelengths of the visible line spectra for an excited gas were measured using two different methods. Graph these trials against one another.
 electricenergyhouse.txt
In the United States, electric energy is measured in kilowatt hours and purchased with dollars. This data set came from 24 months of electric bills for a house in New York City in the second decade of the Twenty First Century. Plot a graph of cost vs. energy consumed and determine the equation of the best fit straight line.
 Explain the significance of the coefficients m, b, and r^{2}.
Answer the question
 Answer the questions associated with the following data sets.
 Determine the year when women sprinters will run as fast as their male counterparts in the 100 m dash using…
 dashelectronictiming.txt
only those world records that were timed electronically (as opposed to manually).
 dasholympicgoldmedals.txt
Olympic gold medal winners (as opposed to world record setters).
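Once each group has its own best fit line, the year in question is where the two lines cross. The following is a minimal sketch of that step alone; the function name crossing_point is hypothetical, and the coefficients are chosen for easy arithmetic rather than taken from any fit to the dash data.

```python
def crossing_point(m1, b1, m2, b2):
    """Return (x, y) where the lines y = m1*x + b1 and y = m2*x + b2 meet."""
    if m1 == m2:
        raise ValueError("parallel lines never cross")
    # Set m1*x + b1 = m2*x + b2 and solve for x
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1


# Hypothetical coefficients, NOT results from the dash data sets.
x_cross, y_cross = crossing_point(-1.0, 10.0, -2.0, 14.0)
print(x_cross, y_cross)
```

In the dash problem, x would be the year and y the record time, so the crossing is an extrapolation and is only as trustworthy as the assumption that both trends stay linear.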
 co2maunaloa.txt
Mauna Loa Observatory on the "Big Island" of Hawaii has been recording atmospheric carbon dioxide concentrations for more than half a century, beginning in 1958. Readings are taken continuously, but only the monthly averages are reported. Values are reported in parts per million (ppm).
 Construct a graph of atmospheric CO_{2} concentration vs. time.
 What two obvious behaviors are revealed in your graph?
 Split the data set in half and perform a linear regression analysis on the data for the years…
 1958–1987 and
 1988–2017.
 Compare the behavior of CO_{2} levels in the first half of the data set to the second half.
 gwvardo.txt
Global warming is most easily observed in long-term temperature measurements taken at high latitudes (near the poles). Vardø is a village in the extreme northeast of Norway on the Barents Sea. Despite being a few degrees north of the Arctic Circle, its harbor remains ice-free due to the warm North Atlantic drift current (an extension of the Gulf Stream). Vardø's climate is mild for its latitude, which means it varies from a few °C above freezing in the summer to a few °C below freezing in the winter. A location with such a stable climate is a good place to check for human-induced climate change.
 Construct a graph of average monthly temperature (AMT) vs. time for the period 1881 to 2006.
 Using linear regression, determine the following quantities for the whole data set…
 the rate of change of temperature in °C per century
 the uncertainty in this value
 the coefficient of determination
 the root-mean-square error (if you have the ability to calculate this number)
 Divide the data set up into four equal intervals of roughly 378 months (31.5 years) and repeat.
 Compile your results in a table like the one below and comment on the manner in which temperatures have changed at Vardø in this 125-year period. (Use the results of all four calculated columns in your analysis, not just the rate of temperature change.)
Vardø, Norway (Source: NASA Goddard Institute for Space Studies)
time interval | ΔT/Δt (°C/100 y) | uncertainty (°C/100 y) | r^{2} | rmse (°C)
overall (1881–2006) | | | |
1st quarter (1881–1912) | | | |
2nd quarter (1912–1943) | | | |
3rd quarter (1944–1975) | | | |
4th quarter (1975–2006) | | | |
 gwcentralpark.txt
[Note: This is an extension of the previous problem, but it can be worked on independently with little loss of meaning.]
Surface air temperatures have increased in New York City on the order of one degree Celsius in the Twentieth Century, consistent with the trend of global warming. New York is the largest city in the United States and the fourth largest metropolitan area on the planet. 8.5 million people live within the city limits and an additional 10 million are within commuting distance. With a gross metropolitan product approaching one trillion dollars ($10^{12}) the economy of New York City is larger than that of all but a dozen or so nations. This geographic concentration of people and economic power must certainly have an effect on the local climate. Repeat the analysis described in the previous problem using 125 years' worth of temperature measurements taken in Central Park in New York City.
 Construct a graph of average monthly temperature (AMT) vs. time for the period 1881 to 2006.
 Using linear regression, determine the following quantities for the whole data set…
 the rate of change of temperature in °C per century
 the uncertainty in this value
 the coefficient of determination
 the root-mean-square error (if you have the ability to calculate this number)
 Divide the data set up into four equal intervals of roughly 378 months (31.5 years) and repeat.
 Compile your results in a table like the one below and comment on the manner in which temperatures have changed in New York City in this 125-year period. (Use the results of all four calculated columns in your analysis, not just the rate of temperature change.)
Central Park, New York (Source: NASA Goddard Institute for Space Studies)
time interval | ΔT/Δt (°C/100 y) | uncertainty (°C/100 y) | r^{2} | rmse (°C)
overall (1881–2006) | | | |
1st quarter (1881–1912) | | | |
2nd quarter (1912–1943) | | | |
3rd quarter (1944–1975) | | | |
4th quarter (1975–2006) | | | |
 hawaiianchain.txt
The Hawaiian Island chain is more than just the visible islands. It also includes the Emperor Seamounts. (Seamounts are islands that have eroded down below sea level.) The combined Hawaii-Emperor chain is a series of volcanic structures formed by a single, long-lived plume of magma referred to as a "hotspot". The hotspot stayed fixed as the Pacific Plate slowly moved over it, resulting in a chain of volcanoes stretching from the Aleutian Islands off the coast of Alaska to Mount Kilauea on the Big Island of Hawaii. Use this data to determine the speed of the Pacific Plate. The columns in this data set are as follows:
 Volcano number
 Volcano name
 Volcano age (millions of years)
 Distance from Kilauea (km)
 Uncertainty in age (millions of years)
 Uncertainty in distance (km)
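If distance from Kilauea is graphed against volcano age, the slope of the best fit line is a speed in kilometers per million years, which is usually restated in centimeters per year. The sketch below shows only that slope and unit conversion. The function name slope_of_best_fit and the ages and distances are assumptions made up for illustration, not values from hawaiianchain.txt, and the uncertainty columns are ignored.

```python
def slope_of_best_fit(xs, ys):
    """Least-squares slope of y against x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up values, NOT the data in hawaiianchain.txt.
age_myr = [0.0, 10.0, 20.0, 30.0]               # millions of years
distance_km = [0.0, 500.0, 1000.0, 1500.0]      # kilometers

speed_km_per_myr = slope_of_best_fit(age_myr, distance_km)

# 1 km per million years = 100,000 cm per 1,000,000 years = 0.1 cm per year
speed_cm_per_yr = speed_km_per_myr * 0.1
print(speed_km_per_myr, speed_cm_per_yr)
```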
 taketheatrain.txt
The A Train makes the longest run of any subway in the New York City Transit system. The stretch from 207 Street to Broadway-Nassau is just about as long as the entire island of Manhattan. The data in the accompanying text file were taken from the 2008 weekday schedule for the A Express Train.
 Add two new columns to the data table.
 Use the time of day given in the timetable to determine the time elapsed in hours.
 Use the fact that the numbered streets in Manhattan are spaced 20 per mile and determine the distance traveled in miles.
 Construct a distance-time graph with a line of best fit and use it to determine the following quantities in Anglo-American units…
 the average speed of the A Train.
 the length of Manhattan.
 the length of the A line.
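The two new columns can be sketched as a pair of small conversions. This assumes, without having seen the file, that times are written as "H:MM" on a 24-hour clock within a single day; the function names and the timetable rows are hypothetical. Only stations on numbered streets are handled, so stops without a street number would need their distance estimated some other way.

```python
def elapsed_hours(clock, start):
    """Hours between two same-day clock times written as 'H:MM' (24-hour)."""
    def minutes(t):
        h, m = t.split(":")
        return 60 * int(h) + int(m)
    return (minutes(clock) - minutes(start)) / 60


def miles_traveled(street, start_street=207, streets_per_mile=20):
    """Miles traveled downtown from the starting street, at 20 streets per mile."""
    return (start_street - street) / streets_per_mile


# Hypothetical rows (clock time, street number), NOT the real A Train schedule.
timetable = [("12:00", 207), ("12:09", 147), ("12:18", 87)]
start_clock = timetable[0][0]

for clock, street in timetable:
    print(elapsed_hours(clock, start_clock), miles_traveled(street))
```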
 Brady Haran is a video journalist best known for his YouTube video channels Periodic Videos, Numberphile, and Sixty Symbols. In May 2013 Haran released a video of his journey from Lukla, Nepal to the Base Camp for Mount Everest. Along the way, he and his guides Buddhi Rai and Chandra Rai measured the altitude and boiling point of water as they approached the world's tallest mountain.
 Compile a table of place name, altitude, and boiling point of water each time Buddhi reported values. (Also include a row for sea level and Mount Everest.)
 Convert the table into a graph of boiling point vs. altitude.
 Add an appropriate curve fit.
 Predict the boiling point of water at the summit of Mount Everest.
Linearize it
 For each of the following data sets…
 find the transformation that will relate the two variables with a linear equation
 write the nonlinear equation that results
 aerodynamicdrag.txt
In this experiment students measured the aerodynamic drag on a weighted party balloon falling at different speeds.
 constantforce.txt
In this experiment different masses were subject to the same force and their accelerations recorded.
 milkfreshness.txt
The following data were taken from a milk carton sold in North Carolina. Source: Greenler, Robert. Chasing the Rainbow. Milwaukee, WI: Elton-Wolf, 2000: 140.
 moorelaw.txt
This data set shows the number of switches in a computer for various years in the Twentieth Century. Source: IBM Gallery of Science & Art, 590 Madison Avenue, Second Floor, New York, NY 10022 (July 1992).
 resonancetube.txt
In this experiment various tuning forks of known frequency were held above a resonance tube, which was used to determine the wavelength of the sound emitted.
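One way to see what "linearize" means in practice is a hypothetical power law, y = A x^{n}. Taking the logarithm of both sides gives ln y = n ln x + ln A, so a graph of ln y vs. ln x is a straight line with slope n and intercept ln A. The sketch below runs that transformation on made-up numbers. It is not a claim about which transformation suits any of the data sets above, and the function name fit_line is an assumption.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up data that follow y = 3 x^2 exactly (not from any file above).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0, 12.0, 48.0, 192.0]

# Transform both variables, then fit a straight line to the transformed data.
log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]
slope, intercept = fit_line(log_x, log_y)

exponent = slope                    # n in y = A x^n
coefficient = math.exp(intercept)   # A in y = A x^n
print(exponent, coefficient)
```

The nonlinear equation that results is then written by undoing the transformation, here y = 3 x^{2}. Other relationships call for other transformations, such as taking the logarithm of only one variable or graphing against a reciprocal.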
algebraic
Linearize it
 Idea for a problem set. Transform the following nonlinear equations into linear equations by the appropriate change of variables. For each transformed equation, identify…
 the new x variable
 the new y variable
 the slope, m
 the y-intercept, b
 equation one
 equation two
 and so on
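As an illustration of the kind of answer intended, here is one worked example using the period of a simple pendulum. This equation is an assumption chosen for illustration and is not necessarily one of the equations meant for the list above. Squaring both sides turns the square root relationship into a straight line.

```latex
T = 2\pi\sqrt{\frac{\ell}{g}}
\quad\Longrightarrow\quad
T^{2} = \frac{4\pi^{2}}{g}\,\ell + 0
```

Here the new x variable is \ell, the new y variable is T^{2}, the slope is m = 4\pi^{2}/g, and the y-intercept is b = 0.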