# Linear Regression

## Problems

### practice

- electric-energy.txt

In the United States, electric energy is measured in kilowatt hours and purchased with dollars. This data set came from 12 months of electric bills for a New York City apartment in the early years of the 21st century.

- Plot a graph of cost vs. energy consumed and determine the equation of the best fit straight line.
- Explain the significance of the coefficients *m*, *b*, and *r*^{2}.
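The coefficients can be computed by hand from the usual least-squares sums. The following is a minimal sketch in plain Python; the function name `linear_fit` and the sample numbers are invented for illustration and are not the values in electric-energy.txt.

```python
def linear_fit(xs, ys):
    """Return slope m, intercept b, and r^2 for the least-squares line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squared deviations and cross deviations about the means
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    r2 = sxy ** 2 / (sxx * syy)
    return m, b, r2


# Made-up numbers for illustration only (they lie exactly on one line)
energy_kwh = [100.0, 200.0, 300.0, 400.0]
cost_usd = [25.0, 45.0, 65.0, 85.0]

m, b, r2 = linear_fit(energy_kwh, cost_usd)
print(f"m = {m:.3f} $/kWh, b = {b:.2f} $, r^2 = {r2:.4f}")
```

With cost on the vertical axis and energy on the horizontal axis, *m* carries units of dollars per kilowatt hour and *b* carries units of dollars, which is a useful starting point for interpreting them.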

- dash-world.txt

The text file referenced above has data on the world records for the 100 m dash. The data are broken up into four groups:

- men's electronically-timed world records
- men's hand-timed world records
- women's electronically-timed world records
- women's hand-timed world records

- Perform a linear regression on both men's and women's world record times as a function of the year the record was set.
- Explain the significance of the numerical results.
- Make an interesting prediction.

- vostok.txt

Snow rarely gets a chance to melt in Antarctica, even in the summer when the sun never sets. In the interior of the continent, the temperature of the air hasn't been above the freezing point of water in any significant way for the last 900,000 years. The snow that falls there accumulates and accumulates and accumulates until it compresses into rock solid ice — up to 4.5 km thick in some regions. Since the snow that falls is originally fluffy with air, the ice that eventually forms still holds remnants of this air — very, very old air. By examining the isotopic composition of the gases in carefully extracted ice cores we can learn things about the climate of the past. By extension we might also be able to predict some things about the climate of the future.

Columns

- Age of air (years before present)
- Temperature anomaly with respect to the mean recent time value (°C)
- Carbon dioxide concentration (ppm)
- Dust concentration (ppm)

Adapted from Petit, et al. 1999

Questions…

- Construct a set of overlapping time series graphs for CO_{2} concentration and temperature anomaly.
- Construct a scatter plot of temperature anomaly vs. CO_{2} concentration.
- How are atmospheric carbon dioxide concentration and temperature anomaly related?
- What temperature anomaly might one expect given current atmospheric CO_{2} levels?

- anscombe.txt

This collection of four hypothetical data sets in one table was created by F.J. Anscombe in 1973 for use as a teaching tool. The data don't correspond to any real experiment. They are just a bunch of numbers with a peculiar behavior. Identify this peculiarity by calculating the coefficients *m*, *b*, and *r* for each of the four data sets, then look at each graph with your eyes and employ your brain to make a judgment. Is linear regression the right tool for analyzing these data? If not, why not and what should be done instead? The columns should be paired up in the following manner…

- X and Y_{1}
- X and Y_{2}
- X and Y_{3}
- X_{4} and Y_{4}

Source: *The American Statistician*. Vol. 27 No. 1 (1973): 19.

- standard-atmosphere.txt

This text file provides standard meteorological data for the Earth's atmosphere as a function of altitude above sea level.

- Find the transformation that will relate the pressure to altitude with a linear equation.
- Write the nonlinear equation that results.

### statistical

#### Do the fit

- For each of the following data sets…
- determine the equation of the best fit straight line(s) and
- explain the significance of the coefficients *m*, *b*, and *r*^{2}.

- braking-distance.txt

In this road test, braking distances were measured for different cars traveling at 60 mph and 80 mph. Graph these distances against one another. Source: Road & Track, 1998.

- satellite-failures.txt

Satellites in low Earth orbit (LEO) operate between 250 and 1500 km above the ground. Because Earth's atmosphere extends hundreds of miles into space, LEO satellites eventually experience enough friction that they fall back to Earth and burn up. The accompanying text file gives the number of low Earth orbit satellites that reentered the Earth's atmosphere and the number of sunspots for each year since 1969. Graph the number of reentered satellites vs. the number of sunspots. Source: NASA Goddard Space Flight Center.

- soap.txt

The weight of the soap in a bathroom shower was recorded almost every day for about a month. Graph the mass of this soap as a function of time.

- standard-atmosphere.txt

This text file provides standard meteorological data for the Earth's atmosphere as a function of altitude above sea level. Graph temperature as a function of altitude for the tropospheric portion of the atmosphere from sea level to 11 km. (Do not analyze the entire data set. The atmosphere above 11 km behaves much differently.)

- toaster.txt

The duration of the toast cycle was measured for different light-dark settings of a two-slot electric bread toaster. Graph cycle time as a function of light-dark setting for this toaster when it held one and two slices of bread.

- wavelength-of-light.txt

In this experiment the wavelengths of the visible line spectra for an excited gas were measured using two different methods. Graph these trials against one another.

- electric-energy-house.txt

In the United States, electric energy is measured in kilowatt hours and purchased with dollars. This data set came from 24 months of electric bills for a house in New York City in the second decade of the 21st century.

- Plot a graph of cost vs. energy consumed and determine the equation of the best fit straight line.
- Explain the significance of the coefficients *m*, *b*, and *r*^{2}.

#### Answer the question

- Answer the questions associated with the following data sets.

- Determine the year when women sprinters will run as fast as their male counterparts in the 100 m dash using…

- dash-electronic.txt

only those world records that were timed electronically (as opposed to manually).

- dash-olympic.txt

Olympic gold medal winners (as opposed to world record setters).

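One way to estimate such a year is to fit a separate straight line to each group and solve for the point where the two lines give the same time. The sketch below assumes that approach; the helper names `fit_line` and `crossing_x` and all of the record times are invented for illustration and do not come from either data file.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b of y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


def crossing_x(line_a, line_b):
    """x value at which two lines, each given as (m, b), have the same y."""
    (m_a, b_a), (m_b, b_b) = line_a, line_b
    if m_a == m_b:
        raise ValueError("parallel lines never cross")
    # Solve m_a*x + b_a = m_b*x + b_b for x
    return (b_b - b_a) / (m_a - m_b)


# Hypothetical record times in seconds, invented for illustration only
years = [1970, 1980, 1990, 2000]
men = [10.00, 9.95, 9.90, 9.85]
women = [11.00, 10.90, 10.80, 10.70]

year_equal = crossing_x(fit_line(years, men), fit_line(years, women))
print(f"fitted lines cross near the year {year_equal:.0f}")
```

Any year obtained this way is an extrapolation well outside the range of the data, so it is worth asking how much confidence it deserves.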
- co2-mauna-loa.txt

Mauna Loa Observatory on the "Big Island" of Hawaii has been recording atmospheric carbon dioxide concentrations for more than half a century, beginning in 1958. Readings are taken continuously, but only the monthly averages are reported. Values are reported in parts per million (ppm).

- Construct a graph of atmospheric CO_{2} concentration vs. time.
- What two obvious behaviors are revealed in your graph?
- Split the data set in half and perform a linear regression analysis on the data for the years…
- 1958-1987 and
- 1988-2017.

- Compare the behavior of CO_{2} levels in the first half of the data set to the second half.

- gw-vardo.txt

Global warming is most easily observed in long term temperature measurements taken at high latitudes (near the poles). Vardø is a village in the extreme northeast of Norway on the Barents Sea. Despite being a few degrees north of the Arctic Circle, its harbor remains ice free due to the warm North Atlantic drift current (an extension of the Gulf Stream). Vardø's climate is mild for its latitude, which means it varies from a few °C above freezing in the summer to a few °C below freezing in the winter. A location with such a stable climate is a good place to check for human-induced climate change.

- Construct a graph of average monthly temperature (AMT) vs. time for the period 1881 to 2006.
- Using linear regression, determine the following quantities for the whole data set…
- the rate of change of temperature in °C per century
- the uncertainty in this value
- the coefficient of determination
- the root mean square error (if you have the ability to calculate this number)

- Divide the data set up into four equal intervals of roughly 378 months (31.5 years) and repeat.
- Compile your results in a table like the one below and comment on the manner in which temperatures have changed at Vardø in this 125 year period. (Use the results of all four calculated columns in your analysis, not just the rate of temperature change.)
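The four quantities requested can all be obtained from the same least-squares sums. The following is a sketch only: it assumes time is expressed in decimal years (so that multiplying by 100 gives a rate per century), it takes the uncertainty to be the standard error of the slope, and it defines the rmse with a divisor of *n* (some software uses *n* − 2 instead). The function name `trend_statistics` and the sample values are invented.

```python
import math


def trend_statistics(years, temps):
    """Slope, its standard error, r^2, and rmse for a straight-line fit."""
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(temps) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    syy = sum((y - y_mean) ** 2 for y in temps)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, temps))
    m = sxy / sxx
    b = y_mean - m * x_mean
    # Sum of squared residuals about the fitted line
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(years, temps))
    slope_se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    r2 = 1 - sse / syy
    rmse = math.sqrt(sse / n)  # assumption: divide by n, not n - 2
    return {
        "slope_per_century": 100 * m,  # degrees C per 100 years
        "uncertainty_per_century": 100 * slope_se,
        "r2": r2,
        "rmse": rmse,
    }


# Artificial values chosen for easy arithmetic, NOT the temperature record
years = [1900.0, 1901.0, 1902.0, 1903.0]
temps = [0.0, 1.0, 1.0, 2.0]

stats = trend_statistics(years, temps)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Calling the same function on each quarter of the record, as well as on the whole record, would fill in one row of the table per call.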

Vardø, Norway. Source: NASA Goddard Institute for Space Studies

| time interval | ∆*T*/∆*t* (°C/100 y) | uncertainty (°C/100 y) | *r*^{2} | rmse (°C) |
|---|---|---|---|---|
| overall (1881–2006) | | | | |
| 1st quarter (1881–1912) | | | | |
| 2nd quarter (1912–1943) | | | | |
| 3rd quarter (1944–1975) | | | | |
| 4th quarter (1975–2006) | | | | |

- gw-central-park.txt

[Note: This is an extension of the previous problem, but it can be worked on independently with little loss of meaning.]

Surface air temperatures have increased in New York City on the order of one degree Celsius in the 20th century, consistent with the trend of global warming. New York is the largest city in the United States and the fourth largest metropolitan area on the planet. 8.5 million people live within the city limits and an additional 10 million are within commuting distance. With a gross metropolitan product approaching one trillion dollars ($10^{12}) the economy of New York City is larger than that of all but a dozen or so nations. This geographic concentration of people and economic power must certainly have an effect on the local climate. Repeat the analysis described in the previous problem using 125 years' worth of temperature measurements taken in Central Park in New York City.

- Construct a graph of average monthly temperature (AMT) vs. time for the period 1881 to 2006.
- Using linear regression, determine the following quantities for the whole data set…
- the rate of change of temperature in °C per century
- the uncertainty in this value
- the coefficient of determination
- the root-mean-square error (if you have the ability to calculate this number)

- Divide the data set up into four equal intervals of roughly 378 months (31.5 years) and repeat.
- Compile your results in a table like the one below and comment on the manner in which temperatures have changed at New York City in this 125 year period. (Use the results of all four calculated columns in your analysis, not just the rate of temperature change.)

Central Park, New York. Source: NASA Goddard Institute for Space Studies

| time interval | ∆*T*/∆*t* (°C/100 y) | uncertainty (°C/100 y) | *r*^{2} | rmse (°C) |
|---|---|---|---|---|
| overall (1881–2006) | | | | |
| 1st quarter (1881–1912) | | | | |
| 2nd quarter (1912–1943) | | | | |
| 3rd quarter (1944–1975) | | | | |
| 4th quarter (1975–2006) | | | | |

- hawaiian-chain.txt

The Hawaiian Island chain is more than just the visible islands. It also includes the Emperor Seamounts. (Seamounts are islands that have eroded down below sea level.) The combined Hawaii-Emperor chain is a series of volcanic structures formed by a single, long-lived plume of magma referred to as a "hotspot". The hotspot stayed fixed as the Pacific Plate slowly moved over it, resulting in a chain of volcanoes stretching from the Aleutian Islands off the coast of Alaska to Mount Kilauea on the Big Island of Hawaii. Use these data to determine the speed of the Pacific Plate. The columns in this data set are as follows:

- Volcano number
- Volcano name
- Volcano age (millions of years)
- Distance from Kilauea (km)
- Uncertainty in age (millions of years)
- Uncertainty in distance (km)
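If distance (km) is plotted against age (millions of years), the slope of the best fit line comes out in kilometers per million years. Plate speeds are often easier to picture in centimeters per year, so a unit conversion along the following lines may help. The abbreviation My for millions of years is an assumption made here, not something taken from the data file.

```latex
1\ \frac{\text{km}}{\text{My}}
  = \frac{10^{5}\ \text{cm}}{10^{6}\ \text{y}}
  = 0.1\ \frac{\text{cm}}{\text{y}}
```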

- take-the-a-train.txt

The A Train makes the longest run of any subway in the New York City Transit system. The stretch from 207 Street to Broadway-Nassau is just about as long as the entire island of Manhattan. The data in the accompanying text file were taken from the 2008 weekday schedule for the A Express Train.

- Add two new columns to the data table.
- Use the time of day given in the timetable to determine the *time* elapsed in *hours*.
- Use the fact that the numbered streets in Manhattan are spaced 20 per mile and determine the *distance* traveled in *miles*.
- Construct a distance-time graph with a line of best fit and use it to determine the following quantities in Anglo-American units…
- the average speed of the A Train.
- the length of Manhattan.
- the length of the A line.

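The two new columns amount to a pair of unit conversions. The sketch below shows one way they might be computed; it assumes the timetable gives same-day times as `H:MM` on a 24-hour clock, and the rows shown are hypothetical, not the real schedule. Stops without a street number (such as Broadway-Nassau) are not handled.

```python
STREETS_PER_MILE = 20  # spacing of numbered Manhattan streets, as given in the problem


def hours_since(clock, start):
    """Elapsed hours between two same-day times written as 'H:MM' (24-hour)."""
    def to_minutes(text):
        hours, minutes = text.split(":")
        return 60 * int(hours) + int(minutes)
    return (to_minutes(clock) - to_minutes(start)) / 60


def miles_between(street, origin_street):
    """Distance in miles between two numbered streets."""
    return abs(street - origin_street) / STREETS_PER_MILE


# Hypothetical timetable rows (street number, departure time), not the real schedule
rows = [(207, "6:00"), (125, "6:18"), (59, "6:30")]
origin_street, start_time = rows[0]

# Each entry is (elapsed time in hours, distance traveled in miles)
table = [
    (hours_since(clock, start_time), miles_between(street, origin_street))
    for street, clock in rows
]
for elapsed, distance in table:
    print(f"{elapsed:.2f} h  {distance:.2f} mi")
```

With distance in miles and time in hours, the slope of the distance-time line comes out directly in miles per hour.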
- Brady Haran is a video journalist best known for his YouTube video channels Periodic Videos, Numberphile, and Sixty Symbols. In May 2013 Haran released a video of his journey from Lukla, Nepal to the Base Camp for Mount Everest. Along the way, he and his guides Buddhi Rai and Chandra Rai measured the altitude and boiling point of water as they approached the world's tallest mountain.
- Compile a table of place name, altitude, and boiling point of water each time Buddhi reported values. (Also include a row for sea level and Mount Everest.)
- Convert the table into a graph of boiling point vs. altitude.
- Add an appropriate curve fit.
- Predict the boiling point of water at the summit of Mount Everest.

#### Linearize it

- For each of the following data sets…
- find the transformation that will relate the two variables with a linear equation
- write the nonlinear equation that results
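As an illustration of the general procedure, the sketch below linearizes a power law by taking logarithms of both variables, fits a straight line to the transformed values, and then converts the fit back into a nonlinear equation. The data are synthetic, and a power law is only one possible form; other data sets may call for a different transformation (semi-log, reciprocal, and so on).

```python
import math


def fit_line(xs, ys):
    """Least-squares slope m and intercept b of y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


# Synthetic data generated from y = 3 * x**2 (not from any of the data files)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x ** 2 for x in xs]

# Transformation: ln(y) plotted against ln(x) is a straight line for a power law
slope, intercept = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])

# Undo the transformation: ln(y) = intercept + slope*ln(x)  ->  y = e^intercept * x^slope
exponent = slope
coefficient = math.exp(intercept)
print(f"y = {coefficient:.3f} * x^{exponent:.3f}")
```

A quick check on any proposed transformation is whether the transformed points actually fall on a straight line when graphed.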

- aerodynamic-drag.txt

In this experiment students measured the aerodynamic drag on a weighted party balloon falling at different speeds.

- constant-force.txt

In this experiment different masses were subjected to the same force and their accelerations recorded.

- milk-freshness.txt

The following data were taken from a milk carton sold in North Carolina. Source: Greenler, Robert. *Chasing the Rainbow*. Milwaukee, WI: Elton-Wolf, 2000: 140.

- moore-law.txt

This data set shows the number of switches in a computer for various years in the 20th century. Source: IBM Gallery of Science & Art, 590 Madison Avenue, Second Floor, New York, NY 10022 (July 1992).

- resonance-tube.txt

In this experiment various tuning forks of known frequency were held above a resonance tube, which was used to determine the wavelength of the sound emitted.

### algebraic

#### Linearize it

- Idea for a problem set. Transform the following nonlinear equations into linear equations by the appropriate change of variables. For each transformed equation, identify…
- the new *x* variable
- the new *y* variable
- the slope, *m*
- the *y*-intercept, *b*


- equation one
- equation two
- and so on
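As an illustration of the kind of answer intended, consider an exponential relationship. This equation is an assumed example and is not one of the equations in the set above. Taking the natural logarithm of both sides gives a form that is linear in *x*:

```latex
\begin{aligned}
y &= a\,e^{kx} \\
\ln y &= \ln a + kx
\end{aligned}
```

Comparing with *y* = *mx* + *b*, the new *x* variable is *x* itself, the new *y* variable is ln *y*, the slope is *m* = *k*, and the *y*-intercept is *b* = ln *a*.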