Method: Regression

Methods: Correlation, Regression, Transformation, Outlier, ANCOVA,
Topics: Nature, Geography,
Datafile Name: Acorns
Abstract:

Fifty species of oak trees grow in the United States. Twenty-eight species of oak from the Atlantic region and 11 from the California region were studied. The size of each species' acorns was measured to see whether acorn size is related to geographic range. It is suggested that a plant's...

Methods: Scatterplot, Regression, Correlation,
Topics: Health,
Datafile Name: Age and Height
Abstract:

The height of a child is not stable but increases over time. Since the pattern of growth varies from child to child, one way to understand the general growth pattern is by using the average of several children's heights, as presented in this data set. The scatterplot of height versus age is a...

Methods: Regression, Multivariate Regression, Time Series,
Topics: Economics, Consumer,
Datafile Name: Agricultural Economics Studies
Abstract:

These data provide the student with an opportunity to develop models for beef prices, beef consumption, pork prices, and pork consumption that take account of consumption and price of a substitute product, a demand shifter (income), and other factors. Frederick V. Waugh was Director of the Division ...

Methods: Outlier, Transformation, Regression,
Topics: Environment,
Datafile Name: SMSA
Abstract:

Researchers at General Motors collected data on 60 U.S. Standard Metropolitan Statistical Areas (SMSAs) in a study of whether air pollution contributes to mortality. The dependent variable for analysis is age-adjusted mortality (called "Mortality"). The data include variables measu...

Methods: Regression, Time Series, Dummy Variable,
Topics: Automotive, Government, Health,
Datafile Name: Albuquerque Batmobile
Abstract:

In April 1979 the Albuquerque Police Department began a special enforcement program aimed at reducing DWI accidents. The program consisted of a squad of police officers who manned a van which housed a Breath Alcohol Testing (BAT) device. The data collected to evaluate the program consisted of a q...

Methods: Diagnostics, Dummy Variable, Interaction, Regression,
Topics: Economics, Government, Consumer,
Datafile Name: Home Prices
Abstract:

From data maintained by the multiple listing agency in Albuquerque, the file contains prices of home resales along with descriptive data on the home, including square feet, age, number of certain features, custom built or not, and corner location or not. Another dummy variable is whether the home...

Methods: Correlation, Dummy Variable, Outlier, Regression, Scatterplot,
Topics: Consumer, Health,
Datafile Name: Alcohol and Tobacco
Abstract:

Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and alcoholic beverages. A scatterplot of spending on alcohol vs. spending on tobacco in the 11 regions of Great Britain shows an overall positive line...

Methods: Regression, Scatterplot, Boxplot,
Topics:
Datafile Name:
Abstract:

American League baseball teams play their games with the designated hitter rule, meaning that pitchers do not bat. The League believes that replacing the pitcher, typically a weak hitter, with another player in the batting order produces more runs and generates more interest among fans. Is there e...

Methods: Regression, Interaction,
Topics: Food, Consumer,
Datafile Name: Beef Council Check-off
Abstract:

Average size of farm (SIZE) and percent of farms with sales of $100,000 or more dominate the regression of percent of farmers voting yes for the American Beef Council check-off (YES) on characteristics of farms and operators by counties in Montana. In fact, there is a strong interaction of these ...

Methods: Correlation, Regression, Scatterplot,
Topics: Health, Psychology,
Datafile Name: Brain size
Abstract:

Are the size and weight of your brain indicators of your mental capacity? In this study by Willerman et al. (1991) the researchers use Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects. The researchers take into account gender and body size to draw conclusions about the...

Methods: Regression, Outlier, Scatterplot,
Topics: Health, Medical,
Datafile Name: Breast Cancer
Abstract:

In a 1965 report, Lea discussed the relationship between mean annual temperature and the mortality rate for a type of breast cancer in women. The subjects were residents of certain regions of Great Britain, Norway, and Sweden. A simple regression of mortality index on temperature shows a strong p...

Methods: Diagnostics, Multicollinearity, Regression,
Topics: Food, Science,
Datafile Name: Cheese
Abstract:

As cheddar cheese matures, a variety of chemical processes take place. The taste of matured cheese is related to the concentration of several chemicals in the final product. In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia, samples of cheese were analyzed for their chem...

Methods: Regression, Interaction, Time Series,
Topics: Economics, Education,
Datafile Name: Enrollment Forecast
Abstract:

These data were used by the Office of Institutional Research at the University of New Mexico to forecast fall undergraduate enrollment. Both the January unemployment rate and the June high school graduates for a year are known before students enroll in the university in the fall.

The ...

Methods: Regression, Dummy Variable,
Topics: Economics, Government,
Datafile Name: Factors in Country Inflation
Abstract:

The authors of the publication cited as the datafile source were concerned to find factors associated with differences in historical inflation rates among countries. They developed indexes of independence of the central bank from questionnaires about policies and practices as well as from the leg...

Methods: Correlation, Diagnostics, Paired T-Test, Regression, Transformation,
Topics: Consumer, Economics,
Datafile Name: Fish Prices
Abstract:

The price of fish varies by species and time. The average price received by fishermen and vessel owners for several species of fish increased from 41 cents per pound in 1970 to $1.10 per pound in 1980. A paired t-test shows that this increase is highly significant.

There is a strong cor...

Methods: Regression, Interaction, Dummy Variable, ANOVA,
Topics: Consumer, Food,
Datafile Name: Taste Test Scores
Abstract:

Two-way analysis of variance with two levels of each classification can be easily handled by regression analysis with dummy (0,1) variables for the two levels. The example involves food taste scores in an experiment in which a coarse and fine screen for texture and a high and low liquid level wer...
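
The dummy-variable idea in this abstract can be sketched in a few lines. The scores below are made up for illustration and are not the Taste Test Scores data; the names `screen`, `liquid`, `solve`, and `ols` are hypothetical. With (0,1) codes for each factor plus their product, the fitted coefficients reproduce the cell means of the 2x2 layout.

```python
# Synthetic, made-up taste scores for a 2x2 design (NOT the DASL data).
# screen: 0 = coarse, 1 = fine; liquid: 0 = low, 1 = high
rows = [
    (0, 0, 30.0), (0, 0, 34.0),
    (1, 0, 50.0), (1, 0, 54.0),
    (0, 1, 40.0), (0, 1, 44.0),
    (1, 1, 80.0), (1, 1, 84.0),
]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Design matrix columns: intercept, screen dummy, liquid dummy, interaction.
X = [[1.0, s, l, s * l] for s, l, _ in rows]
y = [score for _, _, score in rows]
b0, b_screen, b_liquid, b_inter = ols(X, y)
```

Here `b0` is the mean of the coarse/low cell, `b_screen` and `b_liquid` are the shifts from that cell when one factor changes, and `b_inter` is the extra shift when both change together.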

Methods: Lurking Variable, Outlier, Regression,
Topics: Consumer, Engineering, Automotive,
Datafile Name: Cars
Abstract:

In a regression to predict fuel consumption (measured in Gallons/100 miles -- see the related story "Measuring Fuel Efficiency") from the weight and drive ratio of cars, the Buick Estate Wagon shows up as an outlier. Remarkably enough, even though the Buick Estate Wagon has high fuel co...

Methods: Nonlinear Regression, Transformation, Regression,
Topics: Biology,
Datafile Name: Medflies
Abstract:

By using Mediterranean fruit flies, Gompertz's 1825 theory that mortality rates increase at an exponential rate as age increases is examined. (i.e. as an organism gets older, its chance of dying per unit of time increases exponentially.) 1,203,646 fruit flies comprised the population for this...
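
A minimal sketch of how an exponential mortality curve can be checked with ordinary regression: if the rate is a * exp(b * age), then log(rate) is linear in age, so a straight-line fit to the logged rates recovers a and b. The ages, rates, and parameter values below are synthetic assumptions, not the medfly data, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(x, y):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic ages and mortality rates generated from an exact exponential
# curve (hypothetical parameters a = 0.01, b = 0.1), not the medfly data.
ages = list(range(1, 11))
rates = [0.01 * math.exp(0.1 * t) for t in ages]

# Regress log(rate) on age, then back-transform the intercept.
intercept, slope = fit_line(ages, [math.log(r) for r in rates])
a_hat, b_hat = math.exp(intercept), slope
```

With real data, a residual plot of the logged rates against age would show whether the straight line holds at every age.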

Methods: Regression, Polynomial Regression, Residuals, Interaction,
Topics: Sports,
Datafile Name: Great Pitchers
Abstract:

The linear regression of hits, control, and clutch measures on the earned run averages (ERA) of leading baseball pitchers of the 1920-1950 era accounts for 96.7 percent of the variation on ERA's. However, a check of the residuals against the predictor variables finds a non-linearity with resp...

Methods: Outlier, Regression, Polynomial Regression, Scatterplot,
Topics: Sports,
Datafile Name: Hitters 1920 -1950
Abstract:

In 1954 Branch Rickey wrote an article for Life Magazine entitled "Goodbye to some old baseball ideas". He criticized some traditional baseball statistics and proposed some of his own that he thought more useful. For individual hitting Rickey proposed the sum of on-base average (OBA) and ex...

Methods: Histogram, Scatterplot, Regression,
Topics: Nutrition, Health,
Datafile Name: Cereals
Abstract:

This datafile contains nutritional information and grocery shelf location for 77 breakfast cereals. Current research states that adults should consume no more than 30% of their calories in the form of fat, they need about 50 grams (women) or 63 grams (men) of protein daily, and should provide for...

Methods: Regression, Regression Thru Origin, Diagnostics,
Topics: Astronomy,
Datafile Name: Hubble
Abstract:

In 1929, Edwin Hubble investigated the relationship between distance of a galaxy from the earth and the velocity with which it appears to be receding. Galaxies appear to be moving away from us no matter which direction we look. This is thought to be the result of the "Big Bang". Hubble ...
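
The Methods line for this entry lists regression through the origin. As a rough sketch of that technique, the no-intercept model y = slope * x has the least-squares slope sum(x*y) / sum(x*x). The numbers below are made up in arbitrary units and are not Hubble's measurements; `slope_through_origin` is a hypothetical helper name.

```python
def slope_through_origin(x, y):
    """Least-squares slope for the no-intercept model y = slope * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Made-up distances and velocities in arbitrary units, not Hubble's data.
distance = [1.0, 2.0, 3.0, 4.0]
velocity = [2.1, 3.9, 6.2, 7.8]
slope = slope_through_origin(distance, velocity)
```

Forcing the line through the origin is a modeling assumption, so it is worth comparing against a fit with an intercept before relying on it.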

Methods: Regression, Time Series,
Topics: Consumer,
Datafile Name: Ice Cream
Abstract:

Ice cream consumption was measured over 30 four-week periods from March 18, 1951 to July 11, 1953. The purpose of the study was to determine if ice cream consumption depends on the variables price, income, or temperature. The variables Lag-temp and Year have been added to the original data.

...

Methods: Outlier, Regression, Polynomial Regression, Boxplot,
Topics: Sports,
Datafile Name: Hitters 1920 -1950
Abstract:

The data set is Branch Rickey's set of outstanding hitters in baseball over the period 1920 to 1950 based on the sum of what Rickey defines as on-base-average and extra-base-power (OBA + EBP). The student should be asked to run the simple regression of EBP on OBA, as well as the second degre...

Methods: Regression, Transformation,
Topics: Health, Biology,
Datafile Name: Mercury in Bass
Abstract:

Mercury contamination of edible freshwater fish poses a direct threat to our health. Largemouth bass were studied in 53 different Florida lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in Aug...

Methods: Scatterplot, Regression, Outlier,
Topics: Sports,
Datafile Name: Olympic Gold
Abstract:

This dataset contains the gold medal performances in the men's long jump, high jump and discus throw for the modern Olympic games from 1900 to 1984. Data are also provided for the 1896 Olympics, but one may wish to omit them from the analyses because that Olympics was quite different from lat...

Methods: Regression, Dummy Variable, Interaction,
Topics: Consumer, Engineering,
Datafile Name: Nambeware Polishing Times
Abstract:

The average polishing times for 59 products are related in general to the sizes of the products as indicated by product diameter (or equivalent). However, different types of products have special problems in polishing. For example, casseroles are deeper than ordinary plates and on that accoun...

Methods: Regression,
Topics: Energy,
Datafile Name: Nuclear Plants
Abstract:

Data were collected on 32 light water nuclear power plants. For each of these plants, the cost, date of construction permit and net capacity (in MegaWatts) are recorded.

It can be interesting to predict power in MegaWatts from the other variables and to look for trends over time.

...

Methods: Time Series, Regression, Scatterplot,
Topics: Sports,
Datafile Name: Olympic Gold
Abstract:

This dataset contains the gold medal performances in the men's long jump, high jump and discus for the modern Olympic games from 1900 to 1984. Data are provided for the 1896 Olympics, but one may wish to omit them from the analyses.

Regressions and scatterplots of performance variab...

Methods: Regression, Dummy Variable, Confidence Interval,
Topics: Government, Miscellaneous,
Datafile Name: Parking Meter Theft
Abstract:

The variable CON in the datafile Parking Meter Theft represents monthly parking meter collections by the principal contractor in New York City from May 1977 to March 1981. In addition to contractor collections, the city made collections from a number of "control" meters close to City Ha...

Methods: Regression, Transformation, Polynomial Regression,
Topics: Automotive, Engineering, Consumer,
Datafile Name: Passenger Car Mileage
Abstract:

Variation in gasoline mileage among makes and models of automobiles is influenced substantially by the weight and horsepower of the vehicles. When miles per gallon and horsepower are transformed to logarithms, the linearity of the regression is improved. A negative second order term is required t...

Methods: Regression, Outlier, Dummy Variable,
Topics: Government, Social Science, Sociology,
Datafile Name: New York City Crime
Abstract:

The datafile contains percent changes in manpower and seasonally adjusted changes in weekly auto thefts and larcenies for the 25 precincts in New York City from a base period of 27 weeks in 1966 to an experimental period of 58 weeks in late 1966 and 1967. During the experimental period police man...

Methods: Regression, Dummy Variable, Interaction, Transformation,
Topics: Consumer, Engineering,
Datafile Name: Nambeware Polishing Times
Abstract:

The relation between polishing time and product diameters as well as type of product (casserole, other) is one which is useful to the company for estimating the polishing time for new products which are designed or suggested for design and manufacture. A necessary regression assumption is that th...

Methods: Regression, Time Series,
Topics: Economics, Consumer,
Datafile Name: Predicting Appliance Sales
Abstract:

The file gives annual unit factory shipments of dishwashers, disposals, refrigerators, and washing machines in the United States from 1960 through 1985 together with two potential predictor series, durable goods expenditures and private residential investment. The data for each series in 1986 wer...

Methods: Regression, Residuals, Time Series,
Topics: Economics, Consumer,
Datafile Name: Predicting Retail Sales
Abstract:

The datafile contains 11 years of quarterly sales for four kinds of retail establishments, along with non-agricultural employment and wage and salary disbursements. The task is to develop a model for predicting sales using leading values of employment or wage and salary disbursements, seasonal in...

Methods: Outlier, Regression, Residuals, Interaction, Dummy Variable,
Topics: Economics, Sports,
Datafile Name: Q-back and Team Salaries
Abstract:

The datafile contains 1991 season leading quarterback and total team salary for football teams in the American Football Conference (AFC) and National Football Conference (NFC) of the National Football League (NFL). Two potentially influential observations (Steelers and Bears) ...

Methods: Dummy Variable, Regression, Residuals, Time Series,
Topics: Consumer, Economics,
Datafile Name: Quarterly Appliance Sales
Abstract:

The data presented in the datafile Quarterly Appliance Sales are suitable for developing models for predicting the four kinds of appliance sales. Three kinds of models can be developed. These are:

1. A model with appropriate quarterly indicators, a time variable, coincident and lagged v...

Methods: Boxplot, Regression,
Topics: Nature, Zoology,
Datafile Name: Wild Horses
Abstract:

Management of the growing mustang population on federal lands has been a controversial issue. A suggested method for controlling overpopulation is to sterilize the dominant male in each group. Eagle, Asa, and Garrott et al. (1993) conducted an experiment evaluating the effectiveness of sterilizin...

Methods: Regression, Time Series,
Topics: Economics, Health,
Datafile Name: Emerald
Abstract:

The Emerald datafile provides the Consumer Price Index (CPI) for all urban consumers, the medical component of the CPI, and the claim costs of the Emerald health care plan for the years 1986 to 1992. The Emerald health care providers claim that the cost of their plan has risen more slowly than ov...

Methods: Outlier, Regression, Residuals, Transformation, Nonlinear Regression, Dummy Variable,
Topics: Health, Medical, Social Science,
Datafile Name: Smoking and Cancer
Abstract:

Nevada and the District of Columbia are outliers in the distribution of cigarette consumption (sale) per capita by states in 1960. How the most extreme observation, Nevada, should be handled in the regressions of various cancer death rates on cigarette consumption, however, varies. In addition,...

Methods: Correlation, Regression, Scatterplot,
Topics: Health, Medical,
Datafile Name: Smoking and Cancer
Abstract:

Government statisticians in England conducted a study of the relationship between smoking and lung cancer. The data concern 25 occupational groups and are condensed from data on thousands of individual men. The explanatory variable is the number of cigarettes smoked per day by men in each occupat...

Methods: Outlier, Regression, Polynomial Regression, Dummy Variable,
Topics: Economics, Government,
Datafile Name: State Public Expenditures
Abstract:

In extending the regression of state and local per capita public expenditures to factors in addition to economic ability (see State Spending and Ability to Pay), metropolitanization is a relevant factor. A linear regression with the perce...

Methods: Outlier, Regression, Dummy Variable,
Topics: Economics, Government,
Datafile Name: State Public Expenditures
Abstract:

Economic ability is seen as a major determinant of varying public expenditures per capita among states in 1960. Two factors modify the simple relationship of expenditures to economic ability, however. First is the fact that Nevada has an undue influence on the overall regression. Second is the fa...

Methods: Diagnostics, Regression, Outlier, Transformation,
Topics: Economics, Government, Consumer,
Datafile Name: Home Prices
Abstract:

How taxes change in response to changing market value of homes is a question of concern to citizens as a policy matter as well as a personal financial concern. The tax data included in the datafile Home Prices permit an examination of this question for used homes in Albuquerque in 1993. The linea...

Methods: Regression, T-test, Time Series,
Topics: Government, Medical, Miscellaneous, Sports,
Datafile Name: Time Series
Abstract:

The datafile includes four series which can be described as time series experiments. At some point in the time series, new conditions are introduced. The question then is raised whether the sample of observations after the change comes from a population with a different mean than the sample of ...

Methods: Regression, Smoothing, Time Series,
Topics: Sports,
Datafile Name: Tour de France
Abstract:

The Tour de France bicycle race has been run since 1903. It is the best known bicycle race in the world.

The data are facts about each race. Look for trends over time, for differences among countries, and for other patterns.

...

Methods: Transformation, Regression, Regression Assumptions,
Topics: Miscellaneous,
Datafile Name: Transformations
Abstract:

Four sets of data are presented in which the relation between X and Y is not best described as linear in the original numbers. Therefore, some transformation of X, Y, or both is appropriate. Suggestions are log Y, log X, log Y and log X, 1/Y, and 1/X. Two of the sets can be described as learning ...
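
One way to screen the suggested transformations is to compute the squared correlation for each candidate pairing and see which looks most linear. The sketch below does this on synthetic power-law data, which is an assumption for illustration and not one of the four sets in the datafile; `r_squared` and the dictionary labels are hypothetical names.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Synthetic power-law data y = 2 * x**1.5 (hypothetical, not from the datafile).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * v ** 1.5 for v in x]

# Each candidate pairs a (possibly transformed) X with a (possibly transformed) Y.
candidates = {
    "Y vs X": (x, y),
    "log Y vs X": (x, [math.log(v) for v in y]),
    "Y vs log X": ([math.log(v) for v in x], y),
    "log Y vs log X": ([math.log(v) for v in x], [math.log(v) for v in y]),
    "1/Y vs X": (x, [1.0 / v for v in y]),
    "Y vs 1/X": ([1.0 / v for v in x], y),
}
fits = {name: r_squared(tx, ty) for name, (tx, ty) in candidates.items()}
best = max(fits, key=fits.get)
```

R^2 values computed on different response scales are only a rough guide, so residual plots on each scale remain the more reliable check.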

Methods: Regression, Polynomial Regression, Outlier, Transformation,
Topics: Economics,
Datafile Name: TV Ad Yields
Abstract:

The scatter diagram below suggests that the relation between advertising yield and spending is not linear. An alternative is to fit a regression line with a second order term, which is shown. However, the logic of the second order regression line, which turns down within the ...

Methods: Regression, Residuals, Time Series,
Topics: Government, Health,
Datafile Name: U.S. and New Mexico Highway Deaths
Abstract:

The graph below suggests that the New Mexico highway fatality rate is highly related to the U.S. fatality rate. Other aspects need to be considered, however. Among these are the lagged New Mexico rate and the lagged U.S. rate, and perhaps time itself. Whatever model is finally adopted, the residu...

Methods: Regression, Collinearity,
Topics: Economics,
Datafile Name: Unemployment
Abstract:

This dataset contains the United States unemployment rate, Federal Reserve Board index of industrial production, and year of the decade for 1950-1959. Unemployment is a natural dependent variable.

A regression of Unemployment versus FRB index shows that unemployment rises as industrial ...

Methods: Regression, Interaction, Dummy Variable,
Topics: Economics, Government,
Datafile Name: Unions and State Labor Law
Abstract:

The dummy variables for presence of state bargaining laws covering public employees and presence of state right to work laws in the datafile provide an opportunity to examine a two-predictor regression involving dummy variables. In this case there is interaction between the two dummy variables ...

Methods: Collinearity, Correlation, Causation, Lurking Variable, Regression,
Topics: Social Science,
Datafile Name: US Crime
Abstract:

These data are crime-related and demographic statistics for 47 US states in 1960. The data were collected from the FBI's Uniform Crime Report and other government agencies to determine how the dependent variable crime rate (R) depends on the other variables measured in the ...

Methods: Regression Assumptions, Regression, Partial Regression Plot, Polynomial Regression, Smoothing,
Topics: Environment, Weather,
Datafile Name: US Temperatures
Abstract:

The data give the normal average January minimum temperature in degrees Fahrenheit with the latitude and longitude of 56 U.S. cities. (For each year from 1931 to 1960, the daily minimum temperatures in January were added together and divided by 31. Then, the averages for each year were averaged ...

Methods: Dummy Variable, Regression, Scatterplot,
Topics: Government,
Datafile Name: Votes
Abstract:

The "Votes" dataset contains the percent of the popular vote that was won by the Democratic presidential candidates in the 1980 and 1984 elections. Both candidates, Jimmy Carter in 1980 and Walter Mondale in 1984, were defeated by the Republican Ronald Reagan. (In 1980 an independent ca...

Methods: Regression, Outlier, Collinearity, Regression Assumptions,
Topics: Economics, Social Science,
Datafile Name: Wages and Hours
Abstract:

The data are from a national sample of 6000 households with a male head and earnings of less than $15,000 annually in 1966. Thirty-nine demographic subgroups were formed for analysis of the relation between average hours worked during the year and average hourly wages and other variables. The stu...

Methods: Diagnostics, Correlation, Regression,
Topics: Psychology,
Datafile Name: Crawling
Abstract:

This study investigated whether babies take longer to learn to crawl in cold months when they are often bundled in clothes that restrict their movement, than in warmer months. The study sought an association between babies' first crawling age and the average temperature during the month they ...


© 2017 Data Description, inc. All rights reserved.