Method: Outlier

Methods: Correlation, Regression, Transformation, Outlier, ANCOVA,
Topics: Nature, Geography,
Datafile Name: Acorns
Abstract:

Fifty species of oak trees grow in the United States. Twenty-eight species of oak from the Atlantic region and 11 from the California region were studied. The size of each species' acorns was measured to see whether acorn size is related to geographic range. It is suggested that a plant's...

Methods: Outlier, Transformation, Regression,
Topics: Environment,
Datafile Name: SMSA
Abstract:

Researchers at General Motors collected data on 60 U.S. Standard Metropolitan Statistical Areas (SMSAs) in a study of whether air pollution contributes to mortality. The dependent variable for analysis is age-adjusted mortality (called "Mortality"). The data include variables measu...

Methods: Correlation, Dummy Variable, Outlier, Regression, Scatterplot,
Topics: Consumer, Health,
Datafile Name: Alcohol and Tobacco
Abstract:

Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and alcoholic beverages. A scatterplot of spending on alcohol vs. spending on tobacco in the 11 regions of Great Britain shows an overall positive line...

Methods: Regression, Outlier, Scatterplot,
Topics: Health, Medical,
Datafile Name: Breast Cancer
Abstract:

In a 1965 report, Lea discussed the relationship between mean annual temperature and the mortality rate for a type of breast cancer in women. The subjects were residents of certain regions of Great Britain, Norway, and Sweden. A simple regression of mortality index on temperature shows a strong p...

Methods: Outlier, Histogram, Mean, Median, Boxplot, Distribution,
Topics: Economics,
Datafile Name: CEO Salaries
Abstract:

Forbes magazine published data on the best small firms in 1993. These were firms with annual sales of more than $5 million and less than $350 million. Firms were ranked by five-year average return on investment. The data extracted are the age and annual salary of the chief executive officer for the fir...

Methods: Outlier, Summary Statistics, Mean, Median, Histogram, Boxplot,
Topics: Miscellaneous,
Datafile Name: Distribution Patterns
Abstract:

This "story" illustrates some of the different distribution patterns that variables may take on. The distribution of Tobin's Q-ratios for firms shows high positive skewness. Sometimes this can be 'remedied' by taking the logarithms of the data. Try this for the Q-ratios. Oth...
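The log-transform remedy mentioned above can be tried with a minimal sketch like the following. It assumes made-up right-skewed values (not the Tobin's Q data from the datafile) and a hand-written moment-based skewness helper, `skewness`, introduced here only for illustration.

```python
import math


def skewness(values):
    """Moment-based sample skewness: third central moment / (second central moment)**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed values (NOT the Q-ratio data): exp() of a symmetric grid.
grid = [i / 4 for i in range(-8, 9)]
ratios = [math.exp(g) for g in grid]

# Taking logarithms pulls in the long right tail.
logged = [math.log(r) for r in ratios]

raw_skew = skewness(ratios)
log_skew = skewness(logged)
print(f"skewness before log: {raw_skew:.3f}")
print(f"skewness after log:  {log_skew:.3f}")
```

With real data the logged values will rarely be exactly symmetric; the point of the sketch is only that the skewness statistic should move toward zero if the transformation is helping.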

Methods: Confidence Interval, Distribution, Histogram, Outlier,
Topics: Science,
Datafile Name: Speed of Light
Abstract:

Simon Newcomb measured the time required for light to travel from his laboratory on the Potomac River to a mirror at the base of the Washington Monument and back, a total distance of about 7400 meters. These measurements were used to estimate the speed of light.

A histogram or dotplot o...

Methods: Lurking Variable, Outlier, Regression,
Topics: Consumer, Engineering, Automotive,
Datafile Name: Cars
Abstract:

In a regression to predict fuel consumption (measured in Gallons/100 miles -- see the related story "Measuring Fuel Efficiency") from the weight and drive ratio of cars, the Buick Estate Wagon shows up as an outlier. Remarkably enough, even though the Buick Estate Wagon has high fuel co...

Methods: Outlier, Regression, Polynomial Regression, Scatterplot,
Topics: Sports,
Datafile Name: Hitters 1920 -1950
Abstract:

In 1954 Branch Rickey wrote an article for Life Magazine entitled "Goodbye to some old baseball ideas". He criticized some traditional baseball statistics and proposed some of his own that he thought more useful. For individual hitting Rickey proposed the sum of on-base average (OBA) and ex...

Methods: Outlier, Paired T-Test,
Topics: Sports,
Datafile Name: Helium football
Abstract:

Two identical footballs, one air-filled and one helium-filled, were used outdoors on a windless day at The Ohio State University's athletic complex. The kicker was a novice punter and was not informed which football contained the helium. Each football was kicked 39 times. The kicker changed f...

Methods: Outlier, Regression, Polynomial Regression, Boxplot,
Topics: Sports,
Datafile Name: Hitters 1920 -1950
Abstract:

The data set is Branch Rickey's set of outstanding hitters in baseball over the period 1920 to 1950 based on the sum of what Rickey defines as on-base average and extra-base power (OBA + EBP). The student should be asked to run the simple regression of EBP on OBA, as well as the second degre...

Methods: Scatterplot, Regression, Outlier,
Topics: Sports,
Datafile Name: Olympic Gold
Abstract:

This dataset contains the gold medal performances in the men's long jump, high jump and discus throw for the modern Olympic games from 1900 to 1984. Data are also provided for the 1896 Olympics, but one may wish to omit them from the analyses because that Olympics was quite different from lat...

Methods: T-test, Outlier, Boxplot, Mann Whitney U Test, Summary Statistics,
Topics: Health, Medical,
Datafile Name: Nursing Home Data
Abstract:

Acorn is the acronym for Association of Community Organizations for Reform Now. These data were presented by Acorn to a Joint Congressional Hearing on discrimination in lending. Acorn concluded that "banks generally have exhibited a pervasive pattern of lending...

Methods: Assumptions, Regression, Outlier, T-test, Boxplot, Diagnostics, Multivariate Regression,
Topics: Economics,
Datafile Name: OECD Economic Development
Abstract:

Data on per capita income and the percentages of the labor force employed in agriculture, industry, and service occupations for 20 European OECD countries lend themselves to several kinds of analysis. Univariate analysis of each of the variables is interesting, and the relations between per capit...

Methods: Regression, Outlier, Dummy Variable,
Topics: Government, Social Science, Sociology,
Datafile Name: New York City Crime
Abstract:

The datafile contains percent changes in manpower and seasonally adjusted changes in weekly auto thefts and larcenies for the 25 precincts in New York City from a base period of 27 weeks in 1966 to an experimental period of 58 weeks in late 1966 and 1967. During the experimental period police man...

Methods: Boxplot, Scatterplot, Outlier,
Topics: Education, Economics,
Datafile Name: Faculty Salaries
Abstract:

A faculty salary study was done at The Ohio State University to compare faculty salaries with those at other universities. Data were collected from the Association of American Universities. The overall average salary for OSU was obtained by computing the weighted average of salaries at each facul...

Methods: Outlier, Regression, Residuals, Interaction, Dummy Variable,
Topics: Economics, Sports,
Datafile Name: Q-back and Team Salaries
Abstract:

The datafile contains 1991 season leading quarterback and total team salary for football teams in the American Football Conference (AFC) and National Football Conference (NFC) of the National Football League (NFL). Two potentially influential observations (Steelers and Bears) ...

Methods: Outlier, Boxplot, T-test, Summary Statistics,
Topics: Consumer, Economics, Finance,
Datafile Name: Refusals in Mortgage Lending
Abstract:

Acorn is the acronym for Association of Community Organizations for Reform Now. These data were presented by Acorn to a Joint Congressional Hearing on discrimination in lending. Acorn concluded that "banks generally have exhibited a pervasive pattern of lending practices that have the ef...

Methods: Histogram, Mean, Median, Outlier,
Topics: Sports,
Datafile Name: Crews
Abstract:

The dataset contains the weights in stones and pounds of the crews participating in the Oxford and Cambridge boat race in 1992. The weights have been converted to pounds for convenience.

Figure 1 is a histogram of weights for the crews in the 1992 Cambridge vs. Oxford boat race. The two...

Methods: Outlier, Regression, Residuals, Transformation, Nonlinear Regression, Dummy Variable,
Topics: Health, Medical, Social Science,
Datafile Name: Smoking and Cancer
Abstract:

Nevada and the District of Columbia are outliers in the distribution of cigarette consumption (sale) per capita by states in 1960. How the most extreme observation, Nevada, should be handled in the regressions of various cancer death rates on cigarette consumption, however, varies. In addition,...

Methods: Outlier, Regression, Polynomial Regression, Dummy Variable,
Topics: Economics, Government,
Datafile Name: State Public Expenditures
Abstract:

In extending the regression of state and local per capita public expenditures to factors in addition to economic ability (see State Spending and Ability to Pay), metropolitanization is a relevant factor. A linear regression with the perce...

Methods: Outlier, Regression, Dummy Variable,
Topics: Economics, Government,
Datafile Name: State Public Expenditures
Abstract:

Economic ability is seen as a major determinant of varying public expenditures per capita among states in 1960. Two factors modify the simple relationship of expenditures to economic ability, however. First is the fact that Nevada has an undue influence on the overall regression. Second is the fa...

Methods: Diagnostics, Regression, Outlier, Transformation,
Topics: Economics, Government, Consumer,
Datafile Name: Home Prices
Abstract:

How taxes change in response to changing market value of homes is a question of concern to citizens as a policy matter as well as a personal financial concern. The tax data included in the datafile Home Prices permit an examination of this question for used homes in Albuquerque in 1993. The linea...

Methods: Dummy Variable, Diagnostics, Interaction, Outlier, Residuals, Transformation,
Topics: Economics, Education, Government,
Datafile Name: Teacher Pay by States
Abstract:

The scatter diagram below shows one potential influential observation, namely Alaska, which is an outlier in terms of spending per pupil. In addition, the question can be raised whether the level and slope of an appropriate regression line would be the same for the three regions of the country. S...

Methods: Regression, Polynomial Regression, Outlier, Transformation,
Topics: Economics,
Datafile Name: TV Ad Yields
Abstract:

The scatter diagram below suggests that the relation between advertising
yield and spending is not linear. An alternative is to fit a regression line with
a second order term, which is shown. However, the logic of the second order
regression line, which turns down within the ...

Methods: Assumptions, Regression, Outlier, T-test, Boxplot, Diagnostics, Multivariate Regression,
Topics: Economics,
Datafile Name: OECD Economic Development
Abstract:

The Datafile OECD Economic Development contains per capita income (PCINC)
and the percentage of the labor force employed in agriculture (AGR) for 20 European
OECD countries in 1960. Below are the same variables for the United States by 
decades with PCINC adjusted t...

Methods: Regression, Outlier, Collinearity, Assumptions,
Topics: Economics, Social Science,
Datafile Name: Wages and Hours
Abstract:

The data are from a national sample of 6000 households with a male head and earnings of less than $15,000 annually in 1966. Thirty-nine demographic subgroups were formed for analysis of the relation between average hours worked during the year and average hourly wages and other variables. The stu...

Methods: Outlier, T-test, Mann Whitney U Test, ANOVA,
Topics: Economics,
Datafile Name: Waste Run-up
Abstract:

The Mann-Whitney test is available when the normality assumption underlying the Student t-test is in question, for example when there are outliers in the data. The data on weekly run-up (or waste percent) for five suppliers of the Levi-Strauss clothing plant in Albuquerque provide an example. Presence of outliers violate...
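To illustrate why a rank-based test is attractive when outliers are present, here is a minimal sketch. It assumes two small made-up samples (not the Levi-Strauss run-up data) and two helper functions, `mann_whitney_u` and `student_t`, written here only for illustration. It compares the Mann-Whitney U statistic and the pooled-variance Student t statistic before and after one value is replaced by an extreme outlier.

```python
import math
import statistics


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: number of (a, b) pairs with a > b; ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def student_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)
    vb = statistics.variance(sample_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / se


# Hypothetical samples for illustration only.
group_a = [4.1, 5.3, 6.0, 7.2, 8.8]
group_b = [1.2, 2.5, 3.3, 4.6, 5.0]

# Same data, but the largest value in group_a becomes an extreme outlier.
group_a_outlier = [4.1, 5.3, 6.0, 7.2, 88.0]

u_clean = mann_whitney_u(group_a, group_b)
u_outlier = mann_whitney_u(group_a_outlier, group_b)
t_clean = student_t(group_a, group_b)
t_outlier = student_t(group_a_outlier, group_b)

print(f"U: clean={u_clean}, with outlier={u_outlier}")
print(f"t: clean={t_clean:.2f}, with outlier={t_outlier:.2f}")
```

Because U depends only on the ordering of the observations, making the largest value even larger leaves it unchanged, whereas the t statistic shrinks as the outlier inflates the variance estimate. The sketch computes test statistics only; p-values would need tables or a statistics library.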


© 2017 Data Description, inc. All rights reserved.