I. Collect at least 40 pieces of ratio data (measurements, such as NFL players' weights). Do *NOT* use countable data (for example, the number of people in families). Countable data will not be appropriate for some of the statistical tests required in this project. Your data may be obtained personally (workplace data is often an excellent source) or may be obtained from a referenced source from the library or Internet. The data you collected for STATDISK problem 2-18 may be appropriate if approved by the instructor (see below). Informationplease.com and http://lib.stat.cmu.edu/DASL/ are good sources of data on the Internet. The data must be real, not made up, and cannot be from our textbook.
You must have your data approved by the instructor before starting your project. Discuss with me a description of the data you plan to use.
The project will be graded on presentation (how it looks, format, etc.), interpretations, possible errors, and possible missing information. I would prefer that you not put each page in a plastic cover unless you have a specific need for that.
Please include at the bottom of your title page the following:
Presentation:
Interpretations:
Errors:
Missing Information:
Overall Grade:
II. You will be required to complete various statistical tests on the data and report your results. All calculations (except the standard deviation of the actual data, the mean of the frequency table, and the standard deviation of the frequency table) must be computed by hand/calculator. You must show your calculations (for example, mean = 645/40 = 16.125). You can use STATDISK to verify your answers, but not as the sole source of computation. You should comment on and interpret the results you get and what they tell you about the data. Make it interesting to read. Include your units of measurement (inches, feet, $, etc.) throughout the project. The tests you must complete are as follows:
Part A: Chapter 1 - Data Sampling
Write a paragraph:
(1) Explain how you found your data, describe
the data, explain why you chose this particular data, and provide a complete
listing of the actual data in ascending order. (STATDISK can
assist you with this task.)
(2) If appropriate, provide the proper reference source
for your data.
(3) State what type of sampling process you used (random,
stratified, cluster, etc.) and explain why the way you collected your data
fits this process.
Part B: Chapter 2 - Descriptive Statistics
(1) Provide the following descriptive statistics:
mean, median, mode (give frequency as well), midrange, sample standard
deviation. Show your calculations for each of these numbers except
the sample standard deviation (you may use your calculator or STATDISK
for this). Describe (write sentences) what each of these numbers represents.
Comment on which measure of center you think best represents your data
and why.
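As an illustration only, the statistics in (1) can be sketched in Python. The weights below are made up for this example (your project data must be real), the variable names are hypothetical, and the sketch assumes the data has a single mode. Use something like this only to check your hand calculations, which you must still show.

```python
import statistics
from collections import Counter

# Hypothetical weights in pounds, invented for illustration only.
weights = [150, 160, 160, 170, 180, 190, 210]

n = len(weights)
mean = sum(weights) / n                       # sum of the values divided by n
median = statistics.median(weights)           # middle value of the sorted data
# Most frequent value and how often it occurs (assumes a single mode).
mode, mode_freq = Counter(weights).most_common(1)[0]
midrange = (min(weights) + max(weights)) / 2  # halfway between lowest and highest
sample_sd = statistics.stdev(weights)         # sample standard deviation (divides by n - 1)

print(f"mean = {sum(weights)}/{n} = {mean:.2f} lb")
print(f"median = {median} lb")
print(f"mode = {mode} lb (frequency {mode_freq})")
print(f"midrange = ({min(weights)} + {max(weights)})/2 = {midrange} lb")
print(f"sample standard deviation = {sample_sd:.2f} lb")
```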
(2) Design a frequency table for your
data with at least 5 classes (first and/or last class should not have a
frequency of 0), showing the midpoint of each class. Using your calculator
or STATDISK re-evaluate the mean and standard deviation using the frequency
table. Explain (write sentences) why the mean and standard deviation
values have changed from what you calculated in Part B1.
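The frequency-table calculation in (2) can be sketched as follows. The classes and frequencies are invented for illustration, and the formulas assumed here are the usual grouped-data ones based on class midpoints: mean = sum(f*x)/n and s = sqrt([n*sum(f*x^2) - (sum(f*x))^2] / [n(n - 1)]).

```python
import math

# Hypothetical frequency table, invented for illustration only:
# (lower class limit, upper class limit, frequency)
classes = [(10, 19, 4), (20, 29, 10), (30, 39, 14), (40, 49, 8), (50, 59, 4)]

freqs = [f for _, _, f in classes]
midpoints = [(low + high) / 2 for low, high, _ in classes]  # class midpoints
n = sum(freqs)                                              # total number of values

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # sum of f * midpoint
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # sum of f * midpoint^2

table_mean = sum_fx / n
table_sd = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(f"n = {n}")
print(f"mean from table = {sum_fx}/{n} = {table_mean:.2f}")
print(f"standard deviation from table = {table_sd:.2f}")
```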
(3) Determine the Range Rule of Thumb
for your data. The first part of the Range Rule of Thumb involves an estimate
of the standard deviation. Why is the Range Rule of Thumb standard
deviation different from the actual standard deviation? Using your
actual mean and standard deviation determine the expected usual minimum
and maximum values (second part of Range Rule of Thumb). Show your
calculations for these numbers. Name any values in your data set
that would be considered to be unusual based on the Range Rule of Thumb's
maximum and minimum values and explain why. (Write sentences!)
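A minimal sketch of both parts of the Range Rule of Thumb, assuming the usual forms (standard deviation estimate = range/4; usual values lie within 2 standard deviations of the mean). The heights are invented for illustration and the variable names are hypothetical.

```python
import statistics

# Hypothetical heights in inches, invented for illustration only.
heights = [62, 64, 65, 65, 66, 67, 68, 68, 69, 70, 71, 90]

# Part 1: rough estimate of the standard deviation, range / 4.
estimated_sd = (max(heights) - min(heights)) / 4

# Part 2: usual minimum and maximum, using the actual sample mean
# and sample standard deviation.
mean = statistics.mean(heights)
sd = statistics.stdev(heights)
usual_min = mean - 2 * sd
usual_max = mean + 2 * sd

# Values outside the usual limits are flagged as unusual.
unusual = [x for x in heights if x < usual_min or x > usual_max]

print(f"estimated s = ({max(heights)} - {min(heights)})/4 = {estimated_sd} in")
print(f"actual s = {sd:.2f} in")
print(f"usual values: {usual_min:.2f} in to {usual_max:.2f} in")
print(f"unusual values: {unusual}")
```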
(4) Provide a histogram and a boxplot of
your data. What type of distribution (normal, uniform, skewed) does
your data have? Explain why (write sentences).
(5) Choose the lowest score, the highest
score, and any score in the middle. Determine the z-scores (measure
of position) of these chosen scores. Show your calculations.
Are any of these scores unusual based on their z-scores? Explain
why or why not (write sentences).
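The z-score calculation in (5) can be sketched like this, assuming z = (x - mean)/s and the common convention that a score with a z-score below -2 or above 2 is unusual. The data and the choice of middle score are invented for illustration.

```python
import statistics

# Hypothetical heights in inches, invented for illustration only.
heights = [62, 64, 65, 65, 66, 67, 68, 68, 69, 70, 71, 90]

mean = statistics.mean(heights)
sd = statistics.stdev(heights)


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Lowest score, one score from the middle, and highest score.
chosen = [min(heights), 67, max(heights)]
z_scores = {x: z_score(x, mean, sd) for x in chosen}

for x, z in z_scores.items():
    label = "unusual" if abs(z) > 2 else "usual"
    print(f"x = {x} in: z = ({x} - {mean:.2f})/{sd:.2f} = {z:.2f} ({label})")
```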
Part C: Chapter 3 - Probability
Using one of the classes you developed in your frequency table, determine
the probability of selecting a single value that is in that class from
the entire list of data. Show your calculations for determining your answer.
Write a sentence or two explaining what this probability means. Remember
probability can be thought of as a percentage as well.
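As a small illustration, the probability for one class is its frequency divided by the total number of data values. The frequency table below is invented for this example.

```python
# Hypothetical frequency table (class -> frequency), invented for illustration only.
frequencies = {"10-19": 4, "20-29": 10, "30-39": 14, "40-49": 8, "50-59": 4}

n = sum(frequencies.values())   # total number of data values
chosen_class = "30-39"

# Probability of randomly selecting one value that falls in the chosen class.
probability = frequencies[chosen_class] / n

print(f"P({chosen_class}) = {frequencies[chosen_class]}/{n} "
      f"= {probability:.3f} = {probability:.1%}")
```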
For the remainder of the project, even if your data is not in a normal
distribution, complete the following:
Part D: Chapter 5 - Normal Probability Distributions
Select three data values - one below the mean and two above the mean.
Using the techniques from Chapter 5 (showing all computations and drawings),
determine the following:
(1) For the value below the mean, find
the probability of selecting a value below that score. Explain
what your answer means using %.
(2) For one of the values above the
mean, find the probability of selecting a value above that score.
Explain what your answer means using %.
(3) For the two values above the mean,
find the probability of selecting a value between those two scores.
Explain what your answer means using %.
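The three probabilities in Part D can be checked with a sketch like the one below, which treats the data as normally distributed with the sample mean and standard deviation. The mean, standard deviation, and chosen values are invented for illustration. Answers found by hand from a z table may differ slightly because z-scores are rounded to two decimal places there.

```python
from statistics import NormalDist

# Hypothetical sample mean and standard deviation, invented for illustration only.
dist = NormalDist(mu=100, sigma=15)

below = 85                    # one value below the mean
above_1, above_2 = 110, 130   # two values above the mean

p_below = dist.cdf(below)                            # (1) area to the left of 85
p_above = 1 - dist.cdf(above_1)                      # (2) area to the right of 110
p_between = dist.cdf(above_2) - dist.cdf(above_1)    # (3) area between 110 and 130

print(f"P(x < {below}) = {p_below:.4f} = {p_below:.2%}")
print(f"P(x > {above_1}) = {p_above:.4f} = {p_above:.2%}")
print(f"P({above_1} < x < {above_2}) = {p_between:.4f} = {p_between:.2%}")
```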
Part E: Chapter 6 - Confidence Intervals
Using the mean of your sample as the best point estimate of the population
mean, develop a confidence interval for the population mean (mu).
Show any drawings and all calculations. Write sentences explaining
what your confidence interval represents.
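A minimal sketch of a 95% confidence interval for the population mean, using invented summary statistics. It assumes a z critical value, which is a large-sample approximation; if the method from Chapter 6 calls for a t critical value with n - 1 degrees of freedom, the critical value (and so the interval) will be slightly larger than shown here.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics, invented for illustration only.
x_bar, s, n = 34.0, 11.3, 40
confidence = 0.95

alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value (assumption: z, not t)
margin = z_crit * s / math.sqrt(n)             # margin of error E
lower, upper = x_bar - margin, x_bar + margin

print(f"critical value = {z_crit:.3f}")
print(f"E = {z_crit:.3f} * {s}/sqrt({n}) = {margin:.3f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```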
Part F: Chapter 7 - Hypothesis Testing
(1) Using your data's sample mean and standard
deviation, make up a reasonable problem that involves a hypothesis
test about your data. (Reasonableness: Why would you state a claim
that the mean is greater than 100 when your sample has a mean less than
100???) State the claim such that it does not involve equality
so that you have a chance to support your claim. Complete the problem using
the traditional method of testing hypotheses. Show all work and drawings.
Interpret the results of your hypothesis test (write sentences).
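The traditional method can be sketched as follows for a hypothetical claim that the population mean is greater than 30 (H0: mu = 30, H1: mu > 30). The summary statistics are invented, and the sketch assumes a right-tailed test with a z critical value as a large-sample approximation; follow the Chapter 7 method if it calls for a t critical value instead.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics and claim, invented for illustration only.
x_bar, s, n = 34.0, 11.3, 40
mu_0 = 30.0      # value of the population mean under H0
alpha = 0.05     # significance level

# Test statistic: how many standard errors the sample mean lies above mu_0.
test_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Right-tailed critical value (assumption: z, not t).
critical = NormalDist().inv_cdf(1 - alpha)

reject_h0 = test_stat > critical

print(f"test statistic = {test_stat:.3f}")
print(f"critical value = {critical:.3f}")
print("Reject H0: the sample supports the claim." if reject_h0
      else "Fail to reject H0: the sample does not support the claim.")
```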
(2) Using STATDISK try different values for
alpha. Find two close alpha values that create different conclusions
for your original claim. Write sentences explaining what is occurring.
(3) Using STATDISK and keeping alpha at 0.05,
try different values for your proposed population mean. Find two
close population mean values that create different conclusions. Write
sentences explaining what is occurring.
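The searches in (2) and (3) can also be sketched in code, purely as an illustration alongside STATDISK. The helper `rejects_h0` is a hypothetical name, the summary statistics are invented, and the sketch assumes a right-tailed z test.

```python
import math
from statistics import NormalDist


def rejects_h0(x_bar, s, n, mu_0, alpha):
    """Right-tailed z test: True if the test statistic exceeds the critical value."""
    test_stat = (x_bar - mu_0) / (s / math.sqrt(n))
    return test_stat > NormalDist().inv_cdf(1 - alpha)


# Hypothetical summary statistics, invented for illustration only.
x_bar, s, n = 34.0, 11.3, 40

# (2) Same claimed mean, two different alpha values.
for alpha in (0.01, 0.02):
    print(f"mu_0 = 30.0, alpha = {alpha}: reject H0? {rejects_h0(x_bar, s, n, 30.0, alpha)}")

# (3) Alpha fixed at 0.05, two different claimed population means.
for mu_0 in (31.0, 31.2):
    print(f"mu_0 = {mu_0}, alpha = 0.05: reject H0? {rejects_h0(x_bar, s, n, mu_0, 0.05)}")
```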
III. Your information should be reported in narrative report form and labeled Part A, B, C, ..., F. Make it interesting, not just dry computational results. Add your opinions and comments about each of the results. You may use STATDISK to assist you (verify your computational answers). However, any STATDISK printouts incorporated into the project should be integrated nicely into the report, not just attached at the end of the report.
IV. The narrative portion of the report should be typed; statistical computations can be neatly handwritten. PSTCC open computer labs have word processing programs. CAOS in McWherter Building, 2nd floor, will type student papers. There is a charge for CAOS-produced papers.
V. The project is due on the date listed on the course calendar. You should start work on your project right after we finish Chapter 5. You may ask the instructor for help regarding the activities or for a review of a draft of your report.
An overall letter grade will be assigned to the project and then translated into a numerical grade as follows:
A+ 100
A 96
A- 93
B+ 90
B 86
B- 83
C+ 80
C 75
C- 70
D 65
F 50 or less
VII. If a Writing Project is not submitted, the course final grade (which will include a 0 for this component) will be lowered by one letter grade.
Revised: May 5, 2005