During the course of the semester, we will be using probability and statistics in our study of genetics. This first laboratory exercise will review the principles that will be used throughout the course. For some students, this will serve as a refresher, while for others, it may be completely new material.
In the examples given below, assume that a coin has a head and a tail side, each of which is equally likely to be obtained when the coin is tossed. Also, a deck of cards is a standard deck with 52 cards (no jokers), 13 cards in each of four suits. The cards within a suit are ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen and king, and the suits are clubs, diamonds, hearts and spades.
The probability (p) of an event occurring is calculated as the frequency of the event (e) divided by the total of all possible occurrences (n):
p = e/n
For instance, the probability of selecting the ace of spades is 1/52. The probability of selecting ANY ace is 4/52 or 1/13; and the probability of selecting ANY spade is 13/52 or 1/4.
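The card examples above can be checked with exact fractions. This is a minimal sketch; the function name `probability` is an illustrative choice, not part of the lab.

```python
from fractions import Fraction

def probability(event_count, total_count):
    """Return p = e/n as an exact, automatically reduced fraction."""
    return Fraction(event_count, total_count)

p_ace_of_spades = probability(1, 52)   # one specific card
p_any_ace = probability(4, 52)         # four aces in the deck
p_any_spade = probability(13, 52)      # thirteen spades in the deck

print(p_ace_of_spades, p_any_ace, p_any_spade)  # 1/52 1/13 1/4
```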
Sum Rule
The sum rule is used when considering the probability of either of two mutually exclusive events. If the verbal expression is 'A or B,' the 'or' clues you in that the sum rule is applied. In this case, the individual probabilities are added.
$p_{A \text{ or } B} = p_A + p_B$
For example, the probability of selecting the three of clubs or any ace from the deck is the sum of the individual probabilities:
probability of selecting a three of clubs = 1/52
probability of selecting any ace = 4/52
total probability = 1/52 + 4/52 = 5/52.
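As a rough check of the sum rule, the sketch below builds the 52-card deck described earlier and counts cards directly. The helper names are assumptions made for illustration.

```python
from fractions import Fraction

ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for suit in suits for rank in ranks]

def is_three_of_clubs(card):
    return card == ("3", "clubs")

def is_ace(card):
    return card[0] == "ace"

# Count matching cards and divide by the size of the deck.
p_three_of_clubs = Fraction(sum(is_three_of_clubs(c) for c in deck), len(deck))
p_ace = Fraction(sum(is_ace(c) for c in deck), len(deck))
p_either = Fraction(sum(is_three_of_clubs(c) or is_ace(c) for c in deck), len(deck))

# The events are mutually exclusive, so direct counting agrees with the sum.
print(p_either, p_three_of_clubs + p_ace)  # 5/52 5/52
```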
Product Rule
The product rule is used when two events occur simultaneously (or consecutively). The general verbal formula is 'A and B.' In this case, the total probability of both events occurring is the product of the probabilities of the two individual events. There are a few tricky applications which will be considered following the examples.
What is the probability of tossing a coin twice and obtaining heads both times?
The probability of obtaining heads on each coin flip is ½; therefore the total probability is: ½ x ½ = ¼.
Assuming there is an equal chance of having a boy or a girl, what is the probability in a family with 7 children, that all 7 will be girls?
$(1/2)^7 = 1/128$
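Both product rule examples can be reproduced with exact fractions, and the seven-girl case can be confirmed by listing every possible birth order. This is only a sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

p_two_heads = half * half      # heads AND heads
p_seven_girls = half ** 7      # girl AND girl AND ... (7 times)

# Cross-check by enumerating all 2**7 equally likely birth orders.
families = list(product("GB", repeat=7))
all_girl_families = [f for f in families if set(f) == {"G"}]
p_enumerated = Fraction(len(all_girl_families), len(families))

print(p_two_heads, p_seven_girls, p_enumerated)  # 1/4 1/128 1/128
```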
Now for the tricky calculations:
What is the probability that you will draw an ace of spades and a king of hearts when you draw 2 cards from the deck? The term 'and' indicates that the product rule should be used. As mentioned above, the probability of drawing the ace of spades is 1/52. But the first card could be either the ace of spades OR the king of hearts. Thus, the probability of drawing one of the two cards first is
1/52 + 1/52 = 1/26
Assuming you are holding the first card
when you draw the second card, the probability of drawing the second card
specified is 1/51. (Remember that one of the cards has been removed
from the deck!) Thus, the total probability is:
1/26 x 1/51 = 1/1326
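One way to convince yourself of this result is to count every ordered pair of cards that could be drawn. The sketch below assumes the same 52-card deck and draws without replacement.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for suit in suits for rank in ranks]

target = {("ace", "spades"), ("king", "hearts")}

# Every ordered draw of two different cards: 52 x 51 = 2652 possibilities.
ordered_draws = list(permutations(deck, 2))
# Either card may come first, so two of the ordered draws match.
hits = [draw for draw in ordered_draws if set(draw) == target]

p_both = Fraction(len(hits), len(ordered_draws))
print(len(hits), len(ordered_draws), p_both)  # 2 2652 1/1326
```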
What is the likelihood, in a family with 7 children, that there will be one boy and six girls? You could use the same formula as above, but you must keep in mind that there are 7 ways to have one boy and six girls. (The boy could be first, second, third, fourth, fifth, sixth or last in birth order.) Because there are 7 mutually exclusive possibilities, the sum rule comes into play (note the 'or' in the listing above). Thus the probability of six girls and one boy in a family with seven children is 7/128.
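The seven birth orders can also be listed by brute force. This short sketch enumerates all 128 possible families and keeps those with exactly one boy.

```python
from fractions import Fraction
from itertools import product

# Each family is a tuple such as ('G', 'G', 'B', 'G', 'G', 'G', 'G').
birth_orders = list(product("GB", repeat=7))
one_boy = [order for order in birth_orders if order.count("B") == 1]

p_one_boy = Fraction(len(one_boy), len(birth_orders))
print(len(one_boy), p_one_boy)  # 7 7/128
```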
Binomial Expansion and Pascal's Triangle
If you consider the likelihood, in a family
with seven children, of having 3 girls and 4 boys, the calculations become
a little more difficult. Exactly how many different ways can you have 3
girls and 4 boys? Take a couple of minutes (no more!) to list them; attempt
to use a method of logic that will prevent you from repeating any of the
combinations.
_______ _______ _______ _______ _______ _______
_______ _______ _______ _______ _______ _______
_______ _______ _______ _______ _______ _______
_______ _______ _______ _______ _______ _______
_______ _______ _______ _______ _______ _______
_______ _______ _______ _______ _______ _______
_______ _______ _______ _______ _______
_______
How many could you list? Did you develop
a pattern? Could you easily determine combinations for larger families?
Would you like to know an easier way?
Perhaps you will recall binomial expansion from algebra:
$(a + b)^0 = 1$
$(a + b)^1 = a + b$
$(a + b)^2 = a^2 + 2ab + b^2$
$(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$
$(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$
$(a + b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5$
$(a + b)^6 = a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6$
$(a + b)^7 = a^7 + 7a^6b + 21a^5b^2 + 35a^4b^3 + 35a^3b^4 + 21a^2b^5 + 7ab^6 + b^7$
You may have heard of Pascal's triangle,
which is made up of the coefficients in front of each term in the expanded
form:
                  1
                1   1
              1   2   1
            1   3   3   1
          1   4   6   4   1
        1   5  10  10   5   1
      1   6  15  20  15   6   1
    1   7  21  35  35  21   7   1
If you look at the triangle, the outside numbers are always one. Each of the other values is the sum of the two numbers on either side of it in the row above. Calculate the values for the next row in Pascal's triangle.
___ ___ ___ ___ ___ ___ ___ ___ ___
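The rule for building each row translates directly into a few lines of code. The sketch below (function names are illustrative) reproduces the eight rows shown above; you can use it to check your hand-calculated row afterwards.

```python
def next_row(row):
    """Build the next row: ones on the outside, sums of neighbours inside."""
    return [1] + [left + right for left, right in zip(row, row[1:])] + [1]

def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle."""
    rows = [[1]]
    while len(rows) < count:
        rows.append(next_row(rows[-1]))
    return rows

for row in pascal_rows(8):
    print(row)
# The last row printed is [1, 7, 21, 35, 35, 21, 7, 1]
```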
In fact, each term in the binomial expansion can be calculated using the formula:
$\frac{n!}{x!\,(n-x)!}\; p^x q^{n-x}$
Where n! = n x (n-1) x (n-2) x .... x 1
The term n! is read 'n factorial.'
Going back to the initial problem (ways of having 3 girls and 4 boys in a family of 7 children), let
n = total number of children
p = probability of having a girl
q = probability of having a boy
x = number of girls
n - x = number of boys
Then the coefficient
$\frac{n!}{x!\,(n-x)!}$
represents the number of combinations of 3 girls and 4 boys in a family of 7.
If you don't have a calculator that handles factorials, there is a quick way of calculating them. Let's solve for the coefficient with 3 girls and 4 boys:
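$\frac{7!}{3!\,4!} = \frac{7 \times 6 \times 5 \times 4!}{(3 \times 2 \times 1) \times 4!} = \frac{7 \times 6 \times 5}{6} = 7 \times 5 = 35$
The steps, from left to right: expand; cancel 4! in numerator and denominator; simplify and cancel 6 in numerator and denominator; solve.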
Thus, there are 35 combinations of 3 girls and 4 boys in a family of 7 children. (How many were you able to list??)
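The count of 35 can be confirmed three ways: from the factorial formula, from Python's built-in `math.comb` (available in Python 3.8 and later), and by actually listing the birth orders. This is a sketch for checking your own list, not a required part of the lab.

```python
from itertools import combinations
from math import comb, factorial

n_children = 7
n_girls = 3

# n! / (x! (n-x)!) written out with factorials.
by_formula = factorial(n_children) // (
    factorial(n_girls) * factorial(n_children - n_girls)
)
# The same coefficient from the standard library.
by_comb = comb(n_children, n_girls)

# List every birth order by choosing which positions the girls occupy.
arrangements = []
for girl_positions in combinations(range(n_children), n_girls):
    order = ["B"] * n_children
    for position in girl_positions:
        order[position] = "G"
    arrangements.append("".join(order))

print(by_formula, by_comb, len(arrangements))  # 35 35 35
print(arrangements[:3])  # ['GGGBBBB', 'GGBGBBB', 'GGBBGBB']
```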
The usefulness of this formula will become apparent when we begin analyzing pedigrees. If, for instance, we have a family in which both parents are carriers for the recessive gene for cystic fibrosis (Cc), and we want to know the probability that their first child will have cystic fibrosis (cc), we can readily determine that the probability of each parent providing the defective gene is ½, and since both parents must contribute the gene, we use the product rule to determine the probability is ½ x ½ = ¼. But what is the probability that 2 of their 3 children will have cystic fibrosis? Use the formula to calculate the probability:
n = number of children
p = probability of having cystic fibrosis
q = probability of not having cystic fibrosis
x = number of children with cystic fibrosis
n - x = number of children not having
cystic fibrosis
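A general version of the formula can be sketched as a small function. To avoid giving away the worksheet answers, it is demonstrated here on the earlier example of 3 girls in a family of 7; the function name `binomial_probability` is an illustrative assumption.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, x, p):
    """Probability of exactly x events in n trials, each with probability p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# 3 girls (and 4 boys) among 7 children, with p = q = 1/2.
p_three_girls = binomial_probability(7, 3, Fraction(1, 2))
print(p_three_girls)  # 35/128

# Sanity check: the probabilities for x = 0..7 must add up to 1.
total = sum(binomial_probability(7, x, Fraction(1, 2)) for x in range(8))
print(total)  # 1
```

The same function should handle the cystic fibrosis questions if called with n = 3 and the appropriate p and x.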
What is the probability that two of the three children WILL have cystic fibrosis? ______
What is the probability that two of the three children WILL NOT have cystic fibrosis? ____
Why is there a difference in these two
values?
Chi Square Analysis
If you were to flip a coin 100 times, how many times would you expect to obtain heads? __ Would you be surprised if there were some slight variations from this value? ____
If a magician were flipping the coin and obtained heads 50% of the time, would you be suspicious that something unusual was occurring? ___
What about 75% of the time? ___
If the magician ALWAYS got heads, 100 times
in a row, would you be suspicious? ___
We expect that there will be chance deviations from the expected values, but sometimes the variations are large, and are due to something beyond chance. Sample size has an impact; it is common in a family with two children to have all boys. It is not as common in a family of twenty children to have all boys! This is why a larger sample yields results with greater validity.
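The effect of sample size in the family example can be made concrete with the product rule. This sketch assumes, as elsewhere in the lab, that boys and girls are equally likely.

```python
from fractions import Fraction

def p_all_boys(n_children):
    """Probability that every child in a family of n_children is a boy."""
    return Fraction(1, 2) ** n_children

print(p_all_boys(2))   # 1/4
print(p_all_boys(20))  # 1/1048576
```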
The chi square test is used to determine if the deviations are within a range considered to be normal, or if they are so different than what we expected that we must consider that something other than chance is involved.
One method of determining whether variations from the expected are reasonable is the chi square test (see table 1). The observed results are what is measured. The expected results are what was predicted. These should be in the same units. For instance, if you expect ¼ of families with two children to have all boys, and you poll 100 families with 2 children, you would expect ¼ x 100 or 25 families to have all boys. One way of double checking if the observed and expected values are in the same units is to see if the sum of each of the columns is the same. The deviation is the difference between the observed and expected values (O - E). Note that the sum of all the deviations is always zero. This value isn't very useful, since the negative deviations cancel the positive ones. Therefore the squared deviation [$(O - E)^2$] is used. This eliminates all negative values. On its own this is also not very useful, as it doesn't take into consideration sample size. (A deviation of two is a LOT in a sample size of five, but not very much in a sample size of a thousand.) Therefore, the squared deviation is divided by the expected value to obtain a measure of the relative size of these variations from the predicted values [$(O - E)^2/E$]. These are summed to obtain the chi square ($\chi^2$) value.
Table 1
| category | observed (O) | expected (E) | (O - E) | $(O - E)^2$ | $(O - E)^2/E$ |
| 2 boys | 32 | 25 | 7 | 49 | 1.96 |
| 1 boy; 1 girl | 46 | 50 | -4 | 16 | 0.32 |
| 2 girls | 22 | 25 | -3 | 9 | 0.36 |
| total | 100 | 100 | 0 | | $\chi^2$ = 2.64 |
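The arithmetic in table 1 can be reproduced in a few lines. This is a minimal sketch using the observed counts from the table; the function name `chi_square` is an illustrative choice.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [32, 46, 22]            # 2 boys; 1 boy and 1 girl; 2 girls
total = sum(observed)              # 100 families
expected = [total * fraction for fraction in (0.25, 0.5, 0.25)]

deviations = [o - e for o, e in zip(observed, expected)]
value = chi_square(observed, expected)

print(expected)         # [25.0, 50.0, 25.0]
print(sum(deviations))  # 0.0
print(round(value, 2))  # 2.64
```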
The chi square value is then found on a chart to determine a range of p (probability of variation due to chance alone) values (table 2). The degrees of freedom are the minimum number of values in a data set that must be known in order to determine the values of the remaining classes. In the example of families with 2 children, if you were told to collect the data from 100 families, and you knew that 32 families had two boys and 46 families had a boy and a girl (in any order), you could determine that 22 families had two girls. In fact, if you know any two classes, you can determine the third class. The value of the third class depends on the value of the first two, which are independent variables. The degrees of freedom is a measure of the number of independent variables, which in this example is two. For the most part, the degrees of freedom is one less than the number of classes, though when we discuss population genetics, you will discover this general rule does not apply.
A p value of .99 means that there is a
99% chance that the variation from the expected is due to chance alone.
A p value of .05 means there is a 5% chance that variation is due to chance
alone (and a 95% chance that something other than chance has caused the
variation). Typically, a p value greater than 0.05 is accepted as variation
due to chance alone. Any time the p value is less than 0.05, it is assumed
that something other than chance is involved. Usually the chi square value
falls between two p values. Therefore the p value is listed in a range.
In the example given in table 1, the $\chi^2$ value is 2.64 and there are 2 degrees of freedom. Thus, the p value lies between 0.20 and 0.50. It is written as:
0.50 > p > 0.20
This means that there is a 20-50% chance that the variation in the values is due to chance alone, and not some other factor. Because p is greater than 0.05, the hypothesis (that in a family with two children, ¼ will have two boys, ½ will have a boy and a girl, and ¼ will have two girls) is accepted.
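For the special case of 2 degrees of freedom, the p value has a simple closed form, p = exp(-χ²/2), so the range read from the chart can be checked without a table. The sketch below relies on that identity and applies only when df = 2; other degrees of freedom need a chart or a statistics library.

```python
from math import exp, log

def p_value_two_df(chi_sq):
    """Upper-tail probability of a chi square value when df = 2 only."""
    return exp(-chi_sq / 2)

p = p_value_two_df(2.64)
print(round(p, 3))      # 0.267
print(0.50 > p > 0.20)  # True

# The chi square value that corresponds to p = 0.05 at df = 2.
critical_value = -2 * log(0.05)
print(round(critical_value, 2))  # 5.99
```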
Worksheet Name: _____________________________
Using H for heads and T for tails, list
all different possible results of flipping three coins (the order of the
coins matters):
_______________________________________________________________________
Now, how many ways are there of getting three heads? ___ two heads and a tail? ___ two tails and a head? ___ three tails? ___ How many different possibilities are there? ___
Flip three coins ten times and record your
results:
| trial | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| coin 1 | | | | | | | | | | |
| coin 2 | | | | | | | | | | |
| coin 3 | | | | | | | | | | |
Now, tally the data:
| | your data | class data |
| 3 H | | |
| 2 H, 1 T | | |
| 1 H, 2 T | | |
| 3 T | | |
Complete a chi square analysis using your
data:
| category | observed (O) | expected (E) | (O - E) | $(O - E)^2$ | $(O - E)^2/E$ |
| 3 H | | | | | |
| 2 H, 1 T | | | | | |
| 1 H, 2 T | | | | | |
| 3 T | | | | | |
| total | | | | ----- | $\chi^2$ = |
df = ________ ___ > p > ____ Accept or reject hypothesis?
Complete a chi square analysis using class
data:
| category | observed (O) | expected (E) | (O - E) | $(O - E)^2$ | $(O - E)^2/E$ |
| 3 H | | | | | |
| 2 H, 1 T | | | | | |
| 1 H, 2 T | | | | | |
| 3 T | | | | | |
| total | | | | ----- | $\chi^2$ = |
df = ________ ___ > p > ____ Accept or reject hypothesis?
Was there a difference in your p values for your data and the class data? ___ If so, explain why. ____________________________________________________________________
_______________________________________________________________________
In Mendelian genetics, typical expected
ratios for a cross involving a single trait are 1:0, 1:1 and 3:1. In a
dihybrid cross, the expected ratio is 9:3:3:1, though sometimes modified
ratios are obtained because of epistasis. This will be discussed in detail
in class, but standard ratios include 15:1, 13:3, 9:7 and 12:3:1. [Note
that all of these ratios add up to 16 parts.] We will be using beans of
two colors to develop and test hypotheses for best fit. You will be given
three bags of beans (A, B and an unknown, which is numbered). You are to
sort and count the beans, then develop a hypothesis as to the best Mendelian
ratio (or modified ratio) that fits the data. The easiest way to make an
educated guess is to determine the fraction of beans that are each of the
two colors (white/total and brown/total); then multiply these decimals
by 16 (since the ratios all involve 16 parts). Your values will most likely
NOT be whole numbers, but, when rounded to whole numbers, should give you
a 'best guess' on which to base your hypothesis. Once you have developed
a hypothesis as to the appropriate ratio, do a chi square analysis to see
if the variation is due to chance alone.
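The "multiply by 16" step can be sketched as follows. The bean counts here are hypothetical numbers invented for illustration, not data from any of the bags.

```python
def parts_of_16(white, brown):
    """Scale the observed fractions of each color to a 16-part ratio."""
    total = white + brown
    return 16 * white / total, 16 * brown / total

# Hypothetical counts, chosen only to show the arithmetic.
white_parts, brown_parts = parts_of_16(white=118, brown=42)

print(round(white_parts, 1), round(brown_parts, 1))  # 11.8 4.2
print(round(white_parts), round(brown_parts))        # 12 4
```

With these made-up counts, the rounded values of 12:4 would suggest a 3:1 hypothesis, which you would then test with a chi square analysis.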
| bean color | bag A | bag B | unknown # ___ |
| white | | | |
| brown | | | |
| total | | | |
Bag A: predicted ratio: ____ white: ____ brown
| category | observed (O) | expected (E) | (O - E) | $(O - E)^2$ | $(O - E)^2/E$ |
| white | | | | | |
| brown | | | | | |
| total | | | | ----- | $\chi^2$ = |
df = ________ ___ > p > ____ Accept or reject hypothesis?
Bag B: predicted ratio: ____ white: ____ brown
| category | observed (O) | expected (E) | (O - E) | $(O - E)^2$ | $(O - E)^2/E$ |
| white | | | | | |
| brown | | | | | |
| total | | | | ----- | $\chi^2$ = |
df = ________ ___ > p > ____ Accept or reject hypothesis?
Unknown #___: predicted ratio: ____ white: ____ brown
| category | observed (O) | expected (E) | (O - E) | $(O - E)^2$ | $(O - E)^2/E$ |
| white | | | | | |
| brown | | | | | |
| total | | | | ----- | $\chi^2$ = |
df = ________ ___ > p > ____ Accept or reject hypothesis?
The bags of beans were prepared as follows.
To obtain a ratio of X white: Y brown beans, X tablespoons of white beans
were mixed with Y tablespoons of brown beans. The assumption was made that
the beans were the same size. Based on the results of your studies, was
this a reasonable assumption? Why or why not?
In one school system, there are 472 families
that have five children. Do a chi square analysis to determine if the variation
in the distribution is due to chance alone.
| category | observed (O) | expected (E) | (O - E) | $(O - E)^2$ | $(O - E)^2/E$ |
| 5 girls | 19 | | | | |
| 4 girls, 1 boy | 81 | | | | |
| 3 girls, 2 boys | 162 | | | | |
| 2 girls, 3 boys | 135 | | | | |
| 1 girl, 4 boys | 69 | | | | |
| 5 boys | 6 | | | | |
| total | 472 | | | ----- | $\chi^2$ = |
df = ________ ___ > p > ____ Accept or reject hypothesis?
What is the total number of girls in all
the families? ___
What is the total number of boys in all
the families? ___
One of the schools in the district is a
girls' boarding school. How has the data from this school affected the
results?
What is the probability of picking a red
2 or a black 3 from a deck of cards? Show your work.
What is the probability of picking a six
and an eight from the deck of cards? Show your work.
For questions, comments and additional information, contact mfhicks@pstcc.edu
Last Updated: August 24, 2001
Pellissippi State Technical Community College
2000©