The Probability Simulations module targets the following cognitive tasks:
| Task | Skills | Concepts |
|---|---|---|
| ProbModel-1: | Understand random variables | |
| ProbModel-2: | Define the probability model of a discrete random variable | Understand discrete random variables |
| ProbModel-3: | Define the probability model of a continuous random variable | Understand continuous random variables |
| ProbModel-4: | Understand why simulations are done | |
| ProbModel-5: | List the simulation steps | Understand the simulation steps |
| ProbModel-6: | Define the box model | Understand what the box model represents |
| ProbModel-7: | Define the sample | Understand how the sampling is done |
| ProbModel-8: | Define the statistic/event of interest | Understand the nature of the statistic/event of interest |
| ProbModel-9: | Run simulations | Understand the summary results of the simulations |
| ProbModel-10: | Determine the experimental probability of the event of interest | Understand the experimental probability |
| ProbModel-11: | Examine the sampling distribution of the statistic of interest | Understand the sampling distribution of a statistic |
A random variable maps the outcomes of a random experiment to numbers. It is generally denoted by a capital letter, e.g., $X$. As an example, if a coin is tossed, the possible outcomes are Head and Tail. If the random variable $X$ is the number of heads, then $X(\text{Head}) = 1$ and $X(\text{Tail}) = 0$.
Discrete numerical variables were discussed in the Variable Types module. A discrete random variable is a formalized version of the discrete numerical variable discussed in that module. The possible values of a discrete random variable are finite or countable and most commonly are counts.
The functional nature of a random variable is often suppressed. For example, in the above example, we simply say $X = 1$ or $X = 0$, where $X$ is a random variable denoting the number of heads. Prior to conducting the experiment, the outcome, and hence the value of the random variable mapping this outcome, are unknown. We only know the probabilities of the outcomes (at least ideally).
The values of the random variable, together with the probabilities of these values, define the probability distribution of the random variable. In the above example, if the coin is fair, then $P(X = 0) = 0.5$ and $P(X = 1) = 0.5$. The sum of the probabilities must always equal 1, since exactly one outcome must occur in a random experiment.
A continuous random variable is a formalized version of the continuous numerical variable discussed in the Variable Types module. The possible values of a continuous random variable are from a range or interval.
We are often interested in the probability of an event defined in terms of a random variable. For example, what is the probability of getting two boys in a three-child family? This is easy to compute if certain simplifying assumptions are made. However, it is hard to compute the probabilities of certain events. For example, what is the probability a coin must be flipped at least twice before getting the first head?
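As an illustration of the three-child example, the sketch below enumerates every possible family exactly. The simplifying assumptions used here are ours, not necessarily the ones the module intends: each child is independently a boy or a girl with probability 1/2. The names `families`, `two_boys`, and `p_two_boys` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption (ours): each child is independently a boy ("B") or a girl ("G")
# with probability 1/2, so the 8 birth orders below are equally likely.
families = list(product("BG", repeat=3))

# Keep the birth orders that contain exactly two boys.
two_boys = [family for family in families if family.count("B") == 2]

p_two_boys = Fraction(len(two_boys), len(families))
print(p_two_boys)  # 3/8
```

Under different assumptions (for example, unequal chances of a boy and a girl) the birth orders would no longer be equally likely and this simple count would not apply.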
Rather than computing the probability of an event exactly, it is often possible to estimate this probability by simulating the random experiment many times. For example, you could toss a coin until the first head appears and record the number of tosses. If this is done repeatedly, e.g., 10,000 times, then the proportion of experiments resulting in two or more tosses estimates the true probability of tossing the coin at least twice before the first head. The estimate of the probability of the event of interest obtained by simulations is called the experimental probability.
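The coin-tossing experiment just described can be sketched in a few lines of Python. This is a minimal illustration, not the applet's implementation; the function name `tosses_until_first_head`, the seed, and the use of Python's `random` module are our own choices, and the coin is assumed to be fair.

```python
import random

def tosses_until_first_head(rng):
    """Toss a fair coin until a head appears; return the number of tosses."""
    tosses = 1
    while rng.random() >= 0.5:  # a value below 0.5 is treated as a head
        tosses += 1
    return tosses

rng = random.Random(2024)  # fixed seed so the run is repeatable
n_experiments = 10_000
results = [tosses_until_first_head(rng) for _ in range(n_experiments)]

# Proportion of experiments that needed two or more tosses.
experimental_probability = sum(t >= 2 for t in results) / n_experiments
print(experimental_probability)
```

Changing the seed or the number of experiments changes the estimate slightly, which is exactly the variability that experimental probabilities carry.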
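The five-step method is a powerful, formalized method of obtaining the experimental probability of an event of interest. The five steps are: (1) define the box model, (2) define the sample, (3) define the statistic/event of interest, (4) run the simulations, and (5) examine the sampling distribution of the statistic of interest.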
The five-step method is implemented in the following Java applet:
The text below, beginning with ProbModel-6, interleaves a step-by-step example of how to use this five-step Java applet.
The probability model of a discrete random variable is defined by the pairs consisting of the values of the random variable and their corresponding probabilities. For example, if $X$ gives the number of heads in a toss of the coin (as above), then:
| Value of $X$ | Probability |
|---|---|
| 0 | 0.5 |
| 1 | 0.5 |
A box model is a concrete representation of the probability model of a discrete random variable. Think of it as a box containing identical balls with numbers corresponding to the possible values of the discrete random variable. The number of balls for each value is proportional to the probability of that value. In our example, we would have two balls, one with a '0' and one with a '1'. If a ball is drawn from the box, then the probability of getting the '0' ball is 0.5 and the probability of getting the '1' ball is 0.5. Thus we have a physical representation of the probability model in the table above.
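One way to picture the box model in code is as a list with one entry per ball. The sketch below is only an illustration; the names `box` and `model` are hypothetical, and the applet itself may represent the box differently.

```python
import random
from collections import Counter

# Box model for one toss of a fair coin:
# one ball marked 0 (tail) and one ball marked 1 (head).
box = [0, 1]

# The probability of each value is its share of the balls in the box.
counts = Counter(box)
model = {value: count / len(box) for value, count in counts.items()}
print(model)  # {0: 0.5, 1: 0.5}

# Drawing one ball at random corresponds to one toss of the coin.
rng = random.Random(1)
print(rng.choice(box))
```

Because only the proportions matter, a box with two '0' balls and two '1' balls would describe the same probability model.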
Enter '0' and '1' as the first two values in the above applet. The count for each ball is already 1, corresponding to a single '0' ball and a single '1' ball, and thus does not need to be changed. This is the first step in computing experimental probabilities of events defined by drawing balls from the box. Here we are interested in events related to drawing two balls from the box, assuming that the balls are replaced after each draw. Click the Next button to go to Step 2.
The properties of a single simulation are defined in Step 2. Specifically, the method of sampling and the sample size must be specified. The various sampling methods will be discussed as you proceed through the modules that use the five-step simulation engine.
Based on the example begun in Step 1, we need to select Draw n With Replacement from the Define the sample popup menu. Then, since two balls will be selected, specify 2 in the n= field. Click the Next button to go to Step 3.
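Drawing n balls with replacement can be sketched as follows. The helper `draw_with_replacement` is a hypothetical name introduced for illustration; it mimics the idea behind the Draw n With Replacement option rather than the applet's actual code.

```python
import random

def draw_with_replacement(box, n, rng):
    """Draw n balls, returning each ball to the box before the next draw."""
    return [rng.choice(box) for _ in range(n)]

box = [0, 1]             # the box model from Step 1
rng = random.Random(7)   # fixed seed so the run is repeatable
sample = draw_with_replacement(box, 2, rng)
print(sample)            # a list of two values, each 0 or 1
```

Since every ball goes back into the box, each draw has the same probabilities as the first, and the same ball may be drawn more than once.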
The Statistic of Interest corresponds to the random variable the user is studying. The Event of Interest specifies the outcomes of the statistic of interest that comprise the event the user is studying.
Based on the example begun in Step 1, suppose we are interested in the number of heads obtained in the two draws. Recall that in Step 1 we defined a random variable which has a value of '1' for a head and a '0' otherwise. A single simulation consists of drawing two balls with replacement (Step 2) and recording a '0' or '1' in each draw. The sum of the responses counts the number of heads. For example, if the two draws result in a '0' and then a '1', the sum will be 1, i.e., a single head has been observed in the two draws. Therefore, select Sum from the popup menu as the Statistic of Interest.
The Event of Interest is specified next if the user wishes to study a specific event. Suppose we are interested in the event defined by getting two heads in the two draws defined by the simulation. Select X=a from the Event of Interest popup menu, where in this case X represents the Sum. Specify 2 in the a= field. Click the Next button to go to Step 4.
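In code, the Statistic of Interest is a function of the sample and the Event of Interest is a yes/no condition on that statistic. The sketch below uses the hypothetical names `statistic` and `event_occurred` for the Sum statistic and the event X=a with a=2.

```python
def statistic(sample):
    """Statistic of Interest: the sum of the drawn values (number of heads)."""
    return sum(sample)

def event_occurred(sample, a=2):
    """Event of Interest X=a: True when the statistic equals a."""
    return statistic(sample) == a

print(statistic([0, 1]), event_occurred([0, 1]))  # 1 False
print(statistic([1, 1]), event_occurred([1, 1]))  # 2 True
```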
The actual simulations are run in Step 4. Click the 1 button to do a single simulation. The Current Simulation gives the outcome and the Sum=2 panel specifies whether or not the Event of Interest was satisfied. Click the 1 button until 10 simulations are completed. The results are given under the Statistic of Interest and Event of Interest.
The Experimental Probability is given in the Success row. With only 10 simulations, the experimental probability may not be near the theoretical probability of 0.25. Click the 1000 button to do 1000 more simulations. The experimental probability should now be close to 0.25. Click the 5000 button to do 5000 more simulations (for a total of 6010 simulations). The experimental probability should now be well determined, i.e., near 0.25. Click the Next button to go to Step 5.
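The same sequence of runs (10, then 1000 more, then 5000 more) can be imitated with the sketch below. It is an approximation of what the applet does, written under our own assumptions: a fair-coin box `[0, 1]`, two draws with replacement, the event Sum=2, and an arbitrary fixed seed.

```python
import random

box = [0, 1]              # one '0' ball and one '1' ball
rng = random.Random(42)   # fixed seed so the run is repeatable

successes = 0
total = 0
history = []              # (simulations so far, experimental probability)

for batch in (10, 1000, 5000):
    for _ in range(batch):
        sample = [rng.choice(box) for _ in range(2)]  # two draws with replacement
        successes += (sum(sample) == 2)               # count the event Sum=2
    total += batch
    history.append((total, successes / total))
    print(total, round(successes / total, 4))
```

The estimate after 10 simulations can be far from 0.25, while the estimate after all 6010 simulations is typically within a few hundredths of it.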
Step 5 gives the experimental sampling distribution of the Statistic of Interest resulting from the simulations. Each simulation results in one of the possible values of the statistic of interest. If N simulations are done, then we have N realizations of the statistic of interest. The experimental sampling distribution of the statistic of interest is simply the experimental probability (density) histogram of these N realizations. The experimental sampling distribution shows the variability of the statistic of interest, i.e., it shows how the experimental probabilities are distributed across the possible values of the statistic of interest.
The random variable of interest here is the sum of the outcomes of a simulation consisting of two draws with replacement. The possible outcomes are 0 (no heads), 1 (one head), and 2 (two heads). We have 6,010 simulations of the sum statistic (10 + 1000 + 5000 from Step 4), each of which takes on 0, 1, or 2 as possible values. The plot of these 6,010 values is given in Step 5 and this constitutes the experimental sampling distribution.
The experimental density histogram shows that the experimental probability of getting a '1' is about twice that of getting a '0' or a '2'. Click on the bars to display the experimental probabilities or click on the Show Distribution button. Clicking on the bar corresponding to X=2 will give the experimental probability of the Event of Interest.
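An experimental sampling distribution like the one shown in Step 5 can be tabulated by counting how often each value of the statistic occurs. The sketch below assumes 6010 simulations (the 10 + 1000 + 5000 run in Step 4), a fair-coin box, and a crude text histogram in place of the applet's plot; all names are hypothetical.

```python
import random
from collections import Counter

box = [0, 1]
rng = random.Random(42)   # fixed seed so the run is repeatable
n_simulations = 6010

# One realization of the Sum statistic per simulation (two draws with replacement).
sums = [sum(rng.choice(box) for _ in range(2)) for _ in range(n_simulations)]

# Experimental probability of each possible value of the statistic.
tally = Counter(sums)
distribution = {value: tally[value] / n_simulations for value in (0, 1, 2)}

for value, p in distribution.items():
    print(value, round(p, 3), "#" * round(40 * p))  # text bar, 40 characters = 1.0
```

The bar for the value 1 comes out roughly twice as long as the bars for 0 and 2, matching the shape described above.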
The theoretical probabilities can be computed using the techniques in the Binomial Distribution module. The theoretical probability distribution (or the theoretical sampling distribution of the statistic of interest) is given by:
| Sum | Probability |
|---|---|
| 0 | 0.25 |
| 1 | 0.5 |
| 2 | 0.25 |
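These theoretical values can be checked with the binomial formula, treating the two draws as n = 2 independent trials with success probability p = 0.5. The sketch below is our own check using Python's standard `math.comb`; it is not taken from the Binomial Distribution module.

```python
from math import comb

n, p = 2, 0.5  # two draws with replacement, probability 0.5 of a head on each

# Binomial probability of k heads in n draws: C(n, k) * p^k * (1 - p)^(n - k)
theoretical = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
print(theoretical)  # {0: 0.25, 1: 0.5, 2: 0.25}
```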
This simulation can also be done by using the probability model corresponding to the Sum statistic above. Go to Step 1 and click the Clear Model button. Enter 0, 1, and 2 as the Values and 1, 2, 1 as the corresponding Counts, i.e., we have four balls: one with a '0', two with a '1', and one with a '2'. In Step 2, select Draw Once in the popup menu. Select The Draw in Step 3 and set the Event of Interest to X=2 as before. Carry out the simulation and the results will be the same as before. Why?
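The alternative box model can be tried out with the sketch below, which is useful for checking the claim before answering the question. It assumes a four-ball box `[0, 1, 1, 2]`, a single draw per simulation, and 6010 simulations; the names and the seed are our own.

```python
import random

# Box model for the Sum statistic: counts 1, 2, 1 for the values 0, 1, 2.
box = [0, 1, 1, 2]

rng = random.Random(42)   # fixed seed so the run is repeatable
n_simulations = 6010

# Draw Once: each simulation is a single draw, and the statistic is the draw itself.
draws = [rng.choice(box) for _ in range(n_simulations)]

# Experimental probability of the event X=2.
experimental_probability = draws.count(2) / n_simulations
print(round(experimental_probability, 4))
```

Comparing this estimate with the one from the two-draw simulation shows the two approaches agreeing up to simulation variability.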
Example #1 shows how probabilities relating to the number of boys per family are estimated. Click the Next button to go to the next step.
Self-test