Cereal Boxes I

Suppose each cereal box contains a pen as a prize and that the pens come in six equally likely colors, i.e., to a close approximation, each color of pen appears in the same number of cereal boxes. How many cereal boxes will you need to buy, on average, before you obtain all six colors? This is a difficult problem to solve using basic probability concepts, but it is easy using the five-step simulation method below.

The five-step method carries out a simulation. It is used in Chapters 4 and 5 to estimate, by experiment, theoretical probabilities and theoretical expected values, respectively. We will introduce the program through the cereal box problem described in Section 5.1 of the text. Click on the Five-Step Method item under Chapter 5 in the left panel of LifeStats to reveal the steps. As in the text, the expected number of cereal boxes is estimated by carrying out the five steps.

The five steps for the cereal box problem are:

Step 1. Define the box model:

Models in this program are specified by box models. A box model is simply a box containing balls that have numbers written on them; one then draws balls from the box. In this case, each numbered ball represents a pen of a specific color. Since the colors are equally likely, each number must have the same chance of being drawn, so each number should appear on exactly one ball. In some situations (but not here), more than one ball will have the same number (see Section 5.3 for a discussion of box models). This step is very important because it specifies the probability model, or population, for whatever statistical problem is being studied. The key decision is which balls should go in the box: how many balls, what value is written on each, and how many of each kind of ball. For the cereal box problem, we need six balls: one with a 1, one with a 2, ..., and one with a 6. See Step 1 of the program (Define the box model) for the entry screen.

The Value columns contain the values that are to be written on the balls, and the Count columns reveal the number of balls containing each value. For the current problem, we need six values: 1, 2, 3, 4, 5, 6, each with a count of 1. By default the counts are 1, so fill in the values in the first 6 positions in the left-hand Value column. Click on Show Box to see the balls, their values, and the number of times each ball occurs. Click on Histogram to see a visual representation of the ball values and the number of times each occurs. Click on the Next button to go to Step 2.
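For readers who want to mirror the box outside the program, the following is a minimal Python sketch of the same box model. It is not part of LifeStats; the names box_counts and box are made up for illustration.

```python
from collections import Counter

# Value -> Count, mirroring the Value and Count columns of Step 1.
# (Hypothetical names; this is an illustration, not the program's internals.)
box_counts = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}

# Expand the counts into the list of balls that sit in the box.
box = [value for value, count in box_counts.items() for _ in range(count)]

# Rough analogue of "Show Box": each value and how many balls carry it.
for value, count in sorted(Counter(box).items()):
    print(f"value {value}: {count} ball(s)")
```

Changing a count to 2 would put two balls with that value in the box, which is how unequal probabilities would be modeled.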

Step 2. Define the trial:

A trial consists of drawing balls from the box with replacement until we have drawn at least 1 of each value. To define the required process, click on the Define the sample: pop-up menu and select the Draw Until All X menu item. Click on the Next button to go to Step 3.
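One way to picture a single "draw until all values appear" trial is the short Python sketch below. The function name draw_until_all and the seed are assumptions chosen for illustration; they do not come from the program.

```python
import random

def draw_until_all(box, rng):
    """Draw with replacement until every distinct value in the box has appeared."""
    targets = set(box)
    seen = set()
    draws = []
    while seen != targets:
        ball = rng.choice(box)  # with replacement: the box itself never changes
        draws.append(ball)
        seen.add(ball)
    return draws

rng = random.Random(1)  # arbitrary seed so the run can be repeated
trial = draw_until_all([1, 2, 3, 4, 5, 6], rng)
print(trial)
```

The trial always ends on the first appearance of the last missing value, so the final draw occurs exactly once in the list.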

Step 3. Define the statistic/event of interest

In the cereal box problem we are interested in how many balls are drawn in a trial to obtain at least one ball of each type. In the program, the number of draws is called the "sample size." Click on the Statistic of interest (X) menu and select the Sample Size menu item. The Event of interest menu is not required in this problem. Click on the Next button to go to Step 4.

Step 4. Run the trials

The six buttons labeled 1, 10, 100, 1000, 5000, 10,000 determine how many times the trial is repeated. The buttons can be clicked repeatedly, and the trials accumulate. For example, clicking on 1000 and then 5000 results in 6,000 trials. Click on the 1 button. The computer will then draw balls from the box until it has drawn at least one of each type. The actual draws are given in the Current Simulation: field. For example, your trial may look like this: 2 2 3 6 6 5 5 1 3 1 3 2 3 2 4.

Thus, on this trial, the first two draws were 2, the next was a 3, and so on until the 15th draw resulted in a 4. At this point, at least one of each type had been drawn, so the trial ended. The numerical outcome (statistic of interest) is the sample size, 15 in this case, which appears in the area labeled Sample Size. Your trial's outcome will be different.
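The example trial can be checked with a few lines of Python. This is only a sketch for verifying the count by machine; the variable names are illustrative.

```python
# The example trial from the text, in draw order.
trial = [2, 2, 3, 6, 6, 5, 5, 1, 3, 1, 3, 2, 3, 2, 4]

# The statistic of interest is the sample size: the number of draws.
sample_size = len(trial)

# Find the draw at which the sixth distinct value first appears.
seen = set()
for position, ball in enumerate(trial, start=1):
    seen.add(ball)
    if len(seen) == 6:
        break

print(sample_size, position)  # prints: 15 15
```

Both numbers agree because the trial stops at the moment the last missing value (here, 4) is drawn.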

One trial is not enough. To perform more trials, just click on one of the other numbered buttons. Click on the 100 button. The program will run more trials, each time showing the trial simulation (quickly) at the Current Simulation: field and writing the numerical outcome in the Sample Size area.

The total number of trials so far has been 101, which is given as the # of Simulations summary value. Click on the Next button to go to Step 5.
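The sequence of clicks above (the 1 button, then the 100 button) can be imitated with the Python sketch below. The helper sample_size_of_one_trial and the seed are assumptions made for illustration, and the numbers produced will differ from those shown by the program.

```python
import random

def sample_size_of_one_trial(rng, n_values=6):
    """Number of draws needed to see all n_values equally likely values."""
    seen = set()
    draws = 0
    while len(seen) < n_values:
        seen.add(rng.randint(1, n_values))  # one draw with replacement
        draws += 1
    return draws

rng = random.Random(2024)  # arbitrary seed
sample_sizes = []

# Imitate clicking the 1 button and then the 100 button; results accumulate.
for clicks in (1, 100):
    sample_sizes.extend(sample_size_of_one_trial(rng) for _ in range(clicks))

print("# of Simulations:", len(sample_sizes))
print("Average:", sum(sample_sizes) / len(sample_sizes))
```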

Step 5. Summary Statistics

You can take the 101 sample sizes in the Sample Size area of Step 4 and find the average of them by hand. Fortunately, the computer has already found the average, which is given as the Average value in either Step 4 or Step 5. For example, the average may be 14.7426, which means on average about 15 boxes are needed to obtain all six colors of pens. Your average will be different, but it should be similar. (Note that 14.7 is the true theoretical value.)
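The theoretical value of 14.7 is quoted here without derivation. A standard argument, sketched below in Python, is that once k colors have been collected, each new box shows a new color with probability (6 - k)/6, so the wait for the next new color averages 6/(6 - k) boxes; adding these waits for k = 0, 1, ..., 5 gives the expected total. The variable names are illustrative.

```python
from fractions import Fraction

n = 6  # number of equally likely colors

# Expected wait for a new color when k colors are already held is n / (n - k).
expected = sum(Fraction(n, n - k) for k in range(n))

print(expected, float(expected))  # prints: 147/10 14.7
```

Exact fractions are used so that the result is 147/10 with no rounding error.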

Besides the average, you will see other summary statistics in Step 5. Specifically, the standard deviation, median, interquartile range, min, and max are given. A frequency histogram of the 101 outcomes is also provided. Note that the distribution is positively skewed and ranges from as few as 6 (a trial needs at least 6 draws to show all six values) to about 30 (your maximum may vary considerably from 30).
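The same kinds of summary statistics can be computed for a simulated set of 101 outcomes using Python's standard statistics module, as in the sketch below. Two assumptions are worth flagging: the sample standard deviation (statistics.stdev) and the default quartile method of statistics.quantiles are used here, and the program may follow different conventions, so small discrepancies would not be surprising.

```python
import random
import statistics

def sample_size_of_one_trial(rng, n_values=6):
    """Number of draws needed to see all n_values equally likely values."""
    seen = set()
    draws = 0
    while len(seen) < n_values:
        seen.add(rng.randint(1, n_values))
        draws += 1
    return draws

rng = random.Random(7)  # arbitrary seed
sizes = [sample_size_of_one_trial(rng) for _ in range(101)]

# Quartiles (default method); the interquartile range is Q3 - Q1.
q1, _, q3 = statistics.quantiles(sizes, n=4)

summary = {
    "average": statistics.mean(sizes),
    "standard deviation": statistics.stdev(sizes),  # sample SD (assumption)
    "median": statistics.median(sizes),
    "interquartile range": q3 - q1,
    "min": min(sizes),
    "max": max(sizes),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```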

Once you have navigated to a new step, you can always go back to a previous step by clicking on one of the numbers (1 ... 5) in the Step tool palette. For example, go back to Step 4 and run another 1000 trials. Note that the maximum sample size tends to increase as the number of trials increases. Why? You can click Reset in Step 4 to start new trials.
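To experiment with this outside the program, the sketch below records the maximum after 101 trials and again after 1000 more are added to the same run. The helper name and the seed are illustrative assumptions, and the explanation is still left to the reader.

```python
import random

def sample_size_of_one_trial(rng, n_values=6):
    """Number of draws needed to see all n_values equally likely values."""
    seen = set()
    draws = 0
    while len(seen) < n_values:
        seen.add(rng.randint(1, n_values))
        draws += 1
    return draws

rng = random.Random(11)  # arbitrary seed

# First 101 trials, as in Step 4.
sizes = [sample_size_of_one_trial(rng) for _ in range(101)]
max_after_101 = max(sizes)

# Add another 1000 trials to the same run, without resetting.
sizes.extend(sample_size_of_one_trial(rng) for _ in range(1000))
max_after_1101 = max(sizes)

print("max after 101 trials: ", max_after_101)
print("max after 1101 trials:", max_after_1101)
```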