This module examines a batch of numbers (i.e., the data) to determine whether they are consistent with a normal distribution. The techniques are graphical and informal.
The normal distribution is the most used distribution in statistics. The principal reasons are:
Since valid statistical inferences are often based on an underlying normality assumption, it is critical to be able to evaluate this assumption in a meaningful way. The normal probability plot provides an intuitive and interpretable method of assessing normality.
The qth sample quantile is a value along the measurement scale with a proportion q or less of the data less than the qth quantile and a proportion 1-q or less greater than the qth quantile. Because of the discreteness of the data, the proportion of the data less than the qth quantile typically will not be exactly q. Important special cases are the quartiles and the median. Approximately one-fourth of the data is less than the lower quartile and three-fourths are less than the upper quartile. The median is the second quartile and has about one-half of the data below it. These definitions are discussed more fully in the Quantiles module.
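The definition above can be checked numerically. The following is a minimal Python sketch, not the JavaStat implementation: the function names `sample_quantile` and `satisfies_definition` and the small batch of numbers are hypothetical, chosen only for illustration. Taking the midpoint of two neighbouring values when n times q is a whole number is one common convention among several.

```python
import math

def sample_quantile(values, q):
    """Return a value that meets the definition of the qth sample quantile (0 < q < 1)."""
    xs = sorted(values)
    n = len(xs)
    k = n * q
    if k == int(k):
        # n*q is a whole number: any value between the two neighbouring
        # order statistics qualifies; take their midpoint (one convention).
        k = int(k)
        return (xs[k - 1] + xs[k]) / 2
    # Otherwise the order statistic at position ceil(n*q) is the only value
    # that satisfies both halves of the definition.
    return xs[math.ceil(k) - 1]

def satisfies_definition(values, x, q):
    """Check that at most a proportion q lies below x and at most 1 - q lies above x."""
    n = len(values)
    below = sum(v < x for v in values) / n
    above = sum(v > x for v in values) / n
    return below <= q and above <= 1 - q

batch = [7, 2, 9, 4, 12, 5, 10, 4, 6]  # hypothetical batch of 9 numbers
lower_quartile = sample_quantile(batch, 0.25)
median = sample_quantile(batch, 0.50)
upper_quartile = sample_quantile(batch, 0.75)
print(lower_quartile, median, upper_quartile)
```

Because the batch is discrete, the proportion below each quantile is not exactly q: only one of the nine values lies below the lower quartile, which is less than one-fourth.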
The normal quantile plot implemented in JavaStat incorporates the features of a box and whisker plot. Specifically, options are available for showing the quartiles and outliers in an outlier boxplot.
Consider the following data, which was generated randomly from a normal distribution (see the Normal Distribution module) with a mean of 10 and a standard deviation of 2:
12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66, 12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58, 10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05
This list of numbers does not give us much insight into the underlying normal probability law that generated the data. However, a graphical view of the data, as seen below in the Normal Plot of the Histogram applet, provides rich visual content. If the data is normally distributed, the data will fall approximately along a straight line.
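The coordinates behind such a plot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the applet's code: the helper names are hypothetical, and the plotting position (i - 0.5)/n for the ith smallest value is one common convention that the applet may or may not use. Each sorted data value is paired with the standard normal quantile for its rank, and the correlation between the two coordinates gives a rough numerical sense of how straight the plot is.

```python
from statistics import NormalDist, mean

data = [12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66,
        12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58,
        10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05]

def normal_quantile_points(values):
    """Pair each sorted data value with the standard normal quantile for its rank."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: (i - 0.5) / n for the ith smallest value.
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, zs))

def correlation(pairs):
    """Pearson correlation of the two coordinates; values near 1 mean nearly straight."""
    xs, zs = zip(*pairs)
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5

points = normal_quantile_points(data)
for x, z in points[:3]:
    print(f"data value {x:6.2f}   normal quantile {z:6.3f}")
print("correlation:", round(correlation(points), 3))
```

The correlation is only an informal summary; the plot itself remains the primary tool, since it shows where any departure from the line occurs.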
The Normal Plot can be enhanced by items in the Options menu. Select Robust Fit from the Options menu to fit a line to the data. If the data is normally distributed, it should fall close to this line. The data appears to be normally distributed, except possibly for the value in the lower left corner.
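The module does not say how Robust Fit computes its line. As an assumed stand-in, the sketch below uses one common resistant choice: the line through the lower and upper quartile points, which extreme values cannot pull around. The function name `quartile_line` is hypothetical, the quartiles come from Python's `statistics.quantiles` with its default method, and the plotting position (i - 0.5)/n is again an assumption.

```python
from statistics import NormalDist, quantiles

data = [12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66,
        12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58,
        10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05]

def quartile_line(values):
    """Slope and intercept of the line through the two quartile points.

    The line predicts a data value from a standard normal quantile z:
        predicted value = intercept + slope * z
    """
    q1, _, q3 = quantiles(values, n=4)   # sample quartiles of the data
    z1 = NormalDist().inv_cdf(0.25)      # standard normal lower quartile
    z3 = NormalDist().inv_cdf(0.75)      # standard normal upper quartile
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept

slope, intercept = quartile_line(data)
print(f"slope (rough estimate of the standard deviation): {slope:.2f}")
print(f"intercept (rough estimate of the mean): {intercept:.2f}")

# Distance of the smallest value from the line, at its assumed plotting position.
n = len(data)
z_min = NormalDist().inv_cdf(0.5 / n)
residual_min = min(data) - (intercept + slope * z_min)
print(f"smallest value minus fitted value: {residual_min:.2f}")
```

For normally distributed data the slope and intercept of such a line approximate the standard deviation and the mean, so values near 2 and 10 are what one would hope to see here. The negative residual for the smallest value corresponds to the point that sits away from the line in the lower left corner.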
Quartiles and outliers can be visualized in this plot by selecting Outlier boxplot from the Options menu. The left and right edges of the resulting box, read against the horizontal axis, mark the lower and upper quartiles, respectively. The vertical line within the box marks the median. The horizontal lines extending from the box are called the whiskers. The whiskers extend to the smallest and largest values that are not outliers. Since the whiskers here extend to the minimum and maximum values, no outliers are present in this plot. In particular, the value in the lower left corner is not an outlier.
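The boxplot quantities described above can be reproduced approximately in Python. The module does not state which outlier rule or quartile convention the applet uses, so this sketch makes two assumptions: the common 1.5 times interquartile range rule for the fences, and the default quartile method of `statistics.quantiles`. The function name `outlier_boxplot` is hypothetical.

```python
from statistics import quantiles

data = [12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66,
        12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58,
        10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05]

def outlier_boxplot(values, k=1.5):
    """Quartiles, fences, whisker ends, and outliers under the k * IQR rule."""
    q1, med, q3 = quantiles(values, n=4)  # default "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "quartiles": (q1, med, q3),
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

summary = outlier_boxplot(data)
for name, value in summary.items():
    print(name, value)
```

Under these assumptions the whiskers reach the minimum and maximum and the list of outliers is empty, in agreement with the plot. The smallest value lies only slightly inside the lower fence, so a different quartile convention could change that verdict.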
Example #1 (Distances) uses a histogram and a normal quantile plot to examine whether student distances from Oxford are normally distributed. Various normalizing transformations are explored.