This module shows you how to identify the individual (the observational unit), determine the type of each variable, and specify the possible values of each variable.
The Variable types module targets the following cognitive tasks:
| Task | Skills | Concepts |
|---|---|---|
| Var-1: | Identify the observational (or experimental) unit in a study | Understand the concept of an observational unit |
| Var-2: | Distinguish between a population and sample | Understand the concepts of population and sample |
| Var-3: | Understand the concept of a variable | |
| Var-4: | Identify an (unranked) categorical variable and determine its possible values | Understand the properties of (unranked) categorical variables |
| Var-5: | Identify a ranked categorical variable and determine its possible values | Understand the properties of ranked categorical variables |
| Var-6: | Distinguish between (unranked) categorical and ranked categorical variables | |
| Var-7: | Identify a discrete numerical variable and determine its possible values | Understand the properties of discrete numerical variables |
| Var-8: | Identify a continuous numerical variable and determine its possible values | Understand the properties of continuous numerical variables |
| Var-9: | Distinguish between discrete and continuous numerical variables | |
| Var-10: | Organize data for statistical analysis | Understand the concept of data |
Observational units, or individuals, are the fundamental objects of study in an investigation. More specifically, an individual can be a case, respondent, subject, rock, plant, process, etc., depending on the field of study. For example, in business the observational unit could be a company, in psychology the observational unit could be a case, and in engineering the observational unit could be a process.
The collection of individuals of interest in a study is the population. Sometimes the population is conceptual, i.e., hypothetical rather than an actual, listable set. For example, all possible tomato plants of a certain variety cannot be enumerated. The selected group of individuals is the sample. The number of individuals in a sample is $n$, which is always less than the number of individuals in the corresponding population, $N$, i.e., $n < N$.
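To make the distinction between $N$ and $n$ concrete, here is a minimal sketch that draws a sample from a finite population. It assumes simple random sampling without replacement, which is only one possible sampling protocol; the function name `simple_random_sample` and the list of company labels are hypothetical, chosen for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals from a finite population (without replacement)."""
    N = len(population)
    # The module's convention: the sample is strictly smaller than the population.
    if not 0 < n < N:
        raise ValueError(f"sample size n={n} must satisfy 0 < n < N={N}")
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# Hypothetical population: labels for N = 100 companies (the observational units).
population = [f"company-{i}" for i in range(1, 101)]
sample = simple_random_sample(population, n=10, seed=1)
print(len(sample), len(population))  # 10 100
```

A conceptual population, such as all possible tomato plants of a variety, cannot be stored as a list like this, which is exactly why such populations can only be studied through samples.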
Data are collected on a group of individuals, i.e., the sample, ideally obtained by modern sampling methods. The data consist of measured or observed values of certain characteristics, or attributes, common to the individuals. Only characteristics that vary from individual to individual in the population are of interest; consequently, they are called variables. For example, height varies from person to person, so it is a variable. Species is not a variable, since all persons are Homo sapiens.
A variable is a rule that assigns a unique value to each individual in the population. Generally, many variables can be defined on the individuals of a population. For example, financial characteristics (price-earnings ratio, debt-equity ratio, etc.) can be defined on companies, and operating characteristics (temperature, yield, etc.) can be defined on processes. Here the companies and the processes, respectively, are the observational units.
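Since a variable is a rule assigning one value to each individual, it can be pictured as a function. The sketch below treats height as such a function and checks whether a characteristic actually varies over a group, echoing the species example. The `Person` class, the helper `varies`, and the three people are hypothetical, made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    """A hypothetical individual (observational unit)."""
    name: str
    height_in: float
    species: str

def height(person: Person) -> float:
    """A variable: a rule that assigns exactly one value to each individual."""
    return person.height_in

def varies(rule, individuals) -> bool:
    """True if the rule takes more than one distinct value over the individuals."""
    return len({rule(x) for x in individuals}) > 1

people = [
    Person("A", 64.0, "Homo sapiens"),
    Person("B", 70.5, "Homo sapiens"),
    Person("C", 68.0, "Homo sapiens"),
]

print(varies(height, people))               # True: height is a variable
print(varies(lambda p: p.species, people))  # False: species is constant here
```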
The goal of data analysis is to make statements about the distributional properties of variables, individually or collectively. Variables are classified according to their use in statistics. The major division classifies variables according to whether they are numerical or categorical. Numerical variables are also called quantitative variables, whereas categorical variables are also called qualitative variables.
Stanley Stevens defined an alternative way to classify the measurements of variables. The levels of measurement in his scale are nominal (unranked categorical), ordinal (ranked categorical), interval, and ratio. Interval and ratio both correspond to continuous numeric, depending on whether or not there is a well-defined zero. For example, temperature in degrees Celsius is interval, whereas temperature in kelvins, which has an absolute zero, is ratio. Stevens' classification has no scale corresponding to discrete numeric and thus will not be used in IDEAL.
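The practical difference between interval and ratio scales is whether ratios of values mean anything. The sketch below compares two temperatures, 20 °C and 10 °C (arbitrary values chosen for illustration), on both scales: the difference is the same on each, but only the Kelvin ratio reflects a genuine "times as much" comparison, because only the Kelvin scale has an absolute zero.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins (shift the zero point to absolute zero)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences agree on both scales, so differences are meaningful on an interval scale.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios disagree: "20 °C is twice as hot as 10 °C" is not a meaningful statement.
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(diff_c, round(diff_k, 6))                # 10.0 10.0
print(ratio_celsius, round(ratio_kelvin, 3))   # 2.0 1.035
```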
Categorical variables take on values from a finite set of possible levels. A level is a label for a non-numeric value. For example, the eye color of a person is a categorical variable with levels blue, grey, brown, and hazel. Notice that eye-color levels have no natural order. A categorical variable whose levels are not ordered is said to be unranked.
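Specifying the possible values of an unranked categorical variable amounts to listing its levels. The sketch below stores the eye-color levels from the text in a `frozenset`, a collection with no inherent order, which matches the fact that these levels have no natural order. The helper `invalid_values` and the observed values are hypothetical; "green" is included only because it is not in the list of levels given above.

```python
# Levels of the unranked categorical variable "eye color", as listed in the text.
EYE_COLOR_LEVELS = frozenset({"blue", "grey", "brown", "hazel"})

def invalid_values(values, levels):
    """Return the observed values that are not among the allowed levels."""
    return [v for v in values if v not in levels]

observed = ["brown", "blue", "hazel", "green"]
print(invalid_values(observed, EYE_COLOR_LEVELS))  # ['green']
```

Because the levels are unranked, the only meaningful comparison between two values is equal versus not equal.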
A categorical variable whose levels are ordered is said to be ranked. For example, the bond rating of a corporation (the observational unit) is ranked categorical. Standard and Poor's bond ratings are AAA, AA, A, BBB, BB, B, CCC, CC, C, and D. These ratings run from lowest risk (AAA), through low risk (AA, A) and progressively higher risk, to default (D).
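Unlike eye color, bond ratings can be sorted and compared. One way to capture this, sketched below, is an `IntEnum` whose integer codes encode only the order from lowest risk to default. The class name `SPRating` and the codes 1 to 10 are assumptions for illustration; the codes say nothing about how far apart two ratings are, so arithmetic on them (such as averaging) would not be meaningful.

```python
from enum import IntEnum

class SPRating(IntEnum):
    """S&P bond ratings; the codes encode order only (1 = lowest risk, 10 = default)."""
    AAA = 1
    AA = 2
    A = 3
    BBB = 4
    BB = 5
    B = 6
    CCC = 7
    CC = 8
    C = 9
    D = 10

ratings = [SPRating.BBB, SPRating.AAA, SPRating.CC, SPRating.A]

# Ordering is meaningful for a ranked categorical variable.
print([r.name for r in sorted(ratings)])  # ['AAA', 'A', 'BBB', 'CC']
print(SPRating.BB > SPRating.A)           # True: BB carries more risk than A
```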
Numerical variables are further divided into discrete and continuous variables. The possible values of a discrete variable form a finite or countable set. Most commonly, a discrete numeric variable has values that are counts, i.e., 0, 1, 2, etc.
The possible values of a continuous variable lie in a range, or interval. For example, the height of an adult is continuous, since its values fall in a range, perhaps from 36 in to 96 in. Many continuous variables, such as most measured quantities, are non-negative. However, some can be negative, e.g., a bank balance.
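The two kinds of numerical variable differ in their sets of possible values, and each can be checked with a simple rule. The sketch below treats a count (a discrete variable) as a non-negative integer and adult height (a continuous variable) as any finite number in the 36 in to 96 in range mentioned above. The helper names are hypothetical, and the height bounds are the text's rough "perhaps" range, not a fixed standard.

```python
import math

def is_possible_count(x) -> bool:
    """Possible value of a discrete count variable: 0, 1, 2, ..."""
    # bool is a subclass of int in Python, so exclude True/False explicitly.
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0

def in_range(x: float, low: float, high: float) -> bool:
    """Possible value of a continuous variable restricted to the interval [low, high]."""
    return math.isfinite(x) and low <= x <= high

print(is_possible_count(3), is_possible_count(2.5))   # True False
print(in_range(70.25, 36, 96), in_range(120, 36, 96))  # True False
```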
The individuals of a study are obtained by a specified protocol, which defines the sampling process. One or more variables are measured or observed on each individual in the sample. The resulting data can be organized into a table. If there are $p$ variables, then the table has $n$ rows (corresponding to the individuals) and $p$ columns (corresponding to the variables).
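A data table with one row per individual and one column per variable maps naturally onto a list of records. The sketch below builds a small table from made-up values (not real data), with `n` individuals and `p` variables as illustrative names, and checks that every row records the same variables.

```python
# Hypothetical study: each dict is one individual (a row); each key is a variable (a column).
rows = [
    {"height_in": 64.0, "eye_color": "brown", "siblings": 2},
    {"height_in": 70.5, "eye_color": "blue", "siblings": 0},
    {"height_in": 68.0, "eye_color": "hazel", "siblings": 1},
]

variables = list(rows[0].keys())
n, p = len(rows), len(variables)  # n individuals (rows), p variables (columns)

# Every row must record a value for the same p variables.
assert all(list(r.keys()) == variables for r in rows)
print(n, p)  # 3 3

# A column: all observed values of one variable across the individuals.
print([r["siblings"] for r in rows])  # [2, 0, 1]
```

The three columns mix a continuous numeric variable (height), an unranked categorical variable (eye color), and a discrete numeric variable (number of siblings), which is typical of real data tables.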
The data table is often named to identify the study of interest. For example, JavaStat has a data table called aircraft with two variables, Spr and CAR, measured on each of 22 aircraft. The table also has a Type column, but Type is a label that identifies the kind of aircraft, e.g., FH-1, and thus it is not a true variable. Spr represents specific power and is continuous numeric, whereas CAR is unranked categorical, recording whether or not the plane can land on an aircraft carrier.
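Recording the role of each column alongside a data table makes the distinction between labels and variables explicit. The sketch below describes the columns of the aircraft table as the text characterizes them; it contains no data values, and the dictionary `AIRCRAFT_COLUMNS` and helper `variables_of` are hypothetical names, not part of JavaStat.

```python
# Column roles for the aircraft data table, as described in the text.
# "Type" identifies each aircraft (e.g., FH-1); it is a label, not a variable.
AIRCRAFT_COLUMNS = {
    "Type": "label",
    "Spr": "continuous numeric",      # specific power
    "CAR": "unranked categorical",    # can the plane land on an aircraft carrier?
}

def variables_of(columns: dict) -> list:
    """Return only the columns that are true variables (labels excluded)."""
    return [name for name, kind in columns.items() if kind != "label"]

print(variables_of(AIRCRAFT_COLUMNS))  # ['Spr', 'CAR']
```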
Example 1 illustrates how to identify the individual, the type of numerical variable, and the possible variable values for various statistical situations.
Example 2 illustrates how to identify the individual, the type of categorical variable, and the possible variable values for various statistical situations.
Self-test