Variable Types

Learning Objectives

This module shows you how to identify the individual and the variable type and to specify the variable values.

The Variable Types module targets the following cognitive tasks:

Task | Skills | Concepts
Var-1 | Identify the observational (or experimental) unit in a study | Understand the concept of an observational unit
Var-2 | Distinguish between a population and sample | Understand the concepts of population and sample
Var-3 | | Understand the concept of a variable
Var-4 | Identify an (unranked) categorical variable and determine its possible values | Understand the properties of (unranked) categorical variables
Var-5 | Identify a ranked categorical variable and determine its possible values | Understand the properties of ranked categorical variables
Var-6 | Distinguish between (unranked) categorical and ranked categorical variables |
Var-7 | Identify a discrete numerical variable and determine its possible values | Understand the properties of discrete numerical variables
Var-8 | Identify a continuous numerical variable and determine its possible values | Understand the properties of continuous numerical variables
Var-9 | Distinguish between discrete and continuous numerical variables |
Var-10 | Organize data for statistical analysis | Understand the concept of data

Var-1: Observational Unit

Observational units, or individuals, are the fundamental objects of study in an investigation. More specifically, an individual can be a case, respondent, subject, rock, plant, process, etc., depending on the field of study. For example, in business the observational unit could be a company, in psychology the observational unit could be a case, and in engineering the observational unit could be a process.

Var-2: Population and Sample

The collection of individuals of interest in a study is the population. Sometimes the population is conceptual, i.e., not real. For example, all possible tomato plants of a certain variety cannot be enumerated. The group of individuals selected from the population is the sample. The number of individuals in a sample is n, which is always less than the number of individuals in the corresponding population, N, i.e., n < N.
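The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the population of 500 ID numbers, the sample size of 20, and the use of simple random sampling are all assumptions made for the example, not part of any study described here.

```python
import random

# Hypothetical population: 500 individuals identified by ID numbers.
population = list(range(1, 501))
N = len(population)

# Draw a simple random sample of n individuals without replacement.
# The seed is fixed only so that the sketch is reproducible.
rng = random.Random(42)
n = 20
sample = rng.sample(population, n)

print(f"Population size N = {N}, sample size n = {len(sample)}")
```

Because the sample is drawn without replacement, every sampled individual is distinct and belongs to the population.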

Var-3: Variables

Data are collected on a group of individuals, i.e., the sample, ideally obtained by modern sampling methods. The data consist of measured or observed values of certain common characteristics or attributes of the individuals. The only characteristics of interest are those that vary from individual to individual in the population; consequently, they are called variables. For example, the height of a person varies from person to person. Species is not a variable since all persons are Homo sapiens.

A variable is a rule that defines a unique value for each individual in the population. Generally, it is possible to define many variables on the individuals of a population. For example, financial characteristics (price-earnings ratio, debt-equity ratio, etc.) can be defined on companies, and operating characteristics (temperature, yield, etc.) can be defined on processes. The companies and processes, respectively, are the observational units.
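One way to picture a variable as a rule is as a function that takes an individual and returns that individual's value. The sketch below assumes a hypothetical Company record with made-up figures; the class, field names, and numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    """Hypothetical observational unit with made-up financial figures."""
    name: str
    price: float      # share price
    earnings: float   # earnings per share
    debt: float       # total debt
    equity: float     # total equity


def price_earnings_ratio(company: Company) -> float:
    """Variable 1: assigns one price-earnings value to each company."""
    return company.price / company.earnings


def debt_equity_ratio(company: Company) -> float:
    """Variable 2: assigns one debt-equity value to each company."""
    return company.debt / company.equity


# Two illustrative individuals (values invented for the example).
companies = [
    Company("Alpha", price=50.0, earnings=5.0, debt=200.0, equity=400.0),
    Company("Beta", price=30.0, earnings=2.0, debt=90.0, equity=60.0),
]

for c in companies:
    print(c.name, price_earnings_ratio(c), debt_equity_ratio(c))
```

Each function is defined on every company and returns exactly one value per company, which is what makes it a variable in the sense used above.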

The goal of data analysis is to make statements about distributional properties of variables, individually or collectively. Variables are classified according to their use in statistics. The major division classifies variables according to whether they are numerical or categorical. Numerical variables are also called quantitative variables, whereas categorical variables are also called qualitative variables.

Stanley Stevens defined an alternative way to classify the measurements of variables. The levels of measurement in his scale are: nominal (unranked categorical), ordinal (ranked categorical), interval, and ratio. Both interval and ratio correspond to continuous numeric, depending on whether or not there is a well-defined zero. For example, temperature in degrees Celsius is interval, whereas temperature in kelvins, which has an absolute zero, is ratio. Stevens' classification does not have a scale corresponding to discrete numeric and thus will not be used in IDEAL.

Var-4: Unranked Categorical Variables

Categorical variables take on values from a finite set of possible levels. A level is a label for a non-numeric value. For example, the eye color of a person is a categorical variable with levels blue, grey, brown, and hazel. Notice that eye-color levels have no natural order. A categorical variable whose levels are not ordered is said to be unranked.

Var-5: Ranked Categorical Variables

A categorical variable whose levels are ordered is said to be ranked. For example, the bond rating of corporations (the observational unit) is ranked categorical. The levels of Standard and Poor's bond rating are: AAA, AA, A, BBB, BB, B, CCC, CC, C, D. The rating runs from lowest risk (AAA) through low risk (AA, A) and so on down to default (D).

Var-6: Unranked vs. Ranked Categorical Variables

Both ranked and unranked categorical variables have a finite set of levels, which are not numerical. The only difference is whether or not the levels have a natural order.
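The difference can be sketched in Python using the eye-color and bond-rating levels from the text. The choice of a set for unranked levels, a list for ranked levels, and the helper name is_riskier are assumptions made for this illustration.

```python
# Unranked categorical: a set of levels, with no order implied.
EYE_COLORS = {"blue", "grey", "brown", "hazel"}

# Ranked categorical: a list of levels, ordered from lowest risk to default.
BOND_RATINGS = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D"]


def is_riskier(rating_a: str, rating_b: str) -> bool:
    """Return True if rating_a carries more risk than rating_b.

    The comparison uses the position in BOND_RATINGS, not alphabetical
    order, because the labels themselves are not numbers.
    """
    return BOND_RATINGS.index(rating_a) > BOND_RATINGS.index(rating_b)


# Membership is the only meaningful question for an unranked variable.
print("hazel" in EYE_COLORS)

# Ordering is meaningful for a ranked variable.
print(is_riskier("BB", "AA"))
print(sorted(["B", "AAA", "CCC"], key=BOND_RATINGS.index))
```

Sorting the rating labels as plain strings would give alphabetical order, which does not match the risk order; the explicit list of levels is what carries the ranking.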

Var-7: Discrete Numerical Variables

Numerical variables are further divided according to whether they are discrete or continuous. The possible values of a discrete variable are finite or countable. Most commonly, a discrete numeric variable has values that are counts, i.e., 0, 1, 2, etc.

Var-8: Continuous Numerical Variables

The possible values of a continuous variable come from a range or interval. For example, the height of an adult is continuous since values fall in a range, perhaps from 36 in. to 96 in. Continuous variables are often non-negative, as most measured quantities are. However, some can be negative, e.g., a bank balance.

Var-9: Discrete vs. Continuous Numerical Variables

The distinction between discrete and continuous variables is sometimes debatable. In practice, continuous variables can only be measured to a certain accuracy. However, if the values could in principle be measured to arbitrary accuracy, the variable is said to be continuous. What about a bank balance? The degree of accuracy is 1 cent, so technically balance is discrete, i.e., the number of possible values is finite or perhaps countable. Nonetheless, since the number of possible values is very large, balance is often considered to be continuous. Furthermore, the distribution used to model balance would almost certainly be continuous.
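The bank-balance argument can be made concrete with a short Python sketch. It assumes balances are written as decimal strings with at most two decimal places; the function names and the example bounds are hypothetical.

```python
from decimal import Decimal


def balance_in_cents(balance: str) -> int:
    """Convert a balance such as '-12.34' to a whole number of cents.

    Assumes the input has at most two decimal places.
    """
    return int(Decimal(balance) * 100)


def count_possible_balances(low: str, high: str) -> int:
    """Count the distinct balances between low and high, in 1-cent steps."""
    return balance_in_cents(high) - balance_in_cents(low) + 1


# A balance is a whole number of cents, so it is technically discrete.
print(balance_in_cents("-12.34"))

# Even a modest range contains a very large number of possible values,
# which is why balance is usually treated as continuous.
print(count_possible_balances("-1000.00", "1000.00"))
```

The count is finite for any bounded range, which matches the "technically discrete" reading, while its size shows why a continuous model is the practical choice.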

Var-10: Data

The individuals of a study are obtained by a specified protocol, which defines the sampling process. One or more variables are measured on each individual of the sample. The resulting data can be organized into a table with one row for each individual and one column for each variable.

The data table is often named to identify the study of interest. For example, JavaStat has a data table called aircraft that has two variables, Spr and CAR, measured on each of 22 aircraft. A third column, Type, is actually a label that identifies the type of aircraft, e.g., FH-1, and thus it is not a true variable. Spr represents specific power and is continuous numeric, whereas CAR is unranked categorical, representing whether or not the plane can land on an aircraft carrier.
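The row-and-column layout can be sketched in Python as a list of records. The table below is hypothetical: the people, the variable names, and every value are made up for illustration and are not taken from the aircraft table or any real study. The id field plays the same role as Type above: it labels the individual and is not counted as a variable.

```python
# Variables (columns) of the hypothetical table, with their types:
#   height_in  continuous numeric
#   siblings   discrete numeric (a count)
#   eye_color  unranked categorical
VARIABLES = ["height_in", "siblings", "eye_color"]

# Each row is one individual. All values are invented for the example.
table = [
    {"id": "P1", "height_in": 64.5, "siblings": 2, "eye_color": "brown"},
    {"id": "P2", "height_in": 70.0, "siblings": 0, "eye_color": "blue"},
    {"id": "P3", "height_in": 67.25, "siblings": 1, "eye_color": "hazel"},
    {"id": "P4", "height_in": 61.0, "siblings": 3, "eye_color": "brown"},
]


def column(rows, variable):
    """Return the values of one variable across all individuals."""
    return [row[variable] for row in rows]


n_rows = len(table)      # number of individuals
n_cols = len(VARIABLES)  # number of variables (the id label is excluded)

print(f"{n_rows} rows, {n_cols} columns")
print(column(table, "siblings"))
```

Reading across a row gives everything recorded about one individual; reading down a column gives the observed values of one variable.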

Examples

Example 1 illustrates how to identify the individual, the type of numerical variable, and the possible variable values for various statistical situations.

Example 2 illustrates how to identify the individual, the type of categorical variable, and the possible variable values for various statistical situations.

Self-test