The table below summarizes our data types. The text that follows expands on each entry in the table.
|Quantitative|Continuous: Height, Age, Income|Discrete: Pages in a Book, Trees in Yard, Dogs at a Coffee Shop|
|Categorical|Ordinal: Letter Grade, Survey Rating|Nominal: Gender, Marital Status, Breakfast Items|
Quantitative vs. Categorical
Some of these can be a bit tricky. Notice that even though zip codes are numbers, they aren’t really a quantitative variable: if we add two zip codes together, we do not obtain any useful information from the new value. Therefore, zip code is a categorical variable.
Height, Age, the Number of Pages in a Book, and Annual Income all take on values that we can add, subtract, and perform other operations with to gain useful insight. Hence, these are quantitative.
Gender, Letter Grade, Breakfast Type, Marital Status, and Zip Code can be thought of as labels for a group of items or individuals. Hence, these are categorical.
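As a rough illustration of the distinction, the sketch below averages a quantitative variable and counts the groups of a categorical one. The sample values and variable names are invented for this example, and storing zip codes as text is one reasonable convention rather than a requirement.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data, invented for illustration only.
heights_cm = [162.5, 170.0, 181.2, 175.3]
zip_codes = ["94107", "10001", "94107", "60614"]  # kept as text: they are labels

# Quantitative: arithmetic on the values gives a meaningful summary.
average_height = mean(heights_cm)

# Categorical: counting group membership is meaningful; adding the codes is not.
zip_counts = Counter(zip_codes)
most_common_zip, occurrences = zip_counts.most_common(1)[0]

print(f"Average height: {average_height:.2f} cm")
print(f"Most common zip code: {most_common_zip} ({occurrences} people)")
```

Keeping the zip codes as strings also guards against accidentally summing or averaging them.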
Continuous vs. Discrete
To decide whether we have continuous or discrete data, we should see if we can split our data into smaller and smaller units. Consider time: we could measure an event in years, months, days, hours, minutes, or seconds, and even at seconds we know there are smaller units we could measure time in. Therefore, we know this data type is continuous. Height, age, and income are all examples of continuous data. Alternatively, the number of pages in a book, dogs I count outside a coffee shop, or trees in a yard are discrete data. We would not want to split our dogs in half.
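The "smaller and smaller units" test can be sketched in a few lines. The duration and the dog count below are made-up values, chosen only to show that a continuous measurement survives being re-expressed in finer units while a count does not survive being split.

```python
# Continuous: the same duration can be expressed in ever smaller units.
duration_hours = 1.5  # hypothetical measurement
duration_minutes = duration_hours * 60
duration_seconds = duration_minutes * 60
duration_milliseconds = duration_seconds * 1000

# Discrete: counts only come in whole steps.
dogs_seen = 3  # hypothetical count
half_the_dogs = dogs_seen / 2
is_valid_count = half_the_dogs.is_integer()  # a count must be a whole number

print(duration_minutes, duration_seconds, duration_milliseconds)
print(f"Half of {dogs_seen} dogs is {half_the_dogs}; valid count? {is_valid_count}")
```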
Ordinal vs. Nominal
In looking at categorical variables, we found that Gender, Marital Status, Zip Code, and Breakfast Items are nominal variables, where there is no rank ordering associated with the data. Whether you ate cereal, toast, eggs, or only coffee for breakfast, there is no rank ordering associated with your breakfast.
Alternatively, Letter Grade and Survey Rating have a rank ordering associated with them, so they are ordinal data. If you receive an A, this is higher than an A-. An A- is ranked higher than a B+, and so on. Ordinal variables frequently occur on rating scales from very poor to very good. In many cases we turn these ordinal variables into numbers, as we can more easily analyze them, but more on this later!
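One common way to turn an ordinal variable into numbers is to map each label to its rank. The sketch below assumes a five-point scale running from very poor to very good; the intermediate labels, the codes 1 to 5, and the sample responses are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical five-point scale, listed from lowest to highest rank.
RATING_ORDER = ["very poor", "poor", "fair", "good", "very good"]
rating_to_code = {label: rank for rank, label in enumerate(RATING_ORDER, start=1)}

# Invented survey responses.
responses = ["good", "very poor", "very good", "fair", "good"]

# Encode each label as its rank so the responses can be compared and sorted.
codes = [rating_to_code[r] for r in responses]
sorted_responses = sorted(responses, key=rating_to_code.get)

# The median respects the ordering without assuming equal gaps between ratings.
median_code = median(codes)
median_label = RATING_ORDER[median_code - 1]

print(codes)
print(sorted_responses)
print(f"Median rating: {median_label}")
```

The codes preserve order only. Whether the gap between "poor" and "fair" equals the gap between "good" and "very good" is a separate question, which is why the sketch reports a median rather than a mean.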
We looked at the different data types we might work with in the world around us. When we work with data in the real world, it might not be very clean: sometimes there are typos or missing values. When this is the case, having some expertise regarding the data and knowing its data type can help us ‘clean’ it. Understanding data types can also help us build visuals that best explain the data. But more on this very soon!
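To make the cleaning idea concrete, here is a minimal sketch of how knowing a column's type suggests a cleaning rule. The helper names `parse_age` and `normalize_label`, the raw values, and the choice to mark bad entries as `None` are all assumptions for this example, not a prescribed method.

```python
def parse_age(raw):
    """Return the age as a float, or None if the entry is missing or not a number."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def normalize_label(raw):
    """Return a trimmed, lowercase label, or None if the entry is missing or blank."""
    if raw is None:
        return None
    label = raw.strip().lower()
    return label or None


# Hypothetical messy columns: a typo ("4o"), blanks, and inconsistent casing.
raw_ages = ["34", "", "4o", None, "27.5"]
raw_status = ["Married", " single", "MARRIED", ""]

# Age is quantitative, so anything that is not a number is flagged as missing.
clean_ages = [parse_age(a) for a in raw_ages]

# Marital status is categorical, so variants of the same label are merged.
clean_status = [normalize_label(s) for s in raw_status]

print(clean_ages)
print(clean_status)
```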