To identify a specific value in the data set we still need to know the personID, month, and a variable name, but month has been converted from a row identifier to part of a compound column identifier.īefore proceeding, start a do file that loads 2000_acs_cleaned.dta. In the wide form, personID is a simple row identifier, but now the variable names for the level one variables are a compound identifier with two parts: the variable name (employed) and the month in which the variable was observed. In the long form, personID and month were a compound identifier for the rows in the data set, while the variable names were a simple column identifier. Now consider the identifiers in this data set.
In most cases the long form is easier to work with, so we'll do most of our examples in this form. The long form is longer because it has more rows the wide form is wider because it has more columns. We call the first form the long form (or occasionally the tall form) and the second the wide form. However, employed is a level one variable with three (potentially) different values per person. Because yearsEdu is a level two variable, there's just one value per person and thus one variable (column). In this form, each observation represents a person. Now consider the exact same data in a different form: personID In this form, each observation represents a month (or more precisely, a person-month combination). Consider observing two people for three months: personIDĮxercise: Identify the level one units, level two units, level one variable(s), and level two variable(s) in this data set. With a hierarchical data set, an observation (row) can represent either a level one unit or a level two unit. It does not meet the definition of a level two variable because different level one units (monthly observations) have different values for it. But it might! And if it does, highest degree earned becomes a level one variable. In a study that observes individuals for a few months, it's unlikely that their highest degree earned will change. Of course all of these depend on the details of the actual data set.
For data structures where a level two unit is observed over time, level two variables are variables that do not change over time. A variable is a level two variable if all the level one units within a given level two unit have the same value for the variable. If the hierarchy has more than two levels, simply keep counting up: if you have students grouped into classes grouped into schools grouped into districts, students are level one, classes are level two, schools are level three, and districts are level four.Įach variable is associated with a specific level. A level two unit is then a group of level one units. A level one unit is the smallest unit in the data set. Hierarchical data can be be described in terms of levels (note that these are not the same as the levels of a categorical variable). These structures may seem very different, but the same concepts apply to both and often even the same code. Another common hierarchical data structure is panel or longitudinal data and repeated measures, where things are observed multiple times. The American Community Survey is an example of one of the most common hierarchical data structures: individuals grouped into households. Many data sets involve some sort of hierarchical structure. This is part five of Data Wrangling in Stata.