The Juvenile Justice Professional's Guide to
Human Subjects Protection and the IRB Process
Statistical Disclosure Limitation Techniques
Microdata and tabular data are two common products of a data collection effort. Microdata files contain the actual electronic records of individual youth and include personal information such as name, social security number, age, race, gender, and offense, along with other demographic factors. These data files are rich in information, but releasing them presents unacceptable risks of disclosing confidential youth information. Tabular data include the numbers, percentages, and rates within a table and the discussion of these data within the text.

Just as microdata files threaten the confidentiality requirements of 28 CFR 22, tabular data files present risks of their own. Confidentiality issues arise when cells within a table include only a few youth or when characteristics, such as ethnicity, are uniquely distinguishing. Under these conditions, researchers may be able to identify an individual youth and, in combination with other tables, identify additional information such as ‘most serious offense’. Disclosure risk also arises when a single table cell includes every youth in a group, thus disclosing information about all of them. For example, a frequency table showing that all fifteen learning disabled youth aged 13-15 in a school district are under the supervision of the juvenile justice system would constitute disclosure. To protect these data and to ensure that the risk of disclosure is minimal, the information in microdata and tabular data files is restricted through statistical disclosure limitation techniques. Once made available for public use, the files are considered restricted data products.
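The two risk conditions described above (cells with only a few youth, and cells that contain every youth in a group) can be checked mechanically before a table is released. The following Python sketch is illustrative only: the function name `risky_cells`, the `min_count` threshold of 3, and the counts for "other youth" are assumptions made for the example and do not come from this guide or from 28 CFR 22.

```python
def risky_cells(table, min_count=3):
    """Flag cells that may disclose information about individual youth.

    table maps a row label to a dict of {column label: count}.
    min_count is an assumed threshold: non-zero counts below it are flagged.
    """
    flags = []
    for row, cells in table.items():
        total = sum(cells.values())
        for col, count in cells.items():
            if 0 < count < min_count:
                # Only a few youth in the cell: individuals may be identifiable.
                flags.append((row, col, "small cell"))
            elif total > 0 and count == total:
                # Every youth in the row falls in this cell, so the table
                # reveals this attribute for the whole group.
                flags.append((row, col, "whole group in one cell"))
    return flags


# Hypothetical counts; the first row mirrors the fifteen-youth example.
table = {
    "learning disabled, aged 13-15": {"under supervision": 15, "not under supervision": 0},
    "other youth, aged 13-15": {"under supervision": 40, "not under supervision": 2},
}
flags = risky_cells(table)
for row, col, reason in flags:
    print(f"{row} / {col}: {reason}")
```

A check like this only identifies candidate cells. Deciding how to treat them remains a judgment for a professional familiar with the data.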

A professional who has appropriate statistical knowledge and is familiar with the microdata file and the tabular data under consideration should carry out statistical disclosure limitation techniques. Although this work requires significant expertise and skill, juvenile justice professionals without such experience and training will nonetheless be able to recognize what these techniques are intended to achieve. The examples of statistical disclosure limitation techniques below are not intended to be comprehensive or technical. Numerous ‘how-to’ manuals are available for researchers who wish to learn how to apply these statistical methods.

Microdata Files
  • Remove direct identifiers: name, social security number, and date of birth.
  • Collapse information into larger categories: a rare offense such as ‘weapons in school’ should be re-categorized as ‘weapons’.
  • Mask subject identification: create a new subject identification variable and drop all other identifying information from the data set.

  Reference pseudonym: coded identifiers replace personally identifiable youth data. A youth can be re-identified only with the reference list that links the codes to youth.

  Reversible encryption: the encrypted format is created by a mathematical algorithm and contains identifiable youth data in a hidden form that can be recovered given access to the encryption algorithm.

  Irreversible (one-way) encryption: the encrypted format is created by a secure and unique algorithm and produces a unique code for each youth that cannot be converted back to personally identifiable data. It is computationally infeasible to determine personally identifiable information from the encrypted format.
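The microdata steps listed above can be sketched in Python for a single record. This is a minimal illustration under stated assumptions, not a prescribed method: the field names, the `OFFENSE_GROUPS` mapping, and the `SECRET_KEY` are hypothetical, and a keyed hash (HMAC-SHA256 from the Python standard library) stands in for the one-way encryption described above.

```python
import hashlib
import hmac

# Hypothetical field names, mapping, and key; none come from this guide.
DIRECT_IDENTIFIERS = {"name", "ssn", "date_of_birth"}
OFFENSE_GROUPS = {"weapons in school": "weapons"}
SECRET_KEY = b"replace-with-a-key-held-only-by-the-data-custodian"


def one_way_code(identifier: str) -> str:
    """Derive a keyed one-way code; the original value cannot be read back from it."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def deidentify(record: dict) -> dict:
    """Return a copy of the record prepared for a restricted data product."""
    # Remove direct identifiers.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Mask subject identification with a new, non-identifying variable.
    cleaned["subject_id"] = one_way_code(record["ssn"])
    # Collapse rare offense categories into larger ones.
    cleaned["offense"] = OFFENSE_GROUPS.get(record["offense"], record["offense"])
    return cleaned


# Made-up record for illustration only.
record = {
    "name": "Example Youth",
    "ssn": "000-00-0000",
    "date_of_birth": "1990-01-01",
    "age": 14,
    "offense": "weapons in school",
}
public_record = deidentify(record)
print(public_record)
```

Because the same input always yields the same code, records for one youth can still be linked across files without exposing the identifier. A reference-pseudonym approach would instead store a separate lookup list, which must itself be protected.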

Before an organization releases microdata files, staff should be knowledgeable about Federal and agency-level confidentiality regulations along with related statistical disclosure limitation methods. Each microdata file intended for public use should be reviewed to determine which statistical disclosure limitation techniques need to be applied. In addition, disclosure limitation practices should be consistent within and among agencies with overlapping microdata files to prevent disclosure through linked data sets.

Tabular Data
  • Suppress (do not publish) sensitive cells: table cells with a count of one or two.
  • Round (adjust) values in all cells to a specified base: all rounded values (other than zero) are multiples of 3 (base 3) or 5 (base 5). Bases 3 and 5 are the most common choices for rounding.

Statistical disclosure limitation methods are not needed for reports of tabular data that represent a single variable, such as gender. However, when tables display two or more variables (age by gender by race), a method such as rounding is applied to the table cells. Raw data are used to produce the table; the rounding procedure is the final step in tabular presentation. For example, when rounding raw data with base 3, the count in each table cell is rounded to the nearest multiple of 3, so each cell has one added, has one subtracted, or remains the same. The procedures applied to individual cells are not shared with the public when the tables are released. When cell counts are small or when youth characteristics are unique, rounding helps prevent disclosure of youth identity and confidential information.
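The base 3 rounding described above can be sketched as follows. This is an illustrative sketch rather than a production procedure: the function names and the sample counts are assumptions made for the example.

```python
def round_to_base(count: int, base: int = 3) -> int:
    """Round a non-negative count to the nearest multiple of base."""
    remainder = count % base
    if remainder * 2 < base:
        # Closer to the multiple below: subtract the remainder.
        return count - remainder
    # Closer to the multiple above (ties, possible only for even bases, go up).
    return count + (base - remainder)


def round_table(rows, base=3):
    """Apply rounding to every cell as the final step before presentation."""
    return [[round_to_base(count, base) for count in row] for row in rows]


# Made-up raw counts for a two-variable table (for example, gender by age).
raw = [
    [1, 7, 11],
    [2, 14, 26],
]
rounded = round_table(raw)
print(rounded)  # [[0, 6, 12], [3, 15, 27]]
```

With base 3, every cell changes by at most one: a count of 1 becomes 0 and a count of 2 becomes 3, so a reader of the published table cannot tell which small cells held a single youth.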

Rounded numbers are also used when calculating percentages and rates. This practice prevents reconstruction of raw data and unintended disclosure of youth data. Because raw data and rounded tabular data are not the same, secondary analysis of tabular data carries a risk of error. When cell counts are greater than 25, numbers, percentages, and rates calculated from rounded data differ only minimally from those calculated from the actual data; however, cells with smaller counts are less accurate. Researchers conducting secondary analyses must determine whether or not rounded data are suitable for analysis and for drawing accurate conclusions.
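The effect of cell size on accuracy can be seen with a little arithmetic: base 3 rounding moves a count by at most one, so the relative error in a cell is at most one divided by the raw count. The sketch below illustrates this under assumed, made-up counts; the helper names are hypothetical.

```python
def round_to_base(count: int, base: int = 3) -> int:
    """Round a non-negative count to the nearest multiple of base."""
    return (count + base // 2) // base * base


def relative_error(count: int, base: int = 3) -> float:
    """Share of the raw count that is lost or gained through rounding."""
    return abs(round_to_base(count, base) - count) / count


# Made-up cell counts: one small, one just above 25, one large.
for count in (4, 26, 100):
    rounded = round_to_base(count)
    print(f"raw={count} rounded={rounded} error={relative_error(count):.1%}")

# Largest base 3 error for any cell count from 26 to 999.
worst_large_cell = max(relative_error(n) for n in range(26, 1000))
```

In this sketch a cell of 4 is off by 25 percent after rounding, while no cell above 25 is off by more than about 4 percent, which is consistent with the guidance above that larger cells show minimal differences.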
