Understanding Basic Statistical Representations

Meg Gorzycki, Ed.D. and Alen Tersakyan

Purpose

Confronted with statistical information and representations such as bar graphs, tables, and charts, many students slide over the material, hoping that it will either be explained in class or that it will not appear on a test. There are several reasons why helping students understand this content is important:

  • Help students understand research in their field of study
  • Help students discern the veracity of claims in advertising
  • Help students understand the implications of research for their own well-being
  • Help students become critical readers of political rhetoric
  • Help students comprehend the complexity of the world
  • Help students understand how statistical representations can embody biases

The purpose of this material is to identify ways instructors can help students understand representations of data and to provide examples of exercises and assessments.

Statistical Literacy

Statistical literacy concerns the individual's ability to "decode" information that numerically describes phenomena, is expressed in statistical vocabulary, and often uses graphic organizers to summarize data or illustrate patterns and relationships (Gal, 2004; Pfannkuch & Wild, 2004; Schield, 1999). At the basic level, students should be able to:

  • Understand central tendency
  • Interpret simple charts, graphs, and tables
  • Describe the role of variables in research
  • Define "sample population" and explain how it is determined
  • Explain simple statistical information in narrative form
  • Address the meaning and significance of data

What are the Basics?

Introductions to statistics typically address the following (Raykov & Marcoulides, 2013):

  • Why understanding statistics and graphic organizers is important
  • Understanding mean, mode, median (central tendency)
  • Understanding standard deviation
  • How to interpret graphs, charts, and tables
  • How to identify variables and assess their impact
  • How to understand correlations
  • How to understand and generate a hypothesis
  • How to identify methods of gathering data and their appropriate applications

When Should Instructors Teach Statistics?

Statistical information and representations are found in all disciplines. They routinely appear in scholarly reading in STEM fields, health sciences, social sciences, and economics. To determine whether students in your class will benefit from explicit instruction on statistics and understanding statistical representations, and to integrate skill-building into courses, instructors may take the following steps:

  1. Preview the assigned reading material and identify where students will encounter statistics and statistical representations, and anticipate whether or not these elements may be problematic for students
  2. Ask students whether they understand statistics and the way data is represented in their assignments, and determine whether their ability to put the data into their own words is sufficient or deficient
  3. Develop exercises and class activities that strengthen students' skills to interpret statistics and statistical representation
  4. Reinforce student learning by administering abundant formative assessments of their work

Sample Exercises

Case Study: The Grading System

A student became very distressed and angry when she learned that her instructor, Professor Owens, was going to assign test grades based on a point-percentage system whereby 97% and up was an “A+,” 94-96% was an “A,” 90-93% was an “A-,” and so forth. The student complained, “None of my other instructors do it this way. You should curve everything so that the grades reflect the students’ level of learning.”

The professor replied, “Are you asking me to grade things on a class bell curve?”

The student said, “Yes, I think that is the only fair way to grade.”

The professor confirmed that the student wanted grades to fall along a curve, with as many scores above average as below. She then conducted an analysis and found that 31 students had taken an exam worth 100 points, with the following scores from highest to lowest: 98, 98, 98, 97, 96, 96, 82, 80, 74, 70, 70, 69, 69, 69, 68, 68, 67, 66, 63, 63, 63, 63, 58, 53, 53, 52, 50, 49, 49, 48, 44.

The mean score was 69, the mode 63, and the median 68. She produced a bell curve based on a normal distribution, with 69 as the average score. In a normal distribution, about 68.3% of scores fall within one standard deviation of the mean; for this sample population (the professor's 31 students who took the test), that band runs from about 52 to 85.

The professor then pointed out that if the bell curve were used, she would have to award passing grades to students who got less than one half the material correct on the test. She felt that giving students passing grades for mediocre work was not appropriate, especially since the professional community and employers are counting on the professor to maintain high standards of knowledge and skill.
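The summary figures in this case study can be recomputed from the 31 listed scores with Python's standard `statistics` module. This is a sketch under one stated assumption: it uses the population standard deviation (`pstdev`), treating the 31 students as the whole group of interest; the sample standard deviation (`stdev`) would be slightly larger. It also counts how many scores actually fall within one standard deviation of the mean, which checks how well the normal model fits these particular scores.

```python
from statistics import mean, median, mode, pstdev

# The 31 exam scores listed in the case study (100-point exam).
scores = [98, 98, 98, 97, 96, 96, 82, 80, 74, 70, 70, 69, 69, 69, 68, 68,
          67, 66, 63, 63, 63, 63, 58, 53, 53, 52, 50, 49, 49, 48, 44]

avg = mean(scores)      # arithmetic mean
middle = median(scores) # middle value of the sorted scores
most = mode(scores)     # most frequent score
sd = pstdev(scores)     # population standard deviation (assumption: the class is the whole population)

# A normal model predicts about 68.3% of scores within one SD of the mean.
low, high = avg - sd, avg + sd
inside = [s for s in scores if low <= s <= high]

print(f"n={len(scores)} mean={avg:.1f} median={middle} mode={most} sd={sd:.1f}")
print(f"one-SD band: {low:.1f} to {high:.1f}")
print(f"actual share inside band: {len(inside)}/{len(scores)} = {len(inside) / len(scores):.1%}")
```

On these scores the standard deviation is about 16.5 points and the one-standard-deviation band runs from about 52.7 to 85.6, but only 19 of the 31 scores (about 61%) fall inside it, compared with the 68.3% a normal distribution predicts. The cluster of scores in the high 90s is one reason the bell curve fits this class imperfectly.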

Instructional Suggestions

  • Clearly communicate the grading system in the syllabus and present a rationale for it
  • Take class time to review the concepts of central tendency (mean, mode, median), standard deviation, and normal distribution
  • Facilitate class discussion about the advantages and disadvantages of using a bell curve to represent grades and to “norm” human behavior

The Teacher's Teacher

A student aspiring to be a school administrator submitted a research paper to his instructor, who gave the essay a “C.” Frustrated, the student claimed that the report was thorough, accurate, well-organized, and well-documented. The instructor disagreed and pointed to a graph entitled “National 8th Grade Reading Scores, 2007-2015,” which appeared as follows:

National Assessment of Educational Progress. (2015). http://www.nationsreportcard.gov/reading_math_2015/#reading/scores?grade=8.

The expository writing related to the graph read in part:

The national average on the reading test for all students in 8th grade in 2015 was 265. The highest score possible on the reading test was 500, so a score of 265 means that on average, 8th graders only got 53% of the test questions right. The threshold for basic level reading skills is a score of about 243, while the threshold for proficient reading is 281, and a score of 323 or better represents advanced level reading. Thus, on average, 8th graders are reading at a level just below the level of proficiency.

The instructor then patiently asked the student a series of questions regarding his research, during which the student discovered:

  • The statement about “all students in 8th grade” could not possibly be true. Although the target population of 8th graders in 2015 was 3,911,000, the data in his essay came from a sample of 139,000 students, and his report should have alerted readers to that fact. This means that roughly 3.5% of the target population was represented in the testing.
  • The claim that the average score shows students got 53% of the test correct may or may not be true. The student did not report how the test was scored, and a score on a 500-point scale is not necessarily the percentage of questions answered correctly, so the assertion requires further research.
  • The graph itself is accurate but could be improved. The Y axis ranges from 260 to 269, so it does not reveal the distance between the average and a perfect score, and it tends to exaggerate the jump in scores from 2011 to 2013. A graph showing the full range of possible scores would show readers that the average reading score has changed little over time and is rather flat.
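The two numerical points above can be checked with a few lines of arithmetic. The Python sketch below computes the share of the target population that was tested, and how much of a chart's height a score change occupies on the truncated 260-269 axis compared with the full 0-500 scale. The 3-point change is a hypothetical value chosen for illustration, not a figure taken from the NAEP data, and `share_of_axis` is a helper invented for this sketch.

```python
# Figures quoted in the exercise for 2015 8th-grade reading.
target_population = 3_911_000
sample_size = 139_000
print(f"share of target population tested: {sample_size / target_population:.2%}")


def share_of_axis(change, axis_min, axis_max):
    """Hypothetical helper: fraction of the plotted vertical range that a change of `change` points fills."""
    return change / (axis_max - axis_min)


change = 3  # hypothetical change in the average score between two test years

truncated = share_of_axis(change, 260, 269)  # y-axis as drawn in the student's graph
full = share_of_axis(change, 0, 500)         # full range of possible scale scores

print(f"truncated axis: {truncated:.1%} of chart height")
print(f"full axis:      {full:.1%} of chart height")
print(f"visual exaggeration: about {truncated / full:.0f} times")
```

Because the ratio depends only on the two axis ranges (500 / 9), any change drawn on the 260-269 axis looks roughly 56 times taller than the same change drawn on the full scale.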

The Scale of a Graph

A student examined a graph in which researchers illustrated the differences between the test scores of two groups. The experimental group received a high dose of caffeine shortly before the test, while the control group received no caffeine. At a glance, the difference between the two groups' scores appeared dramatic, so the student concluded that caffeine has a tremendously adverse effect on test-taking. Review the following two graphs (Figure 1 and Figure 2), and then address the subsequent questions.

Figure 1: Average Test Scores of Students with and without Caffeine

Figure 2: Average Test Scores of Students with and without Caffeine

Questions:

  1. How does the format of the graph influence the apparent size of the differences between the scores?
  2. What should readers do when they read graphs to avoid making errors in their interpretations of these differences?

The Hidden Variables

The previous study of caffeine’s influence on students’ test scores provides a second lesson on the importance of close reading and critical thinking. Read the following narrative from a fictitious report, and then discuss the questions that follow.

Researchers found that ingesting high doses of caffeine adversely affects students’ test scores. They observed a consistent trend: students who had no caffeine before the test scored slightly higher than those who had high doses of caffeine, and the difference in test scores was sustained across all classes.

Questions

  1. The test scores appear to increase slightly by class level in both the experimental and control groups; what does this suggest about the results?
  2. What other variables might have affected the test results?
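One way to explore the first question is to compare the two groups within each class level rather than only in aggregate. The sketch below uses invented numbers (they do not come from the fictitious report) in which caffeine lowers scores by exactly one point within every class level, but the caffeine group contains more students from the lowest class level. Pooling the groups then makes the caffeine effect look several times larger than it is, because class level, a hidden variable, is doing part of the work.

```python
# Hypothetical data, invented for illustration only:
# (class level, group) -> (number of students, mean test score)
results = {
    (1, "caffeine"): (40, 69.0), (1, "no caffeine"): (10, 70.0),
    (2, "caffeine"): (10, 71.0), (2, "no caffeine"): (10, 72.0),
    (3, "caffeine"): (10, 73.0), (3, "no caffeine"): (10, 74.0),
    (4, "caffeine"): (10, 75.0), (4, "no caffeine"): (40, 76.0),
}


def pooled_mean(group):
    """Mean score for one group with all class levels combined, weighted by the number of students."""
    cells = [(n, m) for (level, g), (n, m) in results.items() if g == group]
    total_students = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total_students


# Gap within each class level (no caffeine minus caffeine).
within_gaps = {
    level: results[(level, "no caffeine")][1] - results[(level, "caffeine")][1]
    for level in range(1, 5)
}
pooled_gap = pooled_mean("no caffeine") - pooled_mean("caffeine")

print("gap within each class level:", within_gaps)
print(f"gap with class levels pooled: {pooled_gap:.2f}")
```

Here each within-level gap is 1.0 point, while the pooled gap is about 3.57 points: the difference in class-level composition between the groups, not caffeine alone, produces most of the apparent effect.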

The Trouble with Tables

Tables give readers a quick way to understand research findings, but like graphs, they can be difficult to interpret. Take the following quiz, which requires readers to interpret a set of tables, and then review the answers, which offer insight into how to read each table accurately. (Please note: these tables are not based on actual studies.)

1. Among U.S. children ages 12-17 in 2005, 34.6% represents:

Percentage of Children Ages 12-17 in U.S. Who Weekly Search Internet Pornography

Year     All   Male   Female   White   Black   Hispanic   Asian
1995     1.8    3.6      1.1    12.5     2.7        2.6     1.2
2000     4.4    6.7      2.6    16.7     9.5        5.6     1.8
2005    21.3   16.4      6.5    34.6    11.8        6.7     2.0
2010    48.4   37.7     18.6    42.5    21.7       10.2     4.7
2015    57.3   53.9     20.1    57.3    27.8       13.2     9.7

a. The percentage of children who search Internet pornography that are white

b. The percentage of white children in the survey of children’s Internet searches

c. The percentage of white children that searched Internet pornography

2. Which statement best describes the 18.3% figure in this table?

Percent Distribution of First Marriages by Age and Gender

Gender/Year   Total   Under 20   20-24   25-29   30-34   35-45   46-60   Over 60
Men
1980            100       22.4    37.6    21.3     6.2     5.7     4.8       2.0
1985            100       21.6    35.8    18.4    15.6     4.8     2.7       1.1
1990            100       18.5    30.4    27.7    15.2     4.3     2.7       1.2
1995            100       16.5    26.9    27.5    18.3     6.7     2.4       1.7
Women
1980            100       24.7    37.8    21.3    10.7     3.0     1.0       1.5
1985            100       22.1    36.2    19.4    15.4     4.1     1.7       1.1
1990            100       17.8    20.5    27.2    15.7    12.0     4.0       2.8
1995            100       15.2    20.9    29.7    17.6     8.4     6.4       1.8

a. In 1995, 18.3% of all men got married between the ages 30-34

b. In 1995, 18.3% of all men ages 30-34 who married were married for the first time

c. In 1995, 18.3% of first-time marriages for men were for men aged 30-34

3. Which assertion is true regarding the data in this table?

2010 Auto Accidents in the U.S. Involving Cell Phones

Driver’s Gender   White   African American   Hispanic   Asian   Native American      All
Male              9,954              4,672      3,015   2,877             1,982   22,500
Female            4,557              3,201      1,394   1,415               933   11,500
All              14,511              7,873      4,509   4,192             2,915   34,000

a. In 2010, there were a total of 34,000 auto accidents in the U.S.

b. In 2010, white female drivers were involved in nearly the same number of accidents as all Hispanic drivers, male and female combined

c. The last number in each column represents 100% of the population named atop the column

d. All of the above are true

e. Only a and c are true

One key to understanding each of the three sample tables is to determine whether any of the cells in the table represent a total, or 100 percent of a population, and if so, which cells. Here are some tips for each of the sample tables and the quiz questions.

  • The correct answer to the first question, about children and Internet searches, is “C.” Note that each column in the table, except for “Year” and “All,” represents 100 percent of the children in the survey who fit that description. Hence, one way to answer the question correctly is to rephrase it: “Of all the white children ages 12-17 (100 percent), what percentage searched the Internet for pornography on a weekly basis?” Note also that none of the rows or columns adds up to 100 percent, or even to a grand total for each category. This reinforces the fact that each percentage under a column heading is a percentage of that group alone, and it also underscores that the table may not speak for all children ages 12-17, since the column headings do not include those of mixed race, Native Americans, or those of Middle Eastern descent.
  • The correct response for the second table is “C.” It is helpful to focus on the title of the table and think about what is being measured. In this case, the data does not represent multiple marriages by age, so answer “B” cannot be true. Nor did the research study all men, which rules out answer “A”: it studied only the age at which men and women married for the first time, and it does not include people who never married. The “Total” column indicates that each row represents 100 percent of the first marriages for that gender and year, so the percentages in the age columns of each row should add up to 100. Readers should also note that the table is subdivided so that the data for first marriage by age is arranged by gender.
  • The correct response for the third example is “C.” Answer “A” is false because the table counts only accidents involving cell phones, not all auto accidents. Readers should note that the column labeled “All” appears alongside a row labeled “All.” This means that the table contains more than one set of totals: the numbers in each row add up to the total in the last column, and the numbers in each column add up to the total in the last row. It is possible to derive percentages from these numbers, but they are not the focus of the table.
  • Each table allows readers to quickly compare discrete populations in the studies. In addition, tables are useful for understanding trends over time, as illustrated in the first and second examples.
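The totals described in the answers above can be checked mechanically. The Python sketch below tests that every row of the first-marriage table sums to 100 percent, and that the row and column totals in the accident table agree with the cells they summarize. The dictionary layout is an illustrative choice for this sketch; the numbers are copied from the tables as printed.

```python
# Second table: percent distribution of first marriages (age columns, Under 20 through Over 60).
first_marriages = {
    ("Men", 1980): [22.4, 37.6, 21.3, 6.2, 5.7, 4.8, 2.0],
    ("Men", 1985): [21.6, 35.8, 18.4, 15.6, 4.8, 2.7, 1.1],
    ("Men", 1990): [18.5, 30.4, 27.7, 15.2, 4.3, 2.7, 1.2],
    ("Men", 1995): [16.5, 26.9, 27.5, 18.3, 6.7, 2.4, 1.7],
    ("Women", 1980): [24.7, 37.8, 21.3, 10.7, 3.0, 1.0, 1.5],
    ("Women", 1985): [22.1, 36.2, 19.4, 15.4, 4.1, 1.7, 1.1],
    ("Women", 1990): [17.8, 20.5, 27.2, 15.7, 12.0, 4.0, 2.8],
    ("Women", 1995): [15.2, 20.9, 29.7, 17.6, 8.4, 6.4, 1.8],
}
# Each row is 100 percent of one gender's first marriages in one year.
rows_ok = all(round(sum(row), 1) == 100.0 for row in first_marriages.values())
print("every first-marriage row sums to 100:", rows_ok)

# Third table: counts of 2010 accidents involving cell phones.
GROUPS = ["White", "African American", "Hispanic", "Asian", "Native American"]
accidents = {
    "Male":   [9954, 4672, 3015, 2877, 1982],
    "Female": [4557, 3201, 1394, 1415, 933],
    "All":    [14511, 7873, 4509, 4192, 2915],
}
row_totals = {"Male": 22500, "Female": 11500, "All": 34000}  # the "All" column

# Each row's cells should add up to the figure in the "All" column.
for gender, counts in accidents.items():
    print(gender, "row matches its total:", sum(counts) == row_totals[gender])

# Each column's Male and Female cells should add up to the "All" row.
column_mismatches = {}
for i, group in enumerate(GROUPS):
    computed = accidents["Male"][i] + accidents["Female"][i]
    if computed != accidents["All"][i]:
        column_mismatches[group] = (computed, accidents["All"][i])
print("columns whose Male + Female differs from the All row:", column_mismatches)
```

Run against the figures as printed, the row checks pass, but the column check flags two cells: male and female Hispanic drivers sum to 4,409 rather than 4,509, and male and female Asian drivers sum to 4,292 rather than 4,192. The two discrepancies cancel in the grand total of 34,000, which is why checking only the grand total would miss them.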

Smoky Stacks

Stacked charts are sometimes difficult to interpret, in part because many readers are accustomed to seeing comparisons drawn between the individual bars in a graph rather than within them. Read the following chart (based on fictitious data) and answer the question that follows.

  1. Which statement is accurate?
    a. Twice as many people in Sweden as in Italy generally oppose ownership of semi-automatic weapons
    b. About 30% of Russians generally support ownership of semi-automatic weapons with exceptions
    c. There is a stronger consensus against citizens’ ownership of semi-automatic weapons in Japan than there is in France
    d. About one in four Americans absolutely oppose citizens’ ownership of semi-automatic weapons without exception

The correct response to the question is “B.” In general, stacked graphs represent 100 percent of a given population or thing in a single column. The difficulty in reading them comes with the scattered middle ground. Notice how the middle responses to the question about gun ownership are not aligned across the graph. At a glance, it may seem that Japan has a stronger consensus on the matter than France, because more Japanese subjects absolutely object to citizens’ ownership of semi-automatic weapons than do the French; but in both countries only about 4.5 to 5.5 percent support such ownership in any way, and in both cases roughly 95 percent oppose such ownership in one form or another.
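The reading strategy in this answer, summing related segments instead of comparing the height of a single segment, can be sketched as follows. The chart's actual values are not given in the text, so the shares below are hypothetical, chosen only to match the description (roughly 95 percent opposed in both countries, with Japan showing the larger "absolutely oppose" segment). The four category labels are also assumptions made for this sketch.

```python
from itertools import accumulate

# Assumed response categories, ordered from the bottom to the top of each stacked bar.
CATEGORIES = [
    "absolutely oppose",
    "oppose with exceptions",
    "support with exceptions",
    "absolutely support",
]

# Hypothetical percentages for illustration only; each bar sums to 100.
chart = {
    "Japan":  [70.0, 24.5, 4.0, 1.5],
    "France": [45.0, 50.5, 3.0, 1.5],
}

summary = {}
for country, shares in chart.items():
    tops = list(accumulate(shares))  # upper edge of each stacked segment
    bottoms = [0.0] + tops[:-1]      # lower edge of each stacked segment
    oppose = shares[0] + shares[1]   # both "oppose" segments together
    support = shares[2] + shares[3]  # both "support" segments together
    summary[country] = (oppose, support)
    print(country)
    for name, low, high in zip(CATEGORIES, bottoms, tops):
        print(f"  {name:<24} {low:5.1f} to {high:5.1f}")
    print(f"  total oppose {oppose:.1f}%, total support {support:.1f}%")
```

With these numbers the "oppose with exceptions" segment starts at 70 in one bar and at 45 in the other, so comparing segment positions by eye is misleading; summing the two oppose segments shows both countries near 95 percent (94.5 and 95.5).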

References

Gal, I. (2004). Statistical literacy: Meanings, components, responsibilities. In D. Ben-Zvi & J. Garfield (Eds.), The challenge of developing statistical literacy (pp. 17-46). Netherlands: Kluwer Publishers.

Pfannkuch, M., & Wild, C. (2004). Towards an understanding of statistical literacy. In D. Ben-Zvi & J. Garfield (Eds.), The challenge of developing statistical literacy (pp. 3-15). Netherlands: Kluwer Publishers.

Raykov, T., & Marcoulides, G. A. (2013). Basic statistics: An introduction with R. Lanham, MD: Rowman & Littlefield Publishers, Inc.

Schield, M. (1999). Statistical literacy: Thinking critically about statistics. Association of Public Data Users. Retrieved from http://www.statlit.org/pdf/1999SchieldAPDU.pdf