NC State University

Department of Graphic Design and Industrial Design

Master of Graphic Design

Centuries before “Big Data,” information visualization was mostly a tool for inquiry and documentation; its trust (and its authority) came from being grounded in the scientific process. As physical forms meant to represent abstract concepts, however, all visualizations lie. Perhaps Mark Twain (quoting Prime Minister Disraeli) put it best: “There are three kinds of lies: lies, damned lies, and statistics.”

The purpose of the Data Stories project is to understand the relationship between form and abstract information (i.e., data). This relationship has recently come to the forefront in our Big Data–driven world, where traditional statistical approaches to visualization break down in the face of data that is large in scale and multidimensional in nature.

Find two (or more) disparate data sets relating to North Carolina, each from a different source and using different metrics. Find an imaginary, even ridiculous (but plausible), correlation in your chosen data. Create a series of data mashups that combine both data sets into one form. These visualizations should explore spatial properties of the data in 2D, 3D, and 4D, and highlight your argument for the false correlation. The ultimate goal of the project is to reveal the lie in your data by telling a compelling (visual) story.

 

Investigations

Presidential Campaign Ads Cause Insanity (duh)

Rachael Paine

× Presidential Campaign Television Advertisements

× Police Incidents that Resulted in Mental Commitment

Through this project I was interested in exploring how a reader can be made aware of misleading data visualizations. I began with the question: By what means can a graphic designer make misleading data visualizations explicit to users, resulting in increased informational comprehension?

I chose two unrelated data sets, presidential campaign ads and mental commitments, providing information from Raleigh, NC in 2016.

I followed a systematic process. Each data set was dissected into all possible variables (B). For data set #1 (police incidents), each incident could be viewed by date, time of day, and region within the geographic location. Data set #2 (campaign ads) had an extensive range of variables. Each ad could be viewed by date, time of day, and the channel and show type in which it aired, as well as by ad topic, whether it was pro or con, which candidate it favored, and how it was financed. I compared the time and date of police incidents with the time and date of campaign ads for my visualizations. I further explored factors for data set #2, including candidate, ad topic, and pro/con themes.

I began to vigorously chart each data set separately, addressing as many variables as possible. These charts were layered to find correlations visually. A clear correlation appeared: when more presidential campaign ads play on television, more police incidents result in mental commitment. Thus, the lie was born – presidential campaign ads make people go crazy! Duh.

I then set out to find interesting ways to visualize this information. One candidate aired about three times as many ads as the other, which clearly lent itself to placing the blame for the correlation on that candidate (C). Personally, I was cool with that lie, as it favored my political preference. But both candidates followed the same trajectory of ad air date and time. It became apparent that either party, Republican or Democrat, could use this same data to argue in their favor… to the detriment of the other candidate. Hhhhmmmm, interesting. Or perhaps another “duh” moment.

I kept questioning my intent. My discomfort with participating in the lying (in particular, the choice to lie in favor of my political preference) led me to a conclusion. Data can become a lie when it is delivered through the designer’s or stakeholder’s lens. Neutrality is dismissed and the information can become blurred and confusing (D). Likewise, the lens through which a reader sees the world is the lens through which they seek out, receive, and interpret information, including data.

Moving forward with the considerations of both designer and reader bias, I produced an interactive visual metaphor to tell the story that information (i.e., data) can become a lie when delivered or received through a narrow lens. Politics clearly afforded this metaphor. I designed a visualization that could be viewed through a Republican (red) filter or a Democratic (blue) filter (E). The lens through which the visualization is viewed determines the information the viewer receives. For the interactive presentation, each viewer was handed a pair of red-tinted glasses and a pair of blue-tinted glasses. For this site’s sake, I have dropped the color “filters” over the visualization (F).

So the shifting lie emerges. When viewing as a Republican, clearly Hillary Clinton is causing this uptick in mental commitments. When viewing as a Democrat, clearly Donald Trump is at fault. So which is it? Who really knows? It’s all a lie.

As a visual communicator, I reflect on my responsibility to understand how my personal schema, understanding, experience, and self-identity serve as a bias for how I present information to the world. I also must keep in mind the bias of the reader. How might this understanding allow for neutral communication, resulting in increased informational self-determination for the reader/user?

Health Codes Discriminate against Women

Dajana Nedic

× NC Certified Minority- and Female-Owned Businesses (2010–2017)

× Food Inspection Violations in Wake County (2012–2017)

Approaching this project, I had not considered how the design of data visualizations could impact a wide range of audiences. I quickly learned that data is very often used to not only reveal specific correlations but also to hide important facts. I approached this data visualization project with an unclear vision for utilizing and visually structuring data. However, I began by looking at a wide range of data sets with the goal of finding two that only focused on data collected in North Carolina.

Deciding on what data to work with and how to weave two unrelated sets together to devise a cohesive story was more challenging than I had expected. While this project was meant to expose the lies within collected data, I found myself obsessed with correctly depicting the lie with exact numbers, dates, etc. As I began to visualize the parsed data in programs such as Raw Graphs and Tableau Public, I was able to gain a better understanding of how to use the data to create visual lies.

Ultimately, I chose to focus on data sets that were very different from one another. The first set consisted of records detailing NC Certified Minority- and Female-Owned Businesses from 2010 to 2017. The second set outlined Food Inspection Violations in Wake County from 2012 to 2017. Parsing through both data sets, I focused on the city of Raleigh and the years 2013, 2014, and 2016, along with the following categories: Male and Female, Minority types, and total records of food inspection violations.

Considering the categories I was focusing on, I had trouble developing a lie that was deceptive enough to call for attention. Being aware of how scrutinized women have become in our society at present, I settled on using the phrase “Health Codes Discriminate against Women.” Using the total records of food inspection violations as the distinguishing factor between Male, Female and various Minority types, it became easier to develop visually deceptive forms. Representing this collective data by utilizing various line weights, colors, and opacities helped to blur the line of fact and falsehood.

Rain Ruins Relationships

Bree McMahon

× North Carolina Weather Data (2015)

× North Carolina Marriage and Divorce Rates (2015)

I started this project by searching online for an intriguing data set. Generally, I am interested in social issues, so I focused on subjects like race, gender, age, and economic status. Eventually this search led me to a data set detailing marriage and divorce rates in 2015. It was arranged by county, which intrigued me further. I wondered how area and region could potentially affect a viewer’s perspective. Once I had secured the Marriage and Divorce data, I searched for a data set that was also arranged by county. Randomly I thought, “I wonder what precipitation looks like in North Carolina?” From there, locating the right data set was simple.

My time on this project was split very evenly between data cleaning and designing. I learned quite a bit about working with data and with Microsoft Excel. I spent much of my time manipulating the data to “do what I want.” This included merging the Divorce/Marriage rates into a single “Success Rate,” utilizing z-scores to manage numbers, and considering standard deviation differences. North Carolina has 100 counties, so the numbers were vast and tedious. But the outliers intrigued me. I decided to eliminate all average counties, or ones with z-scores nearest 0 in both sets, which left me with half the data.
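The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the county rows are made up, the definition of “Success Rate” (marriages as a share of marriages plus divorces) is a guess at one plausible merge, and the cutoff for “average” counties is arbitrary. The original work was done in Excel and may have used different formulas.

```python
from statistics import mean, pstdev

# Hypothetical county records: (name, marriages, divorces, precipitation in inches).
# These numbers are invented for illustration; they are not the 2015 data.
counties = [
    ("A", 900, 300, 55.0),
    ("B", 500, 250, 46.0),
    ("C", 700, 420, 44.5),
    ("D", 1200, 380, 58.0),
    ("E", 400, 200, 47.0),
    ("F", 650, 330, 46.5),
]

def z_scores(values):
    """Standardize values: (x - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Assumed definition of "Success Rate": marriages / (marriages + divorces).
success = [m / (m + d) for _, m, d, _ in counties]
rain = [p for *_, p in counties]

z_success = z_scores(success)
z_rain = z_scores(rain)

# Drop "average" counties: keep a county only if at least one of its
# z-scores is far from 0. The 0.5 cutoff is an assumption.
THRESHOLD = 0.5
outliers = [
    (name, round(zs, 2), round(zr, 2))
    for (name, *_), zs, zr in zip(counties, z_success, z_rain)
    if abs(zs) >= THRESHOLD or abs(zr) >= THRESHOLD
]
print(outliers)
```

Because the filter throws away every county that sits near the average, whatever pattern remains is drawn from the extremes only, which is part of why a trend can seem to appear after this step.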

Once this was completed, a trend emerged: the more precipitation a county recorded, the higher its Marital Success Rate. Furthermore, this trend was present in all four regions of North Carolina: Inner Coastal Plains, Mountains, Piedmont, and Tidewater.

While I was cleaning my data, I frequently used conditional formatting to quickly visualize emerging trends. The gradients produced by the Excel formatting presets were helpful, and often beautiful (at least to a designer). Traditionally, linking color to data can be difficult, but it was used as inspiration for my first visualization (B). Generally, in data, color will move from light to dark. Lighter tints may indicate low values, and darker tints might represent higher numbers. I flipped this standard to further confuse the viewer. I also used subtle gradients and various vibrant hues for each region, aiming to distract.

Since I was also considering region in my data, I decided to map the outliers, hoping to spot another trend (C). Interestingly enough, each region had around 3–4 rule-breakers, mostly located near the state borders. I’m sure there’s a conspiracy there.

My first visualization was colorfully confusing, so I wanted to create something that made it easy for a viewer to compare data sets (and draw the “correct” conclusion). I created various graphs based on bar charts representing Precipitation and Marital Success for each region. I converted the graphs to 3D shapes. The 3D aspect allowed a viewer to analyze the shapes from any angle and appreciate the “obvious” trend (D).

For me, this project demonstrated just how easy it is to manipulate data while maintaining some semblance of truth, or half-truths. You can design with deceit, but you can also use math and reasoning to let the numbers do the lying. I also learned: don’t get married in Wayne County.

Interactive 3d Model

Employers Say “OK” to DUIs

Amber Ingram

× Number of DUI Occurrences by Month in Raleigh (2010–2012)

× Number of Unemployed People by Month in Raleigh (2010–2012)

The first assignment of this project was to pick two data sets that had nothing to do with each other. Although some may speculate that I knew my data sets would affect each other, the truth is that they did not, at least not in the way I thought they might. I picked these two data sets because unemployment and DUIs are two subjects that are highly talked about. Therefore, I thought it would be interesting to see how the two may affect each other in the Raleigh area. The driving under the influence data represents the estimated number of persons ticketed. The monthly unemployment data estimates the number of jobless persons who did not work at all during the related months but were either able to work or looking for work during this time.

When trying to find a relationship between the two data sets, I mashed them together visually to see if it would help me find a compelling story. I created several mash-ups before finally landing on one that I thought was appropriate for my data (A). The scale used for this visualization makes unemployment appear to be a bigger problem than DUIs, especially when compared to visual B. Although this data visualization did not give me the story I was looking for, I thought it was another interesting way to skew the viewer’s perception of my data sets.

Moving forward, I decided to change the way I came at the data sets. This is when I converted my data to z-scores to show the rate of change from month to month (B), revealing my story. When I first started looking at my data in this way, I was looking at the years 2009–2013. While the z-scores did show some correlations across 2009–2013, I found that if I removed 2009 and 2013, the lie became even more apparent. By limiting the scope of my data, I was able to see a stronger visual relationship that suggested that when DUIs increased in the city of Raleigh, people were also actively gaining jobs. The white circles on the yellow peak lines indicate a spike in DUI changes (B). By using a visualization that represents z-scores rather than actual values, I played with the viewer’s ability to understand the meaning of directions and trends. When the line for unemployment fell (standard deviations below the mean), it meant people gained jobs, even though most people would take a falling line to mean a decrease in the associated data. There are other small details that I used to skew my data. When z-scores are created, they should start at zero on the beginning of the y-axis, but since I deleted the year 2009, this changed where my z-score values started.
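A small sketch can show why trimming the date range changes the picture. A z-score is computed against the mean and standard deviation of whatever window is kept, so dropping the first and last years rescales every remaining point. The yearly values below are invented (the project used monthly counts for Raleigh), and the helper name `z_scores` is just for illustration.

```python
from statistics import mean, pstdev

# Hypothetical yearly values, made up for illustration only.
series = {2009: 40, 2010: 22, 2011: 25, 2012: 28, 2013: 5}

def z_scores(values):
    """Standardize values against the mean and spread of the given window."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# z-scores computed over the full 2009-2013 window.
full = dict(zip(series, z_scores(list(series.values()))))

# z-scores recomputed after dropping 2009 and 2013.
trimmed_years = [y for y in series if 2010 <= y <= 2012]
trimmed = dict(zip(trimmed_years, z_scores([series[y] for y in trimmed_years])))

for year in trimmed_years:
    print(year, round(full[year], 2), round(trimmed[year], 2))
```

In this made-up series the extreme first and last years inflate the standard deviation of the full window, so the middle years look flat. Once those years are removed, the same middle values swing much further from zero, which makes peaks and dips look more dramatic.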

Moving into a 4D visualization, I changed the perspective of the viewer and created a key in the top right corner to serve the entire visualization (C). The yellow rods represent the number of DUI occurrences and the pink rods show unemployment. The .gif I have provided does not show the entire 4D animation, but below the .gif are still visuals of each year shown alone and of all the years shown together (D). The interesting thing I found after I created this 4D visualization is that when all years are shown together, the black center inside the pink rods can be looked at as all of the people in Raleigh who are employed. The visualization allows the viewer to see DUI peaks throughout the three years.

The project taught me how important it is to not believe everything you see. It felt somewhat wrong designing visualizations that intentionally lied to the viewer.

Powerball Preys on the Poor

Mac Hill

× North Carolina Powerball Ticket Sales by Drawing (2014–2016)

× North Carolina Average Monthly Wages (2014–2016)

I found this project incredibly relevant to this moment in culture. While we were lying with data, the Oxford Dictionaries declared “post-truth” the word of the year for 2016, and somehow alternative facts became a real problem in American society. As a designer, it was interesting to explore my power over a viewer’s interpretation and what is true about a data set.

I chose data sets with meanings that interested me, rather than ones with visual similarities: specifically, North Carolina Powerball ticket sales and average monthly wages for North Carolina. This meant I had few existing similarities to work with, which pushed me to play with the data statistically. I found that playing with averages made it easier to see patterns and relationships between the two data sets. Early in the year, Powerball sales spike while average income is at its lowest point, a pattern I exaggerated to suggest a negative correlation. Averaging also adds to the deception of the visualizations and gave me greater power to play with meaning.
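As a rough illustration of how averaging can add to the deception, the sketch below collapses several years of records into one average per calendar month. The records are invented, and `monthly_average` is a hypothetical helper, not the method actually used in the project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, month, ticket sales) records, made up for illustration.
# January 2015 stands in for a single unusually large jackpot run.
sales = [
    (2014, 1, 120), (2014, 2, 80),
    (2015, 1, 300), (2015, 2, 90),
    (2016, 1, 150), (2016, 2, 100),
]

def monthly_average(records):
    """Average the values for each calendar month across all years."""
    by_month = defaultdict(list)
    for _, month, value in records:
        by_month[month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}

avg = monthly_average(sales)
print(avg)
```

With these numbers, one outlying January pulls that month's average well above February's, even though January and February were much closer in the other two years. The averaged chart then suggests a recurring seasonal spike that the individual drawings do not clearly support.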

For my visualizations, I played with forms associated with the lottery, specifically the lined-up circles on a ticket and the numbered balls from drawings. I found that round shapes and diagrams were more difficult to interpret spatially and suggested correlations that didn’t exist. This was especially true of the minimalist radial diagram I created to visualize monthly averages for both data sets (A). The radial shape challenges the viewer to interpret area differences and makes it appear that there’s a negative correlation.

In a more elaborate visualization, I combined round shapes and a ticket-like layout with a Sankey diagram (B). By representing each individual drawing for the three years my data set covered, the visualization overwhelms the viewer with unnecessary information. Both the drawing breakdown and the Sankey diagram place January and February at the top, suggesting that they have the highest values, while December looks like the smallest (it’s not) because it’s at the bottom. The jackpot color scale adds a level of complexity, making it even more difficult for the viewer to spot the lie.

Moving into 3D, I continued to play with round shapes, but added a reflective texture and helix shape that distort the presentation of the data (C). The structure makes it difficult to compare the data points side by side and the reflective texture amplifies the differences between the data points.

Overall, the project instilled in me a sense of responsibility. As a designer I have to consider how my aesthetic choices can affect the viewer’s interpretations, and choose carefully.

Tobacco Kills before You Light Up

Grace Anne Foca

× North Carolina Lung Cancer Rates per County (average of 2009–2013)

× North Carolina Tobacco Farms per County (average of 2009–2013)

I chose datasets that did not seem to have any correlation numerically, especially since they were in the context of every county in North Carolina, but I noticed that not every county has tobacco farms. After I isolated the counties that do have tobacco farms, I saw a more obvious “correlation” between the lung cancer rates and the number of tobacco farms per county.
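The effect of isolating only the counties with tobacco farms can be sketched with a Pearson correlation computed twice: once over every county and once over the filtered subset. The rows below are invented for illustration, and the helper names are hypothetical. The source does not say which statistic, if any, was calculated.

```python
from math import sqrt
from statistics import mean

# Hypothetical (county, tobacco farms, lung cancer rate) rows, made up for illustration.
rows = [
    ("A", 0, 80.0), ("B", 0, 62.0), ("C", 0, 75.0),
    ("D", 10, 60.0), ("E", 40, 66.0), ("F", 90, 74.0),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys))
    return cov / spread

def farms_vs_cancer(data):
    """Correlation between farm counts and cancer rates for the given rows."""
    return pearson([farms for _, farms, _ in data], [rate for _, _, rate in data])

r_all = farms_vs_cancer(rows)
r_farms_only = farms_vs_cancer([row for row in rows if row[1] > 0])

print(round(r_all, 2), round(r_farms_only, 2))
```

In this toy data the counties without farms have widely varying cancer rates, which keeps the overall correlation weak. Removing them leaves only the points that happen to line up, so the subset looks strongly correlated even though nothing about the underlying numbers has changed.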

I visually enhanced the two datasets by choosing graph styles that are more familiar to an average user and by choosing color palettes that could increase visual appeal and desire to focus on the data. You will see a line graph visualization (A) that includes every county in the state. The electric blue represents the lung cancer rate data, and the vibrant purple represents the tobacco farm data. Pink vertical lines connecting the two datasets highlight the counties that have tobacco farms and their affiliated lung cancer rates. The highlighted points transform into a new bar chart visualization that isolates these points (B). From here you can see a correlation between increasing and decreasing lung cancer rates and tobacco farms. There are outlier counties, such as the two with the highest lung cancer rates and the two with the highest quantities of tobacco farms. These outliers are pulled out of the bar chart and displayed in miniature pie charts that denote specific data points to inform users of the four counties to avoid if they do not want lung cancer.

The most significant learning moments were in the evolution of my data visualizations leading up to the final products. I included a mashup of six ways I transformed my data before making a final decision (C). Together, these provide more in-depth perspective on how to see data than they could provide individually.

Before this project, I had never worked on data visualization, and I saw data in black and white. I did not realize that I could manipulate numbers to be seen through many perspectives. This realization clicked for me at the end of my process work. I still have much more to learn in the realm of data visualization, but now I know more about what is necessary for users to understand data: captions are always needed for comprehension; just because something looks pretty does not mean it is relevant to the context of the data; and seeing data in multiple dimensions (2D, 3D, and 4D) sparks new ideas and interpretations.

Students Graduate, Storms Ensue

Clément Bordas

× North Carolina State University Graduation Rates (1999–2016)

× Storm Events Database—Wake County (1999–2016)

I started the project by picking two unrelated datasets. The first one was the North Carolina State University Graduation Rate database. The second dataset was the Storm Events Database of Wake County. Both datasets covered the period from 1999 to 2016.

I chose each of these datasets with certain interests in mind. The rationale for my choice of the graduation database came from the fact that our final visualizations are being displayed in one of the student libraries at North Carolina State University, Hunt Library. I chose the storm events database as I became interested in this subject upon my arrival in the United States, where I was introduced to new kinds of storms, such as tornadoes.

I started the project by analyzing the data to gain a global overview of its content. After cleaning and inspecting the data, I created simple line graphs. This allowed me to look at the data visually to identify correlations. After comparing the graphs, I was unable to find a direct statistical correlation, but I did discover a strong correlation within the general trends. I started to compare the graduation rate trend with the frequency of storm events on an annual basis. In years where the graduation rate was higher than the trend, the occurrence of storm events was higher as well. For instance, 71% of lightning events in Wake County occurred in years when the graduation rate was above trend, and 69.2% of the storm events happened in those same above-trend years.
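One way a figure like “69.2% of storm events happened in above-trend years” could be produced is sketched below. It assumes the “trend” is a straight line fitted by least squares, which the source does not state, and all of the numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical yearly data, made up for illustration (not the 1999-2016 records).
years = [1999, 2000, 2001, 2002, 2003, 2004]
grad_rate = [60.0, 63.0, 62.0, 66.0, 65.0, 69.0]   # percent
storm_events = [2, 6, 3, 7, 4, 8]                   # events per year

def linear_trend(xs, ys):
    """Fitted values of an ordinary least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x for x in xs]

# Assumption: a year is "above trend" when its rate exceeds the fitted line.
trend = linear_trend(years, grad_rate)
above = [rate > fit for rate, fit in zip(grad_rate, trend)]

# Share of all storm events that fall in above-trend years.
share = sum(e for e, is_above in zip(storm_events, above) if is_above) / sum(storm_events)
print(f"{share:.1%} of storm events fall in above-trend years")
```

Splitting years into “above” and “below” a fitted line turns two noisy series into a single headline percentage. That percentage sounds precise, yet it depends entirely on how the trend line was chosen.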

Students Graduate, Storms Ensue is the title of the data visualization system I created from the correlation found in the two random data sets. This title came from turning the correlation into a lie of causation where graduation rate directly impacts the frequency of storm events. The higher the graduation rate, the higher the occurrence of storm events. In addition, the data showed that more students enroll every year and students graduate faster; the 5-to-6-year graduation rate is decreasing while the 4-year graduation rate is on the rise.

After having defined my lie and refined my story around the correlation of these two datasets, I started to design a range of graphical visualizations. For clarity, I decided to compose a dashboard to present the information using different approaches. The idea of the dashboard, the scattered data visualizations, and the circular diagram came from typical scientific weather visualizations, such as polar coordinate graphs and vortex-like 3D models. The dashboard combined multiple components, further reinforcing the lie by adding complexity to the data and suggesting scientific truth.

Spring 2017