The Power of Visualization

A picture really does say a thousand words.

“Visualization is really about external cognition, that is, how resources outside the mind can be used to boost the cognitive capabilities of the mind.” — Stuart Card
The importance of visualization is a topic taught to almost every data scientist in an entry-level course at university but is mastered by very few individuals. It is often regarded as obvious or unimportant due to its inherently subjective nature. In this article, I hope to dispel some of those thoughts and show you that visualization is incredibly important, not just in the field of data science, but for communicating any form of information.

Visualization Goals

Essentially, there are three goals to visualization:
  • Data Exploration — find the unknown
  • Data Analysis — check hypotheses
  • Presentation — communicate and disseminate
That is all there is to it. These terms are vague, however, which helps explain why communicating through visualizations is so difficult to master. It is therefore useful to have a model to follow to help us meet these goals.

The Five-Step Model

The visualization process is often described with the following five-step model, which follows a fairly logical progression.

The first step is to isolate a specific target or question that will be the subject of evaluation.
The second step is data wrangling, which takes up a large share of the time data scientists spend working with data. It involves getting the data into a workable format and performing exploratory data analysis to understand the data set, which may involve various ways of summarizing or plotting the data.
The third step is the design stage, which involves developing the story that you want to tell with the data. This links closely back to the target defined in the first step. What is the message we are trying to communicate? The answer will likely depend on who your audience is, as well as on how objective the analysis is meant to be. For example, a politician may want to present data in an exaggerated way in order to make an opponent look bad.
The fourth step is the implementation of the visualization, for example by programming interactive web-based visualizations with D3. This is the part of the process that involves coding, whereas the design stage involves thinking, drawing, ideation, and so on.
The fifth step is a review stage: you look at your implementation and decide whether it sends the message that you want to communicate, or answers the question you set out to answer.
In reality, the process is non-linear, although it is often presented as a linear one. Here is a somewhat more realistic form of this model.
It seems simple, right? Well, there are actually a lot of ways that you can screw this up, often without realizing it. Here are the four most common issues:
Domain situation — Did you correctly understand the users’ needs? Perhaps the wrong problem is being addressed. This is a problem associated with the target phase.
Data/task abstraction — Are you showing them the right thing? Perhaps the wrong abstraction is being used. This is also a problem associated with the target phase.
Visual encoding/interaction — Does the way you are showing the data work? Perhaps the wrong idiom or encoding is being used. This is a problem associated with the design phase.
Algorithm — Does your code break? Is your code too slow? Is it scalable? This is a problem with the implementation phase. Perhaps the wrong algorithm is being used.
Code that breaks is easy to spot, but how do you assess the more subjective problems listed above, such as the domain situation or the visual encoding used? For these, we can turn to evaluation methods.
These methods can be qualitative or quantitative. Qualitative methods are often the most useful, since visualizations are developed to communicate information to people. Some examples are:
  • Observational Studies (“Think Aloud”)
  • Expert Interviews (aka Design Critiques)
  • Focus Groups
The idea of these qualitative procedures is that individuals should be able to see the visualization and understand the message you are trying to convey without any additional information. These types of studies and metrics are commonly used in areas such as marketing and web design because they provide insight into how individuals will interpret and respond to their ideas or designs.

Rules of Thumb

Edward Tufte is a pioneer in the field of developing effective visualizations and has written multiple books on the topic (I will reference these at the end of the article).
Here are three of his rules for effective visualization:
  • Graphical integrity
  • Maximize data-ink ratio
  • Avoid chart junk

Graphical Integrity

We touched on this to some extent earlier, when discussing visualizations designed to send an exaggerated message. In general, it is bad practice, and somewhat harmful to society, to try to mislead individuals with statistics.
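
One way to put a number on graphical integrity is Tufte's "Lie Factor": the size of the effect shown in the graphic divided by the size of the effect in the data. A value close to 1 suggests a faithful graphic. The following is a minimal Python sketch of that calculation. The function names and the bar-chart numbers are hypothetical, chosen only to illustrate what a truncated axis can do.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from the first value to the second."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Effect shown in the graphic divided by the effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data values must differ")
    return effect_size(graphic_first, graphic_second) / data_effect


# Hypothetical example: the data rises from 50 to 55 (a 10% increase),
# but the y-axis starts at 45, so the bars are drawn 5 and 10 units tall
# (a 100% increase in bar height).
print(f"{lie_factor(5, 10, 50, 55):.1f}")   # 10.0

# The same data drawn with a zero-based axis is shown faithfully.
print(f"{lie_factor(50, 55, 50, 55):.1f}")  # 1.0
```

In this made-up case the truncated axis exaggerates the change tenfold, while the zero-based version does not exaggerate it at all.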

Maximize Data-Ink Ratio

This rule of thumb is about clarity and minimalism. In general, 3D plots tend to be less clear and can be misleading in some cases. Examine the differences between the two charts below and decide which you think is better.
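
Tufte defines the data-ink ratio as the ink used to show the data divided by the total ink used in the graphic. As a rough sketch of the idea, the Python below compares a cluttered chart with a stripped-down one. The ink amounts per element are invented units for illustration, not measurements of any real chart, and deciding which elements count as data ink is itself a judgment call.

```python
from typing import Mapping, Set


def data_ink_ratio(ink_by_element: Mapping[str, float],
                   data_elements: Set[str]) -> float:
    """Share of the total ink that is spent on the data itself."""
    total_ink = sum(ink_by_element.values())
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    data_ink = sum(ink for name, ink in ink_by_element.items()
                   if name in data_elements)
    return data_ink / total_ink


# Hypothetical ink budgets (arbitrary units) for two versions of a bar chart.
cluttered = {"bars": 40, "axis labels": 10, "gridlines": 20,
             "3D shading": 25, "border": 5}
minimal = {"bars": 40, "axis labels": 10}

# Assumption: only the bars encode data in this example.
data_elements = {"bars"}

print(f"{data_ink_ratio(cluttered, data_elements):.2f}")  # 0.40
print(f"{data_ink_ratio(minimal, data_elements):.2f}")    # 0.80
```

Removing the gridlines, shading, and border leaves the data untouched but doubles the ratio, which is the sense in which the rule asks you to maximize it.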

Avoid Chart Junk

Extraneous visual elements distract people from the message being conveyed.

Final Comments

As a way to reward you for sticking through this article, here is the visualization that was selected as the winner of the “Information is Beautiful Award 2018”. This visualization renames subway stations in London after the Instagram hashtag that is most often associated with that location.


