Most of us rely on a summary statistic, usually the mean (aka average), when analyzing and reporting performance results. I have been stuck in this convention for quite a long time myself. However, relying on the mean alone hides insights that are useful for analysis and decision making. I realized this when I used a boxplot for a report a few weeks ago.

Illustrating an actual scenario

Let’s take an example. Say I pull the line output data for HGA Assembly and take the average to represent daily performance. The calculated average is 29.4KHGA; comparing it with the target of 31KHGA, I would conclude that we have a problem meeting the requirement. However, plotting the same data set as a boxplot might give us another perspective. Let’s set this aside and review some basic boxplot concepts before we proceed.

What does a boxplot look like?

The figure below shows the anatomy of a boxplot. This graphical technique is based on quartiles. The three quartiles (Q1, Q2, and Q3) divide the sorted data set into four equal parts. In case you missed this one during your Six Sigma class, the second quartile is also the median and serves as the measure of central tendency for a boxplot. The median tells us that half of the data points fall below that value and the other half above it. The same logic applies to the other quartiles: about one-fourth of the points fall below Q1 and about three-fourths fall below Q3.

[Figure: anatomy of a boxplot]

The IQR (interquartile range), which is Q3 minus Q1, is the measure of dispersion for a boxplot and is also used to determine the end points of the upper and lower whiskers. The upper limit is Q3 plus 1.5 times the IQR, and the lower limit is Q1 minus 1.5 times the IQR; each whisker extends to the most extreme data point that still falls within its limit. Any point that falls outside these limits is considered an outlier.
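To make the arithmetic concrete, here is a minimal sketch of how the quartiles, IQR, whisker limits, and outliers could be computed with Python's standard library. The function name `box_stats` and the sample numbers are my own assumptions for illustration (they are not the actual line data), and the "inclusive" quartile method is just one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def box_stats(values):
    """Return the numbers a boxplot is drawn from (hypothetical helper)."""
    data = sorted(values)
    # "inclusive" is one quartile convention among several; plotting tools
    # may use a different one and give slightly different cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr   # subtract from Q1 for the lower side
    upper_limit = q3 + 1.5 * iqr   # add to Q3 for the upper side
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        # Whiskers stop at the most extreme points still inside the limits.
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": outliers,
    }

# Made-up outputs in KHGA, for illustration only.
sample = [21.0, 28.5, 29.0, 29.5, 30.0, 30.5, 31.0, 31.5, 32.0]
stats = box_stats(sample)
for name, value in stats.items():
    print(name, value)
```

With this made-up sample, the 21.0 reading falls below the lower limit and is flagged as an outlier, while the lower whisker stops at 28.5, the lowest point still inside the limit.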

[Figure: 2.png]

How to think outside the boxplot?

One good thing about the boxplot is that it gives us a quick snapshot of the distribution of our data and, at a glance, provides insights about its central tendency and dispersion. (See how good this tool is.)

Fundamentally, a boxplot is used for two common reasons:

  1. See the distribution of a data set for baselining and/or target setting
  2. Compare the distribution of data sets across a given category

Let’s apply #1 and create a boxplot of the HGA Assembly line output data we talked about earlier. To recall, the average line output is 29.4KHGA.

From the generated boxplot, we can get more insights than from a summary statistic (mean) alone. Q1 is at 29.1KHGA, which means that around three-fourths of the plotted data points are at or above 29.1KHGA. Q3 is at 31.3KHGA, which means that around one-fourth of the data points are at or above 31.3KHGA and are therefore already demonstrating the 31KHGA line target. We can also see several outliers extending down to the ~21KHGA region. By looking at the mean alone, we would not have a grasp of this information. A boxplot like this gives us information about the variation within a group.

Now, say we are asked how ready we are for the 31KHGA requirement. We cannot answer readily with a summary statistic; we need to consult the distribution of the data set. The boxplot gives us an insight we can use as a baseline: about one-fourth of the data points already hit the target, so the 31KHGA target is feasible.
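The same point can be checked with a few lines of Python. This is only a sketch using made-up numbers (not the actual HGA line data): it compares the mean against the share of data points that already reach the 31KHGA target. The variable names are my own.

```python
import statistics

TARGET = 31.0  # KHGA, the line target quoted in this post

# Hypothetical line outputs in KHGA, for illustration only.
outputs = [21.0, 28.5, 29.0, 29.5, 30.0, 30.5, 31.0, 31.5, 32.0, 33.0]

mean_output = statistics.mean(outputs)

# Count the points that already reach the target.
hits = [x for x in outputs if x >= TARGET]
share_at_target = len(hits) / len(outputs)

print(f"mean: {mean_output:.1f} KHGA (target {TARGET:.1f} KHGA)")
print(f"share of points at or above target: {share_at_target:.0%}")
```

In this made-up sample the mean sits below the target, yet a good share of the individual points already reach it, which is exactly the kind of information the mean hides.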

Let’s apply #2 to the same data set, this time looking at it on a daily basis.

[Figure: boxplots of line output by day (3.png)]

Looking at the boxplot above, we can see that the line output varies from day to day. Outliers are present daily and could trigger questions like:

What causes the variation seen on a daily basis?

Are the outliers from the same line?

What conditions were present when the line achieved the target output?

By answering the above questions, we can understand the variation between groups and take actions to reduce it and optimize performance.
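As a rough sketch of how such questions could be explored, the snippet below groups hypothetical records by day, computes the median and IQR per day, and lists which line each outlier came from. The record layout (day, line, output), the line names, the numbers, and the helper `summarize_by_day` are all assumptions for illustration; the quartiles use the "inclusive" convention, which may differ slightly from what a plotting tool uses.

```python
import statistics
from collections import defaultdict

# Hypothetical (day, line, output in KHGA) records, for illustration only.
records = [
    ("Mon", "L1", 29.0), ("Mon", "L2", 30.0), ("Mon", "L3", 31.0),
    ("Mon", "L4", 32.0), ("Mon", "L5", 21.0),
    ("Tue", "L1", 28.0), ("Tue", "L2", 29.0), ("Tue", "L3", 30.0),
    ("Tue", "L4", 31.0), ("Tue", "L5", 20.0),
]

def summarize_by_day(rows):
    """Per-day median, IQR and outliers (with the line they came from)."""
    groups = defaultdict(list)
    for day, line, output in rows:
        groups[day].append((line, output))
    summary = {}
    for day, items in groups.items():
        values = [output for _, output in items]
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        summary[day] = {
            "median": median,
            "iqr": iqr,
            # Keep the line name so we can ask "same line every day?"
            "outliers": [(ln, v) for ln, v in items if v < low or v > high],
        }
    return summary

summary = summarize_by_day(records)
for day, s in summary.items():
    print(day, s)
```

In this made-up data the outlier comes from the same line on both days, which would be a natural place to start the investigation.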

Wrapping it up

Context is everything, especially in statistics and data analysis. Boxplots are useful for understanding our data in terms of central tendency and spread. Their applications are limited only by our creativity. The boxplot will not be the graphical tool for every requirement, so I guess the wisdom lies in knowing when to use it. Happy “Thinking Outside the Box…plot!”