This tutorial walks you through plotting a histogram in Excel, overlaying a normal distribution bell curve on it, and adding mean and standard deviation lines.
To produce my random normal samples I used the VBA function RandNormalDist by Mike Alexander. I created samples with a mean of 100 and a standard deviation of 25, using RandNormalDist(100, 0.25).
The actual sample mean and standard deviation were 100.84 and 27.49 respectively.
As a quick sanity check that the samples are roughly normally distributed, compare the mean, median and mode, which should all be approximately equal. This only confirms that the data is symmetric around a single peak; it is not a formal test of normality.
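The check above can be sketched outside Excel as well. The following Python snippet is only an illustration: it uses the standard library's `random.gauss` as a stand-in for RandNormalDist, a fixed seed chosen arbitrarily, and 200 samples to match the sample size used later in this tutorial. It will not reproduce the exact figures quoted above.

```python
import random
import statistics

random.seed(0)  # arbitrary fixed seed so the sketch is repeatable

# Stand-in for RandNormalDist: 200 draws with mean 100 and standard deviation 25.
samples = [random.gauss(100, 25) for _ in range(200)]

mean = statistics.mean(samples)
median = statistics.median(samples)
sd = statistics.stdev(samples)  # sample standard deviation

# For roughly normal data the mean and median should be close to each other.
print(round(mean, 2), round(median, 2), round(sd, 2))
```

The mode is left out here because continuous samples rarely repeat; in practice it would be read off the tallest histogram bin.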
The first thing to do is produce the histogram. This is done by creating bins of a certain width and counting the frequency of the samples that fall in each bin.
Start by calculating the minimum (28), the maximum (184) and then the range (156). For the 200 samples used here, Sturges’ formula gives 9 bins and the square root method gives 15 bins.
To get a bin width, divide the range (156) by the number of bins (9), which gives 17.33. Round this up to 20 to produce nice round bin widths.
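The arithmetic above can be written out as a short Python sketch. It assumes the sample size of 200 used in this tutorial, and the "round up to the next multiple of 10" rule is my own reading of how 17.33 becomes 20, not something Excel does for you.

```python
import math

n = 200               # sample count used in this tutorial
lo, hi = 28, 184      # minimum and maximum from the text

data_range = hi - lo                          # 156
sturges_bins = math.ceil(math.log2(n) + 1)    # Sturges' formula
sqrt_bins = math.ceil(math.sqrt(n))           # square root method
raw_width = data_range / sturges_bins         # about 17.33

# Assumption: round the width up to the next multiple of 10.
bin_width = math.ceil(raw_width / 10) * 10

print(sturges_bins, sqrt_bins, round(raw_width, 2), bin_width)  # 9 15 17.33 20
```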
Set up the bins so that they cover everything from the minimum to the maximum, and use the Excel FREQUENCY function to count the samples in each bin. The FREQUENCY function must be entered as an array formula (Ctrl+Shift+Enter).
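To make the counting rule concrete, here is a rough Python equivalent of what FREQUENCY returns: each bin value acts as an inclusive upper bound, and one extra slot at the end collects anything above the highest bin. The helper name `frequency`, the bin values and the sample values below are made up for illustration and are not the tutorial's data.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin in the style of Excel's FREQUENCY.

    A value v lands in the first bin whose upper bound is >= v.
    The extra last slot counts values above the highest bin.
    """
    counts = [0] * (len(bins) + 1)
    for v in data:
        counts[bisect_left(bins, v)] += 1
    return counts

bins = list(range(20, 201, 20))          # upper bounds 20, 40, ..., 200 (illustrative)
sample = [28, 40, 41, 99.5, 100, 184]    # made-up values

print(frequency(sample, bins))  # [0, 2, 1, 0, 2, 0, 0, 0, 0, 1, 0]
```

Note that 40 is counted in the bin labelled 40 while 41 falls into the bin labelled 60, which is the behaviour that matters when the bars are positioned later on.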
A histogram can then be produced using a column chart.
Histogram using Scatter Chart
Overlaying a normal curve is a little trickier. Firstly, the above column chart can’t be used; the histogram must be produced using a scatter chart instead.
Select the data and produce a scatter chart with smooth lines.
Select the chart and, on the ribbon menu, click Layout, then Error Bars and then More Error Bars Options. Select Display Direction Minus, End Style No Cap and Error Amount Percentage 100%.
This will produce a scatter chart with the following error bars.
Increase the Line Style Width so that it starts looking like a histogram with no gaps.
Finally, remove the scatter chart line.
For the normal curve, the points need to be created first. The bell curve looks nice when it covers the full 6 standard deviations (3 on either side of the mean). 100 increments will be used for a nice smooth curve.
Multiply the standard deviation (27.49) by 6 to get 164.96, then divide by 100 to get an increment of 1.6496. Start at the mean minus 3 standard deviations (18.36) and increase the value by 1.6496 at each step, all the way up to the mean plus 3 standard deviations (183.32). Because both end points are included, there will actually be 101 points in total. (These figures appear to be based on the unrounded standard deviation, which is why 6 times 27.49 does not come to exactly 164.96.)
Now, for each of those points, calculate the normal distribution using Excel’s NORMDIST function. The first parameter is the value we calculated, the second is the mean, the third is the standard deviation, and the last should be FALSE as we don’t want the cumulative distribution: NORMDIST(Q1,100.84,27.49,FALSE).
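The curve points can be sketched in Python as follows. The function `normal_pdf` is my own helper that computes the normal density, which is the quantity NORMDIST returns when its last argument is FALSE. The sketch uses the rounded mean and standard deviation quoted above, so the end points may differ in the last digit from the figures in the text.

```python
import math

mean, sd = 100.84, 27.49   # rounded sample statistics from the text
increments = 100           # 100 steps give 101 x-values

def normal_pdf(x, mu, sigma):
    """Normal density at x (non-cumulative)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

step = 6 * sd / increments                       # width of one increment
xs = [mean - 3 * sd + i * step for i in range(increments + 1)]
ys = [normal_pdf(x, mean, sd) for x in xs]

print(len(xs), round(xs[0], 2), round(xs[-1], 2))
```

The largest value in `ys` is only about 0.0145, which is why the unscaled curve is barely visible next to bin counts in the tens.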
If you plot the data you will notice a very short normal distribution curve, barely visible as a bell curve due to the difference in scale. For our sample of 200 points with a bin width of 20, each sample contributes a rectangle 20 wide and 1 high to the histogram, an area of 20. So the total area of our histogram is 200 times 20, which is 4000. The normal distribution has a total area of 1, so the normal curve must be scaled by 4000. This produces a nice bell-shaped normal curve over the histogram.
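As a small sketch of that scaling step, assuming the rounded mean and standard deviation from earlier and a helper name (`scaled_curve`) of my own choosing:

```python
import math

n, bin_width = 200, 20
mean, sd = 100.84, 27.49   # rounded sample statistics from the text

scale = n * bin_width      # total histogram area: 200 samples x width 20

def scaled_curve(x):
    """Normal density at x multiplied by the histogram area."""
    z = (x - mean) / sd
    density = math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
    return scale * density

print(scale, round(scaled_curve(mean), 1))  # 4000 58.0
```

A peak of roughly 58 is on the same scale as the bin frequencies, so the curve and the bars can share one vertical axis.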
You may notice that the histogram and bell curve are a little out of sync. This is due to the way the bin widths and frequencies are plotted. If you have a bin width of 20 and the bin value is 40, the corresponding frequency counts all values between 20 and 40. When you plot this value on a scatter chart, the centre of the bar is at 40 and the bar extends half the bin width (10) on either side, from 30 to 50. That’s why the histogram looks shifted to the right.
To fix this, create a temporary column of adjusted bins with half the bin width (10) subtracted from each bin value, and use this column when plotting the histogram. The bar for the bin value 40 is then centred at 30 and covers 20 to 40, as it should.
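The adjustment amounts to plotting each frequency at the centre of its bin rather than at its upper bound. A minimal sketch, with illustrative bin values:

```python
bin_width = 20
bins = list(range(20, 201, bin_width))        # upper bounds 20, 40, ..., 200 (illustrative)

# Shift each bin value left by half a bin width to get the bar centre.
centres = [b - bin_width / 2 for b in bins]

print(centres[:3])  # [10.0, 30.0, 50.0]
```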
Mean and Standard Deviation
Since it is a scatter chart, it is possible to add additional indicators including mean and standard deviation lines.
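For each indicator, simply add a series with two points that share the same x value (for example the mean): one at a height of 0 and one at the height of the bell curve at that x value, calculated with the same scaled NORMDIST formula as before.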
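The indicator lines can be sketched as two-point series. This example assumes lines at the mean and at one standard deviation either side of it, which is my choice for illustration; the same approach works for any other multiple of the standard deviation.

```python
import math

n, bin_width = 200, 20
mean, sd = 100.84, 27.49   # rounded sample statistics from the text
scale = n * bin_width      # same scaling as the bell curve

def scaled_pdf(x):
    """Height of the scaled bell curve at x."""
    z = (x - mean) / sd
    return scale * math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# One two-point series per marker: from (x, 0) up to the curve at x.
markers = {"mean - 1 SD": mean - sd, "mean": mean, "mean + 1 SD": mean + sd}
segments = {label: [(x, 0.0), (x, scaled_pdf(x))] for label, x in markers.items()}

for label, points in segments.items():
    print(label, points)
```

Each entry in `segments` corresponds to one extra scatter series in the chart, plotted as a straight line with no markers.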
Tidying up the colours results in the following final histogram with overlaid normal curve and mean and standard deviation indications.