Evaluating Ammunition Performance

Is load A better than load B?  When I ask that question, I usually want to know whether A is faster than B or, perhaps, more accurate than B.  I try to answer it by firing A and B in such a way that any difference in performance will be due solely to the differing characteristics of A and B and not to differences in shooting conditions.  I ensure this by firing from a bench rest on calm days.  How, then, do I use the results to choose the better load?  In the following discussion I will try to answer that question.  I will use some terms from the science of statistics, but I am not a trained statistician.  I hope to show a practical, nonmathematical approach to load analysis that you can use to make good choices.


Open a new box of factory ammo and you will see a lovely, shiny bunch of cartridges looking more alike than peas in a pod.  Fire them in your favorite rifle and you will actually find a range of velocities.  Also, if you shoot five groups, you will find five different group measurements.  In other words, there is “scatter” in the shooting results.  A statistical discussion may use this term or may use the more formal term, “dispersion.”

The factory took great pains to ensure uniformity of these cartridges.  Why, then, don’t they all fly into the same hole at the same velocity?  Because the firing of a cartridge involves too many variables to ensure perfect uniformity.  Think of all the variables involving primer, powder, bullet, and case, and realize that these varying components conspire to give maximum bullet boost within a few milliseconds following ignition, in a chamber and barrel whose characteristics change a bit with each shot.

We have to live with scatter, but it stands to reason that we would like to see it minimized, and for that we need a measure.  The most common measure for scatter in the shooting fraternity is the range of values, more often called “extreme spread.”  It is simply the difference between the highest and the lowest of the measured values.

A Savage Model 45 .30-06 Used to Fire for Velocity of Load A Below

For example, say I want to know the velocity of Factory Brand A using a 180-gr bullet in my .30-06.  I fire ten rounds over the chronograph and get the following values (fps).

2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507

I see that the extreme spread is 115 fps.  That seems large, but I also note that the two lowest values seem rather far removed from the group of the other eight, for which the range is only 62 fps.  These low values are called “outliers.”  Outliers are encountered frequently in a set of measured velocities.  I might be tempted to throw them out and calculate the average for the remaining cluster, but that would not be playing fair and cannot be allowed.  These are the measured values and I can only assume at this point that they reflect the true performance of the load.  The range is what it is and it does give me some preliminary idea of the uniformity of the load.
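
For readers who keep chronograph strings in a script rather than a notebook, here is a minimal Python sketch of the extreme-spread calculation for Load A.  The function name `extreme_spread` is my own label for illustration, not a chronograph feature.  The second calculation leaves out the two low shots only to show how much of the spread they account for; it is not a substitute for reporting the full range.

```python
# Load A, ten shots over the chronograph (fps), as listed above
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]


def extreme_spread(velocities):
    """Range of a string: highest velocity minus lowest velocity."""
    return max(velocities) - min(velocities)


full_spread = extreme_spread(load_a)

# For comparison only: the eight clustered shots (drop the two lowest)
cluster_spread = extreme_spread(sorted(load_a)[2:])

print(full_spread, cluster_spread)  # 115 62
```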

What Does Load A “Want” to Shoot?

Don’t bug me with scatter!  Just tell me what the load is likely to give me (most of the time?).  Is it fair to ask for this?  Well, yes, it is, because the values of seemingly random scatter have a tendency to cluster around a certain value in the range.  Statistics calls this the “central tendency,” and if we can identify it, we will have something to hang our hat on.  The most common measure of the central tendency for shooters is the “arithmetic mean,” and it is just fine to call it the “average velocity.”  To get it, sum the values and divide by the number of shots.  Of course, your chronograph will happily do the calculation and will give you this value after you have shot your string.  For my .30-06 data above, the mean value is 2576 fps.  I may now go around telling people that the velocity of Load A in my rifle is 2576 fps.
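
The arithmetic is easy to check by hand, but a short Python sketch makes the two steps explicit.  It is an illustration only; a given chronograph may round its displayed mean differently.

```python
# Load A, ten shots (fps)
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]


def mean_velocity(velocities):
    """Arithmetic mean: sum the values and divide by the number of shots."""
    return sum(velocities) / len(velocities)


avg_a = mean_velocity(load_a)
print(round(avg_a, 1))  # 2575.9, i.e. 2576 fps to the nearest whole fps
```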

Note that the mean value tells me nothing about the uniformity of the load, and I earlier noted that two of the values in the list seemed to be abnormally low.  In other words, the data are “skewed.”  Is the average value therefore the best expression of how my rifle shoots this ammo, since it seems to be unduly influenced by these low values?

A Better Kind of “Average”?

Let’s look at an alternative kind of average.  It is called the “median value.”  Put the velocities in order and the median is the value that has an equal number of values above and below it.  If the list has an even number of values, the median is the average of the two middle values.  It is 2584 fps for my .30-06 data.  Note that this value is higher than the mean value, and perhaps, therefore, gives a better indication of what the load is doing with its eight clustered values and two outliers.
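
Python’s standard `statistics` module has a `median` function that follows exactly this rule: it sorts the values and, when the count is even, averages the two middle ones.  A minimal sketch comparing it with the mean for Load A:

```python
import statistics

# Load A, ten shots (fps)
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]

median_a = statistics.median(load_a)  # average of the middle values 2579 and 2589
mean_a = statistics.mean(load_a)

print(median_a, round(mean_a))  # 2584.0 2576
```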

Another illustration of the worth of the median value is shown by a second example, involving eight rounds of a .30-30 handload, 35.2 gr LEVERevolution plus Hornady’s 160-gr FTX bullet, fired from a Remington Model 788:

2372, 2356, 2346, 2345, 2314, 2229, 2217, 2180 fps

Can you calculate the mean, median, and extreme spread in the manner shown above?  The spread of 192 fps is quite large and is a result of a couple of outliers.  The mean value of 2295 fps in this case is considerably lower than the median value of 2330 fps.  Here, again, the median is a better indication of what this load is doing, and it allows us to get better value from these eight shots.
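
One way to check your answers to that exercise is a small helper built on Python’s `statistics` module.  The name `summarize` is hypothetical; the helper simply works from the eight velocities exactly as listed.

```python
import statistics


def summarize(velocities):
    """Return (mean, median, extreme spread) for a string of velocities."""
    return (
        statistics.mean(velocities),
        statistics.median(velocities),
        max(velocities) - min(velocities),
    )


# .30-30 handload, eight shots from the Model 788 (fps)
handload_3030 = [2372, 2356, 2346, 2345, 2314, 2229, 2217, 2180]

mean_v, median_v, spread = summarize(handload_3030)
print(f"mean {mean_v:.1f}, median {median_v:.1f}, spread {spread}")
# mean 2294.9, median 2329.5, spread 192
```

The median comes out at 2329.5 fps, which rounds to the 2330 fps quoted above.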

As in the two examples shown above, one or more outliers will often be observed in a set of measured velocities.  It is my opinion, then, that the median is a better measure of a load’s tendency than the mean value.  However, the mean velocity is the value more often used in the shooting literature.  Sometimes the difference in the two measures is rather small, but sometimes it is not.

Back to Scatter

Now that we have identified expressions of what a load “wants to do,” we can return to the idea of uniformity.  One measure of the amount of scatter in a group of shots is called the Standard Deviation (SD).  A regular part of statistical treatment, the SD is found by squaring the difference between each shot and the mean value, adding up the squares, dividing by one less than the number of shots, and taking the square root of the result.  The Standard Deviation is the scatter measure most used for shooting results because most chronographs calculate SD for you after you fire a string of velocities.  Thus, you can get an SD value without knowing how it is derived or what it means.
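
Python’s `statistics.stdev` computes the sample standard deviation, which divides the summed squares by one less than the number of shots; `statistics.pstdev` computes the population form, which divides by the number of shots.  Which form a particular chronograph uses is something to check in its manual; this sketch assumes nothing about any specific model.  The sample form reproduces the 39.9 fps figure given for Load A.

```python
import math
import statistics

# Load A, ten shots (fps)
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]

# Step by step: square each shot's difference from the mean,
# add the squares, divide by (n - 1), and take the square root.
mean_a = statistics.mean(load_a)
sum_sq = sum((v - mean_a) ** 2 for v in load_a)
sd_by_hand = math.sqrt(sum_sq / (len(load_a) - 1))

sd_sample = statistics.stdev(load_a)       # divides by n - 1
sd_population = statistics.pstdev(load_a)  # divides by n

print(round(sd_by_hand, 1), round(sd_sample, 1), round(sd_population, 1))
# 39.9 39.9 37.8
```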

The standard deviation of Factory Load A above is 39.9 fps.  A drawback of the SD is that it is not very intuitive; it does not give an easy mental picture of the scatter.  On the other hand, a strong point of SD is that it can be used for additional calculations to determine whether the means of two series of values are really different, or different simply due to chance fluctuations in the data.  This is especially useful when comparing the velocities of two different loads.  An example will be given later.

Another Kind of Scatter Measure?

An alternative, more concrete measure for scatter is the “Average Deviation.”  To get it, determine the difference of each velocity from the mean velocity.  For the Load A above (mean velocity 2576 fps) the list is:

Velocities: 2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507 fps

Deviations from the mean: +46, +43, +25, +17, +13, +3, +2, -16, -65, -69 fps

Considering all differences to be positive values, calculate the average, and that is the AD.  The smaller the AD, the more uniform the load. The average deviation may be calculated for the mean or for the median.  For this data set, the values show 30 fps average deviation from the mean.  (Similarly, 29 fps is the average deviation from the median).  Note that you will have to make this calculation as your chronograph does not give it to you.  Having made it, I now know that a shot of Load A will fall, on average, 30 fps from the mean value.
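
Since the chronograph will not do it, here is a minimal Python sketch of the average deviation, computed about either the mean or the median.  The function name `average_deviation` is hypothetical.

```python
import statistics

# Load A, ten shots (fps)
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]


def average_deviation(velocities, center):
    """Average of the absolute differences between each shot and `center`."""
    return sum(abs(v - center) for v in velocities) / len(velocities)


ad_mean = average_deviation(load_a, statistics.mean(load_a))
ad_median = average_deviation(load_a, statistics.median(load_a))

print(round(ad_mean), round(ad_median))  # 30 29
```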


So far I have only discussed the velocity performance of Load A.  Generally, however, I want to compare Load A with Load B, so I will also fire ten rounds of B over the chronograph.  Commonly, I might be comparing similar factory loads from different companies, or similar handloads with different bullets.  The most important question is: “Is there a difference in the velocities of Load A and Load B?”  If I determine that the answer is yes, I will also know which one has the higher velocity.  The alternative answer is that there is no difference in the velocities of Load A and Load B.  So we calculate the mean or median for Load B and, upon comparing our answer with that for Load A, we will know for sure, right?  Maybe, but it is also possible that we will not know for sure.

Here are example velocities for ten rounds of Factory Load B (fps).

2567, 2553, 2541, 2519, 2510, 2507, 2506, 2505, 2502, 2491

The extreme spread is 76 fps, which seems good, and the mean velocity is 2520 fps, which is 56 fps less than that of Load A.  (The median velocity is 2508 fps.)  Load A is identified, therefore, as the faster load.  Well, yes, but hold on, because there is “overlap.”  That is, some of the slower rounds of Load A, the load with the higher mean, are slower than some of the faster rounds of Load B, the load with the lower mean.

Did you follow that?  Some of the slower load rounds beat some of the faster load rounds, and this creates a gray area that threatens our conclusion about the faster load.

How much overlap is there?  Note that it extends from the value of 2567 of the slower load to the value of 2507 for the faster load.  We thus see that nine velocity values of the two sets are in the “overlap area.”  Judged by velocity alone, these nine of the total 20 shots cannot be unequivocally assigned to either Load A or Load B.  Therefore, in this area, which holds 45% of all shots, Load A is not faster than Load B; the two loads are not distinguishable in velocity in this area.
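
The overlap count can be sketched in a few lines of Python.  This assumes, as in the text, that Load A has the higher mean, so the overlap zone runs from A’s slowest shot up to B’s fastest shot.

```python
# Ten shots each (fps)
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]
load_b = [2567, 2553, 2541, 2519, 2510, 2507, 2506, 2505, 2502, 2491]

# Overlap zone: slowest shot of the faster load to fastest shot of the slower load
low, high = min(load_a), max(load_b)

all_shots = load_a + load_b
in_overlap = [v for v in all_shots if low <= v <= high]
share = len(in_overlap) / len(all_shots)

print(len(in_overlap), f"{share:.0%}")  # 9 45%
```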

Overlap of two sets of data is quite common and is a result of scatter in the sets.  You can see that if the overlap area included, say, 16 of the 20 shots, you would say “No difference here!”  Alternatively, if the overlap was two shots or less, you would say “Yup, those loads are certainly different.”  But how about this case, where the overlap is moderate?

A Median/Average Deviation Treatment

Here is a way to get a reliable conclusion without using standard deviations or further statistical calculations.

Note the median velocity of Load A is 2584 fps with an average deviation of 29 fps.

Note the median velocity of Load B is 2508 fps with an average deviation of 18 fps.

Get hypothetical ranges for A and B by taking the median velocity plus or minus the AD in each case.

The results are:

On average, a shot of Load A will fall in the range 2613 to 2555 fps.

On average, a shot of Load B will fall in the range 2526 to 2490 fps.

Note that these two ranges do not overlap.  Therefore, I will conclude that the median values of A and B are reliably different.  Load A is faster than Load B by about 56 fps in mean velocity (76 fps in median velocity).
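
Here is a minimal Python sketch of the median-plus-or-minus-AD comparison.  The function name `median_ad_band` is hypothetical.  It keeps the unrounded median of Load B (2508.5 fps) and the unrounded average deviations, so an endpoint can differ by a foot per second from the hand-rounded figures.

```python
import statistics

# Ten shots each (fps)
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]
load_b = [2567, 2553, 2541, 2519, 2510, 2507, 2506, 2505, 2502, 2491]


def median_ad_band(velocities):
    """Return (low, high): the median minus and plus the average deviation from the median."""
    med = statistics.median(velocities)
    ad = sum(abs(v - med) for v in velocities) / len(velocities)
    return med - ad, med + ad


a_low, a_high = median_ad_band(load_a)
b_low, b_high = median_ad_band(load_b)

# The two bands are separate if one ends before the other begins
separate = a_low > b_high or b_low > a_high

print(f"A {a_low:.0f}-{a_high:.0f}, B {b_low:.0f}-{b_high:.0f}, separate: {separate}")
# A 2555-2613, B 2491-2526, separate: True
```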

To verify this conclusion, and without showing the boring details, I will report that I did a standard statistical analysis using the standard deviations of the two velocity sets, and the results showed that the difference in the mean values of the two sets is very significant.  If the two loads actually had the same mean velocity, a difference as large as the one measured would be very unlikely to arise by chance.  Therefore, I can say that hard statistics support my earlier conclusion.  Load A is faster than Load B.
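
The text does not name the test used.  One standard choice when two strings may have different scatter is Welch’s t-test, sketched below with only the Python standard library; the function name `welch_t` is hypothetical.  The standard library has no t-distribution, so the result is compared with a tabulated critical value.  If SciPy is installed, `scipy.stats.ttest_ind(load_a, load_b, equal_var=False)` performs the same test and reports a p-value directly.

```python
import math
import statistics

# Ten shots each (fps)
load_a = [2622, 2619, 2601, 2593, 2589, 2579, 2578, 2560, 2511, 2507]
load_b = [2567, 2553, 2541, 2519, 2510, 2507, 2506, 2505, 2502, 2491]


def welch_t(a, b):
    """Welch's t statistic and its approximate (Welch-Satterthwaite) degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of the mean of a
    vb = statistics.variance(b) / len(b)  # squared standard error of the mean of b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t(load_a, load_b)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 3.75, df = 15.1

# A t table gives about 2.947 as the two-tailed 1% critical value for 15 df,
# so a t this large corresponds to p below 0.01.
```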

Generalizing this method, let me say that if the ranges of two loads, based on median velocities and average deviations as above, do not overlap, then you are pretty safe in concluding that the mean and median velocities are really different, and you can pick the winner with some confidence.

Statistical Treatment of Accuracy

The usual investigation of accuracy involves shooting groups of three, four, or five shots and using the center-to-center maximum spread of the groups as the measure for comparison.  This seems appropriate because it is similar to the way rifle shooting activities, especially target activities, are conducted.  Let me say here that if you are a hunter who is satisfied when your chosen ammo puts three shots into a couple of inches from a cold barrel at 100 yards, then you need read no further.  You know all you need to know.

To arrive at reliable conclusions when comparing two similar loads we need to fire more than one group with each.  The number of groups to be fired might be argued on the basis of time and ammo cost.  I will suggest, for a compromise minimum, fire at least four, three-shot groups with each load.  Five, 4-shot groups with each would be better and would consume just one box of ammo for each.

Evaluation of accuracy is a different animal because the condition of the rifle and the skill of the shooter are introduced as important factors.  This is no small deal, because accurate evaluation of ammo accuracy requires that no scatter be introduced by gun or shooter.

This condition is difficult to ensure.  The rifle must have no issues with action bedding, sight attachment and condition, or trigger function.  The rifle rest must allow absolute uniformity of position from shot to shot.  The shooter must release every shot in exactly the same manner, and must prevent variations due to wind conditions or barrel heating.  It is a challenge to fire 20 consecutive shots with identical technique for each shot.  That is where the shooter’s ability and experience come in, and that is why we shoot a lot.

That said, I would really like to use examples to develop a common sense, intuitive approach to the evaluation of accuracy.  Why?  Because ammo is very expensive.  Few of us can afford to shoot as many groups as are often required for a statistically ironclad conclusion. Therefore we must be able to use good judgment in assessing a limited number of groups.

Two .30-30 Loads: Four Groups Each

The first example above compares two factory loads using bonded bullets for the .30-30.  Four, three-shot groups were fired for each using a Remington Model 788, and the measurements can be easily read in the picture.  Both loads did very well.  The groups are small and show little tendency to produce flyers.  The Federal Fusion appears to be more accurate, but there are cautionary features.  The difference in the mean values is only 0.06”, but the ranges are much larger, 0.40” for the Winchester and 0.33” for the Federal.  Common sense says we have a problem when the range of each set of values is roughly six times the difference in means.  Overlap is excessive, and the normal variation in either of the loads swamps any difference in the means.  Formal analysis verifies that the difference is not significant; in this case we cannot say that one load is more accurate than the other.

A second example provides another comparison of factory loads for the .30-30.  The first load is Hornady’s LE 140-gr Monoflex load.  It throws a boattail, spire-point bullet made entirely of gilding metal.  The second is PRVI Partizan Uzice 170-gr flat nose, a conventional type of .30-30 load.  In this case, six, three-shot groups were fired for each with the Remington 788, with results as follows.

Hornady LE Monoflex:  0.59, 0.55, 0.51, 0.60, 0.56, 0.64”  Mean = 0.575”

Prvi Partizan Uzice:  0.27, 0.31, 0.27, 0.44, 0.33, 0.64”  Mean = 0.377”

Groups Using Hornady LEVERevolution Monoflex 140-gr.

Groups Using PRVI 170-gr. .30-30

Examination shows that the performance of both loads is very good.  The Hornady rounds produce very uniform groups.  The Prvi rounds average a bit better, but the set has an outlier.  Some observations are that we have more groups to compare here than we had with the first example and the ranges are not as large in comparison to the difference in the means.  The common sense approach moves me to say that the Prvi is more accurate in my Model 788, and that most folks would probably not argue.

A formal statistical analysis supported this conclusion.  The difference in the means, 0.198”, was described as “very significant” according to the calculation method I used.  If the two loads were equally accurate, a difference this large would arise by chance less than 1% of the time.  Summarizing the common sense features here, we are comparing a fair number of groups, and the difference in their means is relatively large compared to the group means themselves.  Also, the range of the group with the larger mean is less than the difference in the means.
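
The text does not say which calculation produced the “very significant” verdict.  One common choice, Student’s two-sample t-test with pooled variance, is sketched below with the Python standard library; the function name `pooled_t` is hypothetical.  On these twelve group sizes it gives t of about 3.23 on 10 degrees of freedom, beyond the two-tailed 1% critical value of about 3.169, which is consistent with the figure reported.  Welch’s unequal-variance version gives only about 6 degrees of freedom here and a p-value between 1% and 2%, so with this few groups the choice of test matters.

```python
import math
import statistics

# Six three-shot groups each, center-to-center (inches)
hornady = [0.59, 0.55, 0.51, 0.60, 0.56, 0.64]
prvi = [0.27, 0.31, 0.27, 0.44, 0.33, 0.64]


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


diff = statistics.mean(hornady) - statistics.mean(prvi)
t, df = pooled_t(hornady, prvi)
print(f"difference {diff:.3f} in, t = {t:.2f}, df = {df}")
# difference 0.198 in, t = 3.23, df = 10
```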

A third example compares a factory load with a handload for the .300 Savage.  The factory load is the Winchester Super-X with 150-grain bullet.  The handload used IMR 3031 behind a Hornady 150-gr, spire point.  The four, three-shot groups for each, fired with a Remington Model 722, are shown in the pictures.

Winchester Factory 150-gr .300 Savage

Handload Using IMR 3031 and 150-gr Bullets

The mean values are 0.82” for the Winchester factory load and 0.62” for the handload, and that seems to be a significant difference.  The factory load is very uniform, with a range of only 0.09”, while the handload reaches 0.30” in that regard.  That raises a flag because it is larger than the difference in the means.  Still, I would be tempted to report that my handload is the more accurate ammo.

However, the formal statistical analysis shows that the difference in the means is “not quite statistically significant.”  The range of group sizes for the handload is telling me that I need to shoot more groups if I want to be very sure of my conclusion.

Again we see that one must compare the difference in mean values for two sets of groups with the range of values observed in each set.  As a rule of thumb, a conclusion is most reliable when the ranges are smaller than the difference in the means.
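
The common-sense check can be put into a few lines of Python using only the summary numbers quoted for the three examples.  It is a screening aid, not a significance test, and the function name `range_ratios` is hypothetical.  A ratio below 1 means that set’s range is smaller than the difference in the means.

```python
def range_ratios(mean_diff, ranges):
    """Ratio of each set's group-size range to the difference in mean group size."""
    return [r / abs(mean_diff) for r in ranges]


# (difference in means, [range of each set]), all in inches, from the text
examples = {
    ".30-30 bonded, Winchester vs Federal": (0.06, [0.40, 0.33]),
    ".30-30 Hornady LE vs Prvi": (0.575 - 0.377, [0.13, 0.37]),
    ".300 Savage factory vs handload": (0.82 - 0.62, [0.09, 0.30]),
}

for name, (diff, ranges) in examples.items():
    ratios = range_ratios(diff, ranges)
    print(f"{name}: " + ", ".join(f"{x:.2f}" for x in ratios))
# .30-30 bonded, Winchester vs Federal: 6.67, 5.50
# .30-30 Hornady LE vs Prvi: 0.66, 1.87
# .300 Savage factory vs handload: 0.45, 1.50
```

In the Hornady-versus-Prvi case only the Hornady range falls below the difference, matching the observation that the range of the set with the larger mean is less than the difference in the means.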