Most people would probably guess that the average umpire is really poor at calling balls and strikes, but this turns out to be untrue. Ian York has shown us the importance of catcher framing, how the strike zone has changed since the introduction of PITCHf/x, and the repertoires of the Red Sox starting rotation. Now he shows us that umpires are actually very good at calling balls and strikes, and examines how an inconsistent zone affects the game.
How many bad strike calls do umpires make? Umpires are very, very good at calling balls and strikes, but they are not perfect. So just how good are they?
Answering this using PITCHf/x data seems as if it should be relatively straightforward. PITCHf/x shows the final location of a pitch as it enters the area around the plate, and it shows the call that the umpire made. So the simple approach would be to ask how many of the called strikes were outside the strike zone and how many called balls were inside the strike zone.
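As a minimal sketch of that simple approach, assume each called pitch has been reduced to a horizontal location and a height at the plate, plus the top and bottom of the batter's zone. The field names `px`, `pz`, `sz_bot` and `sz_top` and the sample pitches are illustrative assumptions, not data from the article. The rectangle is a simplified stand-in for the rulebook zone: it is as wide as the 17-inch plate and ignores the radius of the ball.

```python
from dataclasses import dataclass

PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide


@dataclass
class CalledPitch:
    px: float      # horizontal location at the plate, feet from center (assumed field)
    pz: float      # height at the plate, feet (assumed field)
    sz_bot: float  # bottom of this batter's zone, feet (assumed field)
    sz_top: float  # top of this batter's zone, feet (assumed field)
    call: str      # "strike" or "ball"


def in_rectangle(p):
    """True if the pitch center is inside the simplified rectangular zone."""
    return abs(p.px) <= PLATE_HALF_WIDTH_FT and p.sz_bot <= p.pz <= p.sz_top


def naive_miss_rate(pitches):
    """Share of calls that disagree with the rectangle."""
    misses = sum((p.call == "strike") != in_rectangle(p) for p in pitches)
    return misses / len(pitches)


# Made-up pitches, only to exercise the function.
sample = [
    CalledPitch(0.0, 2.5, 1.6, 3.4, "strike"),  # inside, called strike: agrees
    CalledPitch(1.0, 2.5, 1.6, 3.4, "strike"),  # outside, called strike: miss
    CalledPitch(0.2, 2.0, 1.6, 3.4, "ball"),    # inside, called ball: miss
    CalledPitch(-1.5, 4.0, 1.6, 3.4, "ball"),   # outside, called ball: agrees
]
print(naive_miss_rate(sample))  # 0.5
```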
There is one major problem with this approach: In reality, there is no such thing as the strike zone. The strike zone has different shapes and sizes for left-handed and right-handed batters, and the zones have increased in size, and changed shape, since 2010.
None of these strike zones match the zone that is defined in the rulebook, but they are the reality that hitters and pitchers face. If we were to use the official strike zone as a guide, we would conclude that all umpires call about 15% of balls and strikes incorrectly – which would be approximately 20 bad calls per game. But that is not completely helpful, because it is not based on the real-life strike zone. It also does not match our expectations: Only the most persnickety fans would say the average umpire blows more than two ball/strike calls every inning of every game. The umpires say they get more than 95% correct, which seems to more closely match what we see at the games – fewer than one blown call per inning.
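The per-inning figures follow from the two numbers quoted above. This back-of-the-envelope check assumes a nine-inning game and infers the number of called pitches per game from the 15% and 20-call figures. That inferred count is an assumption of this sketch, not a number stated in the article.

```python
rulebook_miss_rate = 0.15   # from the text
bad_calls_per_game = 20     # from the text
innings = 9                 # assumption: a regulation nine-inning game

# Implied number of called pitches (balls plus called strikes) per game.
called_per_game = bad_calls_per_game / rulebook_miss_rate

# Misses per inning if the rulebook zone were the standard.
rulebook_per_inning = bad_calls_per_game / innings

# Misses per inning if umpires are exactly 95% correct, as they claim at minimum.
claimed_per_inning = (1 - 0.95) * called_per_game / innings

print(f"{called_per_game:.1f} called pitches per game")      # 133.3
print(f"{rulebook_per_inning:.1f} misses per inning at 15%")  # 2.2
print(f"{claimed_per_inning:.1f} misses per inning at 5%")    # 0.7
```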
It is more helpful to ask – given that there are real-life strike zones – if a particular umpire makes his calls based on those strike zones. But that is still missing another dimension of umpire ball/strike calls. Different ball/strike counts get different-sized strike zones – the zone as actually called by the umpires is much smaller for an 0-2 count than for a 3-0 count.
A legitimate argument can be made on whether it is acceptable for the zones to change based on count. But even if you do dislike the variance in zone size, it is not fair to double count it against the umpire. That is, it is fine to complain about the larger zone on a 3-0 count. However, if you then point to each of the strike calls that fall only inside the 3-0 zone and say they were missed, you are blaming the umpire twice for the same pitch.
Therefore, it is a question of uniformity vs. consistency. Umpires call non-uniform strike zones all the time. In a 3-0 count, the zone they call will not match the baseball-wide zone for an 0-0 or an 0-2 count. But if, every time a count comes up, they call the same zone as they did the last time it came up, they are being consistent with their calls. No umpire is completely uniform, but an umpire who calls a consistent zone is one whom the players will perceive as being fair.
To look at the number of missed calls using PITCHf/x, it is necessary to look at the context of each call. What is the normal strike zone for all umpires in that particular year for left- or right-handed batters? What is the count? Given that context, we can then ask if a call matches what the players have learned to expect.
Fortunately for us, PITCHf/x has all of this information. To ask about ball/strike calls in context, the balls and called strikes were tabulated for each umpire, in each year since 2008. Only the pitches thrown in a neutral count (0-0, 1-0, 1-1, 2-1, and 3-1) were considered, to avoid the effect of the changing strike zone size for the pitchers’ and hitters’ counts. (Around 62-63% of called pitches are thrown in neutral counts. The trends shown here are equally obvious when all counts are used, but the actual numbers should be more reliable when limiting to neutral counts.) We know what the baseball-wide strike zone for each year is for both left- and right-handed batters; the edges of the zone can be defined as the line along which a pitch has a 50/50 chance of being called a ball or a strike.
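One way to picture the 50/50 definition is to keep only neutral-count pitches, bin them on a grid, and treat a cell as part of the zone when at least half of the calls in it were strikes. The sketch below does that. The grid size, the `min_pitches` cutoff, the dictionary keys and the sample pitches are all assumptions made for illustration. The article does not describe how its zone polygons were actually fitted, so this is a stand-in technique rather than the author's method.

```python
from collections import defaultdict

NEUTRAL_COUNTS = {(0, 0), (1, 0), (1, 1), (2, 1), (3, 1)}  # as listed in the text
CELL_FT = 0.25  # grid cell size in feet; an arbitrary choice for this sketch


def neutral_only(pitches):
    """Keep called pitches thrown in one of the neutral counts."""
    return [p for p in pitches if (p["balls"], p["strikes"]) in NEUTRAL_COUNTS]


def empirical_zone(pitches, min_pitches=2):
    """Grid cells where at least half of the calls were strikes."""
    tallies = defaultdict(lambda: [0, 0])  # cell -> [called strikes, all calls]
    for p in pitches:
        cell = (int(p["px"] // CELL_FT), int(p["pz"] // CELL_FT))
        tallies[cell][0] += p["call"] == "strike"
        tallies[cell][1] += 1
    return {
        cell
        for cell, (strikes, total) in tallies.items()
        if total >= min_pitches and strikes / total >= 0.5
    }


# Made-up pitches, only to exercise the functions.
sample = [
    {"balls": 0, "strikes": 0, "px": 0.10, "pz": 2.50, "call": "strike"},
    {"balls": 1, "strikes": 1, "px": 0.20, "pz": 2.60, "call": "strike"},
    {"balls": 0, "strikes": 2, "px": 0.10, "pz": 2.50, "call": "ball"},  # 0-2: dropped
    {"balls": 1, "strikes": 0, "px": 1.10, "pz": 2.50, "call": "ball"},
    {"balls": 2, "strikes": 1, "px": 1.20, "pz": 2.60, "call": "strike"},
    {"balls": 3, "strikes": 1, "px": 1.15, "pz": 2.55, "call": "ball"},
]
neutral = neutral_only(sample)
print(len(neutral), empirical_zone(neutral))  # 5 {(0, 10)}
```

In practice the grid would be built separately for each year and for left- and right-handed batters, since the text treats those as distinct zones.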
For the pool of neutral-count called pitches for each umpire, it was determined whether a called strike was outside the strike zone for that particular year and batter handedness, and whether a ball was inside that zone. (There was 1 1/2 inches of leeway – half a baseball diameter – on either side of the strike zone margins. If the edge of the ball overlaps the 50/50 line, it could legitimately be called either way without the umpire being inconsistent, and PITCHf/x is probably only accurate to within an inch or so either way.) To generate a single number, the total number of out-of-zone (“OOZ”) pitches – that is, pitches that should have been balls but were called strikes, or vice versa – was divided by the number of called pitches for that umpire’s season.
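The classification described above can be sketched as follows, assuming the zone for a given year and batter handedness is available as a polygon of (x, z) vertices in feet. The rectangular `zone` and the sample pitches are placeholders for illustration, and the geometry helpers are generic point-in-polygon and point-to-segment routines, not code from the original analysis.

```python
import math

LEEWAY_FT = 1.5 / 12  # 1 1/2 inches of leeway, as in the text


def edges(poly):
    """Consecutive vertex pairs, closing the polygon."""
    return zip(poly, poly[1:] + poly[:1])


def inside(poly, x, z):
    """Ray-casting point-in-polygon test."""
    hit = False
    for (x1, z1), (x2, z2) in edges(poly):
        if (z1 > z) != (z2 > z) and x < x1 + (z - z1) * (x2 - x1) / (z2 - z1):
            hit = not hit
    return hit


def edge_distance(poly, x, z):
    """Shortest distance from (x, z) to the polygon boundary."""
    best = math.inf
    for (x1, z1), (x2, z2) in edges(poly):
        dx, dz = x2 - x1, z2 - z1
        t = ((x - x1) * dx + (z - z1) * dz) / (dx * dx + dz * dz)
        t = max(0.0, min(1.0, t))  # clamp to the segment
        best = min(best, math.hypot(x - (x1 + t * dx), z - (z1 + t * dz)))
    return best


def is_ooz(poly, x, z, call):
    """True if the call disagrees with the zone by more than the leeway."""
    if edge_distance(poly, x, z) <= LEEWAY_FT:
        return False  # close enough to the edge that either call is acceptable
    return (call == "strike") != inside(poly, x, z)


def ooz_rate(poly, pitches):
    """OOZ calls divided by all called pitches."""
    return sum(is_ooz(poly, x, z, call) for x, z, call in pitches) / len(pitches)


# Placeholder zone and made-up pitches, only to exercise the functions.
zone = [(-0.9, 1.6), (0.9, 1.6), (0.9, 3.4), (-0.9, 3.4)]
sample = [
    (0.0, 2.5, "strike"),  # well inside, called strike: fine
    (1.0, 2.5, "strike"),  # 0.1 ft outside: within the leeway, not counted
    (1.3, 2.5, "strike"),  # 0.4 ft outside, called strike: OOZ
    (0.0, 2.0, "ball"),    # well inside, called ball: OOZ
    (0.0, 4.5, "ball"),    # well outside, called ball: fine
]
print(ooz_rate(zone, sample))  # 0.4
```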
Here is what that looks like for an umpire who is in the middle of the pack. Phil Cuzzi ranked 27th of 50 umpires in 2008 and 26th in 2014. These are all the pitches he called incorrectly in those years, according to the above criteria:
The strike zone for the particular year is the dashed grey polygon; we are looking at the strike zones from the umpire’s viewpoint. Called strikes that were outside the strike zone are shown in red; called balls that were inside the zone are shown in blue. The overall distributions of all the called pitches for Cuzzi are shown as a contour map in the background.
Overall, it is clear that even though Cuzzi is just middle of the pack in terms of accuracy, he still did an excellent job calling balls and strikes; he called just 4.2% and 3.4% incorrectly in 2008 and 2014 respectively, comfortably over the 95%-correct mark that the umpires claim. What is more, the vast majority of the missed calls were right on the edges of the zone. Over the course of a season, umpires will incorrectly call some singularly amazing balls and strikes, but almost all the calls that they miss are very close.
Remember that this is only showing calls made in neutral counts. If we also include 0-2 counts, for example, there are fewer out-of-zone strike calls, because all umpires shrink the strike zone slightly in those circumstances. Notice how much smaller the 2008 zone is than the 2014 zone. If we judge Cuzzi in 2014 based on the 2008 zone, or if we judge his 0-2 calls based on his 3-0 strike zone, he would be wildly inaccurate, but the same would be equally true of every umpire in baseball.
Assuming that this approach is actually measuring something relevant, what can we learn about umpire consistency over the years? Not only are umpires able to review their ball/strike calls after the fact, they receive evaluations based on the data from PITCHf/x. One unexpected side effect of this evaluation has probably been the gradual expansion of the strike zone since 2010. Has it actually had the desired effect? Here are the annual out-of-zone percentages (in neutral counts) for all the umpires who called at least 2,500 pitches in a season:
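A yearly summary of this kind could be tabulated roughly as follows. The record layout, the umpire labels and every number in `seasons` are invented for illustration. Only the 2,500-pitch threshold comes from the text. Averaging the per-umpire percentages is one plausible way to summarize a year, and the article does not say exactly how its averages were computed.

```python
from collections import defaultdict
from statistics import mean

MIN_CALLED = 2500  # threshold from the text


def annual_ooz_percent(seasons):
    """Mean OOZ percentage per year across umpires who qualify."""
    by_year = defaultdict(list)
    for umpire, year, ooz, called in seasons:
        if called >= MIN_CALLED:
            by_year[year].append(100 * ooz / called)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Invented umpire-seasons: (label, year, OOZ calls, called pitches).
seasons = [
    ("Umpire A", 2010, 120, 3000),  # 4.0%
    ("Umpire B", 2010, 150, 3000),  # 5.0%
    ("Umpire C", 2010, 10, 100),    # too few pitches, excluded
    ("Umpire A", 2014, 90, 3000),   # 3.0%
    ("Umpire B", 2014, 100, 2500),  # 4.0%
]
print(annual_ooz_percent(seasons))  # {2010: 4.5, 2014: 3.5}
```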
Umpire accuracy has improved significantly since 2008. Umpires in 2014 made about 23% fewer out-of-zone calls than in 2010, with the average dropping from 4.3% to 3.3%.
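The size of the improvement can be checked directly from the two averages quoted above. The helper name `relative_drop` is just a label for this sketch.

```python
def relative_drop(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


drop = relative_drop(4.3, 3.3)  # average OOZ percentages for 2010 and 2014
print(f"{drop:.1%}")  # 23.3%
```

A fall of one percentage point from a 4.3% base is therefore a relative reduction of a little under a quarter.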
PITCHf/x became available in all ballparks in 2008, yet the improvement in umpire strike calls didn’t start until 2011, followed by even bigger improvements from 2011 to 2013. Why didn’t umpires begin to improve their accuracy starting in 2008? Did something happen between 2010 and 2011?
In January of 2010, the umpires’ union ratified a new contract with MLB that, for the first time, allowed management to use video to evaluate umpires. So from 2010 on, umpires have been evaluated using the “Zone Evaluation” system (essentially, PITCHf/x-based video evaluation). These evaluations are private, so we do not have access to their conclusions, but we can see that soon after evaluation with PITCHf/x started, umpires got better at calling balls and strikes. Even though umpires were already very good at this part of their job in 2008, they have become significantly better at it since the implementation.
Robot umpires might be slightly more accurate than human umpires, but not by much. The average human umpire already achieves about 97% accuracy when judged against the zone that is actually called. What would certainly be different is that the strike zones would immediately change shape and size, altering the game dramatically.