Does a .300 hitter actually hit .300? A small sample size can make a seemingly incredible stat less than impressive, but when does a sample stop being a small sample? Mike Richmond looks at some real-world examples to test the theory he came up with in his previous article.
If a baseball player hits .300, does that mean that he’s a .300 hitter? In a previous article, I put this seemingly tautological question under the microscope, and decided that the answer can be “no.” If one makes a very simple model of a .300 hitter, one that treats him as a die with a fixed probability of .300 of getting a hit in each at-bat, then it turns out that his actual batting average after any reasonable number of attempts will very likely NOT be exactly .300. Instead, it will usually fall within some small range of .300; the larger the sample, the smaller the range.
More precisely, our simple model predicts that if a batter has some constant intrinsic batting average P, and if his chance of getting a hit in each at-bat is an independent random event with probability P, then over a season, his actual batting average ought to fall within some particular range of P. For a sample of 648 ABs, our simulations suggest that 95 percent of the batters should fall within +/- .036 of P. For a more typical 600 ABs, the range should be a bit larger; perhaps +/- .040, as a round number (these values updated July 8; thanks to AN for the tip).
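For readers who want to reproduce numbers of this kind, here is a minimal sketch of the die-rolling model. It is not the simulator used for this article: the function names, the seed, and the choice of 1,000 simulated batters per run are assumptions made for illustration, and the normal approximation to the binomial is included only as a cross-check.

```python
import math
import random


def normal_half_width(p, at_bats, z=1.96):
    """Half-width of the ~95% range from the normal approximation to the binomial."""
    return z * math.sqrt(p * (1.0 - p) / at_bats)


def simulated_half_width(p, at_bats, n_batters=1000, seed=0):
    """Simulate n_batters seasons of independent at-bats and return half the
    width of the central 95% of the resulting batting averages."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    averages = sorted(
        sum(rng.random() < p for _ in range(at_bats)) / at_bats
        for _ in range(n_batters)
    )
    low = averages[round(0.025 * n_batters)]
    high = averages[round(0.975 * n_batters) - 1]
    return (high - low) / 2.0


for at_bats in (600, 648):
    print(
        at_bats,
        round(normal_half_width(0.300, at_bats), 3),
        round(simulated_half_width(0.300, at_bats), 3),
    )
```

The normal approximation gives a half-width of roughly .035 for 648 ABs and .037 for 600 ABs, in line with the ranges quoted above. The simulated values wobble by a point or two from run to run because they rest on only 1,000 batters.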
This article tests that simple model: Do its predictions match the performance of real players? Our first examples will be three batters who have hit well for a long time; after that, I’ll examine a couple of players whose poor performance in small sample sizes has raised questions about their futures.
Three good hitters with long track records
Let’s pick three players to check this prediction, players with careers long enough to provide a decent statistical sample. To make it interesting, we’ll choose good hitters: The two active leaders in career batting average in the American League, plus the active leader among the 2015 Red Sox. Can you name these players?
(I’ll wait for you to guess.)
In each case, I’ll discard the first three years from the batter’s record, to account for an extended learning period. I’ll also run simulations for 600 ABs (4 per game for 150 games), rather than 648 (4 per game for 162 games), since that matches more closely the typical experience of a healthy MLB batter.
In the case of Miguel Cabrera, his “mature” career phase began in 2006 at 23 years old with the Florida Marlins. After two good years in Florida, Cabrera moved to the Detroit Tigers, where he has flourished ever since. A set of 1,000 idealized .329 hitters would have a range of performance shown by the red bars in the histogram below; 95 percent of them would fall between .292 and .368, as indicated by the blue band. The real Cabrera’s statistics, shown by the black circles and magenta triangle, lie squarely within the range of the idealized model:
Our next batter, Joe Mauer, has spent his entire career with the Minnesota Twins. His first “mature” year was 2007, at age 24, so he started one year older than Cabrera. A group of idealized batters with his .311 batting average falls mostly into the range .273 to .349. All but one of Mauer’s actual year-end values do fall within this range… but that one outlier is a doozy. In 2009, Mauer hit .365 with a full slash line of .365/.444/.587 for a league-leading 171 OPS+, all while winning the AL MVP:
Our final example of a long-time player is Boston’s Dustin Pedroia. He was a quick learner, as his best batting average (.326) occurred in his third year (2008, at 24 years old). That won’t appear in our analysis, since we discard the first three years of each player’s career, leaving Pedroia’s “mature” batting average at .295. As the graph below shows, Pedroia has been very consistent, staying well within the bounds of the expected performance:
The simple model of batters behaving like dice does provide a decent match to the actual performance of real players. For these three players, we find that 25 out of 26 seasons fall within the 95-percent regions. Yes, that means that one season falls outside the expected range; but that’s exactly what one would expect: about 1 of 20 SHOULD be outliers.
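The “1 of 20” reasoning can be checked with a few lines of arithmetic. This is a sketch under two assumptions that go slightly beyond the text: that each of the 26 seasons is independent, and that each has exactly a 5 percent chance of landing outside its band. The variable names are mine.

```python
import math

n_seasons = 26     # seasons examined across the three players (from the article)
p_outside = 0.05   # assumed chance that one season lands outside a 95-percent band

# Expected number of outlier seasons under the model.
expected_outliers = n_seasons * p_outside

# Chance of seeing at least one outlier, and exactly one outlier.
p_at_least_one = 1.0 - (1.0 - p_outside) ** n_seasons
p_exactly_one = (
    math.comb(n_seasons, 1) * p_outside * (1.0 - p_outside) ** (n_seasons - 1)
)

print(round(expected_outliers, 2), round(p_at_least_one, 2), round(p_exactly_one, 2))
```

Under those assumptions the model expects about 1.3 outlier seasons out of 26, and at least one outlier turns up roughly three times in four, so Mauer’s 2009 is no embarrassment for the model.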
Great! Our model appears to provide reasonable results. Now, let’s apply the model to a couple of interesting cases from this season’s Red Sox team.
Jackie Bradley Jr. – not really a fair test?
Jackie Bradley, Jr. is a fairly young player: He turned 25 years old in April and has spent portions of three seasons in the major leagues. As I mentioned in the previous article, all real baseball players are subject to a wide range of outside influences, but young players can also be strongly affected by inside influences, such as experiencing real game situations and facing major league pitchers for the first time. Young players who are still learning how to hit will (one hopes) improve their “true” batting average over time, reaching a plateau after one or two or three years which will afterwards be remembered as their basic skill level.
Since one can argue that Bradley falls into this category, it may not be fair to apply our statistical tests to his performance; after all, we began with the assumption that each player is like a die with a fixed probability of getting a hit at each at-bat. If Bradley has been learning and improving over the past two years, that assumption is false, and would invalidate any comparison with the model. So, please take the following with a generous grain of salt.
I will compare Bradley’s real performance to the results of two groups of idealized players: 1,000 who have a “true” batting average of .200, and 1,000 who have a “true” batting average of .250.
The first thing we might ask is “What can we learn from Bradley’s performance in 2015 alone?” Since he has only 30 ABs, you might immediately think “We can’t learn anything, really, from such a small sample.” And, I would say, you would be right. Quite a few players from both idealized groups perform just as well (or poorly) as Bradley during this short period:
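To put rough numbers on how much the two idealized groups overlap over 30 ABs, here is a small sketch that uses the exact binomial distribution rather than a simulation. The helper names are hypothetical, and Bradley’s own 2015 hit total is deliberately left out because it is not quoted here.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent at-bats."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def central_range(n, p, coverage=0.95):
    """Hit counts (low, high) that remain after trimming at most
    (1 - coverage) / 2 of the probability from each tail."""
    tail = (1 - coverage) / 2
    low, cumulative = 0, 0.0
    for k in range(n + 1):          # walk up from zero hits
        cumulative += binom_pmf(k, n, p)
        if cumulative > tail:
            low = k
            break
    high, cumulative = n, 0.0
    for k in range(n, -1, -1):      # walk down from n hits
        cumulative += binom_pmf(k, n, p)
        if cumulative > tail:
            high = k
            break
    return low, high


at_bats = 30
for true_average in (0.200, 0.250):
    low, high = central_range(at_bats, true_average)
    print(true_average, low, high, round(low / at_bats, 3), round(high / at_bats, 3))
```

With so few at-bats the two ranges share most of their hit counts, which is the numerical version of “we can’t learn anything, really.”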
Now, if you are willing to assume that Bradley has performed at a constant level over the past two seasons – his first experience in the big leagues, apart from a cup of coffee in 2013 – then you could add together his performances in 2014 and 2015, which cover 414 ABs. Over this stretch, Bradley has a cumulative batting average of .193.
If one is willing to claim that Bradley has reached his peak level of performance over this period, then one has to admit that it is rather unlikely that he is a .250 hitter.
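How unlikely is “rather unlikely”? The sketch below computes the exact binomial chance that an idealized hitter does no better than Bradley over 414 ABs. Two things here are assumptions rather than facts from the article: the hit total of 80, which is inferred from .193 × 414 ≈ 79.9, and the function name.

```python
import math


def prob_at_most(hits, at_bats, p):
    """Exact binomial probability of `hits` or fewer hits in `at_bats` tries."""
    return sum(
        math.comb(at_bats, k) * p**k * (1 - p) ** (at_bats - k)
        for k in range(hits + 1)
    )


at_bats = 414                    # Bradley's combined 2014-2015 ABs (from the article)
hits = round(0.193 * at_bats)    # inferred hit total (an assumption): 80

for true_average in (0.200, 0.250):
    print(true_average, round(prob_at_most(hits, at_bats, true_average), 4))
```

Under the die model, a true .250 hitter ends up at 80 hits or fewer well under 1 percent of the time, while a true .200 hitter does so in something like four cases out of ten. The caveat from above still applies: this only means something if Bradley’s skill really was constant over the two seasons.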
Mike Napoli – a fair test?
Unlike Bradley, first-baseman/designated hitter Mike Napoli is an MLB veteran: He broke into the American League with the Angels in 2006 at age 24. If we give him three years to learn the craft of hitting, we can say that his skills ought to have reached their true level in 2009.
Between 2009 and 2014, while playing for the Angels, Rangers and Red Sox, his batting average bounced up and down between .227 and .320. This year, however, it currently sits at .192. Is this a sign that his skills have suddenly declined?
We can’t say for sure, but what we can do is compare his performance to that of an idealized batter or two. Let’s pick two models: an ideal .260 hitter, which might represent the projections for Napoli before the 2015 season, and an ideal .200 hitter, based on his performance so far this year. As of July 6, 2015, Napoli has had 260 ABs, which correspond to 65 games of 4 ABs each in our simulator:
First, we can see that the two idealized batters aren’t so easy to distinguish after 61 games; but if you’ve been paying attention, then that shouldn’t be a surprise.
Second, if one glances at the figure very quickly, one might focus on the blue symbols: Napoli’s 2015 average of .192 falls exactly in the middle of the distribution of values for the group of .200 hitters. “He’s cooked!” one might cry, “He’s turned into a terrible hitter.” If one wants to look at things that way, one certainly can – but it doesn’t tell the whole story.
Consider the red bars on the graph. Napoli’s performance over the past seven years covers quite a range – which happens very nearly to correspond to the 95-percent region of a true .260 hitter. We have to be careful here, since we are to some extent comparing apples to oranges: the black circle symbols represent Napoli’s performance over larger sample sizes (352 to 498 ABs) in previous years, while the magenta triangle and red bars represent performance over a small set of only 244 ABs. Still, the fact that the triangle appears at one end of the red band tells us that a true .260 hitter would sometimes finish June with a batting average similar to Napoli’s .192; in fact, just about as often as a true .260 hitter would end up with a .320 average, as Napoli did in his career year of 2011.
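For anyone who wants numbers to go with the two readings of the figure, here is a sketch based on the normal approximation rather than on the simulator. It uses the 260 ABs quoted above as a parameter that can be changed, and the function names are hypothetical.

```python
import math


def standard_error(p, at_bats):
    """Standard deviation of a batting average under the die model."""
    return math.sqrt(p * (1.0 - p) / at_bats)


def range_95(p, at_bats):
    """Approximate 95-percent range of batting averages for a true-p hitter."""
    half_width = 1.96 * standard_error(p, at_bats)
    return p - half_width, p + half_width


def z_score(observed, p, at_bats):
    """How many standard errors the observed average sits from p."""
    return (observed - p) / standard_error(p, at_bats)


at_bats = 260     # Napoli's 2015 ABs as quoted in the article
observed = 0.192  # Napoli's 2015 batting average as quoted in the article

for true_average in (0.200, 0.260):
    low, high = range_95(true_average, at_bats)
    print(
        true_average,
        round(low, 3),
        round(high, 3),
        round(z_score(observed, true_average, at_bats), 2),
    )
```

In this approximation the two 95-percent ranges overlap between roughly .207 and .249, which is why the idealized .200 and .260 hitters are hard to tell apart this early. Napoli’s .192 sits about a third of a standard error below .200 and about two and a half standard errors below .260.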
So, Neo, the choice is yours: you can look at the blue points, and convince yourself that Mike Napoli has fallen off the cliff, never to return; or you can look at the red points, and decide that his current struggles are just the result of a run of bad luck. It’s up to you.