As one of the developers of MLB's Statcast tracking system, I have had the chance to work with large amounts of baseball data over the last few years. Here I share a few ways of visualizing that data that I find interesting, and try to draw your attention to what may be one of the largest sports tracking systems ever built.
At a time when defensive shifts are commonplace and everyone is arguing about how effective they are, shouldn't we expect more balls to be gloved by infielders? However, the percentage of balls gloved by infielders has stayed roughly the same over the last four years, suggesting that batted balls end up where they always did, regardless of shifts (the data).
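Here is a minimal sketch of the per-season calculation behind this comparison. It assumes each batted ball is reduced to a hypothetical `(season, fielder_position)` record, where `fielder_position` is the position of the first fielder to glove the ball (or `None` if nobody did). Counting only the first, second, and third basemen and the shortstop as "infielders" is also an assumption.

```python
from collections import defaultdict

# Assumption: "infielders" are the four non-battery infield positions.
INFIELD_POSITIONS = {"1B", "2B", "3B", "SS"}

def infield_glove_rate(batted_balls):
    """Return {season: fraction of batted balls first gloved by an infielder}.

    Each record is a hypothetical (season, fielder_position) tuple; a position
    of None (nobody gloved the ball) still counts toward the season total.
    """
    gloved = defaultdict(int)
    total = defaultdict(int)
    for season, position in batted_balls:
        total[season] += 1
        if position in INFIELD_POSITIONS:
            gloved[season] += 1
    return {season: gloved[season] / total[season] for season in sorted(total)}

# Toy data for illustration only, not real Statcast numbers.
sample = [
    (2015, "SS"), (2015, "CF"), (2015, "2B"), (2015, "LF"),
    (2016, "3B"), (2016, "RF"), (2016, "1B"), (2016, "CF"),
]
print(infield_glove_rate(sample))  # {2015: 0.5, 2016: 0.5}
```

A flat series of these per-season rates is what "roughly the same over the last four years" looks like.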
The play descriptions provided by MLBAM, such as "Matt Carpenter grounds out, shortstop Jose Garcia to first baseman Xavier Scruggs", can be generalized and then used to group plays by similarity. Because they focus on high-level actions, the resulting groups reveal some interesting patterns.
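One possible way to do this generalization, sketched here as an assumption rather than the method behind the original analysis, is to replace player names with placeholders using regular expressions and then count identical templates. The position wording and the rule that a name is a run of capitalized words are assumptions. Names with initials ("A.J.") or lowercase particles, and multi-sentence descriptions that mention baserunners, would need extra handling. The sample plays other than the first use made-up names.

```python
import re
from collections import Counter

# Fielding positions as they are assumed to appear in the description text.
POSITIONS = (
    "pitcher", "catcher", "first baseman", "second baseman", "third baseman",
    "shortstop", "left fielder", "center fielder", "right fielder",
)
# Assumption: a player name is one or more capitalized words, e.g. "Jose Garcia".
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"
FIELDER_RE = re.compile(r"\b(" + "|".join(POSITIONS) + r")\s+" + NAME)
BATTER_RE = re.compile(r"^" + NAME)  # assumes the description starts with the batter

def generalize(description):
    """Replace player names with placeholders, keeping the high-level action."""
    text = FIELDER_RE.sub(r"\1 <fielder>", description)
    return BATTER_RE.sub("<batter>", text, count=1)

plays = [
    "Matt Carpenter grounds out, shortstop Jose Garcia to first baseman Xavier Scruggs",
    "John Doe grounds out, shortstop Richard Roe to first baseman Mark Moe",
    "Jane Doe flies out to center fielder Sam Poe",
]
groups = Counter(generalize(p) for p in plays)
for template, count in groups.most_common():
    print(count, template)
# 2 <batter> grounds out, shortstop <fielder> to first baseman <fielder>
# 1 <batter> flies out to center fielder <fielder>
```

Exact template matching is the simplest notion of "similarity"; a looser grouping (for example, dropping the fielder positions too) would merge more plays into each group.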
Umpires are remarkably consistent, so one would expect their errors around the strike zone to be random. Random errors would mean a balanced distribution of missed calls on both sides of the zone, but that is not exactly what the 2018 data shows. Using the StatsAPI data, you can easily spot venues where the zone appears shifted: at some ballparks, the number of missed calls on one side of the zone is significantly larger than on the other, which may indicate some kind of problem.
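The "balanced distribution" argument can be made concrete with an exact two-sided binomial test. If missed calls are random, each one falls on either side of the zone with probability 0.5. The sketch below assumes hypothetical per-venue counts of missed calls on each side and an arbitrary significance level of 0.01; it is not the analysis behind the original chart.

```python
from math import exp, lgamma, log

def binom_pmf(i, n, p=0.5):
    """Binomial probability P(X = i), computed in log space so large n does not overflow."""
    return exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    observed = probs[k]
    # The small relative tolerance keeps outcomes tied with k (up to rounding) in the sum.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))

def flag_shifted_zones(missed_calls, alpha=0.01):
    """Return {venue: p_value} for venues whose missed calls are unbalanced.

    missed_calls maps a venue to a hypothetical (side_a, side_b) pair: the number
    of missed calls on each side of the zone. Under purely random umpire error,
    side_a follows Binomial(side_a + side_b, 0.5).
    """
    flagged = {}
    for venue, (side_a, side_b) in missed_calls.items():
        n = side_a + side_b
        if n == 0:
            continue  # no missed calls, nothing to test
        p_value = two_sided_binomial_p(side_a, n)
        if p_value < alpha:
            flagged[venue] = p_value
    return flagged

# Toy counts for illustration only, not real 2018 data.
counts = {"Venue A": (52, 48), "Venue B": (75, 25)}
print(flag_shifted_zones(counts))  # only "Venue B" is flagged
```

Since every MLB venue is tested at once, a multiple-comparison correction (such as dividing `alpha` by the number of venues, as in a Bonferroni correction) would reduce the chance of flagging a ballpark by luck alone.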