This post is about statistics. You have been warned.
Now then: isn’t Bojan adorable? Doesn’t Busi look like a horse? Shouldn’t we burn Ibra at the stake if he ever comes back to visit? These questions can all be answered, to some degree or another, with numbers. Bojan, for instance, is way more adorable when you say that he scored as many goals per minute as Ibra last season or if you point out that his production levels at 20 are way ahead of Pedro’s at the same age (don’t forget that Pedro is already 23, after all). And Busi totally looks more like a horse if you replay the gif of him against Inter about 30 times.
The Ibra thing I won’t even get into because of the whole Hulk Smash thing I do when you disrespect The Guardiola and, really, my lovely lady is getting tired of replacing the coffee table (and you wondered how Ikea made so much money–aha! Now I know it’s a dastardly Swedish plot to get at my moolah!).
So, as Kevin is fond of pointing out, tread carefully around statistics because they can be bent to the will of most anyone. Except Kevin. Because he goes comatose at the sight of decimals. The following is a series of pliable numbers that I’m trying out for the first time, and I will happily take your suggestions on how to better utilize them.
I wondered aloud the other day how easy it would be to calculate points earned per appearance for individual players and then I wondered aloud how easy it would be for me to do that. And so I set off on a long and arduous quest to figure that out. And it turns out all I had to do was scare up the match stats for every match Barcelona has played so far this year and then it was a cinch to figure it all out. Boom, done.
So Busi is the king of that stat, with 2.67 points per match (6 appearances, 16 points earned) and Puyol is runner-up with 2.60 (5 apps, 13 points). The much-maligned Mascherano earns 1.83 per appearance, which, interestingly enough, is the same as Lionel Messi. But hang on a minute: it wouldn’t be fair to compare Bojan (2.13) to Messi because the latter has played almost twice as many minutes, would it? I mean, kid comes on for the last few minutes of a match that’s already put away and gets full credit? Can’t be havin’ that!
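For anyone who wants to check the arithmetic, here is a minimal sketch of the points-per-appearance figure. It assumes "points" means the team's league points (3 for a win, 1 for a draw, 0 for a loss) earned in matches where the player appeared; the function name is just an illustration, and the totals are the ones quoted above.

```python
def points_per_appearance(points, appearances):
    """Team points earned in a player's matches, divided by his appearances."""
    if appearances == 0:
        return 0.0  # assumption: a player with no appearances scores 0.00
    return points / appearances


# (points earned, appearances) as quoted in the post
players = {"Busi": (16, 6), "Puyol": (13, 5)}

for name, (pts, apps) in players.items():
    print(f"{name}: {points_per_appearance(pts, apps):.2f}")
# Busi: 2.67
# Puyol: 2.60
```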
Meaning, of course, that I spent about 2 hours figuring out whether Mascherano had played 4 or 5 minutes against Atlético (for the record, I went with 4) and other such trivialities. So my numbers are by no means perfect, but on a large enough scale a deviation of one or two minutes from the real stats won’t make a huge difference.
Here’s the formula I worked out:
Loss = -1.00
Draw = 0.50
Win = 2.00
My thinking behind the numbers: A loss is a bad thing, thus it should be given a negative weighting. A draw is a good thing in that it’s worth a point (it’s not necessarily a good thing from the perspective of winning a league, but it’s still better than nothing). A win is worth 3 points and is, as such, worth quite a bit, so I’ve weighted it to be 3 points better than a loss and also worth 4 times as much as a draw. I think that’s about right, but you might be able to think of better reasons for a different number.
But then, of course, you can’t just give someone -1 for subbing into a loss or 2 for subbing into a win. Really, why not weight that based on their share of the playing time? So I did (Result)*(% of Playing Time), which means that a player gets a 0.00 if he didn’t play, of course, and is neither penalized nor gifted the full value if he only played part of the match. Then I took those in-game values and added them together. Naturally if you play more, you’ll get a higher overall score, so I included averages as well (Total/Appearances). There are holes in this that I’ll discuss below, but for the time being, here are the results, which do not include the Supercopa, just Liga and CL:
Dani Alves is extremely valuable in this model because he appears in all of the matches while Busi is more valuable on a per-appearance basis because we tend to win more when he plays (or, if you prefer, happen to win while he plays, but that sounds like a bogus argument).
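Here is a rough sketch of the scoring described above, for anyone who wants to play along at home. The weights are the ones from the formula; the 90-minute match length, the function names, and the sample matches are my assumptions for illustration, not real match data.

```python
# Result weights from the formula above
RESULT_WEIGHTS = {"W": 2.00, "D": 0.50, "L": -1.00}
MATCH_MINUTES = 90  # assumption: every match counts as 90 minutes


def match_score(result, minutes_played, match_minutes=MATCH_MINUTES):
    """(Result) * (% of Playing Time) for a single match."""
    share = minutes_played / match_minutes
    return RESULT_WEIGHTS[result] * share


def season_summary(matches):
    """Return (total, appearances, total / appearances).

    `matches` is a list of (result, minutes_played) pairs; a match with
    0 minutes contributes 0.00 and does not count as an appearance.
    """
    scores = [match_score(r, m) for r, m in matches if m > 0]
    total = sum(scores)
    appearances = len(scores)
    average = total / appearances if appearances else 0.0
    return total, appearances, average


# Hypothetical player: a full win, half a draw, an unused loss, 4 minutes of a win
sample = [("W", 90), ("D", 45), ("L", 0), ("W", 4)]
total, appearances, average = season_summary(sample)
print(f"total={total:.2f} appearances={appearances} average={average:.2f}")
```

This is also why a player who appears in everything piles up a big total while a player with fewer matches can still top the per-appearance column.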
The holes I’ve noticed:
Most glaring is that I don’t have a way to weight for when the goals were scored in particular matches. If Thiago subs on with 3 minutes left at 0-2 down and scores, he gets no bonus for that, while if Maxwell comes on and we give up a goal, he gets no negative unless it adversely affects the outcome (taking a win to a draw or a draw to a loss). I haven’t played with the numbers enough to work goals for and goals against into the weighting and I’m not sure I’m going to, though I have, actually, worked in goals for and goals against per appearance (again, though, without the very useful information of whether goals were scored or allowed while that particular player was on the field; if anyone has a handy way to figure that out, please let me know).
Should I assign “not playing” a negative value? If you’re not used during a loss, you actually come out better than if you had played a killer match and been let down by your teammates, but then do you also get a negative if you’re subbed? That sort of doesn’t make sense. That’s why I’m leaning a bit on the averages, which are calculated based on appearances, but as of right now aren’t weighted by minutes (that is probably pretty easy, though I haven’t looked into it).
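Two of the holes above lend themselves to a quick sketch. This is one possible approach, not something I have run against the real numbers: the first function weights the average by minutes instead of appearances, and the second counts goals for and against while a player was on the pitch, assuming you have the minute he came on, the minute he went off, and the minute of each goal. All names and sample values are hypothetical.

```python
RESULT_WEIGHTS = {"W": 2.00, "D": 0.50, "L": -1.00}


def minutes_weighted_average(matches):
    """Average result weight, weighted by minutes played.

    `matches` is a list of (result, minutes_played) pairs. A 4-minute
    cameo in a win now moves the average far less than a full 90.
    """
    total_minutes = sum(minutes for _, minutes in matches)
    if total_minutes == 0:
        return 0.0
    weighted = sum(RESULT_WEIGHTS[r] * minutes for r, minutes in matches)
    return weighted / total_minutes


def on_pitch_goals(came_on, went_off, goals_for, goals_against):
    """Count goals scored and conceded while the player was on the pitch.

    Assumption: a goal in the same minute the player came on is NOT
    credited to him, while a goal in the minute he went off is.
    """
    scored = sum(1 for g in goals_for if came_on < g <= went_off)
    conceded = sum(1 for g in goals_against if came_on < g <= went_off)
    return scored, conceded


# Hypothetical: one full win and one full loss average out to 0.50
print(minutes_weighted_average([("W", 90), ("L", 90)]))  # 0.5

# Hypothetical late sub: on at 87', our goal at 89', theirs at 10' and 50'
print(on_pitch_goals(87, 90, [89], [10, 50]))  # (1, 0)
```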
So, then, to a question about statistics that Kevin raised that I think is highly important: goals do not tell the whole story of a player’s contribution. If you’re a striker, your job is to score goals. Period. Thing is, not to be too flippant, that line of reasoning reminds me of this. Yeah, goals are a good stat to have, but it’s more useful to put them into more meaningful context than to simply refer to them as the true value of a player.
In the same thread about Bojan that I linked above, Jason makes a good point that Bojan’s goals from 09-10 may be less important than Ibra’s in terms of when they were scored. Certainly, it’s worth looking at that, but only in conjunction with the numbers. If there were a VORP or a PECOTA for football, I’d use it in a heartbeat (if they exist and you have all been holding out on me, I’ll be very upset–and you wouldn’t like me when I’m angry), because I think those numbers help. It’s just that, given the dynamism of football, there aren’t the same static moments that create nerd-friendly stats.
Also, stats, I can’t quit you. Nor do I particularly want to, even when I’m frustrated with them because they won’t lend themselves easily to quantifying football—which, really, is a great reason why it’s The Beautiful Game. I find the most stat-heavy sport, baseball, very boring, but it’s not because of the stats, it’s because of everything else. I don’t want to track who earned the most points in April on sunny days when the goalie was from the southern half of Brasil, but I do want to track things (where is Diego Alves from again?) and I do think they’re meaningful. Actually, I know they’re meaningful. And because of that, you’re going to get random statistics-based posts if you continue to read this blog.