Just Another Way to Look at Things

This post is about statistics. You have been warned.

Now then: isn’t Bojan adorable? Doesn’t Busi look like a horse? Shouldn’t we burn Ibra at the stake if he ever comes back to visit? These questions can all be answered, to some degree or another, with numbers. Bojan, for instance, is way more adorable when you say that he scored as many goals per minute as Ibra last season or if you point out that his production levels at 20 are way ahead of Pedro’s at the same age (don’t forget that Pedro is already 23, after all). And Busi totally looks more like a horse if you replay the gif of him against Inter about 30 times.

The Ibra thing I won’t even get into because of the whole Hulk Smash thing I do when you disrespect The Guardiola and, really, my lovely lady is getting tired of replacing the coffee table (and you wondered how Ikea made so much money–aha! Now I know it’s a dastardly Swedish plot to get at my moolah!).

So, as Kevin is fond of pointing out, tread carefully around statistics because they can be bent to the will of most anyone. Except Kevin. Because he goes comatose at the sight of decimals. The following is a series of pliable numbers that I’m trying out for the first time, and I will happily take your suggestions on how to better utilize them.

I wondered aloud the other day how easy it would be to calculate points earned per appearance for individual players and then I wondered aloud how easy it would be for me to do that. And so I set off on a long and arduous quest to figure that out. And it turns out all I had to do was scare up the match stats for every match Barcelona has played so far this year and then it was a cinch to figure it all out. Boom, done.

So Busi is the king of that stat, with 2.67 points per match (6 appearances, 16 points earned) and Puyol is runner up with 2.60 (5 apps, 13 points). The much-maligned Mascherano earns 1.83 per appearance, which, interestingly enough, is the same as Lionel Messi. But hang on a minute—it wouldn’t be fair to compare Bojan (2.13) to Messi because the latter has played almost twice as many minutes, would it? I mean, kid comes on for the last few minutes of a match that’s already put away and gets full credit? Can’t be havin’ that!

Meaning, of course, that I spent about 2 hours figuring out whether Mascherano had played 4 or 5 minutes against Atlético—for the record I went with 4—and other such trivialities. So my numbers are by no means perfect, but on a large enough scale one or two minutes deviation from the real stats won’t make a huge difference.

Here’s the formula I worked out:
Loss = -1.00
Draw = 0.50
Win = 2.00

My thinking behind the numbers: A loss is a bad thing, thus it should be given a negative weighting. A draw is a good thing in that it’s worth a point (it’s not necessarily a good thing from the perspective of winning a league, but it’s still better than nothing). A win is worth 3 points and is, as such, worth quite a bit, so I’ve weighted it to be 3 points better than a loss and also worth 4 times as much as a draw. I think that’s about right, but you might be able to think of better reasons for a different number.

But then, of course, you can’t just give someone -1 for subbing into a loss or 2 for subbing into a win. Really, why not weight that based on their share of the playing time? So I did (Result)*(% of Playing Time), which means that a player gets a 0.00 if he didn’t play, of course, and is neither penalized nor gifted the full value if he only played part of the match. Then I took those in-game values and added them together. Naturally if you play more, you’ll get a higher overall score, so I included averages as well (Total/Appearances). There are holes in this that I’ll discuss below, but for the time being, here are the results, which do not include the Supercopa, just Liga and CL:

Player               Total  Average
Dani Alves           10.63     1.33
Busquets, Sergio     10.50     1.75
Valdes, Victor       10.00     1.25
Iniesta, Andres       9.49     1.19
Pique, Gerard         8.66     1.08
Xavi                  8.46     1.21
Pedro                 8.39     1.05
Villa, David          8.19     1.17
Puyol, Carles         7.50     1.50
Maxwell               6.52     1.09
Messi, Lionel         5.59     0.93
Keita, Seydou         4.74     0.79
Abidal, Eric          3.50     0.88
Milito, Gabriel       2.86     0.95
Krkic, Bojan          2.71     0.34
Mascherano, Javier    1.21     0.20
Adriano               0.58     0.19
Thiago                0.38     0.19
Nolito                0.08     0.08
Suarez, Jeffren       0.04     0.04

Dani Alves is extremely valuable in this model because he appears in all of the matches while Busi is more valuable on a per-appearance basis because we tend to win more when he plays (or, if you prefer, happen to win while he plays, but that sounds like a bogus argument).
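For anyone who wants to play along at home, here is a minimal Python sketch of the weighting described above: (Result)*(% of Playing Time) per match, summed into a total, then divided by appearances. The function names, the flat 90-minute match length, and the example match log are my own assumptions for illustration, not the spreadsheet behind the table.

```python
# Result weights from the post: loss = -1.00, draw = 0.50, win = 2.00.
WEIGHTS = {"L": -1.00, "D": 0.50, "W": 2.00}

MATCH_MINUTES = 90  # assumption: every match is treated as exactly 90 minutes


def match_value(result, minutes_played, match_minutes=MATCH_MINUTES):
    """(Result) * (% of playing time) for a single match."""
    return WEIGHTS[result] * minutes_played / match_minutes


def player_summary(log):
    """Return (total, average per appearance) for a list of (result, minutes)."""
    total = sum(match_value(result, minutes) for result, minutes in log)
    # Only matches with minutes on the pitch count as appearances.
    appearances = sum(1 for _, minutes in log if minutes > 0)
    average = total / appearances if appearances else 0.0
    return total, average


# Hypothetical player: a full win, half of a draw, unused in a loss,
# and the last 9 minutes of a win.
example_log = [("W", 90), ("D", 45), ("L", 0), ("W", 9)]
total, average = player_summary(example_log)
print(f"total={total:.2f} average={average:.2f}")
```

Note that the unused loss contributes nothing to either number, which is exactly the "not playing" question raised in the holes below.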

The holes I’ve noticed:

Most glaring is that I don’t have a way to weight for when the goals were scored in particular matches. If Thiago subs on with 3 minutes left at 0-2 down and scores, he gets no bonus for that, while if Maxwell comes on and we give up a goal, he gets no negative unless it adversely affects the outcome (taking a win to a draw or a draw to a loss). I haven’t played with the numbers enough to work goals for and goals against into the weighting and I’m not sure I’m going to, though I have, actually, worked in goals for and goals against per appearance (again, though, without the very useful information of whether goals were scored or allowed while that particular player was on the field; if anyone has a handy way to figure that out, please let me know).

Should I assign “not playing” a negative value? If you’re not used during a loss, you actually come out better than if you had played a killer match and been let down by your teammates, but then do you also get a negative if you’re subbed? That sort of doesn’t make sense. That’s why I’m leaning a bit on the averages, which are calculated based on appearances, but as of right now aren’t weighted by minutes (that is probably pretty easy, though I haven’t looked into it).

So, then, to a question about statistics that Kevin raised that I think is highly important: goals do not tell the whole story of a player’s contribution. If you’re a striker, your job is to score goals. Period. Thing is—not to be too flippant—but that line of reasoning reminds me of this. Yeah, goals are a good stat to have, but it’s more useful to put them into more meaningful context than to simply refer to them as the true value of a player.

In the same thread about Bojan that I linked above, Jason makes a good point that Bojan’s goals from 09-10 may be less important than Ibra’s in terms of when they were scored. Certainly, it’s worth looking at that, but only in conjunction with the numbers. If there was a VORP or a PECOTA for football, I’d use them in a heartbeat (if they exist and you have all been holding out on me, I’ll be very upset–and you wouldn’t like me when I’m angry), because I think those numbers help. It’s just that, in terms of the dynamism of football, there aren’t the same static moments that create nerd-friendly stats.

Also, stats, I can’t quit you. Nor do I particularly want to, even when I’m frustrated with them because they won’t lend themselves easily to quantifying football—which, really, is a great reason why it’s The Beautiful Game. I find the most stat-heavy sport, baseball, very boring, but it’s not because of the stats, it’s because of everything else. I don’t want to track who earned the most points in April on sunny days when the goalie was from the southern half of Brasil, but I do want to track things (where is Diego Alves from again?) and I do think they’re meaningful. Actually, I know they’re meaningful. And because of that, you’re going to get random statistics-based posts if you continue to read this blog.

  1. You could use a statistical analysis similar to the one Nate Silver set up for the Soccer Power Index for international teams at ESPN. It may give you better statistical tools, even though it is based on teams, not individuals: http://soccernet.espn.go.com/world-cup/story/_/id/4447078/ce/us/guide-espn-spi-ratings?cc=5901&ver=us

    Also, the reason PECOTA and VORP do not exist for football is that there aren’t enough comprehensive statistical measures for players that hold up across the different systems teams run and the ways players are used. For example: Busquets regularly has the highest or second-highest number of passes completed in any given game. This is a good statistic for him because we need his pivot action to transition between offense and defense. In another system, however, his short passes may mean absolutely nothing and may be a total waste. Whereas in baseball, a player’s RBI, stolen base, or error is going to account for the same amount in almost every situation.

    The point is, statistical variation, player movement, and positioning all account for the lack of concrete statistics in football. Until advanced metrics move forward light years we won’t see anything close to PECOTA or VORP in football, at least not reliably.

  2. I like this post. Not because it led us to any great conclusion about who our most productive players are, but because it shows that the immeasurable nuances of the beautiful game are what make it just that. Beautiful.

    I am an Applied Mathematics major. So I am for all intents and purposes a huge math geek. I interpret and understand the world best through numbers. Yet I have a hard time applying that to football.

    Like you said, baseball is probably the most quantifiable sport. Every single action can be recorded and accounted for. But that makes the game boring.

    It is impossible to track football that way. From a CB being in the right position, to the DM forcing an oncoming counter wide with position and pressure, to the CAM giving the assist to the assist. It makes football so much more organic than the mechanical nature of baseball, but it makes it more difficult to analyze.

    Even then, a good formula could be constructed. It would be messy and complex, but it could work. The only problem with that is that certain players, and their quality, just don’t shine under the scrutiny of simple stats. For instance, in our Yahoo! Fantasy League, Sergio Busquets and Iniesta often earn in the 0-5 points range because what they are doing doesn’t count as goals, assists, shots on goal, etc. Yet your points per appearance clearly shows their value. And when we watch them play for Barcelona or Spain, we further see their contributions.

    So, while I think there will always be flaws in it, and you’re basically walking through a minefield here, I think I’d like to see the formula fleshed out, as long as we can take it with a grain of salt and consider the shortcomings of such a process.

  3. I think in basketball there is something called “Tendex”, which is a way to value a player in a game or a whole season. I can’t understand the Isaiah system at all! I will try later!

    1. Tendex is, if I understand correctly: {[ Rebounds + 1.25*Assists + 1.25*Steals + Blocks – 1.25*Turnovers – Missed Field Goals – (Missed Free Throws/2) – Personal Fouls/2] / Minutes / Game Pace}

      I’m not totally positive I understand how to use it because I’m not sure how one calculates “game pace”. My “system” (which is hardly a system yet) asks basically “what is a player’s value?” and could easily be adjusted to say it in a “points per minute” way, I think.

      For instance, Busi has earned 16 points in his 564 minutes on the field, which is 0.028 points per minute. Puyol has earned 13 points in 422 minutes, or 0.031 points per minute. There are lots of ways to flip these stats, of course, so I’m open to any way of doing it. I’ve done some of it in the past and I think the problems I’m having are that I don’t have access to some of the stats I’d like. Luke mentions the SPI above and you can certainly learn a lot from it.
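A quick Python check of that points-per-minute arithmetic, using only the points and minutes quoted above. The dictionary layout and the function name are assumptions for illustration; the "per 90 minutes" column is just the same number rescaled so it reads like points per full match.

```python
# League points and minutes on the field, as quoted in the comment above.
players = {
    "Busquets": {"points": 16, "minutes": 564},
    "Puyol": {"points": 13, "minutes": 422},
}


def points_per_minute(points, minutes):
    """Points earned divided by minutes on the field (0.0 if no minutes)."""
    return points / minutes if minutes else 0.0


for name, row in players.items():
    ppm = points_per_minute(row["points"], row["minutes"])
    # Multiplying by 90 rescales the rate to one full match.
    print(f"{name}: {ppm:.4f} per minute, {ppm * 90:.2f} per 90 minutes")
```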

      More later as I get my spreadsheets running more complex numbers.

    1. Actually, if you look at the Eto’o, Villa, Bojan statistics the right way, they tell the story of why I like Bojan very much. Yes, his open-field play still has room to grow, and he needs to learn how to fight off defenders and keep his balance. But when Bojan gets himself into striking positions, his shots are as good as anyone’s I have seen. Those statistics are for shots at goal, and they confirm that Bojan’s striking rate is top-notch. You can’t teach that.

  6. My only question would be why you chose to use the numbers you did for rating wins, losses, and ties. My thought would be that the point system for results (0, 1, 3) would lend itself extremely well as the “worth” of playing in a loss, tie, or win.

    I understand that it seems unfair to give a player the same rating (0) if he played in a game we lost as a player who didn’t play. But I think this would be solved by looking at the average: a player who earns a 0 for not playing in a game will not have their average affected, while a player who earns a 0 but played a whole game will have a lower average as a result.

    The only reason I say this is I think it would make the “average” statistic more tangible. While I have a general idea of what a 1.33 rating means (and it makes sense in context) I couldn’t easily put into words what that average rating suggests.

    I would also be interested in a points per minute played statistic. Something along the lines of:

    SUM(points earned * percentage of playing time) / total minutes played

    The reason is that while you have normalized the players’ earnings with the “average” – you have normalized based on appearances – and while you take into account playing time while calculating their overall score, you ignore playing time in favor of appearances when you calculate the average. I’m not a statistician, and there might be some issue with accounting for playing time twice in the equation, but I think it would be more consistent and easier to compare.
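A minimal Python sketch of the statistic proposed in this comment, SUM(points earned * percentage of playing time) / total minutes played, on the 0/1/3 scale. The two match logs are hypothetical, and treating every match as 90 minutes is an assumption, so this illustrates the formula rather than reproducing anyone's spreadsheet.

```python
LEAGUE_POINTS = {"W": 3, "D": 1, "L": 0}  # the 0/1/3 scale suggested above
MATCH_MINUTES = 90  # assumption: stoppage time is ignored


def weighted_points_per_minute(log):
    """SUM(points earned * share of playing time) / total minutes played.

    log is a list of (result, minutes_played) tuples, one per match.
    """
    total_minutes = sum(minutes for _, minutes in log)
    if total_minutes == 0:
        return 0.0  # never played: avoid dividing by zero
    weighted = sum(
        LEAGUE_POINTS[result] * minutes / MATCH_MINUTES for result, minutes in log
    )
    return weighted / total_minutes


# Hypothetical logs: an every-minute starter and a regular substitute.
starter = [("W", 90), ("W", 90), ("D", 90), ("L", 90)]
substitute = [("W", 20), ("W", 15), ("D", 30), ("L", 45)]

print(f"starter:    {weighted_points_per_minute(starter):.5f}")
print(f"substitute: {weighted_points_per_minute(substitute):.5f}")
```

Playing time enters twice here, once in the numerator's share and once in the denominator, which is the double-counting worry raised in the comment.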

    Love the stats post though.

    1. Yeah, my rationale was to provide a very strong negative effect for playing when the team lost. I ran the numbers with 0,1,3 to begin with and came up with this (apologies for the bad formatting if it’s there when I post):

      Player               Value  Average
      Valdes, Victor       17.00     2.13
      Dani Alves           17.00     2.13
      Iniesta, Andres      16.24     2.03
      Busquets, Sergio     16.00     2.67
      Pique, Gerard        14.99     1.87
      Villa, David         14.01     2.00
      Pedro                13.82     1.73
      Xavi                 13.72     1.96
      Maxwell              11.53     1.92
      Puyol, Carles        11.50     2.30
      Messi, Lionel        10.23     1.70
      Keita, Seydou         8.78     1.46
      Abidal, Eric          7.00     1.75
      Krkic, Bojan          5.06     0.63
      Milito, Gabriel       4.52     1.51
      Mascherano, Javier    2.95     0.49
      Adriano               1.82     0.61
      Thiago                0.65     0.32
      Nolito                0.17     0.17
      Suarez, Jeffren       0.07     0.07

      I’m fine with those numbers too, but you’ll never have a player that’s obviously horrible (-1.00) for instance.

      Here are the results of (points earned * %PT)/total minutes played:

      Player              Points per minute  Value  Avg Value
      Iniesta, Andres               0.02255  16.24       2.03
      Pique, Gerard                 0.02255  14.99       1.87
      Valdes, Victor                0.02255  17.00       2.13
      Dani Alves                    0.02255  17.00       2.13
      Pedro                         0.02255  13.82       1.73
      Krkic, Bojan                  0.02255   5.06       0.63
      Xavi                          0.02122  13.72       1.96
      Busquets, Sergio              0.02122  16.00       2.67
      Villa, David                  0.02122  14.01       2.00
      Puyol, Carles                 0.01724  11.50       2.30
      Keita, Seydou                 0.01724   8.78       1.46
      Maxwell                       0.01724  11.53       1.92
      Messi, Lionel                 0.01459  10.23       1.70
      Mascherano, Javier            0.01459   2.95       0.49
      Abidal, Eric                  0.00928   7.00       1.75
      Milito, Gabriel               0.00928   4.52       1.51
      Adriano                       0.00796   1.82       0.61
      Thiago                        0.00531   0.65       0.32
      Nolito                        0.00133   0.17       0.17
      Suarez, Jeffren               0.00133   0.07       0.07


    2. I would like to suggest few ideas to benchmark our strikers.

      Shots on goal (of course) should be assigned different weights based on level of difficulty and probability of scoring. For example, while Bojan did hit the post from a very tight angle, Messi was unsuccessful one-on-one with the keeper and had his right-footed shot (in the second half) go wide from a much better angle.

      Both Zlatan last year and Villa this year so far have missed quite a few good scoring opportunities. In the case of Zlatan, the number itself (21) doesn’t mean much without putting into context how many high-quality scoring opportunities he missed. 21/32 is not the same as 21/45.

      Another category would be the fouls suffered in the opponent’s half, since that usually means the player had to create some danger to the defending team. These can be further broken down by the proximity to the goal and position on the field (middle vs wing) as well as the outcome of the set piece.

      For the mid-fielders, the passes should be broken down by completion, length and direction (forward-productive vs lateral/backwards-not so productive).

      In general, the players that have meaningfully touched the ball when the goal was scored should get extra points. Naturally, the defender(s) that were caught out of place (for no valid reason), made a foul in the “danger zone” or missed the tackle should be penalized with an appropriate point deduction.

    3. In addition, the points should be adjusted based on the level of difficulty of the opposition: the quality of the opponent, how many key players they are missing, the position they occupy in the standings, the system they play, and whether the game was played home or away.

    4. The issue I’m having is the lack of data. Nate Silver’s SPI that Luke linked above has a lot of information and I’m going to comb through it when I have the chance to see if I can’t come up with something more concrete. My guess is that Nate Silver has already far out-thought me and he has the data.

      I disagree with you on one thing, though, and it’s where all the stats tend to bog down: “For the mid-fielders, the passes should be broken down by completion, length and direction (forward-productive vs lateral/backwards-not so productive).” Is going backwards actually less productive than going forwards? What if that forward pass results in a turnover immediately afterwards?

      What’s a “meaningful touch” prior to a goal? If Pique passes to Valdes who passes to Puyol who passes to Xavi who passes to Messi who passes to Villa who scores, why does Xavi get more credit than Pique if what Pique did was save the possession? Or Puyol? How can we know? And then doesn’t it become subjective?

      Not that I’m hating on your thoughts. They’re good thoughts. But if we (yes, we) are to create a better series of stats, then we’ve got to approach it from a lot of angles.

      Most of what you’re suggesting goes into the “Efficiency” rating that the NBA enjoys these days. There are just a LOT more variables…

    5. Thank you for the constructive critique. My focus was on the strikers which prevented me from expressing myself properly in regards to mid-fielders. I wasn’t exactly sober either, Thirsty Thursday anyone 🙂

      What I had meant to say by meaningful is mainly a move performed under reasonable pressure that created space or an extra player. For example, while Pique passing under no serious pressure to Puyol 10 yards away keeps the ball in our possession, it is nowhere near as valuable as, let’s say, an Iniesta dribble and cut inside that pulls one defender out of his defensive shell, or a well-thought-out Xavi pass (with a man breathing down his neck) and Dani’s timely run that opens up space and an opportunity for a dangerous cross from the right wing. Do you see my point?

      And when I was talking about the direction of passes, the way we play it is safe to say that every successful forward pass weighs more on our production than the lateral/backwards one. This is a general trend and certainly not a rule. The evaluation should be conducted on case-by-case basis.

      Not sure if you scrolled down to look at my most recent post from last night, but I’ve talked about shots on goal and balls recovered category for our mid-fielders (based on the assignments). I would add turnovers (the forward pass issue that you have mentioned above) and turnover ratios with total passes and time spent on the field as variables.

      The parameters I am suggesting do reflect efficiency but, more importantly, they reflect decision making, which I hold to be a broader category and more reflective of each player’s contribution. How often does the player make the right decision under severe pressure when there is very little time to think? For example, Xavi passed up a number of goal-scoring opportunities. The probability of him scoring in those particular situations was very high. Instead, he opted for a pass, assuming more risk and lowering the probability of scoring. This was a general trend and not a rule.

      There are a LOT of related variables, I couldn’t agree more. The key is to assign a proper weight to each one of them.

    6. Hey Isaiah, I’m not sure if they do these for La Liga games, but these are a wet dream for keeping track of just about everything a player does on the field. For example, it not only tells you where a player was when he passed a ball but where it went and whether or not it got to the receiver.


    7. Awesome, thanks.

      I’m not quite sure what happened with the points per minute. I’m debating whether the numbers converged because they are so small, and that is why there is no difference for many of the players, or if the fact that many of the players have the same number is telling us something. Any thoughts on this?

      Would be interested in other thoughts on this – however the statistic isn’t really telling me as much as I hoped it would (or is it?) Back to the drawing board I guess.

    8. Oh and another thought: the points per minute played could be interesting, but it also is really good at hiding a player’s true contribution to the team.

      As an example, imagine we had a player X who played 30 minutes in a game we won, but that was it. The player would have .033 points per minute and would seem to be leading the team – but in reality they would have had very little contribution to the team.
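To make the player X example concrete, here is a small Python sketch using the same proposed formula on the 0/1/3 scale. The 90-minute match length and the eight-win comparison player are assumptions for illustration only.

```python
MATCH_MINUTES = 90  # assumption: stoppage time is ignored
WIN_POINTS = 3


def points_per_minute(minutes_in_each_win):
    """SUM(3 * share of playing time) / total minutes, for wins only."""
    total_minutes = sum(minutes_in_each_win)
    weighted = sum(WIN_POINTS * m / MATCH_MINUTES for m in minutes_in_each_win)
    return weighted / total_minutes


player_x = points_per_minute([30])          # one 30-minute cameo in a win
ever_present = points_per_minute([90] * 8)  # hypothetical: eight full wins

print(f"player X:     {player_x:.3f}")
print(f"ever-present: {ever_present:.3f}")
```

Under these assumptions both players land on 3/90, because the minutes in the share cancel against the minutes in the denominator. The statistic behaves like a minutes-weighted average of (result / 90), so it describes how good the results were while a player was on the pitch rather than how much he played.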

      However, this would make me think players who are typically used as substitutes to run out a game we are winning would have higher scores, but all of our substitutes seem to have lower scores. Could this be an artifact of the fact that we tend to get worse results when we start more of the substitutes?

      I’m not sure if this is happening with Bojan here, or why exactly he has moved up so far in the standings. Unless I overlooked something he is the true outlier in that statistic (in that he was the only one who moved significantly in the table). Would be interesting to know why that is.

    9. The time of possession for each player (especially mid-fielders) divided by the game time (90’+ stoppage) would be telling of contribution knowing that the way we play limits the dwelling on the ball.

      Furthermore, the mid-fielders should be evaluated not only by assists but, depending on the assignment/position, by shots (successful ones) on goal and tackles/balls recovered. That is what separates the truly great ones from the mediocre ones.

  19. cool system, as long as it shows busi is a winner it’s fine by me 😉

    IMO it is key (and a lotta work) to know in what minute each goal was scored/conceded. If Thiago comes in in the 60th when we are already losing and nothing happens, it’s like a “draw for Thiago” even though it’s a team loss. In the 30 minutes he played the score did not move, so he shouldn’t get negative points, methinks. Same thing: if Busi leaves the pitch with the team winning by one, Bojan comes in and we concede, it should be a “win for Busi” and “loss for Bojan” … “draw for the team” 😀 Was it already like that? If not, I think it could be, but consider that in the same game players would get different ratings for different scores. Don’t know if that’s good or bad but… imagine we are losing 0-2, Pep makes 2 subs in the 60th minute at 0-2 and one more in the 80th at 2-2. We end up winning 3-2. The players that went out in the 60th minute lose, the one that went out in the 80th gets a draw, and all the others win… all of them weighted by minutes played. It just came to my mind reading your post; don’t know if it is already used or bullshit or anything.
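The idea in this comment can be sketched in Python as a "personal result": count only the goals scored and conceded while a player was on the pitch. This is one possible reading of the suggestion, not something the post's numbers use. The function name, the goal minutes, and the rule that a goal in the exact minute of a substitution counts for the player going off are all assumptions.

```python
def personal_result(on, off, goals):
    """Result of the 'mini-match' played while the player was on the pitch.

    goals is a list of (minute, side) tuples, side being "for" or "against".
    Assumption: a goal counts for a player if on < minute <= off.
    """
    diff = 0
    for minute, side in goals:
        if on < minute <= off:
            diff += 1 if side == "for" else -1
    if diff > 0:
        return "W"
    if diff < 0:
        return "L"
    return "D"


# The comeback from the comment: 0-2 at 60', 2-2 at 80', 3-2 at full time.
# The exact goal minutes are invented for illustration.
goals = [(20, "against"), (40, "against"), (65, "for"), (75, "for"), (85, "for")]

stints = {
    "starter, off at 60": (0, 60),
    "starter, off at 80": (0, 80),
    "played all 90": (0, 90),
    "sub, on at 60": (60, 90),
}

for label, (on, off) in stints.items():
    share = (off - on) / 90  # share of playing time, for the minutes weighting
    print(f"{label}: {personal_result(on, off, goals)} (share {share:.2f})")
```

The personal result could then be fed through whatever win/draw/loss weights and playing-time share you prefer.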

    anyway…. GOT ANY NEWS ABOUT XAVI???!!! please

  20. Very nice interview with Sergio Busquets in El Pais (Spanish):


    I particularly like the last bit:

    Q: What is the best game you have played?

    A: I don’t know. I played well against Bilbao a few days ago. I even got confused and scored a goal!


    1. I saw that on that con la roja website. I also read some nice words from Del Bosque’s assistant Toni Grande:

      Q: What will the team be like without Xavi?

      A: That’s the problem. There is a player that can take over: Cesc.

      Q: Is that the main concern?

      A: It’s not the only one. We always say the same thing, which is that we hope Busquets doesn’t get injured. He gives us a lot of security. Another player that would be difficult to replace is Puyol, who is all heart and generosity. The truth is that we have a marvelous group. I’ve never seen anything like it.


    1. Inter was less lucky, both Cambiasso and D. Milito were subbed due to (minor) injury problems.
      Anyway, Messi played pretty well but I missed Di Maria and Pastore in the starting eleven. I don’t know why Batista is so much into D’Alessandro; he played horribly. Bolatti was also a disappointment. Argentina played a lot of beautiful football, but there was always someone who then made a bad pass or couldn’t control the ball and thus the move was destroyed…
      Oh, and where were Samuel and Zanetti? Both injured? I really hate to see Demichelis play for Argentina, he is such a high risk factor in EVERY game for both club and country.

  21. I watched the Argentina-Japan match and it was quite even. Japan fully deserved their win. It was not as if Argentina did not try. Messi was quite OK. Milito, I thought, was average/poor along with Demichelis. Also, Romero is OK though definitely not good at his current stage.

    Overall, Argentina’s defence was poor/average (as has been the case in the recent past), midfield was OK (though one could see that with Banega in the side, it will be much better) and attack was OK.

    Japan, on the other hand, were very organized and have got one or two flair players that can do things.

