# Income inequality in professional sports

On his podcast Revisionist History, Malcolm Gladwell talks about the difference between "weak link" and "strong link" sports. A "weak link" sport is one in which the worst player's skill level has a large impact on the team's success. The example Gladwell gives is soccer, in which a long chain of events must go perfectly to score a goal. Getting the ball from one side of the field to a position from which a team can make a successful shot on goal requires a lot of dribbling and passing. Every pass is an opportunity for the opposing team to break the chain, forcing the attacking team to start over from the beginning.

By contrast, basketball is a "strong link" sport [0]. In such a sport, the best player's skill level (rather than the worst) has a large impact on the team's success. A superstar in basketball can take a team to the playoffs almost entirely on his own.

If this "strong link"/"weak link" hypothesis is true and players are compensated in proportion to their contribution to the team's overall success [1], I would expect income inequality to be greater in basketball than in soccer. At this point, I went looking for data.

After some searching, I found Spotrac, which has salary data for the NFL, NBA, MLB, NHL, and MLS. After scraping the site, I had a decent dataset of salaries. First, I looked at a histogram of the salaries:

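```python
import matplotlib.pyplot as plt

# df is the scraped salary DataFrame with 'League' and 'Base Salary' columns
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(16, 10))
axes[1, 2].set_visible(False)  # five leagues, so hide the unused sixth panel

for ndx, league in enumerate(df['League'].unique()):
    league_df = df[df.League == league]
    # ndx % 2 and ndx % 3 send indices 0-4 to five distinct panels, leaving (1, 2) free
    league_df['Base Salary'].plot(kind='hist', ax=axes[ndx % 2, ndx % 3], title='{} salary distribution ({} players)'.format(league, league_df.shape[0]))
```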

Standard deviation quantifies the variation in a distribution, but comparing standard deviations across leagues doesn't make sense because mean salaries differ so much from league to league. Dividing the standard deviation by the mean gives the coefficient of variation (CV), which measures spread relative to the typical salary.

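```python
# Summary statistics of base salary per league
aggregates = df.groupby('League')['Base Salary'].agg(['count', 'mean', 'std', 'median'])
# Coefficient of variation: standard deviation relative to the mean
cv = aggregates['std'] / aggregates['mean']
cv.sort_values().plot(kind='bar', title='Coefficient of variation (std / mean)')
```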
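To see why dividing by the mean makes the leagues comparable, here is a minimal sketch using made-up numbers (not the Spotrac data). Scaling every salary by the same factor changes the standard deviation but leaves the coefficient of variation untouched. The helper name `coefficient_of_variation` and both toy series are my own, purely for illustration.

```python
import pandas as pd


def coefficient_of_variation(salaries: pd.Series) -> float:
    """Sample standard deviation divided by the mean (pandas uses ddof=1)."""
    return salaries.std() / salaries.mean()


# Hypothetical salaries, purely illustrative
small_league = pd.Series([50_000, 60_000, 80_000, 150_000])
big_league = small_league * 20  # same shape of distribution, 20x the pay scale

# The standard deviations differ by a factor of 20...
print(small_league.std(), big_league.std())
# ...but the coefficients of variation match (up to floating-point error)
print(coefficient_of_variation(small_league), coefficient_of_variation(big_league))
```

This is why a league with a much higher average salary does not automatically look more unequal under this measure.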

This tells us that MLS salaries vary most widely and NHL salaries vary the least. Digging deeper into the MLS data, I found that Bastian Schweinsteiger is making \$5,400,000 in base salary, with the next highest salary being Tim Howard's \$2,000,000. Removing just Schweinsteiger would leave the MLS with a CV of around 1.26, which is higher than the NBA's but lower than MLB's.
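The effect of a single outlier on the CV can be checked with a rough sketch like the one below. It uses invented salaries rather than the scraped data, and the helpers `cv` and `cv_without_top` are hypothetical names I'm introducing here, not part of the original analysis.

```python
import pandas as pd


def cv(salaries: pd.Series) -> float:
    """Coefficient of variation: sample std divided by mean."""
    return salaries.std() / salaries.mean()


def cv_without_top(salaries: pd.Series, n: int = 1) -> float:
    """CV after dropping the n largest salaries (assumes unique index labels)."""
    trimmed = salaries.drop(salaries.nlargest(n).index)
    return cv(trimmed)


# Hypothetical league: one star earning far more than everyone else
toy = pd.Series([60_000, 75_000, 90_000, 120_000, 200_000, 5_000_000])

print("CV with the star:   ", cv(toy))
print("CV without the star:", cv_without_top(toy))
```

On the real data, the same idea would be applied to the MLS rows of `df['Base Salary']`. Because the CV is built from the mean and standard deviation, both of which are sensitive to extreme values, one very large contract can move it a lot.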

What have we learned? Among the American professional sports leagues, soccer actually has the most income inequality, and the NBA has the second-least. I think my initial hypothesis is incorrect for two reasons: (1) player contribution to team success is not the only factor in compensation, and (2) teams don't universally believe that basketball and soccer are strong and weak link sports (respectively).

I would be curious to see how these results change if we're looking solely at starters (rather than entire rosters), but that'll have to be another question for another day.

0: Daniel Forsyth provides an interesting analysis of this claim.

1: They aren't, or at least they aren't compensated solely according to this factor. The team owner also gets value out of selling jerseys and other merchandise, which is easier to do for more famous players.