Wednesday, November 3

A Treatise of Analysis in Basketball

As most people throughout history have said of their own era, we live in quite interesting times. With the internet, satellite communication and other mechanisms that have expedited the proliferation of real-time (or at least near-real-time) media, the world today is one in which ideas can be shared freely across the vast expanses of the Earth, then applied, assessed, re-tooled and shot back the other way in relatively short order. It comes as no surprise, then, that the sport of baseball was not able to maintain its grip on the title of "sole purveyor of heightened analysis [which everybody hates] in pro sports" for too long, and the practice--or attempt thereof--of reducing on-court happenings to readily identifiable numbers or equations with tangible meanings has begun in earnest with respect to the NBA.

The practice, however, has been met with much resistance from fans outside the game and those within it alike. Amidst cries of "the game is too dynamic!!" they cast these numbers aside in favor of more subjective remembrances of given players, or at the very least more impressionistic, and thus improper, statistics. The argument against blind, objective statistics in basketball is really rather simple: basketball, unlike baseball, where statistical analysis has flourished, cannot simply be explained as the summation of individual one-on-one match-ups in subsets of the game (in baseball, those individual match-ups make up more than two-thirds of the game). Therefore, to treat every statistic (every additional point, assist, etc.) equally is to treat the in-game context of those statistics equally, which we know from just watching the game itself is definitely not the case: shots may be contested from the right or left by one or more defenders; rebounds may occur near the basket in heavy traffic or outside the paint with little congestion from the opposition; assists are begotten from all manner of passes, from those of relative ease, like a pass from the top of the three-point arc to an uncovered shooter on the wing, to those of seemingly impossible skill, like a no-look dime from a guard hurtling towards the baseline underneath two defenders to a teammate on the opposite low block. Thus, because the context in which these statistics are accrued is always in flux, it is improper--some say impossible--to develop a model based on these statistics that accurately captures and predicts the game of basketball.

This piece is not intended to be the unveiling of some unified statistical model that does capture and predict the NBA brand of basketball, but rather a metaphysical argument against the widespread belief that statistical analysis does not apply to the hardwood. I believe that the heavy resistance that has met statistical analysis in basketball is rooted in the same phenomenon behind the equally tough resistance that similar analysis has met, and been overcoming, in baseball over the last decade: generational lag. At first there was nothing, then something (early statistics) but only in marginal amounts, and now there is a lot of that something, and observers have a hard time reconciling their perception of the game formed decades ago (when there was very little of that "something", if there was any at all) with what we have today, because in their own frame of reference, the game, the way they watched it and the models they used to describe it worked just fine. Furthermore, I contend that statistical models can--and must--co-exist with subjective valuations in the present time, and that a statistical model not unlike the aforementioned unified model is not only possible, but probable.

In his most recent book, The Grand Design, Stephen Hawking creates an interesting thought experiment that keys on how proposed models interact with what they wish to describe, as well as reality itself. Imagine a table sitting in the dining room of your house. Because you don't see the table when you exit the room (unless you have a very perplexing approach to home decor), it is possible to create a model of the world in which the table disappears when the light is turned off and you exit the room. This model, as it turns out, agrees with observation: when you leave the room you do not see the table, and so it is impossible to say that the table is definitely still in the room you just left it in. For the most part, this model holds: when you re-enter the room and turn on the light, you see the table, but as soon as you exit the room you no longer see it. However, the model would be hard-pressed to explain the condition of the table should the roof cave in on the room in which the table normally sits while you were somewhere else. The model predicts that upon re-entering the room, you would see the table sitting neatly atop the rubble of what was once your roof. Obviously this is ludicrous, and in reality upon re-entering the room you would see an equally caved in table sitting beneath the rubble. How does your model of the world explain this? The original thought experiment posed by Hawking was meant to be applied to quantum mechanics, but its message remains the same however it is applied: when met with an observation a given model cannot explain, said model must either be amended with an exception that allows for the observation (the more exceptions, the more incomplete the model), or be abandoned for a better model altogether.

Knowing this, let's create two different models of basketball, both aimed at determining how good a given player is relative to his peers. The Empirical Model of Basketball (Empirical Model) determines player worth only from readily visible things the player in question does on the court in a given game or season. Conversely, the Theoretical Model of Basketball (Theoretical Model) determines player worth only from secondhand statistical accounts of the game. An Empirical Model observer is sent to observe a given player for a single season, while a Theoretical Model observer follows the same player, but from his or her computer at home. To make up for not being able to follow the player firsthand, the Theoretical Model observer is given access to game- and league-wide statistics. The player being observed by both sides is of a special breed: he has the unique ability to hit every other two-point shot he takes, without regard to how contested a given attempt may be. However, once he reaches 21 total shots in a game, perhaps due to a rare psychological disorder related to gambling, his ability to make shots drops to zero and he is completely ineffective. Knowing this, his coach has engineered the offense to get him the ball exactly 21 times per game inside the three-point arc. Observation begins with the player in the current NBA, night in and night out, hitting every other shot he takes up to 21 total shots.

Mid-way through the season, the Empirical and Theoretical observers meet to compare notes on their player. By and large, their notes agree: solidly above-average scorer. However, before their next observation, the commissioner, knowing that fans love scoring in professional sports and looking to line his coffers by boosting scoring in his own, decrees that any personal foul on an offensive player will carry with it a $75,000 fine. As a result, defense around the league effectively ceases. Every player is essentially allowed to take any shot he wants without worrying about the defense contesting it, and shooting percentages skyrocket along with total points per game. The observed player, however, continues making his regular shot after shot, all the way up until 21. After the end of the season, the Empirical and Theoretical observers again meet to compare notes, but this time there are more than a few differences: throughout the entire season, the Empirical observer watched and graded out their player as a consistent, above-average scoring talent (keep in mind that Empirical views are based only on what the observer can see happen on the court with respect to the player in question), whereas the Theoretical observer reports the player to have been a solidly above-average scoring talent for one part of the season, but a wholly inefficient, below-average scorer for the remainder, resulting in just an average scoring talent overall.

Who has the better model of basketball reality? It is plainly obvious that the Theoretical Model is more apt to tackle a reality where the background environment is changing, and the Empirical Model is ineffective at describing how good a player is after a shift in environment, because the object of the Empirical Model is to watch and judge the player, not the environment. Therefore, the Theoretical Model can explain this reality through its own devices, whereas the Empirical Model necessitates a re-branding, an exception, if you will, to continue on judging the player accurately.
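
To make the Theoretical observer's reasoning concrete, here is a minimal Python sketch of the arithmetic. The league-wide shooting percentages (0.45 before the fine, 0.70 after) are made-up numbers for illustration only, as is the assumption that the player's first shot of the night goes in; the function names are hypothetical too.

```python
# Sketch of the Theoretical observer's arithmetic. The league percentages
# are invented for illustration; the thought experiment gives no figures.

ATTEMPTS_PER_GAME = 21  # from the thought experiment


def made_shots(attempts: int) -> int:
    """Makes for a player who hits every other shot.

    Assumption: the first shot of the game goes in, so an odd number of
    attempts yields one more make than miss.
    """
    return (attempts + 1) // 2


def relative_efficiency(player_pct: float, league_pct: float) -> float:
    """Player's shooting percentage as a multiple of the league average."""
    return player_pct / league_pct


# The player's own percentage is identical in both halves of the season.
player_pct = made_shots(ATTEMPTS_PER_GAME) / ATTEMPTS_PER_GAME

LEAGUE_PCT_BEFORE_FINE = 0.45  # hypothetical league average with defense
LEAGUE_PCT_AFTER_FINE = 0.70   # hypothetical league average without defense

before = relative_efficiency(player_pct, LEAGUE_PCT_BEFORE_FINE)
after = relative_efficiency(player_pct, LEAGUE_PCT_AFTER_FINE)

print(f"player shooting percentage: {player_pct:.3f}")
print(f"relative to league before the fine: {before:.2f}x")
print(f"relative to league after the fine:  {after:.2f}x")
```

The player's own percentage never moves, which is all the Empirical observer sees. A ratio above 1 before the fine and below 1 after it is what the Theoretical observer ends up reporting.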

In this example, just as in actuality, there are portions of reality that the Empirical Model fails to account for, or is at the very least inept at explaining. Therefore, it is not unreasonable to conclude that any model that claims to have "solved" basketball must include mechanisms with which we can account for things a strictly empirical approach glosses over. And since we have already observed a model with a theoretical framework accounting for things left unanswered by the Empirical Model (the flux of the environment), we can amend the above conclusion: a model for basketball must, on some level, hinge on a theoretical framework. And while it might not be possible yet to gauge the degree to which a theoretical framework should be involved, a sense of it can be grasped by looking at two areas in which theoretical models can be (and should be, to a certain extent) applied exclusively: retrospective player comparison and player forecasting. For the former, an obvious need for a theoretical framework arises from the simple realization that there is no way to go back in time to compare two players from different eras side by side, and that there is little hope of all observers agreeing on subjective valuations, especially ones made years, or even decades, ago. As for the latter, another simple realization reveals the need for at least a partial basis in statistics: just as there is no going back in time, there is no going forward in time to read the future and report back. No one knows what will happen (outside of a few things, like death and taxes); all we can do is hedge our bets based upon what has happened and the associated probabilities.

Up until now, the primary focus has been on the need for a theoretical (or statistical) framework for basketball, but this raises a question that still requires an answer: are there constructions of basketball reality in which the Theoretical Model breaks down? Of course: imagine the same situation as above--same player, same ability, same observers and same goal--but this time after the Theoretical and Empirical observers meet, the rules are changed such that every basket made from inside the arc is worth three points, and every basket sunk from behind it is worth only two. At the end of the season, without making a few exceptions, the Theoretical Model will be unable to explain the observed player's sudden ability to hit three-pointers, or how he became such a prolific scorer, whereas the Empirical Model is able to make the rule change fit under the confines of its observational rules rather easily.
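
The Theoretical observer's confusion can be sketched in a few lines of Python. Assume, hypothetically, that the observer sees only makes and points in the box score and still reads them under the old scoring rules, where a basket is worth two or three points. Solving twos + threes = makes together with 2*twos + 3*threes = points gives threes = points - 2*makes. The figure of 11 makes per game assumes the first shot of the night goes in, and the function name is illustrative.

```python
# Sketch: a box-score reader who still assumes the OLD scoring rules
# (two points inside the arc, three behind it). Names are illustrative.

MAKES_PER_GAME = 11  # assumption: 21 attempts, every other one made, first one in


def inferred_threes(points: int, makes: int) -> int:
    """Three-pointers implied by a box score under the old rules.

    From twos + threes = makes and 2*twos + 3*threes = points,
    it follows that threes = points - 2*makes.
    """
    return points - 2 * makes


# Every shot is taken inside the arc in both halves of the season.
points_old_rules = MAKES_PER_GAME * 2  # inside-the-arc basket worth two
points_new_rules = MAKES_PER_GAME * 3  # inside-the-arc basket now worth three

print("implied threes, first half: ", inferred_threes(points_old_rules, MAKES_PER_GAME))
print("implied threes, second half:", inferred_threes(points_new_rules, MAKES_PER_GAME))
```

Read this way, the second-half box score appears to show a player who makes nothing but three-pointers, even though he never left the area inside the arc, something the Empirical observer can see at a glance.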

As we can see, there are constructions of basketball reality that degrade both models to the point of needing an exception to explain the current goings-on. Therefore, it would be a better idea not to separate the two models and adjust them individually with the goal of explaining everything, but to join them together in order to explain through experience what statistics can't explain, and explain through numbers what simply watching and trying to remember cannot accomplish. Of course, some liberties were taken with each respective model: the keen Empirical observer would note something was amiss after everyone in the league simply stopped playing defense, and attempt to factor that into their observations. Similarly, a slick-minded Theoretical observer could conceivably rework his or her equations on the fly to account for the difference in points garnered per basket, and the model would hold true once again. So while there may be differences in how things actually play out for those of a more empirical or theoretical mind, the idea is still the same: regardless of how close a given model gets to perfectly capturing and predicting the game of basketball, if it subscribes wholeheartedly to one side of the debate over the other, it is imperfect, as it undoubtedly can be faced with a formation of reality that causes it to break down.

Thus, it is imperative that a complete model for basketball also include workings that agree empirically. This may seem to contradict another rule developed earlier, but the two actually go together well: a complete model for basketball must have statistical workings that agree with observation and experience. This is not to say that any contradiction between the statistical and empirical portions of a model is grounds to scrap the whole thing and start over. Rather, these contradictions will have the opposite effect: they will challenge the model to reconcile the differences between the accounts, perhaps along the way revealing previously held notions about either side to be false.

What is actually being discussed here is the true meshing of statistical analysis and subjective accounts. All the way up to now, it has been assumed that these two sides can never agree, that one must choose one side or the other or dare an attempt at walking the line between subjective and objective accounts of the same happenings. However, each side, taken to the extreme, reaches a point at which the two previously divided sides have indeed become one. Imagine a basketball world where everything that can have a number associated with it gets written down and logged: the ambient air temperature, the barometric pressure, the weight distribution of a player as he takes a free throw, the force exerted by a player's leg as he jumps for a dunk, the ultimate height above the floor reached by a player going for a rebound, the angular velocity and acceleration imparted to the ball during a shot, the speed and angle the ball takes to the hoop given a certain shot style; everything. At that point, gathering data has become the equivalent of watching the game. Similarly, imagine taking extremely detailed notes about all the happenings on the court in a given game, from the exact spot on the floor each shot was taken to the path the ball followed to the rim, and so on. As the notes on a given game become more and more detailed, they become more and more like individual statistics, and at that point, watching the game is the same as gathering data.

Therefore, there need not be an arbitrary division between statistics and experience, and indeed, eventually that barrier is obliterated completely and the two become one. So I say more stats and more detailed play-by-play data. I call for the installation of QuesTec-type cameras (the ones used to keep tabs on the umpires in baseball) in all NBA arenas to track player movement, position, velocity, ball movement, shot selection, shot angle and so forth. Reduction of the game into a series of numbers is not only possible, but probable given the technology at our disposal. With all that information in hand, there is no doubt that a near-infallible model of basketball isn't too far off.
