Human beings are the ultimate black boxes

I find that much of the broad media discussion of machine learning models, algorithmic decision making, and artificial intelligence is written by commentators who are poorly informed on the subject. A sense of fear imbues a good deal of the pop literature, raising concerns about the role algorithms play in our lives. I do believe this is an important topic. We covered some of these ideas in episodes like Auditing Algorithms with Christian Sandvig and Predictive Policing with Kristian Lum. It's also a central theme of the widely discussed Weapons of Math Destruction by Cathy O'Neil.

It's important that our society has active discussions about the ways in which we want algorithms to play a role in our government, our society, and our lives. These boundaries will be hotly debated, but ultimately they need to be set as a group, and I suspect they will drift over time as well. Whatever consensus we arrive at about where we do and do not accept the use of algorithms, the discussion should be driven by logic and evidence.

There's an argument I frequently hear condemning algorithmic solutions to problems: it's bad because it's a black box. The claim is that certain processes for creating models (deep learning being the main punching bag) are too complicated for anyone, even their creators, to explain how they work. More than a few sensationalized headlines have described how baffled scientists are, and how they have no idea how their own neural networks work.

Hogwash!

Describing certain complex models as black boxes is not entirely a bad practice. It does convey a certain truth about the creator's understanding of the internal mechanics of the model. A logistic regression is often praised for its interpretability. The weights the model places on each input variable are readable. Their magnitudes tell us the value the model places on the information. Their signs tell us whether the information is positively or negatively correlated with the result we're attempting to predict.
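
As a minimal sketch of that readability (assuming scikit-learn and entirely synthetic, hypothetical loan features), the coefficients of a fitted logistic regression can be read off directly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature names; the data is a synthetic stand-in.
feature_names = ["income", "debt_ratio", "late_payments"]
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = (X[:, 1] + 0.5 * X[:, 2] > 0.75).astype(int)

model = LogisticRegression().fit(X, y)

# Each coefficient is directly readable: its sign gives the direction of the
# relationship, its magnitude the weight placed on that input.
for name, weight in zip(feature_names, model.coef_[0]):
    direction = "raises" if weight > 0 else "lowers"
    print(f"{name}: {weight:+.3f} ({direction} the predicted log-odds)")
```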

Popular techniques like the random forest have created a demand for some notion of interpretability, and the ubiquitous "feature importance" fills that need. It's not something derived from first principles; it's just a generally nice procedure that acts as a reasonable diagnostic and a hint at which inputs most influence how the model makes decisions.
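
A rough sketch of that diagnostic, again assuming scikit-learn and the same kind of hypothetical features on synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["income", "debt_ratio", "late_payments"]
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = (X[:, 1] + 0.5 * X[:, 2] > 0.75).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is a diagnostic, not a first-principles explanation:
# it hints at which inputs most influence the model's decisions.
for name, importance in sorted(zip(feature_names, forest.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name}: {importance:.3f}")
```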

In deep learning, new visualization techniques, tools, and methods appear frequently in this time of innovation. The very idea of style transfer emerged precisely because researchers were doing deep inspection of exactly how their so-called black boxes worked.

Granted, the systems data scientists are creating today have grown to a level of complexity where a definitive explanation, one that is both precise and abstract, of how the network accomplishes its objective is going to be a difficult achievement. This is less about the complexity of the networks themselves and more about the complexity of the classes of problems they take on.

If you'd like to know how an image recognition algorithm can recognize one image as a beach photo and another as a trombone, the creator of that neural network can give you a perfectly explanatory answer, that is, if you'll accept an answer expressed as the set of tensors that represent all the many parameters of the system and the operations performed on them. With this information one could trace any decision in its entirety and provide an arithmetic description of why the machine chose the answer it chose. But this is a bit like a physicist explaining why World War II occurred based on a discussion of the superpositions of all the electrons and protons on Earth at the time, with no mention of politics whatsoever.

A suite of tools and ideas is emerging for how we can inspect these so-called black-box models. We're putting together a more complete list of research in this area for a future post, but in the meantime, our interview with Marco Tulio Ribeiro on LIME covers a related topic. For the sake of argument, though, let's assume all efforts into model interpretability and understanding are eventually abandoned entirely and regarded as a complete failure. What happens in a hypothetical world in which we accept that black boxes have sealed lids?

Understanding how something works and confirming that it works as described are two completely different ideas. As a committed skeptic, I'm reminded of the million dollar prize offered by the James Randi Educational Foundation (JREF) to anyone who can provide empirical evidence of anything paranormal. Despite many tests to date, no one has yet claimed the prize by successfully demonstrating supernatural abilities. A typical applicant might be a psychic who claims they can see the future or read another person's mind. By what mechanism do they accomplish this feat? Strangely, the claimants rarely have much ability to describe how their powers work. Regardless, a protocol can be put in place to establish whether or not the mind reader can perform a mind reading task at a rate better than chance. Outlandish claims of abilities that defy all scientific understanding are still testable. The same is true of deep learning models.
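
As a hedged illustration of such a "better than chance" protocol, assuming SciPy and a made-up test of 200 four-choice trials, a simple binomial test settles the question without any theory of how the claimed power works:

```python
from scipy.stats import binomtest

# Hypothetical protocol: 200 trials guessing which of 4 hidden cards was
# drawn, so chance accuracy is 0.25. Suppose the claimant got 62 correct.
result = binomtest(k=62, n=200, p=0.25, alternative="greater")
print(f"Probability of doing this well by chance alone: {result.pvalue:.4f}")
```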

When someone produces a deep learning model, the model itself is a large collection of weights and other metadata that describe its structure. Just numbers, really. The network is not particularly interesting until its creator puts forward a claim about what the network can do and the accuracy with which it can do it. Perhaps a real estate startup will claim they can guess the selling price of a home within $5k, 99% of the time, from nothing more than a photograph. How does it work? No one knows. Can we test it? Absolutely. In fact, just as the JREF constructs well-designed experiments to measure claims of the paranormal, so too should skeptical data scientists conduct empirical evaluations, perhaps constructing deliberately difficult edge cases in an attempt to falsify the creator's claim.
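
Here's a minimal sketch of what such a falsification attempt might look like, where `predict_price_from_photo`, `vendor_model`, and the holdout variables are hypothetical stand-ins for the startup's black box and your own audit data:

```python
import numpy as np

def evaluate_claim(photos, true_prices, predict_price_from_photo,
                   tolerance=5_000, claimed_rate=0.99):
    """Check the vendor's claim on a held-out set they have never seen."""
    predictions = np.array([predict_price_from_photo(p) for p in photos])
    within_tolerance = np.abs(predictions - np.asarray(true_prices)) <= tolerance
    observed_rate = within_tolerance.mean()
    return observed_rate, observed_rate >= claimed_rate

# Usage, where the holdout set deliberately includes difficult edge cases
# (e.g. recently renovated interiors behind plain facades):
# rate, claim_holds = evaluate_claim(holdout_photos, holdout_prices, vendor_model)
```

The point is that nothing in this evaluation depends on understanding how the model works internally; only the claim itself is under test.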

Another casual heuristic I like to apply is asking myself what I would expect from a human expert. The aforementioned real estate startup claims a model that predicts home prices within $5k from pictures. Do I expect an appraiser could achieve such low variance? No, I certainly do not. The photo, while useful, does not contain enough information to fully predict the price of the home. Important considerations like taxes, school district, state of repair, and other details are not directly observable or inferable from the photo. While the photo does have some predictive power, there's absolutely a ceiling on it, and that ceiling is (I expect) much lower than the dubious $5k claim.

Many concerns have been raised about algorithms which could develop an unintended bias. What if a prediction tool used in the court system was built on training data embedded with systematic racism? Would the algorithm learn the racism from the training data? Indeed, without proper controls, it would! This is a very real concern that machine learning researchers should always be cognizant of. While dating sites might include race as an input to an algorithm due to the demands of their users, a bank considering a loan would never include race as an input variable. I can say this with deep confidence. If a bank were committed to being racist, there are easier ways to do it than by including a noisy variable in a machine learning model. It would be an obvious and embarrassing rookie mistake for any ML practitioner to make. Yet the bank might include current zipcode without much forethought, and that variable can correlate with race. It's hard to estimate the amount of bias that could be introduced by proxy this way. It really depends on how much implicit racism exists in the training data and the mutual information between race and zipcode.
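
As one hedged way to quantify that kind of proxy leakage, assuming scikit-learn and a purely synthetic audit sample, the mutual information between zipcode and race can be measured directly:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Synthetic stand-in for an audit sample: zipcode partially determines race
# in this toy data, loosely mimicking residential segregation.
rng = np.random.default_rng(0)
zipcode = rng.integers(0, 50, size=10_000)
race = (zipcode % 5 + rng.integers(0, 2, size=10_000)) % 5

# Mutual information (in nats) between the proxy and the protected attribute;
# values near zero suggest little information leaks through the proxy.
print(f"I(zipcode; race) = {mutual_info_score(zipcode, race):.3f} nats")
```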

Ok, so it seems there's no perfect solution for ensuring that our algorithms are not in any way influenced by gender, race, age, or any other protected social class. Does that mean we should abandon algorithmic approaches? Some say yes. But if we're willing to accept that our training data reflects discriminatory choices made by humans, then we must also be willing to accept that the human beings making those decisions practice discrimination. So are they any better than the algorithms that might inadvertently learn to mimic their bigotry?

No, in fact, they're worse. A model is a deterministic process. At the exact moment a model makes a decision (hire or don't hire), the exact state of the model can be captured. At any time in the future, we can perfectly restore that state and reproduce its decision making process. It can be traced and audited. A human being, on the other hand, is not directly inspectable with modern technology. I don't think it's particularly controversial to claim that people's professional choices are, to some degree, influenced by outside factors. I've heard (but not verified) that judges issue lighter sentences immediately after lunch. The human is certainly going to exhibit more variance than any machine will.
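
A minimal sketch of that auditability, assuming a scikit-learn model and synthetic stand-in data: archive the exact model state next to each decision, and the decision can be replayed bit-for-bit later.

```python
import hashlib
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hiring model trained on synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X.sum(axis=1) > 2).astype(int)
model = LogisticRegression().fit(X, y)

# Archive the exact model state alongside one decision.
applicant = X[0]
model_blob = pickle.dumps(model)
record = {
    "model_sha256": hashlib.sha256(model_blob).hexdigest(),
    "inputs": applicant.tolist(),
    "decision": int(model.predict([applicant])[0]),
}

# Any time later, the archived state replays the identical decision exactly.
replayed_model = pickle.loads(model_blob)
assert int(replayed_model.predict([applicant])[0]) == record["decision"]
```

No equivalent snapshot exists for the hiring manager who interviewed the applicant before lunch.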

Let's assume a judge, deep down, is a racist person, and allows that bias to influence their sentencing choices despite the illegality of such a practice. Sentencing is a complicated process that could be argued to include infinite degrees of freedom. Leniency could be applied rather arbitrarily by a human judge. Should this hypothetical racist judge be asked to explain an imbalance in the severity of their sentences, it seems quite plausible they could create post hoc rationalizations for each and every decision, rationalizations that are utterly convincing while never referencing race, gender, sexual preference, etc. I suspect it would be easy to obfuscate one's prejudice.

The machine, on the other hand, which we worry might have inadvertently developed an undesirable bias, is inspectable. Better yet, it could be open source. Either way, it can be audited. The more important the decision, the larger the audience of people who will pay attention to the mechanism of that model. While a concise, abstract description of how a model performs its task might be a matter of some debate due to the inherent complexity of the system, that debate can occur in a fully reproducible way for a machine decision. This is not possible for a human decision.

It's true that the phrase black box can be an accurate label for some state-of-the-art machine learning solutions. While promising work exists to help us probe and understand the inner workings of these models, it seems likely that the effectiveness of certain models can only be established empirically. These tools might only take us from black-box models to shades-of-gray models; there's no guarantee that every useful model will have an elegant explanatory story, and I believe that's an unnecessarily high bar to set for taking advantage of useful models. Regardless of their complexity, the inspectability of machine learning solutions means they will always be less of a black box than human decision makers. While we should be vigilant about the systems we allow to make decisions for us, we should be embracing these tools instead of fearing them, since the black-box stigma applies even more strongly to human decision makers.