Imagine this hypothetical situation: A portfolio of historical product transactions, together with a complete set of customer demographics, is fed into a machine learning model. The model yields one of the following results for each customer/product combination:
• “Recommend this product”
• “Do not recommend this product”
Statistical analysis suggests that the model is performing well, the user acceptance testing is successful, and the company agrees to put it into production. During the demo, a stakeholder standing at the back of the room asks a simple question: “Why did the model recommend product A?”
It’s amazing how quickly such a simple question can silence a room. When the model says go for product A, there must be a reason, right?
There is indeed a reason why the model opted to recommend the product. The question is, can it be explained using succinct language? The model employs layers of intricate mathematical algorithms that eventually spit out a single outcome. The outcome is easy enough to understand, but try explaining those algorithms to a human.
Unless someone has a Ph.D. in mathematics, it just isn’t possible to digest the complex logic undertaken by the algorithms. Even if it were possible, would it be worth translating for business stakeholders using business language? Would it be trusted?
That being said, some models are indeed more revealing than others. Take a decision tree, for example, which clearly shows each branch and why a particular decision was made. However, turn a decision tree into a random forest algorithm and the transparency suddenly disappears.
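To make that contrast concrete, here is a minimal sketch using scikit-learn and its bundled Iris dataset (both are illustrative assumptions, not the product-recommendation scenario above). The single decision tree can print every branch it follows, while a random forest built on the same data only surfaces an aggregate importance score per feature.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small, well-known dataset purely for illustration.
data = load_iris()
X, y = data.data, data.target

# A single decision tree can print every branch it uses to reach a decision.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))

# A random forest averages many such trees. Out of the box, the closest it
# offers to an explanation is an aggregate importance score per feature,
# with nothing about the path taken for any individual prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, importance in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

The tree’s printout reads like a flowchart a stakeholder could follow; the forest’s numbers say which columns mattered on average, but not why any particular prediction came out the way it did.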
How many data scientists are opting to use standalone decision tree models? Not many, now that far more powerful models are available. Random forest and many other popular machine learning models are considered black boxes: their decision-making steps are not revealed to the human.
The model provides a decision and we are simply asked to trust it. Is that OK?
In the earlier example, perhaps not. However, it does depend on what use case the model is being applied to. There are numerous scenarios where black box models could fit.
Let’s look at some use cases where black box models are acceptable.
Safe use cases
Scheduling teacher-pupil classes, classifying pet images and scheduling sports matches are all examples that can be undertaken without too much scrutiny of the inner workings of the model. If the lesson plan works for the teachers, do they need to know why Mrs. Willis teaches music to fourth-graders on a Wednesday afternoon? Do we need to know why our deep learning model correctly labelled a cat as a cat? Our eyes can verify that.
Similar principles can be applied for sports. If the football association is satisfied that Liverpool is playing Manchester United at the right time, it doesn’t matter why the model chose to schedule the match in December.
Each of these use cases also has one thing in common: The price of being wrong occasionally is not very significant.
If teachers are not happy with the assigned class schedule, they can simply run the algorithm again. People will not lose their jobs if a model incorrectly labels one out of every 100 images as a dog. As for football matches, a human can alter the schedule if they really want to.
Now let’s look at some of the not-so-safe use cases.
Not-so-safe use cases
Black boxes are not appropriate for use cases with little margin for error, like policing or any public sector body that strives to keep our citizens safe and well. Erroneous decisions can cost lives, damage reputations and end up on the front page of a national newspaper.
Suddenly, the demand to know why a model is making a decision is self-evident.
To walk into a staff meeting and say that Suspect A has an 80% probability of committing a crime according to a convolutional neural network (CNN) just isn’t good enough. Who would put their neck on the line for a computer that won’t even reveal its inner workings?
There is a pressing need to audit everything that happens in high-pressure public domains such as policing, healthcare, law and the military. All steps are logged, and every decision ever made is accessible somewhere. Someone must be accountable, and integrity and on-the-job expertise are valued above all.
It’s no wonder that the authorities get nervous when AI products are pitched to these sectors.
So, are we missing a trick with AI? Lots of effort has gone into making models more powerful and accurate, but has accuracy been prioritized at the expense of transparency? It would appear so.
Popular ML algorithms in a data scientist’s arsenal (such as Random Forest, XGBoost and CNNs) all have one thing in common: a lack of transparency. Are they powerful? Yes, but they offer little to those who want to know why a decision was made.
If AI is to expand into domains with intense scrutiny and little room for error, there must be more research and development into algorithms that have an additional “why” dimension.
Showing police officers or military personnel a mean squared error or set of coefficients won’t even scratch the surface. What we need is a clear audit trail of why the algorithm interpreted the data in a particular way, and how the subsequent decisions were arrived at. We expect humans employed in these sectors to act with complete integrity and transparency, and the same should be expected from an algorithm.
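As a rough illustration of what such an audit trail could look like, here is a hypothetical sketch in Python: every prediction is logged alongside the inputs, a model version and a plain-language breakdown of which features pushed the score up or down. The feature names, the made-up training data and the simple logistic-regression model are all assumptions for the sake of the example, not a description of any real policing or healthcare system.

```python
import json
from datetime import datetime, timezone

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative feature names only; any real deployment would define its own.
feature_names = ["prior_reports", "days_since_last_incident", "area_risk_index"]

# Train a deliberately simple, interpretable model on made-up data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 2] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

def predict_with_audit(x):
    """Return a probability plus a logged entry explaining how it was reached."""
    proba = model.predict_proba([x])[0, 1]
    # For a linear model, coefficient * feature value is an exact account
    # of each feature's contribution to the log-odds.
    contributions = dict(zip(feature_names, model.coef_[0] * np.asarray(x)))
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": "logreg-demo-0.1",  # assumed versioning scheme
        "inputs": dict(zip(feature_names, x)),
        "probability": round(float(proba), 3),
        "log_odds_contributions": {k: round(float(v), 3) for k, v in contributions.items()},
    }
    print(json.dumps(entry, indent=2))  # in practice, written to an audit store
    return proba

predict_with_audit([1.2, -0.4, 0.8])
```

With a deliberately simple linear model, that breakdown is exact; a black box model could only fill that field with an approximation, which is precisely the gap described above.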
Where do we go next?
Not all domains require us to know what the AI models are doing. A clean set of outputs shared with stakeholders may suffice. For others, the need to know what is going on is not just a nice-to-have, but an absolute must-have to give AI any form of credibility.
As AI evolves and becomes a part of our day-to-day lives, it’s imperative that transparent models are readily available. Otherwise, we run the risk of some of our most critical services opting to not use AI altogether — and what a waste that would be.
The next step in the evolution of AI is not to increase its power, but to make its decisions easily explainable in plain human language. After all, humans are the decision makers, right?