Suppose you're given a business problem: predict the outcome of a business process, such as
"Whether an order will be fulfilled on time or not"
In other words, given an order with an order-date and an expected-delivery-date, you need to predict whether the order will be delivered by its expected-delivery-date and, if possible, predict the actual-delivery-date.
Given the nature of the problem statement, and what we have been reading and training on about predictive analytics these days, most of us tend to think along the following lines
If you have delivered (or have simply worked on) various enterprise-grade solutions before, you may also be wondering about
And most importantly, after you have delivered a solution
Now, suppose a business user has two candidate solutions to the business problem stated above.
First Solution
The first solution claims that its output is highly accurate, but it cannot explain how it arrived at a specific output. For example, its output looks like this:
| Order Id | Probability/Confidence Score | Is On Time? |
|----------|------------------------------|-------------|
| O1       | 0.7                          | Yes         |
| O2       | 0.2                          | No          |
| O3       | 0.3                          | No          |
| O4       | 0.8                          | Yes         |
Obviously, the product team would have found the models release-worthy after testing and fine-tuning them against training and test data. However, business users face the following dilemma, which the output shown above leaves unaddressed:
How is this specific confidence score calculated? How can I blindly trust the output of this black box?
If I don't know how the value is calculated, what kind of feedback can I provide (other than "the prediction is right or wrong") to improve the system?
Let's understand why it is important for business users to know how a confidence score is calculated and what its significance is.
Please remember that your system's recommendations will affect the future values of the "Is On Time" column, since your client may take steps offline to improve the situation, such as breaking a large order into smaller ones. This means a bad recommendation from your system can lead to bad recommendations from your client to their suppliers, deepening their trust issues with your black-box system.
So, if your client is willing to consider your proposal after learning about your system's explainability limitations, they may still have a few doubts, such as
Second Solution
Now imagine the second solution, which claims high explainability, potentially at the cost of some accuracy. It tells you how each confidence score is calculated, for example:
| Order Id | Probability/Confidence Score | Is On Time? | Notes |
|----------|------------------------------|-------------|-------|
| O1       | 0.9                          | Yes         | +0.4 last 10 out of 10 orders delivered on time; +0.2 gap between expected and actual delivery date is moderate; +0.3 overall on-time delivery ratio is 75% |
| O2       | 0.2                          | No          | +0.1 last 5 out of 10 orders delivered on time; -0.2 gap between expected and actual delivery date is highly negative; +0.3 overall on-time delivery ratio is 75% |
| O3       | 0.3                          | No          | +0.2 last 5 out of 10 orders delivered on time; -0.1 gap between expected and actual delivery date is slightly negative; +0.2 overall on-time delivery ratio is 50% |
| O4       | 0.8                          | Yes         | +0.4 last 8 out of 10 orders delivered on time; +0.2 gap between expected and actual delivery date is moderate; +0.2 overall on-time delivery ratio is 60% |
Or something like
| Order Id | Probability/Confidence Score | Is On Time? | Factors that affect positively | Factors that affect negatively |
|----------|------------------------------|-------------|--------------------------------|--------------------------------|
| O1       | 0.9                          | Yes         |                                |                                |
| O2       | 0.2                          | No          |                                |                                |
| O3       | 0.3                          | No          |                                |                                |
| O4       | 0.8                          | Yes         |                                |                                |
Can such a system be implemented?
The answer is: yes. If the number and type of explanations/messages are finite, the system can be implemented programmatically.
The following is an example of how it can be implemented, even with a tech stack you already have expertise in.
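As a minimal sketch of this idea (the parameter names, weights, and the 0.5 decision threshold below are illustrative assumptions, not the article's actual system), each factor can contribute a numeric score plus a human-readable note, and the confidence score is simply their sum:

```python
# Sketch of an explainable scoring engine: every contribution carries a note,
# so the "Notes" column falls out of the calculation for free.
# Weights (0.4, 0.4) and the 0.5 threshold are illustrative assumptions.

def score_order(recent_on_time, recent_total, date_gap_factor, overall_ratio):
    """Return (confidence, is_on_time, notes) for a single order."""
    contributions = [
        (round(0.4 * recent_on_time / recent_total, 2),
         f"last {recent_on_time} out of {recent_total} orders delivered on time"),
        (round(date_gap_factor, 2),
         "gap between expected and actual delivery date"),
        (round(0.4 * overall_ratio, 2),
         f"overall on-time delivery ratio is {overall_ratio:.0%}"),
    ]
    confidence = round(sum(score for score, _ in contributions), 2)
    notes = "; ".join(f"{score:+} {msg}" for score, msg in contributions)
    return confidence, "Yes" if confidence >= 0.5 else "No", notes

# A supplier with a perfect recent record, a moderate date gap, and a 75%
# overall ratio reproduces row O1 of the table above:
conf, on_time, notes = score_order(10, 10, 0.2, 0.75)  # → 0.9, "Yes"
```

Because the set of notes is finite and each one is tied to a concrete contribution, the explanation column never drifts out of sync with the score itself.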
For the sake of illustration, in the example shown above, the confidence score is a simple sum of the values of three parameters derived from historical data.
Dozens of such parameters can be pre-calculated, and, based on the region, some parameters can be given more (or less) weight, such as
Confidence score = 50% × PTen + 25% × Diff + 25% × POverall
More complex rules can be created like
More parameters (higher in number and complexity) can be introduced into the equation without impacting the output of the existing parameters, such as
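A hedged sketch of the per-region weighting idea (the region names and weight values here are invented for illustration): keeping the weights in a lookup table means a new region, or a re-tuned weight, never touches the parameter calculations themselves.

```python
# Region-specific weights for the same equation; region names and values
# are made up for illustration.
REGION_WEIGHTS = {
    "default": {"p_ten": 0.50, "diff": 0.25, "p_overall": 0.25},
    "APAC":    {"p_ten": 0.40, "diff": 0.35, "p_overall": 0.25},
}

def confidence(region, p_ten, diff, p_overall):
    """Weighted sum of pre-calculated parameters, with a per-region fallback."""
    w = REGION_WEIGHTS.get(region, REGION_WEIGHTS["default"])
    return round(w["p_ten"] * p_ten
                 + w["diff"] * diff
                 + w["p_overall"] * p_overall, 2)
```

Adding a fourth parameter is then a matter of extending the weight table and the sum; the existing parameters and their pre-calculated values are untouched.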
How does this approach help your system?
Observe that by pre-calculating the parameter values for all suppliers, categories, and regions, and by reducing the calculation to a simple algebraic equation, you have taken the data out of your model and made your system scalable and extensible, since
Does this approach ease your client's dilemma?
Also, let's analyze how the second solution impacts the dilemma of your client.
Since the system explains how the confidence score is calculated, the client knows which data points have been considered and, more importantly, which have not. The client can then share extra data points if they have them, or add more derived information into the equation.
The "Notes" column can explain the difference between two identical confidence scores of 0, since a 0 can result either from two parameter values offsetting each other or from your system having no history at all for a certain region/category.
Still no, since you cannot claim that a single equation will fit all regions/categories. But you can run faster what-if analyses of different models against different regions/categories and tell the system which equation is better suited to which regions/categories.
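The what-if analysis mentioned above could be sketched like this (the candidate weight sets, threshold, and data shape are assumptions for illustration): replay a region's historical orders through each candidate equation and keep the one that predicts past outcomes best.

```python
# Sketch of a per-region what-if analysis. Candidate weights and the 0.5
# threshold are illustrative assumptions.
CANDIDATES = {
    "eq_a": lambda p: 0.50 * p["p_ten"] + 0.25 * p["diff"] + 0.25 * p["p_overall"],
    "eq_b": lambda p: 0.34 * p["p_ten"] + 0.33 * p["diff"] + 0.33 * p["p_overall"],
}

def best_equation(history):
    """history: list of (params_dict, was_on_time) pairs for one region.
    Returns the name of the candidate equation with the best hit rate."""
    def accuracy(eq):
        hits = sum((eq(params) >= 0.5) == actual for params, actual in history)
        return hits / len(history)
    return max(CANDIDATES, key=lambda name: accuracy(CANDIDATES[name]))
```

Because the parameters are pre-calculated, such a replay is just arithmetic over stored rows, which is what makes the analysis fast.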
Conclusion
I am inclined to believe that the second system is more transparent and hackable. Because there is transparency in how the system makes predictions and recommendations, it will earn business users' trust faster than the first system, which will have a long-term impact on how business users provide feedback to your product team, thereby smoothing the path to product maturity.
Please note that the significance of explainability vs. accuracy may vary across business scenarios, such as Hotstar/Netflix trying to predict whether a user is likely to continue their membership, or what kind of videos should be promoted to a certain user. It may not be cost-effective (if not pointless) to pre-calculate more than a few parameters (factors that affect the likelihood of continuing membership) for millions of users.
I hope this article has given you a different perspective, as well as a starting point, on what you can do about riding this new age-of-AI wave.