Role of communication in machine learning

The most important part of machine learning? Communicating with all stakeholders and removing the barriers to that communication.

For one of our clients, we built a random forest model that predicted the lifetime value of each user. And, as with most models, we knew that communication would be key.
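As a rough illustration only (the file name, feature set, and parameters here are hypothetical, not the client's actual setup), such a model might look like this:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical user-level data: one row per user with behavioral features
# and the observed lifetime value we want to predict.
users = pd.read_csv("users.csv")
X = users.drop(columns=["lifetime_value"])
y = users["lifetime_value"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestRegressor(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

predicted_ltv = model.predict(X_test)  # per-user LTV predictions
```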

Communication from the start

Ok, so first of all: communication is never an afterthought. You want to be communicating about your model to stakeholders from the very start.

Do I need a model?

Please, please, please: do a cost-benefit analysis before building a model. I know that as data scientists, we are excited by the complexity and learning involved in executing a machine learning project. But realize that the gains from a machine learning model must be large enough to justify the long time it takes to build one. So seek the simplest solution first. Avoid machine learning at the outset and see if a much simpler approach would suffice. Only if the problem is important and can't be solved without machine learning should you embark on a machine learning project.

Communication barrier

As stakeholders listen to you, they have one thing on their mind: "How does this affect or concern my business?" So tailor your information accordingly. Also, business stakeholders don't have the technical background that you have, so here is what your detailed explanation of a random forest algorithm sounds like to them: "bla bla lalala bla".

So, does this mean not talking about the model at all and just stating its error? Well, no. The goal is to abstract away from the model to the point that it is understandable by anyone. There are two ways of explaining gravity, for example: you can write out all the physics equations, or you can say "it's a force that makes things fall when we drop them". Choose the easier one.

How we did it

We realized that stakeholders did not care all that much about the nitty-gritty of the algorithms we were using. So we spared them the details and talked about the model in general, abstracted terms.

The error metric we used internally as a team was RMSE (root mean squared error). However, RMSE is a bit too complex and intangible for stakeholders. Also, we built the model at the user level but then rolled the predictions up into cohorts. So we picked a different, more fitting metric for stakeholders.
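For reference, RMSE over n users is

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where $y_i$ is the actual lifetime value of user $i$ and $\hat{y}_i$ is the model's prediction. The squaring and the square root make it hard for a business audience to map the number back to anything tangible.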

The metric we picked was the percentage difference between rolled-up actual and predicted values. Then we would look at the distribution of that error across cohorts. Internally, as a team, we kept using RMSE as the metric to minimize while tuning the model. But when we talked to stakeholders, we showed the much less intimidating, and more fitting, "percentage difference between predicted and actual values" metric.
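To make this concrete, here is a minimal sketch of computing both metrics; the column names, cohort labels, and numbers are made up for illustration, not the client's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical user-level predictions: each row is one user with their
# actual and predicted lifetime value, plus the cohort they belong to.
df = pd.DataFrame({
    "cohort": ["2023-01", "2023-01", "2023-02", "2023-02", "2023-02"],
    "actual_ltv": [120.0, 80.0, 45.0, 60.0, 95.0],
    "predicted_ltv": [110.0, 90.0, 50.0, 55.0, 100.0],
})

# Internal metric: user-level RMSE, the number we minimized while tuning.
rmse = np.sqrt(np.mean((df["predicted_ltv"] - df["actual_ltv"]) ** 2))

# Stakeholder metric: roll predictions up to the cohort level, then take
# the percentage difference between total predicted and total actual LTV.
cohorts = df.groupby("cohort")[["actual_ltv", "predicted_ltv"]].sum()
cohorts["pct_diff"] = (
    (cohorts["predicted_ltv"] - cohorts["actual_ltv"])
    / cohorts["actual_ltv"] * 100
)

print(f"user-level RMSE: {rmse:.2f}")
print(cohorts)  # the distribution of pct_diff across cohorts is what we showed
```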

And because RMSE is a much more sensitive error measure, whatever we did to decrease RMSE would also decrease the downstream cohort-level percentage difference.

