Have you come across people saying, "I make decisions based on data, so I am always correct. I don't take recommendations based on your experience. Show me the data!"? Sounds familiar. But is it true? Let us go back to the basics to understand the difference between data and information.
At a high level, data is a fact: a record of a past event. What you infer from that fact is called information.
Let us try to understand this with an example. John appeared for a job interview. He and the other candidates were shown a picture in the interview hall, and everyone was asked to write a story about it. Though the picture was the same, everyone came up with a different story. This is because each person's assumptions differ, and those assumptions are shaped by their experiences.
The same applies to data-based decisions. You draw inferences from data, and every inference rests on the data plus some assumptions. Again, those assumptions are shaped by your experiences. Hence, decisions will differ from person to person, because everyone can assume differently. Let us explore this further with a few scenarios:
Scenario 1: The school management assesses teachers' effectiveness based on their students' scores. It may sound natural, but let us see how different assumptions affect the decision.
Assumption: Students score well solely because of the teacher's efforts.
The intended action by the school management: Incentivise the teachers whose students scored the maximum marks.
Validate the assumption: On checking with the students, you learn that the students in that class are preparing for competitive exams and are taking extra coaching from a reputed coaching centre outside the school. This invalidates the assumption, so the decision to incentivise that class teacher would have been grossly wrong. Further investigation reveals that the teachers of other classes, in which all the students passed the exam, had put in additional effort. Even though the students in these classes didn't score the maximum marks, their teachers had to take extra measures to make all of them pass. Hence, these are the teachers who should have been incentivised.
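To see how the choice of metric alone can flip the decision, here is a minimal sketch with made-up marks for two hypothetical classes. The class names, the marks and the pass mark of 35 are all assumptions for illustration, not figures from the scenario. It compares ranking classes by the top score (what the management did) with ranking them by the share of students who passed.

```python
# Hypothetical marks out of 100 for two classes (made-up data for illustration).
# PASS_MARK is an assumed threshold, not taken from the scenario.
PASS_MARK = 35

scores = {
    "Class A": [98, 95, 92, 40, 30],
    "Class B": [70, 65, 60, 55, 50],
}

def top_score(marks):
    """Highest mark in the class: the metric the management relied on."""
    return max(marks)

def pass_rate(marks):
    """Fraction of students scoring at or above the assumed pass mark."""
    return sum(m >= PASS_MARK for m in marks) / len(marks)

# Pick the "best" class under each metric.
best_by_max = max(scores, key=lambda c: top_score(scores[c]))
best_by_pass_rate = max(scores, key=lambda c: pass_rate(scores[c]))

print("Best class by top score:", best_by_max)        # Class A
print("Best class by pass rate:", best_by_pass_rate)  # Class B
```

Neither number explains why the students scored as they did. That still requires validating the assumption, as was done above by talking to the students.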
Scenario 2: You assess management institutes in India based on the average annual pay students get during campus recruitment, reasoning that the higher the average salary, the better the institute. On that basis, you decide to join a management course at a particular institute.
Assumption: Most students are paid close to the published average, and there are no outliers. In other words, the salary distribution is assumed to be a bell curve.
Your intended action: Join that institute and pay the fees.
Validate the assumption: On checking, you learn that the average pay is skewed by a few students' placements abroad. Those placed abroad are paid in USD, and their salaries become very large once converted to INR at the prevailing exchange rate. These few data points skew the whole distribution and, therefore, the mean. This invalidates the assumption that most students get a job paying close to the average published by the media or the institute. The average was inflated by a few outliers, which do not occur every year. Hence, your decision to join that institute was not entirely sound.
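A quick way to check the bell-curve assumption is to compare the mean with the median. Below is a minimal sketch using made-up salaries in INR lakh (1 lakh = 100,000 rupees): ten hypothetical domestic placements plus two hypothetical overseas offers. None of these figures come from any real institute. Python's standard `statistics` module is used for both measures.

```python
import statistics

# Hypothetical annual packages in INR lakh (made-up figures for illustration).
# Ten domestic placements plus two overseas offers converted from USD.
domestic = [18, 19, 20, 20, 21, 22, 19, 20, 21, 20]
overseas = [160, 170]
salaries = domestic + overseas

mean_pay = statistics.mean(salaries)      # pulled upward by the two outliers
median_pay = statistics.median(salaries)  # middle value, barely affected by them
at_or_above_mean = sum(s >= mean_pay for s in salaries)

print(f"Mean:   {mean_pay:.1f} lakh")    # 44.2 lakh
print(f"Median: {median_pay:.1f} lakh")  # 20.0 lakh
print(f"Students paid at least the mean: {at_or_above_mean} of {len(salaries)}")
```

In this made-up data, the mean is more than double the median, and only 2 of the 12 students earn at least the "average" package. A large gap between mean and median is a hint that the distribution is skewed and the headline average is not what a typical student earns.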
So, returning to the opening question: are data-based decisions always correct? Unfortunately, not always. It largely depends on the assumptions you make. Don't forget what Aaron Levenstein said long ago.
I have not even touched on machine learning; the examples above involve only basic statistics such as the maximum and the mean. So, be cognizant of your assumptions when choosing a machine learning solution. The irony is that nowadays we readily accept machine learning solutions but hesitate to take human advice, i.e., advice based on human experience.
I hope you enjoyed reading. Please like, share, and comment.
Views are personal.
Banner created using canva.com