Measuring Evidence to Strengthen Belief (Bayes’ Theorem)

Reverend Thomas Bayes of Tunbridge Wells in Kent, England, was a Georgian gentleman scientist. These wealthy, independent, male citizens practised science as a passionate hobby, not a profession. Many were priests: the Church provided an easy and respectable means to an ample income, and vast spare time in which to indulge scientific queries. By most accounts, Bayes was a hopeless preacher but an ingenious mathematician. At some point in his life, exactly when is not known, he devised a formula to work out the probability of an uncertain event. And then he forgot about it. Richard Price, Bayes’ friend, submitted the formula to the Royal Society in 1763, two years after Bayes’ death, under the title ‘An essay towards solving a problem in the doctrine of chances’.

Bayes’ theorem, as the formula came to be called, had no utility in Bayes’ lifetime. Today it is considered a landmark in the history of probability and is used in spam filtering, weather forecasting, DNA testing, medical diagnosis and the testing of new treatments, financial forecasting, fault diagnosis in engineering, the training of AI models, and much else.

Data is the reigning attribute of the universe today. It is generated incessantly, at an astronomical rate: about 400 × 10¹² bytes daily. Raw data is meaningless. Statistical analysis imbues it with meaning and turns it into assertions backed by varying degrees of evidence. Data is ever changing, and so is the evidence. Belief in a rational world demands that one change one’s opinion with changing data.

Bayes’ rule or Bayes’ theorem tells us how to revise our beliefs consistently, continuously, and rationally, considering new data and the consequent evidence.

The most distinguishing feature of modern medicine is its reliance on evidence; hence the name EBM, Evidence-Based Medicine. Bayes’ rule is widely used to interpret evidence in EBM. The expanse of its applicability in modern medicine is surpassed only by its widespread ignorance among medical professionals.

 

Bayesian reasoning finds its paradigm application in medical diagnosis.

Presume mammography is used to screen asymptomatic women for breast cancer in a population. Suppose the probability that a woman has breast cancer is 1 percent (the prevalence of breast cancer in the population). If a woman has breast cancer, the probability that she tests positive is 90 percent (the sensitivity of mammography). If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9 percent (the false positive rate of the test). A woman tests positive. What is the chance that she has breast cancer?

In a study by Gerd Gigerenzer, 160 gynaecologists were asked this question. A majority (60 percent) believed the chance was between 80 and 90 percent, and 19 percent believed it would be 1 percent. The correct answer, as Bayes’ law shows, is about 9 percent. Thus 80 percent of the specialists who are routinely called upon to pronounce judgement on such tests were off the correct answer by a factor of ten, in one direction or the other. The most common treatment of breast cancer is major surgery. In this example, nine out of ten women who do not have breast cancer, but are handed a diagnosis of cancer on mammography, may undergo a disfiguring surgery that was pointless.

Bayes’ theorem can be stated as:

P(A|B) = P(A) × P(B|A) / P(B)

Our minds abhor symbols. I was foxed when I first read this formula. Stated in words, it is quite simple.

        P is the probability of an event. A and B are events. In the above example, A is the event that a patient suffers from breast cancer; B is a positive mammogram for breast cancer.


        P(A|B) is the probability of A if B is true: the probability that a woman has the disease if her mammogram is positive. This is the figure the doctor, and the patient, are interested in.


        P(B|A) is the probability of B if A is true: in the above case, the probability of the test being positive if the woman tested suffers from cancer. This is the sensitivity of the test.

        P(A) and P(B) are the overall probabilities of events A and B on their own.

Rendered in lay language, the theorem asks: now that you have the evidence, i.e., the result of her mammography, how would you like to revise the probability that the woman suffers from breast cancer? If you did not have the evidence, i.e., the mammography result, you could only have said that she had a one percent chance of suffering from breast cancer, i.e., the prevalence of breast cancer we assumed in the population.
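The arithmetic behind this revision can be sketched in a few lines of Python, using the numbers assumed in the example above:

```python
# Posterior probability of breast cancer given a positive mammogram.
prevalence = 0.01        # P(A): 1 percent of women have breast cancer
sensitivity = 0.90       # P(B|A): test is positive in 90 percent of cancers
false_positive = 0.09    # P(B|not A): 9 percent of healthy women test positive

# P(B): overall probability of a positive test (law of total probability)
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive

# Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)
posterior = prevalence * sensitivity / p_positive
print(f"{posterior:.1%}")  # about 9 percent, not 90
```

The denominator is computed by adding the positives among the sick to the positives among the healthy, which is where the commonplace false positives exert their drag on the answer.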

Calculating probability from percentages is confusing. We tend to take recourse to easier, though fallacious, shortcuts, as most of the gynaecologists in Gigerenzer’s study did.

Gerd Gigerenzer, a psychologist and a long-time director at the Max Planck Institute for Human Development, is a world-renowned expert in risk literacy and decision making. He has authored many books and papers on risk. He has suggested that instead of percentages, as in the above example, data be presented as ‘natural frequencies’, which foster understanding.

The data in the above example can be rewritten in natural frequencies:

Ten in every 1000 women have breast cancer. Of these 10 women with breast cancer, nine will test positive. Of the 990 women without breast cancer, about 89 will nevertheless test positive. What are the chances that a woman who tests positive has breast cancer? In all, 98 women (9 + 89) will test positive. Of these, nine have breast cancer. The chances that a woman who tests positive has breast cancer are therefore 9/98, or roughly one in ten. Not eight in ten, as 60 percent of the gynaecologists believed.
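The same count can be checked with whole numbers, in the spirit Gigerenzer recommends; a minimal sketch:

```python
# Natural frequencies: imagine 1000 women screened.
women = 1000
with_cancer = 10                      # 1 percent prevalence
true_positives = 9                    # 90 percent of the 10 test positive
false_positives = round(0.09 * (women - with_cancer))  # about 89 of 990 healthy women
positives = true_positives + false_positives           # 98 women test positive

chance = true_positives / positives
print(chance)  # roughly 0.09, about one in ten
```

Counting people instead of multiplying percentages makes the false positives visible: they outnumber the true positives nearly ten to one.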

There are three important deductions from Bayes’ formula.

First, the chance of the disease is high if its prevalence is high. This is the first figure in the numerator. As medical students are taught: if you hear hoofbeats outside, it is likely to be a horse, not a zebra. If a patient presented with cough and fever in May 2021, when Covid raged, the chance that she had a coronavirus infection was high. If a patient reports similar complaints today, the chance of Covid infection is much lower.

This is the reason claims of supernatural phenomena are pooh-poohed by rationalists. Such phenomena are exceedingly rare. Any proof that claims to establish their veracity has to be so good that its falsehood is more unlikely than the phenomenon it is trying to prove. Carl Sagan stated this pithily: ‘Extraordinary claims require extraordinary evidence’.

Second, one should believe the idea (the individual is suffering from the disease) more if the evidence (the positive test) is likely to occur when the idea is true, implying high sensitivity of the test. This is the second term in the numerator. A child from the Nilgiri Hills of southern India presenting with swollen painful fingers and long-standing anaemia is likely to be suffering from sickle cell disease. If a person complains of a fever of a few days’ duration and bleeding gums in August, dengue fever must be ruled out.

Third, believe the idea less if the evidence is commonplace, i.e., if it occurs frequently even in healthy individuals. This is the figure in the denominator of Bayes’ formula. Followers of a particular system of belief may be the dominant suspects in specific crimes in a locality, but the said belief system cannot be used to identify the troublemakers, because most of its followers are not troublemakers. Belief is a poor touchstone here, as it throws up many false positives.

Daniel Kahneman and Amos Tversky, both psychologists, singled out a major fault in our Bayesian reasoning: base rate neglect. In our example, the base rate is the prevalence of breast cancer in the population. The probability of a positive test given cancer attracts our attention, and we forget how rare the disease is in the population.

The duo suggested, further, that the human mind does not engage in Bayesian reasoning at all. Instead, we judge the probability that an instance belongs to a category by how representative it is: how similar it is to the prototype of the category, which we nebulously represent mentally as a family with its criss-crossing resemblances. Breast cancer, in our minds, typically produces a positive mammogram. Therefore, any woman with a positive mammogram is deemed to be suffering from breast cancer. How uncommon breast cancer is, is conveniently forgotten.

One of the manifestations of base rate neglect is hypochondria. A headache evokes the fear of a dreaded brain tumour. We do not pause to consider that brain tumours are rare, while headache has many commonplace causes, like a cold or lack of sleep.

Base rate neglect also drives thinking in stereotypes. An undergraduate in a university is concerned about environmental protection, human rights and freedom of speech, attends literature festivals, writes poetry. Is she more likely to be studying history or science? History, of course! No: science courses have 10 to 15 times more students than history in most universities. A dull, shabbily dressed nerd is more likely to be a clerk in one of the capital’s many government offices than a research scholar at the university.

 

Bayesian reasoning has many applications in law. A judge must decide whether the evidence presented by the prosecution overwhelmingly proves the defendant’s culpability. The prevalence of the crime in the population (the base rate), the false positive signals given by the test, and its sensitivity should all influence the decision.

In a criminal case, DNA evidence matches the suspect. The probability of a random match is 1 in 100,000. The prosecutor argues that there is only a 1 in 100,000 chance that the suspect is innocent. Is this correct? No, it is false: 1 in 100,000 is the probability of a DNA match given innocence, not the probability of innocence given the match.

Rahul drives a blue taxi. The proportion of blue taxis in the city is one in five. A blue taxi is reported to have been involved in an accident. The chance that Rahul’s taxi was involved is 20 percent. Correct? No, not by a long shot. Twenty percent is the proportion of blue taxis in the city, not the chance that this particular blue taxi, Rahul’s, was the one involved. That chance depends on the number of blue taxis in the city and on the reliability of the witness.
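To see how a witness’s reliability combines with the base rate, here is a sketch of Kahneman and Tversky’s classic taxi-cab calculation, adapted to these proportions. The 80 percent witness accuracy is an assumed, illustrative figure, and the sketch addresses only the colour of the taxi, not which blue taxi it was:

```python
# Taxi-cab problem: was the taxi really blue, given a witness who says so?
# The witness's 80 percent accuracy is an assumed, illustrative figure.
p_blue = 0.20               # base rate: one taxi in five is blue
p_say_blue_if_blue = 0.80   # witness correctly reports a blue taxi as blue
p_say_blue_if_other = 0.20  # witness mistakes another colour for blue

# P(witness says blue), by the law of total probability
p_say_blue = (p_blue * p_say_blue_if_blue
              + (1 - p_blue) * p_say_blue_if_other)

# Bayes: P(taxi was blue | witness says blue)
p_blue_given_report = p_blue * p_say_blue_if_blue / p_say_blue
print(f"{p_blue_given_report:.0%}")  # 50 percent, not 80
```

Even a fairly reliable witness, applied to a minority colour, yields only even odds: the low base rate pulls the answer down from the witness’s stated accuracy.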

To calculate the probability of innocence after a DNA match, one must know the size of the potential suspect pool. Assume the population of the city where the crime happened is 1 million. About 10 innocent people would then match by chance. Only one person is guilty, so there is only one true positive among roughly 11 matches. Thus there is a 10 in 11 chance that a randomly matched suspect is innocent.
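A quick sketch of this count, using the figures above:

```python
# False positives in a DNA dragnet: city of 1 million, match probability 1/100,000.
population = 1_000_000
match_rate = 1 / 100_000

innocents_matching = round((population - 1) * match_rate)  # about 10 people
guilty_matching = 1                                        # the true culprit
total_matches = innocents_matching + guilty_matching       # 11 matches in all

p_innocent_given_match = innocents_matching / total_matches
print(p_innocent_given_match)  # 10/11, about 0.91
```

The prosecutor’s 1-in-100,000 figure describes a single innocent person; spread over a city of a million, it describes ten of them.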

Not all evidence presented in a court is discrete like a DNA match. Most evidence, I believe, is fuzzy and does not easily lend itself to statistical interpretation. The State is always interested in fobbing off woolly evidence as unassailable. Laws are enacted to give these ambiguous allegations the sanctity of unimpeachable evidence.

One such law is the Unlawful Activities (Prevention) Act, the UAPA. People can be arrested if they are suspected of having committed an offence as defined under the Act. Bail has been made extremely difficult for those arrested under the UAPA. Courts are required to deny bail if they find reasonable grounds to believe that the case against the accused is prima facie true. Courts are mandated to assess guilt only by examining the chargesheet prepared by the investigating agency. The accused cannot produce any evidence outside the chargesheet, and during the bail hearing they cannot examine or cross-examine witnesses to challenge the chargesheet or the evidence it contains.

The conviction rate of those accused under the UAPA is only 2 percent, but not all UAPA cases have been tried to date. Between 2014 and 2020, of the UAPA cases in which the trial was completed, an average of 72.4 percent ended in discharge or acquittal and 27.5 percent in conviction.

A ‘test’ in which only 27.5 percent of positives turn out to be true is mind-bogglingly poor. Were it a diagnostic test for a disease, it would not even see the light of publication in the most obscure medical literature. Yet judges in most courts appear to treat the UAPA chargesheet as faultless, and deny bail to the accused, incarcerating them for years before the trial is completed.

We have seen the extravagant folly of applying a medical test with more than 90 percent sensitivity as a screening test for a rare disease: false positives overwhelm the true positives. If misdiagnosis in medicine can usher in a debilitating surgery, the result of a false positive in the judiciary can be worse: incarceration for years, or even a death sentence. This is the reason that nowhere in the world can terrorists be profiled and detained preventively. The base rate, the proportion of the population that is terrorist, is an infinitesimal fraction; any test less than perfect would throw up a huge cache of innocents.

 

Humans did not evolve to intuitively understand chance and probability. Lord Balfour, a onetime prime minister of the UK, once remarked that ‘the human mind is no more a truth-finding apparatus than the snout of a pig’. True, we did not evolve to discern the truth hidden behind the veil of chance. But we also did not evolve to treat cancer, or to kill people because they differed from us in ideology.

Millennia of thought have equipped us to make sense of a life that is uncertain at its core. For a professional, trained and trusted to separate the grain of truth from the chaff, the surrender of reason in understanding the world is, at best, a gross dereliction of duty. At worst, it is an unpardonable sin, condemning fellow humans to a fate worse than what blind chance had in store for them.

 
