Photo by Nong V on Unsplash

The many ways to make inferences

There are a LOT of ways to make inferences. Many more, I think, than is generally realized. And they all have their weaknesses.

You can make inferences using…


(1) Deduction:

As a consequence of the definition of X and Y, if X then Y.

X applies to this case. Therefore Y.

“Plato is a man, and all men are mortal; therefore Plato is mortal.”

“For any number that is an integer, there exists another integer greater than that number. 1,000,000 is an integer. So there exists an integer greater than 1,000,000.”

Especially popular among philosophers and mathematicians?

Flaws: to apply to the world, you need to add in assumptions about the world, or to apply other methods of inference on top.


(2) Frequencies:

In the past, 95% of the time that X occurred, Y occurred.

X occurred. Therefore Y (with high probability).

“95% of the time when we saw a transaction identical to this one, it was fraudulent. So this transaction is very likely fraudulent.”

Especially popular among applied statisticians and data scientists?

Flaws: you need a moderately large number of examples like the current one to perform calculations on, and the method assumes that those past examples were generated by a process that is (statistically) just like the one that generated this latest example. Moreover, it is sometimes unclear what it means for “X” to have occurred. If something very similar to X, but not quite X, occurred, should that be counted? If we broaden the class of what counts, or switch to another class that still encompasses all of our prior examples, we may get a different answer. Fortunately, there are plenty of cases where the class to use is fairly obvious.
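To make the choice-of-class issue concrete, here is a minimal sketch in Python. The transaction records, the country codes, and the `fraud_rate` helper are all made up for illustration; the point is only that the same history gives different frequencies depending on which past cases we decide count as “like” the new one.

```python
# Hypothetical past transactions: (amount_usd, country, was_fraud).
# Every value here is invented for illustration.
past = [
    (900, "A", True), (950, "A", True), (920, "A", True), (980, "A", False),
    (900, "B", False), (940, "B", False), (30, "A", False), (25, "B", False),
]

def fraud_rate(records, matches):
    """Fraction of the past records satisfying `matches` that were fraudulent."""
    selected = [r for r in records if matches(r)]
    if not selected:
        raise ValueError("no past examples in this reference class")
    return sum(r[2] for r in selected) / len(selected)

# New transaction: 930 USD from country "A". Which past cases are "like" it?
narrow = fraud_rate(past, lambda r: r[0] >= 900 and r[1] == "A")  # large, same country
broad = fraud_rate(past, lambda r: r[0] >= 900)                   # large, any country

print(f"narrow class: {narrow:.2f}, broad class: {broad:.2f}")
```

Both classes contain every past example that resembles the new transaction most closely, yet they disagree about how likely fraud is.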


(3) Models:

Given our probabilistic model of this thing, when X occurs, the probability of Y is 0.95.

X occurred. Therefore Y (with high probability).

“Given our multivariate Gaussian model of loan prices, when this loan defaults, there is a 0.95 probability of this other loan defaulting.”

Especially popular among financial engineers and risk modelers?

Flaws: hinges on the appropriateness of the parameterized probabilistic model chosen, may require a moderately large amount of past data to estimate free model parameters, and may go haywire if modeling assumptions are suddenly violated.
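As a rough sketch of this kind of model-based inference, suppose two loan prices are jointly Gaussian. Every parameter below (means, standard deviations, the 0.9 correlation) is hypothetical, and treating “price below 80” as a stand-in for default is an assumption of this example, not something from a real risk model. The conditional distribution used is the standard one for a bivariate Gaussian.

```python
from math import sqrt
from statistics import NormalDist

def conditional_normal(mu_x, sigma_x, mu_y, sigma_y, rho, x):
    """Distribution of Y given X = x when (X, Y) is bivariate Gaussian."""
    mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    std = sigma_y * sqrt(1.0 - rho**2)
    return NormalDist(mean, std)

# Hypothetical parameters: both prices average 100 with std 10, correlation 0.9.
# A price below 80 stands in for "default" (an assumption of this sketch).
threshold = 80.0
unconditional = NormalDist(100.0, 10.0).cdf(threshold)
given_a_low = conditional_normal(100.0, 10.0, 100.0, 10.0, 0.9, x=70.0).cdf(threshold)

print(f"P(B < 80)          = {unconditional:.3f}")
print(f"P(B < 80 | A = 70) = {given_a_low:.3f}")
```

With these made-up numbers, learning that loan A has collapsed moves loan B's chance of crossing the threshold from a few percent to roughly 0.95. The answer is only as good as the Gaussian assumption and the estimated correlation.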


(4) Regression:

In prior data, as X and Z increased, the likelihood of Y increased.

X and Z are at high levels. Therefore Y.

“Height for children can be approximately predicted as an (increasing/positive) linear function of age and weight. This child is older and heavier than the others, so we predict he is also taller than the others.”

Especially common among economists and data scientists?

Flaws: it is often applied with simple assumptions (e.g., linearity) that may not capture the complexity of the inference, but very large amounts of data may be needed to apply much more complex models (e.g., to use neural networks).
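Here is a minimal sketch of the height example as an ordinary least squares fit, written in plain Python so it has no dependencies. The children's data is invented, and the heights were generated from a made-up linear rule (80 + 5 × age + 0.3 × weight) so that the fitted coefficients can be checked. Real data would be noisy and the fit only approximate.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical children: (age in years, weight in kg) -> height in cm.
# Heights follow the made-up rule 80 + 5*age + 0.3*weight exactly.
features = [(4, 16), (5, 19), (6, 20), (7, 24), (8, 25), (9, 30)]
heights = [104.8, 110.7, 116.0, 122.2, 127.5, 134.0]

intercept, b_age, b_weight = fit_linear(features, heights)

def predict(age, weight):
    return intercept + b_age * age + b_weight * weight

print(predict(10, 33) > predict(6, 20))  # older and heavier -> predicted taller
```

Note that the prediction for a 10-year-old extrapolates beyond the ages in the data, which is exactly where a linearity assumption is most likely to fail.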


(5) Bayesianism:

Given my prior odds on Y being true…

And given evidence X…

And given my Bayes factor, which is my estimate of how much more likely X is to occur if Y is true than if Y is not true…

I calculate that Y is far more likely to be true than to not be true (by multiplying the prior odds by the Bayes factor to get the posterior odds).

Therefore Y (with high probability).

“My prior odds that my boss is angry at me were 1 to 4, because he’s angry at me about 20% of the time. But then he came into my office shouting and flipped over my desk, which I estimate is 200 times more likely to occur if he’s angry at me compared to if he’s not. So now the odds of him being angry at me are 200 * (1/4) = 50 to 1 in favor of him being angry.”
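The arithmetic in this example can be checked with a few lines of Python. This is just the odds form of Bayes' rule applied to the numbers above; the function names are my own, and `Fraction` is used so the odds stay exact.

```python
from fractions import Fraction

def posterior_odds(prior_odds, bayes_factor):
    """Odds form of Bayes' rule: posterior odds = prior odds * Bayes factor."""
    return prior_odds * bayes_factor

def odds_to_probability(odds):
    """Convert odds in favor (e.g., 50 means 50 to 1) into a probability."""
    return odds / (1 + odds)

prior_probability = Fraction(20, 100)                     # angry 20% of the time
prior_odds = prior_probability / (1 - prior_probability)  # 1 to 4
posterior = posterior_odds(prior_odds, 200)               # 200 * (1/4) = 50 to 1

print(prior_odds, posterior, float(odds_to_probability(posterior)))
```

Odds of 50 to 1 correspond to a probability of 50/51, or about 98%.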

Not as popular as it should be?

Flaws: it is sometimes hard to know what to set your prior odds at, and it can be very hard in some cases to perform the calculation. In practice, carrying out the calculation might end up relying on subjective estimates of the odds, which can be especially tricky to guess when the evidence is not binary (i.e., not of the form “happened” vs. “didn’t happen”), or if you have lots of different pieces of evidence that are partially correlated. On the other hand, if you can do the calculations in a given instance, and have a sensible way to set a prior, this is, in my opinion, the mathematically optimal framework to use for probabilistic prediction. In that sense, we can think of many of the other approaches on this list as (hopefully pragmatic) approximations of Bayesianism (sometimes good approximations, sometimes bad ones).


(6) Theories:

Given our theory, when X occurs, Y occurs.

X occurred. Therefore Y.

“One theory is that depressed people are most at risk for suicide when they are beginning to come out of a really bad depressive episode. So as depression is remitting, patients should be carefully screened for potentially increasing risk factors.”

“When inflation rises, unemployment falls. Inflation is rising, so unemployment will fall.”

Especially popular among psychologists and economists?

Flaws: it’s very challenging to come up with reliable theories, and often you will not know how accurate such a theory is. Even if it has substantial truth to it and is often right, there may be cases where the opposite of what was predicted actually happens, and for reasons that the theory can’t explain.


(7) Causes:

We know that X causes Y to occur.

X occurred. Therefore Y.

“Rusting of gears causes increased friction, leading to greater wearing down. In this case, the gears were heavily rusted, so we expect to find a lot of wearing down.”

“This gene produces this phenotype, and we see that this gene is present, so we expect to see the phenotype.”

Especially popular among engineers and biologists?

Flaws: it’s often extremely hard to figure out causality in a highly complex system, especially in “softer” subjects like nutrition.


(8) Experts:

This expert (or prediction market, or prediction algorithm) X is 90% accurate at predicting things in this general domain of prediction.

X predicts Y. Therefore Y (with high probability).

“This prediction market has been right 90% of the time when predicting recent baseball outcomes, and in this case, it predicts that the Yankees will win.”

Not as popular as it should be?

Flaws: you often don’t have access to the predictions of experts (or of prediction markets or prediction algorithms), and when you do, you usually don’t have reliable measures of their past accuracy.


(9) Metaphors:

X, which is what we are dealing with now, is metaphorically a Z.

For Z, when W is true, then obviously Y.

Now W (or its metaphorical equivalent) is true for X. Therefore Y.

“Your life is but a boat, and you are riding on the waves of your experiences. When a raging storm hits, a boat can’t be under full sail. It can’t continue at its maximum speed. You are experiencing a storm now, and so you too must learn to slow down.”

Especially popular among self-help gurus and some ancient philosophers?

Flaws: Z working as a metaphor for X doesn’t mean that all (or even most) solutions that are good for situations involving Z are appropriate (or even make any sense) for X.


(10) Similarities:

X occurred, and X is very similar to Z in properties A, B, and C.

When things similar to Z in properties A, B, and C occur, Y usually occurs.

Therefore Y (with high probability).

“This conflict is similar to the Gulf War in that…and with ‘Gulf’-like wars, we have always seen that…”

“This data point (with unknown label) is closest in feature space to this other data point which is labeled ‘cat,’ and all the other labeled points around that point are also labeled ‘cat,’ so this unlabeled point should also likely get the label ‘cat.’”

Especially popular among historians and within some machine learning algorithms?

Flaws: in the history case, it is difficult to know which features are the appropriate ones to use to compare similarities, and often the conclusions are based on a relatively small number of examples. In the machine learning case, a very large amount of data may be needed to train the model.
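The machine learning version of this inference is roughly what a k-nearest-neighbors classifier does. Below is a minimal sketch; the 2-D feature vectors, the labels, and the choice of k = 3 are all invented for illustration, and Euclidean distance is just one possible notion of “similar.”

```python
from collections import Counter
from math import dist

def knn_label(labeled_points, query, k=3):
    """Label `query` by majority vote among its k nearest labeled points."""
    nearest = sorted(labeled_points, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors with known labels.
training = [
    ((1.0, 1.2), "cat"), ((0.8, 1.0), "cat"), ((1.1, 0.9), "cat"),
    ((4.0, 4.2), "dog"), ((4.1, 3.9), "dog"), ((3.8, 4.0), "dog"),
]

print(knn_label(training, (1.0, 1.0)))
```

The same question the historian faces shows up here as a design choice: which features go into the vectors, and how distance between them is measured, determines what counts as similar.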


(11) Cases:

In this handful of examples (or perhaps even just one example) where X occurred, Y occurred.

X occurred. Therefore Y.

“The last time we elected a [insert political group you don’t like] as president, we saw what happened. Let’s not make that mistake again.”

“The last three times I went to action movies, I didn’t like them. So I don’t want to go to one again.”

Especially popular with politicians and with nearly everyone in daily living?

Flaws: unless we are in a situation with very little noise/variability, a few examples likely will not be enough to accurately generalize from.
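One way to see how little a handful of cases pins down is to put an interval around the observed rate. The sketch below uses the Wilson score interval, a standard approximate confidence interval for a proportion. It assumes the cases are independent draws from a stable process, which is itself a strong assumption for something like movie preferences.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low3, high3 = wilson_interval(3, 3)          # disliked 3 of 3 action movies
low300, high300 = wilson_interval(300, 300)  # same observed rate, 100x the cases

print(f"3 of 3:     {low3:.2f} to {high3:.2f}")
print(f"300 of 300: {low300:.2f} to {high300:.2f}")
```

Under these assumptions, disliking three of three action movies is still compatible with a true dislike rate well below one half, while three hundred of three hundred would leave very little room for doubt.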


(12) Intuition:

X occurred. My intuition (that I may have trouble explaining) predicts that when X occurs, Y is true. Therefore Y.

“The tone of voice he used when he talked about his family gave me a bad vibe. My feeling is that anyone who talks about their family with that tone of voice probably does not really love them.”

Popular with nearly everyone in daily living?

Flaws: our intuitions can be very well-honed in situations we’ve encountered many times and received feedback on (i.e., where we got some sort of answer about how well our intuition performed). But in highly novel situations, or in situations where we receive no feedback on how well our intuition is performing, our intuitions may be highly inaccurate (even though we may not FEEL any less confident about our correctness).


This essay was first written on October 7, 2018, and first appeared on this site on December 31, 2021. 


