Posted by Jay Livingston
Can face-recognition software tell if you’re gay?
Here’s the headline from The Guardian a week ago.
He predicts that self-learning algorithms with human characteristics will also be able to identify:
- a person’s political beliefs
- whether they have high IQs
- whether they are predisposed to criminal behaviour
When I read that last line, something clicked. I remembered that a while ago I had blogged about an Israeli company, Faception, that claimed its face-recognition software could pick out the faces of terrorists, professional poker players, and other types. It all reminded me of Cesare Lombroso, the Italian criminologist. Nearly 150 years ago, Lombroso claimed that criminals could be distinguished by the shape of their skulls, ears, noses, chins, etc. (That blog post, complete with pictures from Lombroso’s book, is here.) So I was not surprised to learn that Kosinski had worked with Faception.
For a thorough (3,000-word) critique of the Wang-Kosinski paper, see Greggor Mattson’s post at Scatterplot. The part I want to emphasize here is the problem of False Positives.
Wang-Kosinski tested their algorithm by showing it a series of paired pictures from a dating site. In each pair, one person was gay, the other straight. The task was to guess which was which. The machine’s accuracy was roughly 80% – much better than guessing randomly and better than the guesses made by actual humans, who got about 60% right. (These are the numbers for photos of men only. The machine and humans were not as good at spotting lesbians. In my hypothetical example that follows, assume that all the photos are of men.)
But does that mean that the face-recognition algorithm can spot the gay person? The trouble with Wang-Kosinski’s gaydar test was that it created a world where half the population was gay. For each trial, the person or the machine saw one gay person and one straight.
Let’s suppose that the machine had an accuracy rate of 90%. Let’s also present the machine with a 50-50 world: 100 men, 50 gay and 50 not-gay. Looking at the 50 gays, the machine will guess correctly on 45. These are “True Positives.” It identified them as gay, and they were gay. But it will also classify 5 of the gay people as not-gay. These are the False Negatives.
It will have the same ratio of true and false for the not-gay population. It will correctly identify 45 of the not-gays (True Negatives), but it will guess incorrectly that 5 of these straight people are gay (False Positives).
Now make the world less lopsided: 1,000 men, of whom 50 are gay and 950 are not. Again, let’s give the machine an accuracy rate of 90%. For the 50 gays, it will again have 45 True Positives and 5 False Negatives. But what about the 950 not-gays? It will be correct 90% of the time and identify 855 of them as not-gay (True Negatives). But it will also guess incorrectly that 10% are gay. That’s 95 False Positives.
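The arithmetic for both worlds can be checked with a few lines of Python. This is only a sketch: it assumes, as the examples here do, that "90% accuracy" means the machine is right 90% of the time on gay and on not-gay men alike, and the function name `confusion_counts` is made up for illustration.

```python
def confusion_counts(n_gay, n_not_gay, accuracy):
    """Split each group into correct and incorrect guesses.

    Assumption: the same accuracy applies to both groups.
    """
    tp = round(n_gay * accuracy)       # gay, guessed gay (True Positives)
    fn = n_gay - tp                    # gay, guessed not-gay (False Negatives)
    tn = round(n_not_gay * accuracy)   # not-gay, guessed not-gay (True Negatives)
    fp = n_not_gay - tn                # not-gay, guessed gay (False Positives)
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


worlds = [("50-50 world", 50, 50), ("50-in-1,000 world", 50, 950)]
for label, n_gay, n_not_gay in worlds:
    c = confusion_counts(n_gay, n_not_gay, accuracy=0.90)
    # Of everyone the machine labels "gay", what share really is?
    share_right = c["TP"] / (c["TP"] + c["FP"])
    print(f"{label}: {c} -> {share_right:.0%} of 'gay' guesses are right")
```

In the 50-50 world, 45 of the machine's 50 "gay" guesses are right. In the world of 1,000, only 45 of its 140 "gay" guesses are right, about a third, even though its accuracy has not changed.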
The rarer the thing that you’re trying to predict, the greater the ratio of False Positives to True Positives. And those False Positives can have bad consequences. In medicine, a false positive diagnosis can lead to unnecessary treatment that is physically and psychologically damaging. As for politics and policy, think of the consequences if the government goes full Lombroso and uses algorithms for predicting “predisposition to criminal behavior.”
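To see how quickly that ratio grows, here is a rough sketch that holds the machine at 90% accuracy and makes the trait rarer and rarer. The 1% and 0.1% rates are hypothetical values chosen for illustration, not figures from the Wang-Kosinski paper, and the function name is my own.

```python
def false_per_true_positive(prevalence, accuracy=0.90):
    """False Positives per True Positive.

    Assumption: the same accuracy applies to people who have the
    trait and to people who don't.
    """
    true_pos = prevalence * accuracy                # have the trait, flagged
    false_pos = (1 - prevalence) * (1 - accuracy)   # don't have it, flagged anyway
    return false_pos / true_pos


# 50% and 5% match the two worlds in the post; 1% and 0.1% are hypothetical.
for prevalence in (0.50, 0.05, 0.01, 0.001):
    ratio = false_per_true_positive(prevalence)
    print(f"prevalence {prevalence:>6.1%}: "
          f"{ratio:6.1f} False Positives per True Positive")
```

At 5% this is the 95-to-45 ratio from the example, a little over two False Positives for every True Positive. At 1% it is about eleven to one.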