Showing posts with label Methods. Show all posts

Replication Complications

December 14, 2019
Posted by Jay Livingston

Some people can tell a joke. Others can’t. Same joke. One person has everyone laughing, the other gets zilch. Does the null response mean that the joke isn’t funny?

 What we have here is a failure to replicate.

 A couple of days ago, the Psychology Archive (PsyArXiv) published results showing a failure to replicate an experiment on Terror Management Theory (TMT).* Among the possible reasons for this failure, the authors say,

There was substantial nuance required in implementing a successful TMT study. . . . These nuances include how the experimenter delivers the experimental script (tone, manner). . .

I offered this same idea five years ago. I didn’t use the term “nuance.” Instead, I speculated that some experimenters knew how to “sell it” —  “it” in this case being the basic manipulation or deception in the experimental set-up. You can read the whole post (here), but here’s a somewhat shorter replication. I’m copy-and-pasting because as we get more results from replication studies, it’s still relevant. Also, I liked it.

*              *            *             *

One of the seminal experiments in cognitive dissonance is the one-dollar-twenty-dollar lie, more widely known as Festinger and Carlsmith, 1959. Carlsmith was J. Merrill Carlsmith. The name itself seems like something from central casting, and so did the man: a mild-mannered WASP who prepped at Andover, etc. Working with them was Elliot Aronson, one of the godfathers of social psychology, a Jewish kid from Revere, a decidedly non-preppy city just north of Boston.

In the experiment, the subject was given a boring task to do (taking spools out of a rack and then putting them back, again and again) while Carlsmith as experimenter stood there with a stopwatch. The next step was to convince the subject to help the experimenter. In his memoir, Not by Chance Alone, Aronson describes the scenario.

[Merrill] would explain that he was testing the hypothesis that people work faster if they are told in advance that the task is incredibly interesting than if they are told nothing and informed, “You were in the control condition. That is why you were told nothing.”

At this point Merrill would say that the guy who was supposed to give the ecstatic description to the next subject had just phoned in to say he couldn't make it. Merrill would beg the “control” subject to do him a favor and play the role, offering him a dollar (or twenty dollars) to do it. Once the subject agreed, Merrill was to give him the money and a sheet listing the main things to say praising the experiment and leave him alone for a few minutes to prepare.

But Carlsmith could not do a credible job. Subjects immediately became suspicious.

It was crystal clear why the subjects weren't buying it: He wasn't selling it. Leon [Festinger] said to me, “Train him.”

Sell it. If you’ve seen “American Hustle,” you might remember the scene where Rosenfeld (Christian Bale) is trying to show the FBI agent disguised as an Arab prince how to give a gift to the politician they are setting up. (The relevant part starts at 0:12 and ends at about 0:38.)

Here is the script:

Aronson had to do something similar, and he had the qualifications. As a teenager, he had worked at a Fascination booth on the boardwalk in Revere, Massachusetts, reeling off a spiel to draw strollers in to try their luck.

Walk right in, sit in, get a seat, get a ball. Play poker for a nickel. . . You get five rubber balls. You roll them nice and easy . . . Any three of a kind or better poker hand, and you are a winner. So walk in, sit in, play poker for a nickel. Five cents. Hey! There’s three jacks on table number 27. Payoff that lucky winner!

Twenty years later, he still had the knack, and he could impart it to others.

I gave Merrill a crash course in acting. “You don't simply say that the assistant hasn't shown up,” I said. “You fidget, you sweat, you pace up and down, you wring your hands, you convey to the subject that you are in real trouble here. And then, you act as if you just now got an idea. You look at the subject, and you brighten up. ‘You! You can do this for me. I can even pay you.’”

The deception worked, and the experiment worked. When asked to say how interesting the task was, the $1 subjects gave it higher ratings than did the $20 subjects. Less pay for lying, more attitude shift.

 The experiment is now part of the cognitive dissonance canon. Surely, others have tried to replicate it. Maybe some replications have not gotten similar results. But that does not mean we should toss cognitive dissonance out of the boat. The same may be true for TMT. It’s just that some experimenters are good at instilling terror, and others are not.

  * If you’ve never heard of TMT (I hadn’t), it’s basically the idea that if you get people to think about their own mortality, their attitudes will become more defensive about themselves and their group. Of the twenty-one replications, a very few got results that supported TMT, a very few got results that contradicted TMT. Most found no statistically significant or meaningful differences. 

Here’s the set-up for the independent variable: The subjects in the Terror condition were asked to write about “the emotions they experienced when thinking about their own death, and about what would happen to their physical body as they were dying and once they were dead.” The non-Terror subjects were asked to write about the same things about watching television — e.g., what happens to your physical body when you watch TV. (I am not making this up.)

Methodological Trees and Forests

December 12, 2019
Posted by Jay Livingston

The units of analysis that researchers choose usually constrain the explanations they come up with. Measuring variables on individuals makes it harder to see the effects of larger units like neighborhoods.

For example, much research has found a correlation between female-headed households and crime. Most explanations for this correlation focus on the households, with much talk about the lack of role models or the quality of parent-child interaction. But these explanations are looking at individual trees and ignoring the forest. The better question is not “What are the effects of growing up in a single-parent home?” It’s “What are the effects of growing up in a neighborhood where half the households are headed by single mothers?”

In the early 1990s, I wrote a criminology textbook, and one of the things that differentiated it from others was that it took seriously the idea of neighborhoods and neighborhood-level variables.

That was then. But now, Christina Cross in a recent Times op-ed makes a similar argument. Research generally shows that it’s better for kids to grow up with two parents rather than one. That fits with our assumptions about “broken homes” even if we now call them “single-parent households.” But Cross’s research finds a crucial Black-White difference in the importance of this one dimension of family structure.

Looking at educational outcomes, she finds that White kids from two-parent families do much better than their single-parent counterparts. But for Black kids, the advantage of a two-parent home is not so great.
living in a single-mother family does not decrease the chances of on-time high school completion as significantly for black youths as for white youths. Conversely, living in a two-parent family does not increase the chances of finishing high school as much for black students as for their white peers.

 Why does a two-parent family have less impact among Blacks? Cross looks at two explanations. The first is that the effect of a very low-income neighborhood (“socioeconomically stressful environments”) is so great that it washes out most of the effect of the number of parents inside the home. For a kid growing up in an area with a high concentration of poverty, having a father at home might make a difference, but that difference will be relatively small, especially if the father is unemployed or working for poverty-level wages.

The other explanation is that having other relatives close by mitigates the impact of having only one parent in the home. Cross says that her data supports this idea, but the extended-family network explanation is not nearly as powerful as the neighborhood-poverty explanation.

For policy-makers, what all this means is that the traditional conservative, individual-based solutions miss the point. Exhorting people to stay married (and providing costly government programs along the same lines) isn’t going to have much impact as long as we still have racially segregated neighborhoods with high levels of unemployment and poverty.

The message for researchers is similar: if you confine your thinking or your variables to individuals, you risk ignoring more important variables.

Confidence Games

January 19, 2019
Posted by Jay Livingston

Timing is crucial in comedy. It can be important in survey research as well. If you ask about satisfaction with government, and you take your survey at a historical moment when the Republican party controls the government, don’t be surprised if Republicans are more satisfied than Democrats. But also don’t write up your findings to imply that this means that Republicans have a deep and abiding faith in American institutions.

We’ve been here before, not with “satisfaction,” but with something similar: happiness. People who make claims about the relation between happiness and political views (Arthur Brooks, for example) often don’t bother to look at which party was holding sway at the time the survey they’re using was done. But that context matters a lot, especially now that the country has become so partisan and polarized, with people remaining loyal to their party the way sports fans are loyal to their team. In a post two years ago inspired by a Brooks column, I put it this way:

When you’re talking about the relation between political views and happiness, you ought to consider who is in power. Otherwise, it’s like asking whether Yankee fans are happier than Red Sox fans without checking the AL East standings. [the full post is here.]

I had a similar reaction to a recent thread on Twitter about who has lost confidence in American institutions. The answer is: everybody. But some more than others.  Patrick Egan of NYU looked at the “confidence” items in the General Social Survey and created these graphs showing the average confidence in twelve different institutions.


Confidence has dropped among all categories. But the steepest decline has come among non-college Whites. Their overall level of confidence is the lowest of any of these groups. They are also the strongest supporters of Donald Trump. This reinforces the image of the core Republican constituency — Trump’s staunchest supporters — as dissatisfied, even resentful. They have lost confidence in traditional American institutions, and they acclaim the strong outsider who could bring sweeping changes.

In response, Joshua Tucker posted a link to a report he was co-author on — the American Institutional Confidence Poll (AICP) from the Baker Center for Leadership & Governance at Georgetown University. The AICP found that demographic characteristics didn’t make much difference. Politics did. Here is AICP’s Number One Key Finding:

Why the discrepancy between the GSS data and the AICP conclusions? I wondered if it might be the sample. It wasn’t.

The interviews were conducted online from June 12 to July 19, 2018, by the survey firm YouGov. The sample includes 3,000 respondents from the U.S. general population. Additionally, the poll includes samples of 800 African-Americans, 800 Latinx Americans, and 800 Asian Americans.

Their sample, as they note elsewhere, is larger than that of most political surveys, and it oversamples the smaller populations that they want to have good data about. No problem there.

But what about the timing? We know that on November 1, 2016, Democrats were much more likely than were Republicans to say that the economy looked good. Two weeks later, those positions were reversed. The economy did not change in those two weeks. The occupancy of the White House did.

The AICP survey was done last summer, months before the midterm elections, when the GOP controlled the White House, the Senate, the House, and the Supreme Court. That seems like kind of an important fact, but to find it, you have to scroll down to the methodology notes at the end of the report. 

Even in the GSS graphs, Egan has drawn a trend line that smooths out these shifts that are possibly caused by electoral changes. Egan also has lumped together twelve institutions. Separating them into categories (e.g., government, non-government) might allow us to see even sharper demographic differences.

The AICP, on the other hand, does report about confidence in specific institutions, twenty in all. The authors conclude that “confidence in institutions is largely driven by party affiliation.” They neglect the corollary: who has confidence in which institution can shift quickly when an election changes the party in power. This volatility makes it a bit misleading to talk about confidence in “institutions” as though people were thinking about them in the abstract. For example, the authors say, “The executive branch is the institution in which Democrats have the least confidence, while Republicans rank it the fourth highest.” Surely this difference is not about what people think of “the executive branch.” It’s about Donald Trump. These days, isn’t everything?

I’ve Just Met a Face

January 3, 2019
Posted by Jay Livingston

Each month, the Harvard Business Review has a feature called “Defend Your Research.” I confess, I am not a regular HBR reader, but as I was searching for something else, a serendipitous click whisked me to an episode of “Defend Your Research” that was about names, something I am interested in. The researcher, Anne-Laure Sellier, had found that people look like their names. More specifically, people shown a photo of a stranger can make a better-than-chance guess as to what that person’s name is. (The HBR article is here.)

I was a tad skeptical. Hadn’t we been through something like this before with men named Dennis choosing to become dentists and women named Florence living in Florida? At least that research had a theory to explain the supposed connection — “implicit egotism” — even if the data turned out to be less than what met the researchers’ eye.* And now we have people named Charlotte choosing to look like a Charlotte?

Plausible or not, the empirical findings about faces and names were interesting, and I was curious to try my luck. Conveniently, Sellier had provided HBR two examples.

George, Scott, Adam, Bruce. Which could it be? “What if it's just that the other names on the list were rarer and less likely?” asks Scott Berinato, the HBR interviewer.

We controlled for that by offering only choices that were as popular as the actual name, based on the frequency of use. We controlled for most things we could think of, including ethnicity, name length, and the socioeconomic background of the subjects and of the people in the photos.

Any good researcher would control for these things. Everyone knows that. But “Bruce?” My spider sense suggested that the names Bruce and Scott are not really equivalent in popularity. To check, I went to the Social Security database on names.

The guy on the left looks like he’s about 40, the one on the right, early 30s. The HBR article came out in 2017. I guessed that the research was done a couple of years earlier. So I looked up the numbers for boy baby-names in 1975 for the older guy, 1983 for the younger. Here are the results.

And what are the answers to the name-that-face quiz? The man on the left is Scott. The man on the right is James. The correct name is two to three times more frequent than the second-most popular name on the list. It’s possible that Sellier’s subjects were putting together their estimate of the man’s age and their intuitive knowledge of name popularity. A better design might have been to show people four pictures of men roughly the same age and ask, “Which one is Scott?”
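The popularity argument can be put as a small calculation. Here is a minimal Python sketch, assuming made-up birth counts: the numbers below are hypothetical placeholders, not Social Security figures, and `popularity_shares` is a name invented for this illustration. It also assumes that the man in the photo is equally likely to be any man of that cohort bearing one of the four listed names. Under those assumptions, it shows how a guesser who ignores the face entirely and simply picks the most common name on the list could beat the 25% chance rate.

```python
def popularity_shares(counts):
    """Return each name's share among all men bearing any of the listed names."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical birth counts for one cohort (illustrative only, not real data).
hypothetical_counts = {"Scott": 25000, "George": 10000, "Adam": 8000, "Bruce": 5000}

shares = popularity_shares(hypothetical_counts)

# A guesser who never looks at the face just picks the most common name.
face_blind_guess = max(shares, key=shares.get)

# Chance level if all four names were treated as equally likely.
chance = 1 / len(hypothetical_counts)

print(face_blind_guess, round(shares[face_blind_guess], 2), chance)
```

With these invented counts, the face-blind guesser would be right roughly half the time, which is the kind of confound that a four-photos-one-name design would remove.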

Maybe Sellier just picked the wrong examples to illustrate her point. After all, she says that she and her fellow researchers did this study in the US, France, and Israel and got positive results in all three countries. And they do have a theory — that people change their appearance so as to conform with the cultural stereotype of their name. “In America people presumably share a stereotype of what a Scott looks like. . . and Scotts want to fit that stereotype.”

I haven’t looked at Sellier’s publications. All I know is what I see in the HBR. Maybe, knowing that the HBR interviewer was named Scott, she picked a couple of photos (one Scott, one not-Scott) just for this occasion and selected Bruce and the other names on the spur of the moment. Still, I assume that a researcher being interviewed for a feature called “Defend Your Research” would bring examples that best illustrate her ideas. If this is the best she’s got, I’m afraid I remain unconvinced.


* For more on Dennis the dentist, see this 2018 post by Andrew Gelman (here) and follow the links.

Pointers on the Zero Point (à la Jonah Goldberg)

August 5, 2018
Posted by Jay Livingston

As cheap tricks in data visualization go, leaving out the zero point is one of the easiest and most common ways to make a molehill of difference appear to be a mountain. Here’s an example I’ve used before: the Fox News graph showing that a tax rate of 39.6% is five times the size of a tax rate of 35%.


I’ve blogged on this before (here and here), and as some of the comments on those posts argue, cutting the y-axis down to size is not always deceptive. But in most cases, it’s good to include the zero-point.
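To see how much the baseline matters, here is a minimal Python sketch. The function name `apparent_ratio` and the 33.85 baseline are assumptions for illustration: the baseline is simply the value that would make the two bars differ by a factor of five, and it is not read off the actual Fox graph.

```python
def apparent_ratio(low, high, baseline=0.0):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    if baseline >= low:
        raise ValueError("baseline must be below the smaller value")
    return (high - baseline) / (low - baseline)


# The two tax rates from the Fox News example.
old_rate, new_rate = 35.0, 39.6

# With the zero point included, the taller bar is about 13% taller.
honest = apparent_ratio(old_rate, new_rate)

# Hypothetical truncated axis: a baseline chosen so the bars differ fivefold.
truncated = apparent_ratio(old_rate, new_rate, baseline=33.85)

print(round(honest, 2), round(truncated, 2))
```

The data never change between the two calls. Only the starting point of the axis does, and that alone turns a 13% difference into what looks like a fivefold one.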

Jonah Goldberg, the conservative political writer, has learned that lesson. Sort of. Philip Cohen, in his review (here) of Goldberg’s latest book Suicide of the West: How the Rebirth of Tribalism, Populism, Nationalism, and Identity Politics is Destroying American Democracy, has provided examples of Goldberg’s data-viz facility. The problem: how to exaggerate effects while still including the zero point. Goldberg’s solution: simple – just truncate the y-axis as usual, but then stick a label of zero on the lowest point.

From these graphs we learn:
  • In 1960, life expectancy worldwide was nearly 0.
  • By 2015, infant mortality worldwide had decreased to nearly 0.
In a mere 55 years, we went from a world where nearly all infants died to a world in which almost no infants died.

As Philip Cohen notes, the book’s blurbs from conservative pals and colleagues (e.g., John Podhoretz, Arthur Brooks) mention Goldberg’s “erudition.” Apparently, this erudition stops short of knowing that the distance between 54 and 56 is not the same as the distance between 0 and 54.

A Behavioral Econ Lab Is Not a Restaurant

July 16, 2018
Posted by Jay Livingston

Great title for an article:
We should totally open a restaurant:
How optimism and overconfidence affect beliefs
It will be in the August issue of the Journal of Economic Psychology. The link popped up in my Twitter feed this morning.

No, the failure rate for restaurants is not 90% in the first year as a 2003 American Express ad claimed. But most restaurants don’t make it to three years. So it’s only natural to ask about the people who think that their new restaurant will be among those that beat the odds. This was an article I wanted to read.

Imagine my surprise when I discovered that the article was not at all about people who started up a restaurant. True, the word restaurant appears 13 times in the article, plus another seven if you include restauranteur [sic – the preferred term is still restaurateur, no n]. But the data in the article is from a laboratory experiment where subjects try to guess whether a ball drawn from an urn will be white or black. No chefs brilliant but overweening, no surly waitstaff, no price-gouging suppliers, no unpredictable customers, no food, and no location, location, location. Just opaque jars with white balls and black balls.

The procedure is too complicated to summarize here – I’m still not sure I understand it – but the authors (Stephanie A. Heger and Nicholas W. Papageorge) want to distinguish, as the title of the article says, between optimism and overconfidence. Both are rosy perceptions that can make risky ventures seem less risky. Optimism looks outward; it overestimates the chances of success that are inherent in the external situation. Optimism would be the misperception that most restaurants survive for years and bring their owners wealth and happiness. Overconfidence, by contrast, looks inward; it is an inflated belief in one’s own abilities.

Both in the lab and probably in real life, there’s a strong correlation between optimism and overconfidence. People who were optimistic also overestimated their own abilities. (Not their ability to run a restaurant, remember, but their ability to predict white balls.) So it’s hard to know which process is really influencing decisions.

The big trouble is that the leap from lab to restaurant is a long one. It’s the same long leap that Cass Sunstein takes in using his experiment about “blaps” to conclude that New York Times readers would not choose a doctor who was a Republican. (See this earlier post.)

The Heger-Papageorge article left me hungry for an ethnography about real people starting a real restaurant. How did they estimate their chances of success, how did they size up the external conditions (the “market”), and how did they estimate their own abilities? How did those perceptions change over time from the germ of the idea (“You know, I’ve always thought I could . . .”) to the actual restaurant and everything in between, and what caused those perceptions to change? On these questions, the lab experiment has nothing to say.

But you’ve got to admit, it’s a great title. Totally.

R-E-S-P-E-C-T, Find Out What It Means to Me . . . Or Not

July 8, 2018
Posted by Jay Livingston

In the previous post, I wondered why Republican women surveyed by Pew saw Donald Trump as having “a great deal” or “a fair amount” of respect for women. One of the explanations I didn’t consider is that people don’t always answer the question that researchers are asking. The Pew survey asked dozens of questions. Several were about respect — how much respect does Trump have for women, men, Blacks, Hispanics, Evangelicals, and more. Others asked how believable Trump is, whether he keeps his business interests separate from his presidential decisions, whether he respects democratic institutions. (Results from the survey are here.)

But maybe to the people being interviewed, these were all the same question: Trump – good or bad?

Claude Fischer blogged recently (here) about this difference between questions researchers think they are asking and the questions people are actually responding to. Sometimes people give incorrect answers to basic factual questions. But it’s not that these respondents are ignorant.

an interesting fragment of respondents treat polls not as a quiz to be graded on but as an opportunity for what survey scholars have termed “expressiveness” and partisan “cheerleading.”

I would broaden this kind of poll responding to include “self-presentation” or, more simply, “sending a message.” That is, there are respondents who treat some factual questions not as chances to show what they know but as chances to tell the interviewer, or data analyst, or reader, or even themselves something more important than facts.

If expressing feelings or sending a message underlies people’s responses to factual questions, those same purposes should have even more importance when it comes to subjective judgments, like whether Trump has a lot of respect for women.

Fischer seems to side with the “sending a message” explanation. But that phrase suggests, to me at least, an intention to have some specific effect. For example, proponents of harsher criminal penalties claim that these will “send a message” to potential criminals. The obvious corollary is that these punishments will have an actual effect – less crime.

When pollsters call me, I’m often tempted to send a message. I consider what the implications of my answer will be when it’s reported in the survey and how that might affect politicians’ decisions. I’m even tempted to lie on demographic questions (age, income, party affiliation). Maybe my preferences will swing more weight coming from a young Independent.

But my hunch is that in most of the Pew questions about respect, people are not trying to influence policy. They’re just expressing a global feeling about Trump. The message, as Fischer says, is that they want others to know how they feel.        

Which is it — a deliberate strategy or an expression of sentiment? The trouble is that the only way to know what people are thinking when we ask them whether Trump respects women is to ask them and to listen to their answers instead of giving them four choices and then moving on to the next question. That is the great limitation of questionnaire surveys.

A Class of Rich People — Gallup Goes Marxist

June 10, 2018
Posted by Jay Livingston

Gallup asked “Do You Think the United States Benefits From Having a Class of Rich People, or Not?” Here are the results.

Gallup’s lede is that Democrats have grown more skeptical about the rich while Independents and Republicans haven’t changed their views. The other obvious conclusions from the survey are that Republicans think far more favorably of the rich and that Independents are closer to Democrats than to Republicans. (The Gallup summary is here.)

What surprised me is that Republicans would agree to even answer the question given that it was about “a class of rich people.” The true conservative would tell the Gallup interviewer, “There are no classes in America. We have only individuals; some of them get rich.” But overall, only 3% of the 1500 people surveyed refused to answer, though Gallup does not provide data on the political affiliation of these refuseniks.

Most of the time, when Americans talk about “class” they really mean “social status” – a scale based mostly on money which, therefore, has infinite gradations. A person with $100,000 is higher on the scale than is a person with $90,000. But “class” in the Gallup question implies a more Marxian definition — a group of people who share common economic interests and who act to secure those interests against the interests of other classes.

Unfortunately, we don’t know what Gallup’s respondents had in mind when they heard the question. Maybe Republicans, Independents, and Democrats interpreted the question differently.

What else could Gallup have asked?

“Does the US benefit from policies that allow some people to get very rich?” frames wealth as an individual matter with America as the land of unlimited opportunity.  A question like this would probably draw higher rates of agreement across the board.

“Do Americans in general benefit from policies that benefit the rich?” treats the rich more as a true class. It implies that some policies benefit one class, the rich, even though they might not benefit most people. This question might have fewer people agreeing.

I wonder what the results would be if Gallup asked both these questions.

Experiments and the Real World

May 26, 2018
Posted by Jay Livingston

Two days ago, the NY Times published an op-ed by Tali Sharot and Cass Sunstein, “Would You Go to a Republican Doctor?” It is based on a single social psychology experiment. That experiment does not involve going to the doctor. It does not involve anything resembling choices that people make in their real lives. I was going to blog about it, but Andrew Gelman (here) beat me to it and has done a much better and more thorough job than I could have done. Here, for example, is a quote from the op-ed and Gelman’s follow-up.

“Knowing a person’s political leanings should not affect your assessment of how good a doctor she is — or whether she is likely to be a good accountant or a talented architect. But in practice, does it?”

I followed the link to the research article and did a quick search. The words “doctor,” “accountant,” and “architect” appear . . . exactly zero times.

Gelman takes the article apart piece by piece. But when you put the pieces together, what you get is a picture of the larger problem with experiments. They are metaphors or analogies. They are clever and contrived. They can sharpen our view of the world outside the lab, the “real” world  — but they are not that world.

 “My love is like a red, red rose.” Well, yes, Bobby, in some ways she is. But she is not in fact a red, red rose.

Here is the world of the Sharot-Sunstein experiment.

We assigned people the most boring imaginable task: to sort 204 colored geometric shapes into one of two categories, “blaps” and “not blaps,” based on the shape’s features. We invented the term “blap,” and the participants had to try to figure out by trial and error what made a shape a blap. Unknown to the participants, whether a shape was deemed a blap was in fact random.

The 97 Mechanical Turkers in the experiment had to work with a partner (that is, they thought they would work with a partner – there was no actual collaboration and no actual partner). Players thought they would be paid according to how well they sorted blaps. The result:

[The players] most often chose to hear about blaps from co-players who were politically like-minded, even when those with different political views were much better at the task.

To repeat, despite the title of the article (“Would You Go to a Republican Doctor?”), this experiment was not about choosing a doctor. To get to New York Times readers choosing doctors, you have to make a long inferential leap from Mechanical Turkers choosing blap-sorters. Sharot-Sunstein are saying, “My partner in sorting ‘blaps’ is like a red, red rose a doctor or an architect.” Well, yes, but . . . .

See the Gelman post for the full critique.

Full disclosure: my dentist has a MAGA hat in his office, and I’m still going back for a crown next month. A crown is like a hat in some ways, but not in others. 

Evidence at the Upshot

March 31, 2018
Posted by Jay Livingston

“Common sense” is not evidence. Neither is “what everyone knows” or, to use a source of data favored by our president, what “people say.”  That’s one of the first things students hear in the intro sociology course. Our discipline is empirical, we insist. It is evidence-based, and evidence is something that really happened. Often you have to actually count those things.

The Upshot is the “data-driven” site that the New York Times created to compete with FiveThirtyEight. Friday, an Upshot article about marriage, social class, and college had this lede,* a six-word graf.*
Princetonians like to marry one another.
The article, by Kevin Carey, showed that students from wealthier families are more likely to be married by their early thirties than are students from the bottom fifth of the income ladder. Carey argued that the cause was “assortative mating” – like marries like – and that the pattern holds even for graduates of the same elite school – Princeton, for example. Rich Princetonians marry other rich Princetonians, says Carey. Poor Princetonians remain unmarried. In their early thirties, only a third of them were married.


According to Carey, the sorting that leads to mating takes place in the “eating clubs” – Princeton’s version of fraternities and sororities. Acceptance into this or that club depends in part on social class, so as Carey sees it, “Eating clubs are where many upper-income marriages begin.”

It’s logical and it makes sense. The only trouble is that Carey provides no evidence for Tiger intermarriage. That 56% of rich Princeton alums who were married by age 32-34 – we don’t know who they married. Another rich Princetonian? Maybe, maybe not. We know only that they were married, not to whom.

Oh, wait. I said Carey provided no evidence. I take that back. Here’s the second graf.

Although the university is coy about the exact number of Tiger-Tiger marriages, Princeton tour guides are often asked about matrimonial prospects, and sometimes include apocryphal statistics — 50 percent! Maybe 75! — in their patter. With an insular campus social scene, annual reunions and a network of alumni organizations in most major cities, opportunities to find a special someone wearing orange and black are many.

You don’t have to have taken a methods course to know that this is not good evidence, or even evidence at all. What people say, and even logical reasons that something should happen, are not evidence that it does happen. Carey all but admits that he has no real data on Princeton intermarriage, but that doesn’t stop him from writing about it as though it’s a solid fact.

Is it? Five years ago, a Princeton alumna, president of the class of ’77, published a letter in The Daily Princetonian giving her 21st-century counterparts this bit of advice: “Find a husband on campus before you graduate.”

The reaction was swift and predictable. Some even thought that the Princetonian had run the piece as an April Fool’s joke. Besides, people these days typically do not get married till their late twenties – at least five years after they graduate. A lot can happen to that eating-club romance in those five years.

Let me be clear: the negative reaction to the letter and the median marriage age of the US population are not evidence that Princetonians are not marrying one another. But they are just as good (or bad) as Carey’s evidence that they are.

*Using journalism jargon when I’m writing about journalism is one of my favorite affectations.

Connecting the Dots

March 22, 2018
Posted by Jay Livingston

Brilliance in science is sometimes a matter of simplifying – paring away complicated scientific techniques and seeing what non-scientists would see if they looked in the right place. That’s what Richard Feynman did when he dropped a rubber ring into a glass of ice water – a flash of brilliance that allowed everyone to understand what caused the space shuttle Challenger disaster.

Andrew Gelman isn’t Richard Feynman, but he did something similar in his blog post about an article that’s been getting much buzz, including at Buzzfeed, since it was posted at SSRN two weeks ago. The article is about Naloxone, the drug administered to people who have overdosed on heroin or opioids. It keeps them from dying.

The authors of the article, Jennifer Doleac and Anita Mukherjee, argue that while the drug may save lives in the immediate situation, it does not reduce overall drug deaths. Worse, the unintended consequences of the drug outweigh its short-run benefits. Those whose lives are saved go back to using drugs, committing crimes, and winding up in emergency rooms. In addition, a drug that will prevent overdose death “[makes] riskier opioid use more appealing.”

The title is “The Moral Hazard of Lifesaving Innovations: Naloxone Access, Opioid Abuse, and Crime.” (A moral hazard is something that encourages people to do bad things by protecting them from negative consequences.)

Naloxone didn’t happen all at once. In 2013 fewer than ten states allowed it; the next year the number had doubled. In 2015, only nine states still did not allow its use. Doleac and Mukherjee used these time differences to look at bad outcomes (theft, death, ER admissions) before and after the introduction of Naloxone in the different states.  Here are some of their graphs.

They conclude that “broadening Naloxone access led to more opioid-related ER visits.” As for deaths, “in some areas, particularly the Midwest, expanding Naloxone access has increased opioid-related mortality.”

There are reasons to be skeptical of the data, but let’s assume that the numbers – the points in the graph – are accurate. Even so, says Andrew Gelman (here), there’s still the question of how to interpret that array of points. Doleac and Mukherjee add lines and what I assume are confidence bands to clarify the trends. But do these added techniques clarify, or do they create a picture that is different from the underlying reality? Here’s Gelman:

The weird curvy lines are clearly the result of overfitting some sort of non-regularized curves. More to the point, if you take away the lines and the gray bands, I don’t see any patterns at all! Figure 4 just looks like a general positive trend, and figure 8 doesn’t look like anything at all. The discontinuity in the midwest is the big thing—this is the 14% increase mentioned in the abstract to the paper—but, just looking at the dots, I don’t see it.

Are these graphs really an optical illusion, with the lines and shadings getting me to see something that isn’t really there? My powers of visualization are not so acute, so to see what Gelman meant about looking only at the dots, I erased the added lines and bands. Here is what the graphs looked like.

Like Gelman, I can’t see any clear patterns showing the effect of Naloxone. And as I read the reactions to the paper, I sense that its results are ambiguous enough to provide rich material for motivated perception. Conservatives and libertarians often start from the assumption that government attempts to help people only make things worse. The unintended-consequences crowd – Megan McArdle, for example (here) – take the paper at face value. Liberals such as Richard G. Frank, Keith Humphreys, and Harold A. Pollack (here), who have done their own research on Naloxone, are more skeptical about the accuracy of the data.*


* This reminded me of a post I did in the first year of this blog.  It was about an editorial in the WSJ that included an utterly dishonest, ideologically motivated connect-the-dots line imposed on an array of points. The post is here.

Ass-Backwards Through the Gateway

March 11, 2018
Posted by Jay Livingston

Imagine that you’re a US Attorney on the drug beat. Your boss is Jeff Sessions, who has announced that he’s going to vigorously enforce laws against marijuana and use the federal law when state laws are more lax. Maybe you also think that weed is a dangerous drug. You do a little “research” and tweet out your findings.

This brief tweet might serve as an example of how not to do real research. The sample, which excludes people who have not gone to treatment centers, is hardly representative of all users. There’s researcher bias since the guy with the ax to grind is the one asking the questions. The respondents too (the drug counselors) no doubt feel some pressure to give the Sessions-politically-correct answer. They may also be selectively remembering their patients. 

But even without the obvious bias, this tweet makes an error that mars research on less contentious issues. It samples on the dependent variable. The use of heavier drugs (opioids, heroin, meth, etc.) is the dependent variable – the outcome you are trying to predict. Marijuana use is the independent variable – the one you use to make that prediction. Taking your sample from confirmed heroin/opioid addicts gets things backwards. To see if weed makes a difference, you have to compare weed users with those who do not use and then see how many in each group take up more serious drugs.

Here’s an analogy – back pain. Suppose that, thanks to advances in imaging (MRIs and the like) doctors find that many of the people who show up with back pain have spinal abnormalities, especially disk bulges and protrusions. These bulges must be the gateway to back pain. So the doctors start doing more surgeries to correct these bulges. These surgeries often fail to improve things.

The doctors were sampling on the dependent variable (back pain), not on the independent variable (disk bulges). The right way to find out if spinal abnormalities cause back pain is to take MRIs of all people, not just those who show up in the doctor’s office. This is pretty much the way it happened in the real world. Eventually, researchers started doing the research the right way and found that lots of people with spinal abnormalities did not pass through the gateway and on to back pain.
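To see why the direction of sampling matters, here is a minimal sketch in Python. Every count in it is invented for illustration (none comes from real MRI studies), and the names `counts`, `bulge_share_among_pain`, and `pain_rate` are hypothetical. In this made-up population, disk bulges have no effect on pain at all, yet a doctor who sees only the patients in pain will still find bulges in a large share of them.

```python
# Hypothetical population of 1,000 people. All counts are invented for illustration.
counts = {
    ("bulge", "pain"): 80,
    ("bulge", "no pain"): 320,
    ("no bulge", "pain"): 120,
    ("no bulge", "no pain"): 480,
}


def bulge_share_among_pain(table):
    """Sampling on the dependent variable: look only at people with pain."""
    with_pain = table[("bulge", "pain")] + table[("no bulge", "pain")]
    return table[("bulge", "pain")] / with_pain


def pain_rate(table, spine):
    """Sampling on the independent variable: follow one spine group to its outcome."""
    group_total = table[(spine, "pain")] + table[(spine, "no pain")]
    return table[(spine, "pain")] / group_total


print(f"Share of pain patients with a bulge: {bulge_share_among_pain(counts):.0%}")
print(f"Pain rate among people with a bulge: {pain_rate(counts, 'bulge'):.0%}")
print(f"Pain rate among people without one:  {pain_rate(counts, 'no bulge'):.0%}")
```

In this invented table, 40% of the pain patients have a bulge, which looks like a gateway. But the pain rate is 20% with a bulge and 20% without one, so the bulge predicts nothing.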

The same problem often plagues explanations that try to reverse-engineer success. Find a bunch of highly effective people, then see what habits they share. Or look at some highly successful people (The Beatles, Bill Gates) and discover that early on in their careers they spent 10,000 hours working on their trade.

US Atty. Stuart’s tweet tells a good story, and it’s persuasive. But anecdotal evidence, like eyewitness testimony, is frequently misleading or wrong. The systematic research – many studies over many years – shows little or no gateway effect of marijuana. No wonder US Attorney Stuart chose to ignore that research.*

* As Mark Kleiman has argued, even when a marijuana user does add harder drugs to his repertoire, the causes may have less to do with the drug itself than with the marketplace. The dealer you go to for your weed probably also carries heavier drugs and would be only too happy to sell them to you.  Legalizing weed so that it’s sold openly by specialty shops rather than by criminals may break that link to other drugs.

Algorithms and False Positives

September 13, 2017
Posted by Jay Livingston

Can face-recognition software tell if you’re gay?

Here’s the headline from The Guardian a week ago.

Yilun Wang and Michal Kosinski at Stanford’s School of Business have written an article showing that artificial intelligence – machines that can learn from their experiences – can develop algorithms to distinguish the gay from the straight. Kosinski goes farther. According to Business Insider,
He predicts that self-learning algorithms with human characteristics will also be able to identify:
  • a person’s political beliefs
  • whether they have high IQs
  • whether they are predisposed to criminal behaviour
When I read that last line, something clicked. I remembered that a while ago I had blogged about an Israeli company, Faception, that claimed its face recognition software could pick out the faces of terrorists, professional poker players, and other types. It all reminded me of Cesare Lombroso, the Italian criminologist. Nearly 150 years ago, Lombroso claimed that criminals could be distinguished by the shape of their skulls, ears, noses, chins, etc. (That blog post, complete with pictures from Lombroso’s book, is here.) So I was not surprised to learn that Kosinski had worked with Faception.

For a thorough (3000 word) critique of the Wang-Kosinski paper, see Greggor Mattson’s post at Scatterplot. The part I want to emphasize here is the problem of False Positives.

Wang-Kosinski tested their algorithm by showing a series of paired pictures from a dating site. In each pair, one person was gay, the other straight. The task was to guess which was which. The machine’s accuracy was roughly 80% – much better than guessing randomly and better than the guesses made by actual humans, who got about 60% right. (These are the numbers for photos of men only. The machine and humans were not as good at spotting lesbians. In my hypothetical example that follows, assume that all the photos are of men.)

But does that mean that the face-recognition algorithm can spot the gay person? The trouble with Wang-Kosinski’s gaydar test was that it created a world where half the population was gay. For each trial, people or machine saw one gay person and one straight.

Let’s suppose that the machine had an accuracy rate of 90%. Let’s also present the machine with a 50-50 world. Looking at the 50 gays, the machine will guess correctly on 45. These are “True Positives.” It identified them as gay, and they were gay. But it will also classify 5 of the gay people as not-gay. These are the False Negatives.

It will have the same ratio of true and false for the not-gay population. It will correctly identify 45 of the not-gays (True Negatives), but it will guess incorrectly that 5 of these straight people are gay (False Positive).

It looks pretty good. But how well will this work in the real world, where the gay-straight ratio is nowhere near 50-50? Just what that ratio is depends on definitions. But to make the math easier, I’m going to use 5% as my estimate. In a sample of 1000, only 50 will be gay. The other 950 will be straight.

Again, let’s give the machine an accuracy rate of 90%. For the 50 gays, it will again have 45 True Positives and 5 False Negatives. But what about the 950 not-gays? It will be correct 90% of the time and identify 855 of them as not-gay (True Negatives). But it will also guess incorrectly that 10% are gay. That’s 95 False Positives.

The number of False Positives is more than double the number of True Positives. The overall accuracy may be 90%, but when it comes to picking out gays, the machine is wrong far more often than it’s right.
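The arithmetic in the two scenarios above can be checked with a short Python sketch. It assumes, as the example does, that the machine is equally accurate on both groups. The function names are hypothetical and are not taken from the Wang-Kosinski paper.

```python
def confusion_counts(population, base_rate, accuracy):
    """Return (TP, FN, TN, FP) for a test that is equally accurate on both groups."""
    positives = round(population * base_rate)
    negatives = population - positives
    true_pos = round(positives * accuracy)
    false_neg = positives - true_pos
    true_neg = round(negatives * accuracy)
    false_pos = negatives - true_neg
    return true_pos, false_neg, true_neg, false_pos


def share_of_positives_correct(true_pos, false_pos):
    """Of everyone the machine flags, what fraction is flagged correctly?"""
    return true_pos / (true_pos + false_pos)


# The two worlds from the text: a 50-50 sample of 100 and a 5% sample of 1,000.
for label, population, base_rate in [("50-50 world", 100, 0.50), ("5% world", 1000, 0.05)]:
    tp, fn, tn, fp = confusion_counts(population, base_rate, accuracy=0.90)
    correct = share_of_positives_correct(tp, fp)
    print(f"{label}: TP={tp} FN={fn} TN={tn} FP={fp}, flagged correctly: {correct:.0%}")
```

In the 50-50 world, 45 of the 50 people flagged are flagged correctly. In the 5% world, only 45 of the 140 flagged are, or about 32%, even though the overall accuracy is still 90%.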

The rarer the thing that you’re trying to predict, the greater the ratio of False Positives to True Positives. And those False Positives can have bad consequences. In medicine, a false positive diagnosis can lead to unnecessary treatment that is physically and psychologically damaging. As for politics and policy, think of the consequences if the government goes full Lombroso and uses algorithms for predicting “predisposition to criminal behavior.”

Somewhat Likely to Mess Up on the Likert Scale

May 27, 2017
Posted by Jay Livingston

Ipsos called last night, and I blew it. The interviewer, a very nice-sounding man in Toronto, didn’t have to tell me what Ipsos was, though he did, sticking with his script. I’d regularly seen their numbers cited (The latest “Reuters/Ipsos” poll shows Trump’s approve/disapprove at 37%/57%.)

The interviewer wanted to speak with someone in the household older than 18. No problem; I’m your man. After all, when I vote, I am a mere one among millions. The Ipsos sample, I figured, was only 1,000.  My voice would be heard.

He said at the start that the survey was about energy. Maybe he even said it was sponsored by some energy group. I wish I could remember.

 After a few questions about whether I intended to vote in local elections and how often I got news from various sources (newspapers, TV, Internet), he asked how well-informed I was about energy issues. Again, I can’t remember the exact phrasing, but my Likert choices ranged from Very Well Informed to Not At All Informed.

I thought about people who are really up on this sort of thing – a guy I know who writes an oil industry newsletter, bloggers who post about fracking and earthquakes or the history of the cost of solar energy.  I feel so ignorant compared with them when I read about these things. So I went for the next-to-least informed choice. I think it was “not so well informed.”

“That concludes the interview. Thank you.”
“Wait a minute,” I said. “I don’t get to say what I think about energy companies? Don’t you want to know what bastards I think they are?”
“I’m sorry, we have to go with the first response.”
“I was being falsely modest.”
He laughed.
“The Koch brothers, Rex Tillerson, climate change, Massey Coal . . .”
He laughed again, but he wouldn’t budge. They run a tight ship at Ipsos.

Next time they ask, whatever the topic, I’m a freakin’ expert.

Imagine There’s a $5 Discount. It’s Easy If You Try. . . .

June 21, 2016
Posted by Jay Livingston

Reading Robert H. Frank’s new book Success and Luck, I came across this allusion to the famous Kahneman and Tversky finding about “framing.”

It is common . . . for someone to be willing to drive across town to save $10 on a $20 clock radio, but unwilling to do so to save $10 on a $1,000 television set.

Is it common? Do we really have data on crosstown driving to save $10? The research that I assume Frank is alluding to is a 1981 study by Daniel Kahneman and Amos Tversky (pdf here). Here are the two scenarios that Kahneman and Tversky presented to their subjects.

A.  Imagine that you are about to purchase a jacket for $125 and a calculator for $15. The calculator salesman informs you that the calculator you wish to buy is on sale for $10 at the other branch of the store, located 20 minutes drive away. Would you make the trip to the other store?

B. Imagine that you are about to purchase a calculator for $125 and a jacket for $15. The calculator salesman informs you that the calculator you wish to buy is on sale for $120 at the other branch of the store, located 20 minutes drive away. Would you make the trip to the other store?

The two are really the same: would you drive 20 minutes to save $5 on a calculator? But when the discount was on a $15 calculator, 68% of the subjects said they would make the 20-minute trip. When the $5 savings applied to the $125 calculator, only 29% said they’d make the trip.

The study is famous even outside behavioral economics, and rightly so. It points up one of the many ways that we are not perfectly rational when we think about money. But whenever I read about this result, I wonder: how many of those people actually did drive to the other store? The answer of course is none. There was no actual store, no $125 calculator, no $15 jacket. The subjects were asked to “imagine.” They were thinking about an abstract calculator and an abstract 20-minute drive, not real ones.*

But if they really did want a jacket and a calculator, would 60 of the 90 people really have driven the 20 minutes to save $5 on a $15 calculator? One of the things we have long known in social research is that what people say they would do is not always what they actually will do. And even if these subjects were accurate about what they would do, their thinking might be including real-world factors beyond just the two in the Kahneman-Tversky abstract scenario (20 minutes, $5). Maybe they were thinking that they might be over by that other mall later in the week, or that if they didn’t buy the $15 calculator right now, they could always come back to this same store and get it.

It’s surprising that social scientists who cite this study take the “would do” response at face value, surprising because another well-known topic in behavioral economics is the discrepancy between what people say they will do and what they actually do. People say that they will start exercising regularly, or save more of their income, or start that diet on Monday. Then Monday comes, and everyone else at the table is having dessert, and well, you know how it is.

In the absence of data on behavior, I prefer to think that these results tell us not so much what people will do. They tell us what people think a rational person in that situation would do. What’s interesting then is that their ideas about abstract economic rationality are themselves not so rational.

* I had the same reaction to another Kahneman study, the one involving “Linda,” an imaginary bank teller. (My post about that one, nearly four years ago, is here.) What I said of the Linda problem might also apply to the jacket-and-calculator problem: “It’s like some clever riddle or a joke – something with little relevance outside its own small universe. You’re never going to be having a real drink in a real bar and see, walking in through the door, an Irishman, a rabbi, and a panda.”

The Face That Launched a Thousand False Positives

May 27, 2016
Posted by Jay Livingston

What bothered the woman sitting next to him wasn’t just that the guy was writing in what might have been Arabic (it turned out to be math). But he also looked like a terrorist. (WaPo story here.)

We know what terrorists look like. And now an Israeli company, Faception, has combined big data with facial recognition software to come up with this.

According to their Website:

Faception can analyze faces from video streams, cameras, or . . . databases. We match an individual with various personality traits or types such as an Extrovert, a person with High IQ, Professional Poker Player or a Terrorist.

My first thought was, “Oh my god, Lombroso.”

If you’ve taken Crim 101, you might remember that Lombroso, often called “the father of criminology,” had the idea that criminals were atavisms, throwbacks to earlier stages of human evolution, with different skull shapes and facial features. A careful examination of a person’s head and face could diagnose criminality – even the specific type of lawbreaking the criminal favored. Here is an illustration from an 1876 edition of his book. Can you spot the poisoner, the Neapolitan thief, the Piedmont forger?

Criminology textbooks still mention Lombroso, though rarely as a source of enlightenment. For example, one book concludes the section on Lombroso, “At this point, you may be asking: If Lombroso, with his ideas about criminal ears and jaws, is the ‘father of criminology,’ what can we expect of subsequent generations of criminologists?”

Apparently there’s just something irresistible in the idea that people’s looks reveal their character. Some people really do look like criminals, and some people look like cops.* Some look like a terrorist or a soccer mom or a priest. That’s why Hollywood still pays casting directors. After all, we know that faces show emotion, and most of us know at a glance whether the person we’re looking at is feeling happy, angry, puzzled, hurt, etc. So it’s only logical that a face will reveal more permanent characteristics. As Faception puts it, “According to social and life science research, our personality is determined by our DNA reflected in our face.” It’s not quite true, but it sounds plausible.

The problem with this technique is not the theory or science behind it, and probably not even its ability to pick out terrorists, brand promoters, bingo players, or any of their other dramatis personae in the Faception cast of characters. The problem is false positives. Even when a test is highly accurate, if the thing it’s testing for is rare, a positive identification is likely to be wrong. Mammograms, for example, have an accuracy rate as high as 90%. Each year, about 37 million women in the US are given mammograms. The number who have breast cancer is about 180,000. The 10% error rate means that of the 37 million women tested, 3.7 million will get results that are false positives. It also means that for the woman who does test positive, the likelihood that the diagnosis is wrong is 95%.**

Think of these screening tests as stereotypes. The problem with stereotypes is not that they are wrong; without some grain of truth, they wouldn’t exist. The problem is that they have many grains of untruth – false positives. We have been taught to be wary of stereotypes not just because they denigrate an entire class of people but because in making decisions about individuals, those stereotypes yield a lot of false positives.  

Faception does provide some data on the accuracy of its screening. But poker champions and terrorists are rarer even than breast cancer. So even if the test can pick out the true terrorist waiting to board the plane, it’s also going to pick out a lot of bearded Italian economists jotting integral signs and Greek letters on their notepads.

(h/t Cathy O’Neil at

* Some people look like cops. My favorite example is the opening of Richard Price’s novel Lush Life – four undercover cops, though the cover they are under is not especially effective.

The Quality of Life Task Force: four sweatshirts in a bogus taxi set up on the corner of Clinton Street alongside the Williamsburg Bridge off-ramp to profile the incoming salmon run; their mantra: Dope, guns, overtime; their motto: Everyone’s got something to lose. 
At the corner of Houston and Chrystie, a cherry-red Denali pulls up alongside them, three overdressed women in the backseat, the driver alone up front and wearing sunglasses.
The passenger-side window glides down. “Officers, where the Howard Johnson hotel at around here ...”
“Straight ahead three blocks on the far corner,” Lugo offers.
“Thank you.” [. . .]
The window glides back up and he shoots east on Houston.
“Did he call us officers?”
“It’s that stupid flattop of yours.”
“It’s that fuckin’ tractor hat of yours.”

It wasn’t the haircut or the hat. They just looked like cops.

** The probability that the diagnosis is correct is 5% – the 180,000 true positives divided by the 3.7 million false positives plus the 180,000 true positives – roughly 180,000 / 3,900,000. (I took this example from Howard Wainer’s recent book, Truth and Truthiness.)
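The footnote’s arithmetic can be reproduced in a few lines of Python. The first calculation follows the rounding used above (10% of all 37 million women screened, with every real case counted as a true positive). The second is a stricter variant that is my own assumption, not Wainer’s: it applies the 10% error rate only to women without cancer and credits the test with catching 90% of the real cases. The names are hypothetical.

```python
WOMEN_SCREENED = 37_000_000
WOMEN_WITH_CANCER = 180_000
ERROR_RATE = 0.10


def chance_positive_is_correct(true_positives, false_positives):
    """Fraction of all positive results that are true positives."""
    return true_positives / (true_positives + false_positives)


# Rounded version, as in the footnote: 3.7 million false positives, 180,000 true ones.
rough_false_pos = ERROR_RATE * WOMEN_SCREENED
rough = chance_positive_is_correct(WOMEN_WITH_CANCER, rough_false_pos)

# Stricter version (an assumption): errors split between the two groups.
strict_false_pos = ERROR_RATE * (WOMEN_SCREENED - WOMEN_WITH_CANCER)
strict_true_pos = (1 - ERROR_RATE) * WOMEN_WITH_CANCER
strict = chance_positive_is_correct(strict_true_pos, strict_false_pos)

print(f"Rounded version:  {rough:.1%} of positives are correct")
print(f"Stricter version: {strict:.1%} of positives are correct")
```

The two versions come out at about 4.6% and 4.2%. Either way, roughly 95% of the women who test positive do not have breast cancer.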

Show, Don’t Tell

March 23, 2016
Posted by Jay Livingston

Can the mood of a piece of writing be graphed?

For his final project in Andrew Gelman’s course on statistical communication and graphics, Lucas Estevem created a “Text Sentiment Visualizer.” Gelman discusses it on his blog, putting the Visualizer through its paces with the opening of Moby Dick.

Without reading too carefully, I thought that the picture – about equally positive and negative – seemed about right. Sure, things ended badly, but Ishmael himself seemed like a fairly positive fellow. So I went to the Visualizer (here) and pasted in the text of one of my blogposts. It came out mostly negative. I tried another. Ditto. And another. The results were not surprising when I thought about what I write here, but they were sobering nevertheless. Gotta be more positive.

But how did the Visualizer know? What was its formula for sussing out the sentiment in a sentence? Could the Visualizer itself be a glum creature, tilted towards the dark side, seeing negativity where others might see neutrality? I tried other novel openings. Kafka’s Metamorphosis was entirely in the red, and Holden Caulfield looked to be at about 90%. But Augie March, not exactly a brooding or nasty type, scored about 75% negative. Joyce’s Ulysses came in at about 50-50.

To get a somewhat better idea of the scoring, I looked more closely at page one of The Great Gatsby. The Visualizer scored the third paragraph heavily negative – 17 out of 21 lines. But many of those lines had words that I thought would be scored as positive.

Did the Visualizer think that extraordinary gift, gorgeous, and successful were not such a good thing?

Feeling slightly more positive about my own negative scores, I tried Dr. Seuss. He too skewed negative.

What about A Tale of Two Cities? Surely the best of times would balance out the worst of times, and that famous opening paragraph would finish in a draw. But a line-by-line analysis came out almost all negative.

Only best, hope, and Heaven made it to the blue side.

I’m not sure what the moral of the story is except that, as I said in a recent post, content analysis is a bitch.

Gelman is more on the positive side about the Visualizer. It’s “far from perfect,” but it’s a step in the right direction – i.e., towards visual presentation – and we can play around with it, as I’ve done here, to see how it works and how it might be improved. Or as Gelman concludes, “Visualization. It’s not just about showing off. It’s a tool for discovering and learning about anomalies.”

Race and Tweets

March 20, 2016
Posted by Jay Livingston

Nigger* is a racially charged word. And if you sort cities or states according to how frequently words like nigger turn up from them on Twitter, you’ll find large differences. In some states these words appear forty times more often than in others. But do those frequencies tell us about the local climate of race relations? The answer seems to be: it depends on who is tweeting.

In the previous post, I wondered whether the frequency of tweets with words like bitch, cunt, etc. tells us about general levels of misogyny in a state or city. Abodo, the Website that mapped the geography of sexist tweets, also had charts and maps showing both racially charged tweets (with words like “nigger”) and more neutral, politically correct, tweets (“African Americans” or “Black people”). Here are the maps of the two different linguistic choices.

West Virginia certainly looks like the poster state for racism – highest in “anti-Black” tweets, and among the lowest in “neutral or tolerant” tweets. West Virginia is 95% White, so it’s clear that we’re looking at how White people there talk about Blacks. That guy who sang about the Mountaineer State being “almost heaven” – I’m pretty sure he wasn’t a Black dude. Nevada too is heavily White (75%; Black 9%), but there, tweets with polite terms well outnumber those with slurs. Probably, Nevada is a less racist place than West Virginia.

But what about states with more Blacks? Maryland, about 30% Black, is in the upper range for neutral race-tweets, but it’s far from the bottom on “anti-Black” tweets. The same is true for Georgia and Louisiana, both about 30% Black. These states score high on both kinds of tweet – what we might call, with a hat-tip to Chris Rock, “nigger tweets” and “Black people tweets.” (If you are not familiar with Rock’s “Niggers and Black People,” watch it here.) If he had released this 8-minute stand-up routine as a series of tweets, and if Chris Rock were a state instead of a person, that state would be at the top in both categories – “anti-Black” and “neutral and tolerant.” How can a state or city be both?

The answer of course is that the meaning of nigger depends on who is using it.  When White people are tweeting about Blacks, then the choice of words probably tells us about racism. But when most of the people tweeting are Black, it’s harder to know. Here, for example, are Abodo’s top ten cities for “anti-Black tweets.”

Blacks make up a large percent of the population in most of these cities.  The top four – Baltimore, Atlanta, and New Orleans – are over 50% Black. It’s highly unlikely that it’s the Whites there who are flooding Twitter with tweets teeming with “nigger, coon, dindu, jungle bunny, monkey, or spear chucker” – the words included in Abodo’s anti-Black tag.** If the tag had included niggas, the “anti-Black” count in these cities would have been even higher.

All this tells us is that Black people tweet about things concerning Black people. And since hip-hop has been around for more than thirty years, it shouldn’t surprise anyone that Blacks use these words with no slur intended. When I searched Twitter yesterday for nigger, the tweets I saw on the first page were all from Black people, and some of those tweets, rather than using the word nigger were talking about the use of it.  (Needless to say, if you search for niggas, you can scroll through many, many screens trying to find a tweet with a White profile picture.)

For some reason, Abodo refused to draw this obvious conclusion. They do say in another section of the article that  “anti-Hispanic slurs have largely not been reclaimed by Hispanic and Latino people in the way that the N-word is commonly used in black communities.” So they know what’s going on. Nevertheless, in the section on Blacks, they say nothing, tacitly implying that these “anti-Black” tweets announce an anti-Black atmosphere. But that’s true only if the area is mostly White. When those tweets are coming from Blacks, it’s much more complicated.


*Abodo backs away from using the actual word. They substitute the usual euphemism – “the N-word.” As I have said elsewhere in this blog, if you can’t say the word you’re talking about when you’re talking about it as a word, then the terrorists have won. In this view, I differ from another Jay (Smooth) whose views I respect. A third Jay (Z) has no problems with using the word. A lot.

** I confess, porch monkey and dindu were new to me, but then, I don’t get out much, at least not in the right circles. Abodo ignored most of the terms in the old SNL sketch with Richard Pryor and Chevy Chase.  (The available videos, last time I checked, are of low quality (this one, for instance), but like Chris Rock’s routine, it is an important document that everyone interested in race and media should be familiar with. A partial transcript is in this earlier post.)

Content Analysis Is a Bitch

March 18, 2016
Posted by Jay Livingston

Can Twitter tell us about the climate of intolerance? Do the words in all those tweets reveal something about levels of racism and sexism? Maybe. But the language of intolerance – “hate speech” – can be tricky to read.

Abodo is a website for people seeking apartments – Zillow for renters – and it recently posted an article, “America’s Most P.C. and Prejudiced Places” (here), with maps and graphs of data from Twitter. Here, for example, are the cities with the highest rates of misogynistic tweets.

Unfortunately, Abodo does not say which words are in its formula for “derogatory language against women.” But Abodo does recognize that bitch might be a problem because “it is commonly used as profanity but not always with sexist intent.” Just to see what those uses might be, I searched for “bitch” on Twitter, but the results, if not overtly sexist, all referred to a female as a bitch.

Maybe it was New Orleans. I tried again adding “NOLA” as a search term and found one non-sexist bitch.

When Abodo ran its much larger database of tweets but excluded the word bitch from its misogyny algorithm, New Orleans dropped from first place to fourth, and Baton Rouge disappeared from the top ten. Several Northeast and Western cities now made the cut.

This tells us what we might have known if we’d been following Jack Grieve’s Twitter research (here) – that bitch is especially popular in the South.

The Twitter map of cunt is just the opposite. It appears far more frequently in tweets from the Northeast than from the South.

The bitch factor changes the estimated sexism of states as well as cities. Here are two maps, one with and one without bitch in its sexism screen.

With bitch out of the equation, Louisiana looks much less nasty, and the other Southeast states also shade more towards the less sexist green. The Northeast and West, especially Nevada, now look more misogynistic. A few states remain nice no matter how you score the tweets – Montana, Wyoming, Vermont – but they are among the least populous states so even with Twitter data, sample size might be a problem. Also note that bitch accounts for most of what Abodo calls sexist language. Without bitch, the rates range from 26 to 133 per 100,000 tweets. Add bitch to the formula and the range moves to 74 to 894 per 100,000.  That means that at least two-thirds of all the “derogatory language against women” on Twitter is the word bitch.
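
For anyone who wants to check the arithmetic, here is a rough sketch using only the four rates quoted above. It rests on an assumption the Abodo article does not confirm: that comparing the low end of one range with the low end of the other (and high with high) is a fair stand-in for the overall share, even though the states at the extremes may differ between the two maps. The function name is mine, not Abodo's.

```python
# Rates per 100,000 tweets, as quoted in the post: (lowest state, highest state).
WITHOUT_BITCH = (26, 133)
WITH_BITCH = (74, 894)


def share_from_word(rate_without, rate_with):
    """Fraction of the flagged rate that disappears when the word is excluded.

    Assumption: the two rates describe comparable places, which the
    source data does not guarantee.
    """
    return 1 - rate_without / rate_with


low_share = share_from_word(WITHOUT_BITCH[0], WITH_BITCH[0])
high_share = share_from_word(WITHOUT_BITCH[1], WITH_BITCH[1])

print(f"low end:  {low_share:.0%}")   # about 65%
print(f"high end: {high_share:.0%}")  # about 85%
```

On these numbers the word accounts for roughly 65 percent of the flagged language at the low end, a hair under two-thirds, and about 85 percent at the high end.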

There’s a further problem in using these tweets as an index of sexism. Apparently a lot of these bitch tweets are coming from women (if my small sample of tweets is at all representative). Does that mean that the word has lost some of its misogyny? Or, as I’m sure some will argue, do these tweets mean that women have become “self-hating”? This same question is raised, in spades, by the use of nigger. Abodo has data on that too, but I will leave it for another post.