Based – Off and On

March 26, 2016
Posted by Jay Livingston


“This is based off of self-interest . . . .” wrote one student. Another wrote, “It’s an idea based off others from past years.” 

This construction sounded wrong to my ears. What happened to “based on”? Was this some local North New Jersey variant, like the New Yorkers’ waiting on line when everyone else in the US waits in line? But then I saw it in The Guardian last week:
Kang and her colleagues sent out 1,600 fabricated resumes, based off of real candidates, to employers in 16 different metropolitan areas in the US.
Lexis-Nexis turned up a few others just since the start of the year, and it wasn’t just New Jersey, or even the US.
  • “Me and Earl and the Dying Girl” is based off of the book by Jesse Andrews (Berkshire Eagle)
  • “We should set a baseline, and that's what the salaries should be based off.” (Chicago Daily Herald)
  • . . .little should be read into the upcoming Capital One Cup game based off this result. (Manchester Evening News, UK)
  • . . . schools estimated the number of children in their zone based off a ballot sent out in September (Manawatu Standard, New Zealand)
“Busy prepositions, always on the go,” said “Schoolhouse Rock.”* But it seems to me that prepositions are remarkably stable – those New Yorkers are still waiting on line, even though “on line” has added a much different and widely used meaning.



How did we get “based off” and “based off of”? How did this diffusion happen? It’s not like some fashion in clothing. It’s not created in Language Central and sent out amid a big publicity campaign. Nor did any celebrities start using it. Nor is it like the words that people are fully aware of and consciously choose, the phrases that are groovy for a minute or two and then become old hat, or those that are totally awesome and become part of the language and that nobody has an issue with.

My Lexis-Nexis search for “based off” turned up about 300 hits so far in 2016. (Lexis-Nexis does not consider “of” to be worthy of counting, so adding it to a word or phrase – “based off of” – is useless.) In the same period of 2010, the count was 100. In 2000, zero.

The Google Ngram database of books tells a similar story of the rapid rise of “based off of.” Of course, it is still dwarfed, by several orders of magnitude, by “based on.” But this graph, with “based off of” multiplied by 100,000, shows its recent and rapid rise.



The change is probably generational. Older speakers like me will cling to “based on”; but “based off” or “based off of” will be the choice of an increasing number of younger people. It won’t catch up to “based on” immediately. It’s not the faddish kind of change that will happen in a couple of days. Or do I mean “in a couple days”?


------------------------
* The song is here. It was written by jazz pianist/composer Bob Dorough, and he performs it with trumpeter Jack Sheldon. Other jazzers, notably Dave Frishberg and Blossom Dearie, contributed to “Schoolhouse Rock” as writers and performers. Busy jazz musicians.


Show, Don’t Tell

March 23, 2016
Posted by Jay Livingston

Can the mood of a piece of writing be graphed?

For his final project in Andrew Gelman’s course on statistical communication and graphics, Lucas Estevem created a “Text Sentiment Visualizer.” Gelman discusses it on his blog, putting the Visualizer through its paces with the opening of Moby Dick.


Without reading too carefully, I thought that the picture – about equally positive and negative – seemed about right. Sure, things ended badly, but Ishmael himself seemed like a fairly positive fellow. So I went to the Visualizer (here) and pasted in the text of one of my blogposts. It came out mostly negative. I tried another. Ditto. And another. The results were not surprising when I thought about what I write here, but they were sobering nevertheless. Gotta be more positive.

But how did the Visualizer know? What was its formula for sussing out the sentiment in a sentence? Could the Visualizer itself be a glum creature, tilted towards the dark side, seeing negativity where others might see neutrality? I tried other novel openings. Kafka’s Metamorphosis was entirely in the red, and Holden Caulfield looked to be about 90% negative. But Augie March, not exactly a brooding or nasty type, scored about 75% negative. Joyce’s Ulysses came in at about 50-50.

To get a somewhat better idea of the scoring, I looked more closely at page one of The Great Gatsby. The Visualizer scored the third paragraph heavily negative – 17 out of 21 lines. But many of those lines had words that I thought would be scored as positive.

Did the Visualizer think that extraordinary gift, gorgeous, and successful were not such a good thing?

Feeling slightly more positive about my own negative scores, I tried Dr. Seuss. He too skewed negative.


What about A Tale of Two Cities? Surely the best of times would balance out the worst of times, and that famous opening paragraph would finish in a draw. But a line-by-line analysis came out almost all negative.


Only best, hope, and Heaven made it to the blue side.
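The post doesn’t say how the Visualizer scores a line. One common technique is lexicon-based scoring: count the words that appear on a positive list and on a negative list, and label the line by the difference. The Python sketch below assumes exactly that, with a small hypothetical lexicon invented for illustration; it is not the Visualizer’s actual method. What it shows is how much the verdict on Dickens’s opening depends on which words the lexicon happens to include.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only. The Visualizer's real
# word lists and scoring rule are not described in the post.
POSITIVE = {"best", "wisdom", "belief", "light", "hope", "everything", "heaven"}
NEGATIVE = {"worst", "foolishness", "incredulity", "darkness", "despair", "nothing"}

# Opening clauses of A Tale of Two Cities, treated as one "line" each.
OPENING = [
    "It was the best of times,",
    "it was the worst of times,",
    "it was the age of wisdom,",
    "it was the age of foolishness,",
    "it was the epoch of belief,",
    "it was the epoch of incredulity,",
    "it was the season of Light,",
    "it was the season of Darkness,",
    "it was the spring of hope,",
    "it was the winter of despair,",
]


def score_line(line, positive, negative):
    """Number of positive-list words minus number of negative-list words."""
    words = re.findall(r"[a-z]+", line.lower())
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def label(score):
    """Map a numeric score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(lines, positive, negative):
    """Count how many lines come out positive, negative, or neutral."""
    return Counter(label(score_line(line, positive, negative)) for line in lines)


if __name__ == "__main__":
    # Balanced lexicon: the best of times and the worst of times cancel out.
    print(tally(OPENING, POSITIVE, NEGATIVE))
    # Same text, but a lexicon that knows only three positive words.
    print(tally(OPENING, {"best", "hope", "heaven"}, NEGATIVE))
```

With the balanced toy lexicon, the ten clauses split 5–5, the draw one might expect. With only best, hope, and heaven on the positive list, the same clauses come out two positive, five negative, and three neutral, because unlisted words like wisdom and Light simply count for nothing.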

I’m not sure what the moral of the story is except that, as I said in a recent post, content analysis is a bitch.

Gelman is more on the positive side about the Visualizer. It’s “far from perfect,” but it’s a step in the right direction – i.e., towards visual presentation – and we can play around with it, as I’ve done here, to see how it works and how it might be improved. Or as Gelman concludes, “Visualization. It’s not just about showing off. It’s a tool for discovering and learning about anomalies.”

Race and Tweets

March 20, 2016
Posted by Jay Livingston


Nigger* is a racially charged word. And if you sort cities or states according to how frequently words like nigger turn up in tweets from each, you’ll find large differences. In some states these words appear forty times more often than in others. But do those frequencies tell us about the local climate of race relations? The answer seems to be: it depends on who is tweeting.

In the previous post, I wondered whether the frequency of tweets with words like bitch, cunt, etc. tells us about general levels of misogyny in a state or city. Abodo.com, the Website that mapped the geography of sexist tweets, also had charts and maps showing both racially charged tweets (with words like “nigger”) and more neutral, politically correct tweets (“African Americans” or “Black people”). Here are the maps of the two different linguistic choices.


West Virginia certainly looks like the poster state for racism – highest in “anti-Black” tweets, and among the lowest in “neutral or tolerant” tweets. West Virginia is 95% White, so it’s clear that we’re looking at how White people there talk about Blacks. That guy who sang about the Mountaineer State being “almost heaven” – I’m pretty sure he wasn’t a Black dude. Nevada too is heavily White (75% White, 9% Black), but there, tweets with polite terms well outnumber those with slurs. Probably, Nevada is a less racist place than West Virginia.

But what about states with more Blacks? Maryland, about 30% Black, is in the upper range for neutral race-tweets, but it’s far from the bottom on “anti-Black” tweets. The same is true for Georgia and Louisiana, both about 30% Black. These states score high on both kinds of tweet – what we might call, with a hat-tip to Chris Rock, “nigger tweets” and “Black people tweets.” (If you are not familiar with Rock’s “Niggers and Black People,” watch it here.) If he had released this 8-minute stand-up routine as a series of tweets, and if Chris Rock were a state instead of a person, that state would be at the top in both categories – “anti-Black” and “neutral and tolerant.” How can a state or city be both?

The answer, of course, is that the meaning of nigger depends on who is using it. When White people are tweeting about Blacks, then the choice of words probably tells us about racism. But when most of the people tweeting are Black, it’s harder to know. Here, for example, are Abodo’s top ten cities for “anti-Black tweets.”


Blacks make up a large percentage of the population in most of these cities. Three of the top four – Baltimore, Atlanta, and New Orleans – are over 50% Black. It’s highly unlikely that it’s the Whites there who are flooding Twitter with tweets teeming with “nigger, coon, dindu, jungle bunny, monkey, or spear chucker” – the words included in Abodo’s anti-Black tag.** If the tag had included niggas, the “anti-Black” count in these cities would have been even higher.

All this tells us is that Black people tweet about things concerning Black people. And since hip-hop has been around for more than thirty years, it shouldn’t surprise anyone that Blacks use these words with no slur intended. When I searched Twitter yesterday for nigger, the tweets I saw on the first page were all from Black people, and some of those tweets, rather than using the word nigger, were talking about the use of it. (Needless to say, if you search for niggas, you can scroll through many, many screens trying to find a tweet with a White profile picture.)



For some reason, Abodo refused to draw this obvious conclusion. They do say in another section of the article that  “anti-Hispanic slurs have largely not been reclaimed by Hispanic and Latino people in the way that the N-word is commonly used in black communities.” So they know what’s going on. Nevertheless, in the section on Blacks, they say nothing, tacitly implying that these “anti-Black” tweets announce an anti-Black atmosphere. But that’s true only if the area is mostly White. When those tweets are coming from Blacks, it’s much more complicated.

----------------------------

*Abodo backs away from using the actual word. They substitute the usual euphemism – “the N-word.” As I have said elsewhere in this blog, if you can’t say the word you’re talking about when you’re talking about it as a word, then the terrorists have won. On this point, I differ from another Jay (Smooth) whose views I respect. A third Jay (Z) has no problems with using the word. A lot.

** I confess, porch monkey and dindu were new to me, but then, I don’t get out much, at least not in the right circles. Abodo ignored most of the terms in the old SNL sketch with Richard Pryor and Chevy Chase. The available videos, last time I checked, are of low quality (this one, for instance), but like Chris Rock’s routine, the sketch is an important document that everyone interested in race and media should be familiar with. A partial transcript is in this earlier post.

Content Analysis Is a Bitch

March 18, 2016
Posted by Jay Livingston

Can Twitter tell us about the climate of intolerance? Do the words in all those tweets reveal something about levels of racism and sexism? Maybe. But the language of intolerance – “hate speech” – can be tricky to read.

Abodo is a website for people seeking apartments – Zillow for renters – and it recently posted an article, “America’s Most P.C. and Prejudiced Places” (here), with maps and graphs of data from Twitter. Here, for example, are the cities with the highest rates of misogynistic tweets.


Unfortunately, Abodo does not say which words are in its formula for “derogatory language against women.” But Abodo does recognize that bitch might be a problem because “it is commonly used as profanity but not always with sexist intent.” Just to see what those uses might be, I searched for “bitch” on Twitter, but the results, if not overtly sexist, all referred to a female as a bitch.


Maybe New Orleans was different. I tried again, adding “NOLA” as a search term, and found one non-sexist bitch.


When Abodo ran its much larger database of tweets but excluded the word bitch from its misogyny algorithm, New Orleans dropped from first place to fourth, and Baton Rouge disappeared from the top ten. Several Northeast and Western cities now made the cut.
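Abodo doesn’t publish its formula, so the following Python sketch only illustrates the general idea, assuming that a city’s score is the number of listed-word occurrences per 100,000 tweets and that cities are ranked by that rate. The city names and counts are invented for illustration; they are not Abodo’s data. The point is how dropping one high-frequency word from the list can reorder a ranking.

```python
# Hypothetical toy counts: city -> (occurrences of each listed word, total tweets).
# These numbers are invented for illustration; they are not Abodo's data.
TOY_COUNTS = {
    "City A": ({"bitch": 90, "cunt": 5}, 10_000),
    "City B": ({"bitch": 20, "cunt": 40}, 10_000),
}


def rate_per_100k(word_counts, total_tweets, terms):
    """Occurrences of the listed terms per 100,000 tweets (assumed scoring rule)."""
    hits = sum(word_counts.get(term, 0) for term in terms)
    return hits * 100_000 / total_tweets


def rank_cities(counts_by_city, terms):
    """City names sorted from highest to lowest rate for the given term list."""
    rates = {
        city: rate_per_100k(counts, total, terms)
        for city, (counts, total) in counts_by_city.items()
    }
    return sorted(rates, key=rates.get, reverse=True)


if __name__ == "__main__":
    print(rank_cities(TOY_COUNTS, {"bitch", "cunt"}))  # list includes bitch
    print(rank_cities(TOY_COUNTS, {"cunt"}))           # list excludes bitch
```

With both words on the list, City A ranks first (950 vs. 600 per 100,000). Drop bitch, and City B moves ahead (400 vs. 50), even though nothing about either city’s tweets has changed.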


This tells us what we might have known if we’d been following Jack Grieve’s Twitter research (here) – that bitch is especially popular in the South.


The Twitter map of cunt is just the opposite. It appears far more frequently in tweets from the Northeast than from the South.


The bitch factor changes the estimated sexism of states as well as cities. Here are two maps, one with and one without bitch in its sexism screen.


With bitch out of the equation, Louisiana looks much less nasty, and the other Southeast states also shade more towards the less sexist green. The Northeast and West, especially Nevada, now look more misogynistic. A few states remain nice no matter how you score the tweets – Montana, Wyoming, Vermont – but they are among the least populous states, so even with Twitter data, sample size might be a problem. Also note that bitch accounts for most of what Abodo calls sexist language. Without bitch, the rates range from 26 to 133 per 100,000 tweets. Add bitch to the formula and the range moves to 74 to 894 per 100,000. That suggests that roughly two-thirds or more of all the “derogatory language against women” on Twitter is the word bitch.
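The share of the rate that bitch contributes can be checked from the range endpoints. This Python sketch assumes (Abodo doesn’t say) that the same place sits at each end of both ranges and that the two rates differ only by the added word, so the difference between them is the bitch contribution.

```python
def added_share(rate_without, rate_with):
    """Fraction of the with-term rate contributed by the added term.

    Assumes the same place sits at this end of both ranges and that
    the two rates differ only by the added word.
    """
    return (rate_with - rate_without) / rate_with


# Range endpoints quoted in the post, per 100,000 tweets:
# (rate without bitch, rate with bitch)
LOW = (26, 74)
HIGH = (133, 894)

if __name__ == "__main__":
    print(f"low end:  {added_share(*LOW):.0%}")
    print(f"high end: {added_share(*HIGH):.0%}")
```

Under that assumption, bitch accounts for about 65% of the rate at the low end (48 of 74) and about 85% at the high end (761 of 894).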

There’s a further problem in using these tweets as an index of sexism. Apparently a lot of these bitch tweets are coming from women (if my small sample of tweets is at all representative). Does that mean that the word has lost some of its misogyny? Or, as I’m sure some will argue, do these tweets mean that women have become “self-hating”? This same question is raised, in spades, by the use of nigger. Abodo has data on that too, but I will leave it for another post.