Data Is Like Spaghetti

June 1, 2015
Posted by Jay Livingston

I used to say, “The data are.” Pretentious, I know. But no more. Now I’m a “the data is” kind of guy.

I’m not alone. Here’s the chart from Google n-grams, which also shows that we’ve become steadily more data-conscious.

For much of the twentieth century, most people who wrote about data preferred the word as a plural. Even as references to data increased, the pluralists maintained their lead. Then, in about 1985, the tide turned.

When we talk about “the data,” we are referring to a whole: a large thing made up of lots of smaller, similar things. The word data is plural only in the most technical sense; it’s plural in a foreign language. The trouble is not that the language is foreign or that nobody speaks it. The problem is that data is the plural of a word that in English has no real singular. Nobody talks about a datum. When we select a particular instance in our data, we call it a “data point.”

It’s like spaghetti, another word that is plural in a foreign language. Spaghetti refers to a lot of similar things combined to create a whole thing, a dish. We speak of that ensemble as a singular thing. We don’t say, “The spaghetti are delicious.” If we were speaking Italian, then yes, we would follow Italian grammar and use the plural: “Gli spaghetti sono deliziosi.” And in Latin we would use the plural conjugation for data. But we’re speaking English.

With spaghetti, for a single instance analogous to a data point, we refer to “a strand of spaghetti.” I would bet that even in Italian cookbooks, authors do not use the singular. They do not say, “To check for al dente, bite into uno spaghetto.”*

* I have two Italian cookbooks on my shelf (gifts from people who thought my Italian was much better than it actually is), but I’m not going to try searching for something that probably is not there.
