On Wednesday the Global Language Monitor “announced” that English got its one-millionth word at precisely 10:22 am GMT that day. And the word was Web 2.0, so naturally, blogs such as Mashable, John Battelle’s Searchblog, and TechCrunch took notice.

Now, The Name Inspector realizes that the “millionth word” story is a ridiculous play for attention that’s not to be taken seriously, and that the folks at the Global Language Monitor know it. But the story has gotten people talking about what a word is, and that’s a topic that The Name Inspector can warm to.

The easiest criticism of the millionth-word story is that Web 2.0 isn’t a word, but a phrase. That’s the main thing that linguist Geoffrey Pullum had to say about the matter on Language Log. And that’s pretty disappointing, actually, because it ignores the fact that the whole enterprise of counting words that precisely is linguistically suspect.

Why would The Name Inspector object to counting words? Believe it or not, it’s not due to a perverse academic refusal to give simple answers to simple questions. The innocent word, which seems to be the very simplest little bit of language to understand, is remarkably hard to pin down. There are very clear examples of words, like dog, but around the edges the word category is fuzzy. That makes it hard to count words with any precision, let alone announce the exact time of day when a word enters the language.

Let’s start with the very dumbest definition of word, the one used by the “word count” function on your word processor: A word is a string of characters (let’s say letters) with no spaces. Well, that would mean the following string consists of five words: jjj akjsdhfjkh auygfh tg drqwds.
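To make that definition concrete, here is a minimal Python sketch of a space-based counter. It is only an illustration of the “no spaces” rule described above, not a claim about how any particular word processor works, and the function name naive_word_count is made up for this example.

```python
def naive_word_count(text):
    """Count every run of non-space characters as one 'word'."""
    # str.split() with no arguments splits on any run of whitespace.
    return len(text.split())


gibberish = "jjj akjsdhfjkh auygfh tg drqwds"

# The counter knows nothing about meaning, so gibberish counts in full.
print(naive_word_count(gibberish))

# The same rule splits a familiar expression in two.
print(naive_word_count("Web 2.0"))
```

By this rule the gibberish string scores five and Web 2.0 scores two, which is exactly why the definition is too dumb to settle anything.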

We can do better than that: A word is a string of letters with no spaces that has a meaning and can be used in a sentence. By this definition, Web 2.0 doesn’t cut it. And many people who’ve weighed in on the issue in blog comments have raised just that objection. Some object to the space, some to the digits, some to the punctuation. Sorry, sorry, and sorry. If inclusion in a dictionary is the ultimate proof of wordhood, then consider this: Even the abridged online version of the Merriam-Webster dictionary includes entries for deep six, 12-step, 20/20, 24-7, 3-D, and even 86ing (a slang term for refusing to serve a customer). All these include numbers, all but one include digits, some have punctuation, and one has a space.
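The three objections can be checked mechanically. The sketch below is a hypothetical helper (the name objections and the choice of test strings are assumptions for illustration) that reports which of the three complaints a given expression would draw. It says nothing about meaning or usage, which no character test can capture.

```python
def objections(entry):
    """List which 'not a word' objections an expression would trigger."""
    found = []
    if any(ch.isspace() for ch in entry):
        found.append("space")
    if any(ch.isdigit() for ch in entry):
        found.append("digit")
    # Anything that is neither a letter, a digit, nor a space counts
    # as punctuation for the purposes of this sketch.
    if any(not ch.isalnum() and not ch.isspace() for ch in entry):
        found.append("punctuation")
    return found


# A few of the dictionary entries mentioned above, plus dog and Web 2.0.
for entry in ["dog", "deep six", "12-step", "20/20", "24-7", "86ing", "Web 2.0"]:
    print(entry, objections(entry))
```

Only dog comes back with an empty list. Every other item here trips at least one objection, and all but Web 2.0 were just cited as dictionary entries.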

Now about spaces. It’s commonly accepted that English has complex prepositions that consist of parts. In some cases the parts are separated by spaces, and in others they’re not. We write in lieu of as three chunks, and instead of as two, even though their structures are parallel, etymologically speaking. Then there’s notwithstanding.

There are many compound words in the Merriam-Webster dictionary (and others) that are written with spaces. A space is a purely orthographic entity, and it’s silly to define a linguistic unit based on orthography alone. Spoken language is primary. Written language is, ultimately, a representation of spoken language. There are compounds that some people write as “one word” and some people write as “two words”, though the pronunciation remains constant. Website/web site is one example. If you use the no-space criterion, you end up saying that such expressions are sometimes words and sometimes not words, based on orthographic variation. And that just doesn’t make good sense.

Of course, you might appeal to The Language Boss to tell you which version is “correct”. But people, it’s time to wake up and realize that The Language Boss is a fiction, like the Wizard of Oz. There are just different people, sometimes with different opinions, bumbling around behind their curtains. Pay no attention to that language maven behind the curtain!

Lurking behind the orthographic issue, of course, is a deeper linguistic one: If some words are made of pieces that are themselves words, how do we know when a group of words adds up to a complex word as opposed to a phrase or a random stretch of language? Here linguists begin to rely on criteria that distance the definition of word from the pragmatic, what-you-list-in-the-dictionary understanding of what a word is. The linguists might, for example, think about how an expression interacts with the rules of English stress assignment, or about its syntactic behavior. In any event, for a group of words to add up to a complex word, it has to be a conventional, cohesive unit.

And here there are no hard and fast rules. Idioms make things especially complicated. Merriam-Webster lists kick ass and kick the bucket under its entry for kick. So these idioms get a sort of honorary word treatment. But notice that idioms don’t always occur in exactly the same form: we can kick a little ass or kick some ass or even kick some Raider ass. In idioms, words begin to blend into grammar, and that’s where things get really tricky.

Some idioms, like kick the bucket and kick ass, are identified mostly by the presence of certain component words. Others, however, are more like grammatical templates. Consider sentences like There’s only so far a car can go with a flat tire, There’s only so long you can sit before you have to get up and walk around, and There’s only so often you can talk or sneak your way out of a fight. There’s a pattern here that’s something like There’s only so X Y can Z, where X is a scalar measure or property of some kind, Y is a noun phrase, and Z is a verb phrase. Most people wouldn’t call this pattern a word, but it’s hard to find the exact barrier between this pattern and something like kick ass. (To see lots of patterns like this, you might take a look at the Snowclone Database.)

Even when you’re talking about words with simple forms, it can be hard to decide how to count them. That’s because words aren’t just forms; they also have meanings, and it’s often the case that the same form has more than one meaning. If the meanings are very different, we usually think of there being more than one word. For example, bank used in connection with a river is one word, and bank used in reference to a financial institution is another.
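The difference between counting forms and counting form-plus-meaning pairs can be shown with a toy example. The tiny lexicon below is invented for illustration, and its sense labels are assumptions rather than entries from any real dictionary.

```python
# A made-up lexicon of (form, sense) pairs; the sense labels are
# hypothetical and chosen only to mirror the bank example.
lexicon = [
    ("bank", "edge of a river"),
    ("bank", "financial institution"),
    ("dog", "domestic animal"),
]


def count_forms(entries):
    """Count distinct spellings, ignoring meaning."""
    return len({form for form, _sense in entries})


def count_form_sense_pairs(entries):
    """Count each distinct pairing of spelling and meaning separately."""
    return len(set(entries))


print(count_forms(lexicon))
print(count_form_sense_pairs(lexicon))
```

The same three entries give a total of two or three depending on which counting rule is chosen, and the code has no way to decide how different two senses must be before they count as separate words.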

But what if the meanings are only a little different? How many “words” are represented by these different uses of the verb see?

Can you see the car?
I see that it’s raining.
I don’t see why you’re so angry.
Let’s go see grandma.
Are you seeing anyone?
I’ll see your twenty and raise you ten.
Let me see you to your door.
See to it that this doesn’t get out.

All these complexities don’t mean it’s impossible in principle to count the number of words in the English language. They do, however, mean that it’s very, very hard, and that you have to know what you mean by word before you start.

One Response to “Why the “millionth word” story is silly”

  1. on 12 Jun 2009 at 4:57 pm Benjamin Lukoff

    Excellent piece, but I do have to take issue with this: “The easiest criticism of the millionth-word story is that Web 2.0 isn’t a word, but a phrase. That’s the main thing that linguist Geoffrey Pullum had to say about the matter on Language Log. And that’s pretty disappointing, actually, because it ignores the fact that the whole enterprise of counting words that precisely is linguistically suspect.”

    Yes, that’s what Pullum had to say this time. But they’ve posted on Language Log many times before this about Payack and his Global Language Monitor shenanigans. Just see http://languagelog.ldc.upenn.edu/nll/index.php?s=payack.

    Yes, fundamentally this is about (heh) definitions. What do you mean by a word? What’s your methodology for counting them? Etc. For me, one of the sillier ideas here is that you can figure out precisely when a word — whatever that is — becomes part of the English language — whatever that is. Another silly — or, rather, sad — idea here is that professional journalists are largely taken in by this. I am glad to see you contributing to the discussion in a meaningful way — unlike Mashable, John Battelle’s Searchblog, TechCrunch, and CNN (well, at least the latter gave some good space to linguistics, although Payack got more). http://www.cnn.com/2009/TECH/06/10/million.words/index.html
