03 November 2008

Stylometry for Fun and Profit

At the recent Oz Club convention in Fayetteville, New York, there was a discussion of whether stylometry (I remembered the word with the help of my PDA) could identify certain anonymous newspaper articles as the writing of L. Frank Baum.

Stylometry is the systematic analysis of writing style, based on the notion that we all have unique and unconscious quirks, preferences, and patterns in our prose. Though people have been using various stylometric techniques for centuries, the field has really taken off with computers.
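For readers curious what such software does under the hood, here is a minimal sketch of one common idea: count how often a text uses a few everyday function words, then measure how far apart two texts' counts are. Everything in it is an assumption made for illustration. The word list and the names `profile`, `distance`, and `closest_candidate` are hypothetical, and this is not the method used by Foster, Binongo, or any program mentioned in this post.

```python
from collections import Counter
import math
import re

# Hypothetical marker list: a handful of common English function words.
# Real studies typically use far more markers and far longer samples.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "but", "with", "upon"]

def profile(text):
    """Return each function word's rate per 1,000 words of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on an empty text
    return [1000 * counts[w] / total for w in FUNCTION_WORDS]

def distance(text_a, text_b):
    """Euclidean distance between two profiles; smaller means more alike."""
    return math.dist(profile(text_a), profile(text_b))

def closest_candidate(disputed, candidates):
    """Name of the candidate sample whose profile is nearest the disputed text."""
    return min(candidates, key=lambda name: distance(disputed, candidates[name]))
```

Here `candidates` would be a dictionary mapping each possible author's name to a sample of writing known to be theirs. A small distance on a single measure like this proves little by itself, so treat the output as a starting point for questions, not a verdict.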

Don Foster is probably the best-known stylometrist now, as he was glad to explain in his book Author Unknown (2000). But other practitioners have their own methods, based on other stylistic details and differences.

Stylometry has even been applied to Baum's writing already. In 2003, José Binongo used software to analyze The Royal Book of Oz, which was published under Baum's name but has for the last fifty years been identified as Ruth Plumly Thompson's work. To no one's surprise, the book's prose turned out to have much more in common with Thompson's other novels than with Baum's. (I suspect this paper was really meant to validate that stylometric method rather than to solve the "mystery.")

On Sunday the Times of London reported on a Republican attempt to use stylometry to affect tomorrow's US election:

Dr Peter Millican, a philosophy don at Hertford College, Oxford, has devised a computer software program that can detect when works are by the same author by comparing favourite words and phrases.

He was contacted last weekend and offered $10,000 (£6,200) to assess alleged similarities between [Barack] Obama’s bestseller, Dreams from My Father, and Fugitive Days, a memoir by William Ayers. . . .

The offer to Millican to prove that Ayers wrote Obama’s book was made by Robert Fox, a California businessman and brother-in-law of Chris Cannon, a Republican congressman from Utah. He hoped to corroborate a theory advanced by Jack Cashill, an American writer.

Fox and Cannon each suggested to The Sunday Times that the other had taken the initiative.

Cannon said that he merely recommended computer testing of the books. He doubted whether Obama wrote his autobiography, adding: “If Ayers was the author, that would be interesting.”

Fox said he had hoped that Cannon would raise the $10,000 to run a computer test. “It was Congressman Cannon who initially pointed me in that direction and, from our conversation, I thought he might be able to find someone [to raise the $10,000].”

He believed that if “proof” of Ayers’s involvement was provided by an Oxford academic it would be political dynamite.

Fox contacted Millican, who said: “He was entirely upfront about this. He offered me $10,000 and sent me electronic versions of the text from both books.”

Millican took a preliminary look and found the charges “very implausible”. A deal was agreed for more detailed research but when Millican said the results had to be made public, even if no link to Ayers was proved, interest waned.

Millican said: “I thought it was extremely unlikely that we would get a positive result. It is the sort of thing where people make claims after seeing a few crude similarities and go overboard on them.” He said Fox gave him the impression that Cannon had got “cold feet about it being seen to be funded by the Republicans”.

Which, of course, it would have been.

Millican also described the experience on his website, and explained how Cashill's crude use of Millican's Signature software produced a false positive result. Anyone can download the program for educational use, and no doubt put it to better purposes.

Cashill is a right-wing writer known until last year for his TWA 800 and Clinton conspiracy theories. As for Rep. Cannon, he lost his Republican primary this summer and will leave Congress shortly.

Why would those two men and Fox believe that Sen. Obama couldn't have written his own autobiography? He has been, after all, a law review editor, a law school professor, and a politician known for his speeches even before he could afford speechwriters. What about Obama could make those men doubt that he could write? I wonder.


Blair Frodelius said...

After hearing Evan Schwartz's talk at the National IWOC convention, I was wondering if this kind of analysis would point towards Baum's writings in the Chicago dailies of the early 1890's. I would think using his Saturday Evening Pioneer columns, that similarities could be found, if they exist. But, $10,000 is a lot of money.

J. L. Bell said...

I think stylometry software is becoming cheap enough for ordinary people to use, but we have to know how.

There were obvious problems in the right-wingers' analysis of the Obama and Ayers memoirs: no control to compare against, or at least no appropriate control (comparing an old novel to two recent memoirs is obvious cheating), and use of only one metric instead of all available. Not to mention their obvious bias, wanting to publish results only if their hypothesis was confirmed.

Stylometry, like other scientific methods, also works better the more data one has to crunch. So someone would have to transcribe lots of Chicago newspaper articles (including those possibly by Baum and some definitely not by him), plus comparable writing known to be by Baum. All that transcription time might be where the $10,000 would come in.

Wosniak said...

"What about Obama could make those men doubt that he could write?"

Since you asked...

Well, here are four examples of Obama's misuse of grammar that would make any English teacher cringe:

"The very real advantages of concentrating on a single issue is leading the National Freeze movement to challenge individual missile systems, while continuing the broader campaign.

The belief that moribund institutions, rather than individuals are at the root of the problem,
keep SAM's energies alive.

Facing these realities, at least three major strands of earlier movements are apparent.

Since the merits of the Law Review's selection policy has been the subject of commentary for the last three issues, I'd like to take the time to clarify exactly how our selection process works."

J. L. Bell said...

Oh, my goodness gracious—a law student on deadline writes like a law student on deadline!

Thanks for sharing the hate, “Wosniak.” Since you’re concerned about proper writing, you’ll be delighted to know that you have a lot to learn about the correct punctuation of quotations.

Wozniak said...

You jumped to conclusions, Mr. Bell.
You asked a question, and I answered it. There's no hatred at all. If you can find a very well-written essay or article by Obama before Dreams came out, please share it with us. If you can't, then please don't just blame tight deadlines.

I could add another answer, however. Here is Obama's not-so-stellar poem from 1981:


Under water grottos, caverns
Filled with apes
That eat figs.
Stepping on the figs
That the apes
Eat, they crunch.
The apes howl, bare
Their fangs, dance,
Tumble in the
Rushing water,
Musty, wet pelts
Glistening in the blue.

J. L. Bell said...

"Wozniak" insists he's not motivated by hate for the President, but he feels strongly enough to leave comments on a three-year-old blog entry to share thirty-year-old student poetry. Clearly some powerful emotion is motivating him to endorse ludicrous conspiracy theories about the President.

"Wozniak" can't even keep track of the spelling of his own protective pseudonym (it was "Wosniak" before). Yet he feels that if a law-review editor makes a common error of agreement twice, that shows that he couldn't have written his own memoir.

Don't conspiracy theorists realize that antics like this just make them look more laughable to the world?

Woszniak said...

You failed to provide any good writing of president Obama before Dreams came out. Your silence, and the lack of such writing, is the perfect answer to your own question: "What about Obama could make those men doubt that he could write?"

J. L. Bell said...

Most rational people have little interest in digging up writing from President Obama’s university days and searching for common grammatical slips. We've seen the President speak extemporaneously on many topics and in many arenas. We have the evidence we need to judge his ability to express himself intelligently, whether or not we agree with him.

The slippery-named pseudonym is actually trying to convince us that the Republican conspiracists mentioned in this posting first stumbled across Obama's student writings and then decided he couldn't have written his own memoir. How dumb does he think people are? It's obvious that those men started with the goal of tearing down Obama.

Why were those men so frightened by the sight of Barack Obama in a position of power? For the same reason that this commenter (who could, of course, be one of those men) is seeking out three-year-old blog posts on the topic. They can't acknowledge the real root of their insecurities on seeing the President, so they flail about to convince themselves that his memoir was written by a white man.

That's a pathetic attempt to deceive, but the only people they're fooling are themselves. The rest of us understand the sort of irrational hatred that drives this silly endeavor.

Wozniack said...

Instead of focusing on my motives, or anyone else's for that matter, which are wholly irrelevant, why not just answer: What part of "people don't suddenly write beautiful books after a short trail of mediocre writings" do you fail to understand?

Wozniac said...

I should let you know that The Narrative of the Life of Frederick Douglass is one of my favorite books.
I believe Douglass wrote it himself.

Your attempt at race baiting is pathetic.

J. L. Bell said...

Does this pseudonymous commenter expect us to believe that the Republican conspiracists mentioned in this posting were not trying to “convince themselves that his memoir was written by a white man”? Does he truly not recognize what endorsing their absurd theory says about himself? The powers of denial are strong in this one.

Hiding safely behind an untraceable pseudonym, our “Wozniak” insists that some of his best friends are black authors. Of course, Frederick Douglass is a dead black author who can no longer threaten him.

Interestingly, some of Douglass’s political opponents declared that he couldn’t have written his books, for the same pathetic reasons that people have made the same charge against Obama. Has “Wosniac” delved into Douglass’s earliest writings to test their grammatical rigor, applying the same standard he wants to apply to the President’s? Of course not. That would be too rational, and would threaten his fixed ideas.

J. L. Bell said...

“Wosniak” doesn’t want anyone to discuss his motives in bringing up this absurd conspiracy theory about Barack Obama’s memoir. But if someone comes into your life spouting off about a conspiracy with racist overtones while refusing to identify himself, wouldn’t you ask what he’s up to?

Motive is even more important as “Wozniac” and his fellow travelers want us to accept them as unbiased arbiters of Obama's student writing. We're supposed to trust them to pick out a fair sample. We're supposed to accept their aesthetic judgments. But of course that would be foolish.

As the posting shows, a known right-wing conspiracist came up with this theory in the middle of a presidential campaign. A Republican politician tried to rope a British stylometrician into supporting it, but only in a way that couldn't be traced to the GOP. And now, three years later and as the next campaign heats up, a pseudonymous visitor is puffing it again. In those circumstances, the only person who wouldn't want motivations examined is someone with something to hide.

Our courageous commenter apparently believes that if a person composes poor poetry as a teen-aged undergraduate he can't possibly publish an affecting memoir in his mid-thirties. That anyone who makes common grammatical errors must have entered into a conspiracy to have a ghost writer for his personal story.

Does this visitor apply the same rigor to any other first-time author, demanding to see their college papers? Has he examined the law-school writings of other politicians to evaluate their prose? Of course not. Because those people don't bother him in the visceral way Barack Obama does.

Woszniak said...

"Does this pseudonymous commenter expect us to believe that the Republican conspiracists mentioned in this posting were not trying to “convince themselves that his memoir was written by a white man”?"

They don't care whether Ayers is white or black. Nor do I. It's the fact that he was a terrorist that matters. Oh, you forgot that part.

"Does this pseudonymous commenter..."

We get the point. No need to waste your time pointing out that I'm using a pseudonym, just like millions of other bloggers out there?

"Of course, Frederick Douglass is a dead black author who can no longer threaten him. "

Well, I can name three other black men who I wouldn't mind being my president. Your insinuation of racism is still pathetic.

"Our courageous commenter apparently believes that if a person composes poor poetry as an teen-aged undergraduate he can't possibly publish an affecting memoir in his mid-thirties. That anyone who make common grammatical errors must have entered into a conspiracy to have a ghost writer for his personal story. "

What a distortion of what I presented! Obama's poor grammar is but a PART of his available (but incredibly sparse) pre-Dreams writing, all of which is of rather lame form, style, and content. And this is a fact that you are trying desperately to avoid addressing.

J. L. Bell said...

Showing his shaky grasp of both punctuation and honesty, "Wozniak" claims that he's "using a pseudonym, just like millions of other bloggers out there?" But he's not a blogger. His shifting pseudonyms come with no link to a blog or website. He's just a guy writing under a fake name to make ludicrous accusations that someone else wrote under a fake name—and he doesn't see how laughable that makes him look.

"Wosniac's" need for secrecy means we can't know if he's actually a "sock puppet" for the author he's touting. We can't know if he's signed on to that author's other theories, including creationism, conspiracies within the US government to crash passenger jets, the President being the son of Jimi Hendrix, and this Photoshopped photo being real. We can't know what else he's said about President Obama or other African-Americans. But we do know that he has something to hide.

Our courageously camouflaged commenter offers no evidence of his literary acumen (and much evidence to the contrary). Yet he wants us to accept his literary judgments as fair and convincing. Since I blog under my own name, and don't try to conceal my past, folks know I've worked in publishing for over two decades. Based on that experience, I can say the "evidence" that Obama didn't write his own autobiography is so thin as to be laughable. Adherents of that theory must therefore be driven by something besides rational thought.

"Wozniak" denies that racism is involved in his declarations that Obama couldn't possibly have written his own life story. Of course, hardly any American admits to being racist in public now. Most Americans wouldn't want to admit it to themselves. People frightened by a smart black man with authority would be especially wary of admitting to racism because that would also mean admitting that the smart black man has some power over them. One response for those people is to try to undercut the evidence that that man is actually smart.

"Wosniak" insists that's not what he's doing, but his repeated claims look very much like white supremacists' statements about other African-American authors, from Thomas Jefferson's dismissal of Phillis Wheatley through the teacher who accused August Wilson of plagiarizing a high-school paper. The slippery pseudonym has made the unverifiable statement that he likes Frederick Douglass's autobiography. Yet he shows no grasp of how apologists for slavery claimed that Douglass wasn't telling the truth about his life and/or that a white author like Lydia Maria Child actually wrote his book. "Wozniac" puts himself squarely in the long tradition of the racists who denigrated black authors, and he isn't even capable of recognizing what that says about him.

It might be different if "Wosniack" showed how he applies the same standards to all authors. Does he respond to every memoir by someone in his thirties by digging back to what they've written in their teens? Has he combed through other politicians' law-school papers to find common grammatical errors? Of course not. That's a standard he's set for only one man. Treating a black author more harshly than any other is an obvious hallmark of racism.

Like a worm ingesting its own droppings, this commenter has twisted back on his path over and over and extruded the same worthless claims. His statements have received all the respect they deserve, and more than enough space.