Apparently I confuse computers now

The Gender Genie is an implementation of a neat little thing that purports to guess the gender of the author of a particular piece of writing. It's based on a very simple word count algorithm, which in turn is based on statistical observations on what words males and females use most in writing. A New York Times article detailing some of the original work is here, and there is a formal paper that I haven't gotten to yet.
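
For the curious, a weighted word count of this sort can be sketched in a few lines of Python. This is only my guess at the general shape, not the Genie's actual code: the word lists and weights below are invented purely for illustration, and the names `FEMININE_WEIGHTS`, `MASCULINE_WEIGHTS`, `score` and `guess` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical weights, invented for illustration only.
# They are NOT the values used by the Gender Genie or the underlying paper.
FEMININE_WEIGHTS = {"with": 50, "she": 6, "and": 4}
MASCULINE_WEIGHTS = {"the": 7, "a": 6, "it": 6}


def score(text):
    """Return (feminine_score, masculine_score) for a piece of text."""
    # Lowercase the text and count every word-like token.
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    # Each listed word contributes its weight once per occurrence.
    feminine = sum(weight * words[word] for word, weight in FEMININE_WEIGHTS.items())
    masculine = sum(weight * words[word] for word, weight in MASCULINE_WEIGHTS.items())
    return feminine, masculine


def guess(text):
    """Pick whichever side scored higher; a tie is reported as undecided."""
    feminine, masculine = score(text)
    if feminine == masculine:
        return "undecided"
    return "female" if feminine > masculine else "male"
```

Calling `guess()` on a passage simply reports which tally came out larger. It says nothing about how close the two tallies were, which is where a lot of the iffiness comes from.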

I find statistical analysis of documents to be fascinating but iffy at best; results are always in a sort of quantum flux, where you can tell that the author is probably this and likely that, but there is always the chance of whatever characteristic you think you're picking up on being a bizarre anomaly. And since it's not a hard science, a lot of what I -- and I presume professional document examiners -- do is actually based on intuition. You look at it, you realize you're developing an opinion, and then you get down to sciencing your little heart out to find out if you're right or wrong. The trick is not learning how to ignore these hunches, but how to not take it personally when said hunches are wrong, because at some point they're going to be.

The Gender Genie is not without its flaws. It can't decide what gender I am, for starters. I fed it some of my stuff, and its verdict often flips back and forth depending on whether I tell it that the passage is non-fiction or bloggery -- an argument can be made that much of my writing is both. (I didn't feed it any fiction; I don't have any recent things around that I'd call 'completed' and the idea of slapping an incomplete anything into an analyzer makes me twitch.) I made it a point to find entries of about 1000 words, which is twice what it wants, so sample size probably isn't the issue.

Another implementation, the Gender Guesser, gives a clue as to why the Genie gets so confused. It explicitly notes that the list of weighted words is from American English -- and when I feed it my writing, the results box likes to tell me the results are weak, whichever way they fall, and that I should double-check to make sure I'm not unexpectedly from Europe. They don't say specifically what skews results on "European English", or what they mean by that in the first place -- there's British English, with the same wacky spread of dialects as American English, only in a much smaller place, plus there are the varieties of British English and modified British English taught in schools in countries where English is required pedagogy but not the primary language of the area. Un-primary English is subtly different between people and places with different native languages, as anyone who has seen me do the party trick where I tell people on the internet where they're from by the way they type in my native language will attest.

European English strikes Americans as much less gendered than American English does. (No comment on relative rates and manifestations of sexism. And no bonus points for filling them in yourselves.) This even extends to manuscript, depending on where in Europe you're looking; a friend of mine grew up partly in the US, partly in Italy, and his handwriting strikes a lot of Americans as confusingly girly. He doesn't write in pink gel pen or dot his i's with hearts or anything -- there's just something about the letter forms that resembles those found in young-lady-American handwriting. You can do similar things when eyeballing old-fashioned scripts. If you know where the sample is from, you can often guess the when down to about a decade.

Likely I confuse the GGs because, although I was born and did all my schooling in the US, a lot of my influences have been from elsewhere. I have a lot of Brits in my literary lineage in particular, heavy on the snark and other dry forms of humor. (This includes authors most Americans wouldn't really think of as either British or funny -- you read any Stephen Hawking lately? The speech synth is from an American company and has an American accent, but his writing is extremely Brit.) There are probably also some small traces of other languages in here, where I've stolen a phrasing or a style from another language family which works but is unusual to find in English. Even in translation, someone like Antoine de Saint-Exupéry is pretty distinctive.

I've also picked up a lot of quirks that are temporally, rather than geographically, displaced. Even the things I read by American authors range from modern back to the late 19th c., with the major stopping-off points in the Jazz Age and assorted Victoriana. American English didn't diverge much from British English until the 17th c. for obvious reasons, and we didn't quibble about -or/-our and -ize/-ise until we started shooting back at the Old Country in the late 18th c., so the farther back you go, the more European American writing sounded, particularly formal pieces. I'd expect similar things to have happened in Australia, but one of the Ozzies of my acquaintance is known to complain that people who were born and have lived all their lives there still think of the UK as "home", so they may be consciously fighting it.

I'm a 30-year-old cisgender American female, if anyone has somehow missed it. My internet friends usually stay on the internet, being as physical distance is damned inconvenient sometimes, but the ones I've met in person can testify that I talk exactly like I write. Ponder that for a minute. Apparently I sound kind of weird but articulate.
