By Lisabet Sarai

I’m currently reading a book that should never have been published. Unfortunately, I’m committed to reviewing this three-hundred-fifty-page novel, so I can’t just erase it from my e-reader and breathe a sigh of relief. I have to endure the run-on sentences, misspellings and incorrect vocabulary; the point of view that does a random walk from one character’s head to another’s; the verb tenses that shift from present to past and back again in the same paragraph.

I have to wonder about an author who sends a book in this sorry state out to the world. Did she really not know any better? Like many first erotica novels (including my own), the story (a moderately intense tale of extreme submission) feels like personal fantasy. I appreciate, from my own experience, the thrill that comes from baring your sexual soul, the rush one feels being brave enough to bring those filthy imagined scenarios into the light. It’s easy to get carried away. Still, even when writing for one’s own satisfaction, doesn’t an author have at least some responsibility to her readers? Shouldn’t there be some minimum criterion an author must satisfy, in terms of language skills, before he or she is entitled to ask other people to actually pay for the privilege of reading?

Unfortunately, this book is far from unique. At least twenty percent of the ebooks I read appear to have never been examined by a (competent) editor. Some have dreadful formatting problems as well – text that switches from one font to another in the middle of sentences, negative leading between lines so that one overlaps another, and so on. Furthermore, these issues don’t appear just in self-published books.

Now, I’m a bit of a geek. You may or may not be aware that text processing software has become extremely sophisticated. Programs can analyze text in order to determine whether it was likely to have been written by a man or a woman; whether it was plagiarized; what emotions were experienced by the author; even whether it has linguistic characteristics shared by best-sellers. Software exists to grade essay questions in college entrance exams and make suggestions for how the author can get a better score. It recently occurred to me that someone (not me – text processing isn’t my specialty) could write a program to screen out books with egregious grammatical and lexical problems.

I have no doubt that Amazon has the resources to commission this sort of computerized gatekeeper. Think about it. Before an individual, or a publisher, could finalize submission of a book for sale, they would have to run it through the Automated Editor. The program would flag potential problems for attention. If the number of dangling participles or sentence fragments or run-on constructions exceeded a threshold, the book would be rejected. In other words, it would become impossible to publish a book like the one I’m wading through at the moment. The base level quality of available books would improve dramatically.
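For fellow geeks, here is a very rough Python sketch of what that threshold check might look like. Everything in it is an assumption made for illustration: the names `automated_editor` and `flag_run_ons`, the 40-word limit and the ten percent cutoff are all hypothetical, and treating any overlong sentence as a run-on is a crude stand-in for real grammatical analysis.

```python
import re

# Hypothetical limits; a real gatekeeper would need carefully tuned values.
MAX_WORDS_PER_SENTENCE = 40   # anything longer is treated as a possible run-on
MAX_FLAGGED_FRACTION = 0.10   # reject if more than 10% of sentences are flagged


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace. Crude, but enough for a sketch."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def flag_run_ons(text):
    """Return the sentences whose word count exceeds the limit."""
    return [s for s in split_sentences(text)
            if len(s.split()) > MAX_WORDS_PER_SENTENCE]


def automated_editor(text):
    """Return (accepted, flagged_sentences) for a manuscript."""
    sentences = split_sentences(text)
    flagged = flag_run_ons(text)
    if not sentences:
        # An empty manuscript has nothing to publish, so reject it.
        return False, flagged
    accepted = len(flagged) / len(sentences) <= MAX_FLAGGED_FRACTION
    return accepted, flagged
```

Calling `automated_editor(manuscript_text)` returns the verdict along with the offending sentences, so the author at least learns what needs attention before resubmitting.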

(Of course, Amazon would never do this voluntarily, only under pressure from readers. The company has zero incentive to reduce the number of books it offers for sale.)

But then, an artificially intelligent text analyst could do a great deal more than simply check for basic grammar. It could flag repeated words, phrases or figures of speech. (How many references to an “inner goddess” should be allowed before a book was rejected?) I believe that existing linguistic analysis software could also be trained to detect clichés, simply by providing an extensive database of example phrases. Purple prose would also be sufficiently distinctive, I think, to be identified with some level of accuracy.
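The repeated-phrase and cliché checks are simple enough to sketch in a few lines of Python. This is only a toy under stated assumptions: the three-entry `CLICHES` list stands in for the extensive database, the function names and the `min_count` default are hypothetical, and plain phrase matching would miss any cliché that has been lightly reworded.

```python
import re
from collections import Counter

# A tiny, made-up stand-in for the "extensive database of example phrases".
CLICHES = ["inner goddess", "heaving bosom", "smoldering gaze"]


def count_cliches(text, phrases=CLICHES):
    """Count case-insensitive occurrences of each known phrase."""
    lowered = text.lower()
    counts = {}
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[phrase] = hits
    return counts


def repeated_phrases(text, n=2, min_count=3):
    """Return n-word phrases that appear at least min_count times.

    Punctuation is ignored, so a phrase may span a sentence boundary.
    """
    words = re.findall(r"[a-z'’]+", text.lower())
    ngrams = Counter(" ".join(words[i:i + n])
                     for i in range(len(words) - n + 1))
    return {p: c for p, c in ngrams.items() if c >= min_count}
```

A gatekeeper could then compare the counts from `count_cliches` against whatever limit it considers tolerable for a given phrase.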

I’m starting to imagine a multi-level application that could analyze a wide range of textual and stylistic characteristics in order to assign a “publishability” score to each manuscript. Why stop with the superficial problems, though? Automated language understanding systems have made great progress in the past decade, due to faster hardware and new algorithms. So why not look for clichéd plot elements as well as clichéd language? That may be beyond the capabilities of today’s software, but not tomorrow’s. Using tired, overused story lines as models, the program could decide that the world did not in fact need yet another vampire-turning-his-lover-to-save-her-from-death or billionaire-seduces-virgin tale.

We could also use our gatekeeper software to determine how well a book purporting to belong to a certain genre in fact fit the conventions of that genre. If the program found evidence of lesbian interaction in a heterosexual erotic romance, for instance, it could reject the book as inappropriate for the targeted readers.

In the brave new world I am imagining, almost any aspect of a book’s content or presentation could be quantified and used to make publishing decisions. Sentences too short or too long. Overuse or underuse of adjectives. Too many characters of particular ethnicities. Focus on uncomfortable, politically incorrect or otherwise controversial topics. Mention of specific individuals, events, places, companies, products… the possibilities are limitless.

Think of how much more pleasant reading would become when you didn’t have to worry about ever encountering run-on sentences – or depictions of rape. You’d be shielded from both bad grammar and bad ideas.

Sure, this might homogenize the reading experience a bit, but that’s happening anyway, isn’t it? You’re right, Hemingway and Pynchon and Palahniuk and Joyce might not make the grade with our gatekeeper, unless they were grandfathered in as previously published. I’ll admit that some promising new authors would be prevented from making their work available to the world, but that happens with human editors too. At least our computerized literary gatekeepers would be objective and impartial.

Hmm… Maybe this needs some more thought.

Meanwhile, I’ve got to go read a few more painful chapters and then figure out how to write this review without totally demoralizing this poor, benighted author.