Categories
BlogSchmog

Stupid is as stupid does

There is an effort afoot to filter stupidity out of Internet discussions. Seriously.

StupidFilter

Although it sounds like it could be a headline from The Onion, StupidFilter is actively collecting a corpus of data in the hope that Bayesian analysis will reveal a pattern in stupid comments. The project, driven by Gabriel Ortiz and Paul Starr, got BoingBoinged last month without much reaction but popped back up in the blogosphere this week with some ferocity.

The concept put forth by Ortiz and Starr would create a tool to intervene before a commenter makes their stupidity public. It would apply patterns learned from previous comments deemed “stupid” and suggest changes to the author before allowing the post to proceed. This could be implemented as a form of proactive censorship, forcing behavior to adhere to some ideal norm, or as a warning layer that tells the author that, without corrections, some readers may simply ignore the post. The idea is likened to spam filters, which have fought a losing battle with spammers who adapt their techniques to bypass such technology.

This isn’t the only such effort. As one commenter on the BoingBoing post pointed out, developer Chris Finke already has an intervention for YouTube stupidity in the form of a browser extension, the YouTube Comment Snob.

There is plenty of tongue in the StupidFilter cheek, of course—particularly evident when reading the FAQ—but it does bring up some real cultural issues about both technology intervention and communication.

Canned Elitism
The origins of stupidity being seen as a problem in Internet discussion can be found in the early mixing of tech cultures. In 1993, America Online—the most popular way for the masses to get onto the Internet at the time—included access to Usenet as part of its service. Doing so injected (potentially) millions of new users into an established channel, creating a techno-elitist culture clash. Mistakes in established etiquette were met with harsh, dismissive, or condescending responses, with errors made on both sides of that dialogue. A similar dynamic is taking place with the opening of Facebook, where early adopters lament the commercialism and wide-ranging demographics that came with rapid expansion and application development.

At its root, the StupidFilter project reflects the attitudes of the older ‘Net culture that wants to conform others to its accepted norms. The filter is meant to be a behavior modifier, using patterns of authoring as exemplars for What Not To Do. But in codifying this altruistic yardstick, it also undermines the ability of online norms to adapt.

l33tspeak is undeniably an important part of the history of the Internet and use of technology in general. While implicit norms discourage its use in forums—”Only n00bs use l33tspeak”—reference to that construction carries meaning relevant in many conversation channels where StupidFilter might be implemented. Its use, which almost certainly would be flagged as stupid by the proposed technology, is not tantamount to stupidity. Do we want to artificially sacrifice such references? Does the act of making norms explicit and objective reduce the chance of developing future conventions?

Alpha code for StupidFilter is expected as early as next month, putting the cultural hurdles center stage just in time for the holidays, when public stupidity is likely to rise in the form of drunken photography, naive use of technogadgetry, and another wave of new Internet users courtesy of Santa’s gift of a computer with wireless. Will encountering something like StupidFilter early in the personal adoption of a technology serve to establish accepted baseline behaviors in the new user, or will it prove to be an unwarranted barrier to participation?

Technical Challenges
There is validity to the approach StupidFilter is using to find patterns in comments, but Bayesian analysis in isolation won’t map to an evaluation of stupidity. Ortiz and Starr acknowledge the contextual blindness of statistics, but they also argue that by looking at a comment “based on the gross prose style alone, its stupidity is self-evident.”
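
The post doesn’t expose StupidFilter’s internals, but a minimal sketch of the kind of Bayesian scoring the project describes might look like the following, assuming a hypothetical training set of comments hand-labeled “stupid” or “ok.” The function names, smoothing, and thresholds here are illustrative, not StupidFilter’s actual code.

```python
from collections import Counter
import math

def train(labeled_comments):
    """labeled_comments: iterable of (text, label) pairs, label in {"stupid", "ok"}."""
    word_counts = {"stupid": Counter(), "ok": Counter()}
    class_counts = Counter()
    for text, label in labeled_comments:
        for word in text.lower().split():
            word_counts[label][word] += 1
        class_counts[label] += 1
    return word_counts, class_counts

def stupidity_log_odds(text, word_counts, class_counts):
    """Naive Bayes log-odds that a comment belongs to the 'stupid' class."""
    vocab_size = max(1, len(set(word_counts["stupid"]) | set(word_counts["ok"])))
    totals = {c: sum(word_counts[c].values()) for c in ("stupid", "ok")}
    # Prior odds from class frequencies (Laplace-smoothed).
    log_odds = math.log((class_counts["stupid"] + 1) / (class_counts["ok"] + 1))
    for word in text.lower().split():
        p_stupid = (word_counts["stupid"][word] + 1) / (totals["stupid"] + vocab_size)
        p_ok = (word_counts["ok"][word] + 1) / (totals["ok"] + vocab_size)
        log_odds += math.log(p_stupid / p_ok)
    return log_odds

# Hypothetical use: a positive score would trigger a "reconsider before posting" prompt.
```

Such a model can only say that a comment resembles previously flagged comments; it has no notion of whether the idea being expressed is actually stupid, which is exactly the contextual blindness the authors concede.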

Finke’s browser extension—which seeks to alter the YouTube web page before rendering it to the reader rather than change the behavior of the author—is more straightforward in its rule construction. It detects and hides comments that fit a conventional wisdom of patterned criteria (a rough sketch of these checks follows the list), such as:

  • More than # spelling mistakes (# is customizable)
  • All capital letters
  • No capital letters
  • Doesn’t start with a capital letter
  • Excessive punctuation (!!!! ????)
  • Excessive capitalization
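
A rough Python approximation of those checks might look like the sketch below. It restates only the listed rules; the real extension runs as browser JavaScript, and the spelling check here assumes a caller-supplied word list rather than a proper dictionary.

```python
import re

def comment_flags(text, misspelling_limit=2, known_words=None):
    """Return which of the listed rules a comment trips (illustrative only)."""
    flags = []
    letters = [c for c in text if c.isalpha()]

    # More than N spelling mistakes, judged against a caller-supplied word list.
    if known_words is not None:
        words = re.findall(r"[A-Za-z']+", text)
        misspelled = [w for w in words if w.lower() not in known_words]
        if len(misspelled) > misspelling_limit:
            flags.append("too many spelling mistakes")

    if letters and all(c.isupper() for c in letters):
        flags.append("all capital letters")
    if letters and all(c.islower() for c in letters):
        flags.append("no capital letters")

    stripped = text.lstrip()
    if stripped and stripped[0].isalpha() and stripped[0].islower():
        flags.append("doesn't start with a capital letter")

    if re.search(r"[!?]{3,}", text):
        flags.append("excessive punctuation (!!!! ????)")
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.5:
        flags.append("excessive capitalization")

    return flags
```

Hiding any comment for which a function like this returns a non-empty list is, in spirit, what the extension does to the rendered page.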

Adherence to all of these options effectively eliminates the e e cummings of the world and most of the texting generation, but those simple rules probably account for a lot of the flotsam that detracts from a discussion. The big question is whether either approach is a reliable method of addressing stupidity.

Spam filtering employs a variety of tactics to keep unwanted solicitation out of conversation channels. Like the Finke and Ortiz-Starr projects, the most common filters rely on pattern matching to flag text as spam. This allows the questionable content to be segregated and dealt with according to the preferences of the moderator. Valid contributions get caught in such filters, though, forcing people to at least skim the flagged content before disposal or risk losing desired contributions. A second technique is the CAPTCHA intervention, which assumes that most spam originates from automated systems. By forcing a human to answer simple questions or perform tasks a computer cannot easily process, a CAPTCHA eliminates many of the spam comments. Humans can still spam, however.
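
The CAPTCHA idea reduces to a challenge-response check that is trivial for a human but annoying for a naive bot. A minimal, hypothetical text-based version (not any particular service’s implementation) could be as simple as:

```python
import random

def make_challenge():
    """Generate a question a human can answer but a naive bot will not parse."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"What is {a} plus {b}?", a + b

def passes_challenge(answer, expected):
    """Accept the submission only if the challenge was answered correctly."""
    try:
        return int(answer.strip()) == expected
    except (ValueError, AttributeError):
        return False
```

As noted above, such a gate only screens out automated submissions; a determined human spammer sails right through.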

The most effective spam filter is Akismet, a blog and forum tool attached to the WordPress platform. While some pattern matching is inevitably performed as a default, Akismet’s strength lies in relying on many eyes to spot and evaluate questionable content. Once a comment is entered into the system as spam, it becomes less likely to propagate elsewhere. This form of distributed human cognition is arguably superior to its computational counterparts because humans are both better at identifying patterns than computers and capable of exploring the context of each post. The Akismet community is also able to adapt more quickly to changes in spamming strategies, such as RSS content scraping that winds up on splogs (spam + blog).

Even with excellent success in identifying the statistical patterns of unwanted comments, StupidFilter will still lack a reliable means of evaluating stupidity. Perhaps an Akismet model that shares common information about authors and content across platforms would allow some community-empowered moderation to take place. This is arguably already done on forums where each member has the ability to rate or block the contributions of their peers. All that is lacking is some universal clearinghouse for such data. That is a social graph problem more than a content filtering one.

The irony of the meta commentary about meta commentary could only be improved if Common Craft made a video about Internet Stupidity in Plain English and posted it on YouTube for public comment.

But that would be stupid.