CertCities.com | Print Column Article: Endmail Part I: The War on Spam

From CertCities.com


Column
Notes from Underground
Endmail Part I: The War on Spam
James offers an overview of the methods for fighting spam before diving into tools for Linux administrators.

by James Ervin

4/2/2003 -- Legend has it that in the Internet's youth, e-mail providers maintained a "gentleman's agreement" not to meddle with the contents of messages. Today, users frustrated with a far less gentle Internet demand meddling to reduce spam. This places providers in the unenviable position of protecting their customers' privacy while invading it. Bruce Schneier has commented sagely on this phenomenon: people don't want privacy; they just want assurances that their private information won't be misused.

Social Engineering
Like computer security, Schneier's métier, spam is a social problem rather than a technical one. The anti-spam community's toughest hurdle is the lack of a legally defensible and enforceable federal definition of spam -- if one existed, it's certain that the flood of class-action lawsuits to follow would intimidate at least some unregenerate spammers. Unfortunately, thanks to a weak-kneed, uninformed Congress and its long history of protecting the interests of enterprise on the grounds that unrestricted commercial speech is included under the rubric of free speech, no federal anti-spam legislation has been enacted in the United States, although the European Parliament recently adopted measures going into effect in October 2003.

In the US, what exists is a hodgepodge of confusing and contradictory state laws: for instance, North Carolina law only prohibits unsolicited commercial e-mail. Elizabeth Dole's senatorial campaign used this loophole to send thousands of unsolicited non-commercial political advertisements. This tactic struck at least one constituent, Ken Pugh, as disingenuous, though his lawsuit for $80 was later thrown out of small-claims court. Undoubtedly, the definition of "solicited" will be up for grabs next time around. Without federal guidelines, pursuing such claims across state lines is laughable.

Public distaste for spam is so strong that even the Direct Marketing Association (DMA) has spoken out in favor of spam laws, on the grounds that "legitimate" marketing is damaged by the prevalence of spam when users delete legitimate advertisements indiscriminately. Lest you think that the DMA truly has your best interests at heart, note that it is still fighting the recently approved national telemarketers' "opt-out" list in a lawsuit against the Federal Trade Commission. As a preemptive strike on the potential precedent this list sets for spam, the DMA is now touting self-regulation as the answer to the spam problem, and has set up its own opt-out list known as the E-mail Preference Service.

To a different cast of mind, the DMA's position begs the question of whether "legitimate" marketing even exists. The most militant anti-spammers contend that the recipient is always the final arbiter of what constitutes spam -- hence, spam is spam, "legitimate" or not. Yet it's not difficult to conceive of situations where the recipient's opinion should be disregarded -- for instance, when a town council sends adverse weather advisories (perhaps "terror alerts" is more topical) to its constituency. Supreme Court Justice Potter Stewart's famous concurring opinion on the definition of obscenity in Jacobellis v. Ohio -- "I know it when I see it" -- applies equally well to the contentious definition of spam. Stewart's opinion is justly notorious, though: "common sense" can't be legislated effectively, because everyone's differs.

A frequently cited common-sense definition of spam is "unsolicited commercial bulk e-mail," and it's clear that the words "unsolicited," "commercial," and "bulk" go a long way towards expressing what many people feel is uniquely objectionable about spam. An imperfect definition is a better foundation for law than none, and this definition has been adopted by many anti-spam groups and local governments in an attempt to ameliorate the situation. Yet even if spam can be effectively defined, there are always people adept at circumventing definitions, or simply ignoring the law.

Technical Solutions
The latest spam-fighting techniques take advantage of the social spam-fighting finesse we all inherently possess.

Distributed Identification
Distributed identification applies peer-to-peer logic to the spam problem. A community of users identifies incoming mails as spam. Once enough users have flagged a given mail, it can be dealt with automatically on behalf of other subscribers. The most visible proponents of this method are the Razor Perl module, which is used by SpamAssassin (the leading server-side spam identification utility), and Razor's commercial offshoot, SpamNet, which offers a plug-in for Microsoft Outlook users. Unfortunately, centralized peer-to-peer networks are subject to a host of problems, including nefarious users, denial-of-service attacks, and so on.
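To make the idea concrete, here is a minimal sketch of threshold-based community reporting. It is not how Razor or SpamNet actually work; the ReportPool class, the SHA-1 fingerprint of normalized text, and the threshold value are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


class ReportPool:
    """Toy stand-in for a shared spam-report service (hypothetical)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        # message fingerprint -> set of users who reported it
        self.reporters = defaultdict(set)

    @staticmethod
    def fingerprint(body):
        # Collapse case and whitespace so trivial variations hash alike.
        normalized = " ".join(body.lower().split())
        return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

    def report(self, user, body):
        # A set means repeat reports from one user count only once.
        self.reporters[self.fingerprint(body)].add(user)

    def is_spam(self, body):
        reporters = self.reporters.get(self.fingerprint(body), set())
        return len(reporters) >= self.threshold
```

Counting distinct reporters rather than raw reports blunts the simplest abuse (one user reporting the same mail repeatedly), but it does nothing against a crowd of colluding accounts, which is exactly the "nefarious users" problem.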

Content Filtering
Content filtering reverses the gentleman's agreement: each incoming e-mail is presumed guilty, and searched for identifying marks. If the Mark of Cain is found, the mail is quarantined, discarded, or branded as spam so that an aware browser or delivery program can deal with it. The similarities to virus scanning are obvious and intentional. Despite the stench of McCarthy, content filtering is the most effective method of identifying spam. Seen in this light, distributed identification is simply a form of content filtering that uses humans as the sieve.

The most famous new content filter is described in an article which has become a battle cry for spam warriors, Paul Graham's "A Plan for Spam." Graham describes a method called Bayesian filtering, wherein users "seed" their e-mail filters with known spams. While traditional content filters use a static set of rules to determine whether e-mail is spam, Bayesian filters change over time, based upon your additions to the spam database. If you don't want to receive mails containing the word "orange," simply identify enough mails with that word as spam, and the filter should eventually comply. Many products employing Bayesian filtering are available, including Mozilla and the open-source command-line Unix utilities Bogofilter, ifile, and bmf.
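The "orange" example can be sketched in a few lines. What follows is a simplified naive Bayes classifier with add-one smoothing, not the exact formula from Graham's article and not the implementation used by any of the products named above; the TinyBayesFilter class and its tokenizer are assumptions made for illustration.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Simplified naive Bayes spam scorer (illustrative only)."""

    def __init__(self):
        # Per label: how many training messages contained each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        spam_n, ham_n = self.messages["spam"], self.messages["ham"]
        # Start from the smoothed prior odds, then add each token's evidence.
        log_odds = math.log((spam_n + 1) / (ham_n + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (spam_n + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (ham_n + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so the logistic function cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Train it with train(text, "spam") or train(text, "ham") and the score for later messages shifts accordingly: mark enough mails mentioning "orange" as spam and the word alone starts pushing new mail toward the spam side.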

The tragic flaw of Bayesian filtering is that if truly effective, it will sow the seeds of its own demise -- as with smallpox vaccinations, if there's no spam left to train filters on, we'll all be vulnerable to a sudden outbreak. Granted, that's unlikely, and projects such as the Spam Archive are doing the hard work of collecting spam for us. A more serious concern is that, given the inevitable increases in storage and bandwidth in years to come, spammers will begin doping their e-mails with legitimate words (perhaps entire dictionaries), confounding the Bayesian filters. Pornographic websites often include common search words in non-displayable HTML code to boost their rankings on common search engines. Graham asserts that a similar approach is unlikely to work, because spammers would have to tailor their messages to individual senders' and recipients' writing styles -- however, it's conceivable that enterprising linguistics students might be able to offer statistical probabilities on what words spammers should include, if a reliable database of legitimate e-mail could be acquired. Imagine an Outlook virus that discreetly sent inboxes to the Direct Marketing Association, for instance.

Blacklisting
Blacklisting is the practice of simply not accepting mail from recidivist domains -- the Internet equivalent of sex offender registration databases. The Mail Abuse Prevention System (MAPS) publishes a list of abusive domains via the Domain Name System (DNS) and other methods. Unfortunately, blacklisting often results in the loss of legitimate e-mail, and should you happen to fall into one through no fault of your own, via identity theft or other means, it's very difficult to get out. It's not for nothing that MAPS calls their blacklist the Realtime Blackhole List.
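DNS-based blacklists are conventionally queried by reversing the octets of the connecting IP address and looking the result up under the list's zone; an answer means "listed," a nonexistent name means "not listed." Below is a minimal sketch of that convention. The zone name in the usage note is a placeholder, not a real service, and the function names are assumptions.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    # IPv4Address() validates the input and raises ValueError on garbage.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers for this address.

    Caveat: any resolver failure is treated as "not listed" here,
    which a production mail server would want to handle separately.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

For example, checking 192.0.2.99 against a hypothetical zone blacklist.example.org means resolving 99.2.0.192.blacklist.example.org. Because the whole check is one DNS query, it is cheap enough to run on every incoming connection, which is a large part of the method's appeal.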

The efficacy of blacklists is also limited by their high turnover: spammers can easily obtain a free e-mail account with an online service, send their spam, and slink back into the shadows like some sort of trap-door spider. Blacklists are legion; some blacklists, such as the Open Relay Database, aren't even devoted to spammers, but to sites which simply allow relaying of mail. To maintain control over mail delivery and be able to respond to legitimate complaints, some organizations maintain their own blacklists rather than subscribing to any of the free services. Sendmail 8.9, for instance, introduced the access database capability, which allows selective processing of incoming mail based upon the sender's name, domain, or IP; most mail delivery agents have a similar feature.
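A locally maintained blacklist boils down to a table of keys (an address, a domain, or an IP prefix) mapped to actions. The sketch below is loosely modeled on that idea only; it is not Sendmail's implementation or lookup order, and the table entries, the lookup function, and the default action are all assumptions.

```python
# Hypothetical local access table: key -> action.
ACCESS = {
    "spammer@example.com": "REJECT",   # a single sender
    "example.net": "REJECT",           # a whole domain
    "friend.example.net": "OK",        # an exception within that domain
    "192.0.2": "REJECT",               # a network prefix
}


def lookup(sender, client_ip, table=ACCESS):
    """Return the action for a sender address and connecting IP."""
    sender = sender.lower()
    domain = sender.rpartition("@")[2]

    # Try the most specific key first: full address, then the domain
    # and its parents (stopping before the bare top-level domain).
    keys = [sender]
    labels = domain.split(".")
    keys += [".".join(labels[i:]) for i in range(len(labels) - 1)]

    # Then the full IP address, then progressively shorter prefixes.
    octets = client_ip.split(".")
    keys += [".".join(octets[:n]) for n in range(len(octets), 0, -1)]

    for key in keys:
        if key in table:
            return table[key]
    return "OK"  # assumed default: accept when nothing matches
```

Checking specific keys before general ones is what lets an organization whitelist one trusted host inside an otherwise blocked domain, the kind of control a subscribed blacklist doesn't offer.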

Next Time...
Since there's lots of money to be made, the entire force of human ingenuity is being brought to bear on both sides of the spam war. This has led to an interesting, perhaps disturbing convergence between the techniques of law enforcement and anti-spam zealots, for whom some information is adamantly not meant to be free, but kept tightly in check. Although it's tempting to wax nostalgic for the days when the gentlemen were all in accord, it's now impossible to de-commercialize the Internet.

Spam is a social problem on a global scale, less serious than famine or disease but equally recalcitrant. To misquote Bruce Schneier once again, there's no "magic spam dust" that can be sprinkled on the problem. The anti-spam community is well aware of this, but it's equally true that technical wizardry can ameliorate the problem. Next time, we'll look in depth at a few of the spam tools mentioned above.


James Ervin is alone among his coworkers in enjoying Michelangelo Antonioni films, but in his more lucid moments suspects that they're not entirely wrong.


Copyright 2000-2009, 101communications LLC. See our Privacy Policy.