Combatting Spam Emails and Contact Forms

Unsolicited mail (or spam) has been a thorn in the side of every email user since the first time someone heard "You've got mail!" In this post we talk about ways to combat spam both from email and from contact forms.

Nobody likes receiving unsolicited emails: they waste time and make it difficult to find the emails in your inbox that you do want to read. While email spam filters have improved over the years, they still don’t catch every spam email, and when it comes to emails from a contact form on your website, your spam filter may not recognize spam as easily as it can with direct emails.

In this post, we’ll take a deep dive into methods of combating both direct email spam and contact form spam.

How Email Spam Filters Work

Spam filters offer an automated solution for identifying unsolicited emails. Each email service maintains its own set of rules for detecting and identifying spam, and the more advanced solutions may depend on several layered systems that weigh multiple factors. Companies must constantly update these systems to combat new forms of spam.

At their core, all spam filters are based on finding patterns in an email that match factors correlated with unsolicited emails, with each factor assigned a different weight or score. Some filters are so strict that failing a single rule is enough to mark the message as spam. Other filters are more nuanced, and may require multiple rules to fail before the message is considered spammy enough to block.
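
To make the idea of weighted rules concrete, here is a minimal sketch in Python. The rule names, weights and threshold are assumptions we made up for illustration; real filters use many more rules and tune the weights continuously.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    weight: float
    matches: Callable[[dict], bool]


# Hypothetical rules and weights, chosen only for illustration.
RULES = [
    Rule("sender_ip_blacklisted", 5.0, lambda m: m.get("ip_blacklisted", False)),
    Rule("spf_failed", 3.0, lambda m: m.get("spf") == "fail"),
    Rule("subject_all_caps", 1.0, lambda m: m.get("subject", "").isupper()),
]


def spam_score(message: dict) -> float:
    """Add up the weights of every rule the message trips."""
    return sum(rule.weight for rule in RULES if rule.matches(message))


def is_spam(message: dict, threshold: float = 4.0) -> bool:
    """Treat the message as spam once its score reaches the threshold."""
    return spam_score(message) >= threshold
```

With these numbers, a blacklisted IP is enough on its own to cross the threshold, while an all-caps subject line only matters in combination with another failed rule.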

Here are some common examples of rules that spam filters use to identify spam. These aren’t all the rules out there, just some of the more common ones.

Forged “From” Headers

When the Simple Mail Transfer Protocol (SMTP) was created, it was intended to be, well, simple. The early creators of email did not anticipate that spam would become such a large problem. One of their biggest oversights is that SMTP does not natively verify that a message was actually sent by the address in the “from” field of the email.

Senders could traditionally put any address in the “from” field that they wanted, and with the rise of spam in the 1990s, bulk senders would often fake the “from” address to make it harder to track the identity of the sender.

Over the years, mechanisms were layered on top of SMTP to authenticate the sender. Sender Policy Framework (SPF) lets a domain publish which servers on the internet are allowed to send messages on its behalf, DomainKeys Identified Mail (DKIM) attaches a cryptographic signature that ties a message to the sending domain, and Domain-based Message Authentication, Reporting & Conformance (DMARC) tells receiving servers what to do when those checks fail. Domains that implement SPF, DKIM and DMARC make it difficult for spammers to forge the “from” address to be from that domain, and when they try, spam filters are highly likely to catch the spam.
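
As a rough illustration of the SPF part, a domain publishes a DNS TXT record listing the addresses allowed to send its mail, and the receiving server compares the connecting IP against it. The sketch below is a deliberate simplification: it only understands the ip4, ip6 and all mechanisms, and the record and addresses are made-up examples. Real SPF evaluation also handles include, a, mx, redirects and DNS lookup limits.

```python
import ipaddress


def spf_allows(record: str, sender_ip: str) -> bool:
    """Return True only if the record explicitly passes sender_ip.

    Simplified: supports the ip4, ip6 and all mechanisms only.
    """
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False
    for term in terms[1:]:
        qualifier = "+"  # "+" (pass) is the default qualifier
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        if mechanism in ("ip4", "ip6"):
            # An IPv4 address is never "in" an IPv6 network, and vice versa.
            if ip in ipaddress.ip_network(value, strict=False):
                return qualifier == "+"
        elif mechanism == "all":
            return qualifier == "+"
    return False  # nothing matched: not explicitly authorized


# Example record (hypothetical): only 192.0.2.0/24 may send for this domain.
EXAMPLE_RECORD = "v=spf1 ip4:192.0.2.0/24 -all"
```

A message claiming to be from this domain but arriving from any other address would fall through to "-all" and fail the check.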

Blacklists and Greylists

The key word in “Bulk Mailer” is “Bulk.” Spam is a numbers game, and those who send spam send a lot of it. Because of this, once a particular IP address is identified as one that sends spam, it often ends up on one or many IP reputation blacklists. Many spam filters use one or more of these blacklists as their first line of defense against spam. If a message comes from an IP that is blacklisted, it has a very high probability of being treated as spam by most filters.
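
Many IP reputation lists are published over DNS (often called DNSBLs): the filter reverses the octets of the sender’s IP, appends the list’s zone, and looks the name up. A listed address resolves to an address in 127.0.0.0/8, and an unlisted one does not resolve at all. Here is a small sketch of that lookup; the zone name is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything else
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """Return True if the (hypothetical) blacklist zone lists this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record (or the lookup failed): treat the IP as not listed.
        return False
    return answer.startswith("127.")
```

Treating a failed lookup as "not listed" is a design choice: it means a DNS outage lets more spam through rather than blocking legitimate mail.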

Some providers also have blacklists for email addresses. This is somewhat less common since most spammers tend to continually change their email address or forge the “from” address (see above), but if you continually receive spam from the same address, you may be able to blacklist that email address to block unsolicited emails from that address.

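Many providers also implement greylisting. In its strict sense, greylisting means temporarily rejecting mail from an unfamiliar sender and accepting it only when the sending server retries, which legitimate mail servers do and many spam tools do not. The term is also used more loosely for a list of sources that might or might not be a source of spam: messages that match these lists are not banned outright but are instead treated as more likely to be spam than messages that don’t match the greylist.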

Malware / Virus Filters

A significant portion of the spam that is sent contains some form of malware. Because of this, protecting users from malware also tends to protect them from a good portion of spam. Oftentimes, these pieces of malware (which are found either in an attachment to the email or behind a link to a site that distributes the malware) are what actually send the spam in the first place. After malware has infected a user’s computer, it continues to replicate by using the infected computer to send out more copies of itself. This form of self-propagation (not only through email, but through other similar means) is what helped coin the term “computer virus” in the first place. Like the biological version, a computer virus attempts to spread to new hosts once it has infected one machine.

Phishing Filters

Another common form of spam is the phishing scam. Phishing is a form of social engineering that attempts to trick you into giving up sensitive information, such as the login credentials for your bank’s website. The phishing attempt accomplishes this by masquerading as an email from your bank, and then taking you to a fake copy of your bank’s website. Spam filters know that common sites (such as Bank of America or Amazon.com) are often targeted for phishing scams. When they detect an email that asks you to log into your Bank of America account, but doesn’t actually link to bankofamerica.com, they can flag the email as a potential phishing scam.
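
The brand-versus-link check described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the brand table is a tiny hand-written example, and we only compare the visible link text against the link’s hostname.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative table of brand names and the domain their links should use.
BRAND_DOMAINS = {"bank of america": "bankofamerica.com", "amazon": "amazon.com"}


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_matches(host: str, domain: str) -> bool:
    """True for the domain itself or any of its subdomains."""
    return host == domain or host.endswith("." + domain)


def suspicious_links(html: str) -> list:
    """Return hrefs whose text names a brand but whose host is elsewhere."""
    parser = LinkCollector()
    parser.feed(html)
    flagged = []
    for href, text in parser.links:
        host = (urlparse(href).hostname or "").lower()
        for brand, domain in BRAND_DOMAINS.items():
            if brand in text.lower() and not host_matches(host, domain):
                flagged.append(href)
                break
    return flagged
```

Note that the hostname check compares the end of the host, so a lookalike such as bankofamerica.com.example.net is still flagged.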

Content Filters

Many spam messages fit into a handful of familiar schemes: attempting to sell you something you didn’t ask for (Would you like to order this miracle drug?), or attempting to scam you in some way (Your long lost relative the Nigerian Prince really wants to give you a million dollars). These spam messages tend to follow repeated patterns that can be detected by heuristics built from millions of previously sent spam messages. When a spam filter finds a message similar to one it has blocked before, it can treat this as one of the lesser indicators that the message is likely to be spam.
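
One very simple way to measure "similar to one it has blocked before" is word overlap. The sketch below uses Jaccard similarity over word sets, with two made-up sample messages standing in for the filter’s history. Production filters use far more sophisticated models; this is only meant to show the idea.

```python
import re

# Stand-ins for previously blocked messages (illustrative only).
KNOWN_SPAM = [
    "Order this miracle drug today at a special price",
    "Your long lost relative wants to give you a million dollars",
]


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Shared words divided by total distinct words (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(message: str, known=KNOWN_SPAM) -> float:
    """Highest similarity between the message and any known spam."""
    words = tokens(message)
    return max((jaccard(words, tokens(s)) for s in known), default=0.0)
```

A filter could add a small amount to a message’s spam score once this value passes some cutoff, rather than blocking on similarity alone.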

Why Email Spam Filters Miss Contact Form Spam

Because contact form messages are not sent directly by the person filling out the form, but are instead sent by your web server (or a service your website uses), many of the common spam filter rules are circumvented. Forged “from” headers can’t be detected, because the “from” header comes from your server and isn’t forged. The sender’s IP address can’t be evaluated properly, because the message comes from your own website’s IP address and is thus seen as more trustworthy. Matching against common email templates or known spam messages is also trickier, because your contact form emails have their own template that changes how the content appears. This can confuse the spam filter, since the message no longer matches the patterns in its content filters.

Blocking Contact Form Spam

Just as no email spam filter is perfect, there’s no perfect way to block spam submitted through contact forms either, but your web developer can deploy multiple means of limiting the spam you receive from your website contact forms.

First, a word of caution about making your filters too strict. The more you filter out, the more likely you are to miss legitimate contact form emails, which could result in lost business. Deciding how much to block is a delicate balancing act between blocking as much spam as possible while not blocking even a single legitimate contact form submission.

CAPTCHA

Let’s start with the most obvious choice. Most spam is submitted automatically by programs (referred to as bots), rather than by a real human filling out the contact form manually. So anything you can do that makes the form relatively easy for a human to submit while being relatively difficult for an automated program to submit is going to filter out a lot of spam. A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is any test that a human can pass easily but a computer cannot.

While artificial intelligence and machine learning continue to make it harder to design tests that computers can’t pass, and computers can break most CAPTCHA tests given enough computing power, most spammers are not interested in applying significant resources to breaking CAPTCHA codes just to send spam. Spam is all about getting a message to as many recipients as possible, and for every contact form that is protected by a CAPTCHA, there are others elsewhere that are not.

For those spammers that do break CAPTCHAs to send spam, the typical method is Turking rather than raw computing power. Turking is a term popularized by Amazon’s Mechanical Turk service, which gets its name from “The Turk,” a hoax machine from the 18th century that purported to play chess but in reality was controlled by a human. In the context of breaking CAPTCHAs, Turking simply means paying humans (typically in countries where they can be paid at very low rates) to solve the CAPTCHA codes on behalf of the spammer.

IP Blacklists

If an individual repeatedly uses your contact form to send you spam, you may be able to block them from accessing your website and contact form through an IP blacklist. Much like with direct email, by tracking the IP addresses of the users who submit contact forms, webmasters can apply similar IP blacklisting techniques to block known spammers from submitting forms.
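
On the website side, an IP blacklist check can be as small as the sketch below, run before the form is processed. The blocked ranges are documentation-only example addresses, and how your site obtains the visitor’s IP depends on your web framework and any proxies in front of it.

```python
import ipaddress

# Example entries only: one whole range and one single address.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.23/32"),
]


def is_blocked(remote_ip: str) -> bool:
    """Return True if the submitter's IP falls in a blocked range."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Our choice for this sketch: reject addresses we cannot parse.
        return True
    return any(ip in network for network in BLOCKED_NETWORKS)
```

Blocking a whole range catches a spammer who hops between neighboring addresses, but it also raises the odds of blocking a legitimate visitor, so ranges should be added with care.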

Honeypots

The term honeypot refers to a technique of tricking a bot into doing something that a human user wouldn’t do. The most common way to use a honeypot on a contact form is to add a field that is hidden from human visitors using CSS, JavaScript or other techniques. When an automated bot finds the form, it sees the field that a human doesn’t, and the tricked bot fills it in. If the server finds content in that field when it processes the form, the submission can simply be rejected.
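
The server-side half of a honeypot is tiny. In this sketch we assume the form includes a hidden field named "website" (a name we picked for illustration) and that the submitted values arrive as a dictionary.

```python
# Hypothetical name of the hidden field; humans never see or fill it.
HONEYPOT_FIELD = "website"


def looks_like_bot(form_data: dict) -> bool:
    """Return True when the hidden honeypot field came back non-empty."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


def handle_submission(form_data: dict) -> str:
    """Reject honeypot hits, accept everything else."""
    if looks_like_bot(form_data):
        return "rejected"
    return "accepted"
```

Giving the hidden field a plausible name makes simple bots more likely to fill it in, since many of them guess what to enter from the field name.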

Honeypots can, and regularly do, block some spam from particularly simplistic spam bots, but unlike other methods they tend to be extremely easy for an automated bot to counteract. More and more spam bots these days use a technique known as headless browsing as opposed to simply looking at the HTML of the form. The reason for this is that more and more websites are built with heavy JavaScript, and thus contact forms may not be in the initial HTML at all.

A headless browser is simply a normal web browser (like Chrome, Firefox, Internet Explorer, Safari or Edge) being run in an automated way. Since it is a real browser visiting the site, CSS and JavaScript are actually run, and when the headless browser looks for a contact form, it sees the form exactly the way a human does, without any of the hidden fields. A headless browser would not be fooled by a honeypot.

Even many spam bots that don’t rely on headless browsers wouldn’t be tricked by your typical honeypot, as many spam bots try to fill out as few fields in a contact form as possible. Honeypots can help you limit a lot of spam, but they should never be your sole form of spam protection.

Content Filters

While applying the exact same filters that you’d apply to inbound email would fail, contact forms have their own common spam patterns and your webmaster could program your contact form’s spam filters to recognize these patterns and filter out common spam messages. These sorts of filters can be very different depending on the type of contact form.
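
What those patterns look like depends on your form, so treat the following as a sketch under our own assumptions: we flag messages with an unusual number of links and messages that contain HTML or BBCode link markup, two things a visitor typing into a plain text box rarely produces.

```python
import re

URL_RE = re.compile(r"https?://", re.IGNORECASE)
# HTML anchors or BBCode [url] tags pasted into a plain text field.
MARKUP_RE = re.compile(r"<a\s|\[url[=\]]", re.IGNORECASE)


def form_spam_reasons(message: str, max_links: int = 2) -> list:
    """Return the list of (hypothetical) rules this message trips."""
    reasons = []
    if len(URL_RE.findall(message)) > max_links:
        reasons.append("too many links")
    if MARKUP_RE.search(message):
        reasons.append("link markup")
    return reasons
```

Returning the reasons instead of a plain yes or no lets the site decide how strict to be, for example holding a message for review when one rule trips and rejecting it only when several do.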

Make sure you aren’t missing leads!

Extra care should be taken when you have both a spam filter on your email service AND are receiving emails from contact forms. Because contact form notification emails come from your web provider as opposed to directly from the user who submitted the form, if your spam filter marks the notification as spam, it may inadvertently prevent you from receiving legitimate (non-spam) contact form notification emails in the future. Here are some rules of thumb when it comes to making sure you receive your leads.

Whitelist your provider’s email address and IP Address

Most spam filters have a whitelist in addition to a blacklist when it comes to IPs. Typically the blacklist is automated (often synced or compared against global lists), but the whitelist is local to your organization. The easiest thing you can do is add the email address that your contact forms come from to your email address book. Many providers (including Gmail and Microsoft Outlook Online) will use your contact list as a de facto whitelist, trusting emails that come from those addresses as likely not being spam. This tends to allow those messages to bypass the content filters that might otherwise be confused by your contact form.

Your email administrator can also add your provider’s IP address to an IP whitelist that marks messages coming from that IP address as safe.

Never mark a contact form notification as spam in your email

Every mainstream email client, and most web-based email clients, have a “Report Spam” button on the toolbar. When you see a message that gets through your spam filter, clicking on this button will move the message to your spam folder and help train your email provider’s spam filters to catch messages like this in the future. In many cases this may also add the sender’s IP address and email address to your provider’s blacklist so that future messages from that sender are immediately blocked.

The problem with contact form emails is that the sender is not the spammer; it is your website provider’s contact form system. So if you mark a contact form notification email as spam, you are effectively blacklisting or at least greylisting all future messages from your contact form, regardless of whether they are spam or not. As such, our recommendation is that you never push the Report Spam button on a message that came from your contact form, even if the content of that message is spam.

NOTE: This suggestion applies specifically to contact form emails. The “Report Spam” or “Mark Junk” button (depending on your email client) is your most proactive tool for training your email spam filters, and if you receive an unsolicited email that was sent directly to you, you should definitely make use of it.

Check your spam folder regularly for false positives

A false positive is when your spam filter detects a non-spam message as spam. When looking in your Spam or Junk folder (depending on your email client), if you see messages from your contact form, be sure to open the message and click the “Not Spam” button (which usually takes the place of the “Report Spam” button shown in other folders). This helps train your email service not to treat contact form messages as spam in the future.

While you are in there, be sure to look through the other messages in your spam folder for false positives. Just as it is good to train your email client on what does constitute spam, it is just as good to train it on what does not. Many email services (including Gmail) have an auto-purge feature on the spam folder, where messages left in the spam folder for a period of time are automatically deleted, so if you don’t check your spam folder regularly, you may never see a message that was marked incorrectly.

The Fight Continues

As the war against unsolicited email and contact form spam rages on, email services and web developers will continue to arm themselves with new weapons to use in the fight. At the same time, spammers are also arming themselves with new ways of evading detection. As we mentioned above, right now it is still computationally expensive to break through CAPTCHAs, but as artificial intelligence and machine learning continue to improve, the cost of breaking through such tools continues to go down.

That said, AI can also be a weapon for those fighting against spam. Machine learning makes it much easier to detect patterns in spam messages in ways that simple filters, and even humans, could not have imagined just a few years ago.