OK, not really. That was only a slight exaggeration.
Seriously, the specific spam problem that I complained about in my “technology is random” posting is what I’ve now solved.
As I mentioned in that post, I had a combination of procmail rules, SpamBayes filtering, etc. I completely turned off the old SB filtering, because at first I thought that it was somehow causing the emails with attachments to be deleted. Only when I did that did I notice that it was throwing away other emails simply because it was incorrectly tagging them as certain spam (score of 1.0). I couldn’t believe that, but like I said, since I wasn’t updating the db, it was degrading.
So, I turned off the SB filtering, and still, emails were being sent to /dev/null on the server if they had large-ish attachments. That meant that one of my other procmail rules was kicking in. I looked at each (I have many) very closely, and couldn’t imagine which might be causing this.
Also as mentioned in the previous post, I temporarily fixed this by creating a procmail-based white list, which (unfortunately) was both after the fact, and growing steadily.
I also went back and with a few carefully crafted grep and tail pipelines, was able to identify other emails that had quietly been thrown away, and then contacted those (very surprised) authors, and asked for a resend.
OK, on to the solution (almost). Yesterday, an old boss of mine (no, he’s not that old, but I haven’t worked for him directly since 1989!) asked me to review a 384 page document that he had written (no, I’m not kidding about the size). People who know me, know that I (and Lois) are like an echo when it comes to email (think “ping pong”). When he didn’t get an acknowledgement from me within an hour, he assumed that something was wrong.
He sent me another email, asking if I’d gotten the file. Of course, /dev/null had eaten it…
I white listed him, and got the file (which is how I know the size, as at first he scared me by telling me that it was 400 pages).
That got me to thinking that I now had a specific attachment that I knew would fail. I ended up sending it to myself from an account that wasn’t white listed. It got thrown out immediately. Bingo! Now I was at least in control of my own destiny, since I could provoke the problem any time I wanted to.
The next step was easy (and obvious). I turned on verbose logging in procmail and resent the email. You might ask “Why the hell didn’t you turn on verbose logging earlier?” Good question. Aside from not really thinking about it, I must have known (intuitively) that my disk would have filled up waiting for a “bad” email to come in and provoke the problem. Even asking someone to resend would have an unacceptable lag in waiting for them to see my email and act on it, etc.
Logging showed that I was being completely stupid in one specific rule. As the rest of you must know, one of the most popular email annoyances is the pump-and-dump stock scheme. These emails promote a specific stock as the next moon shot. Many are traded on an exchange with a code of PK (for the few of you who don’t know, that’s the Karache Stocke Exchange in Pakistan, a place where I am dying to find a good stock deal!)
So, I started a little procmail rule, and added to it any symbol from those emails that I was sure (and here comes my ultra-stupidity) couldn’t occur in a normal email. So far so good, right? As an example, let’s say that one of the symbols was “JMNX.PK”. Come on, would I worry about accidentally deleting an email that had that string of characters in it?
Well, mistake number 1 (the tiny one) is that without escaping the “.” in the above symbol, the dot matches any character, so if a buddy sent me an email saying “Howdy, check out JMNXOPK”, I would never have seen it. Hopefully, I’d survive such a faux pas. But, over time, I added shorter symbols. Notably, one was PHYA. Again, I wasn’t “worried” that someone would send me a legitimate email with that in it. This was mistake #2, and clearly the biggie…
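To make mistake number 1 concrete, here is a minimal sketch using Python's `re` module as a stand-in for procmail's regular expressions (an assumption for illustration; the actual rule lived in a procmail recipe, but the unescaped dot behaves the same way in both).

```python
import re

# The example symbol from the post, written with and without escaping the dot.
unescaped = re.compile(r"JMNX.PK")   # "." matches ANY single character
escaped = re.compile(r"JMNX\.PK")    # "\." matches only a literal dot

friendly_email = "Howdy, check out JMNXOPK"
spam_email = "JMNX.PK is the next moon shot"

# The unescaped pattern wrongly flags the friendly email.
unescaped_hits_friend = bool(unescaped.search(friendly_email))
# The escaped pattern leaves it alone but still catches the real symbol.
escaped_hits_friend = bool(escaped.search(friendly_email))
escaped_hits_spam = bool(escaped.search(spam_email))

print(unescaped_hits_friend, escaped_hits_friend, escaped_hits_spam)
```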
When someone sends you an attachment, it gets encoded, typically in base64, which is an ASCII encoding. That means that it is converted into a series of apparently random characters. The bigger the attachment, the more of these random characters, and the more likely that any 4-letter combination will appear.
So, it turned out that the 384 page document had the string “pHYa” in it. Note that procmail was kind enough to be case insensitive so that “pHYa” matched my input of “PHYA”, reducing the number of random combinations I had to sweat out.
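A rough back-of-the-envelope estimate shows how inevitable this was. The sketch below assumes the base64 characters are uniformly random and independent (real files are not), ignores MIME line breaks, and uses a hypothetical attachment size, since the post gives the page count but not the byte count. The function name `match_probability` is made up for this illustration.

```python
import math

def match_probability(attachment_bytes, symbol_len=4, case_insensitive=True):
    """Approximate chance that one fixed all-letter symbol shows up
    somewhere in the base64 encoding of an attachment."""
    # base64 turns every 3 bytes into 4 characters (with padding).
    encoded_chars = 4 * math.ceil(attachment_bytes / 3)
    # Case-insensitive matching accepts 2 of the 64 characters per position
    # (e.g. "p" or "P"); case-sensitive matching accepts only 1.
    accepted = 2 if case_insensitive else 1
    p_at_position = (accepted / 64) ** symbol_len
    positions = max(encoded_chars - symbol_len + 1, 0)
    # Treat each starting position as an independent rare event.
    return 1 - math.exp(-positions * p_at_position)

# Hypothetical 1 MB attachment, with and without case folding.
p_insensitive = match_probability(1_000_000)
p_sensitive = match_probability(1_000_000, case_insensitive=False)
print(f"case-insensitive: {p_insensitive:.2f}, case-sensitive: {p_sensitive:.2f}")
```

Under these assumptions, a single 4-letter symbol matched case-insensitively has roughly a seven-in-ten chance of appearing in a 1 MB attachment, versus under one in ten if the match were case-sensitive. Every additional short symbol in the rule pushes the odds higher.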
Of course, in retrospect, I was an idiot, and the inevitability of the match is obvious. The solution is trivial too: delete the rule. Now that it’s gone, it’s just as simple to add at least another step to check for any number of other typical pump-and-dump keywords along with the ticker symbol, and that should work just fine. In the end, it was laziness on my part, coupled with the fantasy of catching every occurrence of that particular type of email, that did me in.
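The "ticker plus keywords" idea can be sketched as follows. This is Python rather than a procmail recipe, and the keyword list, the word-boundary and case-sensitive ticker matching, and the function name `looks_like_pump_and_dump` are all assumptions made for illustration, not the rule that was actually deployed.

```python
import re

# Ticker symbols from the post, with the dot escaped this time.
TICKERS = [r"JMNX\.PK", r"PHYA"]
# Hypothetical pump-and-dump phrases; a real list would come from actual spam.
KEYWORDS = ["strong buy", "price target", "hot stock"]

# Tickers must appear as whole words and in upper case, so a random
# run like "xxpHYaxx" inside base64 text does not count.
ticker_re = re.compile(r"\b(?:" + "|".join(TICKERS) + r")\b")
keyword_re = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

def looks_like_pump_and_dump(text):
    """Flag a message only if it has BOTH a ticker and a spammy phrase."""
    return bool(ticker_re.search(text)) and bool(keyword_re.search(text))

spam = "PHYA is a Strong Buy with a price target of $5!"
base64_noise = "9j4AAQSkZJRgABxxpHYaxxAQEASABIAAD"
friend = "Howdy, have you ever heard of PHYA?"

print(looks_like_pump_and_dump(spam))
print(looks_like_pump_and_dump(base64_noise))
print(looks_like_pump_and_dump(friend))
```

Requiring two independent signals means a chance hit on the symbol alone, whether in an attachment or in a friend's note, no longer sends the message to /dev/null.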
All I can say is amen, a modicum of sanity has returned to the world…