Server

Converting from Procmail to Maildrop

I’ve been using procmail to filter mail on the server forever. I like it, so it’s important to note that even though I switched, I have nothing bad to say about procmail.

So, why did I switch? Procmail can be a little terse to read (obviously, I’m used to it by now). Over the years, I have built up a large set of rules, and there is a ton of cruft in there. If I wanted to clean it up, I’d have to rewrite it, and rewriting it in procmail was definitely a possibility.

But, over the years, I was also aware of maildrop as a filtering solution. It has a cleaner (more accurately, a more straightforward) syntax. The documentation is a little sparse (missing a few key examples IMHO). There are also thousands (if not millions!) of example lines of procmail available on the net, and it can be hard to find complex real-world examples of maildrop filters.

But, I knew that if I rebuilt my filters in maildrop, I’d be forced to rethink everything, since I couldn’t get lazy and just grab hunks of procmail from my current system and plop them into the new one. So, maildrop it was going to be!

One last time, just to make sure I don’t offend lovers of procmail (of which I am one!): everything that I did in maildrop could easily have been done in procmail. I just happened to choose maildrop for this rewrite, and for now I’ll stick with it. Perhaps if I ever revisit this project, I’ll do the next iteration back in procmail.

The goal of my filters is to toss obvious spam (send it to /dev/null). Likely spam gets sorted into one of three IMAP folders; the only reason I split it into multiple folders is so that I can test rules before turning them into full-blown deletes. Finally, mail that falls through all of that is delivered to my inbox.
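
To make that concrete, here’s a minimal sketch of the shape of the filter file. The patterns and folder names below are placeholders, not my real rules:

# obvious spam: send it to /dev/null
if (/^Subject:.*obvious-spam-marker/:h)
    to "/dev/null"

# likely spam: sort into one of the test folders for review
if (/^Subject:.*suspect-marker/:h)
    to "$HOME/Maildir/.Spam1/"

# everything that falls through lands in the inbox
to "$DEFAULT"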

Over the years, I added numerous rules to filter classes of spam (stocks, watches, viagra, insurance, etc.). Without a doubt, I introduced tons of redundancy. Scanning all of the previous rules to see where I might be able to add one more line was simply too tedious compared with just adding a new rule.

I was reasonably satisfied with the result, but over time I became less aggressive about deleting mail automatically, preferring to stuff it in a spam folder for visual scanning during the day.

Since I create my own rules (I don’t run a system like SpamAssassin, which I did for a while), I can start to see patterns and simplifications over time, which was the impetus for the rewrite. In other words, there are more commonalities across classes of spam than my old rules reflected, and I don’t have to spend as much time categorizing things as I used to.

I’ve now made my first cut of the maildrop-based system. It’s been in production for seven days, and I’m very happy with it so far. The one major change is that I now default to deleting things (in other words, much more aggressively than the previous system), but I keep a copy of all mail in an archive IMAP folder that I will prune with a cron job and never scan visually.

I review my delete logs once a day, so if I spot an email that looks like it shouldn’t have been deleted, or someone contacts me asking why I didn’t respond, I can check the archive and find the full mail there (for some reasonable period of time).
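
The pruning itself can be a one-line cron job. A sketch, assuming a Maildir-style archive folder and a 30-day window (both placeholders):

# crontab entry: every night at 4am, remove archived mail older than 30 days
0 4 * * * find "$HOME/Maildir/.Archive/cur" "$HOME/Maildir/.Archive/new" -type f -mtime +30 -delete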

Here’s the result of the rewrite:

The original procmail system had roughly 3,800 lines in it (including comments and blank lines). The new maildrop system has under 550 lines, including comments and blanks. I delete more mail automatically, and in a week, I haven’t deleted a single mail that I didn’t mean to. A few more emails are sneaking into my inbox, but each day I add a few more lines, and the list gets shorter the next day.

Now that a bit more spam reaches my inbox each day, Thunderbird’s junk filters have more to train on, and they are getting better too. Even the junk that gets through is mostly being filed into the local Junk folder automatically.

Here are two things that took me longer than they should have to figure out with maildrop (they are related, meaning the solution is identical in both cases, but it wasn’t obvious to me):

  • How to negate a test using !
  • How to use weighted scoring correctly (very simple in procmail)

Here’s a line in maildrop format:

if ($TESTVAR =~ /123/)
{
    # do something useful if true…
}

The above will “do something useful” if the variable TESTVAR contains the pattern 123. What if I want to “do something” if TESTVAR does not contain 123? Well, until I figured it out, I was making an empty block for “do something”, and adding an else for the thing I really wanted. Ugly.

My first attempt was to change the “=~” to “!~” (seemed obvious). Nope, syntax error. I then tried “if !($TESTVAR =~ /123/)”. Nope, syntax error. I then tried “if (!$TESTVAR =~ /123/)”. No syntax error, but it doesn’t do what I wanted.

I stumbled on the solution via trial and error:

if (!($TESTVAR =~ /123/))

Ugh. The ! can only be applied to an expression, which normally (but not always?!?) means one enclosed in parens. But the if itself requires its expression to be wrapped in parens too, so you end up with two sets: one for the if, and one for the negated test. At least I know now…
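
For posterity, here’s the working shape in a fuller rule. The header pattern and the destination are hypothetical:

# deliver to the inbox only if the message did NOT come from the mailing list
if (!(/^List-Id:.*example-list/:h))
{
    to "$DEFAULT"
}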

The second problem was weighted matches, where I was hitting the same issue. Once I put parens around my expressions, it started working. This is one of the few places where the procmail syntax feels a drop cleaner. Here’s the working maildrop version:

COUNT=(/123/:b,1)
COUNT=$COUNT+(/456/:b,1)
COUNT=$COUNT+(/789/:b,1)
echo $COUNT

So, the above sets the variable COUNT to the number of times that the string 123 exists in the body of the message. That is then added to the number of times that the string 456 exists in the body, finally adding the number of times that the string 789 exists in the body. The total is then echoed to the console. Without the parens, no workie.

I don’t like the fact that I have to maintain the running count myself. In procmail, you basically set a limit and the tests stop once the limit is reached (which feels way more efficient). There might be a way to accomplish that with maildrop too, but I haven’t found it as yet…
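
For comparison, here’s my recollection of the procmail idiom (the patterns are placeholders). The -2^0 term subtracts a constant, so the recipe only fires once the body tests have scored more than two hits, and procmail stops evaluating conditions as soon as the outcome is settled:

:0 B
* -2^0
* 1^1 123
* 1^1 456
* 1^1 789
/dev/null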

While I fully expect to add more rules, or lines to existing rules, I can’t imagine a scenario where my file will even double from here, so it will end up at less than 1000 lines. That will be easier to maintain for a number of reasons, most notably syntax readability.

Welcome Back Courier-IMAP

When Matt was maintaining this server, starting back in 2001, he installed Courier-IMAP for our mail service (both IMAP and POP). It worked extremely well for many years. At one point, IMAP folders started taking a long time to open, though once they were open, performance was excellent. I think this was due to a limit on the number of simultaneous connections allowed from the same IP address.

I’ve been maintaining this machine for a while, but I didn’t bother to do anything about Courier-IMAP even though it had started annoying me. Over 18 months ago, I switched to a new server. I decided to build it from scratch, correcting some legacy problems along the way. One of them was the above-mentioned IMAP hang (not every time, but annoying nonetheless).

After some research, I chose Dovecot as the new POP and IMAP server. I was impressed with how easy it was to install and configure. While there were tons of options to choose from, all of them are controlled in one dovecot.conf file, with extremely clear descriptions of what each choice entails. Once installed, it worked perfectly, and I was very pleased with my decision.

This joy lasted for more than a year! Then, at some point (possibly after my server was physically moved from one data center to another), a few times a day (typically 1-3 times), Dovecot would think that the system clock had moved backwards in time (it never does, and no other program ever complains about that). Dovecot sees this as a very bad event (understandably) and exits automatically.

After noticing this a few times, I installed a monitoring program called Monit. I am running a 5.0 beta version, but Monit has been flawless in every respect, even in beta. Since I installed it, every time Dovecot quits, Monit restarts it within 30 seconds and emails me that it just did so. That’s how I know it’s a daily event, sometimes multiple times a day.
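
For the curious, the Monit stanza for this is short. A sketch, with placeholder paths and alert address standing in for my real ones:

check process dovecot with pidfile /var/run/dovecot/master.pid
    start program = "/etc/init.d/dovecot start"
    stop program = "/etc/init.d/dovecot stop"
    alert admin@example.com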

I’ve lived with this nonsense for way too long. Each time, I assumed that the next version of Dovecot would magically fix my problem, even though the problem first appeared on a version that had worked perfectly for months! As much as I like everything else about Dovecot, I finally gave up.

This weekend, I installed Courier-IMAP (a much newer version than the one we used to run). I made sure to allow more concurrent connections from the same IP address (Lois and I always appear to come from the same IP address). I had a few hassles with the configuration (even though there are way fewer options to choose from than with Dovecot). After about an hour of messing around (probably all my own fault, over-thinking some of the choices), I got it working.
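
The per-IP setting lives in Courier-IMAP’s imapd configuration file. A sketch (the path varies by distribution, and the values are examples, not my exact numbers):

# /usr/lib/courier-imap/etc/imapd
MAXDAEMONS=40   # total simultaneous connections
MAXPERIP=20     # simultaneous connections allowed per IP address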

There was only one side-effect to the change. Under Dovecot, my IMAP folders were shown in the mail client as follows:

INBOX
    XXX
    YYY
    ZZZ

After switching to Courier-IMAP, the structure in the mail clients was:

INBOX
    INBOX
        XXX
        YYY
        ZZZ

We can all live with it, and no mail was lost. It’s a minor nuisance.

I tested on a separate port, and when POP and IMAP were working, I turned Dovecot off and restarted Courier-IMAP with the correct default ports.

I then wrote an email to all of my users (all four of them). 😉

However, when I went to send the email, the send failed with a SASL authentication error. Ugh. I have saslauthd on the system (it wasn’t running, because Dovecot was performing that service as well). I started it up, but even though I played around a bit, I couldn’t get Postfix to authenticate correctly through it.

After looking at the top of the dovecot.conf file, I saw that by changing one line (which protocols Dovecot should handle) to “none”, it would run in SASL authentication mode only. That worked. So, now I still run Dovecot for authentication (since I didn’t have to change anything), and Courier-IMAP for mail fetching.
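
Roughly, the combination looks like this, in Dovecot 1.x-style syntax. The socket path assumes Postfix’s chroot, and this is a sketch rather than my exact config:

# dovecot.conf: no POP/IMAP service, just authentication
protocols = none

auth default {
  mechanisms = plain login
  socket listen {
    client {
      path = /var/spool/postfix/private/auth
      mode = 0660
      user = postfix
      group = postfix
    }
  }
}

# Postfix main.cf: hand SASL duties to Dovecot
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/auth
smtpd_sasl_auth_enable = yes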

So far, the system has been running for a little over 24 hours, with no exits on the part of Courier-IMAP or the Dovecot auth daemon. I also haven’t had any hangs on opening an IMAP folder. It’s still very early, but the Dovecot IMAP server would have died at least once by now (guaranteed), so it’s already a win.

Here’s hoping that this will be a permanent change…

Update: First, so far so good. No exits in 4+ days! More important, I just stumbled across a post that gave me the answer to my nested mailbox problem above. Apparently, Dovecot repeats the .INBOX in front of each sub-folder in the Maildir folder. In other words, .INBOX.SPAM is the SPAM folder directly under the main INBOX in Dovecot. Courier-IMAP expects it to be just .SPAM in the top-level Maildir folder in order to be considered a direct sub-folder of the main INBOX.

I moved the folder names, unsubscribed the old one and subscribed to the new ones, and my folder hierarchy is now sane again. Whew. 😉
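
The mechanics were just directory renames plus a subscription fix. A sketch, with SPAM as the example folder:

cd ~/Maildir
# rename the Dovecot-style folder to the name Courier-IMAP expects
mv .INBOX.SPAM .SPAM
# then fix the corresponding entry in the courierimapsubscribed file to read INBOX.SPAM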

Update: It turns out that the fault lies not with Dovecot, but with some bad code in the Linux kernel for certain hardware configurations (obviously, mine included) that causes the system clock to jump a few times a day. There is a long thread about it on the Dovecot mailing list, which points to a very long thread on the Linux kernel maintainers list. So, while the problem definitely affected me, it’s not the fault of Dovecot, which correctly notices the jump and decides that it’s safer to exit than to guess. Perhaps a future kernel update (I just applied one today) will solve the problem. I don’t feel like hand-patching my kernel, or Dovecot…

Rebuilding jackkapanka.com

I have written twice now about taking over the maintenance of Jack Kapanka’s website. There were three distinct phases of working on the site:

  1. Fix the broken links
  2. Change the home and store pages (adding PayPal support)
  3. Redesign (rebuild) the site from scratch

All three phases are now done, though there is no trace of #1 or #2 left, now that #3 is complete.

I have zero design skills. I have very limited experience building websites. Therefore, I struggled mightily along the way. I had to Google my brains out, and still had a number of false starts and long debugging sessions.

So, I decided to write this post for two reasons:

  1. I’m sure I’ll be building other sites in the future, and I could easily forget some of the things I discovered along the way
  2. While everything I did was discovered through the web, I found different bits on different sites, so I might save some poor soul a few minutes in the future if they stumble on this post

I did not take notes along the way, so this will most definitely not be exhaustive. If I later recall anything that is material, I will come back and update this, again, just for posterity.

The site I inherited was being hosted on a shared system, running Windows Server, with IIS serving up the pages. As far as I know, I have no access to the IIS instance (which is likely shared, though I don’t really know). I have no shell access, just web and FTP.

The site was coded a long time ago, was optimized for 800×600 resolution, and was stitched together with a complex web of HTML tables, with images in different cells, sliced up from the original PSD files, to create the look and feel. Everything was an image, but they weren’t uniform in size. Some spanned multiple columns, others multiple rows, etc. Rearranging anything on the site was a nightmare (for me, with my limited skills!).

The menu system was 100% JavaScript (JS). Each menu item had an image background, with the top level menus having images for the labels. Many other navigation elements on the site were JS-based, giving little or no feedback to the user that clicking would take them somewhere. There were no scroll bars for any long content (like the Bio), as they wanted the neatness of an 800×600 design, so you paged a few paragraphs at a time with JS controls.

Having nothing to do with code, the site was typical of many artist sites: very dark (in this case brown), and completely image-laden. I am sure that they think it’s edgy and cool, and perhaps it is. That said, it’s also often hard for people (especially older people, who might actually have disposable income) to see well or navigate. Lois and I prefer lighter themes, with fewer images, when possible.

Many of the links on the site were broken, and the reasons varied (target was missing, link had a typo, target was renamed, but still existed on the site, etc.). Some of the important content needed to be updated to reflect updated contact info and to correct typos as well.

So, the first sweep of the site was straightforward, but painful. I fixed the broken links and the content. What made it painful was tracking down the right targets (or deleting the links), and finding the content, which was buried deep within nested tables, while ensuring that I didn’t disturb the flow in the small viewport.

Once the site was stabilized, the next priority was getting a PayPal button on the site, and an embedded YouTube video of Angel In My Arms (perhaps Jack’s most famous and successful song) right on the front page. Most people who visit the site are looking for that, and for the ability to purchase a copy to use for the Father/Daughter dance at their wedding.

I had never incorporated PayPal into any site before, but this turned out to be the easiest part by far. Once you have a PayPal account, which Jack did, you basically log on to your account there and fill in a small form and they generate a button and the associated code that you simply cut and paste into your site. It just worked, the first time.

Embedding the video on the home page was much more painful. I didn’t have the skill (HTML or image manipulation) to embed it into the existing table structure, possibly requiring a different slicing and dicing of the background images. I wrote a separate post about Table2CSS describing this amazing tool that helped me accomplish that. Basically, Table2CSS turned the home page into a CSS-based page (no tables whatsoever), with every cell from the original table becoming a named DIV.

This allowed me to position things with much greater accuracy, at the CSS level, without having to worry about spanning rows and columns. As I said in that post, this is not a good way to code a production site, but it was the perfect way to quickly accomplish my goal, and buy myself time to consider the real redesign.

Once that was done, I was ready to consider the new site. I have direct experience with two systems for building websites: Zope and WordPress. Opticality.com is built in Zope, and this blog is built in WordPress. Both of those systems are designed to produce pages dynamically, assembling the pieces with programming logic and content that is typically stored in a database. They offer tremendous power. Zope and WordPress are but two of dozens of extremely popular Content Management Systems (CMS). They are designed to handle this exact type of problem.

Unfortunately, they are also typically designed to run in a slightly higher-grade hosting account (shared is still fine), with WordPress being more available on some lower-level hosting accounts. Neither seemed to be a good option for this account, and I didn’t want to be one of those people who told Jack “You have to upgrade your account, because I have a hammer, and therefore all problems are nails…”

Also, while I wanted/needed the benefit of a template-based solution (on the existing site, the JS-based menu was embedded directly in 50+ static HTML files!), it didn’t need to be a dynamic template system, since for the first cut, there was no need for personalization (i.e., different users don’t get different views of the same URL).

So, the first two decisions that I made (after way too much Googling, reading, and a bit of experimenting) were to select the YUI Grids CSS as the base CSS layout engine and the htp HTML pre-processor as the static templating system. The live site uses both, so those initial decisions stuck. That said, along the way, the YUI Grids frustrated me enough that I nearly bagged it, and went so far as to implement another one before switching back.

The entire YUI system (of which Grids is but one small piece) is very powerful, elegant, and well-documented. Unfortunately, I didn’t want to have to read for days, to create a simple layout, so I didn’t. In not reading, I missed one crucial piece, which none of the examples or Layout generators included. As a result, while the layout looked exactly like I wanted it to, the reset portion of the Grids system (amazingly, I actually knew what that meant) 😉 wiped out all styles. So, lists didn’t look like lists. Headings didn’t look different than normal text, etc.

Fixing it turned out to be trivial, but not when I had no clue as to what was going on. The fix meant including just one more YUI component, Base (which redefines all of the various HTML elements to a sane default that Yahoo feels works well in most browsers). I agree, and for the most part have kept their defaults, but I was pulling out my remaining single hair until I realized what I was missing.
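
In practice, the fix is one extra stylesheet include. Something like this, with the version number being whatever was current at the time:

<!-- reset + fonts + grids in one minified file, then Base to restore sane element styles -->
<link rel="stylesheet" type="text/css" href="http://yui.yahooapis.com/2.5.1/build/reset-fonts-grids/reset-fonts-grids.css">
<link rel="stylesheet" type="text/css" href="http://yui.yahooapis.com/2.5.1/build/base/base-min.css">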

I looked at a number of template systems, and I am completely used to the concept of templates in both Zope and WordPress. This was the first time I used a static template system. Basically, you assemble the final production page from a variety of different inputs (page fragments that can be stored in separate files, variables, blocks, etc.). You run the preprocessor on a template input file, and create the resulting static HTML file to be put on the live site.

htp is quite powerful, yet unbelievably simple to use for the use-case that I had. I couldn’t be happier with my choice in that regard. I have a single master template for the entire site (at the moment). That one template includes a number of different files: the head section of each page (where CSS files are referenced, etc.), the header (where the Jack Kapanka graphic is placed on every page), the menu (so that I can change the menu in one place and regenerate all of the pages with a single command!) and the footer. The template then references variables and variable blocks, which fill in the title of each page, the main content, and the right sidebar content.
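
From memory, the master template looks something like the sketch below. I’m hedging on the exact htp directive names (check the htp documentation), and the file and variable names are mine:

<!-- master template: htp assembles each static page from these pieces -->
<file include="head.htp">    <!-- CSS references, etc. -->
<file include="header.htp">  <!-- the Jack Kapanka graphic -->
<file include="menu.htp">    <!-- change once, regenerate every page -->
<div id="content">
  <use pagecontent>          <!-- per-page content block -->
</div>
<file include="footer.htp">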

I can restructure the entire site (navigation or look-and-feel) simply, or change the content or sidebar for any individual page just as easily.

To make my life easier, after struggling for a while, I gave up on trying to pin the footer to the bottom of the actual browser page. Not only do I know that this can be done (I found many excellent working examples of how to do this), I have even done it before on other projects. What I couldn’t do was get it to work within the confines of the YUI Grids CSS, without reading tons of stuff. I just gave up, and put in scrolling content in the main viewport, ensuring that no page was too large. I’m not thrilled with the result, but it got me to where I wanted to be much more quickly, so the tradeoff was reasonable (from my personal perspective).

Aside from the YUI CSS, there are only two other CSS files: one for the menus and the other for the content. Both are relatively small and pretty simple to maintain.

Because I am sensitive to people who have poor vision (after all, I live with someone who is essentially legally blind), I coded the various CSS elements to be sized in ems rather than pixels. This way, if people resize the content by pressing Ctrl+ or Ctrl-, the page resizes fairly elegantly (at least it did in my hundreds of tests on four different browsers, YMMV).
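
The pattern is simple. A representative sketch (the selectors and sizes are illustrative, not the site’s actual values):

/* size everything relative to the user's base font size */
body { font-size: 100%; }         /* respect the browser default */
#content { font-size: 0.9375em; } /* roughly 15px at a 16px default, but it scales */
#sidebar { width: 18em; }         /* grows and shrinks along with the text */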

Finally, I also tracked down a Flash-based MP3 streaming embeddable object (the first one that I used worked fine in all but IE), and used that to stream the 30-60 second snippets of a variety of Jack’s wonderful songs.

Right before making the site live, I realized that I was putting up a vastly reduced number of pages, as well as renaming some basic pages from .htm to .html. That meant that links people had bookmarked, and perhaps more importantly search engine indexes, would break on nearly every page. That just didn’t feel right.

Redirecting would be trivial in a real CMS, where the headers get spit out by the CMS, and writing logic to catch all missing pages (404s) is generally built in or trivial to write. That wasn’t the case here, and there were roughly 30-40 pages that needed to be redirected (not necessarily to a direct replacement page, but to one of a variety of catch-all pages for each category).

I found the answer on a number of sites. It involved making a directory called XXX.htm (for example). In that directory you create a single file called default.asp. In that file is a tiny VBScript that calls two functions that redirect the page. It was trivial, though tedious to do for all 30+ pages.
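
The two calls in question are Response.Status and Response.AddHeader. Here’s the shape of one such default.asp (the target URL is an example):

<%
' permanently redirect the old .htm address to its replacement page
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", "http://www.jackkapanka.com/music.html"
%>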

I’m sure I’m leaving tons of stuff out (e.g., along the way, to debug the CSS, I ended up installing the Firefox plugin called “Web Developer”, which very nicely complemented Firebug). It was a godsend to be able to change CSS on the fly and see the result in real-time in the browser window. I also made very heavy use of the JS Console window in both Google Chrome and Safari (on Windows). They are very similar (with Safari being more sophisticated, but Chrome being much faster).

Like I said above, if I realize that I’ve left out something material, I’ll come back and edit this in the future…

Who Needs Floppies

Since I just wrote about my laptop spring cleaning, I may as well get one more geek post out of my system. 😉

I run many Asterisk servers. I love it. That said, I am still running the 1.2.x branch on all of the servers. They are up to 1.4.18.1 on the production branch. I will never install the 1.4 branch. Not because I don’t believe it’s good, but because they are getting close to releasing 1.6 into production (they are currently at 1.6.0-beta6!).

So, I was interested in getting a test machine set up to install it (after it goes production), so that I can get to know it before committing it to production servers. I considered running it on a VM on my laptop, but I really want to avoid that if I can (read my spring cleaning post again for any number of reasons).

I considered buying a used machine on eBay, Geeks.com, Tiger Direct, etc. You can get pretty beefy machines for under $200, and reasonable ones for well under $100 on eBay, but you’re taking a risk on the seller, etc.

Last week, while at Zope Corp., I noticed that they were gathering old junk in an area for their own version of a spring cleaning. In that pile were two old machines. One of them was a Dell Dimension 4550: a 2.53GHz machine with a 30GB hard drive and 256MB of RAM. Not exactly the amount of RAM you’d like to see, but otherwise more than adequate to power Asterisk. For a test machine, ideal!

I asked (multiple times) if anyone else hoped to snag it, or ever expected to see it again. People laughed (rightfully so). 😉

Into the back of my SUV it went. I stored it for a week in our utility room and today I finally pulled it out. I wanted to install CentOS on it. The other day I downloaded the 3.6GB DVD ISO in a drop over an hour on my FiOS link. Yummy!

I popped the DVD in the drive and booted. Nothing; it just booted into the existing CentOS 4.2 (I wanted to install the 5.1 release). Hmmm. Thankfully, I didn’t waste much time figuring this one out: the machine had a CD drive, not a DVD drive. OK, moving on…

I downloaded and burned a CentOS net install CD (only 7.1MB) and booted again. Again, straight into the old CentOS. Hmmm. Somehow, the CD drive isn’t working (boot order was set correctly).

I didn’t have root access on the machine. It can PXE boot (boot over the network, though I didn’t have a server for it to boot from), but it can’t boot off a USB device. 🙁

Floppies to the rescue! My second choice for an operating system was Debian. I downloaded five floppy images for a net install. I booted off of the floppy, and it failed again. This was getting very tiresome…
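
For reference, getting the images onto floppies is one dd invocation apiece. A sketch (the image names vary by Debian release):

# write each Debian installer image to a blank floppy in turn
dd if=boot.img of=/dev/fd0 bs=1024
dd if=root.img of=/dev/fd0 bs=1024
# ...and likewise for the remaining driver images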

I booted into the existing system and tried to mount and read the floppy. It took forever, but I finally got a clean listing, so there was no hardware problem with the floppy drive. I tried the same with a CD, but it was never able to mount it, so there was indeed a hardware problem with the CD drive.

It turns out that even though I pressed F12 to change the boot order and picked the floppy, it failed. I pressed F2 (for yucks) to get into setup. Once I moved the floppy up the boot order and saved, it successfully booted off of the floppy. Whew.

I now have a smooth running Debian system configured to my taste. I am now patiently awaiting the final release of Asterisk 1.6.0.

So, do we need floppies? Hopefully not going forward. But, as long as there is life in older systems (and clearly there still is), the fact that my four-year-old laptop has a built-in floppy drive ended up saving me some headaches. More important, are you impressed that I had five blank floppies handy as well? 😉