Asus Wifi Router AiMesh Odyssey

This post was rattling around in my head before the current stay-at-home situation arose. Somehow, it never found its way to my fingers, until now.

This will be a typical techie post by me, in that it will be long, rambling/meandering, and likely bore the people who used to mostly read my music blogs.

tl;dr

Most newer (even years old) Asus routers now support AiMesh, a method of turning normal routers into a mesh system. When configured correctly, it works remarkably well and has some serious advantages over pre-built mesh systems (like Google WiFi, Eero, etc.).

My start with Asus Routers

For years, I purchased cheap (often refurbished) routers. I typically flashed them with dd-wrt firmware to make them better than new.

On December 13th, 2013 I broke the mold. I purchased my first high-end router, the Asus RT-AC66U. It was my first “AC” router (now called WiFi 5) and the first time I ever spent that much money on a router of any kind ($179.99 before tax!).

It has a (theoretical and nonsensical) top speed rating of 1750 megabits per second (Mbps). That’s still pretty good even by today’s standards.

Back then, this router was glorious. I was very happy with my purchase and it was my main router for many years. It only got replaced when I switched to Verizon FiOS which at the time came with their own branded router.

This router still gets firmware updates (what a credit to Asus!). More on that later in this (happy ending) saga.

The big move

In 2015 we moved from NY to VA. Our house in VA is large, but not that large. The layout puts our master bedroom far away from every other room (no centrally placed router would reach that room).

Mesh routers hadn’t exploded in popularity yet, so I never considered one at the time. Given that I was installing FiOS there (the previous owner had FiOS as well), I knew that my main router would be a Verizon branded one.

I had an electrician put in a hard-wired Ethernet port in three rooms, in addition to the closet where the FiOS router lives. There is an Ethernet port in the wall in our office, the master bedroom, and the basement (where the TV is). Each of the three rooms is a home-run back to the closet, so each of the rooms connects into one of the four LAN ports on the back of the FiOS router.

The office is very close to the closet with the main router in it, so to begin with, I didn’t put another router in there.

The master bedroom and basement each got routers. Those routers were put in AP mode (Access Point, or Bridged). That meant that the main FiOS router still handed out all IP addresses (no matter the room or which WiFi router our devices were connected to).

For simplicity, each of the AP routers got a separate SSID (the WiFi Network Name). Devices that were fixed in a particular room always connected to their local SSID/Router (e.g., the TV, or a Fire Stick/Roku, etc.). Our cellphones had to switch to a new SSID when we moved from one room to another.

Since the main router signal really doesn’t reach the master bedroom at all, the phones would switch pretty quickly (without any manual intervention), simply because they would lose the signal completely and immediately begin searching for a new signal.

It wasn’t instantaneous, so you would have to think about starting something before you walked out of a room, because the connection would definitely drop for a few seconds.

After a few months, the FiOS router started getting flaky on the AC channel (this eventually straightened out years later after some arbitrary firmware update). This led me to pull out my old Asus RT-AC66U router (which was in a drawer for a while by then) and install it in the office.

Even though the office was close to the main router, I could no longer trust the WiFi on the main router, so I used the wired connection in the office to hook up the Asus as an AP and all was well again.

Growing the Asus family

The above setup continued for a while and worked fine. In the master bedroom I had an old TP-Link router (another brand that I really like). It did the job, but wasn’t the fastest thing and, on rare occasions, caused me tiny frustrations.

Six weeks before buying my first Asus router, I bought a Nexus 5 phone from Google. That marked my switch from Verizon (where I was using the Galaxy Nexus) to T-Mobile. I’ve been a happy T-Mobile customer ever since.

My house had effectively zero coverage on T-Mobile when we moved in (it’s marginally better now, though I rarely ever have to use it in the house). After a few months in the house, I saw an ad for a T-Mobile Cellspot. It’s a rebranded Asus RT-AC68U router (one step up from the 66U that I owned).

T-Mobile was selling them very cheaply (much cheaper than the Asus branded 68U). I bought it not only because the price was excellent (if memory serves me well, I think I paid $75 for a new one, when the 68U was going for $199 new), but also because it was (supposedly) optimized for T-Mobile WiFi calling (which given my cell signal at home sounded like a great addition).

I replaced my (rarely) flaky TP-Link router with a T-Mobile Cellspot one in the master bedroom.

The dreaded, complicated upgrade

The new router ran great for months, but I never used the WiFi calling feature (that’s a long story for another post that I will likely never write).

At some point I stumbled on a post about “Turning a T-Mobile Cellspot into a Full Blown Asus RT-AC68U”. If you skim that article, and more importantly the tons of comments with a lot of detail in them, you’ll see that this is not for the faint of heart.

Fortunately or otherwise, this is exactly the type of thing I enjoy (hey, you get your kicks your way, I’ll get mine my way…).

With some twists and turns, and needing to read a bunch of comments when I got stuck, I eventually turned the Cellspot into a full blown Asus router.

Why bother?

That’s a fair question. The Cellspot worked perfectly (for some definition of perfect). There are a few good reasons to consider the upgrade. By far the most important one is that the Cellspot firmware was way behind; it did get updated, but pretty rarely.

That means that when new attacks against WiFi routers are revealed (like KRACK and the others discovered in recent years), a real Asus router will be patched months (or years!) before the Cellspot will. That’s reason enough to bother if you don’t require any special handling of the WiFi calling feature for T-Mobile specifically.

Another reason to upgrade is to flash different firmware, e.g., dd-wrt (mentioned above). That’s not possible from the base Cellspot firmware, but is from the Asus firmware.

Finally, if you want to run the AiMesh software, you need to be on the real Asus firmware.

A quick look ahead

I’ll deal with this later on, but there are newer (shorter, better) ways of upgrading a Cellspot, and very important warnings and caveats which didn’t exist at the time I upgraded my first one.

The point of this interlude is to tell you to read on rather than follow the instructions in the article linked above.

Another Asus added to the family…

A year later, my better TP-Link router running in the basement started to have some issues (again, I think it was a firmware issue that I later resolved). I decided to replace it with another upgraded Cellspot.

I bought one refurbished on Amazon for $48. I followed the same upgrade instructions linked above, and had another working Asus RT-AC68U router installed in the basement the same day it arrived.

I now had four routers in the house. The main FiOS one which was mostly acting as a wired router to the Internet (a few legacy devices, security cameras, etc., were connected directly to the 2.4GHz channel on that router), plus the original RT-AC66U in the office, and the two upgraded Cellspots, in the master bedroom and basement.

What? Asus doesn’t make infinitely perfect hardware?

About six months ago, I walked into the office and the old 66U router was dead. No lights, no Internet (obviously).

I disconnected the cables and pulled out the TP-Link Archer C9 that had previously been running in the basement. That’s the one that I asserted was flaky only because of a specific firmware.

I reconfigured it to take the place of the old 66U, made sure it was current on firmware, and turned it on. Problem solved, we were back in business.

I decided to try and diagnose what went wrong with the old 66U (just out of curiosity, as it was 6 years old at the time and didn’t need to provide any additional service to make it one of the more outstanding tech purchases).

I connected it directly to my laptop using the wired port and fired it up. The lights blinked for a second and then went dead. It only took me one more try to realize what was wrong. The power button was broken. It simply wouldn’t click and stay on.

In typical MacGyver mode, I found a round hard piece of plastic, scotch taped it on to the power button, then put a rubber band around that to ease the pressure on the scotch tape.

Voila! A working 66U router, once again…

I swapped it back for the TP-Link, which was now perfectly configured to be an instant backup router should my MacGyver skills prove unworthy.

Wasn’t this supposed to be about AiMesh???

Oh yeah, though I did specifically mention that this would meander and ramble and I didn’t want to disappoint on that front either…

Unfortunately, a bit more meandering is necessary, just for historical accuracy, not to discuss the merits of AiMesh.

Getting into trouble

Before getting the second Cellspot router, I upgraded the first one using the built-in Asus firmware upgrade tool once, and it worked great.

When I got the second one, I (of course) upgraded it to the same version of firmware as the first one was on, with no issues.

Toward the end of last year, another router vulnerability was discovered, and I checked whether Asus had an updated firmware to mitigate it. They did.

I updated the old 66U first, and it upgraded perfectly.

I updated the first 68U and it reverted back to the original Cellspot firmware (which had even more issues than I was currently trying to fix!).

Whoa, what just happened?

A bit of Googling and I found that Asus decided that if they noticed that a Cellspot router was being flashed with Asus firmware (rather than a T-Mobile branded firmware), they would roll it back to the original.

Darn!

Silver Lining?

This caused me to find newer methods of turning it back into an Asus router, including ways to thwart Asus from rolling it back. The old method (linked above) still works, and has the appropriate warnings and methods to avoid the rollback, but it’s still more complicated than the new ones.

This is a link to the instructions that I used the second time around. It looks long and complicated, but that’s because there are three different (analogous) methods for accomplishing the upgrade. The key point is that this avoids the full downgrade which the original method requires.

When I did this and got my first router back to the new firmware, and made sure that it wouldn’t downgrade again, I flashed the second one as well, apparently successfully.

Apparently???

Wait, what? Either it worked or it didn’t. Well, yes (and no).

I was so smart (being a seasoned techie) that I was incredibly stupid (being a seasoned know-it-all).

The first time that I upgraded each router, I meticulously followed the dozens of steps in the original article linked above. Amazingly, following detailed instructions worked (can you believe it?).

This second time around, using the simplified instructions (which are 100% accurate and would work if you followed them exactly!), I skipped one crucial section (a few commands) because I assumed that they were unnecessary the second time around (meaning, I thought I had the correct files from the first time around just sitting in my folder waiting to be reused).

Again, why apparently then? Because I don’t end up using the router in the basement all that often, and it took trying to get AiMesh to work (still coming, I promise) to finally see what I had done so wrong…

AiMesh, finally!

Well, actually, getting closer, not there quite yet…

Now that everything was running fine (or so I thought), I decided to finally experiment with turning on AiMesh in all three Asus routers.

I really didn’t need it, my setup was working well enough, but I was curious and now that I was running the latest version of Asus firmware on all three routers, I was in a position to find out. I could always roll back to non-AiMesh mode if it wasn’t to my liking.

Unfortunately, I hit a snag immediately. It turns out that the old 66U is not capable of running AiMesh software. There is a newer revision (RT-AC66U rev B) that can run full AiMesh, but mine is too old and can’t do it.

So, I popped on to Amazon and ordered another Cellspot for $48 (this time, it was labeled renewed rather than refurbished).

Unfortunately, I compounded my error by skipping the exact same block of steps on this newest router as I had on my others, because I hadn’t yet noticed the problem that I had introduced into my network…

Turning on AiMesh, finally, really, this time

I flashed the real Asus firmware onto the newest Cellspot and retired the old 66U once again. I was now ready to flip the switch and turn the three routers into an AiMesh mesh network.

All attempts to get them talking to each other failed! After some searching, it seemed that some people had more success using the Asus Router App on their phones, than using the web browser interface.

I broke down and installed the app on my phone. I did seem to get a bit further, but if I got something going on one router, another would disappear, and then that would be flipped. It was maddening.

Discovering the problem

Well, the problem was entirely created by me, so I was the problem. The crucial steps involved the following:

Upload original_cfe.bin to https://cfeditor.pipeline.sh/
Select 1.0.2.0 US 1.0.2.5 US for AC68P or 1.0.2.0 US AiMesh for AC68U with AiMesh as Source CFE
Download the new .bin
rename it to new_cfe.bin

I assumed that having done that once on the original router, I had the correctly modified CFE (now called new_cfe.bin). Meaning, I thought that all Asus routers (at least of the same model number, which mine were) shared an identical new_cfe.bin file.

You’ve all heard what the definition of the word assume is, right? I’ll spare the gentler ears/eyes from seeing it here again…

It turns out that the file is unique to each and every router. Why? Because among other things, it contains the MAC Addresses of the router’s ports (both Ethernet and WiFi) embedded in it. So, by reusing the same new_cfe.bin file on all three routers, they were all running with the same exact MAC Addresses.

To be clear, they each had different IP addresses assigned, but that doesn’t make the problem better. The way local networking works, there is an ARP table maintained that tells the network how to reach the physical machine associated with an IP address, by translating it into the MAC Address.

So, when I tried to reach any of the three routers via their IP address, all of them returned (at a very low level) the same MAC Address, and therefore it was entirely random (perhaps based on distance in the house) as to which router would see my request!
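Had I thought to look, the clash would have been visible in the ARP table itself. Here is a minimal Python sketch of that check, assuming you have already copied the table into a dictionary of IP address to MAC Address. The addresses below are made up for illustration, and `find_duplicate_macs` is my own hypothetical helper, not something that ships with the routers.

```python
from collections import defaultdict


def find_duplicate_macs(arp_table):
    """Return {mac: sorted list of IPs} for every MAC claimed by more than one IP."""
    ips_by_mac = defaultdict(list)
    for ip, mac in arp_table.items():
        # MAC Addresses are case-insensitive, so normalize before grouping.
        ips_by_mac[mac.lower()].append(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Hypothetical addresses, for illustration only.
arp_table = {
    "192.168.1.2": "AA:BB:CC:00:00:01",   # office router
    "192.168.1.3": "aa:bb:cc:00:00:01",   # bedroom router, same (cloned) MAC
    "192.168.1.4": "AA:BB:CC:00:00:01",   # basement router, same (cloned) MAC
    "192.168.1.50": "AA:BB:CC:00:00:99",  # a laptop with its own MAC
}

duplicates = find_duplicate_macs(arp_table)
for mac, ips in duplicates.items():
    print(mac, "is claimed by", ", ".join(ips))
```

Any MAC Address that shows up against more than one IP is a red flag, and in my case all three routers would have landed in the same bucket.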

Ugh.

The solution

Once I understood the problem, the solution was obvious and straightforward, but by no means simple. I needed to fix the individual cfe.bin files, but I could no longer follow the original instructions (uploading them to a website which would edit them) because I didn’t have the original files to upload!

Worse, I needed to figure out which MAC Address was correct for which router, which meant going to each of them and finding the stickers with the serial numbers and MAC Address printed on them.

Once I did that, I had to use a HEX Editor to load up each file, find the wrong MAC Addresses (yes, plural, since there are multiple interfaces in each router) and type over them (very carefully).
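For anyone who would rather script the typing-over than trust their fingers in a HEX Editor, here is a rough Python sketch of the same idea. It assumes the MAC Addresses are stored in the file as plain ASCII text (like AA:BB:CC:00:00:01), which you should confirm in a HEX Editor first. The variable names inside the sample image and all of the addresses are hypothetical, and `replace_mac` is my own helper. I did mine by hand, so treat this as an illustration rather than the procedure I followed.

```python
def replace_mac(image, old_mac, new_mac):
    """Swap every ASCII occurrence of old_mac for new_mac, keeping the image size fixed."""
    old = old_mac.encode("ascii")
    new = new_mac.encode("ascii")
    if len(old) != len(new):
        # A different length would shift every byte after it and corrupt the image.
        raise ValueError("old and new MAC strings must be the same length")
    count = image.count(old)
    return image.replace(old, new), count


# A tiny fake image standing in for the real file (contents are hypothetical).
image = (
    b"\x00et0macaddr=AA:BB:CC:00:00:01\x00"
    b"1:macaddr=AA:BB:CC:00:00:04\x00\xff"
)

# Wrong (cloned) MAC -> MAC printed on this router's sticker. All values are made up.
replacements = {
    "AA:BB:CC:00:00:01": "AA:BB:CC:00:00:11",
    "AA:BB:CC:00:00:04": "AA:BB:CC:00:00:14",
}

patched = image
for old_mac, new_mac in replacements.items():
    patched, count = replace_mac(patched, old_mac, new_mac)
    print(old_mac, "->", new_mac, "replaced", count, "time(s)")

# Same size in, same size out: nothing else in the file has moved.
print(len(image) == len(patched))
```

The length check is the important design choice here: overwriting in place is what makes the edit safe, whether you do it with a script or by hand.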

Then I needed to copy them over to the correct router, flash them, reboot the router, and pray.

Yes, that worked!

Are we there yet?

So, was I really done? Unfortunately, not quite.

I was able to get AiMesh going, but the speed in the bedroom was pathetic (reliable, but pathetic). The speed in the basement was great!

This one didn’t take long to diagnose, but it did take a while to fix…

By default, AiMesh sets the backhaul (how the mesh routers communicate with each other, rather than with the client devices or the Internet) to auto.

In the case of the basement router, that ended up using the wired connection over the Ethernet cable (which is exactly what I expected the default to be).

In the master bedroom, even though the router is fully wired like the basement one is, auto defaulted to wireless backhaul.

If you recall from a few days ago, when you started reading this post, the master bedroom is too far away to get a reliable signal, so the backhaul was awful (amazing that it worked at all!).

The solution is simple: force the backhaul to be wired. Yes, simple, in theory, but I couldn’t find any way to do that!

More searching on the Internet and I finally found a single forum post where someone linked to the official guide with highlighted screenshots.

Bless that individual, and Google, for surfacing the correct post (after much tribulation).

Here is a link to the guide, with step 6 being the secret sauce to finally see where the default backhaul could be changed.

Conclusion

So, was it all worth it? Yes, of course.

First, I love technology puzzles, even ones created by me. Once I screwed up the settings really badly, I just had to figure out how to get myself out of it. It wasn’t fun (on any level), but it was instructive, informative, and satisfying (in the end).

Much more importantly, I am now running a full mesh network and I like it. Our phones don’t drop when walking from our office to our bedroom. All three routers are effectively managed from the one main AiMesh one.

Why AiMesh is really cool

Most importantly, it’s a mix and match network. You don’t have to buy kits. You don’t have to have identical routers at each node. As long as a router supports the AiMesh firmware (which many Asus routers do!), it can be a node (or the master) of your AiMesh network.

This is crucial. Today, I don’t own a single WiFi 6 (AX) device. So, it would be overkill for me to buy a WiFi 6 router, let alone a WiFi 6 Mesh Kit.

However, if/when I get a new laptop (I’m typing this on a 6-year-old one) that has WiFi 6 in it, or a new phone (mine is 2.5 years old), I’ll be able to get an Asus WiFi 6 router (any of them!) and use it as my main AiMesh node (and place it wherever I use the laptop most frequently, which now, is in the office).

I won’t have to change the other routers, or change any settings on the other routers either. They will all just work. My laptop (and phone) will work with WiFi 6 when they’re connected to the new router, and automatically and gracefully downgrade to WiFi 5 when they roam to another AiMesh node that’s still on WiFi 5.

Further, I can even do the WiFi 6 upgrade piecemeal. For example, I could get a lower-end WiFi 6 AiMesh router first, and make that the master. Then, as I have more devices that can take advantage, I can get a higher-end WiFi 6 router, make that the main one, and move the older WiFi 6 router into the bedroom.

The ultimate beauty is that each of the routers can always be instantly returned to be non-AiMesh routers. So, I can pass them on to friends when I replace them with a WiFi 6 one and those people can use them as standalone routers, AP bridges, or create or augment an AiMesh of their own.

That’s what makes these more flexible than full-time mesh systems.

Also, it doesn’t hurt that the models I’m running can be picked up for $48, if you’re willing to be super careful and avoid the stupid mistake that I made.

Email Spam Coordination Across Different Services?

Surprise! It’s been 1,743 days since my last post. This isn’t a music post, so even if you are momentarily pleased to see a new one pop up in your feed, you’ll likely be disappointed by this “techie” one…

Disclaimer: this is 100% speculation on my part, I have done zero research to see if my theory is correct. I’m sharing it mostly to remember these thoughts as they occur, and perhaps in the distant hope that someone will either confirm or disprove the hypothesis with real proof.

Hypothesis

There is high-level coordination among disparate email providers to determine bulk senders, in a completely unintuitive but ostensibly extremely clever manner.

I put the above hypothesis first, so that you can stop reading right now if you have no interest in this topic, because as is my usual style, this is likely to get long, quickly…

Background

I have operated my own email server for over 20 years. My CTO friends think I’m insane (perhaps for other reasons as well, but definitely for running my own email server). For clarity, it’s Postfix, not an email server that I wrote, just one that I operate on a dedicated server (where this WordPress blog is hosted as well).

I’ve been through the wars with email, sometimes caused by my misconfiguration or misunderstanding, but sometimes entirely out of my control (e.g., when my dedicated server was transitioned to a new data center and the static IP changed, and it had previously been on many RBL blacklists!).

Over time, I’ve tamed the configuration into a very stable setup. That has included complying with SPF, DKIM, DomainKeys, DMARC, etc. Basically, anything that Google or Microsoft claims will help them validate email from me as a sender, and not mark it as spam or worse, just bounce it back to me.

The Problem

While my setup works flawlessly most of the time, on occasion, Lois or I will get a bounce back from someone (typically Google/Gmail, but sometimes Microsoft/Outlook/Hotmail). Once that bounce occurs, we’re often shut out from sending email to that service for a full day (rarely, longer!).

As you can imagine, it’s wildly frustrating to not be able to send an individual mail (this is not spam or bulk mail, but rather one to one emails to friends).

This is the error message we get in the bounce:

               The mail system

XXX@gmail.com: host gmail-smtp-in.l.google.com[74.125.20.26] said:
550 Action not taken (in reply to end of DATA command)

Wow, very helpful, “Action not taken”. Nothing indicating what we did wrong and why Google rejected the email. It feels like it should be a transient error, but it not only persists, it typically stops us from sending any further emails to anyone on that service for the rest of the day.

This has been going on for at least a couple of years. Just not often enough for me to pull out my (one remaining) hair trying to track it down.

What is going on?

Until this week, I literally had no idea (perhaps I should have diagnosed it earlier…). While I can’t say with certainty that my new understanding covers all aspects of this error, I can say with certainty one use case that definitely causes the error, and it might explain every single occurrence that we’ve had in this regard.

We (Lois more than I, but I do it too) share identical emails individually with a variety of friends. Specifically, if a group that we love puts out a new music video, Lois will send a link to that video to a group of people, but each will get their own separate email with the same email body and subject, so that they can reply just to us and not be BCC’ed in a large group.

I had never made the connection before that this somehow triggered the bounces, even though they were short emails, sent to people we’ve emailed hundreds of times, who almost always respond to those emails, and who likely have us in their contacts lists. I couldn’t imagine that we were tripping any spam filter.

What do I think is going on?

I now believe (very firmly, with zero proof) that the body of the email (not including the subject, or the receivers) is being hashed. When another email with the identical hash (of the body) comes through (not sure if there is a time-limit or not), the service bounces it immediately with the above error message of “Action not taken”.

How did I come to this conclusion?

A week ago, I sent out invitations to a number of people to a house concert in March. Most of them went out fine, a very few bounced, so I didn’t have any suspicions over those bounces just yet.

This week, we discovered that one of the band members couldn’t make it, and we agreed with the head of the band that we would simply cancel the show. So, I sent an email to everyone that I had previously written to (except those that already said they couldn’t make it) telling them that the show was cancelled.

The cancellation emails were all bouncing, except for the first one!

I complained to Lois that the cancellations were bouncing, and that we would likely have to wait at least a day for the bounces to clear before I could send them again (still having no idea why they were bouncing).

Lois asked me why I thought the vast majority of the invitations didn’t bounce (to the same people, and those had links to the band in them, so if anything, they would have appeared to be more spammy).

It was a good question, to which I had no answer.

But, my brain often needs to sleep on problems before enlightening me, and indeed, the next morning I woke up with a theory to test.

While the bodies of the invitations were nearly identical, they each started with “Dear XXX,”. So, they couldn’t have hashed into the same exact body. On the other hand, each of the cancellations was identical in every way (copy/paste) without the lead Dear XXX. So, they indeed would hash into the same body.

To test the theory, I added back the “Dear XXX” to the cancellations, and sure enough, every single one went out without any bounces!
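To make the hashing idea concrete, here is a small Python sketch. It is purely illustrative: I have no idea which hash function (if any) the providers use or how they normalize the body, so SHA-256 and the whitespace handling in `body_digest` are assumptions on my part, and the message text and names are made up.

```python
import hashlib


def body_digest(body):
    """Hash only the body text (no subject, no recipients), as in my hypothesis."""
    normalized = body.strip().replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


cancellation = "Unfortunately, the house concert in March is cancelled."

# Copy/paste cancellations: byte-for-byte identical bodies.
plain = [cancellation for _ in range(3)]

# The same text with a personal greeting added back in front.
personalized = [f"Dear {name},\n\n{cancellation}" for name in ("Ann", "Bob", "Cy")]

plain_digests = {body_digest(body) for body in plain}
personalized_digests = {body_digest(body) for body in personalized}

print(len(plain_digests))         # 1: every copy collapses to one digest
print(len(personalized_digests))  # 3: each greeting yields a different digest
```

A single changed word at the top is enough to give every message its own digest, which is consistent with what I saw once the “Dear XXX” went back in.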

How is that Cross Provider Coordination?

Aha! It turns out that once Gmail (for example) bounced an email, so did Microsoft, Verizon/AOL/Yahoo, Apple (via @mac.com) and likely others (like Comcast).

To be clear, once Gmail bounced an email, if the next email went to hotmail.com, but was the very first such email to any Microsoft address, it was bounced immediately.

That implies (to me, proves) that it’s not just that they too hash incoming emails and bounce duplicates, but that they share the hash (somehow) among the competing service providers, in order to more efficiently identify bulk senders quickly.
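Here is a toy Python model of the behavior I think I’m seeing. Everything in it is invented for illustration: the `SharedDigestRegistry` and `Provider` classes, the rule that the second identical body gets flagged, and the use of SHA-256. It says nothing about how any real provider works, only that a shared list of flagged digests would produce exactly this pattern.

```python
import hashlib


class SharedDigestRegistry:
    """Hypothetical list of flagged body digests that several providers consult."""

    def __init__(self):
        self._flagged = set()

    def flag(self, digest):
        self._flagged.add(digest)

    def is_flagged(self, digest):
        return digest in self._flagged


class Provider:
    """Hypothetical mail provider that bounces repeated bodies and shares the digest."""

    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self.seen = set()

    def receive(self, body):
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if self.registry.is_flagged(digest):
            return "bounced"          # some provider already flagged this body
        if digest in self.seen:
            self.registry.flag(digest)  # second identical body: flag it for everyone
            return "bounced"
        self.seen.add(digest)
        return "accepted"


registry = SharedDigestRegistry()
gmail = Provider("gmail", registry)
hotmail = Provider("hotmail", registry)

body = "The show is cancelled."
results = [
    gmail.receive(body),    # first copy goes through
    gmail.receive(body),    # identical body again: bounced and flagged
    hotmail.receive(body),  # first copy ever sent here, yet bounced
]
print(results)
```

The last line of `results` is the interesting one: the second provider has never seen the message, but bounces it anyway because the digest was already on the shared list.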

Please don’t ask me to conjecture on how they do that, I don’t have a clue.

Summary

We are both unbelievably relieved to understand what has been going on with these bounces (or at least to delude ourselves into thinking we understand it now).

Postscript

I doubt anyone who normally reads my blog will have an interest in this, but I needed to get it off my chest anyway.

I can only hope that an expert will weigh in and either confirm or provably deny my hypothesis.

Also, I have at least one more (completely picayune) email issue (rant?) that I will share if I get any feedback that this kind of stuff is interesting to anyone…

With Friends Like This…

With friends like this, who needs family?

Yesterday, I entered a weekly contest that Amazon.com runs. This week’s prize is a Kindle 3G. It’s only the second time I’ve entered an Amazon contest.

After entering, I was offered the opportunity to update my Facebook status. It was optional. Even though spreading the word about the contest feels counterintuitive, because it encourages more people to enter, I decided that what Amazon is doing is nice and I wanted to help spread the Amazon love.

So, I posted the following:

I am so on the fence for buying a Kindle. Winning one would solve my problem. 🙂

Below that was the full link and description to the contest.

Within minutes, a buddy of mine IM’ed me to tell me that he believed my Facebook account was hacked! I got a huge laugh out of that, because I correctly pointed out that his had been hacked a couple of months back!

I assured him that it was me that essentially created an ad for Amazon.

Today, I got a package from Amazon. Here’s a photo, including a gift card (click for a larger version):

[Photo: Kindle gift wrapping and card]

A different friend of mine (you know who you are!) saw my update and decided to create a contest of one and he immediately declared me the winner! Wow, unbelievable. I don’t know what to say, except:

Thanks, you are beyond awesome!

I have no doubt that I am going to love this. It turns out that my top five favorite gadgets of all time were things I never thought I wanted, let alone needed. Someone else knew better and bought it for me. Within a day, each became so indispensable to me that I couldn’t imagine how I lived without it the day before (e.g., my first email-only Blackberry, my first GPS, my first Treo, my current Droid, my Garmin Forerunner).

Let’s safely add the Kindle to that list.

Thanks PSC! 🙂

Proposal to Allow Rooting Android

Rooting is the process of gaining low-level control of your cell phone’s operating system (I’m only going to discuss Android in this post). It voids your warranty (presumably both from your carrier and the phone manufacturer). That’s fine, perhaps it even should.

Still, carriers and phone companies try to actively stop you from rooting. There are a few obviously legitimate reasons/concerns on their part:

  1. Supporting a rooted phone can cost the companies in a number of ways
  2. Rooted phones can be used to access services that the carriers want to charge a premium for (e.g., Tethering)
  3. Problems with a rooted phone are often misunderstood by the consumer, reflecting incorrectly (and poorly) on the carrier or phone manufacturer

There are also a few shakier arguments against rooting. Rather than give them credence, or get distracted in arguing their merits, let’s skip them.

There are many legitimate reasons to root, covered in many articles. Let’s leave that topic for a more technical post; this isn’t that.

A small percentage of consumers actually root, but given the explosive growth of Android phones worldwide, the absolute number of rooters is large. They are also the most vocal users, trying to convince others to do it too.

When a rooted phone goes bad, the consumer tries to unroot and return to stock (removing all traces of their previous rooting and customization), in the hopes that the carrier will then fix the phone or replace it under warranty. Of course, the carrier might not be able to detect that the phone was previously rooted and perhaps even ruined by virtue of the rooting (overclocking the CPU for example). In that case, the carrier may indeed incur a cost that they shouldn’t bear.

My proposal:

Make it brain-dead easy to root. When rooting (using an app provided by the carrier/manufacturer), ensure that the consumer is warned of the dangers and that the warranty is being voided. Also ensure that they know their account is being flagged to show that they have opted out of the warranty.

This removes any extra support costs. In fact, returning the phone to stock will make no difference, as the user has knowingly and willingly agreed that this specific phone has forever given up its right to be serviced, for any reason whatsoever.

I would go further and suggest that the carriers/manufacturers will make more money in this scenario, because if the user just drops their phone and breaks it, it is not covered, period. That will cause some percentage of rooters to have to purchase a new phone should anything go wrong with their phone, even if it isn’t related to the rooting.

Of course, that might cause many people not to root. That shouldn’t bother the carriers, since that’s the position they would like to see today.

Next, concern #2 above. While there might be a number of concerns, mostly, it boils down to Tethering, the ability to use your phone as a WiFi or USB modem for your laptop. The concern is that you can utilize significantly more data/bandwidth in this manner (e.g., streaming movies to your laptop) than you would ever practically use on your phone itself (assuming you had an unlimited data plan on your phone to begin with).

The carriers are concerned about monetary loss (they charge a hefty price for devices that tether, or for tethering plans on the phones) as well as network quality if too many people clog the airwaves (with or without paying them) and slow down the network for all users.

My proposed solution for #2 is to charge no premium for tethering (with or without rooting!), but simply meter the data (no more unlimited plans, or price unlimited plans knowing that people will tether). AT&T is doing this already, but they charge $20/month for the right to tether, which is outrageous, since they charge for every byte of data transferred. Who cares how/why I used the data, you charge me for it.

That’s it. Let’s summarize:

  1. Make the voided warranty an overt opt-in so that support costs go to zero for any rooted phone.
  2. Charge for all data coming through the phone (tiers are fine, including unlimited), so that the carriers benefit if people tether.

NginX and WordPress OpenID Plugin

I normally have super powers when it comes to persevering through annoying technical problems. On rare occasions, Kryptonite appears out of nowhere and strips me of those powers.

This blog is powered by WordPress.org software. I use a variety of plugins to make my life a bit easier. One of my favorites is the OpenID WordPress Plugin. I’ve been using it since it first came out and I’m very happy with it.

Because I was using it from the beginning, I lived through a few rough upgrades (not complaining, just explaining). The problems always got sorted out quickly.

Back in June 2008 (yes, a long time ago), I switched from Apache to NginX and have never looked back. The trickiest part of switching a WordPress site (especially WordPress MU, even more so with BuddyPress installed!) is converting the Rewrite Rules that are typically stored in a .htaccess file into nginx syntax.
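For readers who have never looked inside that .htaccess file, the stock WordPress rules boil down to one decision: if the request names a real file or directory, serve it directly; otherwise hand it to index.php. Here is a rough Python model of that decision. It is not nginx syntax and not my actual rules, and the paths in the usage note are made up for illustration.

```python
def route(request_path, existing_files, existing_dirs):
    """Model the usual WordPress fallback: real file, real directory, else index.php.

    existing_files and existing_dirs are sets of absolute URL paths that
    stand in for what is actually on disk (hypothetical data).
    """
    # Ignore the query string and any trailing slash when matching paths.
    path = request_path.split("?", 1)[0]
    normalized = path.rstrip("/") or "/"

    if normalized in existing_files:
        return ("static", normalized)
    if normalized in existing_dirs:
        return ("directory", normalized)
    # Everything else, including plugin URLs, must reach PHP.
    return ("php", "/index.php")
```

With `existing_files = {"/wp-content/logo.png"}`, a request for `/wp-content/logo.png` is served as a static file, while a plugin-style URL such as `/some/plugin/callback?x=1` falls through to `/index.php`. A rule set that skips that last fallback for some URL pattern produces a plain 404 without PHP ever being involved.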

While I got my initial attempt to work, I’ve grown way more comfortable and familiar with nginx over the past year, and I’ve tweaked my rewrite rules quite a bit.

Along the way, there were numerous updates of the OpenID Plugin as well. Most times things just kept working. Once, OpenID stopped working, and I tried to track it down. I gave up pretty quickly because I had seen that behavior before with an individual update of the plugin. Sure enough, the next update got me working again.

Then at some point, it stopped again. At this point I had conditioned myself to ignore it, and I went back to logging in with a password (something I really prefer not to do). This lasted for months. At some point, the plugin got updated at least twice, and things still didn’t work for me. Now I was getting annoyed.

I stopped trying to use OpenID. Two days ago I tried again; I can’t explain why. I got a strange Google Toolbar redirect error message. I did a search, and someone was complaining about something that looked similar, but had nothing to do with WordPress. A Google employee responded to him that he should temporarily disable the Toolbar (in Firefox) and see if the problem went away.

I decided to try that and instead of a strange Google error, I simply got a 404. What? A simple 404 couldn’t be the plugin’s fault. Time to dig in (finally).

I turned on nginx debugging and tried to log in. I was shocked when I saw that the error had nothing to do with the plugin. Instead, my nginx rules weren’t even calling in to PHP to let the plugin do its work.

Without a doubt, I had an error in one of my rewrite rules in nginx. It’s possible that at some point the plugin changed the way it redirects, so that a new URL was coming back to me that I wasn’t catching correctly. It’s equally possible that one of my own changes in nginx broke things, without the plugin doing anything different.

I added a new rule and was able to log in (for the first time in over six months!) via OpenID.

In this case, I let my own normal persistence fade, because I incorrectly assumed that the problem was contained in the plugin. Shame on me!

Self-Service Pain

It’s extremely easy to cause yourself a lot of pain while performing maintenance on your computer. Here are two sure-fire steps:

  1. Do something really stupid, while being aware that you are doing so!
  2. Compound the error by being macho, and wanting to fix it manually!

Voila! You will have no one to blame but yourself for your pain, and you can be proud enough of the time wasted to waste a little more by publicly flogging yourself in a blog (like, say, this one…).

Here’s what I did to myself this morning…

Yesterday was Patch Tuesday at Microsoft. When I booted up this morning, Windows informed me that there were four critical updates available, and two optional ones. I looked over the list and was happy to accept the four critical updates.

Of the two optional ones, one was for my LAN device (which I’ve successfully updated in the past, so I was happy to include that). The other was for an external tablet device that I don’t own, and will likely never own. So, why did I check to include it? Only because I thought it might be more efficient to have the updated driver (dormant) on my system, than to hide it from future updates, but make Windows notice each time that it was out-of-date.

Oh oh, first big dumb mistake. When I restarted the computer, installing that driver triggered Vista to think that I now had a Tablet device, and it automatically turned on Tablet PC mode. In itself, that wouldn’t be so bad, except that it disabled my touchpad, which is my only mouse. 🙁

Other than resizing and moving windows around, Windows (both XP and Vista) is surprisingly easy to navigate with only a keyboard, even without Accessibility settings turned on.

I could have undone the Windows Update quickly and painlessly, through the keyboard. But no, I’m macho and need to figure out how to fix this on my own. I launched a browser, searched Google, and found out how to turn off Tablet PC mode. That worked (Tablet PC mode was off) but my touchpad was still dead, even after a reboot.

I found an updated Synaptics Touchpad driver for Windows Vista x64, downloaded and installed it, but it failed to load properly after the reboot.

After dorking around way too much (nearly 90 minutes!), completely mouseless, I finally broke down and did what I should have done in the first place. I pulled up the System Restore facility, selected a restore point from this morning and let it do its magic.

After rebooting, everything was exactly the way it was before I updated the system. I then reapplied the four critical updates plus the networking one. I hid the Tablet (IdeaCOMM) update forever, and all is back to normal and wonderful.

To summarize:

Don’t install optional updates for hardware you don’t own!

If you make any mistake after an update, roll it back immediately!

Lessons to live by. 🙂

VirtualBox Multiboot USB

Yesterday, I wrote about paying for free software. At the very end of that post, I highlighted a program called Macrium Reflect. That program can automatically create a Linux-based Rescue CD (in order to restore a previously saved image to a damaged or new hard drive).

On their site, they have a good tutorial for how to put that rescue ISO on a USB drive. As long as your BIOS supports booting from the USB drive (most modern ones do), it’s a tad more convenient to carry around a flash (thumb) drive than a CD.

In that tutorial, they use a program called UNetbootin (Universal Netboot Installer). What’s cool about this program (free and open source as well) is it can take practically any ISO and create a bootable USB drive out of it. It has many other cool features (e.g., it can automatically download any number of Linux distros and create a bootable USB or CD without you even knowing the location of the Linux project website!).

UNetbootin uses SysLinux under the covers to create and manage the bootable USB drive. Within SysLinux, there is a single file, syslinux.cfg, which controls the menu of selections that can be booted (different kernels, options to pass to a kernel, etc.).

Now switching gears for a moment, then back to the above to tie it all together…

There are a number of high-quality Virtual Machine programs/products available for all of the major operating systems. The three biggies on Windows are VMware, Virtual PC (directly from Microsoft) and VirtualBox (from Sun). All three are very capable, and all three have at least one version that is completely free.

On my old XP laptop, I used to use VMware Player. It’s free and quite good. I have read that recent versions of Virtual PC are good as well (also free), but I’ve never bothered to install it. While I understand that you can run Linux in Virtual PC, I believe it’s not supported, and I don’t really have a need to run Windows under Windows, so I passed on checking it out.

A few months ago, I stumbled onto VirtualBox. It was originally developed by a company called InnoTek, which was purchased by Sun. There is a free version, which is fully open sourced as well, and there is a proprietary version which adds a few bells and whistles (including some cool USB support), which is available in binary form for free as well (for non-commercial use).

Since I have no interest or need (or capability!) to mess with virtualization source code, I am using the full binary version. On my new Vista Ultimate x64 laptop, I have only VirtualBox installed. I didn’t even download VMware (no knock on their product whatsoever!).

Here is what I like about it. Very fast to load. Full 64 bit support (both their app and guest operating systems!). Virtual PC now has 64 bit support for their own application, but you can only run 32 bit guests (if I understand correctly). Most importantly, I like the fact that it’s more complete (or at least easier to use) than the free version of VMware. I’m way too casual a user (I can go months without launching a VM!) to be willing to pay for VMware Workstation.

So, here are my two normal use cases with VirtualBox, and why I rarely need to run it:

  • Check whether a new Rescue ISO works
  • Do something fancy with ssh and X-Windows

The first one is simple. For emergency purposes, I carry around a few Linux Rescue disks (used to be only CDs, but stay tuned). My current favorite is SystemRescueCD (currently in version 1.1.4). When a new version becomes available, I download, and boot it immediately in VirtualBox, make sure it seems to work correctly, and only then burn it to a CD and toss the old CD.

The second is rarer, but more complicated. On rare occasion, a friend of mine who is running Linux (that I set up for her) on a very old laptop (that I gave to her) has a problem that I can’t talk her through over the phone. When that happens, I fire up CDLinux under VirtualBox, do some port forwarding on my router, do some ssh magic, and take control of her machine (by allowing her to ssh into my box first, so I don’t have to make changes on her firewall!). I can then even run GUI apps from her machine, redirecting the X session back to me!
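For the curious, here is a minimal sketch of what that “ssh magic” can look like, written as a small Python helper that only builds the two command lines. The `-R`, `-X` and `-p` options are standard OpenSSH flags, but the host name, user names and port 2222 are placeholders, and the sketch assumes an ssh server is running on her laptop. It is one common way to do this, not a record of the exact commands I type.

```python
import shlex


def reverse_tunnel_commands(my_host, my_user, her_user, tunnel_port=2222):
    """Build the two ssh commands for a reverse tunnel (all names hypothetical)."""
    # Step 1, typed on HER laptop: connect out to my box and ask it to forward
    # tunnel_port on my side back to the ssh server (port 22) on her laptop.
    hers = ["ssh", "-R", f"{tunnel_port}:localhost:22", f"{my_user}@{my_host}"]

    # Step 2, typed on MY box: ssh through the tunnel to her laptop, with X11
    # forwarding enabled so GUI apps started there display on my screen.
    mine = ["ssh", "-X", "-p", str(tunnel_port), f"{her_user}@localhost"]
    return hers, mine


if __name__ == "__main__":
    hers, mine = reverse_tunnel_commands("my.example.net", "me", "friend")
    print(shlex.join(hers))
    print(shlex.join(mine))
```

Because her laptop makes the outbound connection, only my router needs a port forwarded; her firewall stays untouched.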

Anyway, the point is that VirtualBox works really well, has tons of knobs (we’ll get to one of them in a minute), and doesn’t seem to slow down my pretty darn fast system.

Back to our main story…

When I created the Macrium Reflect Rescue CD (burned to a real CD), I also followed the tutorial to create a bootable USB disk. When I looked at the disk, I saw the SysLinux stuff, and noticed that all of the Macrium files were in a folder.

I then experimented by using UNetbootin to create another bootable USB disk with SystemRescueCD on it. I saw that it used a different directory to store the various Linux kernels that it can boot. I was able to copy that directory to the other USB disk and copy/paste the lines from the syslinux.cfg file on the SystemRescueCD drive into the other syslinux.cfg.

I did the same thing with CDLinux (version 0.9.0 Community Edition). It used the same name for its subdirectory as Macrium did. I renamed the subdirectory before copying it over, and used the new name in the merged syslinux.cfg file. That worked, because once SysLinux gives control to the kernel in the renamed directory, everything else is relative to that new root directory!
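I did all of this by hand with copy/paste, but the same edit can be sketched in a few lines of Python. This assumes the usual syslinux.cfg layout of `label`, `kernel` and `append` lines; the directory names and label prefix are invented for the example, since the real ones depend on what UNetbootin generated.

```python
def rebase_entries(cfg_text, old_dir, new_dir, label_prefix=""):
    """Point boot entries at a renamed directory and optionally prefix labels.

    Only 'label', 'kernel', 'append' and 'initrd' lines are touched;
    everything else is passed through unchanged.
    """
    out = []
    for line in cfg_text.splitlines():
        parts = line.strip().split(None, 1)
        keyword = parts[0].lower() if parts else ""
        if keyword == "label" and label_prefix and len(parts) == 2:
            # Prefix the label so it cannot collide with one already present.
            line = f"label {label_prefix}{parts[1]}"
        elif keyword in ("kernel", "append", "initrd"):
            # Rewrite paths such as /boot/vmlinuz to /cdlinux/vmlinuz.
            line = line.replace(f"/{old_dir}/", f"/{new_dir}/")
        out.append(line)
    return "\n".join(out)


def merge_configs(base_cfg, extra_cfg):
    """Append the rebased entries to the end of the main syslinux.cfg text."""
    return base_cfg.rstrip("\n") + "\n\n" + extra_cfg.strip("\n") + "\n"
```

For example, `rebase_entries(text, "boot", "cdlinux", "cdl-")` turns `kernel /boot/vmlinuz` into `kernel /cdlinux/vmlinuz` and `label live` into `label cdl-live`, after which `merge_configs` glues the result onto the existing file.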

I then rebooted my machine to test the new USB disk. It booted perfectly, and I had 28 choices of kernels to boot from! SystemRescueCD offers most of them, but I had Macrium Reflect and a few flavors of CDLinux to choose from as well. I was able to boot both 64 bit and 32 bit versions of SystemRescueCD successfully. Awesome.

Now the big test. I wanted to see whether I could boot that USB disk from VirtualBox. That would allow my normal use case of testing new releases without having to burn and reboot. Unfortunately, the GUI for VirtualBox does not permit an actual hard disk (USB or otherwise) to be directly attached to the VM (at least not for direct booting).

A quick scan of their excellent manual gave me the answer. There is a command line administration tool called VBoxManage.exe that can be used to create a tiny virtual disk (a VMDK file) that essentially points to any real disk or partition. I used that to create this virtual pointer to my USB drive. It worked perfectly.
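As best I recall, the VirtualBox releases of that era exposed this as `VBoxManage internalcommands createrawvmdk` with `-filename` and `-rawdisk` arguments; newer releases spell it differently, so check the manual for your version. The sketch below only builds the command line in Python. The drive number and file path are placeholders, and pointing this at the wrong physical disk can be destructive, so double-check which drive is the USB stick first.

```python
def raw_vmdk_command(vmdk_path, raw_disk, vboxmanage="VBoxManage"):
    """Build the command that creates a small VMDK pointing at a physical disk.

    vmdk_path: where to write the pointer file (hypothetical path).
    raw_disk:  the host's name for the physical disk, for example
               r"\\.\PhysicalDrive1" on Windows (the number is an assumption).
    """
    return [
        vboxmanage, "internalcommands", "createrawvmdk",
        "-filename", vmdk_path,
        "-rawdisk", raw_disk,
    ]


if __name__ == "__main__":
    cmd = raw_vmdk_command(r"C:\vm\usb.vmdk", r"\\.\PhysicalDrive1")
    print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` from a prompt with enough privileges to open the raw disk.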

I then attached that tiny VMDK disk to my virtual machine and fired it up. Voila, I got the same 28 choices to boot from. I couldn’t get the 64 bit versions to work (they boot, but they claim to be missing modules and won’t start X-Windows), but everything else works flawlessly under VirtualBox.

So, now I have a multi-boot USB drive, that I can keep adding stuff to, that I can test under VirtualBox to be sure it will work correctly should I ever have an emergency. It’s a 1GB USB drive, that has all of these various operating systems and tools on it, and I still have 500MB free. 🙂

Cable Woes Resolved

On December 13th I wrote about my long-standing cable Internet woes. At the end of that note, I mentioned that I would update everyone if my solution worked.

Today is the first day that we are back in the apartment, and I can now report that we’ve had zero drops in the past six hours. That seems to point the finger squarely at the old Netgear FVS318 ProSafe VPN Firewall. While it was working, it was also flaking out unbelievably.

Here’s a big bonus: the WAN port on the FVS router was a 10Mbps port, even though the eight LAN ports were 10/100. The Netgear WNR854T N router that I replaced it with has a 100Mbps WAN port, and GigE LAN ports (purchased re-certified for $40!). So, my download speed was topping out at around 8Mbps on the old router (when it wasn’t dropping packets), but the current one is reliably downloading at 18Mbps! Awesome!

The upload speed is still a very pokey 490Kbps, but you can’t have everything, at least not all the time. 😉

So, I’m now back to 60/40 as to whether I prefer the house to the apartment. Purely due to the Internet connection, I was closer to 90/10 for the past six months!

How little it takes to make me happy. 🙂

Vista speech recognition

I’ve been fascinated by speech recognition for a very long time. I used a program called Simon on a NeXT computer back in 1992. I have toyed with every version of Dragon Naturally Speaking (now owned by Nuance) since v2. I keep upgrading my copy (through v9, I haven’t done v10 yet), even though I never actually use it for anything real beyond checking out how much better each version has gotten.

The primary reason I don’t dictate more is that Lois and I work two feet apart 99% of the time. That makes it awkward to be speaking to the computer, from a number of perspectives. Still, I remain intrigued by the concept.

Vista has built-in speech recognition. My new laptop also has Realtek HD Audio built in, including a very high quality microphone next to the webcam, at the top of the monitor. As an example, I tested it with Skype the other day, with no headset, and there was no echo on either side of the conversation, and the other person said I sounded fine.

That made me think about the extra convenience of being able to dictate without scrounging around for a headset, or wearing it for extended periods. I decided to play with the speech recognition just to see.

Basically, it works pretty well. Far from perfect. In fact, I’m not sure that Dragon 10 wouldn’t be better. That said, it’s built in, and feels much lighter weight (starts up instantly, shuts down instantly, doesn’t shift application windows around to put its toolbar up, etc.).

It took me a while to get it to work with my USB headset. Basically, you don’t tell the speech recognition program which device to use. In order to use it with the USB headset, you have to set the USB headset to be the default microphone on the system, and then the speech recognition program automatically picks it up.

You might be asking why I wanted to use the USB headset? The simple reason is that my headset has a microphone mute button on the cord. That’s very cool for speech recognition. If the phone rings, or Lois wants to talk, I can just hit the mute button, and speech recognition is off, even though the program is still listening. It simply can’t hear anything. As a bonus, in theory, the recognition should be better, but for now, I don’t care too much about that.

Here’s one annoyance. It was my intention to dictate this entire post, including all of the actual production of it (clicking the save button, publish, etc.). Unfortunately, I gave up after five minutes. I tend to write my posts in Firefox, right in the admin interface of WordPress. Even with Allow Dictation Everywhere set on in speech recognition, it doesn’t think that Firefox is a normal input program (though it recognizes that I’m in a text area).

So, every phrase gets put up in a dialog box for me to confirm. It got them all correct, but I couldn’t just speak the post. I could have dictated into Word, WordPad, NotePad, WindowsLiveWriter, etc., and in the future I might just do that, but for now, I’m typing this post…

Using speech recognition in Command Mode works reasonably well. I can switch applications easily, select menu items, switch folders in email (Thunderbird, which obviously isn’t written by Microsoft), etc. Yesterday, while eating lunch with both hands, I had an IM conversation with someone by speaking my responses and saying Enter after each one. The concept was very cool, even though I had to correct a bunch of words (I wasn’t using the USB headset at the time).

Anyway, I recommend playing around with speech recognition in Vista if you work in a room alone, or have a spouse (or co-worker) who would be amused by your ranting at the computer out loud. 😉

Cable Internet Woes

A long time ago (I can’t remember when), we had a stable, reliable connection in the apartment (provided by Time Warner Cable). It was never super fast on the download (when it was stable, it was roughly 3Mbps downstream). It was always pokey on the upstream (used to be roughly 360Kbps). Now, it’s typically 7-8Mbps down, and 490Kbps up.

Unfortunately, while both up and down have gotten faster, the experience has deteriorated. I wish I could point the finger directly at Time Warner, but I can’t. Not because it’s not their fault, but because I have no idea, and that’s been hugely frustrating…

I may be slow to fix things at times, but I’m typically pretty good at diagnosing. I’m completely lost at the moment.

It’s been bad for such a long time, that I’ve come to accept it on some levels, and that’s just silly on my part.

Here are the symptoms: regular disconnects from the Internet. Those drops can last anywhere from a few seconds to a minute. The most sensitive applications (the early warning systems) are IM clients (Digsby for me, Trillian for Lois, but Pidgin used to behave identically before I switched to Digsby). Email clients hang if they are fetching or sending at the moment of the drop. When the drops last a bit longer, ssh connections are lost, but not on the short drops. The Poker client disconnects every 20-30 minutes.
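One way to put numbers on symptoms like these, rather than relying on IM clients as the alarm, is a tiny probe script that tries a TCP connection every second and records when the connection goes away and comes back. This is only a sketch of the idea, not something I ran at the time. The probe target (8.8.8.8 on port 53) and the timings are assumptions; any host you trust to be reachable would do.

```python
import socket
import time


def tcp_probe(host="8.8.8.8", port=53, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_drops(samples):
    """Turn (timestamp, ok) samples into a list of (start, end) outage intervals."""
    drops = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # first failed probe of an outage
        elif ok and start is not None:
            drops.append((start, ts))     # first good probe ends the outage
            start = None
    if start is not None:                 # still down when sampling stopped
        drops.append((start, samples[-1][0]))
    return drops


def monitor(probe=tcp_probe, interval=1.0, duration=60.0):
    """Probe repeatedly for roughly `duration` seconds and return the samples."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe()))
        time.sleep(interval)
    return samples
```

Running `find_drops(monitor(duration=3600))` once on a wired machine and once over WiFi gives two lists of outages to compare, which helps narrow down which box in the chain is misbehaving.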

Here’s my setup. I had (more on that in a moment) a seven year old Toshiba cable modem (provided by Time Warner). I have a Netgear FVS318 ProSafe VPN Firewall that connects to the cable modem. Years ago, I used it in VPN mode to connect to the office, but now it’s just a plain old router. Connected to that are three devices: an Asterisk server, VoIP ATA (Sipura) and a WiFi router.

After working great for years, when the trouble started, I suspected the wireless router (at the time, it was a Linksys WRT54G). I swapped it for a US Robotics spare that I had sitting around. The US Robotics device exhibited different problems. It went for longer periods without drops, but then would drop for longer periods (rebooting it seemed to always work). Since it was a bit older, and only a B router, I bought a Netgear WNR854T (an N router). I bought it re-certified.

It behaves just like the Linksys did, more frequent drops, but of shorter duration, that always auto-correct. I know, it’s re-certified, but still, three WiFi routers in a row? Yesterday, while at the house, enjoying my fantastic Verizon FiOS connection (30Mbps down, 5Mbps up), it occurred to me that the only thing all three WiFi routers have in common is the port they share on the FVS318 router.

I was actually excited to have a theory to test. When we got to the apartment mid-morning, I fired up the laptop, and swapped both the cable and the port that connected the WiFi router to the FVS. No Internet connection at all! What? We were just here on Wednesday, and it was working (albeit with the stated problems).

Rebooting the cable modem indicated that the cable modem just decided to die. I marched to the Time Warner service center. The one thing Time Warner does extremely well (and, unfortunately, I’ve taken advantage of this good service way too many times!), is staff a service center and swap devices quickly, with no questions asked.

I got there at about noon on a Saturday, and there were roughly 15 people working the service desks. It took less than five minutes to have my number called, and less than two minutes to swap the cable modem. 15 minutes later, I was back in the apartment (at least I got a bit of exercise). The new cable modem worked right away (still using the new cable and new port from the WiFi to the FVS).

A quick speed test gave me the sense that all of my problems were done. I got 8Mbps down and 492Kbps up, but with no jerkiness in the numbers. A smooth connection, it seemed. All joy died about 20 minutes later, when I had my first IM drop out.

So, in theory, it could still be the cable modem (or the cable company, at the head end). But, I left out a major detail which makes me believe that this is not the case. All of our phone calls are VoIP, so they use the cable modem as well. Whenever we experience a drop on the IM client, if either of us is on the phone, there is no drop on the call. At least, it’s not discernible (which doesn’t mean that a packet isn’t dropped along the way).

I am at my wits’ end. I have two things that I can try. One is to bypass the WiFi with a PowerLine connection (the cable modem is not in the room where we work). That will work, but if there are any drops, it won’t be clear that it isn’t the PowerLine adapter. The second thing is more painful. I just did the second one this minute. I reprogrammed the WiFi router to completely replace the FVS318. Since we’re leaving now, I won’t know until the next time we’re back whether this will solve the problem. If it does, then it means that the FVS was flaking out in general, somehow.

While it’s possible that three WiFi routers in a row are all bad, in a similar (if not exact) way, somehow, I doubt it. It is ironic that the cable modem just up and died, but the new one is exhibiting the exact same problem, so I don’t think it’s the problem either. For completeness’ sake, I should report that we are both using brand new laptops, with built-in N WiFi cards. Before that, we were each using Netgear N Cards. Before that, we used different model B and G USB-based WiFi adapters (with the old Linksys and US Robotics), all with the same drops, so it’s most definitely not the client devices!

If any of you have any suggestions out there for what else I can try, I’m all ears!