For the past few weeks I’ve been having trouble with my DSL connection at work. We have two connections, and the firewall is set up to “fail over” to the other if one should go down.
For some reason, the DSL link would fail. This would cause the firewall to generate an alert (to my inbox) and then fail over to the alternate connection. It would then realize that the DSL was back up and switch back again. Every time the link went down, it generated three emails and sent them to me.
It was going down every few seconds. You can imagine the state of my inbox. We went and looked at the DSL modem and could see that the DSL light would go out… then flicker on (while the connection was being made) and then go solid, when the connection was up. Then it would go out again. We weren’t sure if it was a loose cable, but when I picked it up, it dropped the connection again. I unplugged and re-plugged the telephone cord, the power cord and the ethernet cord and that seemed to stabilize it… for a few hours.
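To put numbers on the inbox problem, here is a minimal sketch of alert throttling: send one email, then swallow repeats for a fixed window. This is not a SonicWALL feature or anything from the firewall's configuration. The `AlertThrottle` class, the `simulate` helper, the 5-second flap interval and the 30-second window are all hypothetical, chosen only to illustrate the idea.

```python
class AlertThrottle:
    """Hypothetical helper: allow one alert, then suppress repeats for `window` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.last_sent = None   # timestamp of the last alert actually sent
        self.suppressed = 0     # how many alerts were swallowed

    def should_send(self, now):
        # Send if nothing has been sent yet, or the quiet window has elapsed.
        if self.last_sent is None or now - self.last_sent >= self.window:
            self.last_sent = now
            return True
        self.suppressed += 1
        return False


def simulate(event_times, window_seconds):
    """Replay link-down timestamps (in seconds) and report which ones would alert."""
    throttle = AlertThrottle(window_seconds)
    sent = []
    for t in event_times:
        if throttle.should_send(t):
            sent.append(t)
    return sent, throttle.suppressed


# Assumed scenario: the link drops every 5 seconds for one minute.
flaps = list(range(0, 60, 5))
sent, suppressed = simulate(flaps, window_seconds=30)
print(f"{len(flaps)} link drops, {len(sent)} alerts sent, {suppressed} suppressed")
```

With those assumed numbers, twelve drops in a minute produce two alerts instead of twelve (or thirty-six, at three emails per drop).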
We would get a rash of DSL disconnects for a few hours, and then it would stop and be stable again for a few days, and then start crapping out again. Part of this was what led to the frustration I had trying to join a new subnet to the domain from a new, remote office last month.
Finally I had enough and I called Telus TAC to open a trouble ticket. It had to be line noise or something like that causing this. I dialed the 866 number and was told the service was not available in my area. Whaaaaat? Ultimately I had to go “inside” and found out that I just needed to dial 310-TECH without the 866 or 877 or anything else.
Once I got a human on the line (less than 30 seconds after the automated attendant picked up, thanked me for calling and told me my call will be (not may be, but will be) recorded for QA and training purposes), he was able to bring up my circuit and noticed that there WAS a lot of noise on the line, i.e. a poor signal-to-noise ratio. He asked me if I could try changing the telephone cord that goes from the wall to the modem, just to rule out something simple before they sent a technician out. I took this opportunity to move the modem INTO my rack and plugged it into the wall with a new, shorter telephone cable. I also noticed that the D-Link modem was plugged into a 3Com wall wart power adapter. Strange…
Everything seemed to be fixed… until 4:49PM when it started again. When I came in the next morning I had 300+ emails waiting for me. Great. I pulled up my notes and called Telus TAC back with my case number and told them it was still happening, and please send out a tech to do a line test. Maybe some work was done in the risers downstairs and something was shorting to my line, or causing some interference. It would explain the poor SNR and frequent disconnections.
The first thing the tech did when he got here was disconnect my DSL modem from the line and test the line. It seemed to be working and the SNR was within acceptable limits. He also thought the modem was “running a little bit hot”. Then he also noticed the 3Com power adapter plugged into the D-Link modem. We unplugged it and examined it and that’s when we learned that the 3Com adapter was putting out 15 V and the DSL modem was marked with a 7 V input. THAT’S why it was running hot! It was being force-fed more than twice the voltage it required, like trying to drink out of a garden hose. Instead of water coming out its ears and nose, it dissipated the energy as heat.
The D-Link was also an old model and had been replaced by the Netopia Speedtouch in Telus’ network a few years ago. They stopped issuing the D-Link modems over four years ago. He went down to his truck and got a modem and replaced the D-Link. As soon as the link came up, it stayed up.
During this time, my firewall was going batshit crazy with the emails because the link was going up and down as we poked and prodded and replaced the hardware. To keep my sanity and my inbox from getting clogged (and by extension my mail server) I unchecked the “Preempt and failback to Primary WAN when possible” option so that once it failed, it stayed failed.
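As a rough illustration of what that checkbox is meant to do, here is a toy model of a two-link failover with the preempt option on and off. It is only a sketch of the intent described above (“once it failed, it stayed failed”). The `DualWan` class and its method names are made up for this example, and it does not claim to reproduce how the SonicWALL actually routes traffic internally.

```python
class DualWan:
    """Toy model of primary/secondary WAN failover (hypothetical, not SonicWALL code)."""

    def __init__(self, preempt=True):
        self.preempt = preempt      # mirrors the "preempt and failback" checkbox
        self.primary_up = True
        self.active = "primary"

    def primary_down(self):
        # Primary link lost: traffic moves to the secondary connection.
        self.primary_up = False
        self.active = "secondary"

    def primary_restored(self):
        # Primary link is back: only switch back if preempt is enabled.
        self.primary_up = True
        if self.preempt:
            self.active = "primary"


for preempt in (True, False):
    fw = DualWan(preempt=preempt)
    fw.primary_down()
    fw.primary_restored()
    print(f"preempt={preempt}: active link after recovery is {fw.active}")
```

With preempt enabled the model returns to the primary link as soon as it recovers. With it disabled, the model stays on the secondary link until someone intervenes.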
What I couldn’t figure out (and still can’t, it’s a bit of a mystery to me how the Sonicwall TZ series works) is that even though it was in “fail state” and all traffic was switched over to the other connection, as soon as the DSL was up, my outside IP address (as reported by www.whatismyip.com) was showing my Telus DSL fixed IP address. Mystery! Everything seemed to be working though.
Sometime over the weekend, each of the VPN tunnels connecting each of my branch offices back to head office timed out. When they tried to re-negotiate, the interface was still in “failed” mode so the packets were re-directed to the Telus DSL IP address, which didn’t match the intended target… so the firewall dropped all traffic from all my offices as “IP Spoof Intrusion Detected” or some such malarkey.
Did I mention that since it’s the first business day after the 15th of the month, it was TPS report day?? Ay caramba. I poked around in the SonicWALL settings for a good 20 minutes before clueing in to the fact that the primary connection said FAILED, and then looked into the load balancing setup. Once I re-checked “Preempt and failback to Primary WAN when possible”, ALL of the remote/branch office tunnels came up almost instantly and traffic started flowing as normal. Phew!
Take homes from this crisis: Telus TAC are pretty good. This is the second time I’ve had to call Telus TAC (Tech support) and both times I’ve had a resolution. First time was on the first call, this time on the second. Second take home: ESPECIALLY if the firewall doesn’t seem to make sense to you, and seems to be logically backwards from what it says to what it does, all the settings that were in place before you touched it were there for a reason. Don’t monkey with it. :)