Last week I went on-site to a client that was completely down: phones, data, everything. The previous day one of our field techs had spent hours trying to get them back online and ran out of time. Not the best scenario for the client and not the best look for my company.
The client's network isn't large or particularly complicated: a couple of switches, a few WAPs, phones, a firewall, and a cable modem. They rent space in an office building and share a communal demarc.
I arrived first thing and was shown to the network closet. I asked our point of contact one question: "Has anything changed?" She told me that there was some construction going on in the building and she thought the crew had been working in the closet at some point.
I set up, gave everything a quick visual inspection, and took some pictures with my phone. Always have a fall-back plan and document before changing anything. Then I fired up a command prompt and started a continuous ping to 8.8.8.8.
There are several options at this point, but I decided to simplify. I unplugged the WAN cable and plugged it directly into my laptop. Nothing. Hmmm. I power-cycled the modem and started getting replies. Can't be that easy. I plugged the WAN cable back into the switch and plugged my laptop into the network. Almost immediately my computer became really unresponsive. I rebooted and fired up Wireshark. Within seconds Wireshark was showing over 300,000 packets, and I wasn't getting any replies to the continuous pings.
Broadcast storm.
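For a rough sense of scale, here is a minimal Python sketch of that sanity check: turn a capture's packet count and duration into a rate and compare it to a threshold. The function names, the 5-second window, and the 10,000 packets-per-second threshold are all assumptions for illustration, not values I measured or anything Wireshark reports on its own.

```python
def packets_per_second(packet_count: int, elapsed_seconds: float) -> float:
    """Average packet rate over a capture window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return packet_count / elapsed_seconds


def looks_like_storm(packet_count: int, elapsed_seconds: float,
                     threshold_pps: float = 10_000.0) -> bool:
    """Flag a capture whose average rate exceeds threshold_pps.

    threshold_pps is an assumed cutoff for a small office network;
    tune it against what the network looks like when it is healthy.
    """
    return packets_per_second(packet_count, elapsed_seconds) > threshold_pps


# Assumed example: 300,000 packets seen in a 5-second window.
storm_suspected = looks_like_storm(300_000, 5.0)
```

A rate check like this only says that something is flooding the segment. It does not say which port is responsible, which is why the next step is physical isolation.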
I unplugged everything from the switch except for the WAN cable and the cable running to my laptop, checked my ping to Google, and started plugging cables in one by one. Eventually, I found the culprit and left it dangling. I plugged everything else back in and the network was solid. Wireshark was averaging less than 100 packets a second. I traced the cable back to the patch panel and went to consult with our client. Their entire network was back up and running. My best guess is that one of the construction guys saw a dangling cable and plugged it back in to be helpful. The patch panel mapped back to an office where I found a five-port switch.
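The one-by-one isolation above can be sketched as a simple loop. This is only an illustration of the procedure, assuming a hypothetical `network_is_healthy` check that stands in for me watching the continuous ping after each cable goes in; the cable labels are made up.

```python
from typing import Callable, Iterable, List, Set


def isolate_bad_cables(
    cables: Iterable[str],
    network_is_healthy: Callable[[Set[str]], bool],
) -> List[str]:
    """Reconnect cables one at a time and return the ones that break the network.

    network_is_healthy is a hypothetical callback: it receives the set of
    currently connected cables and returns True while pings still get replies.
    """
    connected: Set[str] = set()
    bad: List[str] = []
    for cable in cables:
        connected.add(cable)
        if not network_is_healthy(connected):
            # This cable took the network down: pull it and leave it dangling.
            connected.remove(cable)
            bad.append(cable)
    return bad


# Simulated closet where the cable on "port 7" causes the storm.
culprits = isolate_bad_cables(
    ["port 1", "port 7", "port 9"],
    lambda connected: "port 7" not in connected,
)
# culprits == ["port 7"]
```

Adding cables one at a time takes a pass through every port, but it needs no assumptions about the cause and it also catches the case where more than one cable is a problem.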
When I got back to the office I discussed it with the technician who had been on-site the previous day. He was stuck on the idea that it had to be an IP address conflict and had spent a lot of time on the server looking at DHCP and scanning the network.
I have two main goals when troubleshooting: one, to fix the problem as quickly and thoroughly as possible, and two, to learn from everything. The conversation reminded me how easy it is to assume a cause and then chase evidence to support that theory. Always simplify where you can.