The power supply in our primary nameserver expired this morning in honor of James Brown. The machine has been replaced and everything is operating normally.
Abovenet is currently performing router maintenance in San Jose, and the work has caused a routing loop. They are actively working to resolve the problem and we should be back online shortly.
Another momentary power outage caused all equipment to reboot. All services returned to normal.
Everything is back up.
A power outage affecting all of downtown San Jose occurred a short while ago, causing all of our equipment to reboot. The facility has not yet explained why the battery backup and generator power failed to prevent this. A few servers are still offline and we are working to restore them.
After much pain and suffering, the VoIP server is back up. Although the drive appears to be working properly, it's a Maxtor, so we plan to replace it with a new drive tomorrow night or the night after.
We are currently experiencing a power outage on one phase in Fort Collins. Everything is up at this time except for our VoIP server, which has experienced some disk problems. We're currently working to resolve this issue.
The ethernet card on one of our email servers locked up a short while ago and required an emergency reboot of the machine. There was an interruption in mail service during this time, but no mail was lost.
Yipes has found and fixed the problem.
Mesa Networks made a change to their network connection on 9/9. Since then, Fort Collins Yipes customers appear to have had trouble reaching Mesa addresses. We have routed around the problem for the networks we are aware of, and Yipes continues to investigate.
From Abovenet:
Abovenet has experienced a network event.
Start Date & Time: 3:40am PDT to 4:40am PDT
Event Description: Switch2.colo7.sjc2 spontaneously rebooted at approximately 3:40am PDT. The switch did come back online; however, it was not passing traffic over the 2 trunks to the routers. We power cycled the switch and it is now fully back online.
We are still investigating the cause of the initial reboot.
We apologize for any inconvenience this has caused.
At 5:45AM MT service was restored. Abovenet reports a hardware failure within their network.
At approx 4:47AM MT we lost connectivity to the San Jose facility. We are working with engineers locally to determine the cause.
It appears that Level 3 and WCG are having trouble nationwide. No reason for the outage or time to repair has been reported.
We are seeing some major packet loss in several of the backbone providers this afternoon. This is not a local issue, but is affecting the Internet as a whole. No news as to the cause at this time.
The Internap connection has been restored and traffic flow is back to normal.
At approximately 12:20 our connection to Internap went down. Qwest is reporting a hardware failure in the ATM circuit. They have dispatched a technician, but do not have an ETR.
At this time all traffic is flowing through our Yipes connection. There was a short period of instability at the time of the failure during which BGP rerouted the traffic.
An emergency reboot of one of our email servers was required resulting in some email services being unavailable for a short period of time.
The following from Yipes:
Service Alert: Please be aware that we are experiencing network issues in Longmont and Fort Collins that may affect your connection or ability to route to Internet sites.
The problem seems to be stabilizing; however, if it continues we will shut down the Yipes upstream until the problem is resolved.
Shortly after 3PM MT we experienced a problem with one of our mail servers that required an emergency reboot. As of this time all services have been restored. Any email sent during the outage would have been spooled to a backup mail server and delivered normally.
We're seeing routing issues within Level 3's network today. This is causing random, short-term mini-outages (less than 1 minute) depending on which destination you are trying to reach.
An emergency reboot of one of our email servers was required this morning resulting in some email services being unavailable for a short period of time.
Earlier today a UPS failed in the Fort Collins facility, causing our main switch and routers to reboot. Several other servers momentarily lost power and rebooted. A few did not come back up cleanly and needed further assistance. At this time all services have been restored.
From AboveNet:
At approximately 05:45 (EST) we began experiencing routing issues with Level 3. We are working on this issue and it should be resolved shortly. We apologize for any inconvenience.
RFO:
At approximately 10:38pm (GMT) routes from AS3356 were leaked into the AboveNet network by a customer who had just installed in London. The route leak caused many public and private peers to drop BGP sessions with us due to exceeding the maximum prefix limit set by the peer.
The leak issue was resolved at 11:10pm (GMT). Most peers were either not affected or back online by 11:30pm (GMT). Some peers did not have 7x24 operations, which caused it to take longer to reset those links.
Root Cause:
Misconfigured BGP session with a customer and routes leaked by the customer.
Preventive measures:
Although fail-safe measures are already in place to prevent this type of event, they do not go far enough. We will be changing the standard BGP configuration applied to all customers. This will add an additional layer of protection. This change will be implemented immediately for new customers. Existing BGP customers will be contacted when it comes time for us to make the change to their session.
This morning we experienced issues with the machine that provides NAT and DHCP services within Drake Park. We believe this is related to a hard disk drive that is beginning to fail. Late this evening we will be taking that machine offline to replace the defective drive.
Ticket: HD0000000062754
Event: Service Alert
Start: 12:41pm 2/2/06 MST
Stop: 12:46pm 2/2/06 MST
Service Location: Longmont and Fort Collins
Service Alert: Please be aware that we are experiencing network issues in Longmont and Fort Collins that may affect your connection or ability to route to Internet sites.
Symptom(s): The current impact is packet loss and intermittent service disruptions.
Our Network Engineers are currently investigating the cause of this issue. We will continue to keep you updated with the status of our investigation.
We are currently experiencing an outage in San Jose. More information to follow...
As of 1:10 PT a router reboot restored service. We are currently looking into the cause.
Abovenet reports that there was an event which disrupted traffic in San Jose between 7:15-7:30 MT this evening. No report of the cause as of this time.