TM on Internet disruption: We took all necessary action
By A. Asohan June 16, 2015
- MCMC reviewing incident report, ICANN expresses concern
- TM says will continue to review and improve procedures to prevent such incidents
TELEKOM Malaysia Bhd (TM) said it took all necessary action to quickly mitigate the routing issue that inadvertently caused disruptions to Internet access across the globe last week, and would take further steps to prevent a recurrence.
On June 12, the Malaysian Internet service provider (ISP) had accepted international traffic from US-based multinational ISP Level 3 Communications, which provides services to large carriers and other customers in over 60 countries.
This diverted more Internet traffic through TM’s network than it could handle, disrupting service for its own customers as well as Level 3 customers for at least two hours on June 13, as reported by Digital News Asia (DNA).
Internet traffic was apparently disrupted in France, Germany, Italy, the United Kingdom and the United States, amongst other countries.
Internet experts attributed the problem to TM’s improper use of the Border Gateway Protocol (BGP), and also a lack of due diligence on Level 3’s part.
BGP is a routing protocol that allows providers to route traffic through each other’s networks, based on the configurations set by each network’s administrators.
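To illustrate why a mistaken announcement can pull traffic towards one network, here is a minimal Python sketch of how a router might choose between competing announcements for the same block of addresses. It is a simplification and not taken from TM or Level 3: real BGP routers apply a longer list of tie-breakers, with operator-configured preferences considered first, and the `Announcement` class, neighbour names, prefix and AS numbers below are all hypothetical placeholders drawn from ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    prefix: str      # destination address block, e.g. "203.0.113.0/24"
    neighbor: str    # hypothetical name of the network that sent the route
    as_path: tuple   # AS numbers the traffic would cross, nearest first


def best_route(announcements, prefix):
    """Pick the announcement for `prefix` with the shortest AS path.

    Simplified assumption: only AS-path length is compared. Real routers
    evaluate operator-set preferences and other attributes as well.
    """
    candidates = [a for a in announcements if a.prefix == prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda a: len(a.as_path))


# Two neighbours offer a path to the same documentation prefix.
announcements = [
    Announcement("203.0.113.0/24", "transit-a", (64500, 64501, 64510)),
    Announcement("203.0.113.0/24", "customer-b", (64505, 64510)),
]

chosen = best_route(announcements, "203.0.113.0/24")
print(chosen.neighbor)  # customer-b, because its path is one hop shorter
```

In this toy model, whichever neighbour announces the more attractive path receives the traffic, whether or not it has the capacity to carry it.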
Level 3 has not responded to DNA’s queries as at press time, and TM has not confirmed whether it was indeed a BGP issue, but it did give further information on the June 12 incident.
“With regards to the recent incident, as we shared in our previous statements, a reconfiguration of our Internet gateway router caused congestion and packet loss for some of our customers and international routes as well,” a TM spokesperson told DNA.
“We would like to clarify that based on standard routing protocol for establishment of connection with our international network providers, we would need to announce our internal routes to them.
“However, during the June 12 incident, a number of traffic routes had been accidentally announced to an international provider.
“This, as a result, had caused a disproportionate amount of traffic being routed to our network, leading to deterioration in connection performance – hence impacting Internet traffic flow for some of our customers, including wholesale.
“As soon as we identified the root cause, our network team immediately took steps to optimise traffic flows, while we worked to restore connectivity to its expected level of performance. The services were restored at 6.30pm on the same day,” the spokesperson told DNA via email.
Meanwhile, industry regulator the Malaysian Communications and Multimedia Commission (MCMC), which told DNA it was looking into the issue, confirmed it has received an incident report from TM.
“MCMC … is currently reviewing the report. However, it must be noted that TM was able to rectify the issue and its Streamyx, UniFi and TM Direct services were fully restored following the two hours of service disruptions,” the MCMC told DNA.
BGP issues
According to the Computer History Museum, BGP was developed by Kirk Lougheed of Cisco and Yakov Rekhter of IBM after they met at an Internet Engineering Task Force (IETF) conference in 1989.
“BGP is still integral to an Internet that has grown from 80 thousand hosts in 1989 to over one billion hosts today,” the Museum notes. “BGP continues to play a critical role in allowing the Internet to move large amounts of data quickly and efficiently.”
Many observers have noted BGP’s weaknesses, and criticism is bound to grow now that two ISPs have shown how Internet traffic can be negatively impacted by human error.
The IETF had also not responded to DNA’s queries as at press time, but the Internet Corporation for Assigned Names and Numbers (ICANN) did express its concern. [UPDATE: The IETF has responded to say the issue is outside its remit.]
“We are aware of the incident on June 12,” said ICANN chief technology officer David Conrad.
“While ICANN is not directly involved in this, we do encourage ISPs globally to invest more in ensuring that the routing information they receive from their customers and peers is valid,” he added.
ICANN is primarily responsible for maintaining the operational stability of the Internet and does not usually get involved at the ISP level, while the IETF is responsible for technical standards, amongst other tasks.
Meanwhile, writing in Enterprise Networking Planet for a series of ‘Networking 101’ articles, Charlie Schluting, currently site reliability manager at Google and also author of Network Ninja, had noted years ago that BGP is vulnerable.
“… there is always a concern that someone will ‘advertise the Internet.’ If some large ISP’s customer suddenly decides to advertise everything, and the ISP accepts the routes, all of the Internet’s traffic will be sent to the small customer’s AS (autonomous system),” he wrote.
This seems to be what happened between TM and Level 3.
“There's a simple solution to this,” Schluting wrote. “It’s called route filtering. It’s quite simple to set up filters so that your routers won’t accept routes from customers that you aren't expecting, but many large ISPs will still accept the equivalent of ‘default’ from peers that have no likelihood of being able to provide transit.”
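The route filtering Schluting describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the configuration used by TM, Level 3 or any router vendor: the `filter_routes` function and the allow-list are hypothetical, and the prefixes come from address ranges reserved for documentation. The idea is that a provider accepts a customer’s announcement only if it falls inside the address blocks that customer is expected to originate.

```python
import ipaddress


def filter_routes(announced, allowed):
    """Split announced prefixes into (accepted, rejected).

    A prefix is accepted only if it sits inside one of the blocks the
    customer is expected to announce. Hypothetical helper, IPv4 only.
    """
    allowed_nets = [ipaddress.ip_network(p) for p in allowed]
    accepted, rejected = [], []
    for prefix in announced:
        net = ipaddress.ip_network(prefix)
        if any(net.subnet_of(a) for a in allowed_nets):
            accepted.append(prefix)
        else:
            rejected.append(prefix)
    return accepted, rejected


# The customer is only expected to announce this one block (assumed).
customer_allowed = ["203.0.113.0/24"]

# The customer announces its own routes plus routes it should not.
announced = [
    "203.0.113.0/24",    # expected
    "203.0.113.128/25",  # part of the expected block
    "198.51.100.0/24",   # someone else's block
    "0.0.0.0/0",         # the 'default' route, i.e. the whole Internet
]

accepted, rejected = filter_routes(announced, customer_allowed)
print(accepted)  # ['203.0.113.0/24', '203.0.113.128/25']
print(rejected)  # ['198.51.100.0/24', '0.0.0.0/0']
```

With a filter of this kind in place on the receiving side, unexpected announcements are dropped before they can redirect traffic.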
When asked if TM took measures such as route filtering, the company spokesperson said that “as a responsible operator, we had deployed all necessary action, including route filtering measures, to ensure customers continue to experience uninterrupted services.
“TM has made further steps to strengthen our risk management internally and externally. We acknowledge the consequences of the incident.
“Whilst regrettable, it has provided us with valuable data to allow us to better equip our network and continue to review and improve our procedures to prevent such an incident from recurring,” she said.