Troubleshooting Commercial Carrier Services

With more than 20 years of experience in managed networks and service provider networking, I have seen a lot of bad tickets. Many come from highly reputable companies with very competent engineers who seem to forget that they can give their service provider valuable information that would shorten the time it takes to repair their trouble.

This is probably the ideal time to mention that the opinions and beliefs herein are mine alone and do not represent my employer. Nor am I speaking for the company that employs me on any level.

Now that I have covered my C.Y.A. statement, let’s dive into it! First, I am going to go over the different products that service providers offer. (I’m only talking about traditional wave and IP/Ethernet services today, excluding the Frame Relay, SONET, and Packet over SONET flavors.) And please do have a sense of humor. This is meant to be informative and hopefully helpful. I do not intend to offend. I do, however, intend to tell the truth. heh…

Service Provider Commercial Services Offerings:

  • DIA – Direct Internet Access – Generally broadband or fiber with a /30 public IP block unless something else is requested.
  • DIA + BGP – I’ll assume most readers understand what this means. BGP requires an Autonomous System Number (ASN). That can be a publicly registered ASN, or a private ASN arranged with your carrier, which avoids public exposure but keeps the routing flexibility of BGP.
  • EPL/ELINE/Pseudo-wire/Epipe – These are all the same product. I should also add that an EVPL is essentially the same product, but it allows more than one outer dot1q VLAN in the point-to-point path (also known as a trunk in most parlance). From the customer’s perspective, there should never be any bridges between their devices. This is a non-switched, transparent product.
  • ELAN/ETREE/VPLS – There are so many permutations of this product that quantifying them would take several posts. The CliffsNotes version is that this is a multipoint-to-multipoint switched environment. It allows for ‘meshy’ connectivity that gives data the most efficient path between locations, if leveraged as such.
  • EVPN/SDN / “Complex Products” – There really is no limit to how far this group can extend. The only question is how much you are willing to pay.
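For the DIA and DIA+BGP items above, two quick sanity checks are easy to script. The following is a minimal Python sketch, not carrier tooling: it lists the usable addresses on a /30 handoff and tests whether an ASN falls in the private-use ranges defined in RFC 6996 (64512 through 65534, and 4200000000 through 4294967294). The function names are illustrative, and 203.0.113.0/30 is a documentation prefix standing in for whatever block your carrier actually assigns.

```python
import ipaddress

# Private-use ASN ranges from RFC 6996 (16-bit and 32-bit).
PRIVATE_ASN_RANGES = [(64512, 65534), (4200000000, 4294967294)]


def is_private_asn(asn: int) -> bool:
    """True if the ASN falls inside a private-use range."""
    return any(lo <= asn <= hi for lo, hi in PRIVATE_ASN_RANGES)


def usable_hosts(prefix: str) -> list:
    """Usable host addresses on a point-to-point handoff subnet."""
    net = ipaddress.ip_network(prefix, strict=True)
    return [str(host) for host in net.hosts()]


if __name__ == "__main__":
    print(usable_hosts("203.0.113.0/30"))  # ['203.0.113.1', '203.0.113.2']
    print(is_private_asn(64512))           # True  (private-use)
    print(is_private_asn(64496))           # False (documentation ASN, RFC 5398)
```

A /30 leaves exactly two usable addresses, one for the carrier side and one for yours, which is why it is the default DIA handoff.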

Carrier Services OSI Layer Associations

  • Layer 1 is the physical media that the data traverses. This layer works at the bit level: 1s and 0s. (WAVE or dark fiber circuits. And of course, it is part and parcel of every higher-layer product.)
  • Layer 2 starts us off with a little more variety, and what is chosen controls how everything above it behaves. The reigning king is Ethernet (Ethernet II). Layer-2 is the foundation of the network; nothing else works without it. Never forget that! (EPL/EVPL/ELAN/EVPN)
  • Layer 3 of course is IPv4/IPv6 connectivity. (This is a DIA.)
  • Layer 7 is the application layer. This is the BGP operating layer.

Covering the Basics of Carrier Responsibility

Okay, now that we have defined the layers of operation for carrier services, it’s time to get into it. Before we get rolling, I should probably add a disclaimer:

!!!! WARNING !!!! If you only use DIA+BGP carrier services, or you are easily offended and/or irrational, please skip this next section. It does not apply to you. This next section is purely for those who can laugh at themselves.

If You Open a Ticket with “BGP Flapping”

…and you do not have a DIA+BGP service package, you have told your service provider something that does not help them with your trouble at any level. Many things that have nothing to do with the functionality of lower-layer services can impact BGP connectivity. That said, BGP will not work if those lower-layer services are not working. But those lower layers can be operating without defect while your BGP still isn’t working, and that is not the fault of your service provider.

BGP flapping is a condition or a symptom of a problem, not the problem itself, 99% of the time. If you want to resolve your network issue as quickly as possible, give your service provider USEFUL information related to the layer your service operates at. When investigating and reporting a trouble, always check and reference what you are seeing, starting at layer-1:

  • Do you have optical/electrical transmission on your connection?
  • Are you seeing proper signaling within those transmissions? (Loss of Frame/Sync/Pointer, Ethernet handshake not completing)
  • If you have Layers 1 & 2 established, is ARP completing on your interface?
  • Can you ping across the link between your layer-3 interface addresses?
  • Are you taking packet loss and/or errors in layers 1 & 2?
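One way to keep yourself honest with the checklist above is to walk it bottom-up and report the lowest layer that fails. The sketch below is a hypothetical Python helper: the `LinkObservations` fields and the wording of each check are illustrative, not taken from any vendor tool, and you still have to gather the actual answers from your own gear.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkObservations:
    # Hypothetical answers to the checklist, gathered from your own devices.
    light_or_carrier_present: bool   # L1: optical/electrical signal on the port
    framing_ok: bool                 # L1/L2: no LOF/sync alarms, Ethernet handshake done
    arp_resolved: bool               # L2/L3: neighbor MAC learned on the interface
    ping_across_link_ok: bool        # L3: point-to-point addresses reachable
    error_counters_increasing: bool  # L1/L2: CRC/FCS/symbol errors climbing


# Ordered bottom-up: the first failure is the one worth reporting.
CHECKS = [
    ("light_or_carrier_present", "Layer 1: no optical/electrical signal"),
    ("framing_ok", "Layer 1/2: framing or Ethernet handshake not completing"),
    ("arp_resolved", "Layer 2/3: ARP not completing on the interface"),
    ("ping_across_link_ok", "Layer 3: cannot ping across the point-to-point link"),
]


def lowest_failing_layer(obs: LinkObservations) -> Optional[str]:
    """Walk the checks from layer 1 upward and return the first failure."""
    for field, description in CHECKS:
        if not getattr(obs, field):
            return description
    return None


def ticket_summary(obs: LinkObservations) -> str:
    """Build a short, layer-referenced summary for a trouble ticket."""
    failure = lowest_failing_layer(obs)
    lines = [f"Lowest failing check: {failure or 'none (layers 1-3 pass)'}"]
    if obs.error_counters_increasing:
        lines.append("Layer 1/2 error counters are increasing on our interface.")
    return "\n".join(lines)


if __name__ == "__main__":
    obs = LinkObservations(True, True, True, False, True)
    print(ticket_summary(obs))
```

Note that "BGP is flapping" never appears in the output. If layers 1 through 3 all pass, the summary says so, which is itself useful information for the carrier.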

Answering the above questions will help every service provider understand, and hopefully resolve, your circuit trouble more quickly and efficiently. This is the kind of information every carrier technician hopes for when they open a ticket, only to get smacked in the face with “BGP is flapping” over the top of their layer-1 bit-level WAVE service or layer-2 EPL. (Connectivity on an EPL is still essentially layer 1, and in most cases you complete the Ethernet handshake between your own devices, meaning that an EPL is really a pseudo[wire] layer-1.5 service.) And one doesn’t necessarily have anything to do with the other.

The Rising Popularity of Echo BFD

With the rising popularity of “echo BFD”, I have seen a huge increase in the number of trouble reports referencing BGP flapping when no defects can be found in the circuit paths. I will go into the reasons for this in depth. Hopefully, this will help some people reach their “AH-HA!” moment, and save me from having to explain that their layer-7 application bouncing really does not concern the carrier’s evaluation of their layer-1 or layer-2 services.

Quick Tidbit About Wave Services

So, ‘wave’ services: layer-1 services that ride a single wavelength through the network, generally at 10, 25, 40, or 100 to 400 gigabits per second in the current markets. These are ‘bit-level’ services that transport EXACTLY what you send them, without alteration. If you send errors into a wave circuit path, it will transmit those same errors on egress at the other end, because it has no logic on the client sides.

Carrier network transport devices do have error correction, but it only repairs damage done to the data while it is transported across the ‘line side’ of the network. Put another way, carrier equipment will only mitigate errors introduced within the carrier’s own network. Generally speaking, client-facing interfaces do not have forward error correction enabled, so the only option a layer-1 service has when it receives errors is to transmit them on egress at the other side. So, if you are taking errors on your wave service, make sure to check your far-end equipment while you open a ticket with your service provider. It’s not that uncorrectable errors never happen in a carrier network. They do. Just please always clear your own network as well.
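Because a wave passes client errors through untouched, comparing error counters taken over the same interval at each point along the path usually shows where the errors entered. This is a minimal Python sketch under that assumption; the measurement-point names and counts are hypothetical, so substitute whatever CRC/FCS or PCS counters your equipment and your carrier’s portal actually expose.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical measurement points along a wave circuit, listed in the
# direction of traffic flow, each with the error delta over the SAME interval.
Path = Sequence[Tuple[str, int]]


def first_erroring_point(path: Path) -> Optional[str]:
    """Return the earliest point in the path that saw new errors.

    A bit-level wave forwards client errors unchanged, so the errors
    usually first show up at (or just after) the segment that introduced them.
    """
    for name, delta in path:
        if delta > 0:
            return name
    return None


if __name__ == "__main__":
    path = [
        ("near-end customer TX optic", 0),
        ("carrier client port ingress (near-end)", 1200),
        ("carrier client port egress (far-end)", 1200),
        ("far-end customer RX", 1200),
    ]
    print(first_erroring_point(path))  # carrier client port ingress (near-end)
```

In this made-up example the errors were already present when the traffic reached the carrier’s client port, so the near-end customer optic, patching, and cabling are the first places to look.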

Problems with BFD

The problem with BFD is a perfect storm, though it can be mitigated. Or at least I think it can; I’ve only ever dealt with it from the carrier side, where you use asynchronous-mode BFD and both nodes participate in the session. Of late, ‘echo’ BFD has become popular with companies. It does not require the far end to participate in BFD for it to sense the quality of the link; the far end simply forwards the echo packets back toward the sender.
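To see why echo BFD looks unusual to a forwarding plane, it helps to look at the packet itself. RFC 5881 assigns UDP destination port 3785 to BFD Echo. A common implementation choice, assumed here since the RFC only requires an address the remote system will forward back, is to set both the source and destination IP to the sender’s own address while addressing the frame to the neighbor’s MAC. The Python sketch below builds such an untagged frame byte by byte; the MACs (from the 00:00:5e:00:53:xx documentation range), 192.0.2.1, the source port, and the payload are all placeholders.

```python
import struct

BFD_ECHO_PORT = 3785  # UDP destination port for BFD Echo (RFC 5881)


def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement sum of 16-bit words, as used by the IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mac_to_bytes(mac: str) -> bytes:
    return bytes(int(part, 16) for part in mac.split(":"))


def ip_to_bytes(ip: str) -> bytes:
    return bytes(int(octet) for octet in ip.split("."))


def build_echo_frame(local_mac: str, neighbor_mac: str, local_ip: str,
                     src_port: int = 49152, payload: bytes = b"echo") -> bytes:
    """Untagged Ethernet II frame carrying an echo-style BFD packet.

    Destination MAC = the neighbor; destination IP = our OWN address,
    so the neighbor's forwarding plane routes it straight back to us.
    """
    # Source port is a local choice (49152 assumed); UDP checksum 0 is allowed on IPv4.
    udp = struct.pack("!HHHH", src_port, BFD_ECHO_PORT, 8 + len(payload), 0) + payload
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(udp),  # version 4 / IHL 5, TOS, total length
        0, 0,                    # identification, flags/fragment offset
        255, 17, 0,              # TTL (255 assumed), protocol UDP, checksum placeholder
        ip_to_bytes(local_ip), ip_to_bytes(local_ip),  # src == dst == local address
    )
    checksum = struct.pack("!H", ipv4_checksum(ip_header))
    ip_header = ip_header[:10] + checksum + ip_header[12:]
    ethernet = mac_to_bytes(neighbor_mac) + mac_to_bytes(local_mac) + struct.pack("!H", 0x0800)
    return ethernet + ip_header + udp


if __name__ == "__main__":
    frame = build_echo_frame("00:00:5e:00:53:01", "00:00:5e:00:53:02", "192.0.2.1")
    print(frame.hex())
```

The point to notice is that nothing in this frame is tagged: the EtherType sits right after the MACs, so the IP header is in plain view of any hardware that inspects the outer headers.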

The problem is that carrier-grade equipment has DDoS protections built in, and it will look at the forwarding destination within the first two headers. If the customer is not using a VLAN to encapsulate their traffic, then their BFD is exposed to the hardware DDoS and ‘invalid next hop’ interrogation of frame headers within carrier-grade equipment, because it sits within the top two Ethernet headers. The hardware will monitor and protect the chassis and network regardless of configuration. This is hardware micro-coded behavior.

To clarify, this scenario happens when the customer traffic is not encapsulated in a broadcast domain (VLAN) and is sent into a single-tagged transport ‘pop/push’ catch-all interface. It is possible to transport the traffic without this issue, but not as a plain Ethernet II frame under a single 0x8100 tag. We’ll talk about that in a bit.

Double-Tagging Through the Carrier Is a BFD Fix

You should be able to escape the hardware protections of carrier-grade routing and switching gear by encapsulating the customer traffic in a VLAN as you egress into the carrier circuit. Then, at the carrier NID (Network Interface Device, or NIU for some) or demarcation point, the carrier will push on their ‘transport’ VLAN tag. That puts the BFD packet under two 0x8100 (C-tag) dot1q headers, out of reach of the DDoS and invalid-next-hop hardware protections in the forwarding plane. There are lots of articles about trouble with echo BFD, but very few of them take the perspective I am presenting here. I have resolved echo BFD problems by making sure the traffic has the proper overhead.
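The tag arithmetic above can be modeled in a few lines. This is a toy Python model of the behavior described in this article, not of any vendor’s ASIC: `exposed_to_l3_inspection` peels a configurable number of 0x8100 tags (an assumption standing in for the carrier’s single-tagged ‘pop/push’ interface) and reports whether an IPv4 header is then visible to the hardware checks. The function names and VLAN IDs are hypothetical.

```python
import struct

TPID_CTAG = 0x8100  # IEEE 802.1Q tag
ETHERTYPE_IPV4 = 0x0800


def push_tag(frame: bytes, vid: int, tpid: int = TPID_CTAG) -> bytes:
    """Insert a VLAN tag directly after the destination/source MACs (PCP/DEI = 0)."""
    return frame[:12] + struct.pack("!HH", tpid, vid & 0x0FFF) + frame[12:]


def exposed_to_l3_inspection(frame: bytes, tags_peeled: int) -> bool:
    """Toy model only: peel up to `tags_peeled` 0x8100 tags, then report
    whether the next EtherType is IPv4, i.e. whether a hardware check that
    looks that deep would see the IP header."""
    offset = 12  # the EtherType/TPID field follows the two MAC addresses
    for _ in range(tags_peeled):
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        if ethertype != TPID_CTAG:
            break
        offset += 4
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    return ethertype == ETHERTYPE_IPV4


if __name__ == "__main__":
    # Minimal untagged IPv4 frame: zeroed MACs, EtherType, 20-byte dummy IP header.
    untagged = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4) + bytes(20)
    carrier_tag_only = push_tag(untagged, vid=100)                 # customer sends untagged
    double_tagged = push_tag(push_tag(untagged, vid=10), vid=100)  # customer tag 10 + carrier tag 100
    print(exposed_to_l3_inspection(carrier_tag_only, tags_peeled=1))  # True
    print(exposed_to_l3_inspection(double_tagged, tags_peeled=1))     # False
```

With one peeled tag, the untagged customer frame is exposed once the carrier pushes its transport tag, while the customer-tagged frame is not. Setting `tags_peeled=2` roughly approximates the MPLS ‘service identifier’ case, where the customer needs two tags of their own.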

As a non-carrier customer, your options are:

When you are ordering an EPL/EVPL, you can request an EtherType of 0x88a8 (see the Wikipedia article on IEEE 802.1ad if you want a refresher!). And if you are ordering a ‘carrier over carrier’ service, you should request a 0x88a8 transport every time. It will save you many headaches in the future. 802.1ad tells all of the vendor ASIC protections, “Don’t look under this tag. It’s customer traffic.” As a general consumer, you can request that transport tag type, but it may cost more, as it enables functionality the customer wouldn’t have with the base service, such as LACP, CDP, and LLDP tunneling. If you do not want to pay for the ‘carrier over carrier’ service, you can just keep adding tags until your echo BFD stops dropping intermittently. Generally, that would be one tag. But if the carrier pops tags going into their MPLS tunnels and installs a ‘service identifier’ for the tunnel, then you will need the BFD under two Ethernet headers/tags to avoid the next-hop-forwarding protections that silently drop any echo BFD packets exposed to them.
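For the 0x88a8 option, the only difference on the wire is the outer TPID. The Python sketch below builds an 802.1ad (QinQ) frame with a 0x88a8 S-tag over a 0x8100 C-tag and walks the tag stack. `customer_payload_hidden` is a toy stand-in for the “don’t look under this tag” behavior described above, not a real vendor check, and the VLAN IDs are arbitrary.

```python
import struct

TPID_STAG = 0x88A8  # IEEE 802.1ad service (S) tag
TPID_CTAG = 0x8100  # IEEE 802.1Q customer (C) tag
VLAN_TPIDS = {TPID_STAG, TPID_CTAG}


def push_tag(frame: bytes, vid: int, tpid: int) -> bytes:
    """Insert a VLAN tag directly after the MAC addresses (PCP/DEI = 0)."""
    return frame[:12] + struct.pack("!HH", tpid, vid & 0x0FFF) + frame[12:]


def tag_stack(frame: bytes):
    """Return ([(tpid, vid), ...] outermost first, inner EtherType)."""
    tags, offset = [], 12
    while True:
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        if ethertype not in VLAN_TPIDS:
            return tags, ethertype
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        tags.append((ethertype, tci & 0x0FFF))
        offset += 4


def customer_payload_hidden(frame: bytes) -> bool:
    """Toy model: treat everything under an outer 0x88a8 S-tag as opaque."""
    tags, _ = tag_stack(frame)
    return bool(tags) and tags[0][0] == TPID_STAG


if __name__ == "__main__":
    untagged = bytes(12) + struct.pack("!H", 0x0800) + bytes(20)
    qinq = push_tag(push_tag(untagged, 10, TPID_CTAG), 200, TPID_STAG)
    tags, inner = tag_stack(qinq)
    print([(hex(tpid), vid) for tpid, vid in tags], hex(inner))
    # [('0x88a8', 200), ('0x8100', 10)] 0x800
    print(customer_payload_hidden(qinq))  # True
```

In this model a frame whose outer tag is a plain 0x8100 is not treated as hidden, which mirrors why the single-tag ‘pop/push’ case above still needs extra customer tags.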

Would you like me to write an article about a topic?

Well, I think we’ve covered enough for this article. But I am open to suggestions for other articles. If you have a topic you would like me to discuss, drop me an email at routinglooporg@gmail.com, and provided it’s in an area I have experience in, I will likely write something about it!
