L2 WAN Routing Design for QoS - Hub and Spoke Using EIGRP Distribute-list with Route-map

In the world of WAN, QoS is a pretty important topic for most, if not all, networks. Some WAN technologies have QoS mechanisms built in, but others, such as some L2 WANs, offer no QoS control within the provider cloud.

The Challenge

Quality of Service, or QoS, is a pretty important topic for most, if not all, networks. Having connectivity between sites means nothing if you can't ensure that the most important data (or voice) makes it end to end without loss or speed/quality issues. Numerous WAN technologies have this issue covered: dedicated point-to-point circuits, PRI lines that give a guaranteed channel to each voice call, and MPLS circuits that can use QoS markings to give priority to important traffic.

L2 WANs, however, can be a bigger challenge. This type of WAN technology essentially extends a local network across a service provider's network. From the viewpoint of the local network, it is as if a simple network switch connected all sites directly together. Each site can communicate directly with every other site without the need for any additional routing such as OSPF or BGP. For more on L2 WANs, look into VPLS or Metro Ethernet (Metro-E), the technologies that service providers use to deliver them.

L2 WANs, sometimes called Metro-E, have great qualities. They are easy to work with, cost-effective, and usually offer high-bandwidth circuits. On the flip side, one of the most difficult topics to deal with on an L2 WAN is QoS. Most providers do not modify or drop QoS markings on packets, but they do not honor them either. The provider does not differentiate a voice packet from a user downloading cat photos on social media. With networks pushing both data and voice over their WAN, it is difficult as a network infrastructure admin to accept a WAN that gives you no control over the prioritization of traffic.

First, let me show you how the lack of full QoS control can be a concern. Below is a simple L2 WAN. There are two head end routers, which would equate to having a primary and secondary datacenter. The bottom three routers are simply remote sites.
[Diagram: a simple L2 WAN with two head end routers and three remote site routers connected to the same provider cloud]
With an L2 WAN, traffic can go any direction. It can go from head end to remote site, remote site to remote site, and any other combination. This any-to-any traffic pattern can be very efficient, but it can also cause issues.

On the basic side of QoS configuration, we can easily control packets as they exit the WAN interface and enter the WAN. So as far as outbound traffic is concerned, QoS is covered. Each remote site or head end will ensure that packets leaving the LAN and going across the WAN will be prioritized as we see fit. In the situation where one site speaks to another, this works out extremely well. You can saturate the circuits on both sites and still maintain solid connections for the traffic that you deem critical, such as voice.
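To make that concrete, a generic outbound policy on the WAN interface could look something like the sketch below. This is only an illustration and not the policy used in this design (that is covered in the follow-up article linked at the end); the class name, the DSCP match, and the 1 Mbps priority / 50 Mbps shaper values are placeholders you would adjust to your own traffic and circuit speed.

! Classify voice by its DSCP marking (EF assumed here)
class-map match-any VOICE
 match dscp ef
!
! Child policy: strict priority for voice, fair-queuing for everything else
policy-map WAN-CHILD
 class VOICE
  priority 1000
 class class-default
  fair-queue
!
! Parent policy: shape to the circuit rate so queuing happens on the router, not in the provider cloud
policy-map WAN-PARENT
 class class-default
  shape average 50000000
  service-policy WAN-CHILD
!
interface FastEthernet0/0
 service-policy output WAN-PARENT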
[Diagram: two sites exchanging heavy traffic directly across the L2 WAN]
In the above diagram, there is no problem at all. Both sites should have outbound QoS configured so that neither site sends more traffic, especially priority traffic like voice, than the other site can handle. This is where the any-to-any traffic flow can cause issues. Let's say that the two sites continue their heavy traffic flow, and a third site decides to join in. The third site could be attempting to send or receive any type of traffic, priority or not.
[Diagram: a third site joins the traffic flow, oversubscribing the receiving circuit inside the provider cloud]
Herein lies the problem: once more than two sites are talking to one another, it can be the ISP that chooses which traffic is sent and which traffic is dropped. In this case, equal traffic loss would mean that every phone call would experience quality issues. This would not make for a good day.

There are multiple ways to solve this issue, and, as the title would suggest, this article will show how to handle it via EIGRP and distribute lists.
 

The Solution (Well, one of them!)

There needs to be a point in the network that is intelligent enough to decide what traffic takes priority, and from there send traffic at appropriate speeds to each destination. Since the ISP is not going to do it, something on your network must provide that intelligence. For this intelligence to be useful, it needs to be able to see all of the traffic, in both directions. Here's where a hub-and-spoke topology works very well. If you can get all traffic to go through a hub before it reaches its final destination, the hub can control QoS and ensure that the ISP never has to decide which traffic to drop. When using a dynamic routing protocol such as EIGRP, however, forcing that hub-and-spoke behavior can be difficult.

By default, routes will always look best when advertised from an originating router (a router with a local interface for the subnet being advertised). This means that by default in an L2 WAN, all routes will look best going directly to the originating router. This would be the any-to-any traffic pattern that is causing the issue. What we want is to learn all routes from the head end routers only. This way, all of our traffic will be sent to the head end router first before reaching the final destination. The local router at the remote site can control outbound traffic and QoS, while the head end router controls traffic that will end up being inbound traffic at the other remote sites.

To make the routers learn routes only from the head end routers and essentially ignore all the other routers, we need to do a couple of things. On each remote site router, we need to configure a distribute list that allows routes to be learned from specific neighbors but not from others. In all of the following configuration examples, 10.1.1.1 and 10.1.1.2 are the head end routers. There is other EIGRP configuration required, but it is just the regular requirements such as network statements.
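For reference, the baseline that these examples assume is nothing more than a standard EIGRP process on every router; a minimal sketch (AS 100 and the 10.0.0.0 network statement mirror the examples in this article, so adjust them to your own design):

router eigrp 100
 network 10.0.0.0
 no auto-summary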

Head end router:
 

interface FastEthernet0/0
 description Metro-E L2 WAN
 no ip split-horizon eigrp 100


Remote site router:
 

access-list 100 permit ip host 10.1.1.1 any
access-list 100 permit ip host 10.1.1.2 any
!
router eigrp 100
 distribute-list 100 in FastEthernet0/0


The above essentially says that on Fa0/0, I can learn any route from neighbors 10.1.1.1 and 10.1.1.2. Because of the implicit 'deny all' at the end of the ACL, the router will not learn any subnets from any other neighbors. As a note, even though the distribute-list prevents the router from learning prefixes from other remote sites, adjacencies will still form with all neighbors. The 'no ip split-horizon eigrp 100' on the head end router is required so that the router can re-advertise routes learned on an interface back out that same interface.
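A quick way to verify the result on a remote site router (output omitted here): adjacencies should still exist with every router on the WAN segment, while every EIGRP route should point at a head end.

! Neighbors should include the head ends AND the other remote sites
show ip eigrp neighbors
! Every EIGRP-learned route should use 10.1.1.1 or 10.1.1.2 as the next hop
show ip route eigrp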

With this configuration in place, you should end up with all WAN routes in the routing table using the head end router(s) as the next hop. If you have more than one head end router, the next step is to modify metrics so that only one head end is used under normal circumstances. Personally, I would modify metrics at the head end router using delay or an offset-list; I wouldn't modify bandwidth because that would affect the QoS configuration. If both head ends are used equally, you can still run into the same QoS issue described earlier in the article, so one head end must be preferred over the other.
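As one concrete example of the offset-list approach, something along these lines on the head end you want to be less preferred would add a fixed amount to every route it advertises out the WAN interface. The offset value of 10000 is an arbitrary placeholder, not a recommendation.

! On the secondary (less preferred) head end: inflate the metric of everything advertised out the WAN
router eigrp 100
 offset-list 0 out 10000 FastEthernet0/0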
 

Design Thoughts

For this design to be rock-solid, the head end router needs as much bandwidth as ALL of the remote sites combined, since every remote site sends all of its traffic through the head end first; for example, five remote sites with 20 Mbps circuits each would call for at least 100 Mbps at the head end. Sized that way, you are guaranteed to never drop traffic due to lack of bandwidth at the head end. If you choose a link slower than all of the remote sites combined, you might still be in good shape, as long as there is never a point in time when all the remote sites request maximum bandwidth simultaneously, such as during OS updates or virus definition updates.

This design also changes latency. Latency is the amount of time it takes for a data packet to get from one location in the network to another. For the most part, the shorter the distance between the two endpoints, the lower the latency will be assuming that none of the links along the path have low bandwidth such as a dial-up modem.

The latency is affected because of the routing path. In the original network diagram, before any routing manipulation, the default path would be for communication to happen directly between sites. This is the most efficient path and should have the lowest latency (again, assuming no low-bandwidth links). Geographic distance plays a major role in latency, as scientists haven't figured out how to increase the speed of light. If one were to send a beam of light from San Francisco to New York City, it would take roughly 25ms, so even if everything on a WAN were perfect and fast, sending a packet from SF to NYC and getting a response would take about 50ms.

Now consider the design in this article. If the network had a couple of sites in SF and the head end router was in NYC, the lowest possible latency for traffic between the SF sites would be 100ms. The traffic path would run from SF site 1 to NYC (25ms), and then from NYC to SF site 2 (25ms). The return traffic would use the same path, adding another 25ms x 2. In total, 100ms of latency would exist for any traffic between the SF sites with a NYC head end router. If the traffic flowed directly, the latency would likely be in the 1-5ms range. Depending on normal traffic patterns, this can be a huge difference.

Another note on the design and configuration as shown: if both head ends were to become unavailable, all remote sites would become islands, with no communication leaving the local site. There would be no remote-site to remote-site communication, since no prefixes are learned from remote-site neighbors. Depending on your network, this might not be a concern. If no useful traffic would ever need to flow between remote sites during a dual head end failure, then it doesn't matter whether any last-ditch survivability is built into the routing design. If you do want that kind of fallback, there is a way to make it happen. Below is the configuration you can use at a remote site to accomplish just that: the head ends will always be preferred, but direct paths to other remote sites remain available as a last resort.
access-list 1 permit host 10.1.1.1
access-list 1 permit host 10.1.1.2
!
route-map ELAN-DL permit 10
 description Permit routing updates from head end routers as defined in acl 1
 match ip route-source 1
!
route-map ELAN-DL permit 20
 description Permit routes from all other sources and add 16000000 to the metric to make them less preferred
 set metric +16000000
!
router eigrp 10
 network 10.0.0.0
 distribute-list route-map ELAN-DL in FastEthernet0/0
 no auto-summary



The route-map has two permit statements. The first (permit 10) has only a match statement, which matches routes sourced from the head end routers; with no 'set' statements, those routes are learned as usual with no modification or filtering. The second statement (permit 20) has no 'match' statement, so it matches everything that was not already matched by permit 10 (route-map evaluation stops at the first matching statement). Its 'set metric +16000000' therefore applies to every subnet learned from a source other than the head end routers, pushing those metrics quite high.
 

*Note: 16,000,000 is nearly the maximum usable setting. Although the statement logically appears to add 16,000,000 to the metric, what actually happens is that the inbound learned metric is set to 16,000,000 and the normal metric calculation is then performed, which works out to 256 * [set metric value]. With 16,000,000, the final metric is 4,096,000,000, fairly close to EIGRP's maximum metric of 4,294,967,295, which is unusable: a route carrying the maximum metric will show up in the EIGRP topology table, but not in the routing table.
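As a quick sanity check on that math, using the 256 * [set metric value] behavior described above (the 16,777,215 figure is simply the largest whole value that stays under the ceiling by that formula, not a tested recommendation):

! 256 x 16,000,000 = 4,096,000,000  -> under the ceiling, route remains usable
! 256 x 16,777,215 = 4,294,967,040  -> roughly the largest set value that stays under the ceiling
! EIGRP max metric = 4,294,967,295  -> topology table only, never installed in the routing table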

 

!!With a maximum metric, the route will show in the topology table, but NOT the routing table

R3(config-router)# do show ip eigrp topo 10.100.100.0/24
IP-EIGRP (AS 10): Topology entry for 10.100.100.0/24
  State is Passive, Query origin flag is 1, 0 Successor(s), FD is 4294967295
  Routing Descriptor Blocks:
  10.1.1.10 (FastEthernet0/0), from 10.1.1.10, Send flag is 0x0
      Composite metric is (4294967295/128256), Route is Internal
      Vector metric:
        Minimum bandwidth is 0 Kbit
        Total delay is 0 microseconds
        Reliability is 0/255
        Load is 0/255
        Minimum MTU is 0
        Hop count is 1
R3(config-router)#do show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     10.0.0.0/24 is subnetted, 1 subnets
C       10.1.1.0 is directly connected, FastEthernet0/0
R3(config-router)#

!!With a non-max metric, the route will be in both the topology table and the routing table

R3(config-route-map)# do show ip eigrp topo 10.100.100.0/24
IP-EIGRP (AS 10): Topology entry for 10.100.100.0/24
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 512000000
  Routing Descriptor Blocks:
  10.1.1.10 (FastEthernet0/0), from 10.1.1.10, Send flag is 0x0
      Composite metric is (4096000000/128256), Route is Internal
      Vector metric:
        Minimum bandwidth is 0 Kbit
        Total delay is 0 microseconds
        Reliability is 0/255
        Load is 0/255
        Minimum MTU is 0
        Hop count is 1
R3(config-route-map)#do show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     10.0.0.0/24 is subnetted, 2 subnets
D       10.100.100.0 [90/4096000000] via 10.1.1.10, 00:00:17, FastEthernet0/0
C       10.1.1.0 is directly connected, FastEthernet0/0
R3(config-route-map)#



With all of this configuration and thought combined, the WAN routing sends all traffic through a head end router, applies a QoS policy, and sends it off to its final destination. If both head ends were to fail or become disconnected from the WAN, the remote sites would still be able to communicate directly. In the production network where this example was implemented, there was actually a group of three sites (added to the network via acquisition) that had centralized resources serving just those three sites. This configuration allows those sites to keep using their file shares and database applications even if both head ends are down.
 

Conclusion

There are certainly more ways to accomplish the original goal; this article merely shows how it was done for one customer. Routing paths and metrics were manipulated in a way that gives the local network administrator control over the QoS policies and expectations by sending all traffic through a central point.


To see the QoS used along with this routing design, see my next article!
https://www.experts-exchange.com/articles/26239/L2-WAN-3-Tier-QoS-Hub-n-Spoke.html