Need proper redundancy and failover in a service provider environment

Asked by rrb31337 in Network Routers, OpenBSD, Network Switches & Hubs

Tags: Networking IGP BGP OSPF HSRP VRRP

I'm trying to devise a proper High Availability scenario for my network; we're an IP service provider.  We hand customers a dedicated switch port in a dedicated VLAN with a dedicated subnet.  Right now we have a very collapsed network consisting of Transit Provider(s) -> Router -> L2 aggregation switch -> Customer Edge switch.  The router is a Riverstone RS8000, and we own several.  The L2 agg can be anything with multiple GigE ports, although SMC's SMCGS24-Smart has worked fine thus far.  The Customer Edge switches are low-end Cisco devices, i.e. 2924, 3524, 2950, etc.  We feed the Cust Edge via GigE ports - not necessarily because traffic volume warrants it, but because a DoS attack would cripple multiple customers if we uplinked at 100 Mbit.  I want to preserve the existing infrastructure, as buying replacement gear would be cost prohibitive.  Ideally whatever I implement will be a new addition and will not require replacing the existing gear.

Requirements:  I need redundancy between the Cust. Edge devices and the upper layer of our topology.  I want a full mesh (so a fully broken out core) so that I have two routers which are completely redundant.  Those two Core routers will talk upward to two or more Transit Routers (Riverstone) and downward to the Cust. Edge switches.  The Core routers should have enough memory to hold full BGP tables from the Transit Router(s).  They will speak some kind of interior routing protocol to the Transit Routers, probably iBGP.  Ideally I'd have an L2 aggregation layer between the Core routers and the Cust Edge switches so I can save ports on the Core devices.

I need to be able to survive any kind of failure above the Cust Edge switches.  Obviously the Cust Edge layer is a single point of failure - no problem there.  I'm ok with the possibility that 20ish customers could lose connectivity if a CE device dies - what I'm trying to prevent are sweeping outages that affect hundreds of customers.  I envision using HSRP or VRRP to provide for the possibility that either router could die.  But the system also needs to account for the possibility that the L2 Agg. switch could die completely, or that it could be half-working and selectively forward packets, or not forward anything at all, without downing the link.  The system should account for the possibility of faulty cabling anywhere, regardless of whether that cabling problem results in a downed link or just an intermittent communication failure.

___I WOULD LIKE TO AVOID USING SPANNING TREE IF POSSIBLE___.  I've heard from several colleagues that even if you don't do anything dumb (i.e. create loops), invariably STP still fails and ports start blocking for no reason, resulting in tremendous CPU spikes.  In other words, it creates more problems than it solves.  Feel free to weigh in on this, keeping in mind that any use we might make of STP would not involve huge rings, and that each L2 domain would be isolated to about 20 CustEdge devices and two L2 agg devices.
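
To make the "link is up but traffic isn't passing" requirement concrete, here is a rough sketch of the kind of check I have in mind: judge a path by end-to-end probe results over a sliding window rather than by link state.  The class name, window size and loss thresholds are all made up for illustration, and the probe itself (a ping or ARP to a management address on the CE switch, say) is assumed to exist elsewhere.

```python
from collections import deque


class PathMonitor:
    """Judges one path by probe loss over a sliding window, not by link state.

    All names and thresholds here are hypothetical; tune them to taste.
    """

    def __init__(self, window=10, down_loss=0.3, up_loss=0.0):
        self.results = deque(maxlen=window)  # most recent probe outcomes
        self.down_loss = down_loss           # loss ratio that marks the path bad
        self.up_loss = up_loss               # loss ratio required to trust it again
        self.healthy = True

    def loss(self):
        """Fraction of failed probes in the current window."""
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def record(self, probe_ok):
        """Feed one probe result (True = reply received); returns current health."""
        self.results.append(probe_ok)
        # Only pass judgement once the window is full, to avoid flapping on startup.
        if len(self.results) == self.results.maxlen:
            if self.healthy and self.loss() >= self.down_loss:
                self.healthy = False
            elif not self.healthy and self.loss() <= self.up_loss:
                self.healthy = True
        return self.healthy


if __name__ == "__main__":
    mon = PathMonitor()
    # A failing GBIC that drops every other probe: the link never goes down,
    # but the window fills with 50% loss and the path is marked unhealthy.
    for i in range(10):
        mon.record(i % 2 == 0)
    print("healthy after intermittent loss:", mon.healthy)
```

The two thresholds are deliberately different so that a path that has been marked bad has to stay completely clean for a full window before it is trusted again.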

Thus far here's what I've considered:

* Two 24 port Cisco 3750Gs in a stack - the L2 agg layer would disappear because each 3750 would have 24 ports - so as many as 24 Cust Edge switches can connect into the stack.  Each CE switch would connect to both physical switches in the stack, so that a PSU failure in one of the switches would not affect it.  There are two problems here: 1) 3750s only have enough memory for 8000 to 10000 hardware routes, depending on how many SVIs are defined (we'd be shooting for somewhere between 900 and 1200 SVIs in the stack - I've read that 1k is no problem).  But even 10k routes is not nearly enough - I'd have to create another network layer (the real Core) to run big routers with full tables in order to pick which Transit Router a given packet goes to.  2) A common theme in everything I've looked at is that, as long as the Customer Edge consists of Layer 2 switches, there's no way to ensure the Cust Edge switches do not use a half-working link - i.e. a link that's up but does not reliably pass traffic.  Think a failing GBIC or something.

* A Linux/OpenBSD/FreeBSD box running vrrpd or carp.  With this scenario we have plenty of RAM to hold full tables.  The problem again is that there is no awareness of the topology lying below the Core.  I envision a layout like so:

Transit Router -> Core -> L2 agg -> Customer Edge <- L2 agg <- Core <- Transit Router

Given the foregoing, any number of failures could happen beneath the Core box and it would not be aware - except for a link failure between Core and L2 agg.  Any degradation in communication anywhere in the system (i.e. a bad GBIC) would not be known by the Core and could not be acted on.  The Core boxes could only detect a link failure between themselves and the L2 agg device, but not between the L2 agg and the Cust Edge switch.  And even if the Cust Edge switch accounted for a failure (i.e. a down link between Cust Edge and L2 agg), the Core routers would not withdraw the route announcement for the Customer's IP prefix - and withdrawing it is a must.  I can't have the Transit Routers sending traffic down a blackhole.
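
As a sanity check on that layout, here is a small sketch that models it as a graph with one Customer Edge switch, lists any single node or link failure that would cut the CE off from every Transit Router, and flags the failures a Core box could not see from its own link state.  The node names are hypothetical, and the model assumes each Transit Router connects to only one Core, exactly as drawn above.

```python
from collections import deque

# Hypothetical names for the layout drawn above; "ce" is one Customer Edge switch.
LINKS = [
    ("transit1", "core1"), ("core1", "agg1"), ("agg1", "ce"),
    ("transit2", "core2"), ("core2", "agg2"), ("agg2", "ce"),
]
TRANSITS = {"transit1", "transit2"}


def reachable(src, targets, links, dead_nodes=frozenset(), dead_links=frozenset()):
    """Breadth-first search: can src reach any target with the given failures?"""
    adj = {}
    for a, b in links:
        if (a, b) in dead_links or (b, a) in dead_links:
            continue
        if a in dead_nodes or b in dead_nodes:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_failures_that_isolate(src, targets, links):
    """Every single node or link failure that cuts src off from all targets."""
    nodes = {n for link in links for n in link} - {src}
    bad = [("node", n) for n in sorted(nodes)
           if not reachable(src, targets, links, dead_nodes={n})]
    bad += [("link", l) for l in links
            if not reachable(src, targets, links, dead_links={l})]
    return bad


def silent_blackhole(core, ce, links, failed_link):
    """True if core loses its path to ce although none of its own links failed."""
    attached = core in failed_link
    return not attached and not reachable(core, {ce}, links, dead_links={failed_link})


if __name__ == "__main__":
    print("isolating single failures:", single_failures_that_isolate("ce", TRANSITS, LINKS))
    print("agg1-ce failure invisible to core1:",
          silent_blackhole("core1", "ce", LINKS, ("agg1", "ce")))
```

In this model no single failure above the CE isolates it, but a failed agg-to-CE link is exactly the case where the Core keeps announcing a prefix it can no longer deliver to.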

Again, keep in mind that I have absolutely no desire to replace my L2 Cust Edge switches with Layer 3 switches.  So having said that - how can I possibly solve for all of the aforementioned failure scenarios?
Accepted Solution (10/01/08 05:55 PM, ID: 22620457)

Solution Provided By: rrb31337
Participating Experts: 1
Solution Grade: A
 