I'm trying to devise a proper High Availability scenario for my network; we're an IP service provider. We hand each customer a dedicated switch port in a dedicated VLAN with a dedicated subnet. Right now we have a very collapsed network: Transit Provider(s) -> Router -> L2 aggregation switch -> Customer Edge switch. The Router is a Riverstone RS8000, and we own several. The L2 agg can be anything with multiple GigE ports, although SMC's SMCGS24-Smart has worked fine thus far. The Customer Edge switches are low-end Cisco devices, e.g. 2924, 3524, 2950. We feed the Cust Edge switches via GigE ports, not necessarily because traffic volume warrants it, but because a DoS attack would cripple multiple customers if we uplinked at 100 Mbit. I want to preserve the existing infrastructure, as buying replacement gear would be cost prohibitive. Ideally, whatever I implement will be a new addition and will not require replacing the existing gear.
Requirements: I need redundancy between the Cust Edge devices and the upper layers of our topology. I want a full mesh (i.e. a fully broken-out core) so that I have two completely redundant routers. Those two Core routers will talk upward to two or more Transit Routers (Riverstone) and downward to the Cust Edge switches. The Core routers need enough memory to hold full BGP tables from the Transit Router(s), and they will peer internally with the Transit Routers, probably via iBGP. Ideally I'd have an L2 aggregation layer between the Core routers and the Cust Edge switches so I can save ports on the Core devices.

I need to be able to survive any kind of failure above the Cust Edge switches. Obviously the Cust Edge layer is a single point of failure, and that's no problem: I'm OK with the possibility that 20-ish customers could lose connectivity if a CE device dies. What I'm trying to prevent are sweeping outages that affect hundreds of customers. I envision using HSRP or VRRP to cover the case where either router dies. But the system also needs to account for the possibility that an L2 agg switch could die completely, or could be half-working: selectively forwarding packets, or forwarding nothing at all, without downing the link. It should likewise account for faulty cabling anywhere, whether the cabling problem results in a downed link or just an intermittent communication failure.

___I WOULD LIKE TO AVOID USING SPANNING TREE IF POSSIBLE___. I've heard from several colleagues that even if you don't do anything dumb (i.e. create loops), STP invariably still fails and ports start blocking for no reason, resulting in tremendous CPU spikes. In other words, it creates more problems than it solves. Feel free to weigh in on this, keeping in mind that any use we might make of STP would not involve huge rings, and that each L2 domain would be isolated to about 20 Cust Edge devices and two L2 agg devices.
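To make the "survive any single failure above the Cust Edge" requirement concrete, here is a minimal sketch that treats the network as a graph and checks whether losing any one non-CE device or any one link strands a Cust Edge switch. All device names and both topologies are hypothetical; today's network is modeled with a single transit for simplicity. The check covers hard failures and reachability only. It says nothing about half-working links or L2 forwarding behavior such as loops and STP.

```python
from collections import deque


def survives(adj, ces, transits, dead_node=None, dead_link=None):
    """True if every CE switch can still reach at least one transit router after one failure."""
    for ce in ces:
        seen, queue, reached = {ce}, deque([ce]), False
        while queue:  # breadth-first search from this CE switch
            node = queue.popleft()
            if node in transits:
                reached = True
                break
            for nxt in adj[node]:
                # Skip the failed device, the failed link, and anything already visited.
                if nxt == dead_node or nxt in seen or frozenset((node, nxt)) == dead_link:
                    continue
                seen.add(nxt)
                queue.append(nxt)
        if not reached:
            return False
    return True


def single_points_of_failure(links, ces, transits):
    """List every non-CE device and every link whose loss alone strands some CE switch."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    failures = []
    for node in sorted(adj):
        if node not in ces and not survives(adj, ces, transits, dead_node=node):
            failures.append(node)
    for a, b in links:
        if not survives(adj, ces, transits, dead_link=frozenset((a, b))):
            failures.append(f"{a}--{b}")
    return failures


# Today's collapsed design (simplified): one transit, one router, one L2 agg switch.
CURRENT = [("transit-a", "router"), ("router", "agg"), ("agg", "ce1"), ("agg", "ce2")]

# Hypothetical dual-homed design: two transits, two cores, two agg switches.
PROPOSED = [
    ("transit-a", "core1"), ("transit-a", "core2"),
    ("transit-b", "core1"), ("transit-b", "core2"),
    ("core1", "agg1"), ("core2", "agg2"),
    ("agg1", "ce1"), ("agg2", "ce1"), ("agg1", "ce2"), ("agg2", "ce2"),
]

print(single_points_of_failure(CURRENT, {"ce1", "ce2"}, {"transit-a"}))
# ['agg', 'router', 'transit-a', 'transit-a--router', 'router--agg', 'agg--ce1', 'agg--ce2']
print(single_points_of_failure(PROPOSED, {"ce1", "ce2"}, {"transit-a", "transit-b"}))
# []
```

The graph check only shows which hard failures a dual-homed layout can ride out. The half-working-link and silent-drop cases are exactly what it cannot capture.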
Thus far here's what I've considered:
* Two 24-port Cisco 3750Gs in a stack. The L2 agg layer would disappear: each Cust Edge switch would connect to both physical switches in the stack, so as not to be affected by a PSU failure in either one, and with 24 ports per member that allows up to 24 Cust Edge switches. There are two problems here: 1) 3750s only have enough memory for 8,000 to 10,000 hardware routes, depending on how many SVIs are defined (we'd be shooting for somewhere between 900 and 1,200 SVIs in the stack; I've read that 1k is no problem). Even 10k routes is nowhere near a full table, so I'd have to add another network layer (the real Core) of big routers holding full tables just to pick which Transit Router a given packet goes to. 2) A common theme in everything I've looked at is that with Layer 2 switches at the Customer Edge, there's no way to ensure the Cust Edge switches avoid a half-working link, i.e. a link that's up but does not reliably pass traffic. Think a failing GBIC or something.
* A Linux/OpenBSD/FreeBSD box running vrrpd or CARP. In this scenario we have plenty of RAM to hold full tables. The problem, again, is that the Core has no awareness of the topology lying below it. I envision a layout like so:
Transit Router -> Core -> L2 agg -> Customer Edge <- L2 agg <- Core <- Transit Router
Given that layout, any number of failures could happen beneath the Core box without it being aware, except for a link failure between the Core and the L2 agg. Any degradation in communication elsewhere in the system (e.g. a bad GBIC) would not be known to the Core and could not be acted on. The Core boxes could detect a link failure between themselves and the L2 agg device, but not one between the L2 agg and the Cust Edge switch. And even if the Cust Edge switch accounted for a failure (e.g. a down link between Cust Edge and L2 agg), the Core routers would not withdraw the route announcement for the customer's IP prefix, and withdrawing it is a must: I can't have the Transit Routers sending traffic down a blackhole.
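The route-withdrawal requirement can be sketched as follows. This sketch assumes, hypothetically, that each Core box can probe every Cust Edge switch over each individual path and record whether a reply came back; doing that with plain L2 CE switches is part of the open question. Each link is judged by its recent probe success rate rather than its carrier state, so a link that stays up while dropping traffic counts as unhealthy. A customer prefix is announced only while at least one path to its CE switch is fully healthy. The window size, the 90% threshold, and all names (`LinkHealth`, `prefixes_to_announce`, and the link and CE names) are illustrative assumptions. The prefixes come from the 192.0.2.0/24 documentation range.

```python
from collections import deque


class LinkHealth:
    """Judge a link by recent probe replies instead of by carrier (link-up) state."""

    def __init__(self, window=20, min_success=0.9):
        self.results = deque(maxlen=window)  # only the most recent `window` probes are kept
        self.min_success = min_success

    def record(self, reply_received):
        self.results.append(bool(reply_received))

    def healthy(self):
        # Until the window is full there is not enough evidence; treat the link as unproven.
        if len(self.results) < self.results.maxlen:
            return False
        return sum(self.results) / len(self.results) >= self.min_success


def prefixes_to_announce(customers, paths, links):
    """Return the customer prefixes that still have at least one fully healthy path.

    customers: prefix -> CE switch name
    paths:     CE switch name -> list of paths, each path a list of link names
    links:     link name -> LinkHealth
    """
    announce = []
    for prefix, ce in sorted(customers.items()):
        if any(all(links[name].healthy() for name in path) for path in paths.get(ce, [])):
            announce.append(prefix)
    return announce


# Hypothetical example: one Core box, one L2 agg switch, two CE switches.
links = {name: LinkHealth() for name in ("core1-agg1", "agg1-ce7", "agg1-ce8")}
for i in range(20):
    links["core1-agg1"].record(True)
    links["agg1-ce8"].record(True)
    links["agg1-ce7"].record(i % 3 != 0)  # carrier stays up, but one probe in three is lost

customers = {"192.0.2.0/28": "ce7", "192.0.2.16/28": "ce8"}
paths = {
    "ce7": [["core1-agg1", "agg1-ce7"]],
    "ce8": [["core1-agg1", "agg1-ce8"]],
}
announced = prefixes_to_announce(customers, paths, links)
print(announced)  # ['192.0.2.16/28']
```

Treating a link whose window is not yet full as unhealthy is a deliberately conservative choice. A real implementation would also want some hysteresis, so that a marginal link does not make prefixes flap between announced and withdrawn.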
Again, keep in mind that I have absolutely no desire to replace my L2 Cust Edge switches with Layer 3 switches. So having said that - how can I possibly solve for all of the aforementioned failure scenarios?