Bridging the gap between CCIE RS and SP

February 1, 2010

BGP fast-external-fallover – Common confusion

Filed under: CCIE, CCIE SP — 21500 @ 1:01 pm

Most will know the feature and what it does, but to recap the process level command:

R5(config-router)#bgp fast-external-fallover

R5(config-router)#no bgp fast-external-fallover

This feature enables fast fallover in the event of a link failure for all directly connected external peers. In layman's terms: shut down the BGP neighbor as soon as the interface reset is detected, rather than waiting for the hold timer to expire.

Then the interface command:

R5(config-if)#ip bgp fast-external-fallover permit

R5(config-if)#ip bgp fast-external-fallover deny

This is used to override the process-level command. If the feature is enabled under the BGP process, which it is by default, and a specific client interface is flapping frequently, the interface-level command can be used to keep the client peer from flapping due to fast fallover and so prevent upstream peers from dampening the client routes. Fast fallover is important in multihomed scenarios, where it is useful to shut the neighbor as soon as possible in order to avoid packet drops.

But the real reason for this post is that I have seen this a couple of times configured in both (RS) INE and (SP) IPX workbooks with the incorrect interface level command:

R5(config-if)#no ip bgp fast-external-fallover

This will have no effect, other than removing any previous interface-level fast-fallover config so that the interface follows the process-level setting again. Beware of this common confusion between the two syntaxes. The correct interface-level configuration is to use permit or deny.
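To make the precedence explicit, here is a minimal Python sketch of how I read the interaction between the two commands. It is only a model of the behaviour described above, not anything that runs on a router; the function name and the use of None for "no interface-level command" are my own assumptions.

```python
from typing import Optional


def fast_fallover_active(process_enabled: bool, interface_setting: Optional[str]) -> bool:
    """Model whether a directly connected eBGP session is reset on link-down.

    process_enabled   -- True for 'bgp fast-external-fallover' (the default),
                         False for 'no bgp fast-external-fallover'.
    interface_setting -- 'permit', 'deny', or None when no interface-level
                         command is configured (hypothetical encoding).
    """
    if interface_setting == "permit":
        return True   # interface level overrides the process level
    if interface_setting == "deny":
        return False  # interface level overrides the process level
    if interface_setting is None:
        return process_enabled  # fall back to the process-level setting
    raise ValueError("interface_setting must be 'permit', 'deny' or None")


# A flapping client interface with 'deny' keeps its session up on link-down,
# even though the feature is enabled under the BGP process.
print(fast_fallover_active(True, "deny"))
```

Note that the 'no' form at interface level maps to None here, which is why it changes nothing unless a permit or deny was configured earlier.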

January 30, 2010

Old habits: Soft-Reconfiguration

Filed under: CCIE, CCIE SP — 21500 @ 9:37 am

While on the subject of old habits, I had to mention this one. I remember back when studying for CCNA and maybe even CCNP that it was always recommended to configure soft-reconfiguration in order to propagate route policy updates/changes without hard resetting the bgp neighbor. This is one of those things that sometimes just becomes routine, a habit, something that is just pasted into new configs and forgotten about. Well it might be time to shake this one off as well.

In brief terms the soft-reconfiguration command allows a ‘soft’ reset. The TCP session between the two BGP peers is not reset, but new policy changes, for example a newly applied route-map filter, still take effect. It is therefore a way to cause minimal damage to the overall stability of the network.

A decade or so ago, in September 2000, RFC 2918 (Route Refresh Capability for BGP-4) was published, which made soft-reconfiguration redundant. Two BGP peers that support the route refresh capability can perform a soft reset without any preconfiguration. To determine whether a peer supports this capability:

show ip bgp nei 11.11.7.11
BGP neighbor is 11.11.7.11,  remote AS 11, external link
BGP version 4, remote router ID 150.140.130.120
BGP state = Established, up for 00:03:27
Last read 00:00:27, hold time is 180, keepalive interval is 60 seconds
Neighbor capabilities:
Route refresh: advertised and received(new)

Extract from: http://www.cisco.com/en/US/docs/ios/12_2/ip/configuration/guide/1cfbgp.html#wp1001128

To use soft reset without preconfiguration, both BGP peers must support the soft route refresh capability, which is advertised in the OPEN message sent when the peers establish a TCP session. Routers running Cisco IOS software releases prior to Release 12.1 do not support the route refresh capability and must clear the BGP session using the neighbor soft-reconfiguration router configuration command. Clearing the BGP session in this way will have a negative impact upon network operations and should only be used as a last resort.

Table 8 Advantages and Disadvantages of Hard and Soft Resets

Hard reset
Advantages: No memory overhead.
Disadvantages: The prefixes in the BGP, IP, and Forwarding Information Base (FIB) tables provided by the neighbor are lost. Not recommended.

Outbound soft reset
Advantages: No configuration, no storing of routing table updates.
Disadvantages: Does not reset inbound routing table updates.

Dynamic inbound soft reset
Advantages: Does not clear the BGP session and cache. Does not require storing of routing table updates, and has no memory overhead.
Disadvantages: Both BGP routers must support the route refresh capability (in Cisco IOS Release 12.1 and later releases).

Configured inbound soft reset (uses the neighbor soft-reconfiguration router configuration command)
Advantages: Can be used when both BGP routers do not support the automatic route refresh capability.
Disadvantages: Requires preconfiguration. Stores all received (inbound) routing policy updates without modification; is memory-intensive. Recommended only when absolutely necessary, such as when both BGP routers do not support the automatic route refresh capability.

Now what does this really mean to you and me? It comes down to memory consumption, since all routes received from a neighbor with soft-reconfig configured are stored in memory. For example, a peer might send a full table while the router's inbound filter only accepts routes from the neighbor AS and the neighbor's client ASes. Although only a few thousand routes might be inserted into the BGP table from this neighbor, the router still has to keep the remaining 200k+ routes in memory. If the router has a couple of these peers, it will probably not scale well. By relying only on the route refresh feature, the router will be able to scale to far more peers.
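A rough back-of-the-envelope version of that argument, as a Python sketch. The bytes-per-path figure is purely an assumption for illustration (the real cost depends on platform, IOS release and attributes), and the function name is hypothetical.

```python
def soft_reconfig_overhead_bytes(received: int, accepted: int,
                                 bytes_per_path: int, peers: int = 1) -> int:
    """Estimate memory spent on routes that are kept only because of
    soft-reconfiguration inbound, i.e. received but rejected by policy."""
    rejected = received - accepted
    return rejected * bytes_per_path * peers


# Assumed numbers: a 205k-route feed, 5k routes accepted by the filter,
# 250 bytes per stored path (an illustrative guess, not a measured value).
one_peer = soft_reconfig_overhead_bytes(205_000, 5_000, 250)
three_peers = soft_reconfig_overhead_bytes(205_000, 5_000, 250, peers=3)
print(one_peer / 2**20, "MiB for one peer")
print(three_peers / 2**20, "MiB for three peers")
```

With route refresh instead of soft-reconfiguration, this overhead simply disappears because the rejected routes are not stored at all.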

In an enterprise environment with fewer routes, an old 3600 might still be active in the BGP routing domain and become unstable due to running out of memory. Removing the legacy “soft-reconfiguration” configuration might be the healing touch it needs.

January 25, 2010

Another day, another CCIE track – SP Operations

Filed under: CCIE, CCIE SP — Tags: — 21500 @ 6:14 pm

I initially started with this post and thought hard about whether it was a knee-jerk reaction to another SP track. A couple of days later and nothing has changed. CCIE SP Ops is still not a winner.

Sometimes when news is made you either get a positive or negative vibe. When the rumors surfaced about CCIE Data Center, I had a positive vibe about it; from the speculation it just seemed the right fit, a track that is needed by industry demand. A year ago when Cisco released CCIE Wireless, I had the same thoughts: ‘This is exactly what the industry needs’. Today Cisco announced CCIE SP Operations and my first impression is that this is going to be another CCIE Design or Storage. Is Cisco expanding with too many tracks too soon?

SP Operations will cover Cisco’s IP NGN, which I have said on a couple of occasions should be on the SP track. The Cisco IP Next Generation Network buzz is largely based on Carrier Ethernet. In very compact form: a mass migration from proprietary SONET/SDH/ATM onto Cisco Metro Ethernet and EoMPLS. I say Cisco Metro because it is an all Cisco or no Cisco affair, since Cisco Metro Ethernet does not play ball well with others. This is largely because the Metro switches do not support standards-based QinQ (802.1ad). From what I read they will in ‘future’ releases but, reading between the lines, only once they have the monopoly on Carrier Ethernet.

I have no doubt Cisco has put a lot of research into this track, but I think they overlooked the most important aspect. CCIE SP has been neglected for years and has been begging for an upgrade. People have been talking about the outdated ATM/Frame and no relevant Layer2 VPN for ages. This is a personal opinion but I don’t believe IP NGN warrants a CCIE track on its own and again a personal opinion I don’t believe MPLS L3VPN does either. A mixture of the two however makes a lethal combination.

Another aspect of a new qualification is the time and numbers it takes in order to get market recognition. SP is only at a very late stage maturing into a track that is generally known and accepted in the industry. Will Cisco dumb down the SP Operations track in order to get the numbers out which will ensure engineers build another proprietary carrier network empire? Yes, sounds like a brilliant business plan. The second part will be the cost in preparing for XR, this one is not going to be cheap. Perhaps Cisco will sponsor (read: leak) a simulator?

It is still early days and not enough about CCIE SP Operations is known to make an informed judgement, but I get the gut feeling this is a track developed by Business/Sales in order to push a revenue stream rather than in response to demand from the industry. While a lot is still unknown, this is my initial conclusion: SP Operations has all the right ingredients for another epic fail.

From the general outline of the SP Operations written:

1.0 Manage the network fault management system
1.1 Develop a fault management process for a managed network environment collaboratively with the tools team
1.2 Determine the interaction between the fault management system and the ticketing system in collaboration with the tools team
1.3 Determine the method to gather appropriate metrics for an established fault management process

2.0 Manage performance and capacity
2.1 Identify spikes and potential trouble spots based on syslog and/or Network Management System (NMS) output
2.2 Develop a plan to solve a particular performance issue based on syslog and/or Network Management System (NMS) output
2.3 Identify the Network Management System (NMS) metrics and SLA metrics that will be needed in order to further troubleshoot a specific problem communicated orally, written, etc.
2.4 Develop a plan to establish a baseline and monitor the network in conjunction with the tools and performance groups
2.5 Create baseline network performance in conjunction with engineering and architecture teams
2.6 Monitor the network to look for variances against the baseline
2.7 Edit existing scripts which enable a network baseline management plan in conjunction with the tools and performance groups

3.0 Manage operations processes
3.1 Collaborate with the process team and NOC management on process development to meet a desired network operational objective
3.2 Develop a specific prototype and test plan for a particular planned network change, working collaboratively with the engineering and design groups
3.3 Develop, for a particular network, a list of needed tools working collaboratively with the tools team
3.4 Develop a detailed operations plan including metrics and reporting functions for a particular network working collaboratively with the process team
3.5 Develop a process change action plan based on the results of a network audit
3.6 Develop and maintain a spares plan for a particular network

4.0 Troubleshoot and fix reachability and transport problems within the network
4.1 Identify predecessor steps that have not been executed based on an escalation ticket dealing with reachability
4.2 Determine whether to fix or escalate a ticket dealing with reachability
4.3 Identify the area(s) causing a complex reachability problem of unknown origin
4.4 Troubleshoot a complex routing problem and, considering the technical aspects, determine the risks and fix it
4.5 Troubleshoot a complex security problem and, considering the technical aspects, determine the risks and fix it

5.0 Identify problems in implementation plans
5.1 Find issues of a rollout plan received from engineering before deployment
5.2 Identify hardware which is not backwards compatible on a new service rollout plan
5.3 Find hardware that needs operating system upgrades on a new service rollout plan
5.4 Review and provide recommendations on areas in which NOC support plans will not be sufficient on a new service rollout plan

6.0 Troubleshoot and fix network performance problems
6.1 Identify predecessor steps that have not been executed based on an escalation ticket dealing with network performance
6.2 Determine whether to fix or escalate a ticket dealing with network performance
6.3 Determine whether to fix or where to escalate a core network fault
6.4 Identify the source of a complex network performance problem
6.5 Troubleshoot a complex network performance problem and, considering the technical aspects, determine the risks and fix it
6.6 Identify a complex application performance problem and isolate it
6.7 Identify a complex computing device (server, call manager, etc – not the network or application) performance problem and isolate it
6.8 Troubleshoot a complex traffic pattern problem and, considering the technical aspects, determine the risks and fix it
6.9 Troubleshoot a complex, chronic performance problem and, considering the technical aspects, determine the risks and fix it

Identify spikes and escalate tickets? My word, what is Cisco doing? Sounds more like a CCNA blueprint. Perhaps just trust and put faith in the network giant? Hope this does not destroy the CCIE reputation.

January 15, 2010

Autonegotiate – The debate continues

Filed under: CCIE SP — 21500 @ 11:14 pm

The saying “old habits die hard” holds true for this one. Quite a while ago I read an article by Greg Ferro regarding the force speed and duplex myth. Today I stumbled on Terry Slattery’s blog regarding the same “autonegotiate duplex or not” topic. For those that don't know Sir Slattery, he is practically CCIE#1. Yes, the first man on the moon. While I accept each person has his own opinion based on past experience, consider reading these two posts with an open mind.

The fundamental problem I have with the “Force All” practice is the administrative overhead that comes with duplex mismatches. I found that the problems caused by this practice outweigh the problems solved by 100:1; in fact it might be more. Besides the network overhead there is also the system administrative overhead, as forcing needs to happen on both sides, or in a LAN scenario the helpdesk overhead when a technician needs to force settings on all workstations.
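Why forcing only one side bites so often can be shown with a small Python model of 10/100 behaviour. It assumes the standard outcome where an autonegotiating port facing a forced partner detects the speed but falls back to half duplex, and it assumes both ends are capable of 100/full; the names are mine and vendor-specific quirks are ignored.

```python
from typing import Tuple, Union

# A port is either "auto" or forced to (speed_mbps, "full" | "half").
Setting = Union[str, Tuple[int, str]]


def resolve(local: Setting, remote: Setting) -> Tuple[int, str]:
    """Return the (speed, duplex) the local port ends up running."""
    if local != "auto":
        return local  # a forced port uses exactly what was configured
    if remote == "auto":
        return (100, "full")  # assumption: both ends support 100/full
    # Remote is forced and sends no negotiation pulses: the local port can
    # detect the speed, but has to fall back to half duplex.
    return (remote[0], "half")


def duplex_mismatch(a: Setting, b: Setting) -> bool:
    """True when the two ends of the link disagree on duplex."""
    return resolve(a, b)[1] != resolve(b, a)[1]


print(duplex_mismatch("auto", "auto"))
print(duplex_mismatch((100, "full"), "auto"))
print(duplex_mismatch((100, "full"), (100, "full")))
```

The middle case is the classic one: somebody forces the switch port to 100/full, the workstation is left on auto, and the link comes up with a duplex mismatch that nobody notices until the late collisions start.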

Problems happen when ports are forced in order to make them shut up and stay up. I could compare this to a situation where my wife indirectly complains I am spending too much time preparing for the next CCIE: I could just put the headphones on and ignore the warning message. But this does not solve the problem, and it might make it worse by making it harder to determine when I have pushed the issue beyond its limits. If ports are not agreeing on a speed and duplex, 99.9% of the time it can be solved by asking the following two questions:

1) Are both sides auto/auto? This should be standard practice and you will have a happy network.

2) The autonegotiation principle is based on electrical signal pulses with microsecond (µs) tolerances. Are the signals received within the required threshold? If not, why not? Quality layer 1 will determine the quality of the layers above.

This raises some thoughts:
How old are the cables? How old is the Datacenter? Something that I hardly ever hear of is the lifespan of UTP cables. While manufacturers will today punt a 20+ year lifespan on Cat5e, I personally believe UTP cables from the 90’s should be phased out. Where possible, do not re-use patch leads as they are often the culprits. Patching and repatching puts additional wear and tear on the cable and connectors. All good quality UTP cable has the manufacturing date printed almost every meter. Check the manufacture dates of the patch leads. More than 5 years old? Tie a knot in it. Check the cable standard too: recently I discovered a 1000 Mb link issue caused by a Cat5 cable (rated for 100 Mb) being used. Use Cat5e at a minimum.

The negative effect of forcing speeds is that the shut up and stay up approach does exactly that. The autonegotiation process, especially on 1000 Mb links, is fundamental to fast failover detection and error signaling. When troubleshooting something as simple as packet loss on a Gb link with both sides forced, it is difficult to determine whether the problem is at the physical layer, since only a fraction of errors and CRCs are logged. Removing the forced settings will most likely display a more realistic error count. This is not specific to Gb ports; I have seen it on 100 Mb links as well.

The fundamental flaw with forcing speed and duplex, besides the technical aspects mentioned above, is that everyone has to follow this religion rigorously.

January 12, 2010

Mixed feedback on SP OEQ

Filed under: CCIE, CCIE SP — Tags: — 21500 @ 6:24 pm

For the last week and a bit I have been doing what I consider preparing for the OEQ. My method of attack consists of reading a topic on the blueprint and then labbing up whatever I read. Again a simple approach. After reading the first feedback regarding the OEQ on the SP lab, my initial thought was that they will be relatively easy if one is prepared well for the lab. Putting more emphasis on the practical rather than theory seemed the appropriate next step. Now with further feedback from another candidate, I am not so sure.

Some comments from the two candidates to think about:

Most positive thing about them was that they really were core things. And you can really answer with one single word, maybe 2-4, but that’s it.
If you are able to pass the lab, you certainly are able to answer those four questions!

I passed the lab configuration today.  If I would have taken my test 2 weeks earlier, I would be a CCIE now.  BEWARE of the OEQs.  They threw me for a loop.  One was a stupid mistake.  The others were just crazy.  None of the simulator questions even came close! It was just not what I would consider core material. Look in all the nooks and crannies of all the books you have – there you may find some OEQ study material. I have not touched any lab configuration for the past month prior to my lab attempt yesterday.  All I did was reread books and bone back up on the theory.

I am a bit puzzled.

See also Swapnendu’s feedback: http://eminent-ccie.blogspot.com/2010/01/failed-ccie-sp-lab-and-oeq-mystery.html

January 9, 2010

CCIE Worldwide stats – RS still dropping

Filed under: CCIE — Tags: — 21500 @ 4:58 pm

A month after the first RS number drop of 48 Cisco updated the stats again, this time the RS numbers dropped by a further 62 or about 2 a day. Interesting that so many people don’t recertify.

Comparing the stats from the beginning of 2009 with 2010:

Total new: 2136
Total RS: 1382
Total SP: 622
Total Double: 438
Total Sec: 340
Total Voice: 260
Total Triple+: 98
Total Wireless: 17
Total Storage: 9

http://www.cisco.com/web/learning/le3/ccie/certified_ccies/worldwide.html

December 30, 2009

BGP Best Path Selection Process

Filed under: CCIE, CCIE SP — Tags: , , — 21500 @ 9:18 am

One of those topics that is really fundamental to passing Cisco exams and labs or more importantly, predicting the behavior of BGP, is knowing the BGP route selection process well.
Following is a summary for quick reference.

1. Next_Hop: The next hop of the BGP route has to be reachable via the routing table; if it is unreachable, the route is ignored.
2. Pre-bestpath Cost: If the pre-bestpath cost attribute is present, choose the route with the lowest cost value; if the costs are the same, the lowest community ID.
3. Weight: Cisco proprietary, locally significant attribute; the largest is preferred.
4. Loc_Pref: If the weights are the same, choose the path with the highest local preference.
5. Locally Originated: Prefer routes that were locally originated with a network statement, aggregated or redistributed.
6. AS_PATH: Next compare the AS_PATH length and prefer the route with the shortest AS_PATH.
7. ORIGIN: Choose the route with the lowest origin type if the AS_PATH lengths are the same. IGP<EGP<INCOMPLETE
8. MED: If the origin types are the same, choose the route with the lowest MED value. This is only compared for routes from the same AS or, if bgp always-compare-med is enabled, for all routes.
9. eBGP/iBGP: Prefer eBGP routes over iBGP routes if the routes have the same MED value.
10. IGP: At this point, if there are still multiple routes, prefer the route with the shortest path to the NEXT_HOP. The IGP will have already determined the shortest path to the next hop.
11. Cost: If the cost attribute is present and not configured to be ignored, choose the lowest cost.
12. Multipath: If multipath is enabled, multiple paths that match up to this point will be installed.
13. Oldest: If multiple external routes remain, choose the oldest one, which avoids propagating a flapping route. This step can be skipped with bgp bestpath compare-routerid.
14. Router_ID: If multiple routes still exist, the BGP ROUTER_ID will be a tiebreaker. Choose the route advertised by the BGP peer with the lowest Router_ID. If a route reflector is present, the originator ID is used instead.
15. Cluster Length: The minimum RR cluster list length is compared next.
16. Lowest Neighbor: Last, prefer the path from the lowest neighbor address.
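The ordering above maps nicely onto a sort key. The following Python sketch covers only a simplified subset of the steps (next hop, weight, local preference, locally originated, AS_PATH length, origin, MED, eBGP over iBGP, IGP metric, router ID, neighbor address). It skips the cost community, multipath, oldest-path and cluster-length steps, and it compares MED across all paths as if bgp always-compare-med were enabled. The Path class and its field names are hypothetical, purely for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # IGP < EGP < INCOMPLETE


@dataclass
class Path:
    name: str
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"
    neighbor: str = "0.0.0.0"


def preference_key(p: Path):
    """Smaller tuple wins; 'higher is better' fields are negated."""
    return (
        -p.weight,                         # highest weight
        -p.local_pref,                     # highest local preference
        not p.locally_originated,          # locally originated first
        p.as_path_len,                     # shortest AS_PATH
        ORIGIN_RANK[p.origin],             # lowest origin type
        p.med,                             # lowest MED (compared for all paths here)
        not p.is_ebgp,                     # eBGP over iBGP
        p.igp_metric,                      # closest next hop
        int(IPv4Address(p.router_id)),     # lowest router ID
        int(IPv4Address(p.neighbor)),      # lowest neighbor address
    )


def best_path(paths: List[Path]) -> Optional[Path]:
    """Drop paths with an unreachable next hop, then pick the most preferred."""
    candidates = [p for p in paths if p.next_hop_reachable]
    return min(candidates, key=preference_key) if candidates else None


a = Path("a", local_pref=200, as_path_len=5)
b = Path("b", local_pref=100, as_path_len=1)
print(best_path([a, b]).name)  # local preference is checked before AS_PATH length
```

Because Python compares tuples element by element, a later step is only consulted when every earlier step is a tie, which is exactly how the list is meant to be read.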

Richard Bannister made a great post on the topic that in detail illustrates the algorithm in a flow-chart:
http://rbcciequest.wordpress.com/2008/02/27/bgp-path-selection/

And then there is the well known Cisco documentation:
http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094431.shtml

Preview of Richard Bannister’s flow-chart:
[Image: BGP best path flow-chart]

BGP scan-time

Filed under: CCIE, CCIE SP — Tags: , , — 21500 @ 8:18 am

The BGP scanner process monitors the next hop of installed routes to verify next-hop reachability. It is also responsible for selecting, installing, and validating the BGP best path. By default, the BGP scanner polls the RIB for this information every 60 seconds. During the 60 second period between scan cycles, Interior Gateway Protocol (IGP) instability or other network failures can cause black holes and routing loops to form temporarily.

The BGP scan process is also responsible for the checks that determine whether conditional advertisement should or should not advertise the conditional route. It also checks whether route dampening information needs to be updated.
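To put a number on that window, here is a toy Python model of a purely polled scanner. It assumes scans happen at exact multiples of the scan interval and ignores any event-driven mechanism a given release might have; the function name is hypothetical.

```python
def seconds_until_next_scan(failure_time: int, scan_interval: int = 60) -> int:
    """Seconds between a next-hop failure and the scan that can notice it,
    assuming scans run at t = 0, scan_interval, 2 * scan_interval, ..."""
    return (-failure_time) % scan_interval


# A failure one second after a scan is not noticed for almost a full
# interval; a failure just before the next scan is noticed almost at once.
print(seconds_until_next_scan(1))
print(seconds_until_next_scan(59))
print(seconds_until_next_scan(61, scan_interval=15))
```

In this model the worst case is just under one full interval, so lowering the scan time shrinks the potential black-hole window at the cost of running the scanner more often.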

bgp scan-time

There is also a VPNv4 equivalent that is configured under the VPNv4 address family, and the syntax is slightly different. By default it runs every 15 seconds.

bgp scan-time import

Also see:
http://www.cisco.com/en/US/docs/ios/iproute/configuration/guide/irp_bgp_adv_features_ps6350_TSD_Products_Configuration_Guide_Chapter.html#wp1056233
http://www.cisco.com/en/US/docs/ios/12_0t/12_0t7/feature/guide/VPN_EN.html#wp1045721

December 9, 2009

CCIE SP gets OEQ

Filed under: CCIE SP — 21500 @ 1:34 pm

OEQ to be tested on SP from 4th Jan 2010. Guess I will have some use for that extra month after all. Just glad I worked it into the study schedule. Now, the next question, how to prepare for this and what to expect. If the RS feedback regarding the OEQ is anything to go by then it should be a walk in the park. Famous last words…

Call me crazy, but I am glad it has finally made it.

http://www.cisco.com/web/learning/le3/ccie/sp/lab_exam.html

Still nothing on:
http://www.cisco.com/web/learning/le3/ccie/announcements/index.html

and
https://learningnetwork.cisco.com/community/certifications/ccie_service_provider

December 7, 2009

CCIE Data Center

Filed under: CCIE — 21500 @ 4:30 pm

Very interesting track that is making the rounds in the usual gossip corners. Now I have already decided SP will be my last CCIE (excluding CCDE), but Data Center would be very tempting. Wife would kill me though :(

Apparently it will replace/consolidate the current Storage track, which I think is a good idea. Storage in itself is not very appealing, but Data Center, if it truly reflects current and future real world Data Centers, then yes, it is bound to be a winner.

In summary I would expect something to the effect of CCIE storage + CCIE RS less the routing. Possible devices:
Cat: 65xx, 49xx, 45xx, some smaller cats
Nexus: 5k, 7k
MDS 9xxx
Cisco ACE?
ACS

Realistically we will probably not see the high end devices and even then it will be a difficult track to prepare for due to the hardware.

Some earlier comments:
http://www.facebook.com/topic.php?uid=75717837879&topic=8511
