On Hiring and Being Hired
I've spent some time recently thinking about what it takes for employers to find the right people. My employer has had openings for a long time, and we get very few 'qualified' candidates. Finding someone with good tech skills can be tricky. Finding people with great tech skills can seem impossible. Finding someone who has great tech skills and is also a great personality fit can take forever... I would argue that the easiest path is to grow from within, but some companies don't have a culture that allows for that type of movement, whether the obstacles are political, logistical, or personality issues. Short of that, if you have to choose between a great fit and a great tech person... I'd go with the smartest fit you can find, even if you need them to grow in specific tech areas. If you are working lean, and who isn't these days, you can't afford to have someone who doesn't fit into the team. Engineers must be able to trust each other's work; otherwise, before you do anything you must recheck all of the work that preceded you, and that's a waste of everyone's time.
Impending doom! No, not IPv4 exhaustion…
For those of you getting full tables from ISPs, you'll want to check what your RPs and, more importantly, line cards have in the way of TCAM space. If it's 256K entries, you are getting dangerously close to running out of TCAM space, and the failure modes are many and varied... Some products, like Cisco GSRs, simply stop forwarding. Others simply stop adding new routes to the hardware FIB, which leads to hard-to-track-down failure scenarios.
As background, the difference between memory and TCAM space is this: memory is needed to hold all of your BGP/IGP updates, routes, etc... while the TCAM holds the FIB, which is the best route for any given prefix. TCAM usage grows over time as the global table grows, but it doesn't grow much as you add providers. If you have multiple ISPs, that will impact your memory needs more than your TCAM needs. That being said, you should really have TCAM sizes of 512K entries or higher.
If you are running IPv6, you should look at how the TCAM is being partitioned. Even though there may only be 6,000 or so IPv6 routes, the TCAM may be carved up so that IPv6 is allocated a much bigger chunk than is optimal. Best to read through the product sheets, white papers, etc. soon to find out.
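To put rough numbers on the headroom question, here is a minimal Python sketch. It assumes TCAM capacity is counted in route entries and that the IPv4/IPv6 carve-up can be expressed as a simple fraction; the function names, the 10% safety margin, and the route counts are all made up for illustration, so plug in the values your own platform reports.

```python
def ipv4_headroom(ipv4_routes: int, tcam_entries: int, ipv4_share: float = 1.0) -> float:
    """Return the fraction of IPv4 TCAM space still free (0.0 to 1.0).

    ipv4_share is the assumed fraction of the TCAM carved out for IPv4;
    real platforms partition in their own ways, so treat it as a stand-in.
    """
    usable = tcam_entries * ipv4_share
    if usable <= 0:
        raise ValueError("usable TCAM space must be positive")
    # Clamp at zero: a negative result just means the table no longer fits.
    return max(0.0, 1.0 - ipv4_routes / usable)


def needs_attention(ipv4_routes: int, tcam_entries: int,
                    ipv4_share: float = 1.0, min_free: float = 0.10) -> bool:
    """True when free IPv4 TCAM space is below the chosen safety margin."""
    return ipv4_headroom(ipv4_routes, tcam_entries, ipv4_share) < min_free


if __name__ == "__main__":
    full_table = 250_000  # hypothetical full-table size, not a measured value
    for label, entries in (("256K", 256 * 1024), ("512K", 512 * 1024)):
        free = ipv4_headroom(full_table, entries)
        flag = needs_attention(full_table, entries)
        print(f"{label} TCAM: {free:.1%} free, needs attention: {flag}")
```

Passing something like ipv4_share=0.75 shows how quickly an oversized IPv6 partition eats into the space left for the IPv4 table.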
Trade Show Season
Whether it's Interop or one of the various vendor events, trade shows are something that lots of money gets spent on every year, yet they are often considered a boondoggle or a waste of company money and time. If you go for the swag, the presentations, or the 'training sessions', then it very well may be a waste of time. Here are some of my tips to get the most out of trade shows.
1) Have a firm understanding of the problems that are facing you and your network. Be ready to have a conversation that doesn't involve config syntax, but instead focuses on architecture and roadmaps to solve that problem in the best way. If a feature you need is available now, then you can hash out the specifics with your SE.
2) Learn how to get to the right guy at a booth, and be prepared to go through several other people to get to 'the guy', typically a product manager. Don't ask for him right away, as your needs may not merit his time; start with your objective, and increase the conversation's technical detail until you exhaust each successive person's abilities. Booth staff are usually really good at bringing you the right person to answer a question... AS LONG AS IT ISN'T ABOUT IMPLEMENTATION. Don't try to get free consulting from these guys. They probably won't give you good advice.
3) Know some options for solving your problems. Dig into some upcoming RFCs and ask the vendors how they plan to implement RFC XYZ on this hardware platform, if it's even a possibility, then talk about alternatives.
4) It's about planning. Not for next week, but for two quarters or more from now... What software and hardware features are you going to want, and on which platforms are they going to be available? You want to purchase accordingly between now and the time that feature XYZ is available.
All I want for Christmas
It always seems that vendors don't really know what's useful. What I would find really useful right now is a smallish (1-2U) switch that does 24-48x10G (not unheard of, right?) and can support DWDM optics so I can use it with a passive DWDM mux. Force10 probably has the only thing close to what I want, but it's not quite there on port density. It doesn't need routing features, so a switch that can forward port to port at line rate is all I need. If anyone has a good recommendation, let me know.
Bi-Directional Forwarding Detection (BFD)
Bi-Directional Forwarding Detection is a lightweight hello mechanism that allows you to verify end-to-end connectivity of the media by exchanging small hello packets with directly connected neighbors. For things such as point-to-point serial interfaces, this isn't such a big deal, because the interfaces react quickly to problems in most cases. In the world of Ethernet connections, however, your local physical media may be just fine while a switched path in the middle is down. BFD allows you to configure hellos to your routing peer that come right back to you, so you know that both tx and rx are working. There are hooks in most routing protocols in newer IOS trains to use BFD as a link-failure detection mechanism, so it can vastly improve BGP and OSPF convergence.
For some Cisco reading, check out this link. For Juniper, here is a PDF.
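To see why BFD helps convergence so much, it's worth doing the arithmetic. The sketch below is a simplified model: it assumes both peers use the same transmit interval and detect multiplier, so detection time is just interval times multiplier. The 50 ms / 3 values are illustrative, and the OSPF and BGP timers are the commonly cited Cisco IOS defaults, included only for comparison. Check what your own platform actually uses.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int = 3) -> int:
    """Approximate time to declare a neighbor down, in milliseconds.

    Simplified model: the session is considered down after `multiplier`
    consecutive hellos, spaced `interval_ms` apart, go missing.
    """
    if interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    return interval_ms * multiplier


# Assumed protocol timers for comparison (commonly cited Cisco IOS defaults).
OSPF_DEAD_INTERVAL_MS = 40_000   # OSPF dead interval on broadcast networks
BGP_HOLD_TIME_MS = 180_000       # BGP hold time

if __name__ == "__main__":
    detect = bfd_detection_time_ms(interval_ms=50, multiplier=3)
    print(f"BFD detection time: {detect} ms")
    print(f"OSPF dead interval is {OSPF_DEAD_INTERVAL_MS / detect:.0f}x slower")
    print(f"BGP hold time is {BGP_HOLD_TIME_MS / detect:.0f}x slower")
```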
Bandwidth and Geography
Obtaining bandwidth in today's market is simple, and it's not. Typically, if you are, for example, a Level 3 customer, you call your rep, give them locations A and B, and tell them to quote you a price. Maybe you think it's a little high, and you ask a couple of other vendors, or maybe some people you've worked with in the past. These are all great first steps, especially if you are familiar with the local telco landscape. But if you are NOT familiar with the telco landscape, or you don't even know what I'm talking about, then here are some good first steps.
First, a short history lesson. During the mid to late '90s, everyone and their mother was laying fiber, especially in metro areas, but in and through other areas as well. When they needed 10 pairs, they laid several hundred, because they didn't want to have to dig up the street again anytime soon. All sound business sense. However, an economic downturn caused many of these companies to wither and either die or sell off many of their unused assets to cover expenses. Depending on your city, or even your street, someone may have a TON of dark/dim fiber capacity right at your curb and be willing to sell you, if not a dark pair from A to B, then maybe a lambda or two off of their DWDM equipment, giving you 1G or 10G for incredibly cheap. How do you find this out? For starters, if you are in a large multi-tenant building, ask the building super which providers have equipment in the building. Being On-Net is the first step to cheap bandwidth. Second, find a reseller/broker who doesn't mind getting competitive when it comes to pricing, and who knows the local landscape. If he isn't local, he may not be the best call.
Avoid paths that you have to pay an access loop on if at all possible. This is just paying a middleman; only do it if you really have to. Go buy some beers for some local network engineers and find out how they are solving the access problems at their sites. They will be more than happy to share their methods, especially if you are springing for a good microbrew and some appetizers. :)
Cisco’s replacement for HSRP: it’s called GLBP (and yes, it’s proprietary)
The bulk of this info can best be found in this guide, but here are some highlights. GLBP, or Gateway Load Balancing Protocol, is Cisco's newest gateway redundancy feature. It elects one of the members of the GLBP group as the Active Virtual Gateway (AVG), which acts as the ARP responder. Other members of the GLBP group can act as Active Virtual Forwarders (AVFs). The AVG answers each ARP request for the virtual IP with the virtual MAC of 1 of up to 4 possible forwarders, chosen by a host-dependent, round-robin, or weighted method. This allows all gateways to be utilized and forwarding traffic at any given time.
A config example from the above link to get the flavor:
interface fastethernet 0/0
ip address 10.20.30.2 255.255.255.0
glbp 10 authentication text <passphrase>
glbp 10 load-balancing weighted
glbp 10 timers msec 300 msec 1000
glbp 10 ip 10.20.30.1
This config will get you hellos every 0.3 seconds and a hold time of 1 second... Check your hardware if you ratchet it down this far, as this is 10x faster than the defaults.
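If the three load-balancing methods are hard to picture, here is a toy Python model of the ARP responder. To be clear, this is not Cisco's actual algorithm: the class name, the CRC-based host hashing, and the weight handling are my own assumptions for illustration, and the MAC addresses are placeholders. It only shows the idea that each ARP reply hands out one of up to four forwarder MACs.

```python
import itertools
import zlib


class ArpResponder:
    """Toy model of the GLBP gateway that answers ARP for the virtual IP."""

    def __init__(self, forwarders, method="round-robin"):
        # forwarders: {virtual_mac: weight}; weights must be positive integers.
        if not 1 <= len(forwarders) <= 4:
            raise ValueError("a GLBP group has between 1 and 4 forwarders")
        self.macs = sorted(forwarders)
        self.method = method
        if method == "round-robin":
            self._cycle = itertools.cycle(self.macs)
        elif method == "weighted":
            # Repeat each MAC once per unit of weight so replies follow the ratio.
            expanded = [m for m in self.macs for _ in range(forwarders[m])]
            self._cycle = itertools.cycle(expanded)
        elif method == "host-dependent":
            self._cycle = None
        else:
            raise ValueError(f"unknown method: {method}")

    def resolve(self, host_mac):
        """Return the virtual MAC handed to host_mac in the ARP reply."""
        if self.method == "host-dependent":
            # A stable hash keeps a given host pinned to the same forwarder.
            index = zlib.crc32(host_mac.encode()) % len(self.macs)
            return self.macs[index]
        return next(self._cycle)


if __name__ == "__main__":
    # Placeholder virtual MACs and weights, not taken from a real router.
    group = {"0007.b400.0a01": 2, "0007.b400.0a02": 1}
    for method in ("round-robin", "weighted", "host-dependent"):
        responder = ArpResponder(group, method)
        replies = [responder.resolve(f"host-{i % 3}") for i in range(6)]
        print(method, replies)
```

The host-dependent case is the one to reach for when something upstream cares that a given host always exits via the same gateway.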
Cisco Chassis Line Cards – Getting to the guts
As you know, starting with (I believe) the VIP line cards in the 7500, Cisco has put processors and memory on individual line cards in most of their high-performance chassis. Here are a couple of commands to look at what those line cards are actually doing. You can log into each card and look at it as if it were its own router. CPU and memory bottlenecks within a line card are sometimes hard to spot, so if you are seeing packet loss, these commands can help you find out a little more info.
Cisco 7500 series: if-console <slot>
Cisco 6500/7600: remote login <slot>
GSR: attach <slot>
I've seen situations where line cards that were doing excessive buffering maxed out the line card's CPU, so if you are seeing packet loss in an odd place, make sure you attach to the line card and peek at what is going on. If these commands are new to you, please let me know if you find them interesting or useful by leaving a comment.
Blaming the network – New Netblock problem
Recently an ex-employer of mine got a new netblock from ARIN and migrated several services to the new address space. Almost immediately they became unable to reach several websites; nothing too critical, but an annoyance for several of the employees. I was chatting with them and some other network engineers about the problem, and we were ruling out different things that it could be. We ran through the list, ruling out firewall rules/problems, asymmetric routing, and pretty much every other network issue that we could think of. It came down to this: everything worked fine except for DNS. The remote sites' nameservers wouldn't reply to our dnscache instances. Although I haven't confirmed it with them yet, I'm pretty sure the problem stems from the fact that people haven't updated their DNS bogon filters in quite a while. It looks like a network problem, but is in fact a problem at the remote site... The new block was assigned out of 174.0.0.0/8, and that wasn't allocated to ARIN until 2008.
If you run DNS servers, check whether you are filtering bogons, and if so, update the filters regularly, or just remove the darn things...
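A quick way to sanity-check a bogon list against a source address is Python's standard ipaddress module. The list below is a hypothetical stale filter that I made up for illustration (it still carries 174.0.0.0/8), and the function name is mine; substitute whatever your nameserver or firewall is actually filtering on.

```python
import ipaddress

# Hypothetical stale bogon list: 174.0.0.0/8 was never removed after allocation.
STALE_BOGONS = [
    "0.0.0.0/8",
    "10.0.0.0/8",
    "127.0.0.0/8",
    "174.0.0.0/8",
    "192.168.0.0/16",
]


def blocked_by(address, bogon_prefixes):
    """Return every prefix in bogon_prefixes that contains address."""
    addr = ipaddress.ip_address(address)
    return [p for p in bogon_prefixes if addr in ipaddress.ip_network(p)]


if __name__ == "__main__":
    # Example source addresses, chosen only to exercise the check.
    for source in ("174.1.2.3", "8.8.8.8"):
        hits = blocked_by(source, STALE_BOGONS)
        status = f"dropped by {hits}" if hits else "allowed"
        print(f"{source}: {status}")
```

Running your own new netblock through a check like this, against any filter list you can get your hands on, is a cheap first test before blaming the network.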
VPN Slowness – Quick Fixes
There have been lots of times where users complain about slowness, but once in a while, people get really specific. Sometimes it's "email is slow from NY" or "I can't run this app from Philadelphia." If you have already done the usual stuff (extended ping tests, traceroutes looking for asymmetry, etc.), here are three quick things to check to try to ease the pain.
- Port Exhaustion - Make sure you aren't hitting a port exhaustion problem. I've mentioned this before, but this one just feels odd until you look for it.
- ALGs - Make sure the appropriate ALGs are enabled on your device. If you have a protocol that embeds IP info in its packets (SIP, for instance), make sure you have a device that is capable of handling that appropriately.
- Set your MSS to something in the neighborhood of 1380-1420. If you are having an MTU problem that can't be seen because ICMP messages are being blocked, setting your MSS is the quick way to deal with it on almost all firewalls. It won't impact performance much at all, and will astound users if this was the problem. They come in one day, and it's magically fixed... The MTU gets exceeded because of the added IPsec headers associated with VPNs.
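The 1380-1420 range falls out of simple arithmetic, sketched below in Python. The 20-byte IPv4 and TCP header sizes assume no options. The tunnel overhead values are assumptions on my part, since real IPsec overhead depends on the cipher, mode, and whether NAT traversal is in play, so measure or look up the figure for your own setup.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def tcp_mss(path_mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one packet without fragmentation."""
    mss = path_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER
    if mss <= 0:
        raise ValueError("overhead leaves no room for TCP payload")
    return mss


if __name__ == "__main__":
    print("No tunnel:", tcp_mss(1500))
    # Assumed overhead figures, for illustration only.
    for overhead in (40, 60, 80):
        print(f"{overhead} bytes of VPN overhead:", tcp_mss(1500, overhead))
```

With a 1500-byte MTU, 40 to 80 bytes of tunnel overhead lands you at 1420 down to 1380, which is where the rule of thumb comes from.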