Network Defense Against Perception - Updated
In this non-cyber network defense post we will discuss creating a network assessment document.
We've all heard it a million times - "The network sucks!" - "The network is slow!" - "The firewall is the problem, please disable inspection!" etc. However, as most network and security professionals have found, this is usually not the case, and we've spent countless hours trying to prove it. Often when coming into a new environment, you begin to see the recurring customer or end-user complaints. Sometimes you are able to dispel the complaints, or you find and correct something. Other times the problems continue (or have been happening for "years") and escalations follow. A lot of the time there is finger pointing or fighting between teams, and the only result is that the customer suffers. That is where the network assessment comes into play.
The intent of this post and document is to provide a framework/style of presentation someone could use when compiling different types of data. I won't go deep into step-by-step troubleshooting, but I do showcase different data points I have used in my network assessments. Perhaps you've gotten some tickets for the same location which you closed out. You sent a packet capture here and a graph there for this or that issue, but that isn't always going to calm down the mobs of end-users experiencing long buffering for Netflix movies and Instagram stories. That is where this assessment document comes into play. It is presented in a show-and-tell format that progressively moves through different aspects of the network to build the case of innocence. It doesn't matter if it is a specific application or a specific location: the format stays the same, and only the language and graphics change based on what the analysis is covering (or the requirements).
As you can see from the cover page below, the idea is to provide a presentable document which includes the multiple technical points you have reviewed, troubleshot, collected, and compiled. This document then becomes a historical record for when the issue pops back up again, when a baseline is required, or when someone needs a reference while working with other teams. Typically I will look at current stats and info from the last 30-60 days.
The network assessment has one primary objective: to locate the problem within the network or to rule the network out as the cause. You will either find an issue in your domain which needs correction, or you will provide recommendations for other groups to check. The first goal of the assessment is to demonstrate competency, in that you know your equipment, configuration, network, setup etc. The second is to meticulously check every aspect of your domain so that there is no disputing your conclusion. This often takes a lot of in-depth analysis (time) to compile, which is the downside of this approach. However, once things get escalated the issue has to be resolved, so it is in your best interest to have a document like this you can bring up in the future. The third goal is to educate the audience. Not every person will understand all the content, as it's often technical, but people prefer to have visual representations and graphs to help them understand the points the presenter is conveying. In addition, this can give management ammunition to push other teams to move on the issue.
You will notice different sections within my example document. We start with some background information and analysis criteria, which go back to educating the audience and also lay some groundwork before moving into the items you troubleshot and the data you collected. Part D below covers the assessment approach.
Then we showcase the topology and check basic things like speed/duplex/errors and configuration. From there we move to part 4 - bandwidth/CPU reports, which are essentially check boxes, then on to part 5, where we introduce flow reports - again, analysis write-ups are key. What you can show here depends on the tools you have for flows. NetFlow only? Then focus on top talkers, analyze which protocols are used most, or verify that certain traffic is being tagged properly for QoS. If you also have taps and packet collectors, you can likely showcase flow response times, the number of TCP retransmissions, etc. Below is an example of parts 4 and 5 of my assessment example.
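For the NetFlow-only case, here is a rough Python sketch of how a top-talkers table and a protocol breakdown could be produced from exported flow records. The record layout (the `src`, `dst`, `proto`, and `bytes` fields), the function names, and the sample data are all assumptions made up for illustration; your collector's export format will differ, so treat this as a starting point rather than a finished tool.

```python
from collections import Counter

def top_talkers(flows, n=5):
    """Return the n source addresses that sent the most bytes."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)

def protocol_share(flows):
    """Return each protocol's share of total bytes as a percentage."""
    totals = Counter()
    for flow in flows:
        totals[flow["proto"]] += flow["bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {proto: round(100 * b / grand_total, 1) for proto, b in totals.items()}

# Hypothetical sample records (documentation address ranges, invented byte counts).
SAMPLE_FLOWS = [
    {"src": "192.0.2.10", "dst": "198.51.100.5", "proto": "tcp/443", "bytes": 6000},
    {"src": "192.0.2.11", "dst": "198.51.100.5", "proto": "tcp/443", "bytes": 1000},
    {"src": "192.0.2.10", "dst": "198.51.100.9", "proto": "udp/53", "bytes": 500},
    {"src": "192.0.2.12", "dst": "198.51.100.7", "proto": "tcp/22", "bytes": 2500},
]

if __name__ == "__main__":
    for src, total in top_talkers(SAMPLE_FLOWS, n=3):
        print(f"{src:15} {total:>8} bytes")
    print(protocol_share(SAMPLE_FLOWS))
```

A table like this pairs well with a short written analysis: who the top talkers are, whether that is expected, and what it means for the complaint being investigated.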
After that we have firewall, wireless, and on-premises testing respectively. The direction of your language, explanations, and methodology largely depends on what the issue is. For example, if the customer is pushing hard that the problem is the firewall, you would probably skip the wireless and bandwidth sections and focus heavily on packet captures and firewall logs. If the problem is related to slowness, perhaps during specific times of day, then you'd probably focus heavily on historical reporting, end-to-end metrics, QoS drops etc.
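For the time-of-day slowness case, one way to turn historical reporting into something presentable is to bucket latency samples by hour and flag the hours that stand out. The sketch below is only an illustration: the sample format (ISO timestamp plus latency in milliseconds), the function names, the data, and the "twice the overall median" threshold are all assumptions, not a recommendation for any particular monitoring tool.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

def median_by_hour(samples):
    """Group (ISO timestamp, latency_ms) samples by hour of day; return each hour's median."""
    buckets = defaultdict(list)
    for timestamp, latency_ms in samples:
        hour = datetime.fromisoformat(timestamp).hour
        buckets[hour].append(latency_ms)
    return {hour: median(values) for hour, values in sorted(buckets.items())}

def slow_hours(samples, factor=2.0):
    """Return hours whose median latency exceeds factor times the overall median."""
    overall = median(latency for _, latency in samples)
    return [h for h, m in median_by_hour(samples).items() if m > factor * overall]

# Hypothetical samples: normal in the morning and late afternoon, slow around 13:00.
SAMPLE_LATENCY = [
    ("2024-01-08T09:05:00", 20), ("2024-01-08T09:35:00", 22),
    ("2024-01-08T13:05:00", 95), ("2024-01-08T13:35:00", 105),
    ("2024-01-08T16:05:00", 21), ("2024-01-08T16:35:00", 23),
    ("2024-01-09T13:10:00", 110),
]

if __name__ == "__main__":
    print(median_by_hour(SAMPLE_LATENCY))
    print("Hours to investigate:", slow_hours(SAMPLE_LATENCY))
```

If one hour consistently stands out across the 30-60 day window, that is the period to line up against bandwidth graphs, QoS drops, and firewall logs.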
Keep in mind these are just examples and are in no way the be-all and end-all. Get creative and think outside the box on what data you should present. Obviously not every environment or problem will warrant compiling all this data. Some assessments might only need a two-page doc with background, data/screenshots, and a conclusive write-up.
To conclude, in my opinion, for those tough, seemingly unsolvable issues, unless the other IT group involved can provide a similar evaluation of all the devices and software they control, they truly haven't eliminated their domain. Inside my example document you will find examples of written analysis and screenshots. The intro covers some of the goals, formatting etc. The conclusion is key, and I think it's good to provide recommendations if you did not find anything wrong in your area of responsibility. However, if you did find something wrong which could be the cause, I recommend being transparent and taking responsibility. It's better for your credibility than hiding a problem and fixing it behind the scenes. I hope this helps you!
[note: this could apply for most technical IT domains - not just networking - you'd just have to change the analysis criteria.]