Monitoring Latency on Infrastructure Services

Navigating the Nile Portal Dashboard

See Figure 1 for a typical Nile dashboard view. The SLAs on this instance are grayed out because SLAs are disabled on lab test instances of the Nile Service Block (NSB).

To check for latency in infrastructure (DHCP, DNS, RADIUS, or Internet connectivity), click the Infrastructure tile or click Infrastructure in the left-hand navigation. To check for latency on applications, click Applications. See Figure 2 for where to click.

Viewing Infrastructure Latency

Clicking Infrastructure in the left-hand navigation takes you directly to the details of the infrastructure experience for the last 2 hours. If you click the Infrastructure tile instead, you must first select the site you want to look at in order to reach this view.

Figure 3 shows the Infrastructure Experience view before you select an infrastructure component to drill into.

To view details of a specific infrastructure component, click the name of the component. Selecting a component opens a time-based graph of availability and experience for that infrastructure type.

Figure 5 shows the DNS experience over the last two hours. In this case, the graph shows some spots of higher latency, marked as "bad experience", along with dips where latency improved. Note that on this graph "now" is on the right side; the further left you look, the farther back in time you are looking.

To view a longer or shorter time period, adjust the time settings at the top right of the Nile Portal screen.

Viewing Application Latency

To view application latency in more detail, click the Application tile on the Nile Portal dashboard, then navigate on the map and click the site whose application latency you want to view. See Figure 7.

Once the site is selected, you can see the performance of all the applications. Red on an application bar indicates problems, and green indicates good performance. See Figure 8.

In Figure 8, several of the applications had some issues. Selecting GCP lets us dive into the details of what happened. As before, click the name of the application. This opens a time-based graph of GCP availability and performance (latency).

In a few places the application has high latency, and those spots are marked in red as a bad experience. Hovering your mouse pointer over a bad spot gives a bit more detail on the experience of the GCP application at that point in time, as shown in Figure 11. Here the latency went from an average of 250 ms up to 396 ms, triggering the alert.

Application performance is tested from our headend controller northbound out to the Internet, so the performance recorded here reflects the state of that application experience over the Internet at that point in time.

How Is Latency Calculated by Nile for Infrastructure and Applications?

For DNS, DHCP, Internet, RADIUS, and application latency, Nile monitors individual transaction times and compares them to the latency data collected during the last 8 hours. If the current latency is above the baseline moving average (from the last 8 hours), Nile labels that measurement a bad experience point.

For example, if the baseline is 8 ms (calculated from the data of the last 8 hours) and the latency in the current minute t is 10 ms, then there is a bad experience point at time t.

The baseline is calculated for each server. If there are two DHCP servers, each DHCP server has its own baseline based on its own past data.

If 10 ms becomes the norm for 8 continuous hours, then 10 ms becomes the baseline, and as long as the values after t + 8 hours are at or below 10 ms, no bad experience points are reported.

What Transactions Are Used to Calculate Latency?

The following transactions are used to calculate latency.

DHCP

DHCP is tested from the headend controller using a synthetic DHCP transaction that acts as if it were a client requesting a new IP address. This tests both server availability and DHCP service availability on the server.

DNS

DNS is tested from the headend controller using a script that issues an actual DNS request to the server and measures the response time. This tests both server availability and DNS service availability on the server.

RADIUS

RADIUS is tested from the headend controller by sending a synthetic RADIUS authentication request to the server and measuring the response time. The synthetic transaction requires setting up a test authentication account on the actual RADIUS server. This tests both server availability and RADIUS service availability on the server.

Internet

Internet connectivity is tested from the headend controller using ICMP pings to google.com and measuring the response time. This tests the path from the headend northbound to Google and back, so it includes any additional on-premises infrastructure the traffic transits, such as routers and firewalls.

Applications

Applications are tested from the headend controller using a "curl" command that tries to load the application's initial website page and measures the response time. If an application website refuses "curl" requests, the test can be changed to ICMP pings instead. One such exception is Skype, which is tested via ICMP pings.
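The baseline comparison described under "How Is Latency Calculated by Nile for Infrastructure and Applications?" can be illustrated with a short Python sketch. This is not Nile's implementation. The class name ServerBaseline, the use of a simple arithmetic mean over the 8-hour window, and the choice not to flag the very first sample are all assumptions made for illustration. The article only states that the current latency is compared against a per-server moving average of the last 8 hours.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

WINDOW_SECONDS = 8 * 60 * 60  # 8-hour baseline window, as stated in the article


@dataclass
class ServerBaseline:
    """Rolling latency baseline for ONE server (hypothetical helper)."""

    window_seconds: int = WINDOW_SECONDS
    samples: deque = field(default_factory=deque)  # (timestamp_s, latency_ms)
    total_ms: float = 0.0

    def baseline(self) -> Optional[float]:
        """Mean latency of the samples currently in the window, or None."""
        if not self.samples:
            return None
        return self.total_ms / len(self.samples)

    def observe(self, timestamp_s: float, latency_ms: float) -> bool:
        """Record a sample; return True if it is a bad experience point."""
        # Drop samples that have aged out of the 8-hour window.
        cutoff = timestamp_s - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            _, old_latency = self.samples.popleft()
            self.total_ms -= old_latency

        # Compare against the baseline built from PRIOR data only.
        base = self.baseline()
        is_bad = base is not None and latency_ms > base  # assumption: no flag without history

        self.samples.append((timestamp_s, latency_ms))
        self.total_ms += latency_ms
        return is_bad


# Each server keeps its own baseline, e.g. two DHCP servers (names are made up).
baselines = {"dhcp-1": ServerBaseline(), "dhcp-2": ServerBaseline()}

# Eight hours of steady 8 ms samples, one per minute, on dhcp-1.
for minute in range(480):
    baselines["dhcp-1"].observe(minute * 60, 8.0)

# A 10 ms sample against an 8 ms baseline is flagged.
spike_is_bad = baselines["dhcp-1"].observe(480 * 60, 10.0)
```

With these inputs, spike_is_bad is True, matching the 8 ms versus 10 ms example in the text. If 10 ms samples then continue for a full 8 hours, the older 8 ms samples age out, the mean rises to 10 ms, and further 10 ms samples are no longer flagged.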
