Network Monitoring

Why This Matters

A user reports that the application is "slow." You have checked CPU, memory, and disk -- they are all fine. The problem is the network. But where? Is it bandwidth saturation? Packet loss? High latency to a downstream service? A single application hogging all the bandwidth?

Network problems are notoriously difficult to diagnose because they involve multiple hosts, multiple hops, and multiple layers. You cannot just look at one number and know the answer. You need tools that show you bandwidth usage per interface, per process, and per connection. You need tools that measure latency along every hop. You need tools that capture and analyze individual packets.

This chapter covers the essential network monitoring toolkit: iftop, nethogs, iperf3, ss, nload, mtr, and tcpdump. These tools will give you visibility into what your network is doing and help you find problems fast.


Try This Right Now

# What network interfaces do you have?
$ ip -br link show
lo         UNKNOWN  00:00:00:00:00:00
eth0       UP       52:54:00:ab:cd:ef

# How much traffic has passed through them?
$ ip -s link show eth0

# What sockets are open?
$ ss -tuln

# Quick latency check to a well-known host
$ ping -c 5 1.1.1.1

# How many established connections?
$ ss -t state established | wc -l
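The byte counters behind `ip -s link` are cumulative, so a throughput number is just the delta between two samples divided by the sampling interval. A minimal sketch of that idea (the interface name `eth0` is an assumption; any counter under /sys/class/net/<if>/statistics/ works the same way):

```shell
#!/bin/sh
# rate_kbps START_BYTES END_BYTES SECONDS -> throughput in kbit/s
rate_kbps() {
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", (b - a) * 8 / t / 1000 }'
}

# Sample the cumulative RX byte counter twice, one second apart.
# /sys/class/net/<if>/statistics/ exposes the same counters that
# `ip -s link` prints.
IF=eth0
S1=$(cat /sys/class/net/$IF/statistics/rx_bytes 2>/dev/null || echo 0)
sleep 1
S2=$(cat /sys/class/net/$IF/statistics/rx_bytes 2>/dev/null || echo 0)
rate_kbps "$S1" "$S2" 1
```

This is essentially what nload and bmon (covered later in this chapter) do continuously.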

iftop: Bandwidth by Connection

iftop shows real-time bandwidth usage per connection on a network interface. Think of it as top for network traffic.

# Install iftop
$ sudo apt install iftop    # Debian/Ubuntu
$ sudo dnf install iftop    # Fedora/RHEL

# Run on the default interface
$ sudo iftop

# Run on a specific interface
$ sudo iftop -i eth0

# Show port numbers instead of service names
$ sudo iftop -P
                     12.5Kb         25.0Kb         37.5Kb         50.0Kb
└────────────────────┴──────────────┴──────────────┴──────────────┘
myhost              => db-server.local               4.23Kb  3.12Kb  2.89Kb
                    <=                                1.45Kb  1.23Kb  1.12Kb
myhost              => cdn.example.com              12.50Kb  8.90Kb  7.45Kb
                    <=                               45.2Kb  34.5Kb  28.9Kb
myhost              => api.service.com               2.34Kb  1.89Kb  1.56Kb
                    <=                                5.67Kb  4.23Kb  3.45Kb

────────────────────────────────────────────────────────────────────
TX:             cum:   2.34MB   peak:   125Kb   rates:  19.1Kb  14.0Kb  11.9Kb
RX:                    5.67MB           234Kb           52.3Kb  40.0Kb  33.5Kb
TOTAL:                 8.01MB           359Kb           71.4Kb  54.0Kb  45.4Kb

iftop Interactive Commands

Key    Action
h      Help
n      Toggle DNS resolution
s      Toggle source host display
d      Toggle destination host display
S      Toggle source port display
D      Toggle destination port display
t      Cycle through display modes (2-line, 1-line, sent only, received only)
p      Toggle port display
P      Pause display
j/k    Scroll the list
1/2/3  Sort by 2s/10s/40s average
q      Quit

Filtering Traffic

# Only show traffic to/from a specific host
$ sudo iftop -f "host 192.168.1.10"

# Only show traffic on a specific port
$ sudo iftop -f "port 443"

# Combine filters
$ sudo iftop -f "host 192.168.1.10 and port 80"

nethogs: Bandwidth by Process

While iftop shows bandwidth per connection, nethogs shows bandwidth per process. This answers the question: "which application is using all the bandwidth?"

# Install nethogs
$ sudo apt install nethogs    # Debian/Ubuntu
$ sudo dnf install nethogs    # Fedora/RHEL

# Run on default interface
$ sudo nethogs

# Run on a specific interface
$ sudo nethogs eth0
NetHogs version 0.8.7

    PID USER     PROGRAM                          DEV        SENT      RECEIVED
   1234 user     /usr/bin/rsync                   eth0      45.234    123.456 KB/sec
   5678 www-data /usr/sbin/nginx                  eth0       8.901     23.456 KB/sec
   9012 root     /usr/bin/apt                     eth0       0.234     15.789 KB/sec
   3456 user     /usr/bin/ssh                     eth0       1.234      0.567 KB/sec

  TOTAL                                                     55.603    163.268 KB/sec

nethogs Interactive Commands

Key  Action
m    Cycle between KB/s, KB, B, MB
r    Sort by received
s    Sort by sent
q    Quit

Think About It: You see that rsync is consuming 90% of your bandwidth. Is this a problem? How would you decide whether to throttle it?
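If you do decide to throttle, rsync can limit itself without any kernel-level traffic shaping: its `--bwlimit` option takes a rate in KiB/s. A small helper to convert a target rate in Mbit/s into that unit (a sketch; it assumes 1 Mbit = 10^6 bits):

```shell
#!/bin/sh
# mbit_to_bwlimit MBITS -> integer KiB/s value for rsync --bwlimit
mbit_to_bwlimit() {
    awk -v m="$1" 'BEGIN { printf "%d\n", m * 1000000 / 8 / 1024 }'
}

# Cap an rsync transfer at roughly 40 Mbit/s:
#   rsync --bwlimit=$(mbit_to_bwlimit 40) -a src/ dest/
mbit_to_bwlimit 40
```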


iperf3: Network Benchmarking

iperf3 measures maximum achievable bandwidth between two endpoints. It is the gold standard for network performance testing.

# Install iperf3
$ sudo apt install iperf3    # Debian/Ubuntu
$ sudo dnf install iperf3    # Fedora/RHEL

Basic Bandwidth Test

You need two machines: a server and a client.

# On the server side:
$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

# On the client side:
$ iperf3 -c server-ip-address
Connecting to host server-ip-address, port 5201
[  5] local 192.168.1.20 port 43210 connected to 192.168.1.10 port 5201
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec    0
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0
...
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec    0   sender
[  5]   0.00-10.00  sec  1.09 GBytes   938 Mbits/sec        receiver

Advanced iperf3 Tests

# Test with multiple parallel streams
$ iperf3 -c server-ip -P 4

# Test UDP performance (default is TCP)
$ iperf3 -c server-ip -u -b 100M
# -b sets target bandwidth for UDP

# Reverse test (server sends, client receives)
$ iperf3 -c server-ip -R

# Longer test (60 seconds instead of default 10)
$ iperf3 -c server-ip -t 60

# Test with specific window/buffer size
$ iperf3 -c server-ip -w 512K

# Bidirectional test
$ iperf3 -c server-ip --bidir

Interpreting iperf3 Results

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec    3   256 KBytes
Field     Meaning
Transfer  Amount of data transferred
Bitrate   Throughput achieved
Retr      TCP retransmissions (should be 0 or very low)
Cwnd      TCP congestion window size

High retransmissions indicate packet loss or network congestion.
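You can also express retransmissions as a percentage of segments sent, system-wide. A sketch using the kernel's TCP counters (on current kernels OutSegs and RetransSegs are fields 12 and 13 of the Tcp: data line in /proc/net/snmp, but verify against the Tcp: header line on your system):

```shell
#!/bin/sh
# retrans_pct OUT_SEGS RETRANS_SEGS -> retransmission percentage
retrans_pct() {
    awk -v o="$1" -v r="$2" \
        'BEGIN { if (o > 0) printf "%.2f\n", r * 100 / o; else print "0.00" }'
}

# Pull the system-wide TCP counters from /proc/net/snmp.
# The first Tcp: line is the header, the second is the data.
read_tcp_counters() {
    awk '/^Tcp:/ { n++; if (n == 2) print $12, $13 }' /proc/net/snmp
}

# Example: 25 retransmitted segments out of 10000 sent
retrans_pct 10000 25
```

Anything above roughly 1% sustained is worth investigating.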


ss: Socket Statistics Deep Dive

ss (socket statistics) is the modern replacement for netstat. It is faster and provides more detailed information.

Basic ss Usage

# All TCP connections
$ ss -t
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB  0       0       192.168.1.20:22     192.168.1.10:54321
ESTAB  0       0       192.168.1.20:80     203.0.113.50:12345

# All listening ports
$ ss -tln
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN 0       128     0.0.0.0:22           0.0.0.0:*
LISTEN 0       511     0.0.0.0:80           0.0.0.0:*
LISTEN 0       128     0.0.0.0:443          0.0.0.0:*

# Show process names
$ ss -tlnp
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN 0       128     0.0.0.0:22           0.0.0.0:*         users:(("sshd",pid=1234,fd=3))
LISTEN 0       511     0.0.0.0:80           0.0.0.0:*         users:(("nginx",pid=5678,fd=6))

Advanced ss Queries

# Show all connections to a specific port
$ ss -t dst :443

# Show connections from a specific IP
$ ss -t src 192.168.1.20

# Show connections in a specific state
$ ss -t state established
$ ss -t state time-wait
$ ss -t state close-wait

# Count connections by state
$ ss -t | awk '{print $1}' | sort | uniq -c | sort -rn
   245 ESTAB
    12 TIME-WAIT
     3 CLOSE-WAIT
     1 State
# The "1 State" entry is the header line; use `ss -tH` to suppress it

# Show detailed TCP info (congestion window, RTT, etc.)
$ ss -ti
ESTAB 0 0 192.168.1.20:22 192.168.1.10:54321
     cubic wscale:7,7 rto:204 rtt:1.234/0.567 ato:40 mss:1448 cwnd:10 ssthresh:20

# Show memory usage per socket
$ ss -tm
ESTAB 0 0 192.168.1.20:22 192.168.1.10:54321
     skmem:(r0,rb131072,t0,tb87040,f0,w0,o0,bl0,d0)
# Key fields: rb = receive buffer limit, tb = send buffer limit,
# d = packets dropped for lack of buffer space

Finding Connection Problems with ss

# Large Recv-Q or Send-Q indicates a problem
$ ss -t | awk '$2 > 0 || $3 > 0'
# Recv-Q > 0: application is not reading data fast enough
# Send-Q > 0: peer is not acknowledging data (network issue or slow peer)

# Too many TIME-WAIT connections (can exhaust port space)
$ ss -t state time-wait | wc -l

# CLOSE-WAIT connections (application did not close the socket)
$ ss -t state close-wait
# These indicate a bug in the application
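TIME-WAIT pressure becomes a real problem when the sockets start eating into the ephemeral port range. A sketch that estimates how much of the range is occupied (assumes an iproute2 recent enough to support `ss -H`):

```shell
#!/bin/sh
# port_range_pct USED LOW HIGH -> percentage of the ephemeral port range in use
port_range_pct() {
    awk -v u="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { printf "%.1f\n", u * 100 / (hi - lo + 1) }'
}

# Putting it together on a live system:
#   USED=$(ss -tH state time-wait | wc -l)
#   read LO HI < /proc/sys/net/ipv4/ip_local_port_range
#   port_range_pct "$USED" "$LO" "$HI"

# Example: 14076 TIME-WAIT sockets against the default 32768-60999 range
port_range_pct 14076 32768 60999
```

Once this approaches 100%, new outbound connections start failing with EADDRNOTAVAIL.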

nload and bmon: Simple Bandwidth Monitors

nload

# Install nload
$ sudo apt install nload

# Monitor all interfaces
$ nload

# Monitor a specific interface
$ nload eth0
Device eth0 [192.168.1.20] (1/2):
==========================================================
Incoming:
                       ####
                    ########
                  ############
Curr: 23.45 MBit/s
Avg:  18.90 MBit/s
Min:   0.12 MBit/s
Max:  95.23 MBit/s
Ttl: 123.45 GByte

Outgoing:
               ##
             #####
            #######
Curr:  5.67 MBit/s
Avg:   4.23 MBit/s
Min:   0.01 MBit/s
Max:  12.34 MBit/s
Ttl:  45.67 GByte

bmon

# Install bmon
$ sudo apt install bmon

# Run bmon
$ bmon

# bmon shows per-interface bandwidth with graphs
# Use arrow keys to select interfaces
# Press 'd' for detailed statistics
# Press 'g' to toggle graph

Monitoring Network Errors

# Show interface statistics including errors
$ ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP
    link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
    1234567890 9876543     0       0       0    1234
    TX:  bytes packets errors dropped carrier collsns
     987654321 8765432     0       0       0       0

# Key error counters:
# errors   - hardware-level errors (bad CRC, etc.)
# dropped  - packets dropped (often buffer overruns)
# carrier  - carrier sense errors (cable/physical issues)
# collsns  - collisions (should be 0 on full-duplex)

# More detailed statistics
$ ethtool -S eth0 | head -20
NIC statistics:
     rx_packets: 9876543
     tx_packets: 8765432
     rx_bytes: 1234567890
     tx_bytes: 987654321
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     rx_crc_errors: 0
     rx_frame_errors: 0

If you see errors or drops increasing, investigate:

  • Physical layer: Bad cables, loose connections, failing NIC
  • Buffer overruns: Increase ring buffer (ethtool -G eth0 rx 4096)
  • MTU mismatches: Jumbo frames on one side, standard on the other
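"Increasing" is the operative word: a non-zero counter from months ago is harmless, while a counter that grows between two samples is not. A sketch of that comparison (the rx_dropped path is an example; any file under /sys/class/net/<if>/statistics/ works the same way):

```shell
#!/bin/sh
# check_counter NAME OLD NEW -> warn if the counter grew between samples
check_counter() {
    if [ "$3" -gt "$2" ]; then
        echo "WARN: $1 increased by $(( $3 - $2 ))"
    fi
}

# On a live system, sample the counter twice and compare:
#   OLD=$(cat /sys/class/net/eth0/statistics/rx_dropped)
#   sleep 10
#   NEW=$(cat /sys/class/net/eth0/statistics/rx_dropped)
#   check_counter rx_dropped "$OLD" "$NEW"

# Example with fixed sample values:
check_counter rx_dropped 5 9
```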

MTR: Ping + Traceroute Combined

mtr (My Traceroute) combines ping and traceroute into a single tool. It continuously sends packets and shows per-hop latency and packet loss.

# Install mtr
$ sudo apt install mtr-tiny    # Debian/Ubuntu
$ sudo dnf install mtr         # Fedora/RHEL

# Run mtr to a destination
$ mtr example.com
                             My traceroute  [v0.95]
myhost (192.168.1.20) -> example.com (93.184.216.34)
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                         Packets               Pings
 Host                                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. gateway.local                       0.0%    50    0.5   0.6   0.3   1.2   0.2
 2. isp-router.example.net              0.0%    50    5.2   5.8   4.1   8.9   1.2
 3. core-router.isp.net                 0.0%    50    8.3   9.1   7.5  12.4   1.1
 4. peering.exchange.net                0.0%    50   15.2  16.8  14.1  22.3   2.1
 5. cdn-edge.example.com                0.5%    50   18.4  19.2  17.5  25.6   1.8
 6. example.com                         0.0%    50   20.1  21.3  18.9  28.4   2.3

Reading MTR Output

Column  Meaning
Loss%   Packet loss at this hop
Snt     Packets sent
Last    Most recent ping time (ms)
Avg     Average ping time
Best    Best (lowest) ping time
Wrst    Worst (highest) ping time
StDev   Standard deviation (consistency)

Diagnosing Network Problems with MTR

Scenario 1: Packet loss at one hop that continues downstream
 3. router-a     0.0%    5.2ms
 4. router-b    15.0%   45.2ms    ← Loss starts here
 5. router-c    14.8%   48.1ms    ← Loss continues
 6. destination 15.2%   52.3ms    ← Loss continues
Diagnosis: The problem is at hop 4.

Scenario 2: Loss at one hop but NOT downstream
 3. router-a     0.0%    5.2ms
 4. router-b    50.0%   45.2ms    ← Looks like loss
 5. router-c     0.0%   12.1ms    ← But no loss here!
 6. destination  0.0%   15.3ms    ← Or here!
Diagnosis: Router-b is simply deprioritizing ICMP packets.
           This is normal and NOT a problem.

Scenario 3: Latency spike at one hop
 3. router-a     0.0%    5.2ms
 4. router-b     0.0%   85.2ms    ← Huge jump
 5. router-c     0.0%   86.1ms    ← Stays high
 6. destination  0.0%   87.3ms    ← Stays high
Diagnosis: Congestion or routing issue at hop 4.

# Generate a report (non-interactive)
$ mtr -r -c 100 example.com
# -r = report mode
# -c 100 = send 100 packets

# Use TCP instead of ICMP (gets through more firewalls)
$ mtr -T -P 443 example.com
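Report mode lends itself to scripting: you can scan an `mtr -r` run for hops whose loss exceeds a threshold. A sketch (the line layout assumed in the comment matches recent mtr versions, but exact spacing varies, so verify against your output first):

```shell
#!/bin/sh
# Flag hops in an `mtr -r` report whose loss exceeds a threshold.
# Report hop lines look roughly like:
#    4.|-- router-b    15.0%   100   45.2  ...
flag_lossy_hops() {
    awk -v max="$1" '/\|--/ {
        loss = $3; sub(/%/, "", loss)
        if (loss + 0 > max) print "LOSS", loss "%", "at", $2
    }'
}

# Usage:  mtr -r -c 100 example.com | flag_lossy_hops 5
```

Remember scenario 2 above: loss at an intermediate hop only matters if it persists downstream.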

tcpdump: Packet Capture

tcpdump captures individual network packets. It is the most powerful network diagnostic tool and the one you reach for when nothing else shows you what is happening.

# Capture all traffic on an interface
$ sudo tcpdump -i eth0

# Capture traffic on a specific port
$ sudo tcpdump -i eth0 port 80

# Capture traffic to/from a specific host
$ sudo tcpdump -i eth0 host 192.168.1.10

# Capture only TCP SYN packets (new connections)
$ sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0'

# Save capture to a file (for analysis with Wireshark)
$ sudo tcpdump -i eth0 -w /tmp/capture.pcap -c 1000
# -c 1000 = capture 1000 packets then stop

# Read a capture file
$ sudo tcpdump -r /tmp/capture.pcap

# Show packet contents in ASCII
$ sudo tcpdump -i eth0 -A port 80 | head -50

# Show packet contents in hex and ASCII
$ sudo tcpdump -i eth0 -X port 80 | head -50

# Capture with timestamps
$ sudo tcpdump -i eth0 -tttt port 443

Common tcpdump Filters

# DNS queries
$ sudo tcpdump -i eth0 port 53

# HTTP traffic
$ sudo tcpdump -i eth0 port 80 or port 443

# Traffic between two specific hosts
$ sudo tcpdump -i eth0 host 192.168.1.10 and host 192.168.1.20

# Only incoming traffic
$ sudo tcpdump -i eth0 dst host $(hostname -I | awk '{print $1}')

# Exclude SSH traffic (useful when capturing over SSH)
$ sudo tcpdump -i eth0 not port 22

# Large packets (possible MTU issues)
$ sudo tcpdump -i eth0 'greater 1500'

WARNING: tcpdump on a busy server generates enormous output. Always use filters to narrow down the traffic. Use -c to limit the number of packets captured. Capturing to a file (-w) is more efficient than displaying on screen.

Think About It: You are connected to a server via SSH and need to capture network traffic. If you run tcpdump -i eth0 without any filters, what problem will you immediately encounter?


Debug This

Users report intermittent slowness when accessing your web server. The application itself is healthy. You suspect a network issue.

Investigation steps:

# Step 1: Check for errors on the interface
$ ip -s link show eth0
# Look for errors, dropped packets

# Step 2: Check connection states
$ ss -t state established | wc -l
# If this number is very high (thousands), you may have connection exhaustion

# Step 3: Check for retransmissions
$ ss -ti | grep -c retrans
# High retransmission count indicates packet loss

# Step 4: MTR to clients (or from client to server)
$ mtr -r -c 100 client-ip
# Look for packet loss along the path

# Step 5: tcpdump for TCP retransmissions
$ sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0' -c 100
# Are SYN packets being retransmitted? That means connection setup is failing.
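The SYN capture from step 5 can be summarized per source address with a little awk. The line layout assumed below is tcpdump's default text output; verify it against your version before relying on the field positions:

```shell
#!/bin/sh
# Count SYN packets per source address in tcpdump text output, e.g.:
#   12:34:56.789 IP 10.0.0.1.1234 > 10.0.0.2.80: Flags [S], seq ...
top_syn_sources() {
    awk '/Flags \[S\],/ {
        src = $3                      # "10.0.0.1.1234"
        sub(/\.[0-9]+$/, "", src)     # strip the source port
        count[src]++
    }
    END { for (s in count) print count[s], s }' | sort -rn
}

# Usage:  sudo tcpdump -i eth0 -c 1000 'tcp[tcpflags] & tcp-syn != 0' \
#           | top_syn_sources
```

A single source dominating the SYN count suggests one misbehaving client (or retransmitted connection attempts from it) rather than a path-wide problem.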

Common findings:

  • Packet loss at an intermediate router: contact ISP
  • NIC errors increasing: replace cable or NIC
  • Too many TIME-WAIT connections: tune net.ipv4.tcp_tw_reuse
  • Large Send-Q values: application or peer cannot keep up

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Network monitoring tools:                                │
│                                                           │
│  Bandwidth monitoring:                                    │
│  - iftop: per-connection bandwidth (who is talking?)      │
│  - nethogs: per-process bandwidth (which app?)            │
│  - nload/bmon: per-interface bandwidth (how much?)        │
│                                                           │
│  Benchmarking:                                            │
│  - iperf3: max throughput between two points              │
│                                                           │
│  Connection analysis:                                     │
│  - ss: socket states, queues, processes                   │
│  - ss -ti: TCP internals (RTT, cwnd, retrans)            │
│                                                           │
│  Path analysis:                                           │
│  - mtr: per-hop latency and packet loss                   │
│                                                           │
│  Packet capture:                                          │
│  - tcpdump: capture and analyze individual packets        │
│                                                           │
│  Error monitoring:                                        │
│  - ip -s link: interface error counters                   │
│  - ethtool -S: NIC-level statistics                       │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. iftop exploration: Run sudo iftop -P on your system and browse the web in another window. Watch the connections appear and disappear. Press n to toggle DNS resolution.

  2. nethogs discovery: Run sudo nethogs and start a large download or run apt update. Identify which process is consuming the most bandwidth.

  3. iperf3 benchmark: Set up iperf3 between two machines (or between your machine and a public iperf3 server). Measure your actual throughput and compare it to your link speed.

  4. ss deep dive: Run ss -t state established and identify all connections on your system. How many are SSH? HTTP? Are there any in CLOSE-WAIT state (indicating application bugs)?

  5. MTR diagnosis: Run mtr -r -c 100 to several different destinations (google.com, your ISP, a server in another country). Compare the latency and loss at each hop. Can you identify where the latency jumps significantly?

  6. Bonus challenge: Capture HTTP traffic with tcpdump -i eth0 -A port 80 while making a curl request to an HTTP (not HTTPS) site. Can you read the HTTP headers and body in the capture? Now try the same with port 443 -- can you read HTTPS traffic? Why or why not?