Routing & Network Troubleshooting
Why This Matters
It is 2 AM. Your monitoring system is firing alerts: the application is down. You SSH into the server and discover the app itself is fine -- it just cannot reach the database server on another subnet. Or perhaps a DNS change has gone wrong. Or maybe a newly added firewall rule is silently eating packets.
Network problems are among the most common and most stressful issues you will face as a Linux admin. They are also the most satisfying to solve, because Linux gives you incredible tools to peel back every layer of the network stack and see exactly what is happening. This chapter teaches you a systematic approach to diagnosing network issues and the tools that make it possible.
Try This Right Now
Run these commands and observe the output. They form the foundation of every network troubleshooting session:
# Can I reach the local network?
ping -c 3 "$(ip route show default | awk '{print $3; exit}')"
# Can I reach the internet by IP?
ping -c 3 1.1.1.1
# Can I resolve DNS?
ping -c 3 google.com
# What is my routing table?
ip route show
# What connections are active right now?
ss -tunap
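If the first ping fails, you have a local network problem. If the first works but the second fails, the problem is upstream of your gateway (routing, NAT, or your ISP). If the second works but the third fails, you have a DNS problem. If all three work, your basic connectivity is fine and the problem is elsewhere.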
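The decision logic above can be written down as a small helper. This is a minimal sketch, not a standard tool: the function name diagnose and its messages are made up for illustration, and it only interprets the three exit statuses you hand it.

```shell
#!/bin/sh
# diagnose GW_STATUS IP_STATUS DNS_STATUS
# Each argument is the exit status of one ping (0 = success).
# The name and the messages are illustrative assumptions.
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "local network problem"
    elif [ "$2" -ne 0 ]; then
        echo "upstream routing problem"
    elif [ "$3" -ne 0 ]; then
        echo "DNS problem"
    else
        echo "basic connectivity OK"
    fi
}

# Live usage (commented out so the sketch sends no traffic):
# gw=$(ip route show default | awk '{print $3; exit}')
# ping -c 3 "$gw"      >/dev/null 2>&1; a=$?
# ping -c 3 1.1.1.1    >/dev/null 2>&1; b=$?
# ping -c 3 google.com >/dev/null 2>&1; c=$?
# diagnose "$a" "$b" "$c"

diagnose 0 0 1   # prints: DNS problem
```

Checking the statuses in this order matters: a failed gateway ping makes the later two results meaningless, so the first failure decides the verdict.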
The Linux Routing Table
Every Linux system maintains a routing table that tells the kernel where to send packets. When a packet needs to leave your machine, the kernel looks up the destination IP in the routing table and picks the best matching route.
Viewing the Routing Table
# Modern way
ip route show
# Example output:
# default via 192.168.1.1 dev eth0 proto dhcp metric 100
# 10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.1
# 192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.100
# Legacy way (avoid, but you'll see it in old docs)
route -n
netstat -rn
Let's decode this:
+----------------------------------------+-----------------------------+
| Route                                  | Meaning                     |
|----------------------------------------|-----------------------------|
| default via 192.168.1.1 dev eth0       | Default gateway: send all   |
|                                        | unknown traffic to          |
|                                        | 192.168.1.1 via eth0        |
|                                        |                             |
| 10.0.0.0/24 dev eth1 src 10.0.0.1      | The 10.0.0.0/24 network is  |
|                                        | directly attached to eth1.  |
|                                        | Use source IP 10.0.0.1.     |
|                                        |                             |
| 192.168.1.0/24 dev eth0                | The 192.168.1.0/24 network  |
|   src 192.168.1.100                    | is directly attached to     |
|                                        | eth0.                       |
+----------------------------------------+-----------------------------+
How Routing Decisions Work
The kernel uses longest prefix match: it picks the most specific route that matches the destination.
Destination: 10.0.0.50
Routing table:
default via 192.168.1.1 dev eth0 (/0 -- matches everything)
10.0.0.0/24 dev eth1 (/24 -- matches 10.0.0.*)
10.0.0.48/30 dev eth2 (/30 -- matches 10.0.0.48-51)
Winner: 10.0.0.48/30 via eth2 (longest prefix = most specific)
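To make longest prefix match concrete, here is a rough sketch that replays the example above in plain shell arithmetic. It is only a teaching model, assuming IPv4 and ignoring metrics and policy routing. The function names (ip_to_int, in_cidr, best_route) are invented for this example. On a real system, ip route get 10.0.0.50 asks the kernel for its actual decision.

```shell
#!/bin/sh
# Teaching model of longest prefix match (IPv4 only, no metrics).
# All function names here are illustrative assumptions.

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NET/LEN -> exit status 0 if IP lies inside the prefix
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    if [ "$len" -eq 0 ]; then return 0; fi   # /0 matches everything
    mask=$(( (4294967295 << (32 - len)) & 4294967295 ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# best_route DEST -> reads "PREFIX DEV" lines on stdin, prints the winner
best_route() {
    dest=$1
    best_len=-1
    best=""
    while read -r prefix dev; do
        if [ "$prefix" = "default" ]; then prefix=0.0.0.0/0; fi
        if in_cidr "$dest" "$prefix" && [ "${prefix#*/}" -gt "$best_len" ]; then
            best_len=${prefix#*/}
            best="$prefix $dev"
        fi
    done
    echo "$best"
}

best_route 10.0.0.50 <<'EOF'
default eth0
10.0.0.0/24 eth1
10.0.0.48/30 eth2
EOF
# prints: 10.0.0.48/30 eth2
```

All three routes match 10.0.0.50, and the /30 wins only because its prefix is the longest. Change the destination to 10.0.0.99 and the /30 no longer matches, so the /24 takes over.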
The Default Gateway
The default gateway is the route of last resort. If no other route matches a destination, the packet goes to the default gateway. On most single-homed machines, this is the only route that matters for internet-bound traffic.
# View the default gateway
ip route show default
# Set a default gateway (temporary)
sudo ip route add default via 192.168.1.1
# Replace the default gateway
sudo ip route replace default via 192.168.1.254
# Delete the default gateway
sudo ip route del default
Adding Static Routes
Static routes tell the kernel about networks that are not directly connected but are reachable through a specific gateway.
# Add a static route: "to reach 10.10.0.0/16, go via 192.168.1.254"
sudo ip route add 10.10.0.0/16 via 192.168.1.254
# Add a route through a specific interface (the gateway must be another
# host on that link, not this machine's own address)
sudo ip route add 172.16.0.0/12 via 10.0.0.254 dev eth1
# Add a route with a specific metric (lower = preferred)
sudo ip route add 10.20.0.0/16 via 192.168.1.254 metric 200
# Delete a static route
sudo ip route del 10.10.0.0/16
Think About It: You add a static route to 10.10.0.0/16 via 192.168.1.254, but pinging 10.10.0.5 still fails. The gateway 192.168.1.254 is reachable (you can ping it). What could be wrong?
Several possibilities: the gateway at 192.168.1.254 does not have a route to 10.10.0.0/16 either, the gateway does not have IP forwarding enabled, a firewall on the gateway is blocking forwarded traffic, or the destination host does not have a return route back to you.
Making Static Routes Persistent
As always, ip route add is temporary. To persist routes:
NetworkManager (nmcli):
sudo nmcli connection modify "my-connection" +ipv4.routes "10.10.0.0/16 192.168.1.254"
sudo nmcli connection up "my-connection"
systemd-networkd (add to your .network file):
[Route]
Destination=10.10.0.0/16
Gateway=192.168.1.254
Debian /etc/network/interfaces (add to the interface stanza):
up ip route add 10.10.0.0/16 via 192.168.1.254
Netplan:
network:
  ethernets:
    eth0:
      routes:
        - to: 10.10.0.0/16
          via: 192.168.1.254
IP Forwarding: Turning Linux into a Router
By default, Linux drops packets that arrive on one interface and are destined for another. To make Linux forward packets (act as a router), you must enable IP forwarding.
# Check current setting (0 = disabled, 1 = enabled)
cat /proc/sys/net/ipv4/ip_forward
# Enable temporarily
sudo sysctl -w net.ipv4.ip_forward=1
# Enable permanently
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-ip-forward.conf
sudo sysctl -p /etc/sysctl.d/99-ip-forward.conf
For IPv6 forwarding:
sudo sysctl -w net.ipv6.conf.all.forwarding=1
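When you are debugging a Linux router, it helps to read both forwarding switches in one go. The sketch below assumes the usual /proc paths behind the two sysctl keys above. The function name forwarding_state is made up for this example.

```shell
#!/bin/sh
# forwarding_state [FILE]
# Prints enabled, disabled, or unknown for a forwarding switch.
# FILE defaults to the IPv4 switch; the function name is an illustrative assumption.
forwarding_state() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    case "$(cat "$file" 2>/dev/null)" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

echo "IPv4 forwarding: $(forwarding_state)"
echo "IPv6 forwarding: $(forwarding_state /proc/sys/net/ipv6/conf/all/forwarding)"
```

Reading the file only tells you the current runtime value. A setting made with sysctl -w is lost on reboot unless it is also written to a file under /etc/sysctl.d/.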
Simple NAT Gateway
A common setup: Linux machine with two interfaces acting as a gateway for an internal network.
  Internet                     Internal Network
     |                                 |
     | eth0 (public IP)                | eth1 (10.0.0.1/24)
     +--------[ Linux Gateway ]--------+
               IP forwarding ON
               NAT (masquerade)
                                       |
                              [ Internal hosts ]
                                  10.0.0.0/24
# Enable forwarding
sudo sysctl -w net.ipv4.ip_forward=1
# Add masquerade rule (replace eth0 with your external interface)
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# Allow forwarding between interfaces
sudo iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
Internal hosts need their default gateway set to 10.0.0.1, and they will be able to reach the internet through the Linux gateway.
The Troubleshooting Methodology
When network connectivity fails, work through the layers systematically. Do not skip ahead -- each step rules out a category of problems.
+--------+----------------------+---------------+--------------------+
| Step   | Check                | Tool          | Rules Out          |
|--------|----------------------|---------------|--------------------|
| 1      | Is the interface up? | ip link       | Physical/driver    |
| 2      | Do I have an IP?     | ip addr       | DHCP/config        |
| 3      | Can I reach the      | ping gateway  | Local network/     |
|        | gateway?             |               | ARP/switching      |
| 4      | Can I reach the      | ping 1.1.1.1  | Routing/gateway/   |
|        | internet by IP?      |               | ISP                |
| 5      | Does DNS resolve?    | dig, nslookup | DNS configuration  |
| 6      | Can I reach the      | curl, telnet  | Firewall/app/port  |
|        | target service?      |               | issues             |
+--------+----------------------+---------------+--------------------+
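The table can be turned into a script that stops at the first failing layer. What follows is a minimal sketch under several assumptions: run_checks and layer_checks are invented names, GATEWAY and TARGET are placeholders you set yourself, and getent hosts stands in for dig in step 5 because its exit status directly reflects whether the name resolved.

```shell
#!/bin/sh
# run_checks reads "label|command" lines on stdin, runs each command in
# order, and stops at the first failure. Only the first "|" on a line is
# treated as the separator, so commands may contain pipes.
run_checks() {
    while IFS='|' read -r label cmd; do
        if sh -c "$cmd" </dev/null >/dev/null 2>&1; then
            echo "PASS  $label"
        else
            echo "FAIL  $label"
            return 1
        fi
    done
}

# The six steps from the table. GATEWAY and TARGET are placeholders.
layer_checks() {
    cat <<EOF
interface up|ip -o link show | grep -q 'state UP'
IPv4 address assigned|ip -o -4 addr show scope global | grep -q inet
gateway reachable|ping -c 1 -W 2 $GATEWAY
internet reachable by IP|ping -c 1 -W 2 1.1.1.1
DNS resolves|getent hosts $TARGET
service answers|curl -s -o /dev/null --connect-timeout 5 http://$TARGET
EOF
}

# Live run (commented out so the sketch sends no traffic):
# GATEWAY=192.168.1.1 TARGET=web.example.com
# layer_checks | run_checks

# Harmless demonstration with true/false as stand-in commands:
printf '%s\n' 'step one|true' 'step two|false' 'step three|true' | run_checks || true
```

The demonstration prints a PASS line for step one and a FAIL line for step two, then stops. Step three never runs, which is the point: once a lower layer fails, results from the layers above it cannot be trusted.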
Let's go through each tool in detail.
Tool: ping -- Basic Connectivity
ping sends ICMP echo request packets and waits for replies. It tests basic IP
connectivity and measures round-trip time.
# Ping a host (Ctrl+C to stop)
ping 192.168.1.1
# Send exactly 5 pings
ping -c 5 google.com
# Set a timeout of 2 seconds per ping
ping -W 2 -c 3 10.0.0.1
# Flood ping (root only, sends as fast as possible)
sudo ping -f -c 1000 192.168.1.1
# Ping with a specific source interface
ping -I eth1 10.0.0.1
What ping results tell you:
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=12.3 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=57 time=11.8 ms
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 11.8/12.0/12.3/0.250 ms
- time: Round-trip time. Over 100ms to local resources is suspicious.
- ttl: Time To Live. Decreases at each hop. Helps identify how far away a host is.
- packet loss: On a wired local network, any loss deserves a look. Sustained loss above 1-2% is significant anywhere.
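If you want these numbers in a script or a monitoring check, you can pull them out of the summary lines. This is a sketch that assumes the iputils output format shown above. The function name ping_summary is made up for this example.

```shell
#!/bin/sh
# ping_summary reads ping output on stdin and prints "LOSS_PERCENT AVG_MS".
# Assumes the summary format shown above; the name is illustrative.
ping_summary() {
    awk '
        /packet loss/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { loss = $i; sub(/%/, "", loss) }
        }
        /^rtt|^round-trip/ {
            split($0, halves, " = ")       # right half holds min/avg/max/mdev
            split(halves[2], stats, "/")
            avg = stats[2]
        }
        END { print loss, avg }
    '
}

ping_summary <<'EOF'
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=12.3 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=57 time=11.8 ms
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 11.8/12.0/12.3/0.250 ms
EOF
# prints: 0 12.0
```

When every packet is lost, ping prints no rtt line, so the second field comes back empty. A calling script should treat an empty average as "no replies" rather than as zero latency.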
Think About It: You can ping 8.8.8.8 but not google.com. You can ping 1.1.1.1 but not cloudflare.com. What is the most likely problem?
DNS resolution is broken. IP connectivity works fine. Check /etc/resolv.conf for your
DNS server settings and test with dig google.com @8.8.8.8.
Tool: traceroute / tracepath -- Tracing the Path
traceroute shows every router (hop) between you and the destination. It works by
sending packets with increasing TTL values.
# Basic traceroute
traceroute google.com
# Use ICMP instead of UDP (sometimes more reliable)
sudo traceroute -I google.com
# Use TCP SYN on port 80 (gets through more firewalls)
sudo traceroute -T -p 80 google.com
# tracepath (no root needed, uses UDP)
tracepath google.com
Example output:
traceroute to google.com (142.250.80.46), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 1.234 ms 1.112 ms 1.001 ms
2 10.0.0.1 (10.0.0.1) 5.432 ms 5.321 ms 5.210 ms
3 isp-router.example.com (203.0.113.1) 10.123 ms 10.234 ms 10.345 ms
4 * * *
5 142.250.80.46 (142.250.80.46) 15.678 ms 15.567 ms 15.456 ms
- * * * means that hop did not respond. This is often normal -- many routers are configured not to reply to traceroute probes.
- If the trace stops at a certain hop and never progresses, there is likely a routing problem or firewall at that hop.
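To find where a trace stalls, look for the last hop that answered any probe. The sketch below does that over saved traceroute text. It assumes the output layout shown above, and the function name last_hop is invented for this example.

```shell
#!/bin/sh
# last_hop reads traceroute output on stdin and prints "HOP ADDRESS" for
# the last hop that answered at least one probe. The name is illustrative.
last_hop() {
    awk '
        $1 ~ /^[0-9]+$/ {
            for (i = 2; i <= NF; i++)
                if ($i != "*") { hop = $1; addr = $i; break }
        }
        END { if (hop != "") print hop, addr }
    '
}

last_hop <<'EOF'
traceroute to google.com (142.250.80.46), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  1.234 ms  1.112 ms  1.001 ms
 2  10.0.0.1 (10.0.0.1)  5.432 ms  5.321 ms  5.210 ms
 3  isp-router.example.com (203.0.113.1)  10.123 ms  10.234 ms  10.345 ms
 4  * * *
 5  142.250.80.46 (142.250.80.46)  15.678 ms  15.567 ms  15.456 ms
EOF
# prints: 5 142.250.80.46
```

Here the last responding hop is the destination itself, so the silent hop 4 is harmless. If the last responding address is not the destination, the hop after it is where to start looking.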
Distro Note: traceroute is not always installed by default. Install it with sudo apt install traceroute (Debian/Ubuntu) or sudo dnf install traceroute (RHEL/Fedora). tracepath (from iputils) is usually pre-installed.
Tool: dig -- DNS Troubleshooting
dig is the gold standard for DNS troubleshooting. It queries DNS servers directly and
shows the full response.
# Basic lookup
dig google.com
# Query a specific DNS server
dig @8.8.8.8 google.com
# Look up a specific record type
dig google.com MX
dig google.com AAAA
dig google.com NS
# Short output (just the answer)
dig +short google.com
# Trace the full resolution path
dig +trace google.com
# Reverse DNS lookup
dig -x 8.8.8.8
When DNS is not working, the most useful test is:
# Test with your configured DNS server
dig google.com
# Test with a known-good public DNS server
dig @8.8.8.8 google.com
If the second works but the first does not, your configured DNS server (in
/etc/resolv.conf) is the problem.
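The comparison between your configured resolver and a public one can be captured in a few lines. This is a sketch only: dns_verdict is an invented name, and it simply compares two answer strings that you collect yourself.

```shell
#!/bin/sh
# dns_verdict LOCAL_ANSWER PUBLIC_ANSWER
# An empty string means that resolver returned no answer.
# The name and messages are illustrative assumptions.
dns_verdict() {
    if [ -n "$1" ]; then
        echo "configured resolver works"
    elif [ -n "$2" ]; then
        echo "configured resolver is the problem"
    else
        echo "both failed: name may not exist, or DNS traffic is blocked"
    fi
}

# Live usage (commented out so the sketch sends no queries).
# grep -v '^;' drops any diagnostic lines dig may print.
# local_ans=$(dig +short google.com | grep -v '^;')
# public_ans=$(dig +short @8.8.8.8 google.com | grep -v '^;')
# dns_verdict "$local_ans" "$public_ans"

dns_verdict "" "142.250.80.46"   # prints: configured resolver is the problem
```

The third branch is worth keeping separate. If even a public resolver returns nothing, changing /etc/resolv.conf will not help, and outbound port 53 or the name itself is the next thing to check.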
# Check your DNS configuration
cat /etc/resolv.conf
# On systems using systemd-resolved
resolvectl status
Tool: ss -- Socket Statistics
ss replaces the older netstat command. It shows listening ports, active connections,
and socket details.
# Show all listening TCP ports
ss -tlnp
# Show all listening UDP ports
ss -ulnp
# Show all established connections
ss -tnp
# Show connections to a specific port
ss -tnp | grep :443
# Show socket summary statistics
ss -s
# Show all TCP sockets in all states
ss -ta
Breaking down the flags:
- -t: TCP
- -u: UDP
- -l: Listening only
- -n: Show numbers (no DNS resolution)
- -p: Show the process using each socket
Example output:
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1234,fd=3))
LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=5678,fd=6))
ESTAB 0 0 192.168.1.100:22 192.168.1.50:54321 users:(("sshd",pid=9012,fd=4))
This tells you:
- SSH is listening on all interfaces, port 22
- Nginx is listening on all interfaces, port 80
- There is one active SSH connection from 192.168.1.50
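Piping ss into grep :80 also matches :8080, so for scripted checks it is safer to split out the port and compare it exactly. The sketch below assumes the ss -tlnp column layout shown above. The function name listeners is made up for this example.

```shell
#!/bin/sh
# listeners reads `ss -tlnp` output on stdin and prints "PORT PROCESS"
# for each listening socket. Assumes the column layout shown above;
# the function name is illustrative.
listeners() {
    awk '
        $1 == "LISTEN" {
            n = split($4, parts, ":")       # port is the text after the last colon
            proc = "-"
            if (match($0, /\(\("[^"]+"/))   # process name inside users:(("name",
                proc = substr($0, RSTART + 3, RLENGTH - 4)
            print parts[n], proc
        }
    '
}

listeners <<'EOF'
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*          users:(("sshd",pid=1234,fd=3))
LISTEN  0       511     0.0.0.0:80          0.0.0.0:*          users:(("nginx",pid=5678,fd=6))
ESTAB   0       0       192.168.1.100:22    192.168.1.50:54321 users:(("sshd",pid=9012,fd=4))
EOF
# prints:
# 22 sshd
# 80 nginx
```

On a live system you would run sudo ss -tlnp | listeners | awk '$1 == 80' to test for port 80 alone. Without root, ss omits the process column for sockets you do not own, and the sketch prints - in its place.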
Tool: tcpdump -- Packet Capture
tcpdump is your most powerful network troubleshooting tool. It captures and displays
actual packets on the wire. When all else fails, tcpdump tells you exactly what is
happening.
# Capture all traffic on eth0
sudo tcpdump -i eth0
# Capture only traffic to/from a specific host
sudo tcpdump -i eth0 host 10.0.0.5
# Capture only TCP traffic on port 80
sudo tcpdump -i eth0 tcp port 80
# Capture DNS traffic
sudo tcpdump -i eth0 port 53
# Capture ICMP (ping)
sudo tcpdump -i eth0 icmp
# Show packet contents in ASCII
sudo tcpdump -i eth0 -A port 80
# Show packet contents in hex and ASCII
sudo tcpdump -i eth0 -XX port 80
# Save capture to a file (for analysis in Wireshark)
sudo tcpdump -i eth0 -w capture.pcap
# Read a saved capture
sudo tcpdump -r capture.pcap
# Capture only 100 packets
sudo tcpdump -i eth0 -c 100
# Don't resolve hostnames (faster)
sudo tcpdump -i eth0 -n
tcpdump Filter Expressions
You can build complex filters:
# Traffic to or from a subnet
sudo tcpdump -i eth0 net 10.0.0.0/24
# Traffic from a source to a specific destination port
sudo tcpdump -i eth0 src 192.168.1.50 and dst port 443
# Traffic that is NOT SSH (filter out your own session)
sudo tcpdump -i eth0 not port 22
# Packets with the SYN flag set (SYN and SYN-ACK, i.e. connection setup)
sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0'
# IPv4 packets on port 80 that carry data (skips SYN, FIN, and ACK-only packets)
sudo tcpdump -i eth0 -A 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
Reading tcpdump Output
14:23:45.123456 IP 192.168.1.100.54321 > 93.184.216.34.80: Flags [S], seq 1234567890, win 65535, length 0
14:23:45.234567 IP 93.184.216.34.80 > 192.168.1.100.54321: Flags [S.], seq 987654321, ack 1234567891, win 65535, length 0
14:23:45.234600 IP 192.168.1.100.54321 > 93.184.216.34.80: Flags [.], ack 1, win 65535, length 0
This is a TCP three-way handshake:
- [S] -- SYN: Client initiates connection
- [S.] -- SYN-ACK: Server acknowledges and responds
- [.] -- ACK: Client acknowledges, connection established
Common flags: [S] SYN, [S.] SYN-ACK, [.] ACK, [P.] PSH-ACK (data), [F.]
FIN-ACK (close), [R] RST (reset/reject).
Safety Warning: tcpdump can capture sensitive data including passwords sent in plain text, session tokens, and personal information. Use it responsibly and be mindful of capture files stored on disk.
Tool: curl -- Application-Layer Testing
curl tests HTTP/HTTPS connectivity and is essential for verifying web services.
# Basic request
curl http://example.com
# Show response headers
curl -I http://example.com
# Verbose output (shows the full connection process)
curl -v https://example.com
# Follow redirects
curl -L http://example.com
# Test a specific port
curl http://10.0.0.5:8080
# Set a timeout
curl --connect-timeout 5 --max-time 10 http://example.com
# Test with a specific Host header (useful for virtual hosts)
curl -H "Host: mysite.com" http://10.0.0.5
# Test HTTPS, ignoring certificate errors
curl -k https://self-signed-server.local
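curl's exit status tells you which layer failed, which is handy in scripts where you cannot read the error text. Here is a small sketch covering a handful of the documented exit codes. The function name explain_curl_exit and the wording of the hints are this example's own.

```shell
#!/bin/sh
# explain_curl_exit CODE
# Maps a few documented curl exit codes to a troubleshooting hint.
# The function name and hint wording are illustrative assumptions.
explain_curl_exit() {
    case "$1" in
        0)  echo "request completed" ;;
        6)  echo "could not resolve host: check DNS" ;;
        7)  echo "failed to connect: nothing listening, or connection rejected" ;;
        28) echo "timed out: packets are probably being dropped" ;;
        35) echo "TLS handshake failed" ;;
        60) echo "server certificate could not be verified" ;;
        *)  echo "curl exit code $1: see EXIT CODES in the curl man page" ;;
    esac
}

# Live usage (commented out so the sketch sends no traffic):
# curl -s -o /dev/null --connect-timeout 5 http://10.0.0.5:8080
# explain_curl_exit $?

explain_curl_exit 28   # prints: timed out: packets are probably being dropped
```

Note that exit code 0 only means curl got an HTTP response. A 404 or 500 still counts as success unless you add -f, which makes curl fail on HTTP error statuses.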
Hands-On: Systematic Network Troubleshooting
Here is a real-world troubleshooting workflow. Imagine you cannot reach a web server at
web.example.com.
Step 1: Is my interface up and do I have an IP?
ip link show
ip addr show
Look for state UP and a valid inet address. If the interface is down or has no IP,
fix that first (see Chapter 33).
Step 2: Can I reach my gateway?
ip route show default
ping -c 3 192.168.1.1 # (your gateway)
If this fails, the problem is local: check cables, switch, VLAN, ARP table.
ip neigh show # Check ARP cache
Step 3: Can I reach an external IP?
ping -c 3 1.1.1.1
If this fails but the gateway ping works, the problem is upstream: routing, ISP, or the gateway is not forwarding traffic.
Step 4: Does DNS work?
dig web.example.com
dig @8.8.8.8 web.example.com
If DNS fails with your configured server but works with 8.8.8.8, update your DNS configuration.
Step 5: Can I reach the target service?
curl -v http://web.example.com
curl -v --connect-timeout 5 http://web.example.com:80
If DNS resolves and you can ping the server but curl times out, there may be a firewall blocking port 80.
Step 6: Is the port actually open on the remote end?
# Test if a port is open
ss -tln | grep :80 # On the server itself
sudo nmap -p 80 web.example.com # From your machine (if nmap is available)
Step 7: Capture packets to see what is happening
# On the server, watch for incoming connections on port 80
sudo tcpdump -i eth0 tcp port 80 -n
# From your machine, try to connect
curl http://web.example.com
If you see SYN packets arriving but no SYN-ACK response, the server's firewall is dropping them. If you see no packets at all, they are being dropped somewhere between you and the server.
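The same reasoning can be applied mechanically to a saved capture. The following sketch classifies tcpdump text output by which handshake flags appear. It assumes the Flags [...] notation shown earlier, and the function name handshake_verdict is invented for this example.

```shell
#!/bin/sh
# handshake_verdict reads tcpdump text output on stdin and reports how
# far the TCP handshake got. The function name is illustrative.
handshake_verdict() {
    awk '
        /Flags \[S\]/   { syn++ }
        /Flags \[S\.\]/ { synack++ }
        /Flags \[R/     { rst++ }
        END {
            if (synack > 0)   print "SYN-ACK seen: handshake is being answered"
            else if (rst > 0) print "RST seen: port closed or connection rejected"
            else if (syn > 0) print "SYN only: no reply, packets are probably being dropped"
            else              print "no SYN seen: traffic is not reaching this capture point"
        }
    '
}

handshake_verdict <<'EOF'
14:23:01 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
14:23:02 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
14:23:04 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
EOF
# prints: SYN only: no reply, packets are probably being dropped
```

Where you capture matters as much as what you see. Repeated SYNs on the client say only that no reply came back. The same pattern captured on the server proves the packets arrived and were ignored there.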
Common Problems and Solutions
Problem: "Network is unreachable"
$ ping 10.0.0.5
connect: Network is unreachable
Cause: No route to the destination network.
Fix: Add a route or default gateway:
sudo ip route add default via 192.168.1.1
Problem: "No route to host"
$ ping 192.168.1.50
From 192.168.1.100 icmp_seq=1 Destination Host Unreachable
Cause: The destination is on the local network but not responding to ARP.
Fix: Check that the destination host is powered on, on the same VLAN, and has the correct IP configured. Check ARP:
ip neigh show
Problem: "Connection refused"
$ curl http://10.0.0.5:80
curl: (7) Failed to connect to 10.0.0.5 port 80: Connection refused
Cause: The port is not open. The host is reachable, but nothing is listening.
Fix: Start the service on the target host:
sudo systemctl start nginx
ss -tlnp | grep :80
Problem: "Connection timed out"
$ curl --connect-timeout 5 http://10.0.0.5:80
curl: (28) Connection timed out
Cause: Usually a firewall silently dropping packets (no response, not even a reject). A remote host that is down, or a missing return route, produces the same symptom.
Fix: Check firewall rules on the target host and any intermediate firewalls:
sudo iptables -L -n -v | grep 80
Problem: "Name resolution failed"
$ ping google.com
ping: google.com: Temporary failure in name resolution
Cause: DNS is not configured or not reachable.
Fix:
cat /etc/resolv.conf
# If empty or wrong, temporarily fix:
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
Debug This
Scenario: A developer says their application on Server A (192.168.1.100) cannot connect to the API on Server B (10.0.0.50, port 8080). You investigate:
# From Server A:
$ ping 10.0.0.50
PING 10.0.0.50 (10.0.0.50) 56(84) bytes of data.
64 bytes from 10.0.0.50: icmp_seq=1 ttl=63 time=1.23 ms
$ curl --connect-timeout 5 http://10.0.0.50:8080
curl: (28) Connection timed out after 5001 milliseconds
$ sudo tcpdump -i eth0 host 10.0.0.50 and port 8080 -c 5
14:23:01 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
14:23:02 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
14:23:04 IP 192.168.1.100.54321 > 10.0.0.50.8080: Flags [S], seq 12345
What does this tell you, and what do you do next?
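Diagnosis: Ping works (ICMP is allowed), but TCP to port 8080 times out. The tcpdump shows the SYN being sent and retransmitted, with no SYN-ACK and no RST coming back. The silence is the clue: a closed port with no firewall in the way would answer with a RST, and curl would report "Connection refused" instead of timing out. This means either:
- A firewall on Server B is dropping TCP packets to port 8080
- A network firewall between the two is blocking port 8080
- The application on Server B is not listening on port 8080, and a firewall is also suppressing the RST that would normally come back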
Next steps: SSH into Server B and check:
# Is anything listening on port 8080?
ss -tlnp | grep 8080
# What does the firewall look like?
sudo iptables -L -n -v | grep 8080
What Just Happened?
+-------------------------------------------------------------------+
| Chapter 35 Recap |
+-------------------------------------------------------------------+
| |
| * The routing table determines where packets are sent. |
| Longest prefix match wins. |
| |
| * ip route manages routes. Static routes need persistence |
| via nmcli, netplan, or config files. |
| |
| * IP forwarding turns Linux into a router. |
| Enable via sysctl net.ipv4.ip_forward=1. |
| |
| * Troubleshooting order: interface -> IP -> gateway -> |
| internet -> DNS -> service/port. |
| |
| * Key tools: |
| - ping: basic connectivity |
| - traceroute: path to destination |
| - dig: DNS queries |
| - ss: listening ports and connections |
| - tcpdump: actual packet capture |
| - curl: HTTP-level testing |
| |
| * Error messages tell you exactly what layer is broken: |
| "Network unreachable" = no route |
| "Connection refused" = nothing listening |
| "Connection timed out" = firewall dropping packets |
| |
+-------------------------------------------------------------------+
Try This
- Route tracing: Run traceroute (or tracepath) to five different websites. Compare the paths. Can you identify which hops belong to your ISP?
- tcpdump practice: Start a tcpdump capture on port 80, then open a web page in your browser. Identify the TCP handshake, the HTTP request, and the response.
- Break and fix: On a test VM, remove the default gateway (sudo ip route del default). Observe what breaks. Then add it back and verify.
- DNS investigation: Use dig +trace google.com to see the full DNS resolution chain from root servers to authoritative servers. How many DNS servers are involved?
- Bonus challenge: Set up two VMs on different subnets. Configure a third VM as a router between them (with IP forwarding and proper routes). Verify that the two VMs can ping each other through the router.