One of the things I’ve discovered over years of troubleshooting all sorts of connection problems is that it’s almost impossible to jump straight to the right conclusion. There are always things that surprise you, always aspects that are unexpected, and always presumptions that you’ve made about the user.
I’ve developed a set of steps based on conditions I’ve come across. Hopefully they help.
This post isn’t meant to be an exhaustive volume on how networking works, just a quick primer for the software developer or system administrator who can’t work out why their stuff doesn’t work.
Firstly check the endpoints
Can they reach out in general? Can both ends touch something else (the Internet, another machine, etc.)? If so, you at least have basic connectivity. Now check your service: can you telnet to it from the same server by running telnet localhost 80? I’ve seen many cases where the service either wasn’t started, or a user wanted connectivity to a service that hadn’t even been built.
Where are you listening? Make sure the service is actually bound to an address that other machines can reach (the server’s own IP, or all interfaces) and not just 127.0.0.1.
Check that something is listening with netstat -lt or a similar tool (add -n to see numeric addresses and ports). If you need to test connectivity before the real service has been built, put a temporary listener on the port with netcat or your favourite replacement.
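If netcat isn’t installed, a few lines of Python can stand in for it. This is only a minimal sketch: the function names start_test_listener and is_listening are made up for illustration, and it assumes plain IPv4 TCP. Binding to 0.0.0.0 listens on every interface, whereas binding to 127.0.0.1 reproduces the "only listening on localhost" mistake described above.

```python
import socket


def start_test_listener(host="0.0.0.0", port=0):
    """Open a throwaway TCP listener (hypothetical helper, netcat stand-in).

    Port 0 asks the OS to pick any free port; pass a real port number to
    test a specific one. The caller is responsible for closing the socket.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False
```

Start the listener on the server, read the chosen port back with srv.getsockname()[1], and then call is_listening from the client with the server’s address and that port.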
Open access test
If you have an open access VPN, a central router you can go through, a server deep inside your internal network, or an open port on the switch connected to your endpoint, then it’s worth testing from there too. This eliminates host-based firewalls, port conflicts, and other silent killers.
Don’t believe ping
Quite often you’ll see connectivity tests based entirely on ping, and users will report connectivity problems because someone once taught them to ping a place on the Internet to check their connection. Networks often filter pings, so a failed ping on its own proves very little; take a look at my previous post.
If your network does pass ICMP traffic, run a traceroute to confirm that there’s a route to the endpoint. Looking at the hops in the traceroute, you should be able to list the firewalls and routers which may have rules or ACLs preventing the traffic from passing.
Test the connection with telnet against the service’s port. If it connects and prints the line Escape character is '^]'. then the port is reachable and you’re all good.
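The same test is easy to script when there are many hosts or ports to check. The sketch below is an illustration only (probe_tcp is a made-up name), and the interpretation of the results is a rule of thumb rather than a guarantee: a refused connection usually means the host was reached but nothing is listening on that port (or a firewall actively rejected it), while a timeout often means something along the way is silently dropping the traffic.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connection and classify the outcome (hypothetical helper).

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The far end answered, but nothing accepted the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else: name resolution failure, no route to host, etc.
        return f"error: {exc}"
```

For example, probe_tcp("server.example.net", 80) checks the web port on the example host name used in this post.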
Check the DNS
If you’re hitting a name instead of an IP, check that it resolves to the right place. Sometimes you’ll need a fully qualified domain name to resolve correctly. Depending on its DNS search domains, for example, your machine may resolve server to server.example.com instead of server.example.net.
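A quick way to see what a name really resolves to on the machine in question is to ask the system resolver directly. This is a rough sketch with made-up helper names (resolve and same_target). It uses the same lookup path most applications use, so it also picks up hosts-file entries and search domains.

```python
import socket


def resolve(name):
    """Return the sorted IP addresses `name` resolves to on this machine.

    Raises socket.gaierror if the name cannot be resolved at all.
    """
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def same_target(short_name, fqdn):
    """Compare what a short name and a fully qualified name resolve to.

    Returns (match, short_ips, fqdn_ips) so a mismatch can be inspected.
    """
    short_ips = resolve(short_name)
    fqdn_ips = resolve(fqdn)
    return short_ips == fqdn_ips, short_ips, fqdn_ips
```

Calling same_target("server", "server.example.net") on the affected machine shows whether the short name lands on the host you expect. The names here are just the placeholders from the paragraph above.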
If you haven’t figured it out by this point, you may want to start looking at firewall logs and routing tables. If you’re completely lost, try working out what does work and finding out why.