How do you find the bottleneck of a network?

wop@infosec.pub · edit-2 1 year ago

How do you find the bottleneck of a network?

Kazaii@sh.itjust.works · 1 year ago

Pretty good suggestions here. Can’t remember the last time I saw such quality replies on r/networking.

InEnduringGrowStrong@sh.itjust.works · 1 year ago

Care to tell me those numbers?

- RTT
- TCP Window Size (RWND)
- TCP MSS (that or MTU, inside the tunnel)

Honestly, that sounds like the TCP bandwidth-delay product.
With a 50 ms RTT and a 65 KB RWND, a single TCP flow tops out at around 10 Mbps.
See for yourself with your numbers:
https://wintelguy.com/wanperf.pl
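
If you'd rather do the arithmetic locally, here is a minimal Python sketch of the same calculation. The function names are made up for illustration, and it assumes a single TCP flow with no window scaling and no packet loss, so treat the result as a ceiling rather than a prediction. The 100 Mbit/s figure is the link speed mentioned elsewhere in this thread.

```python
def max_tcp_throughput_bps(rwnd_bytes: int, rtt_seconds: float) -> float:
    """Ceiling on single-flow TCP throughput: at most one receive window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return rwnd_bytes * 8 / rtt_seconds


def required_window_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_bps * rtt_seconds / 8


# Numbers from this thread: 65,535-byte RWND (no window scaling), 50 ms RTT.
ceiling_mbps = max_tcp_throughput_bps(65_535, 0.050) / 1e6
print(f"{ceiling_mbps:.2f} Mbit/s")  # prints "10.49 Mbit/s"

# Window needed to fill a 100 Mbit/s link at the same RTT (roughly 625 KB).
print(required_window_bytes(100e6, 0.050))
```

If the ceiling you get from your own RTT and RWND lands near the speed you're observing, window size (or missing window scaling) is a likely suspect.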

InEnduringGrowStrong@sh.itjust.works · 1 year ago

Just saw this part:

the whole location in UK […]

Some VPN solutions clamp the MTU/MSS of every tunnel to the lowest common denominator across all connected peers. I guess that can make sense in a full mesh, but whatever.
Take a packet capture of another client while the problem one connects; you’ll likely see something.
Decrypted traffic is usually easier to analyze.

Ohhh, and you say that’s when they connect through SSH? Check that he’s not TCP forwarding all traffic through his SSH connection somehow.

wop@infosec.pub · 1 year ago

Getting a pcap of another client could bring some insight, yeah.

SSH is used for the data transfer. I don’t know for certain at the moment, but I’d assume scp or rsync. You mean whether all their internet traffic is routed through the active SSH session?

InEnduringGrowStrong@sh.itjust.works · 1 year ago

I mean that in an SSH connection you can configure it to bind local/remote ports of local/remote IPs.
The user might have unknowingly or maliciously configured their stuff to either:

- forward all their traffic through the SSH session, adding more bandwidth than you’re expecting
- remote port forward something important that’s somehow used by all your users to his machine. This is a bit unlikely, but then your symptoms are a bit weird.

Unlikely, because they couldn’t bind a port that is already in use on the server. Still, that could technically happen if there’s a misconfigured load balancer, maybe from an old config that was never removed, that has that server as a member and just declares it down/up when that user starts listening on that port.

That last one is far-fetched.
I’d start with cpu/mem, mtu/mss, etc.

I tend to have a bit of a bias towards absolutely far-fetched things because I’m basically the last line of support where I work. This means ~~all~~ most of the “normal” problems get filtered out before they get to me, which leaves me with the stuff that’s bananas.

taladar@sh.itjust.works · 1 year ago

Are you sure that the download speed is 10Mbit/s and not 10Mbyte/s which would be close to saturating the 100Mbit/s link and would explain the other symptoms you are seeing?

wop@infosec.pub · 1 year ago

Valid question. We’ve checked multiple times, both on the client and via monitoring, and it is 10 Mbit/s. Thank you.

taladar@sh.itjust.works · 1 year ago

Have you checked for resent packets or connection resets or similar things that might use up more bandwidth than the successfully received packets? I would probably use Wireguard for that.

wop@infosec.pub · 1 year ago

Not yet. Wouldn’t expect it tbh, but you never know. How would you utilize Wireguard for it? I’d like to hear more about it.

taladar@sh.itjust.works · 1 year ago

Oops, I meant Wireshark of course. Basically capture the packets and then check for any with errors.
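
Wireshark flags these itself (the display filter `tcp.analysis.retransmission` shows them), but to make clear what "check for resent packets" means, here is a rough Python sketch of the idea. It assumes you have already exported the capture into simple records; the `Segment` type and `count_retransmissions` function are hypothetical names, not part of any Wireshark API. The heuristic is deliberately simplified: it ignores sequence number wraparound and counts out-of-order arrivals as retransmissions.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Segment(NamedTuple):
    """One captured TCP segment (hypothetical record exported from a pcap)."""
    src: str
    sport: int
    dst: str
    dport: int
    seq: int
    payload_len: int


Flow = Tuple[str, int, str, int]


def count_retransmissions(segments: Iterable[Segment]) -> Dict[Flow, int]:
    """Count data segments that end at or below the highest byte already seen in that flow direction."""
    highest_end: Dict[Flow, int] = {}
    retransmissions: Dict[Flow, int] = defaultdict(int)
    for seg in segments:
        if seg.payload_len == 0:
            continue  # pure ACKs carry no data, so they cannot be data retransmissions
        flow = (seg.src, seg.sport, seg.dst, seg.dport)
        end = seg.seq + seg.payload_len
        if flow in highest_end and end <= highest_end[flow]:
            retransmissions[flow] += 1  # these bytes were already sent once
        else:
            highest_end[flow] = end
    return dict(retransmissions)
```

A high count relative to the total number of data segments would suggest loss somewhere on the path, which eats bandwidth without showing up as useful throughput.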

wop@infosec.pub · 1 year ago

Gotcha! I thought Wireguard might have some logging features that could provide some insights. Thank you.