An interesting observation over the past weeks: Linux seems to select only even source ports. So let's have a closer look.
- An example
- Does this have any effect?
- How does LACP work?
- Does this make a difference? Yes, it certainly does make a difference.
- What can we do?
- Where does this come from?
- Conclusion
- Calculation in Python
- Cisco
- Versions
An example
First, let's see this for ourselves. We can use iperf3 to fire up some parallel sessions:
> iperf3 -c 10.255.93.77 -P 10
Connecting to host freebsd.rd.pp52.de, port 5201
[ 5] local 10.10.4.253 port 47806 connected to 10.255.93.77 port 5201
[ 7] local 10.10.4.253 port 47816 connected to 10.255.93.77 port 5201
[ 9] local 10.10.4.253 port 47824 connected to 10.255.93.77 port 5201
[ 11] local 10.10.4.253 port 47836 connected to 10.255.93.77 port 5201
[ 13] local 10.10.4.253 port 47852 connected to 10.255.93.77 port 5201
[ 15] local 10.10.4.253 port 47862 connected to 10.255.93.77 port 5201
[ 17] local 10.10.4.253 port 47864 connected to 10.255.93.77 port 5201
[ 19] local 10.10.4.253 port 47874 connected to 10.255.93.77 port 5201
[ 21] local 10.10.4.253 port 47882 connected to 10.255.93.77 port 5201
[ 23] local 10.10.4.253 port 47886 connected to 10.255.93.77 port 5201
This is in contrast to FreeBSD-based hosts, where you see both even and odd ports:
$ iperf3 -c 10.255.93.77 -P 10
Connecting to host 10.10.93.77, port 5201
[ 5] local 10.10.4.251 port 54819 connected to 10.255.93.77 port 5201
[ 7] local 10.10.4.251 port 24383 connected to 10.255.93.77 port 5201
[ 9] local 10.10.4.251 port 48869 connected to 10.255.93.77 port 5201
[ 11] local 10.10.4.251 port 40010 connected to 10.255.93.77 port 5201
[ 13] local 10.10.4.251 port 55178 connected to 10.255.93.77 port 5201
[ 15] local 10.10.4.251 port 37904 connected to 10.255.93.77 port 5201
[ 17] local 10.10.4.251 port 39569 connected to 10.255.93.77 port 5201
[ 19] local 10.10.4.251 port 45923 connected to 10.255.93.77 port 5201
[ 21] local 10.10.4.251 port 35810 connected to 10.255.93.77 port 5201
[ 23] local 10.10.4.251 port 22573 connected to 10.255.93.77 port 5201
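If iperf3 is not at hand, a quick check from Python also works. This is a minimal sketch, assuming Python 3 and a host that allows loopback connections: `observe_source_ports` (a hypothetical helper) opens a few TCP connections to a throwaway listener on 127.0.0.1 and reads back the ephemeral source port of each connection with `getsockname()`. Loopback traffic never crosses a LAG, so this only shows which ports the kernel picks; the exact ports differ on every run.

```python
import socket


def observe_source_ports(n=10):
    """Open n TCP connections to a local listener and return the
    ephemeral source port the kernel chose for each connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))    # let the OS pick a free listening port
        server.listen(n)
        clients = []
        try:
            for _ in range(n):
                clients.append(socket.create_connection(server.getsockname()))
            # getsockname() on a client socket returns (local_ip, local_port)
            return [c.getsockname()[1] for c in clients]
        finally:
            for c in clients:
                c.close()


ports = observe_source_ports()
even = sum(1 for p in ports if p % 2 == 0)
print("source ports:", ports)
print(f"even: {even}, odd: {len(ports) - even}")
```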
Does this have any effect?
Well, sort of, I would say. So let's look at LAGs first. Most networks I know use some form of Link Aggregation (LAG), such as LACP.
How does LACP work?
LACP itself only negotiates which links form the bundle; the sending device then picks a member link for each frame by calculating a hash over a combination of MAC addresses, IP addresses and ports. Vendors may call such policies something like xor-l3 or xor-l3l4.
This means the hash is built by combining these values with XOR. The remainder (modulo) of the hash divided by the number of links then gives you the link to use.
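A minimal sketch of such a policy, assuming a plain XOR over the two IPv4 addresses and the two ports followed by a modulo. Real switches and NICs use their own, often vendor-specific hash functions, so `select_link` is a hypothetical illustration rather than any particular device's algorithm.

```python
import ipaddress


def select_link(src_ip: str, dst_ip: str, sport: int, dport: int, links: int) -> int:
    """Simplified XOR L3+L4 hash: XOR both addresses and both ports,
    then take the remainder by the number of member links."""
    h = (int(ipaddress.ip_address(src_ip))
         ^ int(ipaddress.ip_address(dst_ip))
         ^ sport ^ dport)
    return h % links


# First Linux flow from the iperf3 run above, on a 4-link LAG
print(select_link("10.10.4.253", "10.255.93.77", 47806, 5201, 4))  # 3
```

With a power-of-two number of links, the modulo simply keeps the lowest bits of the hash (`h % 4 == h & 3`), so only the lowest bits of each input matter.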
So why can all-even source ports be an issue?
In some cases, only half of the links in a LAG will be used.
For the example above with a 4-port LAG, this means:
Linux
( 10.10.4.253 ^ 10.255.93.77 ^ 47806 ^ 5201 ) % 4 = 3
( 10.10.4.253 ^ 10.255.93.77 ^ 47816 ^ 5201 ) % 4 = 1
( 10.10.4.253 ^ 10.255.93.77 ^ 47824 ^ 5201 ) % 4 = 1
( 10.10.4.253 ^ 10.255.93.77 ^ 47836 ^ 5201 ) % 4 = 1
( 10.10.4.253 ^ 10.255.93.77 ^ 47852 ^ 5201 ) % 4 = 1
( 10.10.4.253 ^ 10.255.93.77 ^ 47862 ^ 5201 ) % 4 = 3
( 10.10.4.253 ^ 10.255.93.77 ^ 47864 ^ 5201 ) % 4 = 1
( 10.10.4.253 ^ 10.255.93.77 ^ 47874 ^ 5201 ) % 4 = 3
( 10.10.4.253 ^ 10.255.93.77 ^ 47882 ^ 5201 ) % 4 = 3
( 10.10.4.253 ^ 10.255.93.77 ^ 47886 ^ 5201 ) % 4 = 3
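As we can see, Linux would only use links 1 and 3 of the LAG; links 0 and 2 are never used. The reason is parity: with an even number of links, the link number has the same parity as the hash, and since every source port is even, the parity of the hash is fixed by the two IP addresses and the destination port alone.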
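To make sure this is not a coincidence of the ten sampled ports, the sketch below runs every even port of the default Linux range (32768-60999) through the same simplified XOR hash, keeping the addresses and destination port from the example fixed. `links_reachable` is a hypothetical helper.

```python
import ipaddress


def links_reachable(src_ip, dst_ip, dport, sports, links):
    """Return the set of member links the simplified XOR hash selects
    for the given source ports (IPs and destination port held fixed)."""
    base = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip)) ^ dport
    return {(base ^ sport) % links for sport in sports}


even_ports = range(32768, 61000, 2)   # every even port in 32768-60999
all_ports = range(32768, 61000)       # even and odd ports

print(sorted(links_reachable("10.10.4.253", "10.255.93.77", 5201, even_ports, 4)))  # [1, 3]
print(sorted(links_reachable("10.10.4.253", "10.255.93.77", 5201, all_ports, 4)))   # [0, 1, 2, 3]
```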
With FreeBSD, on the other hand, this would look like:
( 10.10.4.251 ^ 10.255.93.77 ^ 54819 ^ 5201 ) % 4 = 0
( 10.10.4.251 ^ 10.255.93.77 ^ 24383 ^ 5201 ) % 4 = 0
( 10.10.4.251 ^ 10.255.93.77 ^ 48869 ^ 5201 ) % 4 = 2
( 10.10.4.251 ^ 10.255.93.77 ^ 40010 ^ 5201 ) % 4 = 1
( 10.10.4.251 ^ 10.255.93.77 ^ 55178 ^ 5201 ) % 4 = 1
( 10.10.4.251 ^ 10.255.93.77 ^ 37904 ^ 5201 ) % 4 = 3
( 10.10.4.251 ^ 10.255.93.77 ^ 39569 ^ 5201 ) % 4 = 2
( 10.10.4.251 ^ 10.255.93.77 ^ 45923 ^ 5201 ) % 4 = 0
( 10.10.4.251 ^ 10.255.93.77 ^ 35810 ^ 5201 ) % 4 = 1
( 10.10.4.251 ^ 10.255.93.77 ^ 22573 ^ 5201 ) % 4 = 2
As you can see, all 4 links are used.
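Assuming the same simplified XOR hash and the Linux flow from the example, a quick sweep over a few hypothetical LAG sizes shows that the effect only hits even link counts: with 3 links, even-only source ports still reach every member, while 2, 4, 6 and 8 links lose half of them.

```python
import ipaddress

SRC, DST, DPORT = "10.10.4.253", "10.255.93.77", 5201
EVEN_PORTS = range(32768, 61000, 2)   # even ports only, as Linux prefers


def links_used(links):
    """Sorted list of member links hit by all even source ports."""
    base = int(ipaddress.ip_address(SRC)) ^ int(ipaddress.ip_address(DST)) ^ DPORT
    return sorted({(base ^ p) % links for p in EVEN_PORTS})


for n in (2, 3, 4, 6, 8):
    print(n, links_used(n))
# 2 [1]
# 3 [0, 1, 2]
# 4 [1, 3]
# 6 [1, 3, 5]
# 8 [1, 3, 5, 7]
```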
Does this make a difference? Yes, it certainly does make a difference.
Whether it really matters depends on your particular network setup. With only a “few” hosts in the network, or a suboptimal IP addressing layout, this can impact overall performance.
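To illustrate the addressing side, here is a sketch with hypothetical client addresses talking to the server from the example, again assuming the simplified XOR hash and even-only source ports. If all clients happen to have even last octets, the LAG still only uses two of four links; mixing even and odd addresses brings all four back into play.

```python
import ipaddress

DST, DPORT = "10.255.93.77", 5201
EVEN_PORTS = range(32768, 61000, 2)


def links_for_hosts(hosts, links=4):
    """Union of member links used by several clients with even-only source ports."""
    dst = int(ipaddress.ip_address(DST))
    used = set()
    for host in hosts:
        base = int(ipaddress.ip_address(host)) ^ dst ^ DPORT
        used.update((base ^ p) % links for p in EVEN_PORTS)
    return sorted(used)


# Hypothetical clients whose addresses all end in an even octet
print(links_for_hosts(["10.10.4.2", "10.10.4.4", "10.10.4.6", "10.10.4.8"]))  # [0, 2]
# Hypothetical clients with mixed even and odd addresses
print(links_for_hosts(["10.10.4.2", "10.10.4.3", "10.10.4.4", "10.10.4.5"]))  # [0, 1, 2, 3]
```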
What can we do?
Since we do not control the service (destination) ports in a data center, we can provide more destination addresses instead, as AWS S3 does.
When you talk to public AWS S3 buckets, you get a few out of the many addresses belonging to the S3 infrastructure.
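A sketch of the same idea from the client side, assuming the simplified XOR hash: the Linux client from the example talks to two service addresses, where 10.255.93.76 is a hypothetical second address of the same service. Each destination on its own only reaches two links, but together they cover all four.

```python
import ipaddress

SRC, DPORT = "10.10.4.253", 5201
EVEN_PORTS = range(32768, 61000, 2)


def links_per_destination(destinations, links=4):
    """Map each destination address to the links its even-port flows hit."""
    src = int(ipaddress.ip_address(SRC))
    result = {}
    for dst in destinations:
        base = src ^ int(ipaddress.ip_address(dst)) ^ DPORT
        result[dst] = sorted({(base ^ p) % links for p in EVEN_PORTS})
    return result


for dst, used in links_per_destination(["10.255.93.76", "10.255.93.77"]).items():
    print(dst, used)
# 10.255.93.76 [0, 2]
# 10.255.93.77 [1, 3]
```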
Where does this come from?
The (in my view not entirely compelling) answer on Stack Exchange, which quotes the kernel commit:
It’s to reduce contention between connect() and bind() (appeared in Linux 4.2; Jessie has 3.16 and Stretch has 4.9):
commit 07f4c90062f8fc7c8c26f8f95324cbe8fa3145a5
Author: Eric Dumazet
Date: Sun May 24 14:49:35 2015 -0700
tcp/dccp: try to not exhaust ip_local_port_range in connect()
A long standing problem on busy servers is the tiny available TCP port
range (/proc/sys/net/ipv4/ip_local_port_range) and the default
sequential allocation of source ports in connect() system call.
If a host is having a lot of active TCP sessions, chances are
very high that all ports are in use by at least one flow,
and subsequent bind(0) attempts fail, or have to scan a big portion of
space to find a slot.
In this patch, I changed the starting point in __inet_hash_connect()
so that we try to favor even [1] ports, leaving odd ports for bind()
users.
We still perform a sequential search, so there is no guarantee, but
if connect() targets are very different, end result is we leave
more ports available to bind(), and we spread them all over the range,
lowering time for both connect() and bind() to find a slot.
This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range
is even, ie if start/end values have different parity.
Therefore, default /proc/sys/net/ipv4/ip_local_port_range was changed to
32768 - 60999 (instead of 32768 - 61000)
There is no change on security aspects here, only some poor hashing
schemes could be eventually impacted by this change.
[1] : The odd/even property depends on ip_local_port_range values parity
You may also want to see the followup commit 1580ab63fc9a03593072cc5656167a75c4f1d173.
https://unix.stackexchange.com/questions/379533/debian-stretch-source-tcp-port-is-always-even
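A simplified model of the strategy the commit message describes: scan ports with the parity of the range start first, in steps of 2, and only fall back to the other parity when nothing is free. `pick_connect_port` and its `offset` argument are hypothetical stand-ins; the real kernel derives its starting point from the connection and keeps more state, so this is a sketch of the idea, not the kernel's code. The upper bound is treated as exclusive, so the default range 32768-60999 becomes an even-sized window, which keeps the parity intact on wrap-around.

```python
def pick_connect_port(low, high, in_use, offset):
    """Simplified model of the connect() port search described in the commit.

    low/high: local port range, high exclusive (32768-60999 -> 32768, 61000).
    in_use:   set of ports that are already taken.
    offset:   hypothetical stand-in for the per-connection starting point.
    Assumes high > low.
    """
    remaining = high - low
    if remaining > 1:
        remaining &= ~1                 # even-sized window keeps parity on wrap-around
    offset = (offset % remaining) & ~1  # first pass starts on the parity of `low`
    for _ in range(2):                  # pass 1: parity of `low`, pass 2: the other one
        port = low + offset
        for _ in range((remaining + 1) // 2):
            if port >= high:
                port -= remaining       # wrap around inside the window
            if port not in in_use:
                return port
            port += 2
        offset += 1
    return None


LOW, HIGH = 32768, 61000                # default ip_local_port_range 32768-60999

in_use = set()
picked = []
for _ in range(5):
    port = pick_connect_port(LOW, HIGH, in_use, offset=12345)
    in_use.add(port)
    picked.append(port)
print(picked)                           # [45112, 45114, 45116, 45118, 45120]

# Once every even port is busy, the search falls back to odd ports
all_even_busy = set(range(LOW, HIGH, 2))
print(pick_connect_port(LOW, HIGH, all_even_busy, offset=12345))  # 45113
```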
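At first glance, the patch looks like a compelling solution to this problem.
However, as the LAG example shows, it can lead to issues further along in the infrastructure; the commit message itself concedes that “only some poor hashing schemes could be eventually impacted by this change.”
Other load-balancing techniques that hash on the source port might be affected as well.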
Conclusion
As the commit author already stated, the former sequential port selection can lead to problems finding a free source port under load. But the change fails to take into account that some parts of the network need both even and odd source ports.
Ideally, the port selection algorithm would take a few things into consideration, such as newly opened listeners, a mix of even and odd source ports, and ports from closed sessions.
Some of these conditions only hold for sessions between one source, one destination and one service.
Is there an easy solution?
As always, there are a few constraints that contradict one another, such as the speed of opening a port, load balancing and security, to name just a few.
Calculation in Python
We can reproduce the calculation ourselves, in simplified form:
#!/usr/bin/env python3
import ipaddress


def mod(hash_value, links):
    # The remainder picks the member link (0 .. links-1)
    return hash_value % links


def lag_ports(src_ips, dest_ips, sports, dport, links=2, comment=None):
    """Print the link a simplified XOR L3+L4 hash picks for each flow."""
    if comment:
        print(comment)
    for sip in src_ips:
        sint = int(ipaddress.ip_address(sip))
        for dip in dest_ips:
            dint = int(ipaddress.ip_address(dip))
            for sport in sports:
                # XOR both addresses and both ports, then take the modulo
                hash_value = sint ^ dint ^ sport ^ dport
                link = mod(hash_value, links)
                print(f"( {sip} ^ {dip} ^ {sport} ^ {dport} ) % {links} = {link}")
    print()


# Source ports observed on the Linux client
SRCIPS = ['10.10.4.253']
SPORTS = [47806, 47816, 47824, 47836, 47852, 47862, 47864, 47874, 47882, 47886]
DESTIPS = ['10.255.93.77']
DPORT = 5201
lag_ports(SRCIPS, DESTIPS, SPORTS, DPORT, links=4, comment="Linux")

# Source ports observed on the FreeBSD client
SRCIPS = ['10.10.4.251']
SPORTS = [54819, 24383, 48869, 40010, 55178, 37904, 39569, 45923, 35810, 22573]
DESTIPS = ['10.255.93.77']
DPORT = 5201
lag_ports(SRCIPS, DESTIPS, SPORTS, DPORT, links=4, comment="FreeBSD")
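The listing prints one line per flow. To compare the two clients at a glance, a small variant counts how many of the ten flows land on each link. `link_histogram` is a hypothetical helper built on the same simplified XOR hash.

```python
import ipaddress
from collections import Counter


def link_histogram(src_ip, dst_ip, sports, dport, links=4):
    """Count flows per member link, including links that get no flows."""
    base = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip)) ^ dport
    hist = Counter((base ^ sport) % links for sport in sports)
    return {link: hist.get(link, 0) for link in range(links)}


linux = [47806, 47816, 47824, 47836, 47852, 47862, 47864, 47874, 47882, 47886]
freebsd = [54819, 24383, 48869, 40010, 55178, 37904, 39569, 45923, 35810, 22573]
print("Linux  ", link_histogram("10.10.4.253", "10.255.93.77", linux, 5201))
print("FreeBSD", link_histogram("10.10.4.251", "10.255.93.77", freebsd, 5201))
# Linux   {0: 0, 1: 5, 2: 0, 3: 5}
# FreeBSD {0: 3, 1: 3, 2: 3, 3: 1}
```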
Cisco
cisco#show etherchannel load-balance
EtherChannel Load-Balancing Configuration:
src-mac
EtherChannel Load-Balancing Addresses Used Per-Protocol:
Non-IP: Source MAC address
IPv4: Source MAC address
IPv6: Source MAC address
Configuration
cisco#configure terminal
cisco(config)#port-channel load-balance src-dst-ip
cisco(config)#end
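Available modes depend on the platform, in this case a WS-C3560CX-8PT-S. None of the options listed here include L4 ports, so on this switch the parity of the source ports does not affect link selection.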
cisco(config)#port-channel load-balance ?
dst-ip Dst IP Addr
dst-mac Dst Mac Addr
src-dst-ip Src XOR Dst IP Addr
src-dst-mac Src XOR Dst Mac Addr
src-ip Src IP Addr
src-mac Src Mac Addr
After the Change
cisco#show etherchannel load-balance
EtherChannel Load-Balancing Configuration:
src-dst-ip
EtherChannel Load-Balancing Addresses Used Per-Protocol:
Non-IP: Source XOR Destination MAC address
IPv4: Source XOR Destination IP address
IPv6: Source XOR Destination IP address
Versions
Versions used for testing:
Linux
uname -a
Linux host 6.5.0-kali3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1kali1 (2023-10-09) x86_64 GNU/Linux
iperf3 -v
iperf 3.15 (cJSON 1.7.15)
FreeBSD
uname -a
FreeBSD host 14.0-RELEASE-p5 FreeBSD 14.0-RELEASE-p5 #0: Tue Feb 13 23:37:36 UTC 2024 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
iperf3 -v
iperf 3.16 (cJSON 1.7.15)

