Re: [squid-users] Re: squid 3.2.0.14 with TPROXY => commBind: Cannot bind socket FD 773 to xxx.xxx.xxx.xx: (98) Address

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Sat, 14 Sep 2013 21:36:18 +0300

Hey,

It can be tested in a matter of minutes.
If we have a test candidate, I will write a small TPROXY script to
verify the suspicion.

Eliezer

On 09/14/2013 07:39 PM, Nikolai Gorchilov wrote:
> Hi, Eliezer,
>
> On Tue, Sep 10, 2013 at 1:49 AM, Eliezer Croitoru <eliezer_at_ngtech.co.il> wrote:
>> Hey Nikolai,
>>
>> I will try to make sense of what you have seen.
>> TPROXY is a very complex feature, and the kernel cannot bind the same
>> src(ip:port) + dst(ip:port) combination twice.
>> For example, say the client 10.100.1.100 tries to connect to
>> 2.3.4.5 on port 80.
>> The client first tries:
>> 10.100.1.100:5455 to 2.3.4.5:80
>> Then, say the client doesn't have the right route and there is a
>> network problem, so the client tries again from:
>> 10.100.1.100:5456 to 2.3.4.5:80
>> This client has a network issue, and the proxy knows that.
>> The proxy is transparent and needs to re-intercept the same request
>> twice, and once the first connection has timed out at the kernel level,
>> the application can drop that connection and stop parsing the
>> request.
>
> The problem I'm facing is not related to the user-to-proxy connection at
> all. With a proper network setup this works flawlessly.
>
> It's the proxy-to-server connection, when Squid binds to an IP
> without specifying a port, leaving the kernel to choose one.
>
>> The kernel can bind the src ip:port to the dst if it knows that
>> all port 80 traffic takes this single route.
>> If that is not the case, the client will have trouble, and so
>> binding ip:port to ip:port at the network layer would be a disaster
>> for a couple of layers.
>
> Yeah! ip:port pairs have to be unique :-)
>
>> So the kernel manages how the bind is done.
>> I don't see how a TPROXY-enabled system with more than 10,000 clients can
>> reach a critical level of commBind errors unless the CPU and the lower
>> kernel layers are unable to handle that level of traffic.
>
> It's not about the number of users, but the number of simultaneous live
> connections from the cache server. Keep in mind that "idle" HTTP
> connections are still "live" TCP streams.
>
>> If it's the kernel's port-range limit, it can be reproduced in a matter
>> of seconds by lowering the range.
>
> Exactly. Try something like
>
>     echo 32768 32867 > /proc/sys/net/ipv4/ip_local_port_range
>
> and you'll start getting EADDRINUSE on the 101st parallel outbound
> connection of Squid, since that range holds exactly 100 ports.
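As a quick check of the arithmetic in the command above, here is a minimal Python sketch that reads the range back and counts how many ports it holds. It assumes a Linux host exposing /proc/sys/net/ipv4/ip_local_port_range; the function names are illustrative only, not part of Squid or any library. Both bounds are inclusive, so 32768-32867 gives 100 ports, and the 101st parallel connection is the first that has no port left.

```python
"""Count the ports in the kernel's local (auto-selected) port range.

Assumes Linux; function names here are illustrative, not from Squid.
"""

from pathlib import Path

PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse the two whitespace-separated bounds, e.g. '32768\t32867\n'."""
    low, high = (int(field) for field in text.split())
    if low > high:
        raise ValueError(f"invalid range: {low} > {high}")
    return low, high


def range_size(low: int, high: int) -> int:
    """Both bounds are inclusive, so the range holds high - low + 1 ports."""
    return high - low + 1


def read_port_range(path: Path = PORT_RANGE_FILE) -> tuple[int, int]:
    """Read and parse the live setting (Linux only)."""
    return parse_port_range(path.read_text())
```

On a live system, `range_size(*read_port_range())` prints the current capacity; after the echo above it should report 100.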
>
>> This limit is not a rule for the application; it limits which
>> local ip:port the kernel binds when the source machine is the local machine.
>> It doesn't force the kernel to handle fewer connections, but
>> lets the kernel do fewer lookups when trying to find a free ip:port
>> to bind for the new connection.
>>
>> It seems to me that you are using connection tracking on a TPROXY system
>> that doesn't need connection tracking at all at this kind of scale.
>> There is no reason for a TPROXY system to keep track of client
>> connections for more than 5-10 minutes tops.
>>
>> Try to look more into connection tracking rather than the basic
>> kernel-level limits.
>
> Nope. The problem has nothing to do with TPROXY or connection
> tracking. It's in the kernel's port auto-selection algorithm, which
> limits the number of live auto-selected ports (across all local IPs) to
> ip_local_port_range.max - ip_local_port_range.min + 1.
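To make the limit described above concrete, here is a toy Python model. It is not kernel code: it simply assumes the behavior reported in this thread, namely that an auto-selected port (bind to port 0) must be unused on every local IP, while an explicit port only has to be free on the IP being bound. The class and function names are hypothetical, and the model scans from the bottom of the range, which a real kernel does not necessarily do.

```python
"""Toy model of the port allocation reported in this thread (not kernel code)."""

import errno


def _in_use() -> OSError:
    return OSError(errno.EADDRINUSE, "Address already in use")


class PortTable:
    """Tracks live (ip, port) bindings for one hypothetical host."""

    def __init__(self, port_min: int, port_max: int) -> None:
        self.port_min = port_min
        self.port_max = port_max
        self.bound: set[tuple[str, int]] = set()

    def bind_explicit(self, ip: str, port: int) -> int:
        # Explicit port: only this exact ip:port pair must be free.
        if (ip, port) in self.bound:
            raise _in_use()
        self.bound.add((ip, port))
        return port

    def bind_auto(self, ip: str) -> int:
        # Auto-selected port: ASSUMED to need a port that is unused on
        # EVERY local IP, per the behavior reported above.
        used_ports = {port for _, port in self.bound}
        for port in range(self.port_min, self.port_max + 1):
            if port not in used_ports:
                self.bound.add((ip, port))
                return port
        raise _in_use()


def run_test(ips: list[str], port_min: int, port_max: int, broken: bool) -> int:
    """Mirror the thread's test loop; return the number of successful binds."""
    table = PortTable(port_min, port_max)
    done = 0
    for port in range(port_min, port_max + 1):
        for ip in ips:
            try:
                if broken:
                    table.bind_auto(ip)
                else:
                    table.bind_explicit(ip, port)
            except OSError as exc:
                if exc.errno == errno.EADDRINUSE:
                    return done
                raise
            done += 1
    return done
```

With two addresses and the 100-port range, `run_test(["aaa.aaa.aaa.aaa", "bbb.bbb.bbb.bbb"], 32768, 32867, broken=True)` returns 100: the 101st bind overall, which is the 51st on aaa.aaa.aaa.aaa, is the one that fails. With `broken=False` it returns 200.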
>
> Here's some pseudocode to reproduce it, even with local addresses
> assigned to the host:
>
> ===[cut]===
> $broken = true; // true: ask the kernel to select the port
> $port_min = ip_local_port_range.min;
> $port_max = ip_local_port_range.max;
> $ips_to_test_with = {'aaa.aaa.aaa.aaa', 'bbb.bbb.bbb.bbb'};
> $sockets = {}; // keep every socket referenced so its port stays in use
>
> function socket_setup($ip, $port) {
>     $socket = new socket(AF_INET, SOCK_STREAM, SOL_TCP);
>     $socket.set_option(SOL_SOCKET, SO_REUSEADDR, 1);
>     // IP_TRANSPARENT is needed only if $ips_to_test_with are not
>     // assigned to the host
>     $socket.set_option(SOL_IP, IP_TRANSPARENT, 1);
>     $socket.bind($ip, $port);
>     // listen() is easier and faster for testing; we just have to hold
>     // this socket in the kernel somehow. In real life it will be
>     // $socket.connect().
>     $socket.listen();
>     return $socket;
> }
>
> for ($port = $port_min; $port <= $port_max; $port++) {
>     foreach ($ips_to_test_with as $ip) {
>         if ($broken) {
>             // fails with EADDRINUSE on loop iteration
>             // floor(($port_max - $port_min + 1) / count($ips_to_test_with)) + 1
>             $sockets[] = socket_setup($ip, 0);
>         } else {
>             // will assign all the ports
>             $sockets[] = socket_setup($ip, $port);
>         }
>     }
> }
>
> ===[cut]===
>
> That's it. Run echo 32768 32867 > /proc/sys/net/ipv4/ip_local_port_range
> and try it, once with $broken = true and then again with $broken = false.
>
> With $broken = true, you'll get EADDRINUSE on the 51st port assignment
> on IP address aaa.aaa.aaa.aaa.
> With $broken = false, both aaa.aaa.aaa.aaa and bbb.bbb.bbb.bbb will be
> listening on 100 ports each, with no error.
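For anyone who wants to run the test rather than translate the pseudocode by hand, here is a minimal Python sketch of the same loop. It assumes Linux, root privileges, and that the port range has already been lowered as described; the addresses are placeholders for real test IPs. IP_TRANSPARENT falls back to its <linux/in.h> value of 19 when the socket module does not export it, and it is only set when `transparent=True`, because the pseudocode notes it is needed only for addresses not assigned to the host. The failure point reflects the kernel behavior reported in this thread and may differ on other kernel versions.

```python
"""Runnable sketch of the bind test from the pseudocode (Linux, run as root)."""

import errno
import socket
from typing import Optional

# Value of IP_TRANSPARENT in <linux/in.h>, used if the socket module lacks it.
IP_TRANSPARENT = getattr(socket, "IP_TRANSPARENT", 19)


def socket_setup(ip: str, port: int, transparent: bool) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if transparent:
            # Only needed when ip is not assigned to this host.
            sock.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
        sock.bind((ip, port))  # port 0 asks the kernel to choose
        sock.listen()  # holds the port; Squid would connect() instead
    except OSError:
        sock.close()
        raise
    return sock


def reproduce(ips: list[str], port_min: int, port_max: int, broken: bool,
              transparent: bool = False) -> tuple[int, Optional[str]]:
    """Return (number of successful binds, IP whose bind failed or None)."""
    sockets = []  # keep references so every bound port stays in use
    try:
        for port in range(port_min, port_max + 1):
            for ip in ips:
                try:
                    sockets.append(
                        socket_setup(ip, 0 if broken else port, transparent))
                except OSError as exc:
                    if exc.errno == errno.EADDRINUSE:
                        return len(sockets), ip
                    raise
        return len(sockets), None
    finally:
        for sock in sockets:
            sock.close()
```

Per the description above, `reproduce(["aaa.aaa.aaa.aaa", "bbb.bbb.bbb.bbb"], 32768, 32867, broken=True)` should stop after 100 binds and report aaa.aaa.aaa.aaa, while `broken=False` should complete all 200.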
>
> Hope this time it's more clear.
>
> Best,
> Niki
>
Received on Sat Sep 14 2013 - 18:36:35 MDT

This archive was generated by hypermail 2.2.0 : Sun Sep 15 2013 - 12:00:04 MDT