I have a squid box put between the main router and the rest of the network.
The squid box is on CentOS 5.0 with kernel 2.6.18 that has (properly) patched against Balabit's tproxy patch which goes fine and without any error.
I set 6 coss directories of 60GB each.
The box runs fine without any problem for about a month and then slow down about 10-20 times than normal (clients opening the pages experience incredible lag).
When i turn off the redirection of packets from port 80 to squid local port everything works normal. The moment i change it back (redirect traffic to squid) the problem starts again.
If i shut down the squid and start it again, the problem dissappears for another month (approximately).
The connection and traffic to squid box works fine (like SSH, for example), and the connection to squid port, either redirected 80 or direct squid port have a terrible delays.
The connection and traffic from the squid box works fine (like wget, for example).
DNS works fine (otherwise it would run slowly even if i turned off the squid, and i have multiple dns servers and all of them runs fine).
I tried to follow the packet with tcpdump on bridge interface (br0) and here's what i got:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 96 bytes
22:29:32.173726 IP r00t.gromnet.net.kermit > 10.0.0.3.65432: S 887633810:887633810(0) win 64240 <mss 1460,nop,wscale 0,nop,nop,sackOK>
22:29:35.137551 IP r00t.gromnet.net.kermit > 10.0.0.3.65432: S 887633810:887633810(0) win 64240 <mss 1460,nop,wscale 0,nop,nop,sackOK>
22:29:41.078270 IP r00t.gromnet.net.kermit > 10.0.0.3.65432: S 887633810:887633810(0) win 64240 <mss 1460,nop,wscale 0,nop,nop,sackOK>
22:29:41.079374 IP r00t.gromnet.net.kermit > 10.0.0.3.65432: . ack 441019686 win 64240
22:29:41.080781 IP r00t.gromnet.net.kermit > 10.0.0.3.65432: P 0:839(839) ack 1 win 64240
22:29:41.080891 IP r00t.gromnet.net.msims > 10.0.0.3.ssh: . ack 6745 win 63796
22:29:41.081189 IP r00t.gromnet.net.50425 > hu-in-f104.google.com.http: S 442399918:442399918(0) win 5840 <mss 1460,sackOK,timestamp 312971813 0,nop,wscale 7>
As you can see, i open the google page after 9 seconds...
Part of the strace:
read(7, "!", 256) = 1
epoll_wait(3, {{EPOLLOUT, {u32=153, u64=13804498572697141401}}}, 256, 10) = 1
write(153, "HTTP/1.0 200 OK\r\nLast-Modified: "..., 364) = 364
gettimeofday({1216848052, 61457}, NULL) = 0
epoll_wait(3, {{EPOLLOUT, {u32=245, u64=42558092201689333}}, {EPOLLOUT, {u32=361, u64=103079215465}}, {EPOLLIN, {u32=140, u64=578393994392764556}}, {EPOLLIN, {u32=21, u64=594587369718415381}}, {EPOLLOUT, {u32=28, u64=42557662704959516}}, {EPOLLOUT, {u32=134, u64=274877907078}}, {EPOLLOUT, {u32=153, u64=13804498572697141401}}}, 256, 0) = 7
write(245, ":\236p\323\303 \246\26j\250\263\30E\6^y\356tk+\202\2330\204\254!\17\276\254z=/"..., 4096) = 4096
write(361, "\332\263\252.\347\371 \315\200V\2055\375\267B\353\253\22J\362\1\222=49\363\331T\335\304f\251"..., 4096) = 4096
read(140, "HTTP/1.1 200 OK\r\nConnection: clo"..., 65535) = 193
epoll_ctl(3, EPOLL_CTL_MOD, 135, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=135, u64=13804500428123013255}}) = 0
read(140, "", 4) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 140, {0, {u32=140, u64=586446379467931788}}) = 0
close(140) = 0
recvfrom(21, "0)\2\1\0\4\7gromnet\240\33\2\3\'l\272\2\1\0\2\1\0000\0160\f\6"..., 4096, 0, {sa_family=AF_INET, sin_port=htons(1567), sin_addr=inet_addr("10.0.0.225")}, [16]) = 43
sendto(21, "0\202\0/\2\1\0\4\7gromnet\242\202\0\37\2\3\'l\272\2\1\2\2\1\0010"..., 51, 0, {sa_family=AF_INET, sin_port=htons(1567), sin_addr=inet_addr("10.0.0.225")}, 16) = 51
write(28, "\236\1`\320>\17\332\301\207\354O{P\21\'\274\2\304\276W\17\360\352x\5\274K\247\261\247\247\273"..., 4096) = 4096
write(134, "\334*\"\252\240\337\331!\203\25\306\244)\247\327\264\312\353\37\26\'\255\2047\273kd\360\213\242\241\217"..., 4096) = 4096
write(153, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\333\0C\0\1\1\1\1\1\1\1"..., 4096) = 4096
gettimeofday({1216848052, 63349}, NULL) = 0
epoll_wait(3, {{EPOLLOUT, {u32=135, u64=13804500428123013255}}, {EPOLLOUT, {u32=153, u64=13804498572697141401}}}, 256, 0) = 2
write(135, "HTTP/1.0 200 OK\r\nDate: Wed, 23 J"..., 312) = 312
read(135, 0xbf9272ad, 65535) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(3, EPOLL_CTL_DEL, 135, {0, {u32=135, u64=586446379467931783}}) = 0
close(135) = 0
write(153, "\272\257\22\242\241/a\355%\3025\274/&x\322i\335>\205as\307*\372\226\242\341\277f\211\201"..., 4096) = 4096
gettimeofday({1216848052, 63848}, NULL) = 0
epoll_wait(3, {}, 256, 0) = 0
gettimeofday({1216848052, 63957}, NULL) = 0
epoll_wait(3, {}, 256, 0) = 0
gettimeofday({1216848052, 64391}, NULL) = 0
epoll_wait(3, {}, 256, 0) = 0
gettimeofday({1216848052, 64527}, NULL) = 0
epoll_wait(3, {}, 256, 0) = 0
gettimeofday({1216848052, 64648}, NULL) = 0
epoll_wait(3, {}, 256, 0) = 0
gettimeofday({1216848052, 64799}, NULL) = 0
epoll_wait(3, {{EPOLLIN, {u32=53, u64=578393994392764469}}}, 256, 0) = 1
I see nothing wrong here either....
Now, the cache manager:
Memory
Cumulative allocated volume: 5.63 TB
Current overhead: 11979 bytes (0.001%)
Idle pool limit: 0.00 MB
memPoolAlloc calls: -815471534
memPoolFree calls: -848877678
General info:
Connection information for squid:
Number of clients accessing cache: 0
Number of HTTP requests received: 55350809
Number of ICP messages received: 0
Number of ICP messages sent: 0
Number of queued ICP replies: 0
Request failure ratio: 0.00
Average HTTP requests per minute since start: 2581.9
Average ICP messages per minute since start: 0.0
Select loop called: 888982585 times, 1.447 ms avg
Cache information for squid:
Request Hit Ratios: 5min: 13.8%, 60min: 13.4%
Byte Hit Ratios: 5min: 4.9%, 60min: 6.7%
Request Memory Hit Ratios: 5min: 18.8%, 60min: 13.5%
Request Disk Hit Ratios: 5min: 16.5%, 60min: 17.8%
Storage Swap size: 211953448 KB
Storage Mem size: 1048696 KB
Mean Object Size: 20.59 KB
Requests given to unlinkd: 0
Median Service Times (seconds) 5 min 60 min:
HTTP Requests (All): 0.24524 0.23230
Cache Misses: 0.25890 0.24524
Cache Hits: 0.00865 0.01309
Near Hits: 0.30459 0.28853
Not-Modified Replies: 0.00000 0.00091
DNS Lookups: 0.00779 0.00779
ICP Queries: 0.00000 0.00000
Resource usage for squid:
UP Time: 1286289.832 seconds
CPU Time: 89455.367 seconds
CPU Usage: 6.95%
CPU Usage, 5 minute avg: 24.67%
CPU Usage, 60 minute avg: 5.75%
Process Data Segment Size via sbrk(): 985764 KB
Maximum Resident Size: 0 KB
Page faults with physical i/o: 0
Memory usage for squid via mallinfo():
Total space in arena: -1954140 KB
Ordinary blocks: -1968363 KB 10309 blks
Small blocks: 0 KB 0 blks
Holding blocks: 77676 KB 33 blks
Free Small blocks: 0 KB
Free Ordinary blocks: 14222 KB
Total in use: -1890687 KB 101%
Total free: 14222 KB 0%
Total size: -1876464 KB
Memory accounted for:
Total accounted: 1978214 KB
memPoolAlloc calls: 3479792962
memPoolFree calls: 3446385714
File descriptor usage for squid:
Maximum number of file descriptors: 131072
Largest file desc currently in use: 1317
Number of file desc currently in use: 1301
Files queued for open: 0
Available number of file descriptors: 129771
Reserved number of file descriptors: 100
Store Disk files open: 0
IO loop method: epoll
Internal Data Structures:
10295103 StoreEntries
68732 StoreEntries with MemObjects
68090 Hot Object Cache Items
10292002 on-disk objects
Everything seems normal but the connection to squid box is beign delayed and i don't know why?
Anyone have a idea what more to check?
Received on Fri Aug 01 2008 - 14:33:54 MDT
This archive was generated by hypermail 2.2.0 : Fri Aug 01 2008 - 12:00:05 MDT