Last.fm – the Blog · Squid Optimization Guide

Squid is a caching web proxy, and one of the many back-end applications that we in the Systems department at Last.fm use to make your experience of the site that little bit smoother. We have Squid deployed as a reverse caching proxy.

This worked fine for a while, but over the past few days (as some of you have probably noticed and mentioned on the forums), it began to slow down. I set to work a couple of days ago debugging the speed decrease that had suddenly afflicted our squid cluster. Unfortunately, Squid is probably one of the least documented applications out there – the documentation that exists is vague, and doesn’t go into much detail; it would also seem that everyone else who’s set up squid either hasn’t dealt with the amount of traffic we do, or just hasn’t posted online about how they dealt with these issues when they cropped up.

In debugging squid, there is approximately a 24 hour period after a modification before you really see whether what you have changed has fixed the problem. This is compounded if you add file-system benchmarking into the mix – the cache must refill before you get a decent picture of what is happening.
Over the past few days, I’ve tried many squid configuration options, the majority of which are only anecdotally documented, so it really felt like a stab in the dark a lot of the time. Tonight, however, I struck gold, and I feel that I should share the wealth regarding optimizing squid for a high-throughput setup such as the one we employ here.

Playing the Optimization Game

Probably the most important thing to note when deploying squid is that in 99% of cases you will have many thousands – if not millions – of very small files. Because of this, you need to choose a file-system that deals very well with reading and writing many small files concurrently.

Enter ReiserFS

Having tried both XFS (very poor performance over time) and ext3 (better performance, but still lags a lot under load), I switched over to ReiserFS, and have found that it lives up very well to its reputation of being good with many small files and many reads/writes per second.

I highly recommend that your machine is set up with a separate pair of disks for squid or, worst case, a separate partition on the host OS drives, utilizing a decently fast RAID level (think RAID10 here; don’t bother going near RAID5, as you’ll get major I/O lag on writes). I’d also recommend going for FAST disks (stay away from IDE here, or you’ll be in a world of pain).

On Debian, you should have ReiserFS support already. On CentOS, you’ll need to enable the centosplus repo by setting enabled=1 in /etc/yum.repos.d/CentOS-Base.repo (on, or around, line 59), then yum install reiserfs-utils.

Then it’s a case of

mkfs.reiserfs /dev/sdXX

Where XX is the partition you are going to use for squid – in our case:

mkfs.reiserfs /dev/sdb1

Then add your partition to /etc/fstab:

/dev/sdb1 /var/spool/squid reiserfs defaults,notail,noatime 1 2

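Note the notail,noatime options – these are both important, and will give you a performance boost: notail turns off tail packing, and noatime stops the file-system from updating access times on every read. For more details, see the ReiserFS mount options documentation.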
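If you want to double-check an fstab entry before mounting, a rough sketch along these lines would do. The has_mount_opts function name is made up for illustration, and the entry is simply the one from above.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): succeeds only if an
# fstab-style line lists every mount option named after it.
has_mount_opts() {
    line=$1; shift
    # Field 4 of an fstab entry holds the comma-separated mount options.
    opts=$(printf '%s\n' "$line" | awk '{print $4}')
    for want in "$@"; do
        case ",$opts," in
            *",$want,"*) ;;          # option present, keep checking
            *) return 1 ;;           # option missing
        esac
    done
    return 0
}

entry='/dev/sdb1 /var/spool/squid reiserfs defaults,notail,noatime 1 2'
if has_mount_opts "$entry" notail noatime; then
    echo "mount options ok"
else
    echo "missing notail or noatime"
fi
```

On a live box you could feed it the matching line from /etc/fstab instead of a hard-coded string.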

No! No! No! Compile from source!!

I’m not usually a great fan of compiling from source when it comes to multi-system implementations; it makes life hard when it comes to system administration, and to be honest, I’m a big fan-boy of packages for their ease of use and the lack of headaches they cause. I’ll be making a package for our particular squid setup tomorrow, but this optimization how-to wouldn’t benefit from a ‘now simply install a package’, would it? =)

We’re using Squid-2.6.STABLE14 here – it’s the latest release from the STABLE branch. I took a look at the Squid-3.0 release a while ago, and found a lot of bugs (after all, it is in beta), so I’m sticking with 2.6 for now. A full list of available versions is on the Squid website, but I warn you that this how-to is probably only good for 2.6, so YMMV if you choose another version.

Grab the source and extract it. You’ll need the relevant development tools installed – gcc, g++, etc.
The following CHOST and CFLAGS will vary based on your processor and platform. The one you will most likely need to change is -march=, and if you’re on a 32bit platform, use CHOST="i686-pc-linux-gnu".

I find the Gentoo-Wiki Safe CFLAGS page to be an excellent reference for quickly finding which -march= definition to use based off processor type.

In our case, we’re running 64bit Core2Duo chips, so we compile with the following options:

CHOST="x86_64-pc-linux-gnu" \
CFLAGS="-DNUMTHREADS=60 \
-march=nocona \
-O3 \
-pipe \
-fomit-frame-pointer \
-funroll-loops \
-ffast-math \
-fno-exceptions" \
./configure \
--prefix=/usr \
--enable-async-io \
--enable-icmp \
--enable-useragent-log \
--enable-snmp \
--enable-cache-digests \
--enable-follow-x-forwarded-for \
--enable-storeio="aufs" \
--enable-removal-policies="heap,lru" \
--with-maxfd=16384 \
--enable-poll \
--disable-ident-lookups \
--enable-truncate \
--exec-prefix=/usr \
--bindir=/usr/sbin \
--libexecdir=/usr/lib/squid

Note the -DNUMTHREADS=60; this is probably an under-estimate for our setup, as you can easily run with 30 on a 500 MHz machine. This define controls the number of threads squid is able to run when using asynchronous I/O. I’ve been quite conservative with this value, as I don’t want Squid to block or use too much CPU. The rest of the CFLAGS heavily optimize the resulting binaries.

I recommend building with the ./configure line as above. Obviously, if you change it, YMMV!

Here’s a rundown of what those options do:

--enable-async-io: enables asynchronous I/O – this is really important, as it stops squid from blocking on disk reads/writes

--enable-icmp: optional, squid uses this to determine the closest cache-peer, and then utilizes the most responsive one based off the ping time. Disable this if you don’t have cache peers.

--enable-useragent-log: causes squid to print the useragent in log entries – useful when you’re using lynx to debug squid speed.

--enable-snmp: We graph all of our squid boxes utilizing cacti; you’ll want this enabled if you want to query squid over SNMP and graph the output.

--enable-cache-digests: required if you want to use cache peering

--enable-follow-x-forwarded-for: We have multi-level proxying happening as packets come through to squid, so to stop squid from seeing every request as from the load balancers, we enable this so squid reads the X-Forwarded-For header and picks up the real IP of the client that’s making the request.

--enable-storeio="aufs": YMMV if you are using an alternate storage I/O method. AUFS is asynchronous, and has significant performance gains over UFS or diskd.

--enable-removal-policies="heap,lru": heap removal policies outperform the LRU policy, and we personally use “heap LFUDA”. If you want to use LRU, YMMV.

--with-maxfd=16384: File descriptors can play hell with squid. I’ve set this high to stop squid from either being killed or blocking when it’s under load. The default squid maxfd is (I believe) 4096, and I’ve seen squid hit this numerous times.

--enable-poll: Enables poll() over select(), as this increases performance.

--disable-ident-lookups: Stops squid from performing an ident lookup for every connection. This also removes a possible DoS vulnerability, whereby a malicious user could take down your squid server by opening thousands of connections.

--enable-truncate: Forces squid to use truncate() instead of unlink() when removing cache files. The squid docs claim that this can cause problems when used with async I/O, but so far I haven’t seen this be the case. A side effect of this is that squid will use more inodes on disk.
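Once the build is installed, it can be worth confirming that the binary you are running really was compiled with the flags described above. This is only a sketch: the built_with name is hypothetical, and the sample string is a stand-in whose exact layout is an assumption. On a real box you would capture the output of squid -v, which prints the configure options the binary was built with.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): succeeds if a configure
# flag appears in the build information passed as the first argument.
built_with() {
    # -F matches the flag literally; -- stops grep parsing it as an option.
    printf '%s\n' "$1" | grep -qF -- "$2"
}

# On a real box: build_info=$(squid -v)
# A sample string stands in here so the sketch runs anywhere.
build_info="configure options: '--prefix=/usr' '--enable-async-io' '--enable-storeio=aufs' '--with-maxfd=16384'"

for flag in --enable-async-io --enable-storeio=aufs --with-maxfd=16384; do
    if built_with "$build_info" "$flag"; then
        echo "ok:      $flag"
    else
        echo "missing: $flag"
    fi
done
```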

Go! Go! Gadget Makefile

After your ./configure has finished running, and if there aren’t any errors, it’s time to make. This will take some time, depending on the spec of your machine, but once it’s finished (without errors), you’ll want to make install.

This bit is optional, but doesn’t hurt:

strip /usr/sbin/squid /usr/lib/squid/*

This will remove the symbols from the squid binaries, and give them a slightly smaller memory footprint.

/etc/squid.conf

Now, let’s move on to getting the squid.conf options right…

I’m not going to go into every config option here. If you don’t understand one, I recommend you check out the Configuration Manual, which contains pretty much every option and a description of how to use it.

This would be my recommended squid.conf contents:

NOTE! I’ve stripped out the obvious configuration options that are required, such as http_port IP:PORT, as they are outside the scope of this blog entry.


hosts_file /etc/hosts
dns_nameservers x.x.x.x x.x.x.x
cache_replacement_policy heap LFUDA
cache_swap_low 90
cache_swap_high 95
maximum_object_size_in_memory 50 KB
cache_dir aufs /var/spool/squid 40000 16 256
cache_mem 100 MB
logfile_rotate 10
memory_pools off
maximum_object_size 50 MB
quick_abort_min 0 KB
quick_abort_max 0 KB
log_icp_queries off
client_db off
buffered_logs on
half_closed_clients off

Okay, so what does all that do?

hosts_file /etc/hosts: Forces squid to look in /etc/hosts for any hosts file entries; don’t ask me why, but it isn’t good at figuring out that this is the default place on every Linux distribution.

dns_nameservers x.x.x.x x.x.x.x: Important! Squid will stall connections while attempting to do DNS lookups. Somehow, specifying DNS name-servers within squid.conf stops this from happening (and yes, they must be valid name-servers).

cache_replacement_policy heap LFUDA: You may not want to use the LFUDA replacement policy. If not, I recommend you stick with a variant on heap, as there are massive performance gains over LRU. Details of the other policies are in the Configuration Manual.

cache_swap_low 90: Low water mark before squid starts purging stuff from its cache – this is in percent. If you have a 10 GB cache storage limit, squid will begin to prune at 9 GB used.

cache_swap_high 95: The high water mark. Squid will aggressively prune old cache files utilizing the replacement policy defined above. This would take place at 9.5 GB in our above example. If you have a huge cache, it’s worth noting that your percentages would be better set closer together: a 100 GB cache with 90%/95% watermarks starts pruning at 90 GB and prunes aggressively at 95 GB, a 5 GB difference. In this case, it would be better to have a 94%/95% watermark setup.

maximum_object_size_in_memory 50 KB: Unless you want to serve larger files super fast from memory, I recommend keeping this low – mainly to keep memory usage under control. Large files monopolizing vital RAM, while giving you a better byte hit-rate, will sacrifice your request hit-rate, as smaller files will keep getting swapped in and out.

cache_dir aufs /var/spool/squid X X X: I highly recommend NOT changing from AUFS. All the other storage methods in my benchmarking have been a lot slower. Obviously, replace the three X’s here with your own values: the cache size in MB, then the number of first-level and second-level directories.

cache_mem 100 MB: Keep this set low-ish. This represents the maximum amount of RAM that squid will use to keep cached objects in memory. Remember, squid requires about 100 MB of RAM per GB of cache storage, so if you have a 10 GB cache, squid will use ~1 GB just to handle that. (The Squid wiki quotes a rule of thumb closer to 10 MB per GB, so the 100 MB figure leaves plenty of headroom.) Make sure that cache_mem + (storage size limit in GB × 100 MB) is less than your available RAM, or your squid will start to swap.

memory_pools off: This stops squid from holding onto ram that it is no longer actively using.

maximum_object_size 50 MB: Tweak this to suit the approximate maximum object size you’re going to serve from cache. I’d recommend not putting this up too high though. Better to cache many small files than one very large file that only 4 people have downloaded.

quick_abort_min 0 KB: This feature is useful in some cases, but not in an optimized squid setup. What quick_abort does, in layman’s terms, is evaluate how much data is left to be transferred if a client cancels a transfer. If that amount is within the quick_abort range, squid will continue downloading the file and then swap it out to cache. Sounds good, right? Hell no. If a client makes multiple requests, you can end up with squid finishing off multiple fetches for the same file. This bogs everything down, and causes your squid to be slow. 0 KB disables this feature.

quick_abort_max 0 KB: See quick_abort_min

log_icp_queries off: If you’re using cache_peers, you probably don’t need to know every time squid goes and talks to one of its peer-caches. This is needless logging in most cases, and is just an extra I/O thread that could be used elsewhere.

client_db off: If enabled, squid will keep statistics on each client. This can become a memory hog after a while, so it’s best to keep it disabled.

buffered_logs on: Buffers the write-out to log files. This can increase performance slightly. YMMV.

half_closed_clients off: Sends a connection-close to clients that leave a half open connection to the squid server.
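The watermark and memory arithmetic from the cache_swap_low, cache_swap_high and cache_mem entries can be sketched as a couple of shell functions. Treat this as back-of-the-envelope only: the function names are invented for illustration, and the 100 MB-per-GB default is just the rule of thumb used in this post (pass a third argument to try a different figure).

```shell
#!/bin/sh
# Hypothetical helpers; the names are assumptions, not Squid tooling.

# Disk usage (in MB) at which a cache_swap_low/high percentage is reached.
watermark_mb() {    # usage: watermark_mb <cache_size_mb> <percent>
    echo $(( $1 * $2 / 100 ))
}

# Rough RAM estimate in MB: cache_mem plus overhead per GB of cache_dir.
squid_ram_mb() {    # usage: squid_ram_mb <cache_mem_mb> <cache_size_gb> [mb_per_gb]
    echo $(( $1 + $2 * ${3:-100} ))
}

cache_mb=40000      # matches "cache_dir aufs /var/spool/squid 40000 16 256"
echo "low  watermark: $(watermark_mb "$cache_mb" 90) MB"
echo "high watermark: $(watermark_mb "$cache_mb" 95) MB"
echo "estimated RAM:  $(squid_ram_mb 100 40) MB"
```

If the estimated RAM figure comes out above what the box actually has, shrink the cache_dir size or cache_mem before squid starts swapping.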

Tweak my /proc baby, yeah!

Okay, so Squid is optimized; what about the TCP stack? By default, a pristine installation is ‘optimized’ for any-use. By that, I mean it has a set of default kernel-level configuration settings that really don’t play ball well with network/disk intensive applications. We need to make a few modifications.

The first thing to do is ‘modprobe ip_conntrack’, and add this module to either /etc/modules (Debian) or /etc/modprobe.conf (RHEL/CentOS).
This will stop squid from spitting out the terribly useful message:

parseHttpRequest: NF getsockopt(SO_ORIGINAL_DST) failed: (92) Protocol not available

With that done, let’s make some sysctl modifications…

Add the following lines to the end of your /etc/sysctl.conf:


fs.file-max = 65535
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 4096 4096 4096
net.ipv4.tcp_low_latency = 1
net.core.netdev_max_backlog = 4000
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_max_syn_backlog = 16384
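To see whether the running kernel actually picked up the values above, something like the following sketch can compare a sysctl.conf-style file against sysctl -n. The sysctl_pairs and check_sysctl names are made up for illustration, and the parsing only handles simple "key = value" lines.

```shell
#!/bin/sh
# Hypothetical helpers; the names are assumptions.

# Turn "key = value" lines (as in /etc/sysctl.conf) into "key<TAB>value"
# pairs, skipping comments and blank lines and collapsing whitespace.
sysctl_pairs() {
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' |
    awk -F= '{
        key = $1; val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        gsub(/[ \t]+/, " ", val)
        print key "\t" val
    }'
}

# Print every key whose live value differs from the wanted one.
check_sysctl() {    # usage: check_sysctl < file
    sysctl_pairs | while IFS="$(printf '\t')" read -r key want; do
        # The kernel separates multi-value settings with tabs.
        have=$(sysctl -n "$key" 2>/dev/null | tr '\t' ' ')
        [ "$have" = "$want" ] || echo "differs: $key (want '$want', have '$have')"
    done
}

# Example: check_sysctl < /etc/sysctl.conf
```

Running sysctl -p loads /etc/sysctl.conf without a reboot, after which check_sysctl should print nothing.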

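I’ll let you google for the meaning of those changes, as they’re documented almost everywhere; I’m merely telling you which ones are worth changing. One caveat: net.ipv4.tcp_mem is measured in memory pages rather than bytes, so compare that line against your kernel’s defaults before applying it.
Note that with the file-max entry, you’ll also want to modify /etc/security/limits.conf and add: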

* - nofile 65535

With that done, your best bet is to reboot, and let the box pick up the changes that way. I’ve had some funky issues with squid + file-descriptor changes on the fly.
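After the reboot, a quick check that the shell squid starts from really has the higher file-descriptor limit can save some head-scratching. This is a minimal sketch; fd_limit_ok is a made-up name, and 16384 is simply the --with-maxfd value from the configure line.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): succeeds when a
# file-descriptor limit is at least the requested minimum.
fd_limit_ok() {    # usage: fd_limit_ok <current_limit> <minimum>
    # "unlimited" always passes; otherwise compare numerically.
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

need=16384              # the --with-maxfd value used at build time
have=$(ulimit -n)       # soft nofile limit of the current shell
if fd_limit_ok "$have" "$need"; then
    echo "nofile limit $have is enough for maxfd $need"
else
    echo "nofile limit $have is below maxfd $need"
fi
```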

When the box is back up, start up squid, and have fun. You’re optimized. =)

Comments

Aron
31 August, 04:58

Thanks for the great guide! It’s been a while since I’ve done anything with Squid now, but I can tell that it’s a painful installation / maintenance process.

This should be featured on the project homepage :)

Martin
31 August, 05:34

Interesting. Have you tried Varnish?

David
31 August, 12:59

Wow! Thanks, this guide is super useful.

Tony
31 August, 13:10

@Aron: np =) I personally found that there were no guides of this kind online that were actually useful, so figured this would help more people than just me in the long term.

@Martin: I took a brief look at Varnish a few months ago, but didn’t have the time to put into testing. I may look at it again in the future.

Max Howell
31 August, 17:53

Tony you are my hero :)

Fiona McLaren
31 August, 18:14

He’s not mine. I was lost about two sentences in.

Having said that, I am thrilled that this might be the only blog post to get fewer comments than the music team one.

MK
31 August, 18:22

>He’s not mine. I was lost about two sentences in.

>Having said that, I am thrilled that this might be the only blog post to get fewer comments than the music team one.

LOL….!!!

elias
31 August, 20:45

@ Fiona, I admit I didn’t understand much more than you did ;-)
But there was some squid in my lunch today!

Mark Lee
31 August, 21:22

Um, -O9? Doesn’t exist. I suggest taking a look at Gentoo’s official compilation optimization guide as well.

Jon Davis
31 August, 22:37

Thanks for the very useful info Tony.

Squid is amazing but documentation on setting up a good forward proxy is lacking (till now).

Rob Szarka
1 September, 00:25

Oh, sure. Now that I’m unlikely to have to mess with Squid for the conceivable future, you go and write this nice guide. Makes me want to install a new cache just because. ;)

Tony Dodd
1 September, 00:51

@Mark Lee: Aye, thanks for picking up on that. Some of the compiler flags, that one included, were taken from a squid-related page I found somewhere on the internet (probably on page 14 of google), and I completely missed that error. My excuse is that it was around 4am when I was working on it, and I’d been working on it 18 hours a day for 3 days straight. But hey, what can you do. I’ll correct it at some point when I’m in front of a real PC. Until then, everyone’s compilers should default to 3 if the optimization level is >3 (although I’m told this default behavior is not maintained in Gentoo?). ;-)

@Jon Davis: Thanks =) I’m pleased other folks are finding it useful.

@Rob Szarka: Hey, you can come set up squids all day here if you like! ;-)

yeled
2 September, 21:35

try ‘sysctl -w net.ipv4.route.flush=1’ instead of rebooting.

(also, your textile thingy isnt working)

kL
2 September, 21:54

Meh. lighttpd can handle heavy load of small files, and rest of the webserving too.

Tony
3 September, 12:49

@yeled: My issue was getting ubuntu to accept the new max file-descriptors. Works as it should under CentOS. But yes, for quick modifications, sysctl is great.
Not sure why textile isn’t formatting comments, I’ll beat Julian later. ^_

@kL: Yes, it can. However, I’d recommend reading up on exactly why people choose to use caching in front of webservers before making comments like that though.

Julian
3 September, 13:42

The Textile issue is fixed now. You may go crazy with code and tables and holy-moly.

incongnito
4 September, 11:01

Wikipedia uses squids (a lot), Wikimedia Meta

Dave
6 September, 15:44

Hi,
Wow! Great Guide. I just tried it on a CentOS 5 router and it is flying! I’m off to try as much as applicable on a FreeBSD squid router. Again, great guide.
Dave.

Dave
7 September, 18:04

As a sysadmin, thanks for the writeup. I recently set up squid in a server at a datacenter just for my own personal use, and I see gains just in normal web browsing. Nice to see a sysadmin geek post :)

James
25 September, 22:13

Tony:

Nice write-up, but regarding cache_mem usage: are you sure it’s 100MB, and not 10MB?

(from Squid wiki…)

“As a rule of thumb on Squid uses approximately 10 MB of RAM per GB of the total of all cache_dirs”

Tony
28 September, 16:07

@James: I see what you’re saying, however, cache_mem is merely ram that is allocated for squid to store hot cache objects in, and also objects that are to be cached, but not swapped out — 404 pages, etc.
You need to keep the cache_mem low, so that you don’t end up swapping because you’re keeping too many objects cached in memory, and squid is suffering on the i/o side because it has no ram available to use for cache dir operations. =]

deujigum
3 October, 07:23

I have 2 CPUs, but Squid only runs on one of them.

So system CPU usage is only 2%, but CPU usage for Squid is 99%.

Can you help me?

Tony Dodd
4 October, 01:47

@deujigum:
Quite probably not without a lot of debugging.
Squid shouldn’t be CPU bound if compiled the way I suggest in the above howto. It’s much more likely that you are I/O bound. Check your cache.log for further enlightenment as to what the issue is. If it’s I/O bound, and you’re using async i/o, you will get messages telling you there are issues in there. They will look something like:

squidaio_queue_request: WARNING - Queue congestion

This isn’t an overtly /bad/ error message; it’s basically letting you know that you’re getting close to the mark on how much I/O your machine can handle.

Error messages like this, on the other hand:

squidaio_queue_request: Queue Length: current=586, high=615, low=161, duration=26

Are /bad/. If you have lots of them, that is /really bad/. It means, in layman’s terms, that you’ve overloaded your squid machine. It’s receiving more requests than it can cope with, and a queue of requests is forming. Again, this is usually all down to the speed of your disks, and probably nothing to do with CPU. Squid’s CPU usage will reach stupid levels when it’s waiting on disk.

Really, the only course of action to resolve this is to put more squid machines online.

That’s about the best I can do, I’m afraid. For more help, you’re going to need to speak to the squid folks on irc.
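For anyone wanting a quick tally of the two kinds of message described in this reply, a rough sketch like this will count them. The aio_queue_report name is invented for illustration, and the log path in the example comment is an assumption; use wherever your cache.log actually lives.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): count async I/O queue
# messages in a Squid cache.log.
aio_queue_report() {    # usage: aio_queue_report <path-to-cache.log>
    # grep -c prints 0 when nothing matches, so both counts are always set.
    warn=$(grep -c 'squidaio_queue_request: WARNING' "$1")
    long=$(grep -c 'squidaio_queue_request: Queue Length' "$1")
    echo "congestion warnings: $warn, queue length reports: $long"
}

# Example, assuming the log lives at /var/log/squid/cache.log:
# aio_queue_report /var/log/squid/cache.log
```

A handful of congestion warnings is tolerable; a steady stream of queue length reports is the sign that the disks cannot keep up.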

vade
10 October, 06:49

good jobs dude,
it’s works : )

carrot
11 October, 06:34

Hm..
net.ipv4.tcp_* values are smaller than the default settings.

why?


Comments are closed for this entry.

Thursday, 30 August 2007

by tony
filed under Tips and Tricks
Comments: 25
