This morning I was testing the new XML/XSLT-based engine I’m working on for the next version of Poseidon. Previously, I had only tested it on my local dev server at home, but this time I wanted to see how it would perform on wonko.com. I was shocked and dismayed to find that it performed horribly. In fact, it was almost 20 times slower than on the dev server.
After getting over the initial urge to destroy things (I’ve spent a lot of time over the last two months tweaking and optimizing my code to squeeze every last bit of performance out of it) I started benchmarking each individual component of the application to try and find the culprit. Actually, first I went on a picnic, then I started benchmarking. But I digress.
The first suspects, obviously, were the XML and XSLT routines. There’s some pretty heavy DOM manipulation going on, not to mention XSL transformations, and it had taken me quite a while to get things to a point on the dev server where I was satisfied with the performance. But my benchmarks showed that these routines weren’t the problem. In fact, they were consuming less than 1% of the total processing time.
What was the problem then, you ask? A single echo statement. All it did was echo the transformed HTML document—a string of about 37,000 bytes—to the browser. On a whim, I removed the echo and, instead, wrote the string to a temporary file and then used readfile() to dump the file’s contents to the browser. Absurdly, this brought the performance back up to the level I had expected.
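A minimal sketch of the temp-file workaround described above, for anyone who wants to reproduce the comparison. The helper name `echoViaTempFile` and the `out` filename prefix are made up for illustration; the post does not show the code that was actually used.

```php
<?php
// Hypothetical helper: write the output to a temporary file,
// then stream that file to the client with readfile().
function echoViaTempFile(string $html): void
{
    // tempnam() creates a uniquely named empty file and returns its path.
    $path = tempnam(sys_get_temp_dir(), 'out');
    if ($path === false) {
        // Could not create a temp file; fall back to a plain echo.
        echo $html;
        return;
    }

    file_put_contents($path, $html);
    readfile($path); // reads the file and writes it to the output
    unlink($path);   // remove the temporary file afterwards
}
```

This adds two disk operations per request, so it is probably only useful as a diagnostic, not as a permanent fix.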
After a great deal of Googling and digging through PHP bug reports, I found this old bug report (which is, rather frustratingly, marked “bogus”). In short, using echo to send large strings to the browser results in horrid performance due to the way Nagle’s Algorithm causes data to be buffered for transmission over TCP/IP. It wasn’t an issue on the dev server because it’s on my LAN.
The solution? A simple three-line function that splits large strings into smaller chunks before echoing them:
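function echobig($string, $bufferSize = 8192) {
    // Split the string into chunks of at most $bufferSize bytes.
    $splitString = str_split($string, $bufferSize);

    // Echo each chunk separately instead of the whole string at once.
    foreach ($splitString as $chunk) {
        echo $chunk;
    }
}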
Play around with the buffer size and see what works best for you. I found that 8192, apart from being a nice round number, seemed to be a good size. Certain other values work too, but I wasn’t able to discern a pattern after several minutes of tinkering and there’s obviously some math at work that I have no desire to try to figure out.
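One way to compare buffer sizes more systematically is to time each candidate in a loop. This is only a sketch: `timeBufferSizes` is a hypothetical helper, and timings taken from inside the script may not reflect what the client actually experiences.

```php
<?php
// Same function as in the post, repeated so this sketch is self-contained.
function echobig($string, $bufferSize = 8192) {
    foreach (str_split($string, $bufferSize) as $chunk) {
        echo $chunk;
    }
}

// Hypothetical helper: time echobig() once for each candidate buffer size.
// Returns an array mapping buffer size => elapsed seconds.
function timeBufferSizes(string $payload, array $sizes): array
{
    $timings = [];
    foreach ($sizes as $size) {
        $start = microtime(true);
        echobig($payload, $size);
        $timings[$size] = microtime(true) - $start;
    }
    return $timings;
}
```

Calling something like `timeBufferSizes($html, [1024, 4096, 8192, 16384])` and sending the result to `error_log()` keeps the numbers out of the page itself. Note that the payload is echoed once per size, so this belongs on a test page only.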
By the way, the performance hit also happens when using PHP’s output control functions (ob_start() and friends) and when you enable output buffering in php.ini. I’m amazed more people haven’t noticed this, and I’m even more amazed that the PHP developers have just decided to ignore it since it’s not technically a bug in PHP.
Comments
Is print...
Just curious.
I Don't Get It
1. Buffer data until you have enough to fill the MTU (which is hopefully set to 1500 these days).
2. After sending packet x, wait for the ACK of packet x before sending packet x'.
So, if you have an approximately 37,000-byte string, and you can put 1480 bytes in each packet (there is a 20-byte packet overhead), you will need to send approximately 25 packets, which should get sent as fast as the ACKs get received. I don't understand how splitting your inputs to echo() into 8KB chunks is going to change anything.
In short, I don't understand: how does Nagling slow throughput on the 37,000-byte string but not the 8KB string? I am not saying you are wrong, I am just confused.
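The packet arithmetic in this comment can be checked with a few lines of PHP. The sketch takes the comment's figure of 1480 usable bytes per packet at face value; `packetsNeeded` is a hypothetical helper, not anything from PHP or the post.

```php
<?php
// Hypothetical helper: number of packets needed for a payload,
// given the usable bytes per packet (1480 is the comment's figure).
function packetsNeeded(int $payloadBytes, int $bytesPerPacket): int
{
    return (int) ceil($payloadBytes / $bytesPerPacket);
}

echo packetsNeeded(37000, 1480), "\n"; // prints 25
echo packetsNeeded(8192, 1480), "\n";  // prints 6
```

So the full string needs 25 packets whether it is echoed whole or in 8192-byte chunks of 6 packets each, which is the point of the question.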
Re: I Don't Get It
This is a total shot in the dark, and I'm not certain why it would work yet, but maybe you have to set the split boundary in echobig() to just under your MTU?
Re: I Don't Get It
echo construct, but I spent two hours looking for it in their CVS repository and came up with nothing. If you manage to figure out where it's buried, let me know.
All I know is that my benchmarks say echoing large strings is slow and echoing small strings is fast.
Re: Is print...
Yes, print has the same problem.
Re: I Don't Get It
I too poked around on cvs.php.net, and I too came up empty handed. I briefly considered downloading the source and grepping for echo, but then I realized that I would probably get thousands of false hits.
Anticipation
I know this is an irritating question, but I can't help myself. By when do you think you'll have the next version of Poseidon available publicly and what new features can be expected?
Re: Anticipation
The most noteworthy new feature is the new XML/XSLT-based rendering system, which means every single page on the site is available as an XML or RSS feed, and which allows for some really cool things to be done with themes and plugins.
The plugin architecture will also be very different, although the details of that haven't been finalized yet.
PHP5 will be required, and Poseidon will make full use of its capabilities. In fact, I'm already pushing it almost to its limits in places and I'm nowhere near finished yet. We'll see how that works out. Poseidon's main goal in the past has been simplicity through minimalism; the goal of the new version is more along the lines of simplicity through elegance. That should be transparent to users, though. If anything, I think the new Poseidon will appear to be much simpler at a glance, but much more powerful upon closer examination.
Of course, if you want to see all of this for yourself, the latest development code is in CVS. Instructions here.
Re: Anticipation
my brain started trying to think of an analogy in terms of ships of the line. Damn you Patrick O'Brian! Damn your brain-eating books!
Re: I Don't Get It
Now to answer the question, you can send a bunch of acks in succession, without waiting for an ack, but once you exceed the packet size for a given packet, the nagle algorithm required the remote computer to ack (acknowledge) receipt of the packet before it sends the next packet. Hence, the delay. This has nothing to do with any programming language per se but rather with tcp/ip implementation. If you feel the need to respect the mtu packetsize for strings sent so as not to incur the delay, then that is your choice as an optimization (why not check the max mtu on a per machine basis instead of praying to the math gods for scrolls of enlightenment to post on your pitiful board) but don't make a fool of yourself by pretending that you found some php bug years ago and they still won't fix it. It's a bogus bug, amateur preacherman. Excuse the bad english, I am an american.
Re: I Don't Get It
That should be a bunch of packets in succession (sp? succession, proper spelling, yes? thx)
Re: I Don't Get It
That should be requires, not required. A bug. I will file it.
Re: I Don't Get It
Thanks for the explanation of Nagling, but read closer. I didn't file any bug reports. I didn't even claim it was a bug. I just described a performance problem I had encountered and shared my solution, which I know isn't ideal (and I said as much), but which is good enough for me. Of course, seeing as how it's my site, I could have said "PHP performs sexual favors on Ebola monkeys" and your patronizing indignation still wouldn't shift my opinion.
That said, in my tests I saw no evidence that this problem had anything to do with MTU size. If it had, then my numbers wouldn't have needed to be stabs in the dark.
In any case, checking the MTU size on a per-machine basis in a cross-platform web app would be both completely overkill and a huge pain in the ass. It wouldn't be too hard to glean such information from Unix (although there would need to be different routines for Linux, FreeBSD, etc.), but in Windows it would require reading from the registry, which is not at all convenient in PHP. Either way, looking up the MTU size on each pageview would be an ugly performance hit, and it'd be useless to try to get server administrators to specify the value in a config file, since most of them wouldn't bother. But, since this issue doesn't appear to be related to the MTU size in the first place, this is a moot point.
Re: I Don't Get It
mtu = maximum transmission unit / packet. If you go read the nagle algo explanation yet again, you'll see that your problem is that you get a wait state once you exceed the packet size, since then the algo requires an ack from the remote machine before sending the next packet in the string. It's also probably a negotiated protocol, so the actual packetsize used depends most likely on the max packetsize declared for machines at each end (so i'll leave rtfm up to you since you seem to be so good at it) and not much to do with sexual favors.
Re: I Don't Get It
If you're going to keep insisting that the MTU is the key to everything, I suggest you run your own tests. I'd love to see your results.
Re: I Don't Get It
OK, you said you did not have an issue on the dev server since it's on your LAN. The only two machines then are the server and the client, so there are only two MTU settings the algorithm might consider. If there are a bunch of hops in between each machine, don't you think the protocol will respect the MTU preferences of intermediate hops to optimize their bandwidth preferences as well? (I'm not inclined to read RFCs right now, but I'm pretty sure they get factored in too.) Correct me if I'm wrong, I'm sure.
Re: I Don't Get It
Also, if you'll remember, there was the whole issue where, when all the receive window size, MTU, etc. registry manipulators came out for Windows machines, there was a slew of articles on what the optimal settings were. The issue at hand was that, although a setting of xxx > yyy might be optimal for your pipe width and OS environment, most servers on the internet would not have the same higher setting, so packets would still get fragmented somewhere along the way. This is why some people actually found lower settings (non-optimal locally) to be best: they catered for more of the window sizes and MTU sizes in use on the internet, so they got better overall throughput. Beats randomly trying buffer lengths and trying to extrapolate causes of bottlenecks...
Re: I Don't Get It
That's a reasonable explanation that I hadn't thought of. You may be right.
Seeing the same thing
The problem is that on one of our servers echo does not appear to be slow. It takes 0.00001s to echo out a huge string, and on all the others it takes something like 0.5s
Have you found any other explanations or workarounds since? It seems silly to have to use a special echo function to echo a string out. Defeats the purpose of optimising code, if you know what I mean :(
PHP echo CVS source
The 'echo' statement is actually a PHP language construct, not a function.
So it's almost for sure going to be in the guts of the Zend engine, in the language grammar files passed to yacc, bison, or whatever is used to convert the syntax of things like "if" "else" and "while" etc.
In other words, "echo" will be in the same bit of code as "if" for PHP internals.
Hope that helps.
No title
I found this old thread because I have a similar problem with some interesting test results.
I use echo to output a very large text (600kB). The spent time depends on the client machine and the used browser. In tests from the same client network (same IP) on three different client machines it differs from .7 to 15 seconds with firefox and .7 to 1 second with IE.
I am not sure what that means. But I will try to split up the text anyway.
Same Problem
I have the exact same problem, though with a 100 KB HTML page.
On the dev server it takes about 0.01 sec; on production it takes ~0.6 sec. Has anybody found the solution to this?
Refresh ??
Hm, I tried your approach of splitting the string into smaller chunks: same issue. But what I found out is that if I start hitting refresh every 1-2 seconds, the echo time drops to ~0.000X seconds.
If I wait for e.g. 10 seconds and hit refresh again, it goes up to 0.7 seconds again. It may be related to memory usage or swapping (I am not a Linux expert, so this is a guess). I am on a VPS with 256 MB of RAM, and I see that the memory is full and 10 MB of swap is in use. Any ideas?
Solution found.
Hello,
I’ve got this issue too, after the website hoster updated PHP. It took about 0.3+ seconds for a big echo output. What the?!
I’ve googled a bit.. and found this page. That function doesn’t help me.
Here’s a fix: put ob_start() before outputting via echo.
Thanks to http://phplens.com/lens/php-book/optimizing-debugging-php.php !
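A minimal sketch of what that fix might look like in practice. The commenter only says to call ob_start() before echoing; the `renderPage` and `sendBuffered` helpers and the explicit ob_end_flush() call are assumptions added for illustration.

```php
<?php
// Hypothetical page renderer standing in for the real application code.
function renderPage(): string
{
    return str_repeat('<p>Hello</p>', 3000); // 36,000 bytes of HTML
}

// Hypothetical helper: echo into PHP's output buffer, then flush it.
function sendBuffered(string $html): void
{
    ob_start();     // start capturing output in PHP's own buffer
    echo $html;     // lands in the buffer instead of going out directly
    ob_end_flush(); // send the buffered output on and stop buffering
}
```

A call such as `sendBuffered(renderPage());` would replace the single large echo. Whether this helps seems to vary between setups, judging by the other reports in this thread.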
weird problem this one
having this problem with a 0.7s echo. ob_start() didn’t work, neither did the buffering function.
what seems to be happening on other pages of the same site however, is that the larger echoed string is the one that takes the most time to process on that page.
Compression helped me
Hey Guys,
I feel a little sorry that the otherwise interesting discussion went a little out of control up there.
Anyway, I found that applying compression to the php output helps a lot, as described in
http://httpd.apache.org/docs/2.0/mod/mod_deflate.html
This got my benchmark times down to nearly 10% depending on server traffic and such.
Hope this helps!
Cheers!
Thomas
Problem solved with ob_start()
I too encountered the same problem. Echoing a large string sometimes took as long as a hundred seconds. What I found is that it greatly depends on the client and the connection. I was only able to replicate this problem using slow proxies. The issue is not really in PHP but more with the TCP protocol. Here is how it works, I believe. PHP outputs a string to Apache, and as soon as the TCP buffer is full (its size is set to 8192 on many systems, which seems to be the case for the author of this post) it starts to output packets in succession. Once all packets go out, it waits for ACKs on those packets before sending the next buffer full of data.
Anyway, I tried splitting the string into pieces and that did not help. I agree with the reply above that Nagle's Algorithm does not have anything to do with this problem. It just does not make sense. Nagle is only supposed to slow down transmissions for small data chunks. It is supposed to wait until it has enough data before sending a packet. That just does not make sense for large strings. Anyway, once I started using explicit caching with ob_start(), the problem went away. Using output_buffering=On will not solve this problem; only ob_start() can help. To see more on this issue, check out my whole thought process at http://www.phpfreaks.com/forums/index.php/topic,241381.0.html
I wish I had found this post earlier.
Solution
I had the exact same issue. Was about to abandon my entire project as my production server was oh about 50 times slower than my dev… all because of ONE echo statement (I concatenate my html into variables and use one echo). I tried output buffering, changing mtu size (max 1500 =[), apc_php caching and various other techniques including the echobig function mentioned here. NOTHING worked at all. Then I added this line to php.ini “output_handler = ob_gzhandler” BOOM literally 100x faster (the other 50x being from the fact that my dev is 8 years old so the new server really helps out). So yea HTML compression is the key. I hope this helps as it saved my ass big time.
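To get a feel for why compression might help so much, here is a rough sketch that compares the size of a repetitive HTML payload before and after gzip. It uses gzencode() and so assumes the zlib extension is loaded; `gzipSavings` and the sample markup are made up for illustration and are not the php.ini fix itself.

```php
<?php
// Hypothetical helper: report how much gzip shrinks a payload.
function gzipSavings(string $payload): array
{
    $compressed = gzencode($payload); // requires the zlib extension
    return [
        'original'   => strlen($payload),
        'compressed' => strlen($compressed),
    ];
}

// Made-up sample: 5,000 identical table rows, similar to generated HTML.
$html = str_repeat('<tr><td>row</td></tr>', 5000);
$sizes = gzipSavings($html);
printf("%d bytes -> %d bytes\n", $sizes['original'], $sizes['compressed']);
```

Fewer bytes means fewer packets to push through a slow connection, which would fit the improvement reported here. Real pages are less repetitive than this sample, so the savings will be smaller.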
The way you MEASURE performance is misleading you
Dear all,
Thank you for all the info above. It definitely helped me to understand what was going on.
However, in the end, I want to suggest that most of you have been misled by the way you measure performance!!
I bet that just like me, you put a timer at the beginning of your PHP script and another at the end, you compute the difference and the resulting time is your measure of performance. (At least, that’s what I was doing).
Now, the effect of adding ob_start() is that PHP buffers the output internally and only passes it back to apache ***after*** you compute your PHP run time at the end of the script. Yes, the actual “echo” in this case, happens after the end of the script.
Configuring an output_handler like ob_gzhandler pretty much results in the same effect.
Now, I suggest to you that you actually measure the time that Apache takes to serve the whole request. I posted details on how to measure request times in apache on my blog.
If you do that, you will see that the global processing time in Apache doesn’t actually change whether or not you do the ob_start() trick.
Global processing time varies greatly though, depending on the distance between you and the production server. If you have servers in different datacenters, try a few wgets! ;)
Now I have to admit I have still added the ob_start() to my code, just for the satisfaction of seeing a short php execution time and knowing for sure the delay is due to the connection, not to my code ;)
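The distinction made in this comment can be made visible from inside PHP by timing the script's work and the flush separately. This is a sketch under assumptions: `timedRender` is a hypothetical helper, and it still only measures up to the point where PHP has handed the data over, so Apache-level request timing remains the fuller picture.

```php
<?php
// Hypothetical helper: time the script's own work separately from the
// time spent handing the finished output to the next layer.
function timedRender(callable $render): array
{
    $start = microtime(true);
    ob_start();               // capture output so $render does not wait on I/O
    $render();
    $built = microtime(true); // script work done, nothing sent yet

    ob_end_flush();           // push the buffer out of PHP's output buffer
    flush();                  // ask PHP to pass pending output to the server
    $sent = microtime(true);

    return [
        'build_seconds' => $built - $start,
        'flush_seconds' => $sent - $built,
    ];
}
```

If `flush_seconds` dominates while `build_seconds` stays tiny, the delay is in delivering the output and not in generating it, which is the argument being made here.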
How to make echo really faster (and transfer the delay over to Apache)
Guys, I think I narrowed it down even further!
As previously said, PHP buffering will let PHP race to the end of your script, but after that it will still “hang” while trying to pass all that data to Apache.
Now I was able, not only to measure this (see previous comment) but to actually eliminate the waiting period inside of PHP. I did that by increasing Apache’s SendBuffer with the SendBufferSize directive.
This pushes the data out of PHP faster. I guess the next step would be to get it out of Apache faster but I’m not sure if there is actually another configurable layer between Apache and the raw network bandwidth.