This morning I was testing the new XML/XSLT-based engine I’m working on for the next version of Poseidon. Previously, I had only tested it on my local dev server at home, but this time I wanted to see how it would perform on wonko.com. I was shocked and dismayed to find that it performed horribly. In fact, it was almost 20 times slower than on the dev server.
After getting over the initial urge to destroy things (I’ve spent a lot of time over the last two months tweaking and optimizing my code to squeeze every last bit of performance out of it), I started benchmarking each individual component of the application to try to find the culprit. Actually, first I went on a picnic, then I started benchmarking. But I digress.
The first suspects, obviously, were the XML and XSLT routines. There’s some pretty heavy DOM manipulation going on, not to mention XSL transformations, and it had taken me quite a while to get things to a point on the dev server where I was satisfied with the performance. But my benchmarks showed that these routines weren’t the problem. In fact, they were consuming less than 1% of the total processing time.
What was the problem then, you ask? A single echo statement. All it did was echo the transformed HTML document—a string of about 37,000 bytes—to the browser. On a whim, I removed the echo and, instead, wrote the string to a temporary file and then used readfile() to dump the file’s contents to the browser. Absurdly, this brought the performance back up to the level I had expected.
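The temp-file workaround described above might look something like the following. This is only a minimal sketch, assuming PHP 7 or later; the helper name echo_via_tempfile is hypothetical and does not come from the original code.

```php
<?php
// Hypothetical helper: write the string to a temporary file, then let
// readfile() stream that file to the output instead of using echo.
function echo_via_tempfile(string $html): int
{
    $path = tempnam(sys_get_temp_dir(), 'out');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    try {
        if (file_put_contents($path, $html) === false) {
            throw new RuntimeException('Could not write the temporary file.');
        }

        // readfile() writes the file to the output and returns the byte count.
        $sent = readfile($path);
        if ($sent === false) {
            throw new RuntimeException('Could not read the temporary file.');
        }

        return $sent;
    } finally {
        // Always clean up the temporary file, even on failure.
        unlink($path);
    }
}
```

Calling echo_via_tempfile($html) in place of echo $html reproduces the experiment; the extra disk write is the obvious cost of doing it this way.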
After a great deal of Googling and digging through PHP bug reports, I found this old bug report (which is, rather frustratingly, marked “bogus”). In short, using echo to send large strings to the browser results in horrid performance due to the way Nagle’s Algorithm causes data to be buffered for transmission over TCP/IP. It wasn’t an issue on the dev server because it’s on my LAN.
The solution? A simple three-line function that splits large strings into smaller chunks before echoing them:
// Echo a large string in chunks of at most $bufferSize bytes each.
function echobig($string, $bufferSize = 8192) {
    $splitString = str_split($string, $bufferSize);
    foreach ($splitString as $chunk) {
        echo $chunk;
    }
}
Play around with the buffer size and see what works best for you. I found that 8192, apart from being a nice round number, seemed to be a good size. Certain other values work too, but I wasn’t able to discern a pattern after several minutes of tinkering and there’s obviously some math at work that I have no desire to try to figure out.
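One way to play around with the buffer size is to time the chunked echo for several candidate sizes. The sketch below assumes PHP 7 or later; echobig_timed and compare_buffer_sizes are hypothetical names introduced here for illustration. The numbers are only meaningful when the script serves a real remote client, since the slowdown described above did not appear on a LAN.

```php
<?php
// Hypothetical helper: echo $string in chunks of $bufferSize bytes and
// return the elapsed wall-clock time in seconds.
function echobig_timed(string $string, int $bufferSize): float
{
    $start = microtime(true);
    foreach (str_split($string, $bufferSize) as $chunk) {
        echo $chunk;
    }
    return microtime(true) - $start;
}

// Hypothetical helper: time the same payload once per candidate size.
// Note that the payload is sent to the client once for every size tested.
function compare_buffer_sizes(string $payload, array $sizes): array
{
    $timings = [];
    foreach ($sizes as $size) {
        $timings[$size] = echobig_timed($payload, $size);
    }
    return $timings; // Maps buffer size => seconds.
}
```

Because the payload itself goes to the browser, the timings are best written somewhere else, for example with error_log(print_r(compare_buffer_sizes($html, [1024, 4096, 8192, 16384]), true)).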
By the way, the performance hit also happens when using PHP’s output control functions (ob_start() and friends) and when you enable output buffering in php.ini. I’m amazed more people haven’t noticed this, and I’m even more amazed that the PHP developers have just decided to ignore it since it’s not technically a bug in PHP.
Comments
Is print...
Just curious.
I Don't Get It
1. Buffer data until you have enough to fill the MTU (which is hopefully set to 1500 these days).
2. After sending packet x, wait for the ACK of packet x before sending packet x'.
So, if you have a string of approximately 37,000 bytes, and you can put 1480 bytes in each packet (there is a 20-byte packet overhead), you will need to send approximately 25 packets, which should get sent as fast as the ACKs get received. I don't understand how splitting your inputs to echo() into 8KB chunks is going to change anything.
In short, I don't understand: how does Nagling slow throughput on the 37,000-byte string but not the 8KB string? I am not saying you are wrong; I am just confused.
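The packet arithmetic in this comment can be checked with a few lines of PHP. This is a sketch assuming PHP 7 or later; packets_needed is a hypothetical helper, and the 1480-byte payload is the commenter's figure (with a 1500-byte MTU and 20-byte IPv4 and TCP headers, the payload would be 1460 bytes instead).

```php
<?php
// Hypothetical helper: number of packets needed to carry $bytes of data
// when each packet holds at most $payloadPerPacket bytes.
function packets_needed(int $bytes, int $payloadPerPacket = 1480): int
{
    if ($bytes < 0 || $payloadPerPacket < 1) {
        throw new InvalidArgumentException('Invalid byte count or payload size.');
    }

    // Integer ceiling division: round up to a whole number of packets.
    return intdiv($bytes + $payloadPerPacket - 1, $payloadPerPacket);
}

echo packets_needed(37000), "\n"; // 25: the whole string at 1480 bytes per packet
echo packets_needed(8192), "\n";  // 6: a single 8 KB chunk
```

Either way the total number of packets is about the same, which is the commenter's point: chunking changes how the data is handed to the output layer, not how much data crosses the wire.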
Re: I Don't Get It
This is a total shot in the dark, and I'm not certain why it would work yet, but maybe you have to set the split boundary in echobig() to just under your MTU?
Re: I Don't Get It
echo construct, but I spent two hours looking for it in their CVS repository and came up with nothing. If you manage to figure out where it's buried, let me know.
All I know is that my benchmarks say echoing large strings is slow and echoing small strings is fast.
Re: Is print...
Yes, print has the same problem.
Re: I Don't Get It
I too poked around on cvs.php.net, and I too came up empty handed. I briefly considered downloading the source and grepping for echo, but then I realized that I would probably get thousands of false hits.
Anticipation
I know this is an irritating question, but I can't help myself. By when do you think you'll have the next version of Poseidon available publicly and what new features can be expected?
Re: Anticipation
The most noteworthy new feature is the new XML/XSLT-based rendering system, which means every single page on the site is available as an XML or RSS feed, and which allows for some really cool things to be done with themes and plugins.
The plugin architecture will also be very different, although the details of that haven't been finalized yet.
PHP5 will be required, and Poseidon will make full use of its capabilities. In fact, I'm already pushing it almost to its limits in places and I'm nowhere near finished yet. We'll see how that works out. Poseidon's main goal in the past has been simplicity through minimalism; the goal of the new version is more along the lines of simplicity through elegance. That should be transparent to users, though. If anything, I think the new Poseidon will appear to be much simpler at a glance, but much more powerful upon closer examination.
Of course, if you want to see all of this for yourself, the latest development code is in CVS. Instructions here.
Re: Anticipation
my brain started trying to think of an analogy in terms of ships of the line. Damn you Patrick O'Brian! Damn your brain-eating books!
Re: I Don't Get It
Now to answer the question, you can send a bunch of acks in succession, without waiting for an ack, but once you exceed the packet size for a given packet, the nagle algorithm required the remote computer to ack (acknowledge) receipt of the packet before it sends the next packet. Hence, the delay. This has nothing to do with any programming language per se but rather with tcp/ip implementation. If you feel the need to respect the mtu packetsize for strings sent so as not to incur the delay, then that is your choice as an optimization (why not check the max mtu on a per machine basis instead of praying to the math gods for scrolls of enlightenment to post on your pitiful board) but don't make a fool of yourself by pretending that you found some php bug years ago and they still won't fix it. It's a bogus bug, amateur preacherman. Excuse the bad english, I am an american.
Re: I Don't Get It
That should be a bunch of packets in succession (sp? succession, proper spelling, yes? thx)
Re: I Don't Get It
That should be requires, not required. A bug. I will file it.
Re: I Don't Get It
Thanks for the explanation of Nagling, but read closer. I didn't file any bug reports. I didn't even claim it was a bug. I just described a performance problem I had encountered and shared my solution, which I know isn't ideal (and I said as much), but which is good enough for me. Of course, seeing as how it's my site, I could have said "PHP performs sexual favors on Ebola monkeys" and your patronizing indignation still wouldn't shift my opinion.
That said, in my tests I saw no evidence that this problem had anything to do with MTU size. If it had, then my numbers wouldn't have needed to be stabs in the dark.
In any case, checking the MTU size on a per-machine basis in a cross-platform web app would be both completely overkill and a huge pain in the ass. It wouldn't be too hard to glean such information from Unix (although there would need to be different routines for Linux, FreeBSD, etc.), but in Windows it would require reading from the registry, which is not at all convenient in PHP. Either way, looking up the MTU size on each pageview would be an ugly performance hit, and it'd be useless to try to get server administrators to specify the value in a config file, since most of them wouldn't bother. But, since this issue doesn't appear to be related to the MTU size in the first place, this is a moot point.
Re: I Don't Get It
mtu = maximum transmission unit / packet. If you go read the nagle algo explanation yet again, you'll see that your problem is that you get a wait state once you exceed the packet size, since then the algo requires an ack from the remote machine before sending the next packet in the string. It's also probably a negotiated protocol, so the actual packetsize used depends most likely on the max packetsize declared for machines at each end (so i'll leave rtfm up to you since you seem to be so good at it) and not much to do with sexual favors.
Re: I Don't Get It
If you're going to keep insisting that the MTU is the key to everything, I suggest you run your own tests. I'd love to see your results.
Re: I Don't Get It
OK, you said you did not have an issue on the dev server since it's on your LAN. The only two machines then are the server and the client, so there are only two MTU settings the algorithm might consider. If there are a bunch of hops in between each machine, don't you think the protocol will respect the MTU preferences of intermediate hops to optimize their bandwidth preferences as well? (I'm not inclined to read RFCs right now, but I'm pretty sure they get factored in too.) Correct me if I'm wrong, I'm sure.
Re: I Don't Get It
Also, if you'll remember, there was the whole issue where, when all the receive window size, MTU, etc. registry manipulators came out for Windows machines, there was a slew of articles on what the optimal settings were. The issue at hand was that, although a setting of xxx > yyy might be optimal for your pipe width and OS environment, most servers on the internet would not have the same higher setting, so packets would still get fragmented somewhere along the way. Which is why some people actually found lower settings (non-optimal locally) to be best, since they catered for more of the window sizes/MTU sizes in use on the internet, so that they got better overall throughput. Beats randomly trying buffer lengths and trying to extrapolate causes of bottlenecks...
Re: I Don't Get It
That's a reasonable explanation that I hadn't thought of. You may be right.
Seeing the same thing
The problem is that on one of our servers echo does not appear to be slow. It takes 0.00001s to echo out a huge string, and on all the others it takes something like 0.5s.
Have you found any other explanations or workarounds since? It seems silly to have to use a special echo function to echo a string out. Defeats the purpose of optimising code, if you know what I mean :(
PHP echo CVS source
The 'echo' statement is actually a PHP language construct, not a function.
So it's almost for sure going to be in the guts of the Zend engine, in the language grammar files passed to yacc, bison, or whatever is used to convert the syntax of things like "if" "else" and "while" etc.
In other words, "echo" will be in the same bit of code as "if" for PHP internals.
Hope that helps.
No title
I found this old thread because I have a similar problem with some interesting test results.
I use echo to output a very large text (600kB). The time spent depends on the client machine and the browser used. In tests from the same client network (same IP) on three different client machines, it varies from .7 to 15 seconds with Firefox and from .7 to 1 second with IE.
I am not sure what that means. But I will try to split up the text anyway.
Same Problem
I have the exact same problem, though with a 100 KB HTML document.
On the dev server it takes about 0.01 sec; on production it takes ~0.6 sec. Has anybody found a solution to this?
Refresh ??
Hm, I tried your approach of splitting the string into smaller chunks; same issue. But what I found out is that if I start hitting refresh every 1-2 seconds, the echo time drops to ~0.000X seconds.
If I wait for, e.g., 10 seconds and hit refresh again, it goes back up to 0.7 seconds. It may be related to memory usage or swapping (I am not a Linux expert, so this is a guess). I am on a VPS with 256 MB of RAM, and I see that the memory is full and 10 MB of swap is in use. Any ideas?
Solution found.
Hello,
I’ve got this issue too, after the website hoster updated PHP. It took about 0.3+ seconds for a big echo output. What the?!
I’ve googled a bit... and found this page. That function doesn’t help me.
Here’s a fix: put ob_start() before outputting via echo.
Thanks to http://phplens.com/lens/php-book/optimizing-debugging-php.php !
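A minimal sketch of that fix, assuming PHP 7 or later; the wrapper name echo_buffered is hypothetical. Note that the original post reports the same slowdown with output buffering enabled, so whether this helps evidently depends on the server setup.

```php
<?php
// Hypothetical wrapper: collect the output in a buffer first, then flush
// the whole buffer in one go.
function echo_buffered(string $html): void
{
    ob_start();      // Start capturing output.
    echo $html;      // This goes into the buffer, not straight to the client.
    ob_end_flush();  // Send the buffered output and turn this buffer off.
}
```

In a real page, ob_start() would more likely be called once at the top of the script, so that every later echo is buffered until the script ends or the buffer is flushed.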