The eclectic musings of a bitter software engineer.

This morning I was testing the new XML/XSLT-based engine I’m working on for the next version of Poseidon. Previously, I had only tested it on my local dev server at home, but this time I wanted to see how it would perform on wonko.com. I was shocked and dismayed to find that it performed horribly. In fact, it was almost 20 times slower than on the dev server.

After getting over the initial urge to destroy things (I’ve spent a lot of time over the last two months tweaking and optimizing my code to squeeze every last bit of performance out of it) I started benchmarking each individual component of the application to try and find the culprit. Actually, first I went on a picnic, then I started benchmarking. But I digress.

The first suspects, obviously, were the XML and XSLT routines. There’s some pretty heavy DOM manipulation going on, not to mention XSL transformations, and it had taken me quite a while to get things to a point on the dev server where I was satisfied with the performance. But my benchmarks showed that these routines weren’t the problem. In fact, they were consuming less than 1% of the total processing time.

What was the problem then, you ask? A single echo statement. All it did was echo the transformed HTML document—a string of about 37,000 bytes—to the browser. On a whim, I removed the echo and, instead, wrote the string to a temporary file and then used readfile() to dump the file’s contents to the browser. Absurdly, this brought the performance back up to the level I had expected.
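
A minimal sketch of that temp-file workaround, for anyone who wants to reproduce the comparison. The helper name echo_via_tempfile() is hypothetical and the original code isn't shown in this post; only tempnam(), file_put_contents(), readfile() and unlink() are standard PHP.

```php
<?php
// Hypothetical helper (not from the original post): write $html to a
// temporary file, stream it to the client with readfile(), then clean up.
// Returns the number of bytes sent, or false on failure.
function echo_via_tempfile($html)
{
    // Create a uniquely named file in the system temp directory.
    $path = tempnam(sys_get_temp_dir(), 'out');
    if ($path === false) {
        return false;
    }

    if (file_put_contents($path, $html) === false) {
        unlink($path);
        return false;
    }

    // readfile() reads the file and writes it straight to the output.
    $bytes = readfile($path);
    unlink($path);

    return $bytes;
}
```

Calling echo_via_tempfile($html) in place of echo $html is enough to run the same experiment, though the extra disk write makes it a diagnostic more than a fix.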

After a great deal of Googling and digging through PHP bug reports, I found this old bug report (which is, rather frustratingly, marked “bogus”). In short, using echo to send large strings to the browser results in horrid performance due to the way Nagle’s Algorithm causes data to be buffered for transmission over TCP/IP. It wasn’t an issue on the dev server because it’s on my LAN.

The solution? A simple three-line function that splits large strings into smaller chunks before echoing them:

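// Echo a large string in chunks of $bufferSize bytes rather than in one call.
function echobig($string, $bufferSize = 8192) {
  // str_split() requires PHP 5; it returns the string as an array of chunks.
  $splitString = str_split($string, $bufferSize);

  foreach($splitString as $chunk) {
    echo $chunk;
  }
}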

Play around with the buffer size and see what works best for you. I found that 8192, apart from being a nice round number, seemed to be a good size. Certain other values work too, but I wasn’t able to discern a pattern after several minutes of tinkering and there’s obviously some math at work that I have no desire to try to figure out.
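
If you'd rather measure than guess, something like the following can compare chunk sizes. It's only a sketch: benchmark_chunk_sizes() is a made-up name, the candidate sizes are arbitrary, and it sends the string once per size, so it belongs on a throwaway test page and not in production.

```php
<?php
// Hypothetical benchmark helper (not part of the original post).
// Echoes $string once for every candidate chunk size and returns the
// elapsed wall-clock seconds, keyed by chunk size.
function benchmark_chunk_sizes($string, array $sizes)
{
    $timings = array();

    foreach ($sizes as $size) {
        $start = microtime(true);

        foreach (str_split($string, $size) as $chunk) {
            echo $chunk;
        }
        flush(); // ask PHP to push its write buffers toward the client

        $timings[$size] = microtime(true) - $start;
    }

    return $timings;
}
```

For example, benchmark_chunk_sizes($html, array(1024, 4096, 8192, 16384)) returns one timing per size. Run it against the remote server, not a LAN box, since the slowdown described above only showed up over the wider network.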

By the way, the performance hit also happens when using PHP’s output control functions (ob_start() and friends) and when you enable output buffering in php.ini. I’m amazed more people haven’t noticed this, and I’m even more amazed that the PHP developers have just decided to ignore it since it’s not technically a bug in PHP.

Comments

Is print a problem as well? They are almost the same function (or builtin). The only difference is that echo allows you to print more than one string.

Just curious.
Sunday February 27, 2005 @ 11:56 AM (PST) Posted by johnr
Now, I haven't read the whole of RFC 896, but as far as I can tell the Nagle algorithm does two things:

1. Buffer data until you have enough to fill the MTU (which is hopefully set to 1500 these days).

2. After sending packet x, wait for the ACK of packet x before sending packet x'.

So, if you have an approximately 37,000-byte string, and you can put 1480 bytes in each packet (there is a 20-byte packet overhead), you will need to send approximately 25 packets, which should get sent as fast as the ACKs get received. I don't understand how splitting your inputs to echo() into 8KB chunks is going to change anything.

In short, I don't understand: how does Nagling slow throughput on the 37,000-byte string but not the 8KB string? I am not saying you are wrong, I am just confused.
Sunday February 27, 2005 @ 12:06 PM (PST) Posted by tabor
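
The packet arithmetic in the comment above is easy to check. This sketch assumes a 1500-byte MTU plus a 20-byte IPv4 header and a 20-byte TCP header, neither carrying options. Under those assumptions the payload per segment is 1460 bytes rather than 1480, so the estimate moves from 25 packets to 26. The variable names are illustrative only.

```php
<?php
// Assumed values: Ethernet-style MTU, IPv4 and TCP headers without options.
$mtu       = 1500;
$ipHeader  = 20;
$tcpHeader = 20;
$payload   = 37000; // size of the HTML string from the post, in bytes

// Maximum segment size: the payload bytes that fit in one packet.
$mss = $mtu - $ipHeader - $tcpHeader;

// Number of full-or-partial segments needed to carry the whole string.
$segments = (int) ceil($payload / $mss);

echo "MSS: $mss bytes, segments needed: $segments\n";
```

Either way the count stays in the mid-twenties, so the question of why 8KB chunks would help still stands.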

This is a total shot in the dark, and I'm not certain why it would work yet, but maybe you have to set the split boundary in echobig() to just under your MTU?

Sunday February 27, 2005 @ 03:10 PM (PST) Posted by digdug
Beats me. I'd love to look at the source for PHP's echo construct, but I spent two hours looking for it in their CVS repository and came up with nothing. If you manage to figure out where it's buried, let me know.

All I know is that my benchmarks say echoing large strings is slow and echoing small strings is fast.
Sunday February 27, 2005 @ 03:47 PM (PST) Posted by Ryan Grove

Yes, print has the same problem.

Sunday February 27, 2005 @ 03:48 PM (PST) Posted by Ryan Grove
I don't doubt you one bit in your observations, I just have trouble believing that it is the Nagling that is the culprit.

I too poked around on cvs.php.net, and I too came up empty-handed. I briefly considered downloading the source and grepping for echo, but then I realized that I would probably get thousands of false hits.
Sunday February 27, 2005 @ 08:06 PM (PST) Posted by tabor

I know this is an irritating question, but I can't help myself. By when do you think you'll have the next version of Poseidon available publicly and what new features can be expected?

Monday February 28, 2005 @ 10:45 AM (PST) Posted by moridin
It'll be released before 2007. Probably.

The most noteworthy new feature is the new XML/XSLT-based rendering system, which means every single page on the site is available as an XML or RSS feed, and which allows for some really cool things to be done with themes and plugins.

The plugin architecture will also be very different, although the details of that haven't been finalized yet.

PHP5 will be required, and Poseidon will make full use of its capabilities. In fact, I'm already pushing it almost to its limits in places and I'm nowhere near finished yet. We'll see how that works out. Poseidon's main goal in the past has been simplicity through minimalism; the goal of the new version is more along the lines of simplicity through elegance. That should be transparent to users, though. If anything, I think the new Poseidon will appear to be much simpler at a glance, but much more powerful upon closer examination.

Of course, if you want to see all of this for yourself, the latest development code is in CVS. Instructions here.

Monday February 28, 2005 @ 11:18 AM (PST) Posted by Ryan Grove
The moment I read this line:
If anything, I think the new Poseidon will appear to be much simpler at a glance, but much more powerful upon closer examination.

my brain started trying to think of an analogy in terms of ships of the line. Damn you Patrick O'Brian! Damn your brain-eating books!

Monday February 28, 2005 @ 06:25 PM (PST) Posted by Eilonwy
Play around with reading the damn specs before you start filing bogus bug reports like it's your job as an enlightened journalist english-as-a-day job major. Don't pretend you're some elite programmer wannabe who needs to enlighten the masses because he found the bug. Please take a course on tcp/ip networking before you pretend to know what it is you preach.

Now to answer the question, you can send a bunch of acks in succession, without waiting for an ack, but once you exceed the packet size for a given packet, the nagle algorithm required the remote computer to ack (acknowledge) receipt of the packet before it sends the next packet. Hence, the delay. This has nothing to do with any programming language per se but rather with tcp/ip implementation. If you feel the need to respect the mtu packetsize for strings sent so as not to incur the delay, then that is your choice as an optimization (why not check the max mtu on a per machine basis instead of praying to the math gods for scrolls of enlightenment to post on your pitiful board) but don't make a fool of yourself by pretending that you found some php bug years ago and they still won't fix it. It's a bogus bug, amateur preacherman. Excuse the bad english, I am an american.
Wednesday March 30, 2005 @ 12:06 AM (PST) Posted by grv575

That should be a bunch of packets in succession (sp? succession, proper spelling, yes? thx)

Wednesday March 30, 2005 @ 12:07 AM (PST) Posted by grv575

That should be requires, not required. A bug. I will file it.

Wednesday March 30, 2005 @ 12:09 AM (PST) Posted by grv575
My, but you're a prickly cove. Did I kill a relative of yours in battle, perhaps?

Thanks for the explanation of Nagling, but read closer. I didn't file any bug reports. I didn't even claim it was a bug. I just described a performance problem I had encountered and shared my solution, which I know isn't ideal (and I said as much), but which is good enough for me. Of course, seeing as how it's my site, I could have said "PHP performs sexual favors on Ebola monkeys" and your patronizing indignation still wouldn't shift my opinion.

That said, in my tests I saw no evidence that this problem had anything to do with MTU size. If it had, then my numbers wouldn't have needed to be stabs in the dark.

In any case, checking the MTU size on a per-machine basis in a cross-platform web app would be both completely overkill and a huge pain in the ass. It wouldn't be too hard to glean such information from Unix (although there would need to be different routines for Linux, FreeBSD, etc.), but in Windows it would require reading from the registry, which is not at all convenient in PHP. Either way, looking up the MTU size on each pageview would be an ugly performance hit, and it'd be useless to try to get server administrators to specify the value in a config file, since most of them wouldn't bother. But, since this issue doesn't appear to be related to the MTU size in the first place, this is a moot point.

Wednesday March 30, 2005 @ 11:57 AM (PST) Posted by Ryan Grove

mtu = maximum transmission unit / packet. If you go read the nagle algo explanation yet again, you'll see that your problem is that you get a wait state once you exceed the packet size, since then the algo requires an ack from remote machine before sending the next packet in the string. it's also probably a negotiated protocol, so the actual packetsize used depends most likely on the max packetsize declared for machines at each end (so i'll leave rtfm up to you since you seem to be so good at it) and not much to do with sexual favors.

Wednesday March 30, 2005 @ 04:45 PM (PST) Posted by grv575
And if you go read my response again, you'll see that MTU has nothing to do with it. When I changed the packet size to fit within the MTUs of the server and the client (both 1500 in this case) I still encountered the problem. Even accounting for TCP overhead, the MTU size appeared to have nothing to do with the delay.

If you're going to keep insisting that the MTU is the key to everything, I suggest you run your own tests. I'd love to see your results.

Wednesday March 30, 2005 @ 05:51 PM (PST) Posted by Ryan Grove

OK you said you did not have an issue on the dev server since it's on your LAN. The only two machines then are the server and the client, so there are only two MTU settings the algorithm might consider. If there are a bunch of hops in between each machine, don't you think the protocol will respect the mtu preferences of intermediate hops to optimize their bandwidth preferences as well (I'm not inclined to read RFCs right now, but I'm pretty sure they get factored in too)? Correct me if I'm wrong, I'm sure.

Friday April 01, 2005 @ 07:45 AM (PST) Posted by grv575

Also if you'll remember there was the whole issue where, when all the receive window size, mtu, etc. registry manipulators came out for windows machines, there was a slew of articles on what the optimal settings were. The issue at hand was that, although a setting of xxx > yyy might be optimal for your pipe width and os environment, most servers on the internet would not have the same higher setting, so that packets would still get fragmented somewhere along the way. Which is why some people actually found lower settings (non-optimal locally) to be best since they catered for more window sizes/mtu sizes in use on the internet, so that they got better overall throughput. Beats randomly trying buffer lengths and trying to extrapolate causes of bottlenecks...

Friday April 01, 2005 @ 07:53 AM (PST) Posted by grv575

That's a reasonable explanation that I hadn't thought of. You may be right.

Friday April 01, 2005 @ 09:58 AM (PST) Posted by Ryan Grove
I'm seeing the same thing with echoing huge strings.

The problem is that on one of our servers echo does not appear to be slow. It takes 0.00001s to echo out a huge string, and on all the others it takes something like 0.5s.

Have you found any other explanations or workarounds since? It seems silly to have to use a special echo function to echo a string out. Defeats the purpose of optimising code if you know what I mean :(
Monday June 13, 2005 @ 04:38 PM (PDT) Posted by bongo
If you still want to find the PHP source for 'echo' after all this time...

The 'echo' statement is actually a PHP language construct, not a function.

So it's almost for sure going to be in the guts of the Zend engine, in the language grammar files passed to yacc, bison, or whatever is used to convert the syntax of things like "if" "else" and "while" etc.

In other words, "echo" will be in the same bit of code as "if" for PHP internals.

Hope that helps.
Monday October 10, 2005 @ 11:38 AM (PDT) Posted by Richard Lynch

I found this old thread because I have a similar problem with some interesting test results.

I use echo to output a very large text (600kB). The time spent depends on the client machine and the browser used. In tests from the same client network (same IP) on three different client machines, it ranges from 0.7 to 15 seconds with Firefox and from 0.7 to 1 second with IE.

I am not sure what that means. But I will try to split up the text anyway.

Thursday November 15, 2007 @ 10:51 PM (PST) Posted by Brigitta Buerger

I have the same exact problem, though it is a 100kb HTML

$t1 = microtime(true);
echo $html;
$t2 = microtime(true);
echo ($t2-$t1);
On the dev server this is about 0.01 sec; on production it is ~0.6 sec ???

Has anybody found the solution to this?

Friday June 13, 2008 @ 06:37 PM (PDT) Posted by Realty

Hm, I tried your approach of splitting the string into smaller chunks, same issue. But what I found out is that if I start hitting refresh every 1-2 seconds, echo time drops to ~0.000X seconds.

If I wait for e.g. 10 seconds and hit refresh again, it goes up to 0.7 seconds again ??? It may be related to memory usage, swapping … (I am not a Linux expert so this is a guess). I am on a VPS with 256mb of RAM and I see that the mem is full and 10mb of swap is used, any ideas?

Friday June 13, 2008 @ 07:02 PM (PDT) Posted by Realty

Hello,

I’ve got this issue too after the website hoster updated PHP. It takes about 0.3+ seconds for a big echo output. What the?!

I’ve googled a bit.. and found this page. That function doesn’t help me.

Here’s a fix: put ob_start() before outputting via echo.

Thanks to http://phplens.com/lens/php-book/optimizing-debugging-php.php !

Friday July 18, 2008 @ 03:21 PM (PDT) Posted by Sergio
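
The ob_start() suggestion above might look something like this. It's a sketch under assumptions: echo_buffered() is a hypothetical wrapper, and the post itself reports the same slowdown with output buffering enabled, so it's worth timing both approaches on your own server before relying on it.

```php
<?php
// Hypothetical wrapper (not from the thread): collect the page in PHP's
// output buffer, then hand it over in a single flush.
function echo_buffered($html)
{
    ob_start();     // start collecting output in a buffer
    echo $html;     // written to the buffer, not to the client
    ob_end_flush(); // send the buffer's contents and stop buffering
}
```

In a real page you would more likely call ob_start() once at the top of the script and let PHP flush the buffer when the script ends.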

Copyright © 2002-2008 Ryan Grove. All rights reserved.
Powered by Thoth.