Tuesday 6 April 2010

PHP benchmarks (part 1)

Recently I've stumbled upon several PHP benchmark tests. I've found that many of them were inaccurate or based on misconceptions.
Unfortunately, none of them mentioned which PHP version was used (and some other details about the setup would have been useful too). I'll use PHP 5.3/Apache 2.2 on Windows XP SP3.


Let's shed some light on this. I'll begin with loops!

LOOPS


Question: What is the best way to loop through a hash (associative) array?


Answer: In short, there is no single best way to loop through an array. It depends on what you intend to do with the array (read, or write/modify).


In one benchmark of reading array values (not writing), these were the tests:

+ 100 % 1: foreach($aHash as $val); Total time: 0[ms]
+ 1196 % 2: while(list(,$val) = each($aHash)); Total time: 1[ms]
+ 2176 % 3: foreach($aHash as $key=>$val); Total time: 2[ms]
+ 3017 % 4: while(list($key,$val)= each($aHash)); Total time: 3[ms]
+ 2300 % 5: foreach($aHash as $key=>$val) $tmp[] = & $aHash[$key]; Total time: 3[ms]
+ 2461 % 6: while(list($key) = each($aHash)) $tmp[]=&$aHash[$key]; Total time: 3[ms]
+ 819 % 7: Get key-/ value-array: foreach($aHash as $key[]=>$val[]); Total time: 1[ms]
+ 864 % 8: Get key-/ value-array: array_keys() / array_values() Total time: 1[ms]
+ 839 % 9: STRANGE: This is the fasetest code when using the the &-ref-operator (to avoid copying)
$key = array_keys($aHash);
$size = sizeOf($key);
for ($i=0; $i<$size; $i++) $tmp[] = &$aHash[$key[$i]]; Total time: 1[ms]

These tests and their conclusions are incorrect, since the compared snippets don't do the same thing.
For example, one can fairly benchmark:

++$i;   VS  $i++;

since the end result is the same. But benchmarking:

$j = $i++;   VS    $j = ++$i;

isn't fair, since the final results aren't identical (in the latter, $j will be greater by one).
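A minimal check of that difference, runnable with the PHP CLI (the variable names are just for illustration): post-increment hands over the old value, pre-increment the new one.

```php
<?php
// Post-increment: $post receives the value of $a *before* the increment.
$a = 5;
$post = $a++;   // $post is 5, $a is now 6

// Pre-increment: $b is incremented first, then its new value is assigned.
$b = 5;
$pre = ++$b;    // $pre is 6, $b is now 6

var_dump($post, $pre); // int(5), then int(6)
```

Both variables end up at 6, but the assigned results differ, so timing these two lines against each other compares two different computations.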

So one should be aware of exactly what is being benchmarked in order to draw the right conclusions from the results.


Now... why aren't the benchmarks mentioned above correct?
Because they compare different things. Take the "foreach" loop, for example:

1: foreach($aHash as $val);
3: foreach($aHash as $key=>$val);
5: foreach($aHash as $key=>$val) $tmp[] = &$aHash[$key];

How can these be compared? The first two aren't equivalent, since the second one has an extra assignment ($key). As for the third... I'm not sure what the author intended, but it isn't a fair loop test at all, because it involves more operations besides looping: creating a new element in $tmp, looking up the $aHash element with key $key, and making that new element a reference to it.

One important misconception is here:
STRANGE: This is the fasetest code when using the the &-ref-operator (to avoid copying)
The & (reference) operator doesn't actually prevent copying; it has a different semantic meaning and behaviour (see "PHP references"). When looping through an array, no copy of its values is made unless the array is modified (see "PHP lazy copying" or "PHP copy on write"). So reading array values won't make a copy, and using the & operator in a foreach loop that doesn't modify the array actually slows the loop down, since a variable alias has to be created in the symbol table (see "PHP symbol table").

So, in short: with a reference, writing to the referenced value modifies the original array; without one, it doesn't. That's why values are copied only when they are written/modified. It's a language optimization, so you don't have to worry about it.
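To make this concrete, here is a small sketch (the $prices array is just an illustration) showing that writing to the loop variable changes the array only when the loop variable is a reference. The unset() after the by-reference loop follows the PHP manual's advice to break the alias left over from the last iteration.

```php
<?php
$prices = array('apple' => 1, 'pear' => 2);

// By value: $price holds a copy, so writing to it leaves $prices untouched.
foreach ($prices as $price) {
    $price *= 10;
}
var_dump($prices === array('apple' => 1, 'pear' => 2)); // bool(true)

// By reference: $price is an alias of each element, so writes go to $prices.
foreach ($prices as &$price) {
    $price *= 10;
}
unset($price); // break the alias to $prices['pear']

var_dump($prices); // 'apple' => 10, 'pear' => 20
```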

If someone doesn't understand the & operator: it simply creates a variable alias (one more name for the same value). Imagine something that has more than one name:

$myFruit = 'Apple';
$yourFruit = & $myFruit;

var_dump($yourFruit);
// prints: string(5) "Apple"


Here, your fruit is 'Apple'.
Actually, in the background, I am you. :)

Try this:

$yourFruit = 'Orange';
var_dump($myFruit);
// prints: string(6) "Orange"

So you can see that these variables act as one.




My tests are divided into two sections: the first uses the array's key, the second doesn't. Why test like this? Because of the fairness mentioned above: each loop has to do the same thing in a different way, so that the results can be compared. The array's key should only be stored in a variable ($key) when it will actually be used.

$hashA = array();
$hashB = array();
$hashC = array();
$a = 'a';
$b = 'b';
$c = 'c';
$max = 10000;

for ($i = 0; $i < $max; ++$i) {
    $hashA[$a . $i] = $i;
}

// microtime(true) returns a float; plain microtime() returns a "msec sec" string
$start = microtime(true);

foreach ($hashA as $key => $value) {}

echo microtime(true) - $start, '<br><br>';

# NEXT TEST
for ($i = 0; $i < $max; ++$i) {
    $hashB[$b . $i] = $i;
}

$start = microtime(true);
@reset($hashB);
// each() is deprecated as of PHP 7.2 and removed in PHP 8.0
while (list($key, $value) = each($hashB)) {}

echo microtime(true) - $start, '<br><br>';

# NEXT TEST
for ($i = 0; $i < $max; ++$i) {
    $hashC[$c . $i] = $i;
}

$start = microtime(true);

$keys = array_keys($hashC);
$size = count($keys);
for ($i = 0; $i < $size; ++$i) {
    $key = $keys[$i];
    $value = $hashC[$key];
}

echo microtime(true) - $start, '<br><br>';

Typical results (in seconds) are:

foreach:            0.002062
while/list/each:    0.014654
for + array_keys:   0.005652

Some will probably notice "@reset" (see "PHP @" and "PHP reset"). Yes, it adds a little extra time to the "while" loop, but not enough to make a difference. reset() rewinds the array's internal pointer, which each() relies on, so "@reset" is commonly used before a "while" loop; I think it's better to include it in the test because it reflects a real use case.
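The timing blocks above repeat the same setup, so here is a sketch of how the pattern could be factored out. bench() and the 'k' key prefix are hypothetical names, not from any library; the sketch assumes PHP 5.3+ for closures and microtime(true), and leaves out each() because it no longer exists in PHP 8. Every call builds a fresh array, so each loop gets identical input, and the array is passed to the closure by value, which (thanks to copy on write) doesn't duplicate it as long as the closure only reads it.

```php
<?php
// Hypothetical helper: builds a fresh array of $max elements, then times
// only the loop passed in as a closure. Returns the elapsed time in seconds.
function bench($label, $max, $loop)
{
    $hash = array();
    for ($i = 0; $i < $max; ++$i) {
        $hash['k' . $i] = $i;
    }

    $start = microtime(true);
    $loop($hash);
    $elapsed = microtime(true) - $start;

    printf("%-20s %.6f s\n", $label, $elapsed);
    return $elapsed;
}

$t1 = bench('foreach', 10000, function ($hash) {
    foreach ($hash as $key => $value) {}
});

$t2 = bench('for + array_keys', 10000, function ($hash) {
    $keys = array_keys($hash);
    $size = count($keys);
    for ($i = 0; $i < $size; ++$i) {
        $key = $keys[$i];
        $value = $hash[$key];
    }
});
```

Timings from a single run are noisy; running each test several times and comparing the typical values is closer to what the results above report.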


Tests without the array's key:


$hashA = array();
$hashB = array();
$hashC = array();
$a = 'a';
$b = 'b';
$c = 'c';
$max = 10000;

for ($i = 0; $i < $max; ++$i) {
    $hashA[$a . $i] = $i;
}

// microtime(true) returns a float; plain microtime() returns a "msec sec" string
$start = microtime(true);

foreach ($hashA as $value) {}

echo microtime(true) - $start, '<br><br>';

# NEXT TEST
for ($i = 0; $i < $max; ++$i) {
    $hashB[$b . $i] = $i;
}

$start = microtime(true);
@reset($hashB);
// each() is deprecated as of PHP 7.2 and removed in PHP 8.0
while (list(, $value) = each($hashB)) {}

echo microtime(true) - $start, '<br><br>';

# NEXT TEST
for ($i = 0; $i < $max; ++$i) {
    $hashC[$c . $i] = $i;
}

$start = microtime(true);

$keys = array_keys($hashC);
$size = count($keys);

for ($i = 0; $i < $size; ++$i) {
    $value = $hashC[$keys[$i]];
}

echo microtime(true) - $start, '<br><br>';

Typical results (in seconds) are:

foreach:            0.000784
while/list/each:    0.014069
for + array_keys:   0.005436


Let's move on to the "modify" loop...

+ 272 % 1: foreach($aHash as $key=>$val) $aHash[$key] .= "a"; Total time: 3[ms]
+ 128 % 2: while(list($key) = each($aHash)) $aHash[$key] .= "a"; Total time: 1[ms]
+ 100 % 3: STRANGE: This is the fasetest code :
$key = array_keys($aHash);
$size = sizeOf($key);
for ($i=0; $i<$size; $i++) $aHash[$key[$i]] .= "a"; Total time: 1[ms]
These tests are more or less correct, but one variant of the "foreach" loop is missing. If we want to modify the array's values, why not use the & operator? In the tests above it's used for reading values but not for modifying them, which is semantically backwards: it should be used only when we need to modify the original array, not when we're just reading its values.
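In the spirit of the fairness rule above, it's worth confirming that the by-reference variant really does the same job as the $aHash[$key] version before timing them. A small sketch (the array names are illustrative) that applies the same .= "a" modification both ways and compares the results:

```php
<?php
// Build two identical arrays.
$viaKey = array();
for ($i = 0; $i < 5; ++$i) {
    $viaKey['a' . $i] = $i;
}
$viaRef = $viaKey; // copy on write: separated once $viaKey is modified

// Modify through $array[$key], as the quoted tests do.
foreach ($viaKey as $key => $value) {
    $viaKey[$key] .= 'a';
}

// Modify through the reference variable $value.
foreach ($viaRef as &$value) {
    $value .= 'a';
}
unset($value); // break the alias to the last element

var_dump($viaKey === $viaRef); // bool(true)
```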

Here are my tests:

$hashA = array();
$hashB = array();
$hashC = array();
$hashD = array();

$a = 'a';
$b = 'b';
$c = 'c';
$d = 'd';
$max = 10000;

for ($i = 0; $i < $max; ++$i) {
    $hashA[$a . $i] = $i;
}

// microtime(true) returns a float; plain microtime() returns a "msec sec" string
$start = microtime(true);

foreach ($hashA as $key => $value) {
    $hashA[$key] = 'A';
}

echo microtime(true) - $start, '<br><br>';

# NEXT TEST
for ($i = 0; $i < $max; ++$i) {
    $hashB[$b . $i] = $i;
}

$start = microtime(true);
@reset($hashB);
// each() is deprecated as of PHP 7.2 and removed in PHP 8.0
while (list($key) = each($hashB)) {
    $hashB[$key] = 'B';
}

echo microtime(true) - $start, '<br><br>';

# NEXT TEST
for ($i = 0; $i < $max; ++$i) {
    $hashC[$c . $i] = $i;
}

$start = microtime(true);

$keys = array_keys($hashC);
$size = count($keys);

for ($i = 0; $i < $size; ++$i) {
    $hashC[$keys[$i]] = 'C';
}

echo microtime(true) - $start, '<br><br>';

# NEXT TEST
for ($i = 0; $i < $max; ++$i) {
    $hashD[$d . $i] = $i;
}

$start = microtime(true);

foreach ($hashD as &$value) {
    $value = 'D';
}

echo microtime(true) - $start, '<br><br>';
// break the reference so a later use of $value can't overwrite $hashD's last element
unset($value);
// dump $hashD if you don't believe it's modified
//var_dump($hashD);

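Typical results (in seconds) are:

foreach, writing $hashA[$key]:   0.007736
while/list/each:                 0.015848
for + array_keys:                0.006368
foreach by reference (&$value):  0.002416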

So one can see that the missing "foreach" variant (by reference) is actually the fastest! :)




CONCLUSION


There is no need to complicate things. Just use what is appropriate:


  • if one only reads array values, use "foreach ($array as $value)", or "foreach ($array as $key => $value)" if one also needs the keys
  • if one writes/modifies the original array's values, use "foreach ($array as &$value)", or "foreach ($array as $key => &$value)" if one also needs the keys, so that values are written through the reference variable $value (and not through $array[$key])
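Both guidelines in one short sketch (the $colors array is just an illustration). The unset() after the by-reference loop is the caveat documented in the PHP manual for foreach by reference, not something the benchmarks above measured: without it, $color would stay an alias of the last element, and the next "foreach ($colors as $color)" would silently overwrite that element.

```php
<?php
$colors = array('r' => 'red', 'g' => 'green', 'b' => 'blue');

// Reading: plain foreach, no reference needed.
$upper = array();
foreach ($colors as $key => $color) {
    $upper[$key] = strtoupper($color);
}

// Modifying in place: foreach by reference.
foreach ($colors as &$color) {
    $color = ucfirst($color);
}
unset($color); // without this, $color would still alias $colors['b']

// Thanks to the unset() above, reusing the name $color is harmless.
foreach ($colors as $color) {}

var_dump($colors); // 'r' => 'Red', 'g' => 'Green', 'b' => 'Blue'
```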


FINAL WORDS

If you want to optimize your PHP scripts by changing the loop type (e.g. "while" to "foreach"), don't, especially if your application isn't slow. These are micro-optimizations: their conclusions are drawn from results in a particular environment under particular conditions, and they may not hold in another environment or under other conditions.
In real-world situations, PHP will rarely be the slowest part of your application. Database design is the most common place to look for speedups; the amount of HTML, JS and CSS is another place where you can improve the user experience. For heavily used applications, there are APC, load balancing, caching and so on.

These should serve as guidelines for writing PHP scripts from the ground up and for understanding some PHP basics. More to come soon!

2 comments:

  1. One of the things I do, especially with my SQL loops, is:

    $sql = 'SELECT * FROM `myTable`';
    $qry = mysql_query($sql);
    $cnt = mysql_num_rows($qry);

    if ($cnt > 0)
    {
        do
        {
            $row = mysql_fetch_assoc($qry);
            /** CODE HERE **/
        } while (--$cnt > 0);
    }

    I find that I get ~12% better performance for that loop.

  2. Interesting... but you haven't said better than which loop (I guess the "for" loop).

    I use: while ($row = mysql_fetch_assoc($qry)), but I haven't benchmarked it. It should perform similarly to your version, if not faster, since it involves fewer operations. I've found that in most cases, the examples from the PHP documentation are the fastest and the best to use.

    And by the way, I think you should always check (well, maybe not always, but...) whether mysql_query returned FALSE:

    if (FALSE !== $qry) ...

    because if your query fails for some reason, you won't get a warning at that point (mysql_query returns a resource, which you use later, or FALSE on failure, which produces a warning only when you try to use it). I think it's good practice.


    Ivan
