PHP - MySQL running OOM with large result sets

Bearbeiten

The issue

Bearbeiten

When fetching large MySQL result sets, PHP webservice is running into out-of-memory (OOM) errors. The default limit for webservices is currently 4 GB vmem, but this doesn't matter here.

Question: What is causing PHP to consume this amount of memory? And what can be done ?
Hypothesis: SQL result sets are too large. To save PHP memory, it's better to use unbuffered queries to retrieve one data row after another from the server, instead of fetching all at once into client memory (= buffered = default for plain mysqli queries).

Conclusion

Bearbeiten
  • The high memory consumptiom of PHP is mainly caused by the resulting array, not by the retrieved sql result set.
  • Enforcing the use of unbuffered queries, doesn't help anything as long as you have to store the results. Memory usage just switches from memory (buffer) to memory (variable).
  • As an alternative, an unbuffered query can be processed row by row, without storing it. But this will greatly slow down things.
  • Buffering is a way to quickly execute SQL queries and free up resources on the MySQL server, as the server is much faster than the client.
  • The inital difference between normal mysql query and prepared statement is caused by the way datatypes are retrieved. With mysql query MySQL server converts all SQL datatypes to string before sending them to client, while prepared statement tries to keep SQL datatypes (as long as they match to PHP datatypes). After casting the variables, this difference is gone.
* Include only columns you really need
* Reduce dimensions/size of storage array(s)
* Cast variables to numeric type (if applicable)

Test setup

Bearbeiten
  • randomly chosen editor with 107,803 edits on dewiki
  • result stored in 2-dimensional PHP array
  • script as shown below

Test results

Bearbeiten
  • size of query result exported as plain file: 11 MB
Memory usage with different modifications
Test modifications normal mysql query prepared statement
buffered (store_result) 161.00 MB 155.25 MB
unbuffered (use_result) 161.00 MB 155.25 MB
casting variables in PHP result array 1 148.00 MB 148.00 MB
reducing result array to 1 dimension 2 38.25 MB 38.25 MB
no array as result, but one large string 3 11.25 MB 11.25 MB
adding a column with blank content 4 187.50 MB 181.75 MB


1 Modification: casting suitable variables to number type integer

  $foo[] = array(
	(int)$row->rev_timestamp,
	$row->page_title,
	(int)$row->page_namespace,
	$row->rev_comment,
    );


2 Modification: reducing result array to 1 dimension

  $foo[] =  $row->rev_timestamp . $row->page_title .
	    $row->page_namespace . $row->rev_comment;


3 Modification: no array as result, but one large string

  $foo .=  $row->rev_timestamp . $row->page_title .
	    $row->page_namespace . $row->rev_comment;


4 Modification: adding a column with blank content

   SELECT  UNIX_TIMESTAMP(rev_timestamp) as rev_timestamp, page_title, page_namespace, rev_comment, '' as blank_column
  
   $foo[] = array(
	$row->rev_timestamp,
	$row->page_title,
	$row->page_namespace,
	$row->rev_comment,
	$row->blank_column,
     );


For simplification, all error handlers are removed.

<?php 

// Set memory limit ( default = 128 MB )
   ini_set("memory_limit", "512M" );


// Read DB credentials
   $inifile = "../../replica.my.cnf";
   $iniVal = parse_ini_file($inifile);
   $dbUser = $iniVal["user"];
   $dbPwd  = $iniVal["password"];
   unset($iniVal);


// Create new mysqli Object
   $mysqli = new mysqli("s5.labsdb",$dbUser, $dbPwd, "dewiki_p");
   $mysqli->set_charset("utf8");


// Define the SQL query
   $query = "
	SELECT  UNIX_TIMESTAMP(rev_timestamp) as rev_timestamp,  page_title, page_namespace, rev_comment
	/*SLOW_OK RUN_LIMIT 60 NM*/
	FROM revision_userindex 
	JOIN page ON page_id = rev_page 
	WHERE rev_user = '226562'
	ORDER BY rev_timestamp ASC
   ";

// Get memory usage before query
   $m1 = memory_get_usage(true);


// The alternative calls. commented out if not to be run
   $ff = sqli_query( $mysqli, $query, MYSQLI_USE_RESULT );
   $ff = sqli_query( $mysqli, $query, MYSQLI_STORE_RESULT );
   $ff = sqli_stmt( $mysqli, $query, $store_result=false );
   $ff = sqli_stmt( $mysqli, $query, $store_result=true );


// Get memory usage after query
   $m2 = memory_get_usage(true);


// Output the results
   printf("\n num: ".count($ff)." records \n\n");
   printf("$m1 \n $m2");


// Functions

// A. Plain vanilla mysqli query
   function sqli_query( $mysqli, $query, $mode ){
		
	if ( $result = $mysqli->query( $query, $mode) ){
		while( $row = $result->fetch_object() ){
			$foo[] = array(
				$row->rev_timestamp,
				$row->page_title,
				$row->page_namespace,
				$row->rev_comment,
			   );	
		}
		$result->close();
	}
	
	return $foo;
   }


// B. prepared statement
   function sqli_stmt( $mysqli, $query, $mode ){
	
	if ( $stmt = $mysqli->prepare( $query ) ){
#		$stmt->bind_param('i', $user_id );  // in this example no param to bind
		$stmt->execute();
		$stmt->bind_result($rev_timestamp, $page_title, $page_namespace, $rev_comment);
		if ( $mode ) {
		    $stmt->store_result();
		}
		while( $stmt->fetch() ){
			$foo[] = array(
				$rev_timestamp,
				$page_title,
				$page_namespace,
				$rev_comment
			   );
		}
		$stmt->free_result();
		$stmt->close();
	}

	return $foo;
   }

References

Bearbeiten