July 30th, 2008

PHP5 Autoloading on Steroids

Author: tim.ariyeh

For an overview of autoloading in general, please see my previous article.

Much fuss is made over the performance penalties of using PHP’s magic autoloading functionality. While I believe that it reduces overall project complexity and increases productivity, there are some very high-traffic projects I’ve overseen where the performance overhead becomes noticeable. This is particularly true for page requests that are already sensitive to delays, such as AJAX or REST requests. In this article, I’d like to showcase a couple of options for mitigating (and even eliminating) autoloader performance penalties.

Please note that all code examples in this article are strictly for illustrative purposes, and are not at all ready for production sites. These concepts should serve as starting points for implementing your own high-performance autoloaders.

The Problem

While autoload is a godsend for managing large projects, you do incur a small performance penalty for its use. This overhead occurs primarily because:

  • Most autoloader methods search through the include path
  • Most autoloader methods include files using relative paths
  • PHP must devote execution time to loading each class just in time. This magic comes at a price.

If this process occurs a few dozen times per request, and requests are popping in around 10/second, you might find yourself in a boring meeting discussing how the hell your team can decrease latency.
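For reference, the conventional kind of autoloader the list above describes can be sketched as follows. This is only a minimal illustration, not code from any particular project. It assumes one class per file named "ClassName.php" somewhere on the include_path, and the function name naive_autoload is made up for this example.

```php
<?php
// A conventional autoloader, shown as a hypothetical baseline for comparison.
// Assumes each class lives in a file named "<ClassName>.php" on the include_path.
function naive_autoload($class)
{
    // Because the path is relative, PHP walks the include_path entries
    // until it finds the file, and it repeats that walk on every request.
    include $class . '.php';
}

// Ask PHP to call naive_autoload() whenever an unknown class is referenced
spl_autoload_register('naive_autoload');
```

Every class that is not yet defined costs one call to this function plus one search through the include_path, which is the overhead the rest of this article tries to avoid.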

Step 1: Include Path Caching

By and large, the content of include directories for production projects does not change. If a class file was found during the last request, why search for it again during the current one?

Rather than simply calling include() (or require() ) from your autoloading methods, do a little digging to discover the absolute path of the file for a particular class. This will be an expensive operation, but it need only be performed once. Each subsequent request can simply result in the direct inclusion of the correct file, rather than a hunt through the include_path. An example:

  class CodeFeast_Loader
  {
    protected static $include_paths = array();
    protected static $include_paths_loaded = false;
    protected static $file_location_cache = array();

    public static function autoload($class)
    {
      //check if this class is cached
      if(isset(self::$file_location_cache[$class]) )
      {
        //It was cached.  No need to search, just load
        require(self::$file_location_cache[$class]);
      }
      //It wasn't cached.  We've got to hunt
      else
      {
        //did we nab the include_path yet?
        if(!self::$include_paths_loaded)
        {
          self::populateIncludePaths();
        }
        //Look in each path for the class file
        foreach(self::$include_paths as $path)
        {
          //realpath() returns false if the file does not exist
          $file = realpath($path . DIRECTORY_SEPARATOR . "$class.php");
          if($file !== false)
          {
            //We found our file.  Add its absolute path to the cache
            self::$file_location_cache[$class] = $file;
            //And load it
            require($file);
            //Stop at the first match so the class is not declared twice
            break;
          }
        }
      }
    }

    //grab the include_path from php.ini
    public static function populateIncludePaths()
    {
      self::$include_paths = explode(PATH_SEPARATOR,
        ini_get('include_path'));
      self::$include_paths_loaded = true;
    }

    //read one cache file; a missing or corrupt file yields an empty cache
    protected static function readCache($file)
    {
      $cache = is_file($file) ? unserialize(file_get_contents($file)) : false;
      return is_array($cache) ? $cache : array();
    }

    //load the cache from 'tmp'
    public static function openCache()
    {
      self::$file_location_cache =
        self::readCache('tmp/file_location_cache.bin');
    }

    //persist the cache to dir 'tmp'
    public static function saveCache()
    {
      file_put_contents('tmp/file_location_cache.bin',
        serialize(self::$file_location_cache));
    }

    //init the autoloader
    public static function start()
    {
      //Register the autoloader
      spl_autoload_register(
        array('CodeFeast_Loader', 'autoload')
      );

      //Restore the cache
      self::openCache();
    }
  }

  CodeFeast_Loader::start();

  //Code for your application goes here

  //Call the saveCache() method to persist the cache across requests
  CodeFeast_Loader::saveCache();

With this autoloader class, the search for a class file is performed only the first time that class is requested. Performance improves further because the loader calls require() rather than require_once(), and passes it the absolute path of the requested file.
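On PHP 5.3.2 and later, the hand-written hunt through the include_path can probably be delegated to the built-in stream_resolve_include_path(). The sketch below shows one way to do the one-time lookup that feeds the cache. The helper name codefeast_resolve_class is hypothetical, and it assumes the same "ClassName.php" naming convention as the class above.

```php
<?php
// Hypothetical helper: resolve a class name to an absolute file path.
// Assumes the "<ClassName>.php" naming convention used in this article.
function codefeast_resolve_class($class)
{
    // stream_resolve_include_path() searches the include_path and returns
    // the absolute path of the first match, or false when nothing is found.
    $file = stream_resolve_include_path($class . '.php');

    // Return null on a miss so callers can tell "not found" from a path
    return $file === false ? null : $file;
}
```

The returned path is what you would store in the location cache, so that later requests can require() it directly without searching again.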

Step 2: Anticipatory Autoloading

So, you’ve implemented path caching in your autoloaders, but your users are still complaining that their annoying popups aren’t loading fast enough? Let’s figure out how to keep the convenience, but completely eliminate calls to autoload.

Much like the include_path, the specific classes loaded for each section of your application are unlikely to change from request to request. If we can find a unique identifier that will partition our app into segments (such as the URL), we can simply have our autoloader remember which files it had to load during the last request, and skip the autoloader overhead altogether.
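One practical wrinkle with using the raw URL as the identifier is that every distinct query string would become its own segment. A possible way around that is to key the segment on the path component only. The sketch below assumes the path alone identifies a section of the app, and the function name codefeast_segment_key is invented for illustration.

```php
<?php
// Hypothetical helper: reduce a request URI to a stable segment key.
// Assumption: the path alone (without the query string) identifies a section.
function codefeast_segment_key($request_uri)
{
    // parse_url() with PHP_URL_PATH returns only the path component;
    // it does not return a string when the URI is malformed or has no path.
    $path = parse_url($request_uri, PHP_URL_PATH);

    // Fall back to the site root if no usable path could be extracted
    if (!is_string($path) || $path === '') {
        return '/';
    }
    return $path;
}
```

The result can be passed wherever a segment identifier is expected, so that "/index.php?id=5" and "/index.php?id=6" share one list of cached classes.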

Let’s start with our previous example, and add some enhancements:

  1. class CodeFeast_Loader
  2. {
  3.   protected static $include_paths = array();
  4.   protected static $include_paths_loaded = false;
  5.   protected static $file_location_cache = array();
  6.   protected static $app_segment_cache = array();
  7.   protected static $app_segment = '';
  8.   public static function autoload($class)
  9.   {
  10.    //check if this class is cached
  11.    if(isset(self::$file_location_cache[$class]) )
  12.    {
  13.    //It was cached.  No need to search, just load
  14.     require(self::$file_location_cache[$class]);
  15.    }
  16.    //It wasn't cached.  We've got to hunt
  17.    else
  18.    {
  19.     //did we nab the include_path yet?
  20.     if(!self::$include_paths_loaded)
  21.     {
  22.      self::populateIncludePaths();
  23.     }
  24.     //Look in each path for the class file
  25.     foreach(self::$include_paths as $path)
  26.     {
  27.      //realpath() returns false if the file does not exist
  28.      $file = realpath($path . DIRECTORY_SEPARATOR . "$class.php");
  29.      if($file !== false)
  30.      {
  31.       //We found our file.  Add its absolute path to the cache
  32.       self::$file_location_cache[$class] = $file;
  33.       //And load it
  34.       require($file);
  35.       //Stop at the first match so the class is not declared twice
  36.       break;
  37.      }
  38.     }
  39.    }
  40.    //Remember what this segment needed, however the file was found
  41.    if(isset(self::$file_location_cache[$class]) )
  42.    {
  43.     self::$app_segment_cache[self::$app_segment][$class] =
  44.      self::$file_location_cache[$class];
  45.    }
  46.   }
  47.  //grab the include_path from php.ini
  48.   public static function populateIncludePaths()
  49.   {
  50.    self::$include_paths = explode(PATH_SEPARATOR,
  51.     ini_get('include_path'));
  52.    self::$include_paths_loaded = true;
  53.   }
  54.   //read one cache file; a missing or corrupt file yields an empty cache
  55.   protected static function readCache($file)
  56.   {
  57.    $cache = is_file($file) ? unserialize(file_get_contents($file)) : false;
  58.    return is_array($cache) ? $cache : array();
  59.   }
  60.   //load the caches from 'tmp'
  61.   public static function openCache()
  62.   {
  63.    self::$file_location_cache =
  64.     self::readCache('tmp/file_location_cache.bin');
  65.    self::$app_segment_cache =
  66.     self::readCache('tmp/app_segment_cache.bin');
  67.   }
  68.   //persist the caches to dir 'tmp'
  69.   public static function saveCache()
  70.   {
  71.    file_put_contents('tmp/file_location_cache.bin',
  72.     serialize(self::$file_location_cache));
  73.    file_put_contents('tmp/app_segment_cache.bin',
  74.     serialize(self::$app_segment_cache));
  75.   }
  76.   //init the autoloader
  77.   //The argument identifies which part of the app we're in
  78.   //This could be a URL, or a controller action in MVC
  79.   public static function start($app_segment)
  80.   {
  81.    self::$app_segment = $app_segment;
  82.    //Register the autoloader
  83.    spl_autoload_register(
  84.     array('CodeFeast_Loader', 'autoload')
  85.    );
  86.  
  87.    //Restore the cache
  88.    self::openCache();
  89.  
  90.    //Load every cached autoload for this segment as a simple require
  91.    if(isset(self::$app_segment_cache[$app_segment]) )
  92.    {
  93.     foreach(self::$app_segment_cache[$app_segment] as $class => $include)
  94.     {
  95.      //An earlier include may already have pulled this class in
  96.      if(!class_exists($class, false) && !interface_exists($class, false) )
  97.      {
  98.       require($include);
  99.      }
  100.     }
  101.    }
  102.   }
  103.  }
  104.  
  105. //We'll start the autoloader using the URL as the app segment
  106. CodeFeast_Loader::start($_SERVER['REQUEST_URI']);
  107.  
  108. //Code for your application goes here
  109.  
  110. //Call the saveCache() method to persist the cache across requests
  111. CodeFeast_Loader::saveCache();

And now we’ve rid ourselves of autoload’s performance overhead without sacrificing any of its convenience.

This class has little effect on the initial, uncached request to an app segment, but really shines on subsequent requests.

Since this class remembers what it autoloaded, it can eliminate itself altogether and bring in what it will need as simple includes. Best of all, it still has autoload functionality in the unlikely event that the required classes change between requests.

Since we’ve chosen to segment our app by URL, I’ll use “index.php” as an example. You could also easily segment by module, controller, or action.

When the first request is made for the location “index.php”:

  1. The autoloader is woken up and restores its cache files
  2. Every class that is needed for index.php will be autoloaded as usual
  3. The autoloader will remember everything it had to autoload for “index.php”

Now, when the second request is made for this same location, this happens:

  1. The autoloader is woken up and restores its cache files
  2. Every class that it had to autoload last time is immediately included
  3. The people rejoice

Security Note

Do not store the autoloader cache files in /tmp, or any other world-writable area. The last thing any of us needs is some little weenus having the ability to arbitrarily load include files on our server. Also, you really should perform some sanity checks on the files you’re including in your autoloaders.
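As a rough idea of what such a sanity check might look like, the sketch below only accepts a cached path if it points to an existing .php file inside a directory you have explicitly allowed. The function name codefeast_is_safe_include and the $allowed_dirs parameter are assumptions made for this example, and a real project would likely need stricter rules.

```php
<?php
// Hypothetical sanity check for a path read from the autoloader cache.
// $allowed_dirs is an assumed whitelist of directories that may hold classes.
function codefeast_is_safe_include($file, array $allowed_dirs)
{
    // realpath() resolves symlinks and ".." and returns false for missing files
    $real = realpath($file);
    if ($real === false || substr($real, -4) !== '.php') {
        return false;
    }
    foreach ($allowed_dirs as $dir) {
        $base = realpath($dir);
        // The trailing separator stops "/app/lib-evil" from matching "/app/lib"
        if ($base !== false
            && strpos($real, $base . DIRECTORY_SEPARATOR) === 0) {
            return true;
        }
    }
    // The file is not inside any allowed directory
    return false;
}
```

A loader could run this on each cached path before calling require(), and fall back to a fresh search when the check fails.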

Conclusion

I hope these crude classes effectively illustrate that you needn’t suffer autoloader overhead with every request. I should also note that I don’t advocate the use of steroids. Sure, you look great on the beach, but they shrink your giblets.

9 Responses to “PHP5 Autoloading on Steroids”

  1. Dave Marshall Says:

    Great post, thanks a lot.

    I think lines 10 to 15 should be…

    //check if this class is cached
    if(isset(self::$file_location_cache[$class]) )
    {
    //It was cached. No need to search, just load
    require(self::$file_location_cache[$class]);
    }

  2. tim.ariyeh Says:

    Good call, Dave. I got it straightened out. The last thing this topic needs is typos in the code examples.

  3. chris Says:

    great idea, do you think it’s possible and makes sense to reduce the

    foreach(self::$app_segment_cache[$app_segment] as $include)
    {
    require($include);
    }

    down to just ONE require. I mean to let the cache class build ONE php file to include?

  4. tim.ariyeh Says:

    Chris:

    That’s a good point. I left it out of the article code because I didn’t want the classes to overload anyone who was new to the topic, but I think it is probably the next step for someone trying to shave every last microsecond of request time.

  5. Nathan Says:

    Sure would be interested to see some hard data on whether this is actually faster or not.

  6. Dominik Says:

    @chris & @tim:
    Merging all include files into one is a thing you will want to do for high traffic websites. However you have to make sure that those files you merge don’t get included somewhere else “manually”. If one file in the application decides to “require_once” a file that is also available in the merged include, it will result in an error, because the - for example - class in there is already defined, however the file to include hasn’t been loaded previously.

    Obviously that shouldn’t stop you from doing it altogether, but you should be aware of that.

    @tim:
    What crossed my mind when I read your code: Why do you start your method comments with “A method to”? It should be fairly obvious that what follows is a method. The additional three (redundant) words just slow you down when trying to figure out what the method below does. ;)

  7. tim.ariyeh Says:

    Dominik:
    My production version of this uses a class derived from Zend_Loader and Zend_Cache. It’s typically the only way I ever load files, so I don’t usually have to worry about redundant includes. However, with the “compressed method”, you can use require_once() and not have the zealots on your back for calling the slower require statement, since it’s only getting called once for a big single file.

    My production version also uses JavaDoc-style comments, and they are worded a little better. :) As a consultant, I’ve experienced a little bit of a “blank stare” situation in the past when trying to explain it, so I guess I went a little nuts trying to simplify the example. I’ve removed the redundancy.

  8. tim.ariyeh Says:

    Nathan:

    I’m usually one of the guys at the forefront pining for benchmark results as well. In this scenario, benchmarks would be particularly deceptive, because the performance benefits of using this method will vary wildly depending on project size, and something like this wouldn’t even be worth implementing for a small app.

    As an example, I have a moderately complex application on my development machine. I’m able to reduce its average response time by 1/10 of a second by switching from conventional autoloaders to this one. What does that mean to you, however?

    In a nutshell, everyone knows autoload is slower, so I set out to eliminate it. This class can obviously avoid calls to autoload. It’s less designed to address “how much faster can I make the app”, and more designed to address “how can I avoid using autoload?”.

    Additionally, I’m not exactly a technical evangelist, and I’ll sleep fine at night if this method of autoloading doesn’t sweep the world like wildfire.

  9. Ad Server Says:

    It’s an interesting idea. After profiling our growing app, __autoload() turns out to be one of the most frequent calls and the 2nd most time-consuming operation behind SQL queries. Prefetching files based on URL is good but not a fool-proof method.
