Facebook looks to fix PHP performance with HipHop virtual machine

Based on the open-source PHP-to-C++ translator used to create nearly all of …

Look at the URL of most pages on Facebook, and you'll see a ".php" in there somewhere. That's because Facebook has leaned heavily on the PHP scripting language to develop the Web-facing parts of the site. PHP's popularity and simplicity made it easy for the company's developers to quickly build new features. But PHP's (lack of) performance makes scaling Facebook's site to handle hundreds of billions of page views a month problematic, so Facebook has made big investments in making it leaner and faster. The latest product of those efforts is the HipHop VM (HHVM), a PHP virtual machine that significantly boosts performance of dynamic pages. And Facebook is sharing it with the world as open-source.

Facebook's initial PHP performance efforts had been focused on tuning the Zend Engine: contributing fixes and patches to Zend, and writing C++-based PHP extensions to offload the heavy lifting of application logic. But as Facebook senior engineer Haiping Zhao said in a post to Facebook's developer blog last year, those efforts required splitting up development resources and investing time in mastering the Zend APIs for C++. Facebook wanted to be able to keep as many engineers working in PHP as possible, and the company wasn't seeing the kind of performance boosts that developers were hoping for.

So the Facebook team started to look for other options outside of Zend. In 2010, the company solidified those efforts in the form of HPHP, or "HipHop," a source-code translator that first converts PHP script into an abstract syntax tree (AST), which maps out the code so that it can be analyzed, and then into optimized C++ code. While converting PHP to a lower-level language wasn't a new concept (Roadsend, for example, converts PHP to C code and compiles it), converting to C++ made it easier for Facebook to integrate the custom back-end services its developers had already built in C++. HipHop has nearly doubled the speed of Facebook's code, and in many cases cut the utilization level of the company's servers by half. And it's yielded similar benefits for other PHP products, including doubling the performance of WordPress in benchmark tests.
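To make the "script to AST to C++" pipeline concrete, here is a minimal sketch in Python. It is not HipHop's code: Python's built-in `ast` parser stands in for a PHP parser, only integer arithmetic is handled, and the helper names (`emit_cpp`, `collect_names`, `translate`) and the choice of `long` for every value are assumptions made for illustration.

```python
import ast

# Operators this toy translator understands, mapped to their C++ spelling.
CPP_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

def emit_cpp(node):
    """Turn one expression AST node into C++ source text."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp) and type(node.op) in CPP_OPS:
        left = emit_cpp(node.left)
        right = emit_cpp(node.right)
        return f"({left} {CPP_OPS[type(node.op)]} {right})"
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")

def collect_names(node):
    """Find every variable in the expression; each becomes a parameter."""
    return sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})

def translate(name, source):
    """Parse the source into an AST, then emit a C++ function from it."""
    body = ast.parse(source, mode="eval").body
    params = ", ".join(f"long {p}" for p in collect_names(body))
    return f"long {name}({params}) {{ return {emit_cpp(body)}; }}"

print(translate("total", "price * qty + 2"))
# prints: long total(long price, long qty) { return ((price * qty) + 2); }
```

Note that the sketch has to commit to a C++ type for every value up front. That is easy for a toy that only knows integers and much harder for a real dynamic language.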

But developing in HipHop has its issues. One of them is that Facebook could no longer develop and test code using standard PHP interpreters, because HipHop doesn't fully implement every aspect of PHP (HipHop supports "most features of PHP version 5.3"). The PHP features that HipHop does support also behave in idiosyncratic ways, and many of those quirks haven't been documented well, or at all. So using a standard PHP environment to debug code bound for HipHop is almost pointless: it won't catch all the potential errors that can occur when the script is converted to C++ code. "The old PHP interpreter can no longer run our production code," Facebook software engineer Minghui Yang said in a blog post in October, "and Facebook developers need to see the result of changes they make immediately."

To help make debugging easier, Facebook's engineers developed their own PHP interpreter, HPHPi, that closely matches how PHP code will behave when converted and compiled. Unfortunately, HPHPi is painfully slow, despite efforts to improve it. Former Facebook software engineer Evan Priestley said in a post on Quora that HPHPi is "roughly twice as slow as PHP." And while HPHPi uses an AST to analyze PHP code, its AST implementation differs from the one in HipHop's C++ generator, creating potential for code to work well in development but then fail when compiled.

Facebook has also run up against some of the limitations of converting from a dynamic language that is interpreted as it is run to essentially a static, compiled language. Things like variables, data types and functions work differently in PHP because they can be assigned at the time the script is run rather than having to be explicitly coded, so there are elements of Facebook's applications that don't work very well when compiled statically.
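A short sketch shows why this is awkward for a static compiler. It is written in Python rather than PHP, purely as an analogy for the same kind of dynamic behavior, and the function and table names (`parse_setting`, `HANDLERS`, `dispatch`) are hypothetical.

```python
def parse_setting(raw):
    # The return type depends on the data seen at run time:
    # an int for "42", a str for "auto".
    if raw.isdigit():
        return int(raw)
    return raw

# The same variable holds an int, then a str. A translator emitting C++
# cannot give it a single concrete type ahead of time.
value = parse_setting("42")
first_type = type(value).__name__
value = parse_setting("auto")
second_type = type(value).__name__

# The function to call is chosen by name while the script runs, so the
# call target is unknown when the code is compiled.
HANDLERS = {"upper": str.upper, "lower": str.lower}

def dispatch(name, text):
    return HANDLERS[name](text)

print(first_type, second_type, dispatch("upper", "hi"))
# prints: int str HI
```

An interpreter resolves each of these cases when the line executes. A compiler that must decide types and call targets in advance has to either prove what they will be or fall back to slower generic code.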

That's where the HHVM comes in. It uses the same AST implementation as the HipHop compiler, but generates bytecode instead of C++. Unlike Java and Microsoft's C#, the HipHop VM doesn't do "just-in-time" compilation of code method by method. Instead, it uses trace-based translation, analyzing each loop in the script, in a similar fashion to that used by Mozilla's TraceMonkey native code compiler for JavaScript. The translated bytecode is then put through a run-time interpreter.
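The general shape of that design, an AST lowered to bytecode that a run-time interpreter then executes, can be sketched in a few lines of Python. This is a toy stack machine, not HHVM: the opcode names (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) are invented for the example, Python's `ast` module stands in for a PHP front end, and trace-based translation is left out entirely.

```python
import ast
import operator

# Hypothetical opcodes for this sketch; not HHVM's real instruction set.
BINARY_OPS = {
    ast.Add: ("ADD", operator.add),
    ast.Sub: ("SUB", operator.sub),
    ast.Mult: ("MUL", operator.mul),
}

def compile_expr(node, code):
    """Walk an expression AST and append stack-machine instructions."""
    if isinstance(node, ast.Constant):
        code.append(("PUSH", node.value))
    elif isinstance(node, ast.Name):
        code.append(("LOAD", node.id))
    elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        compile_expr(node.left, code)   # operands first,
        compile_expr(node.right, code)
        code.append((BINARY_OPS[type(node.op)][0], None))  # then the operator
    else:
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return code

def to_bytecode(source):
    """Parse source text into an AST and lower it to bytecode."""
    return compile_expr(ast.parse(source, mode="eval").body, [])

def run(code, env):
    """Interpret the bytecode; variable values are supplied at run time."""
    handlers = {name: fn for name, fn in BINARY_OPS.values()}
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(handlers[op](left, right))
    return stack.pop()

program = to_bytecode("price * qty + 2")
print(program)
print(run(program, {"price": 3, "qty": 4}))  # prints: 14
```

Because variables are looked up only when `run` executes, the same bytecode works for whatever values arrive at run time, which is the flexibility that static conversion to C++ gives up.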

The HHVM is still a work in progress (about 90 percent done so far, according to Facebook software engineer Jason Evans), but the HHVM interpreter is already 60 percent faster than the HPHPi interpreter, which is fast enough that Facebook is shifting to the VM for all development. And while it's still less than a quarter the speed of compiled code, Facebook is looking to eventually run all of its production code on the HipHop VM, eliminating some of the problems related to the conversion to a static language and making it easier for Facebook to maintain the code. Meanwhile, the HHVM code, like Facebook's previous HipHop work, has been open-sourced under the PHP and Zend licenses and published on GitHub.

