By Badoo on 14 Mar 2016 - 9 Comments
We did it! Hundreds of our application servers are now running on PHP7 and doing just fine. By all accounts, ours is only the second project of this scale (after Etsy) to switch to PHP7. During the process of switching over we found a couple bugs in the PHP7 bytecode cache system, but thankfully it’s all fixed now. Now we’re excited to share our good news with the whole PHP community: PHP7 is completely ready for production, stable, significantly reduces memory consumption, and improves performance dramatically.
In this article, we’ll discuss the process of switching over to PHP7 in detail, explaining what difficulties we encountered, how we dealt with them, and what the final results were. But first let’s step back a bit and look at some of the broader issues:
The idea that databases are a bottleneck in web-projects is an all-too-common misconception. A well designed system is balanced: when the input load increases, all parts of the system take the hit. Likewise, when a certain threshold is reached, all components – not just the hard disk database, but the processor and the network part – are hit. Given this reality, the processing power of the application cluster is arguably the most important factor. In many projects, this cluster is made up of hundreds or even thousands of servers, which is why taking the time to adjust the app cluster processing load more than justifies itself from the economic standpoint (by a million dollars in our case).
In PHP web apps, the processor consumes as much as any dynamic high-level language – a lot. But PHP developers have faced a particular obstacle (one that has made them the victims of vicious trolling from various communities): the absence of JIT or, at the very least, a generator of compilable texts in languages like C/C++. The inability of the PHP community to supply a similar solution within the frame of the core project fostered a suboptimal tendency: the main players started to slap together their own solutions. This is how HHVM was born at Facebook, KPHP at VKontakte, and maybe some other similar hacks. Thankfully, in 2015, PHP started to “grow up” with the release of PHP7. Though there is still no JIT, it’s hard to overestimate how significant these changes in the “engine” are. Now, even without JIT, PHP7 holds its own against HHVM (e.g. Benchmarks from the LightSpeed blogor PHP devs benchmarks). The new PHP7 architecture will even simplify the addition of JIT in the future.
Our “platform” developers at Badoo have paid careful attention to every hack to come out in recent years, including the HHVM pilot project, but we decided to wait for PHP7’s arrival given how promising it was. Now we’ve launched Badoo on PHP7! With over three million lines of PHP code and 60,000 tests, this project took on epic proportions. Keep reading to find out how we handled these challenges, came up with a new PHP app testing framework (which, by the way, is already open source), and saved a million bucks along the way.
Before switching over to PHP7, we spent some time looking for other ways to optimize our backend. The first step was, of course, to play around with HHVM.
Having spent a few weeks experimenting, we got quite respectable results: after warming up JIT on our framework, we saw triple digit gains in speed and CPU use.
On the other hand, HHVM proved to have some serious drawbacks:
So we patiently awaited PHP7.
The switch to the new version of the interpreter was both an important and difficult process, and we prepared for it by putting together a precise plan. This plan consisted of three stages: - Changing the PHP build/deploy infrastructure and adapting the mass of extensions we’d already written - Changing the infrastructure and testing environments - Changing the PHP app code
We’ll get into the details of all these stages later.
At Badoo, we have our own actively supported and updated PHP branch, and we started switching over to PHP7 even before its official release, so we had to regularly rebase PHP7 upstream in our tree in order for it to update with every release candidate. All patches and customizations that we use in our everyday work also had to be ported between versions and work correctly.
We automated the process of downloading and building dependencies, extensions and the PHP tree for versions 5.5 and 7.0. This not only simplified our current work, but also bodes well for the future: when version 7.1 comes out, everything will be in place.
As mentioned, we also had to turn our attention to extensions. We support over 40 extensions, more than half of which are open source with our reworks.
In order to switch them over as quickly as possible, we decided to launch two parallel processes. The first involved individually rewriting the most critical extensions: the blitz template engine, data cache in shared memory/APCu, pinba statistics collector, and a few other custom extensions for internal services (in total, we used our forces to redo about 20 extensions).
The second involved actively ridding ourselves of all extensions that are only used in non-critical parts of the infrastructure in order to unclutter things as much as possible. We were easily able to get rid of 11 extensions, which is not an insignificant figure!
Additionally, we started to actively discuss PHP7 compatibility with those who maintain the main open extensions (special thanks to xdebug developer Derick Rethans).
We’ll go into more detail regarding the technical details of porting extensions to PHP7 a bit later.
Developers made a lot of changes to internal APIs in PHP7, which meant we had to alter a lot of extension code.
Here are the most important changes:
All this makes it possible to radically reduce the number of small memory allocations and, as a result, speed up the PHP engine by double digit percentage points.
We should note that all these changes made it necessary to at least alter all extensions (if not rewrite them completely). Though we could rely on the authors of built-in extensions to make the necessary changes, we of course were responsible for altering our own, and the amount of work was substantial. Due to changes to internal APIs, it was easier just to rewrite some sections of code.
Unfortunately, introducing new structures using garbage collection along with speeding up the code execution made the engine itself more complex and it became harder to locate problems. One such problem concerned OpCache. During cache flush, the cached file’s bytecode breaks down at the moment when it could be used in a different process, so the whole thing falls apart. Here’s how it looks from the outside: (zend_string) in function names or as a constant suddenly breaks down and garbage appears in its place.
Given that we use a lot of in-house extensions, many of which deal particularly with strings, we suspected that the problem was with how strings were used in them. We wrote a lot of tests and conducted plenty of experiments, but didn’t get the results we expected. Finally, we asked for help from the main PHP engine developer, Dmitri Stogov.
One of his first questions was “Did you clear the cache?” We explained that we had, in fact, cleared the cache every time. At that point, we realized that the problem was not on our end, but with OpCache. We quickly reproduced the case, which helped us to replay and fix the problem within a few days. Without this fix that came out in the 7.0.4 version, it wouldn’t have been possible to put PHP7 into stable production.
We take special pride in the testing we do at Badoo. We deploy server PHP code to production two times a day, and every deploy contains 20-50 tasks (we use feature branches in git and automated builds with tight JIRA integrations). Given this schedule and task volume, there’s no way we could go without autotests. Currently, we have around 60 thousand unit tests with about 50% coverage, which run for an average of 2-3 minutes in the cloud (see ourarticle for more). In addition to unit tests, we use higher-level autotests, integration and system tests, selenium tests for web pages, and calabash tests for mobile apps. Taken as a whole, this allows us to quickly reach conclusions about the quality of each concrete version of code and apply the appropriate solution.
Switching to the new version of interpreter was a major change fraught with potential problems, so it was especially important that all tests worked. In order to clarify exactly what we did and how we managed to do it, let’s take a look at how test development has evolved over the years at Badoo.
Often, people starting to think about implementing product testing (or, in some cases, having started implementation already) discover that their code is “not ready for testing” during the experimentation process. For this reason, in most cases it’s important for the developer to keep in mind that his code should be testable while he’s writing it. The architecture should allow unit tests to replace calls and external dependency objects in order to isolate the code being tested from external conditions. Of course, it goes without saying that this is a much-hated requirement and many programmers take a stand against writing “testable” code out of principle. They feel that these restrictions flies in the face of “good code” and often don’t pay off. And you can imagine the sheer volume of code that’s not written “by the rules”, and results in testing being delayed “for a better time” or experimenters trying to satisfy themselves by running small tests that only cover what can be covered (which basically means the tests don’t yield the expected results).
I’m not trying to say that our company is an exception; we also didn’t implement testing right from the start of our project. There were several lines of code that worked fine in production and brought in cash, so it would have been stupid to rewrite them just to run tests (as recommended in the literature). That would take too long and be too expensive.
Fortunately, we already had an excellent tool that allowed us to solve the majority of our problems with “untestable code” - runkit. While the script is running, this PHP extension lets you change; delete; and add methods, classes, and functions used in the program. It also has many other functions, but we didn’t use them. This tool was developed and supported for many years, from 2005 to 2008 by Sara Goleman (who now works at Facebook and, interestingly enough, on HHVM). Beginning in 2008 and continuing through the present, it has been maintained by Dmitri Zenovich (who headed the testing division at Begun and Mail.ru). We’ve also done our bit to contribute to the project.
On its own, runkit is a very dangerous extension. It lets you change constants, functions, and classes while the script that uses them is running. In essence, it’s like a tool that let’s you rebuild a plane during the flight. Runkit gets right to the “guts” of PHP on the fly, but one mistake or deficiency makes everything go up in flames and either the PHP fails or you have to spend a lot of time searching for memory leaks or other low-level debugging. Nonetheless, this tool is essential for our testing: implementing project testing without having to do major rewrites can only be done by changing the code on the fly.
But runkit turned out to be a big problem during the switch to PHP7 because it didn’t support the new version. We could have sponsored the development of a new version, but, looking at the long-term perspective, this didn’t seem like the most reliable path to pursue. So we looked at a few other options.
One of the most promising solutions was to shift from runkit to uopz. The latter is also a PHP extension with similar functionality that launched in 2014. Our colleagues at Wamba suggested uopz, focusing on its impressive speed. The maintainer of the uopz project, by the way, is Joe Watkins (First Beat Media, UK). Unfortunately, however, switching all our tests to uopz didn’t work out. In some places there were fatal errors, in others - segfaults. We created a few reports but there was no movement on them, unfortunately (e.g.https://github.com/krakjoe/uopz/issues/18). Trying to deal with this situation by rewriting tests would have been very expensive, and more issues could very well have emerged even if we did.
Given that we had to rewrite a lot of code no matter what, and were dependent on external projects like runkit or uopz regardless of how problematic they were, we came to the obvious conclusion that we should rewrite our code to be as independent as possible. We also pledged to do everything we could to avoid similar problems in the future, even if we ended up switching to HHVM or any similar product. This is how we arrived at our own framework.
The system got the name “SoftMocks”, with “soft” highlighting the fact that the system works on clean PHP without the use of extensions. The project is open source and is available in the form of an add-on library. SoftMocks is not tied up with the particulars of PHP engine implementation and works by rewriting code “on the fly”, analogously to the Go AOP! framework.
Tests in our code primarily use the following:
All these things are implemented successfully using runkit. Rewriting code makes all this possible with some reservations.
Though we don’t have space to go into much detail about SoftMocks in this article, we plan on devoting a separate article to this topic in the future. Here we’ll hit some of the main points:
Now to get back to the main point of this article: the switch to PHP7. After switching the project over to SoftMocks, we still had about 1000 tests that we had to fix manually. You could say that this wasn’t a bad result, given that we started with 60,000 tests. By comparison with runkit, test run speeds didn’t decrease, so there are no performance issues with SoftMocks. To be fair, we should note that uopz is supposed to work significantly faster.
原文发布于微信公众号 - Netkiller（netkiller-ebook）