How Badoo saved one million dollars switching to PHP7

netkiller old

Published 2018-03-05 16:03:31


By Badoo on 14 Mar 2016 - 9 Comments

Introduction

We did it! Hundreds of our application servers are now running on PHP7 and doing just fine. By all accounts, ours is only the second project of this scale (after Etsy) to switch to PHP7. During the switch we found a couple of bugs in the PHP7 bytecode cache system, but thankfully they have all been fixed. Now we’re excited to share our good news with the whole PHP community: PHP7 is completely ready for production, stable, significantly reduces memory consumption, and improves performance dramatically.

In this article, we’ll discuss the process of switching over to PHP7 in detail, explaining what difficulties we encountered, how we dealt with them, and what the final results were. But first let’s step back a bit and look at some of the broader issues:

The idea that databases are a bottleneck in web-projects is an all-too-common misconception. A well designed system is balanced: when the input load increases, all parts of the system take the hit. Likewise, when a certain threshold is reached, all components – not just the hard disk database, but the processor and the network part – are hit. Given this reality, the processing power of the application cluster is arguably the most important factor. In many projects, this cluster is made up of hundreds or even thousands of servers, which is why taking the time to adjust the app cluster processing load more than justifies itself from the economic standpoint (by a million dollars in our case).

In PHP web apps, the processor load is as high as with any dynamic high-level language: a lot. But PHP developers have faced a particular obstacle (one that has made them the victims of vicious trolling from various communities): the absence of JIT or, at the very least, a generator of compilable source code in languages like C/C++. The inability of the PHP community to supply such a solution within the core project fostered a suboptimal tendency: the main players started to slap together their own solutions. This is how HHVM was born at Facebook, KPHP at VKontakte, and maybe some other similar hacks. Thankfully, in 2015, PHP started to “grow up” with the release of PHP7. Though there is still no JIT, it’s hard to overestimate how significant these changes in the “engine” are. Now, even without JIT, PHP7 holds its own against HHVM (e.g. benchmarks from the LightSpeed blog or PHP devs benchmarks). The new PHP7 architecture will even simplify the addition of JIT in the future.

Our “platform” developers at Badoo have paid careful attention to every hack to come out in recent years, including the HHVM pilot project, but we decided to wait for PHP7’s arrival given how promising it was. Now we’ve launched Badoo on PHP7! With over three million lines of PHP code and 60,000 tests, this project took on epic proportions. Keep reading to find out how we handled these challenges, came up with a new PHP app testing framework (which, by the way, is already open source), and saved a million bucks along the way.

Experimenting with HHVM

Before switching over to PHP7, we spent some time looking for other ways to optimize our backend. The first step was, of course, to play around with HHVM.

Having spent a few weeks experimenting, we got quite respectable results: after warming up JIT on our framework, we saw triple digit gains in speed and CPU use.

On the other hand, HHVM proved to have some serious drawbacks:

Deploying is difficult and slow. During deploy, you have to warm up the JIT cache. While a machine is warming up, it shouldn’t be loaded with production traffic, because everything runs pretty slowly. The HHVM team also doesn’t recommend warming up with parallel requests. As a result, the warm-up phase for a big cluster doesn’t go quickly. Additionally, for big clusters consisting of a few hundred machines, you have to learn how to deploy in batches. Thus the changes to the architecture and deploy procedure are substantial, and it’s difficult to tell ahead of time how long they will take. For us, it’s important for deploy to be as simple and fast as possible. Our developer culture prides itself on putting out two planned releases a day and being able to roll out many hot fixes.
Inconvenient testing. We rely heavily on the runkit extension, which wasn’t available in HHVM. A bit later, we’ll go into more detail about runkit, but suffice it to say, it’s an extension that lets you change the behavior of variables, classes, methods, functions, practically whatever you want on the fly. This is accomplished via an integration that gets to the very “guts” of PHP. The HHVM engine bears only a faint resemblance to PHP’s, however, so their respective “guts” are quite different. Due to the extension’s particular features, implementing runkit independently on top of HHVM is insanely difficult, so we would have had to rewrite tens of thousands of tests in order to be sure that HHVM was working correctly with our code. This just didn’t seem worthwhile. To be fair, we would later encounter this same problem with all other options at our disposal, and we still had to redo a lot of things, including getting rid of runkit, during the switch over to PHP7. But more about that later.
Compatibility. The main issues are incomplete compatibility with PHP5.5 (see: https://github.com/facebook/hhvm/blob/master/hphp/doc/inconsistencies, https://github.com/facebook/hhvm/issues?labels=php5+incompatibility&state=open) and incompatibility with existing extensions (of which we have dozens). Both of these incompatibilities are a result of an obvious drawback of the project: HHVM is not developed by the larger community, but rather within a division of Facebook. In situations like this, it’s easier for companies to change their internal rules and standards without referencing the community and volumes of code contained therein. In other words, they cut themselves off and solve the problem using their own resources. Therefore, in order to handle tasks of similar volume, a company needs to have Facebook-like resources to devote to both the initial implementation as well as continuing support. This proposition is both risky and potentially expensive, so we decided against it.
Potential. Even though Facebook is a huge company with numerous top-notch programmers, we doubted that their HHVM developers would prove more powerful than the entire PHP-community. We reckoned that as soon as something similar to HHVM appeared for PHP, the former would start to slowly fade out of use.

So we patiently awaited PHP7.

The switch to the new version of the interpreter was both an important and difficult process, and we prepared for it by putting together a precise plan. This plan consisted of three stages:

Changing the PHP build/deploy infrastructure and adapting the mass of extensions we’d already written
Changing the infrastructure and testing environments
Changing the PHP app code

We’ll get into the details of all these stages later.

Changes to the engine and extensions

At Badoo, we have our own actively supported and updated PHP branch, and we started switching over to PHP7 even before its official release, so we had to regularly rebase PHP7 upstream in our tree in order for it to update with every release candidate. All patches and customizations that we use in our everyday work also had to be ported between versions and work correctly.

We automated the process of downloading and building dependencies, extensions and the PHP tree for versions 5.5 and 7.0. This not only simplified our current work, but also bodes well for the future: when version 7.1 comes out, everything will be in place.

As mentioned, we also had to turn our attention to extensions. We support over 40 extensions, more than half of which are open source with our reworks.

In order to switch them over as quickly as possible, we decided to launch two parallel processes. The first involved individually rewriting the most critical extensions: the blitz template engine, data cache in shared memory/APCu, pinba statistics collector, and a few other custom extensions for internal services (in total, we used our forces to redo about 20 extensions).

The second involved actively ridding ourselves of all extensions that are only used in non-critical parts of the infrastructure in order to unclutter things as much as possible. We were easily able to get rid of 11 extensions, which is not an insignificant figure!

Additionally, we started to actively discuss PHP7 compatibility with those who maintain the main open extensions (special thanks to xdebug developer Derick Rethans).

We’ll go into the technical details of porting extensions to PHP7 a bit later.

Developers made a lot of changes to internal APIs in PHP7, which meant we had to alter a lot of extension code.

Here are the most important changes:

zval * -> zval. In earlier versions, a zval structure was always allocated on the heap for each new variable; now zvals are stored directly on the stack.
char * -> zend_string. Version 7 of the PHP engine caches strings aggressively. For this reason, the new engine switches completely from regular C strings to the zend_string structure, in which a string is stored along with its length.
Changes in the array API. zend_string is now used as the key, and the array implementation replaces the doubly linked list with an ordinary array that is allocated as one block instead of many smaller ones.

All this makes it possible to radically reduce the number of small memory allocations and, as a result, speed up the PHP engine by double digit percentage points.
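One rough way to observe the effect of the new array layout is to measure how much memory a large array takes and run the same script under PHP 5.x and PHP 7. The sketch below is only an illustration: the function name `measure_array_memory` and the element count are our own assumptions, and the numbers depend on the build and platform.

```php
<?php
// Hypothetical helper (not from the original article): returns the number of
// bytes the engine reports as allocated while building an array of $count ints.
function measure_array_memory($count)
{
    $before = memory_get_usage();
    $data = array();
    for ($i = 0; $i < $count; $i++) {
        $data[] = $i; // appended with sequential integer keys
    }
    $after = memory_get_usage();
    return $after - $before;
}

$count = 100000;
$bytes = measure_array_memory($count);
printf(
    "PHP %s: %d bytes for %d integers (%.1f bytes per element)\n",
    PHP_VERSION,
    $bytes,
    $count,
    $bytes / $count
);
```

Comparing the per-element figure between interpreter versions gives a quick, if crude, feel for the reduction in small allocations described above.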

We should note that all these changes made it necessary to at least alter all extensions (if not rewrite them completely). Though we could rely on the authors of built-in extensions to make the necessary changes, we of course were responsible for altering our own, and the amount of work was substantial. Due to changes to internal APIs, it was easier just to rewrite some sections of code.

Unfortunately, introducing new structures that use garbage collection, along with speeding up code execution, made the engine itself more complex, and it became harder to locate problems. One such problem concerned OpCache. During a cache flush, the cached file’s bytecode was corrupted while another process could still be using it, so the whole thing fell apart. Here’s how it looked from the outside: a string (zend_string) used in a function name or as a constant suddenly got corrupted and garbage appeared in its place.

Given that we use a lot of in-house extensions, many of which deal particularly with strings, we suspected that the problem was with how strings were used in them. We wrote a lot of tests and conducted plenty of experiments, but didn’t get the results we expected. Finally, we asked for help from the main PHP engine developer, Dmitry Stogov.

One of his first questions was “Did you clear the cache?” We explained that we had, in fact, cleared the cache every time. At that point, we realized that the problem was not on our end, but with OpCache. We quickly reproduced the case, which helped us to replay and fix the problem within a few days. Without this fix that came out in the 7.0.4 version, it wouldn’t have been possible to put PHP7 into stable production.

Changes to testing infrastructure

We take special pride in the testing we do at Badoo. We deploy server PHP code to production two times a day, and every deploy contains 20-50 tasks (we use feature branches in git and automated builds with tight JIRA integrations). Given this schedule and task volume, there’s no way we could go without autotests. Currently, we have around 60 thousand unit tests with about 50% coverage, which run for an average of 2-3 minutes in the cloud (see our article for more). In addition to unit tests, we use higher-level autotests, integration and system tests, selenium tests for web pages, and calabash tests for mobile apps. Taken as a whole, this allows us to quickly reach conclusions about the quality of each concrete version of code and apply the appropriate solution.

Switching to the new version of interpreter was a major change fraught with potential problems, so it was especially important that all tests worked. In order to clarify exactly what we did and how we managed to do it, let’s take a look at how test development has evolved over the years at Badoo.

Often, people starting to think about implementing product testing (or, in some cases, having started implementation already) discover during the experimentation process that their code is “not ready for testing”. For this reason, in most cases it’s important for developers to keep in mind, while writing code, that it should be testable. The architecture should allow unit tests to replace calls and external dependency objects in order to isolate the code being tested from external conditions. Of course, it goes without saying that this is a much-hated requirement, and many programmers take a stand against writing “testable” code out of principle. They feel that these restrictions fly in the face of “good code” and often don’t pay off. And you can imagine the sheer volume of code that’s not written “by the rules”, which results in testing being delayed “for a better time” or experimenters trying to satisfy themselves by running small tests that only cover what can be covered (which basically means the tests don’t yield the expected results).
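To make “testable” concrete, here is a minimal sketch of code whose external dependency can be replaced in a test. All names (`RateSource`, `PriceConverter`, `FixedRateSource`) are hypothetical and chosen only for illustration; they are not taken from Badoo’s codebase.

```php
<?php
// The external dependency is hidden behind an interface...
interface RateSource
{
    public function rate($currency);
}

// ...and the code under test receives it from outside instead of creating it.
class PriceConverter
{
    private $rates;

    public function __construct(RateSource $rates)
    {
        $this->rates = $rates;
    }

    public function fromUsd($amountUsd, $currency)
    {
        return round($amountUsd * $this->rates->rate($currency), 2);
    }
}

// In a unit test, a stub stands in for the real (network or database) source.
class FixedRateSource implements RateSource
{
    private $rate;

    public function __construct($rate)
    {
        $this->rate = $rate;
    }

    public function rate($currency)
    {
        return $this->rate; // same rate for every currency, by design of the stub
    }
}

$converter = new PriceConverter(new FixedRateSource(0.5));
$result = $converter->fromUsd(10, 'EUR');
```

Code that instead creates its dependencies internally, or calls global functions directly, offers no such seam, which is exactly the situation that tools changing code on the fly are meant to handle.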

I’m not trying to say that our company is an exception; we also didn’t implement testing right from the start of our project. There were several lines of code that worked fine in production and brought in cash, so it would have been stupid to rewrite them just to run tests (as recommended in the literature). That would take too long and be too expensive.

Fortunately, we already had an excellent tool that allowed us to solve the majority of our problems with “untestable code”: runkit. While the script is running, this PHP extension lets you change, delete, and add methods, classes, and functions used in the program. It also has many other functions, but we didn’t use them. This tool was developed and supported from 2005 to 2008 by Sara Golemon (who now works at Facebook and, interestingly enough, on HHVM). From 2008 through the present, it has been maintained by Dmitry Zenovich (who headed the testing division at Begun and Mail.ru). We’ve also done our bit to contribute to the project.

On its own, runkit is a very dangerous extension. It lets you change constants, functions, and classes while the script that uses them is running. In essence, it’s like a tool that lets you rebuild a plane during the flight. Runkit gets right to the “guts” of PHP on the fly, but one mistake or deficiency makes everything go up in flames: either PHP crashes, or you have to spend a lot of time searching for memory leaks or doing other low-level debugging. Nonetheless, this tool is essential for our testing: implementing project testing without having to do major rewrites can only be done by changing the code on the fly.

But runkit turned out to be a big problem during the switch to PHP7 because it didn’t support the new version. We could have sponsored the development of a new version, but, looking at the long-term perspective, this didn’t seem like the most reliable path to pursue. So we looked at a few other options.

One of the most promising solutions was to shift from runkit to uopz. The latter is also a PHP extension with similar functionality that launched in 2014. Our colleagues at Wamba suggested uopz, focusing on its impressive speed. The maintainer of the uopz project, by the way, is Joe Watkins (First Beat Media, UK). Unfortunately, however, switching all our tests to uopz didn’t work out. In some places there were fatal errors, in others segfaults. We created a few reports but there was no movement on them, unfortunately (e.g. https://github.com/krakjoe/uopz/issues/18). Trying to deal with this situation by rewriting tests would have been very expensive, and more issues could very well have emerged even if we did.

Given that we had to rewrite a lot of code no matter what, and were dependent on external projects like runkit or uopz regardless of how problematic they were, we came to the obvious conclusion that we should rewrite our code to be as independent as possible. We also pledged to do everything we could to avoid similar problems in the future, even if we ended up switching to HHVM or any similar product. This is how we arrived at our own framework.

The system got the name “SoftMocks”, with “soft” highlighting the fact that the system works in pure PHP without the use of extensions. The project is open source and is available in the form of an add-on library. SoftMocks is not tied to the particulars of the PHP engine implementation and works by rewriting code “on the fly”, analogously to the Go! AOP framework.

Tests in our code primarily use the following:

Implementation override of one of the class methods
Function execution result override
Changing the value of global constants or class constants
Adding a method to a class

All these things are implemented successfully using runkit. Rewriting code makes all this possible with some reservations.

Though we don’t have space to go into much detail about SoftMocks in this article, we plan on devoting a separate article to this topic in the future. Here we’ll hit some of the main points:

Custom code is connected through the rewrite wrapper function. Then all include operators are automatically overridden as wrappers.
Checks for existing overrides are added inside every custom method definition. If they exist, then the corresponding code is executed. Direct function calls are replaced by calls through the wrapper; this lets us catch both built-in and custom functions.
Calls to the wrapper dynamically override access to constants in the code.
SoftMocks works with Nikita Popov’s PHP-Parser. This library isn’t very fast (parsing is about 15 times slower than token_get_all), but its interface lets you traverse the parse tree and includes a convenient API for handling syntactic constructions of arbitrary complexity.
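The points above can be illustrated with a hand-written miniature of what rewritten code might look like. This is a sketch under our own assumptions and not the SoftMocks API: the `MiniMocks` class, its method names, and the `Greeter` example are hypothetical, and the rewriting is done by hand here rather than by a parser.

```php
<?php
// Hypothetical registry of overrides; NOT the real SoftMocks implementation.
class MiniMocks
{
    private static $methods = array();
    private static $functions = array();
    private static $constants = array();

    public static function redefineMethod($class, $method, $callable)
    {
        self::$methods[$class . '::' . $method] = $callable;
    }

    public static function redefineFunction($name, $callable)
    {
        self::$functions[$name] = $callable;
    }

    public static function redefineConstant($name, $value)
    {
        self::$constants[$name] = $value;
    }

    // Returns the registered replacement for a method, or null if there is none.
    public static function methodOverride($class, $method)
    {
        $key = $class . '::' . $method;
        return isset(self::$methods[$key]) ? self::$methods[$key] : null;
    }

    // Wrapper that every rewritten function call goes through.
    public static function callFunction($name, array $args)
    {
        $target = isset(self::$functions[$name]) ? self::$functions[$name] : $name;
        return call_user_func_array($target, $args);
    }

    // Wrapper that every rewritten constant access goes through.
    public static function getConstant($name)
    {
        return array_key_exists($name, self::$constants)
            ? self::$constants[$name]
            : constant($name);
    }
}

define('GREETING', 'Hello');

// Original source (as the developer wrote it):
//   class Greeter {
//       public function greet($name) { return GREETING . ', ' . strtoupper($name); }
//   }
// A rewritten version of the same class could look like this:
class Greeter
{
    public function greet($name)
    {
        // Check for an override inserted at the top of the method body.
        if ($override = MiniMocks::methodOverride(__CLASS__, __FUNCTION__)) {
            return call_user_func_array($override, func_get_args());
        }
        return MiniMocks::getConstant('GREETING') . ', '
            . MiniMocks::callFunction('strtoupper', array($name));
    }
}

$greeter = new Greeter();
$plain = $greeter->greet('ann'); // no overrides registered yet

MiniMocks::redefineConstant('GREETING', 'Hi');
MiniMocks::redefineFunction('strtoupper', 'ucfirst');
$mocked = $greeter->greet('ann'); // constant and function are overridden

MiniMocks::redefineMethod('Greeter', 'greet', function ($name) {
    return 'stub';
});
$stubbed = $greeter->greet('ann'); // whole method is overridden
```

The sketch covers method, function, and constant overrides; adding a new method to a class is not shown. Because all of this is ordinary PHP, nothing in it depends on engine internals, which is the property the “soft” in the name refers to.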

Now to get back to the main point of this article: the switch to PHP7. After switching the project over to SoftMocks, we still had about 1000 tests that we had to fix manually. You could say that this wasn’t a bad result, given that we started with 60,000 tests. By comparison with runkit, test run speeds didn’t decrease, so there are no performance issues with SoftMocks. To be fair, we should note that uopz is supposed to work significantly faster.

Because of a length limit, the article is cut off here. Original: https://techblog.badoo.com/blog/2016/03/14/how-badoo-saved-one-million-dollars-switching-to-php7/
