How Badoo saved one million dollars switching to PHP7

By Badoo on 14 Mar 2016

Introduction

We did it! Hundreds of our application servers are now running on PHP7 and doing just fine. By all accounts, ours is only the second project of this scale (after Etsy) to switch to PHP7. During the switch we found a couple of bugs in the PHP7 bytecode cache system, but thankfully they have all been fixed. Now we’re excited to share our good news with the whole PHP community: PHP7 is completely ready for production, stable, significantly reduces memory consumption, and improves performance dramatically.

In this article, we’ll discuss the process of switching over to PHP7 in detail, explaining what difficulties we encountered, how we dealt with them, and what the final results were. But first let’s step back a bit and look at some of the broader issues:

The idea that databases are the bottleneck in web projects is an all-too-common misconception. A well-designed system is balanced: when the input load increases, all parts of the system take the hit. Likewise, when a certain threshold is reached, all components are hit: not just the disks and the database, but the processor and the network as well. Given this reality, the processing power of the application cluster is arguably the most important factor. In many projects, this cluster is made up of hundreds or even thousands of servers, which is why taking the time to tune the processing load of the app cluster more than justifies itself economically (by a million dollars in our case).

PHP web apps consume as much CPU as apps in any dynamic high-level language: a lot. But PHP developers have faced a particular obstacle (one that has made them the victims of vicious trolling from various communities): the absence of JIT or, at the very least, of a generator of compilable source in languages like C/C++. The inability of the PHP community to supply such a solution within the core project fostered a suboptimal tendency: the main players started to slap together their own solutions. This is how HHVM was born at Facebook, KPHP at VKontakte, and maybe some other similar hacks. Thankfully, in 2015, PHP started to “grow up” with the release of PHP7. Though there is still no JIT, it’s hard to overestimate how significant these changes in the “engine” are. Now, even without JIT, PHP7 holds its own against HHVM (e.g. benchmarks from the LiteSpeed blog or the PHP developers’ benchmarks). The new PHP7 architecture will even simplify the addition of JIT in the future.

Our “platform” developers at Badoo have paid careful attention to every hack to come out in recent years, including the HHVM pilot project, but we decided to wait for PHP7’s arrival given how promising it was. Now we’ve launched Badoo on PHP7! With over three million lines of PHP code and 60,000 tests, this project took on epic proportions. Keep reading to find out how we handled these challenges, came up with a new PHP app testing framework (which, by the way, is already open source), and saved a million bucks along the way.

Experimenting with HHVM

Before switching over to PHP7, we spent some time looking for other ways to optimize our backend. The first step was, of course, to play around with HHVM.

Having spent a few weeks experimenting, we got quite respectable results: after warming up JIT on our framework, we saw triple digit gains in speed and CPU use.

On the other hand, HHVM proved to have some serious drawbacks:

  • Deploying is difficult and slow. During deploy, you have to warm up the JIT cache. While a machine is warming up, it shouldn’t be loaded with production traffic, because everything runs slowly. The HHVM team also doesn’t recommend warming up with parallel requests, so warming up a big cluster takes a long time. Additionally, for big clusters consisting of a few hundred machines, you have to learn how to deploy in batches. The changes to architecture and deploy procedure are therefore substantial, and it’s difficult to estimate in advance how much time they will take. For us, it’s important for deploy to be as simple and fast as possible. Our developer culture prides itself on putting out two planned releases a day and being able to roll out many hot fixes.
  • Inconvenient testing. We rely heavily on the runkit extension, which wasn’t available in HHVM. A bit later, we’ll go into more detail about runkit, but suffice it to say, it’s an extension that lets you change the behavior of variables, classes, methods, functions, practically whatever you want, on the fly. This is accomplished via an integration that gets to the very “guts” of PHP. The HHVM engine bears only a faint resemblance to PHP’s, however, so their respective “guts” are quite different. Because of how the extension works, implementing runkit independently on top of HHVM would be insanely difficult, so we would have had to rewrite tens of thousands of tests just to be sure that HHVM worked correctly with our code. This just didn’t seem worthwhile. To be fair, we would later encounter this same problem with all the other options at our disposal, and we still had to redo a lot of things, including getting rid of runkit, during the switch to PHP7. But more about that later.
  • Compatibility. The main issues are incomplete compatibility with PHP5.5 (see: https://github.com/facebook/hhvm/blob/master/hphp/doc/inconsistencies, https://github.com/facebook/hhvm/issues?labels=php5+incompatibility&state=open) and incompatibility with existing extensions (of which we have dozens). Both of these incompatibilities result from an obvious drawback of the project: HHVM is not developed by the larger community, but rather within a division of Facebook. In situations like this, it’s easier for a company to change its internal rules and standards without regard for the community and the volume of code it maintains. In other words, they cut themselves off and solve the problem using their own resources. Therefore, in order to handle tasks of similar volume, a company needs Facebook-like resources to devote to both the initial implementation and continuing support. This proposition is both risky and potentially expensive, so we decided against it.
  • Potential. Even though Facebook is a huge company with numerous top-notch programmers, we doubted that their HHVM developers would prove more powerful than the entire PHP community. We reckoned that as soon as something similar to HHVM appeared for PHP, the former would start to slowly fade out of use.

So we patiently awaited PHP7.

The switch to the new version of the interpreter was both an important and difficult process, and we prepared for it by putting together a precise plan. This plan consisted of three stages:

  • Changing the PHP build/deploy infrastructure and adapting the mass of extensions we’d already written
  • Changing the infrastructure and testing environments
  • Changing the PHP app code

We’ll get into the details of all these stages later.

Changes to the engine and extensions

At Badoo, we have our own actively supported and updated PHP branch, and we started switching over to PHP7 even before its official release, so we had to regularly rebase PHP7 upstream in our tree in order for it to update with every release candidate. All patches and customizations that we use in our everyday work also had to be ported between versions and work correctly.

We automated the process of downloading and building dependencies, extensions and the PHP tree for versions 5.5 and 7.0. This not only simplified our current work, but also bodes well for the future: when version 7.1 comes out, everything will be in place.

As mentioned, we also had to turn our attention to extensions. We support over 40 extensions, more than half of which are open-source extensions carrying our own modifications.

In order to switch them over as quickly as possible, we decided to launch two parallel processes. The first involved individually rewriting the most critical extensions: the blitz template engine, the data cache in shared memory/APCu, the pinba statistics collector, and a few other custom extensions for internal services (in total, we reworked about 20 extensions ourselves).

The second involved actively ridding ourselves of all extensions that are only used in non-critical parts of the infrastructure in order to unclutter things as much as possible. We were easily able to get rid of 11 extensions, which is not an insignificant figure!

Additionally, we started to actively discuss PHP7 compatibility with those who maintain the main open extensions (special thanks to xdebug developer Derick Rethans).

The technical details of porting extensions to PHP7 are covered next.

Developers made a lot of changes to internal APIs in PHP7, which meant we had to alter a lot of extension code.

Here are the most important changes:

  • zval * -> zval. In earlier versions, a zval structure was always allocated on the heap for each new variable; now zvals are stored by value, for example on the stack.
  • char * -> zend_string. The PHP7 engine caches strings aggressively. For this reason, the new engine switches completely from regular C strings to the zend_string structure, where a string is stored along with its length.
  • Changes in the array API. Now zend_string is used as a key, and the array implementation replaces the doubly linked list with an ordinary array that is allocated as one block instead of a lot of smaller ones.

All this makes it possible to radically reduce the number of small memory allocations and, as a result, speed up the PHP engine by double-digit percentages.
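The effect of the array changes can be observed from userland. The following is a minimal sketch, not taken from our code base: the helper name `measureArrayMemory` is hypothetical, and the figure it reports depends on the PHP version and build, so no expected numbers are given. Running the same script under PHP 5.5 and PHP 7 is one way to compare the per-element cost.

```php
<?php
// Hypothetical helper (not from the article): returns how many bytes the
// engine's allocator hands out while building an array of $count integers.
function measureArrayMemory($count)
{
    $before = memory_get_usage();   // bytes currently allocated by the engine
    $data = range(1, $count);       // one array holding $count integers
    $after = memory_get_usage();
    unset($data);                   // release the array before returning

    return $after - $before;
}

$count = 100000;
$bytes = measureArrayMemory($count);

// Print the version too, so runs under different interpreters can be compared.
printf(
    "PHP %s: %d bytes for %d integers (%.1f bytes per element)\n",
    PHP_VERSION,
    $bytes,
    $count,
    $bytes / $count
);
```

The script deliberately avoids PHP 7-only syntax (scalar type declarations, return types) so that it runs unchanged on both interpreters.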

We should note that all these changes made it necessary to at least alter all extensions (if not rewrite them completely). Though we could rely on the authors of built-in extensions to make the necessary changes, we of course were responsible for altering our own, and the amount of work was substantial. Due to changes to internal APIs, it was easier just to rewrite some sections of code.

Unfortunately, introducing new garbage-collected structures while speeding up code execution made the engine itself more complex, and problems became harder to locate. One such problem concerned OpCache. During a cache flush, a cached file’s bytecode gets corrupted at a moment when it may still be in use by a different process, so the whole thing falls apart. Here’s how it looks from the outside: a string (zend_string) in a function name or a constant suddenly gets corrupted and garbage appears in its place.

Given that we use a lot of in-house extensions, many of which deal particularly with strings, we suspected that the problem was with how strings were used in them. We wrote a lot of tests and conducted plenty of experiments, but didn’t get the results we expected. Finally, we asked for help from the main PHP engine developer, Dmitri Stogov.

One of his first questions was “Did you clear the cache?” We explained that we had, in fact, cleared the cache every time. At that point, we realized that the problem was not on our end, but with OpCache. We quickly put together a reproducible case, which made it possible to replay and fix the problem within a few days. Without this fix, which came out in version 7.0.4, it wouldn’t have been possible to put PHP7 into stable production.

Changes to testing infrastructure

We take special pride in the testing we do at Badoo. We deploy server PHP code to production two times a day, and every deploy contains 20-50 tasks (we use feature branches in git and automated builds with tight JIRA integration). Given this schedule and task volume, there’s no way we could go without autotests. Currently, we have around 60 thousand unit tests with about 50% coverage, which run for an average of 2-3 minutes in the cloud (see our article for more). In addition to unit tests, we use higher-level autotests, integration and system tests, selenium tests for web pages, and calabash tests for mobile apps. Taken as a whole, this allows us to quickly reach conclusions about the quality of each concrete version of code and apply the appropriate solution.

Switching to the new version of interpreter was a major change fraught with potential problems, so it was especially important that all tests worked. In order to clarify exactly what we did and how we managed to do it, let’s take a look at how test development has evolved over the years at Badoo.

Often, people starting to think about implementing product testing (or, in some cases, having started implementation already) discover during their first experiments that their code is “not ready for testing”. For this reason, in most cases it’s important for developers to keep in mind, while writing code, that it should be testable. The architecture should allow unit tests to replace calls and external dependency objects in order to isolate the code being tested from external conditions. Of course, it goes without saying that this is a much-hated requirement, and many programmers take a stand against writing “testable” code out of principle. They feel that these restrictions fly in the face of “good code” and often don’t pay off. And you can imagine the sheer volume of code that’s not written “by the rules”, which results in testing being delayed “for a better time” or in testers settling for small tests that only cover what can be covered (which basically means the tests don’t yield the expected results).
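To make “testable” concrete, here is a minimal sketch of code that lets a unit test replace an external dependency. All names (`Clock`, `SystemClock`, `FixedClock`, `SessionChecker`) are hypothetical and chosen for illustration only; they are not from our code base.

```php
<?php
// Hypothetical example of "testable" code: the external dependency (the
// system clock) is passed in, so a test can substitute its own version.
interface Clock
{
    public function now();
}

// Production implementation: reads the real system time.
class SystemClock implements Clock
{
    public function now()
    {
        return time();
    }
}

// Test double: always returns the timestamp it was constructed with.
class FixedClock implements Clock
{
    private $timestamp;

    public function __construct($timestamp)
    {
        $this->timestamp = $timestamp;
    }

    public function now()
    {
        return $this->timestamp;
    }
}

// The code under test depends on the Clock interface, not on time() itself.
class SessionChecker
{
    private $clock;

    public function __construct(Clock $clock)
    {
        $this->clock = $clock;
    }

    public function isExpired($expiresAt)
    {
        return $this->clock->now() >= $expiresAt;
    }
}

// In a test, time is frozen at 1000, so the outcome is deterministic.
$checker = new SessionChecker(new FixedClock(1000));
var_dump($checker->isExpired(999));   // bool(true)
var_dump($checker->isExpired(1001));  // bool(false)
```

Code that calls `time()` directly inside `isExpired()` cannot be isolated this way without changing the function’s behavior at runtime, which is exactly the gap that tools like runkit fill for code not written “by the rules”.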

I’m not trying to say that our company is an exception; we also didn’t implement testing right from the start of our project. There were several lines of code that worked fine in production and brought in cash, so it would have been stupid to rewrite them just to run tests (as recommended in the literature). That would take too long and be too expensive.

Fortunately, we already had an excellent tool that allowed us to solve the majority of our problems with “untestable code”: runkit. While the script is running, this PHP extension lets you change, delete, and add methods, classes, and functions used in the program. It also has many other functions, but we didn’t use them. This tool has been developed and supported for many years: from 2005 to 2008 by Sara Golemon (who now works at Facebook and, interestingly enough, on HHVM), and from 2008 through the present by Dmitri Zenovich (who headed the testing division at Begun and Mail.ru). We’ve also done our bit to contribute to the project.

On its own, runkit is a very dangerous extension. It lets you change constants, functions, and classes while the script that uses them is running. In essence, it’s like a tool that lets you rebuild a plane during the flight. Runkit gets right to the “guts” of PHP on the fly, but one mistake or deficiency makes everything go up in flames: either PHP crashes, or you have to spend a lot of time searching for memory leaks or doing other low-level debugging. Nonetheless, this tool is essential for our testing: implementing project testing without having to do major rewrites can only be done by changing the code on the fly.

But runkit turned out to be a big problem during the switch to PHP7 because it didn’t support the new version. We could have sponsored the development of a new version, but, looking at the long-term perspective, this didn’t seem like the most reliable path to pursue. So we looked at a few other options.

One of the most promising solutions was to shift from runkit to uopz. The latter is also a PHP extension with similar functionality that launched in 2014. Our colleagues at Wamba suggested uopz, focusing on its impressive speed. The maintainer of the uopz project, by the way, is Joe Watkins (First Beat Media, UK). Unfortunately, however, switching all our tests to uopz didn’t work out. In some places there were fatal errors, in others segfaults. We filed a few reports, but there was no movement on them (e.g. https://github.com/krakjoe/uopz/issues/18). Trying to deal with this situation by rewriting tests would have been very expensive, and more issues could very well have emerged even if we did.

Given that we had to rewrite a lot of code no matter what, and were dependent on external projects like runkit or uopz regardless of how problematic they were, we came to the obvious conclusion that we should rewrite our code to be as independent as possible. We also pledged to do everything we could to avoid similar problems in the future, even if we ended up switching to HHVM or any similar product. This is how we arrived at our own framework.

The system got the name “SoftMocks”, with “soft” highlighting the fact that the system works in pure PHP without the use of extensions. The project is open source and is available in the form of an add-on library. SoftMocks is not tied to the particulars of the PHP engine implementation and works by rewriting code “on the fly”, analogously to the Go! AOP framework.

Tests in our code primarily use the following:

  1. Implementation override of one of the class methods
  2. Function execution result override
  3. Changing the value of global constants or class constants
  4. Adding a method to a class

Runkit handles all of these successfully. Rewriting code also makes all of them possible, with some reservations.

Though we don’t have space to go into much detail about SoftMocks in this article, we plan on devoting a separate article to this topic in the future. Here we’ll hit some of the main points:

  • User code is loaded through a rewrite wrapper function. From then on, all include statements are automatically replaced with calls to the wrapper.
  • Checks for existing overrides are added inside every user-defined method definition. If an override exists, the corresponding code is executed instead. Direct function calls are replaced by calls through the wrapper; this lets us catch both built-in and user-defined functions.
  • Accesses to constants in the code are replaced with calls to the wrapper, so their values can be overridden dynamically.
  • SoftMocks works with Nikita Popov’s PHP-Parser. This library isn’t very fast (parsing is about 15 times slower than token_get_all), but its interface lets you traverse the parse tree and includes a convenient API for handling syntactic constructions of arbitrary complexity.
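The rewriting idea in the points above can be sketched by hand. This is an illustration only and not the SoftMocks API: the `MockRegistry` class, its method names, and the `Greeter` example are all hypothetical, and the “rewritten” method body is written manually here rather than generated by a parser.

```php
<?php
// Hypothetical registry of overrides; this is NOT the SoftMocks API.
class MockRegistry
{
    private static $functions = array();
    private static $methods = array();
    private static $constants = array();

    public static function redefineFunction($name, $callable)
    {
        self::$functions[$name] = $callable;
    }

    public static function redefineMethod($class, $method, $callable)
    {
        self::$methods[$class . '::' . $method] = $callable;
    }

    public static function redefineConstant($name, $value)
    {
        self::$constants[$name] = $value;
    }

    // Wrapper for function calls: use the override if one is registered.
    public static function callFunction($name, array $args)
    {
        $target = isset(self::$functions[$name]) ? self::$functions[$name] : $name;
        return call_user_func_array($target, $args);
    }

    public static function hasMethod($key)
    {
        return isset(self::$methods[$key]);
    }

    public static function callMethod($key, array $args)
    {
        return call_user_func_array(self::$methods[$key], $args);
    }

    // Wrapper for constant access: fall back to the real constant.
    public static function getConstant($name)
    {
        return array_key_exists($name, self::$constants)
            ? self::$constants[$name]
            : constant($name);
    }
}

define('GREETING', 'Hello');

// Original method body: return GREETING . ', ' . strtoupper($name);
// Below is what a rewritten version might look like.
class Greeter
{
    public function greet($name)
    {
        // Check inserted at the top of every user-defined method.
        if (MockRegistry::hasMethod(__METHOD__)) {
            return MockRegistry::callMethod(__METHOD__, func_get_args());
        }
        // Constant access and function call both go through the wrapper.
        return MockRegistry::getConstant('GREETING') . ', '
            . MockRegistry::callFunction('strtoupper', array($name));
    }
}

$greeter = new Greeter();
echo $greeter->greet('badoo'), "\n"; // Hello, BADOO

MockRegistry::redefineConstant('GREETING', 'Hi');
MockRegistry::redefineFunction('strtoupper', function ($s) {
    return ucfirst($s);
});
echo $greeter->greet('badoo'), "\n"; // Hi, Badoo

MockRegistry::redefineMethod('Greeter', 'greet', function ($name) {
    return 'mocked';
});
echo $greeter->greet('badoo'), "\n"; // mocked
```

The sketch covers three of the four override kinds listed earlier (methods, function results, constants). Adding a method to a class is not shown, and a real implementation would generate the rewritten code from the parse tree instead of relying on hand edits.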

Now to get back to the main point of this article: the switch to PHP7. After switching the project over to SoftMocks, we still had about 1000 tests that we had to fix manually. You could say that this wasn’t a bad result, given that we started with 60,000 tests (so under 2% needed manual work). Compared with runkit, test run speeds didn’t decrease, so there are no performance issues with SoftMocks. To be fair, we should note that uopz is supposed to work significantly faster.

Length limit reached; original article: https://techblog.badoo.com/blog/2016/03/14/how-badoo-saved-one-million-dollars-switching-to-php7/

Originally published on the WeChat official account Netkiller (netkiller-ebook)

Original publication date: 2016-03-15
