NodeJS 各websocket框架性能分析

For a current project at WhoScored, I needed to learn JavaScript, Node.js and WebSocket channel, after seven years of writing web applications with Java and Spring framework. We wanted an application that can send data to thousands of concurrent users, and Node.js appeared to be the right way of doing it with its event-driven, non-blocking I/O model that can scale up easily.

Firstly, to learn JavaScript I started doing the tutorials in Code Academy and reading D. Crockford’s JavaScript: The Good Parts book. In parallel, I read J. R. Wilson’s Node.js the Right Way as well. It was a great read: by using some great modules like async, Q, ZMQ, Express, and methods like Promises and Generators, the book explains how to build scalable Node applications. It also covers sockets with Pub/Sub, Req/Res and Push/Pull patterns which shaped the WebSocket applications that I wrote.

System structure

After some trials, errors, and brainstorming sessions, the final structure looks like this:

A client connects for a specific key/resource, let’s say “matchesfeed/8/matchcentre”. WebSocket server’s responsibility is to return some JSON data for this key. If it doesn’t have data for this key in its memory, it requests it from “Fetcher” via a Push socket implemented with ZMQ library over a TCP connection. Fetcher receives this request, and adds it to a queue of jobs. Worker processes (implemented with Node.js clusters) check this queue constantly, get a job, and make an HTTP call to an API that has the actual JSON data for this key. When response is received by a worker, it passes the data to master process, which sends it to WebSocket server, and puts the job back into queue. When WebSocket server receives data, it publishes to clients who are connected for that key. All applications are open source, and you can find web socket server here, fetcher here and the data provider here. Node.js version used in all applications are 0.10.

einaros / ws (v 0.4)

Based on Heroku’s WebSocket demo, I began with einaros/ws implementation. It didn’t have as much as documentation that other frameworks had, but the development went smooth and I had something working in a short time.

Client beautifully connects via WebSockets

Then I wrote a load test to see the performance. The test basically connects clients every half a second, all requesting the same data. Data is 165KB, and is sent every second. With 120 concurrent clients, the application allocates around 210MB of RAM, and data sent appears to be around 35 MB/second. The test machine is a 3.4 Ghz Intel i7 (8 processors) with 16GB of RAM, running OSX v10.9.

I monitored the process with Strongloop’s agent, I find it pretty impressive.

At this point, the biggest drawback for me was it being a pure WebSocket implementation: if the client browser did not support WebSockets, it would not work. This made me try Socket.IO.

Socket.IO (v. 0.9)

Socket.IO seemed like the perfect solution, as it selects the most capable transport at runtime: it tries to connect to server via WebSockets first, if it doesn’t work, it downgrades by trying XHR-polling, if it doesn’t work, continues with flash sockets etc. Moreover, it would try to reconnect sockets if connection fails, and it supports rooms so that I could cluster connections based on what information they asked for, without any custom implementation. So I re-implemented the application by keeping everything the same but web socket framework.

Client connects to Socket.IO server

Indeed, when the server is down, Socket.IO tries to connect back with decreasing frequency.

And when the server is up, WebSocket connection is re-established:

Load test results are below, it is the same test I executed for ws/einaros implementation with the same data:

The data output looks like:

In WS, average load was 35 MBs

Performance limit

However, when I connect 140 clients, the memory usage starts to increment without the number of clients or the data size changing. Garbage Collector kicks in very frequently and gets more aggressive over time (sometimes a couple of times per second), eating up CPU time and desperately trying to free memory. Memory usage would reach ~1GB, then heartbeats would fail and clients would start to lose connection.

Heap size stops increasing when connections start dropping, and keeps stable as Socket.IO clients tries to connect back. Also notice the very low event loop latency.

It’s time to debug!

I used Mozilla’s memwatch module to detect memory leaks and take heap dumps to inspect the data. I found out that somehow 10MB Buffer objects would stack up constantly, you can find more details in my Stackoverflow Question. I couldn’t figure out why this would happen, but I think due to some cumulative CPU intensive operations, event loop response slows down and operations (sending messages) stack up in the queue. Being stuck, I decided to try out Sock.js.

I must admit my Sock.js adventure didn’t last long after discovering that Sock.js does not support connection query strings. This would mean that I’d need to amend my code so that after clients connect, they’d need to send a seperate message to pass the parameter(s), and this was a bit turn off. Next, I decided to give Engine.IO a try.

Engine.IO (v 0.7.12)

As far as I’ve seen, both Engine.IO and Socket.IO are mainly developed by Guillermo Rauch, and Engine.IO is a lower level library. Socket.IO v0.9 appears to be somewhat outdated and Engine.IO is the interim successor until Socket.IO v1.0 is released — which will use Engine.IO. One major difference is that Engine.IO always establishes a long-polling connection first, then tries to upgrade to better transports. This makes it resilient to firewalls and proxies, gives more predictable results and less frequent reconnecting. All other differences are very well listed on project’s readme. As you see, Engine.IO first started to connect via long-polling, then connected via WebSockets:

Notice the ‘transport’ parameter

However its client library does not try to reconnect when connection terminates.

Same load test with same values (160KB sent every second to 120 clients) looks like:

Performance limit

Let’s see what happens when we increase connection size. Engine.IO’s performance proved to be better for my application: I could load test with 230 concurrent connections, with (again) each one sent 165KB of data every second. Any number above 230 connections resulted in the application using massive amounts of data — up to 2GB (the same issue with Socket.IO implementation). Then connections would start dropping until 160 connections are remaining, and obviously this results in less resource usage. You can see this drop in the charts below. Remember in Engine.IO client implementation, clients don’t reconnect once they are disconnected.

Primus (v 1.5.0)

When I was about to end my research and development, I came across Primus. It is developed by Arnout Kazemier who is an active collabrator in some of the projects mentioned above. Primus actually lets you do what I have done all this time (changing WebSocket technologies for the business logic) dead easy: only with a single line of code. It has a similar syntax with all of the other frameworks, reconnects clients when connection terminates, and has lots of cool methods and structures — like rooms.

Primus connects

Here are the test results for einaros/ws library with Primus (160KB sent every second to 120 clients):

Compared to native implementation, it has a higher CPU usage and event loop latency, but a bit lower memory consumption. However, something unexpected happens after 10 minutes pass:

Event loop average drops to almost 0ms, but memory allocation starts to stack up, reaching GBs in minutes. This happened twice in two of my tries. I decided to leave this instead of digging deeper. (Please see Arnout’s comment about this issue on the right hand side)

As Primus is a wrapper for socket libraries, I decided to use a native library just to have things on a lower level.

Looking back: Performance limit of einaros/ws

At this point, I wanted to look back and see at what rate einaros/ws implementation (which I ditched as it doesn’t support other connection types) would crash.

By handling 410 concurrent connections, sending 165KB data to each every second, einaros/ws’s performance was far more superior than any other framework I tried. Compare it with Socket.IO’s max 140 connections and Engine.IO’s max 230 connections. einaros/ws would fail at 415 connections in the same way with other implementations: memory usage would stack up seemingly almost indefinitely (I waited until 8GB consumption before killing the process).

Final decision on the framework

In summary, these test results showed that Engine.IO looks like the better choice for my application. It supports different connection types apart from WebSockets, and has a significantly better memory usage compared to others.

The difference in memory seems like a good trade of for losing the functionality of clients not reconnecting in a connection failure. At least, until Socket.IO 1.0 is released.

Changing the flow based on limitations

The aim when starting this project was to have a system that can support thousands, even 10 thousand concurrent users. It seems, with 165KB of data sent every second, this won’t be possible. This made me decide to decrease the data size, and introduce a delta model, in which the application would send only the new JSON content instead of the whole object each time.

Test 1: The impact of data size

I ran the a load test that uses 5KB of data, sent to 4000 users every second. 5KB of JSON makes around 200 lines of data with reasonable string length. I’ve also modified the test so that 10 new clients connect every second. The data throughput is equivalent to 20MB sent per second.

We will take these values as a base and see what happens when we keep throughput same and change some variables.

Test 2: The impact of concurrent connection size

In this test, 2000 users connected to the server, and received 10KBs of data every second. Again in this case, 20MB is sent every second.

Compare these results with test one. As you see, doubling the data size and decreasing the connection size by half ended up in 20% lower memory consumption, 35% lower CPU usage and 75% less event loop latency.

Test 3: The impact of data broadcasting frequency

In this test, 4000 users connected to the server, and received 10KB of data every 2 seconds. Again, this equals to 20MB sent every second.

Compare these results with the first test. It seems doubling data size and decreasing frequency of data sent by half results in 25% lower CPU usage, but more or less the same memory consumption and event loop latency.

Conclusion

We do not really have a direct control over how many connections the application will have if we have one server, thus it seems we can only play around with the size of data sent, and how frequent we send this data. These tests show that for a constant throughput, (at least) for Engine.IO, the number of connections have the biggest impact on performance, then comes reducing size of data, and lastly reducing frequency of data sent. Sending larger data in less frequency appears to yield better performance. It’s a double-sided sword though: increasing size of data packets was what made my applications crash as well.

So I think the best way of balancing it is to estimate the max number of concurrent users in the long run (at least a couple of years forward), finding the maximum data size that can be used with that number, and then decrease the frequency of sending data if possible — but as said this is not that critical. After this point, as connection number has the biggest impact on the performance, by using a load balancer, adding more application instances should increase the performance more than any other method.


Discuss this on Hacker News

Acknowledgements: Benjamin Byholmfor reviewing and commenting on this article with his insights on caching, multiple servers, and V8 memory usage. Arnout Kazemier for his helpful comments, Nico Kaiser for his feedback, Guillermo Rauch and David Halls for taking the time to read the draft.

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏增长技术

App Guide相关

##TourGuide https://github.com/worker8/TourGuide

662
来自专栏c#开发者

Pass Multiple Values from a GridView to Another Page using ASP.NET

Pass Multiple Values from a GridView to Another Page  using ASP.NET A common req...

3345
来自专栏c#开发者

Simulate a Windows Service using ASP.NET to run scheduled jobs

Introduction How to run scheduled jobs from ASP.NET without requiring a Windows ...

3657
来自专栏张善友的专栏

Entity Framework Feature CTP 5系列文章

12月份发布了Entity Framework Feature CTP 5,这也是最后一个CTP版本了,明确了RTM的发布时间是2011年Q1,CTP5主要是加...

2086
来自专栏程序猿DD

Spring Boot 2.0 新特性(二):新增事件ApplicationStartedEvent

今天继续来聊Spring Boot 2.0的新特性。本文将具体说说2.0版本中的事件模型,尤其是新增的事件: ApplicationStartedEvent。 ...

3696
来自专栏张善友的专栏

单元测试同时支持 NUnit/MSTest

让单元测试代码同时支持NUnit/MSTest,可以参照MSDN magazine,也可以参看 Switching Between Using NUnit an...

1826
来自专栏施炯的IoT开发专栏

转贴-WP7开发资源大收集

文章作者: jason huang 文章标签: Microsoft, Windows Phone 7, WP7 转贴链接: WP7开发资源大收集 这里收集...

1768
来自专栏陈仁松博客

ASP.NET Core 'Microsoft.Win32.Registry' 错误修复

今天在发布Asp.net Core应用到Azure的时候出现错误InvalidOperationException: Cannot find compilati...

3878
来自专栏张善友的专栏

How to build Multi-Language Web Sites with ASP.NET 2.0 and VS.Net 2005

Introduction: In order to reach international markets through the Internet, sup...

2038
来自专栏张善友的专栏

.net和java互操作

.net网站theserverside.com上,有一篇讲.net和java互操作的文章,收集了net和java互操作性的文章精选 http://www.the...

1787

扫码关注云+社区