A Production-Grade Reliable-UDP Library for Golang

Original author: xtaci

Introduction

kcp-go is a Production-Grade Reliable-UDP library for golang.

This library intends to provide smooth, resilient, ordered, error-checked and anonymous delivery of streams over UDP packets. It has been battle-tested with the open-source project kcptun. Millions of devices (from low-end MIPS routers to high-end servers) have deployed kcp-go powered programs in a variety of forms, such as online games, live broadcasting, file synchronization and network acceleration.

Features

  1. Designed for latency-sensitive scenarios.
  2. Cache-friendly, memory-optimized design that offers an extremely high-performance core.
  3. Handles >5K concurrent connections on a single commodity server.
  4. Compatible with net.Conn and net.Listener; a drop-in replacement for net.TCPConn.
  5. FEC (Forward Error Correction) support with Reed-Solomon codes.
  6. Packet-level encryption support with AES, TEA, 3DES, Blowfish, Cast5, Salsa20, etc. in CFB mode, which generates completely anonymous packets.
  7. Only a fixed number of goroutines is created for the entire server application; the cost of context switches between goroutines has been taken into consideration.
  8. Compatible with skywind3000's C version, with various improvements.

Documentation

For complete documentation, see the associated Godoc.

Specification

+-----------------+
| SESSION         |
+-----------------+
| KCP(ARQ)        |
+-----------------+
| FEC(OPTIONAL)   |
+-----------------+
| CRYPTO(OPTIONAL)|
+-----------------+
| UDP(PACKET)     |
+-----------------+
| IP              |
+-----------------+
| LINK            |
+-----------------+
| PHY             |
+-----------------+
(LAYER MODEL OF KCP-GO)

Usage

Client: full demo

kcpconn, err := kcp.DialWithOptions("192.168.0.1:10000", nil, 10, 3)

Server: full demo

lis, err := kcp.ListenWithOptions(":10000", nil, 10, 3)

Benchmark

  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro14,1
  Processor Name:	Intel Core i5
  Processor Speed:	3.1 GHz
  Number of Processors:	1
  Total Number of Cores:	2
  L2 Cache (per Core):	256 KB
  L3 Cache:	4 MB
  Memory:	8 GB
$ go test -v -run=^$ -bench .
beginning tests, encryption:salsa20, fec:10/3
goos: darwin
goarch: amd64
pkg: github.com/xtaci/kcp-go
BenchmarkSM4-4                 	   50000	     32180 ns/op	  93.23 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES128-4              	  500000	      3285 ns/op	 913.21 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES192-4              	  300000	      3623 ns/op	 827.85 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES256-4              	  300000	      3874 ns/op	 774.20 MB/s	       0 B/op	       0 allocs/op
BenchmarkTEA-4                 	  100000	     15384 ns/op	 195.00 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR-4                 	20000000	        89.9 ns/op	33372.00 MB/s	       0 B/op	       0 allocs/op
BenchmarkBlowfish-4            	   50000	     26927 ns/op	 111.41 MB/s	       0 B/op	       0 allocs/op
BenchmarkNone-4                	30000000	        45.7 ns/op	65597.94 MB/s	       0 B/op	       0 allocs/op
BenchmarkCast5-4               	   50000	     34258 ns/op	  87.57 MB/s	       0 B/op	       0 allocs/op
Benchmark3DES-4                	   10000	    117149 ns/op	  25.61 MB/s	       0 B/op	       0 allocs/op
BenchmarkTwofish-4             	   50000	     33538 ns/op	  89.45 MB/s	       0 B/op	       0 allocs/op
BenchmarkXTEA-4                	   30000	     45666 ns/op	  65.69 MB/s	       0 B/op	       0 allocs/op
BenchmarkSalsa20-4             	  500000	      3308 ns/op	 906.76 MB/s	       0 B/op	       0 allocs/op
BenchmarkCRC32-4               	20000000	        65.2 ns/op	15712.43 MB/s
BenchmarkCsprngSystem-4        	 1000000	      1150 ns/op	  13.91 MB/s
BenchmarkCsprngMD5-4           	10000000	       145 ns/op	 110.26 MB/s
BenchmarkCsprngSHA1-4          	10000000	       158 ns/op	 126.54 MB/s
BenchmarkCsprngNonceMD5-4      	10000000	       153 ns/op	 104.22 MB/s
BenchmarkCsprngNonceAES128-4   	100000000	        19.1 ns/op	 837.81 MB/s
BenchmarkFECDecode-4           	 1000000	      1119 ns/op	1339.61 MB/s	    1606 B/op	       2 allocs/op
BenchmarkFECEncode-4           	 2000000	       832 ns/op	1801.83 MB/s	      17 B/op	       0 allocs/op
BenchmarkFlush-4               	 5000000	       272 ns/op	       0 B/op	       0 allocs/op
BenchmarkEchoSpeed4K-4         	    5000	    259617 ns/op	  15.78 MB/s	    5451 B/op	     149 allocs/op
BenchmarkEchoSpeed64K-4        	    1000	   1706084 ns/op	  38.41 MB/s	   56002 B/op	    1604 allocs/op
BenchmarkEchoSpeed512K-4       	     100	  14345505 ns/op	  36.55 MB/s	  482597 B/op	   13045 allocs/op
BenchmarkEchoSpeed1M-4         	      30	  34859104 ns/op	  30.08 MB/s	 1143773 B/op	   27186 allocs/op
BenchmarkSinkSpeed4K-4         	   50000	     31369 ns/op	 130.57 MB/s	    1566 B/op	      30 allocs/op
BenchmarkSinkSpeed64K-4        	    5000	    329065 ns/op	 199.16 MB/s	   21529 B/op	     453 allocs/op
BenchmarkSinkSpeed256K-4       	     500	   2373354 ns/op	 220.91 MB/s	  166332 B/op	    3554 allocs/op
BenchmarkSinkSpeed1M-4         	     300	   5117927 ns/op	 204.88 MB/s	  310378 B/op	    6988 allocs/op
PASS
ok  	github.com/xtaci/kcp-go	50.349s

Key Design Considerations

  1. slice vs. container/list

kcp.flush() loops through the send queue to check for retransmissions every 20ms (the interval).

I've written a benchmark comparing a sequential loop through a slice and through a container/list here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/cachemiss_test.go

BenchmarkLoopSlice-4   	2000000000	         0.39 ns/op
BenchmarkLoopList-4    	100000000	        54.6 ns/op

A list introduces heavy cache misses compared to a slice, which has better locality. With 5000 connections, a window size of 32 and a 20ms interval, each round of kcp.flush() costs about 62us (0.3% CPU) using a slice, versus 8.7ms (43.5% CPU) using a list.

  2. Timing accuracy vs. syscall clock_gettime

Timing is critical to the RTT estimator; inaccurate timing leads to false retransmissions in KCP. However, calling time.Now() costs 42 cycles (10.5ns on a 4GHz CPU, 15.6ns on my 2.7GHz MacBook Pro).

The benchmark for time.Now() is here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/syscall_test.go

BenchmarkNow-4         	100000000	        15.6 ns/op

In kcp-go, the current clock time is updated upon return from each kcp.output() call, and a single kcp.flush() operation queries the system time only once. Most of the time, 5000 connections cost 5000 * 15.6ns = 78us (a fixed cost while no packet needs to be sent). For a 10MB/s data transfer with a 1400-byte MTU, kcp.output() is called around 7500 times per second, which costs 117us per second in time.Now().
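
The arithmetic behind these figures can be reproduced as follows. This is a small sketch with hypothetical helper names (outputCalls, clockCostMicros), and it assumes 1MB = 1024 * 1024 bytes and one time.Now() query per kcp.output() call.

```go
package main

import "fmt"

// nowCostNs is the measured cost of one time.Now() call quoted above.
const nowCostNs = 15.6

// outputCalls returns how many MTU-sized packets are needed per second
// to carry bytesPerSec, rounding up.
func outputCalls(bytesPerSec, mtu int) int {
	return (bytesPerSec + mtu - 1) / mtu
}

// clockCostMicros returns the time spent in time.Now() for the given
// number of calls, in microseconds.
func clockCostMicros(calls int) float64 {
	return float64(calls) * nowCostNs / 1000
}

func main() {
	// Idle case: one clock query per connection per flush pass.
	fmt.Printf("idle pass over 5000 connections: %.1f us\n", clockCostMicros(5000))

	// Busy case: 10MB/s split into 1400-byte packets.
	calls := outputCalls(10*1024*1024, 1400)
	fmt.Printf("output calls per second: %d\n", calls)
	fmt.Printf("clock cost per second: %.1f us\n", clockCostMicros(calls))
}
```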

Connection Termination

Control messages like SYN/FIN/RST in TCP are not defined in KCP, so you need a keepalive/heartbeat mechanism at the application level. A real-world example is to run a multiplexing protocol over the session, such as smux (which has an embedded keepalive mechanism); see kcptun for an example.

FAQ

Q: I'm handling >5K connections on my server, and the CPU utilization is very high.

A: Running kcp-go on a standalone agent or gate server is suggested, not only to reduce CPU utilization but also because the precision of RTT measurements (timing) indirectly affects retransmission. Increasing the update interval with SetNoDelay, e.g. conn.SetNoDelay(1, 40, 1, 1), will dramatically reduce system load at the cost of lower performance.
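
As a rough illustration of why a longer interval lowers load, the sketch below counts flush passes per second. It assumes every connection is flushed exactly once per interval, and flushesPerSecond is a hypothetical helper, not part of kcp-go.

```go
package main

import "fmt"

// flushesPerSecond assumes each connection is flushed once per interval.
func flushesPerSecond(conns, intervalMs int) int {
	return conns * 1000 / intervalMs
}

func main() {
	for _, ms := range []int{20, 40} {
		fmt.Printf("interval %dms: %d flushes/s\n", ms, flushesPerSecond(5000, ms))
	}
}
```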

Who is using this?

  1. https://github.com/xtaci/kcptun -- A Secure Tunnel Based On KCP over UDP.
  2. https://github.com/getlantern/lantern -- Lantern delivers fast access to the open Internet.
  3. https://github.com/smallnest/rpcx -- An RPC service framework based on net/rpc, like Alibaba Dubbo and Weibo Motan.
  4. https://github.com/gonet2/agent -- A gateway for games with stream multiplexing.
  5. https://github.com/syncthing/syncthing -- Open Source Continuous File Synchronization.
  6. https://play.google.com/store/apps/details?id=com.k17game.k3 -- Battle Zone - Earth 2048, a world-wide strategy game.

Links

  1. https://github.com/xtaci/libkcp -- FEC enhanced KCP session library for iOS/Android in C++
  2. https://github.com/skywind3000/kcp -- A Fast and Reliable ARQ Protocol
  3. https://github.com/klauspost/reedsolomon -- Reed-Solomon Erasure Coding in Go

Originally published on the WeChat official account Golang语言社区 (Golangweb)

Original publication date: 2018-10-14
