Author | 陌无崖


Go + Service = One Goliath Project

Kevin Dangoor on December 20th

Khan Academy is embarking on a huge effort to rebuild our server software on a more modern stack in Go.

At Khan Academy, we don’t shy away from a challenge. After all, we’re a non-profit with a mission to provide a “free world-class education to anyone, anywhere”. Challenges don’t get much bigger than that.


Our mission requires us to create and maintain software to provide tools which help teachers and coaches who work with students, and a personalized learning experience both in and out of school. Millions of people rely on our servers each month to provide a wide variety of features we’ve built up over the past ten years.


Ten years is a long time in technology! We chose Python as our backend server language and it has been a productive choice for us. Of course, ten years ago we chose Python 2 because Python 3 was still very new and not well supported.

The Python 2 end-of-life

Now, in 2019, Python 3 versions are dominant and the Python Software Foundation has said that Python 2 reaches its official end-of-life on January 1, 2020, so that they can focus their limited time fully on the future. Undoubtedly, there are still millions of lines of Python 2 out there, but the truth is undeniable: Python 2 is on its way out.

Moving from Python 2 to 3 is not an easy task. Beyond that hurdle, which has been widely written about elsewhere, we also have a bunch of other APIs in libraries we use which have undergone huge changes.

All of these differences mean that we’d have to split our code to run in at least two services (the old Python 2 codebase and the Python 3 replacement) which can coexist during the transition.

For all of that work, we’d receive these benefits:

  • Likely a 10-15% boost in backend server code performance
  • Python 3’s language features


Other languages

Given all of the work required and the relatively small benefits, we wanted to consider other options. We started using Kotlin for specific jobs within Khan Academy a year ago. Its performance benefits have saved us money, which we can apply in other ways to help people around the world learn. If we moved from Python to a language that is an order of magnitude faster, we could both improve how responsive our site is and decrease our server costs dramatically.


Moving to Kotlin was an appealing alternative. While we were at it, we decided to dig deeper into other options. Looking at the languages that have first-class support in Google App Engine, another serious contender appeared: Go. Kotlin is a very expressive language with an impressive set of features. Go, on the other hand, offers simplicity and consistency. The Go team is focused on making a language which helps teams reliably ship software over the long-term.

As individuals writing code, we can iterate faster due to Go’s lightning quick compile times. Also, members of our team have years of experience and muscle memory built around many different editors. Go is better supported than Kotlin by a broad range of editors.


Finally, we ran a bunch of tests around performance and found that Go and Kotlin (on the JVM) perform similarly, with Kotlin being perhaps a few percent ahead. Go, however, used a lot less memory, which means that it can scale down to smaller instances.


We still like Python, but the dramatic performance difference which Go brings to us is too big to ignore, and we think we’ll be able to better support a system running on Go over the years. Moving to Go will undeniably be more effort than moving to Python 3, but the performance win alone makes it worth it.

From monolith to services

With a few exceptions, our servers have historically all run the same code and can respond to a request for any part of Khan Academy. We use separate services for storing data and managing caches, but the logic for any request can be easily traced through our code and is the same regardless of which server responds.


When a function calls another in a program, those calls are extremely reliable and very fast. This is a fundamental advantage of monoliths. Once you break up your logic into services, you’re putting slower, more fragile boundaries between parts of your code. You also have to consider how, exactly, that communication is going to happen. Do you put a publish/subscribe bus in between? Make direct HTTP or gRPC calls? Dispatch via some gateway?


Even recognizing this added complexity, we’re breaking up our monolith into services. There’s an element of necessity to it, because new Go code would have to run in a separate process at least from our existing Python.


The added complexity of services is balanced by a number of big benefits:

  • By having more services which can be deployed independently, deployment and test runs can move more quickly for a single service, which means engineers will be able to spend less of their time on deployment activities. It also means they’ll be able to get changes out more quickly when needed.
  • We can have more confidence that a problem with a deployment will have a limited impact on other parts of the site.
  • By having separate services, we can also choose the right kinds of instances and hosting configuration needed for each service, which helps to optimize both performance and cost.


We posted a series of blog posts (part 1, part 2, part 3) about how we had performed a significant refactoring of our Python code, drawing boundaries and creating constraints around which code could import which other code. Those boundaries provided a starting point for thinking about how we’d break our code into services. Craig Silverstein and Ben Kraft led an effort to figure out an initial set of services and how we would need to accommodate the boundaries between them.
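As a rough illustration of what a constraint on "which code could import which other code" can look like, here is a small Go sketch of an allow-list check. The package names and the rule table are hypothetical; the posts referenced above describe the actual approach used on the Python codebase, which this does not attempt to reproduce.

```go
package main

import "fmt"

// allowedImports maps each (hypothetical) package to the packages it is
// permitted to import. Anything not listed is treated as a violation.
var allowedImports = map[string][]string{
	"core":     {},
	"content":  {"core"},
	"coaching": {"core", "content"},
}

// checkImport returns an error when package `from` is not permitted to
// import package `to` under the allow-list above.
func checkImport(from, to string) error {
	for _, permitted := range allowedImports[from] {
		if permitted == to {
			return nil
		}
	}
	return fmt.Errorf("%s may not import %s", from, to)
}

func main() {
	// An allowed edge and a disallowed one.
	fmt.Println(checkImport("coaching", "content"))
	fmt.Println(checkImport("core", "coaching"))
}
```

Once every import edge passes a check like this, the groups of packages on each side of a boundary become natural candidates for separate services.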

In our current monolith, code is free to read and update any data models it needs to. To keep things sane, we made some rules around data access from services, but that’s a topic for another day.


Cleaning house

Ten years is a long time in technology. GraphQL didn’t exist in 2009, and two years ago we decided to migrate all of our HTTP GET APIs to GraphQL, later deciding to also adopt GraphQL mutations. We adopted React just after it was introduced, and it has spread to much of our web frontend. Google Cloud has grown in breadth of features. Server architectures have moved in the direction of independently deployable services.

We’re going to do a lot of housecleaning in Python. We’re very aware of the second-system effect and our goal with this work is not to “create the perfect system” but rather to make it easier to port to Go. We started some of these technical migrations earlier, and some of them will continue on past the point at which our system is running in Go, but the end result will be more modern and coherent.


  • We’ll only generate web pages via React server side rendering, eliminating the Jinja server-side templating we’ve been using
  • We’ll use GraphQL federation to dispatch requests to our services (and to our legacy Python code during the transition)
  • Where we need to offer REST endpoints, we’ll do so through a gateway that converts the request to GraphQL
  • We will rely more heavily on Fastly, our CDN provider, to enable more requests to be served quickly, closer to our users, and without requiring our server infrastructure to handle the request at all
  • We’re going to deprecate some largely unused, outdated features that are an ongoing maintenance burden and would slow down our path forward

There are other things we might want to fix, but we’re making choices that ultimately will help us complete the project more quickly and safely.


What’s not changing

Everything I’ve described to this point is a huge amount of change, but there is a lot that we’re not changing. As much as possible, we’re going to port our logic straight from Python to Go, just making sure the code looks like idiomatic Go when it’s done.


We’ve been using Google App Engine since day 1, and it has worked well for us and scaled automatically as we’ve grown. So, we’re going to keep using App Engine for our new Go services. We’re using Google Cloud Datastore as our database for the site, which is also staying the same. This also applies to the variety of other Google Cloud services we use, which have been performing well and scaling with our needs.

The plan

As of December 2019, we have our first few Go services running in production behind an Apollo GraphQL gateway. These services are pretty small today, because the way we’re doing the migration is very incremental. This incremental switchover is another good topic to talk about on another day (subscribe to our RSS feed or our Twitter account to read new posts as they go live).

For us, 2020 is going to be filled with technical challenge and opportunity: converting a large Python monolith to GraphQL-based services in Go. We’re excited about this project, which we’ve named Goliath (you can probably imagine all of the “Go-” names we considered!). It’s a once-in-a-decade opportunity to take a revolutionary step forward, and a big example of how we live our "We champion quality" engineering principle.

If you’re also excited about this opportunity, check out our careers page. As you can imagine, we’re hiring engineers!



This article was shared from the WeChat official account golang技术杂文 (gh_ebbdb61f463e). Author: Kevin Dangoor
