
Data Scientist vs. Data Engineer

As the field of data science continues to grow and mature, it is nice to see some distinction emerging among the roles once grouped under the single title of data scientist. One job title gaining popularity is the data engineer. In this post, I lay out some of the distinctions between the two roles.

Data Scientist

A data scientist is responsible for pulling insights from data. It is the data scientist's job to pull data, create models, create data products, and tell a story. A data scientist typically interacts with customers and/or executives, and should love digging through a dataset to understand it better and better.

The main goal of a data scientist is to produce data products and tell the stories of the data. A data scientist would typically have stronger statistics and presentation skills than a data engineer.

Data Engineer

Data engineering is more focused on the systems that store and retrieve data. A data engineer is responsible for building and deploying storage systems that can adequately handle the required workload. Sometimes the need is to ingest fast, real-time incoming data streams. Other times it is to hold massive amounts of large video files. Still other times it is to serve a very high number of reads. In other words, a data engineer needs to build systems that can handle the 3 Vs of big data: volume, velocity, and variety.

The main goal of a data engineer is to make sure the data is properly stored and available to the data scientist and others who need access. A data engineer would typically have stronger software engineering and programming skills than a data scientist.

Conclusion

It is too early to tell whether these two roles will ever have a clear division of responsibilities, but it is nice to see a little separation emerging from the mythical all-in-one data scientist. Both roles are important to a properly functioning data science team.

Do you see other distinctions between the roles?

Original link: http://101.datascience.community/2014/07/08/data-scientist-vs-data-engineer

This article was shared from the WeChat official account 数据科学与人工智能 (DS_AI_shujuren). Author: 数据人网


Originally published: 2016-12-17
