As the field of data science continues to grow and mature, it is nice to begin seeing some distinction among the roles within it. A job title gaining popularity alongside the data scientist is the data engineer. In this post, I lay out some of the distinctions between the two roles.
A data scientist is responsible for pulling insights from data. It is the data scientist's job to pull data, create models, create data products, and tell a story. A data scientist typically interacts with customers and/or executives, and should love digging through a dataset to understand it better and better.
The main goal of a data scientist is to produce data products and tell the stories of the data. A data scientist would typically have stronger statistics and presentation skills than a data engineer.
Data engineering is more focused on the systems that store and retrieve data. A data engineer is responsible for building and deploying storage systems that can handle the organization's needs. Sometimes the need is to ingest fast, real-time incoming data streams. Other times it is to store massive amounts of large video files. Still other times it is to serve a very high number of reads of the data. In other words, a data engineer needs to build systems that can handle the 3 Vs of big data (volume, velocity, and variety).
The main goal of a data engineer is to make sure the data is properly stored and available to the data scientist and others who need access. A data engineer would typically have stronger software engineering and programming skills than a data scientist.
It is too early to tell whether these two roles will ever have a clear division of responsibilities, but it is nice to see some of the load being lifted from the mythical all-in-one data scientist. Both roles are important to a properly functioning data science team.
Do you see other distinctions between the roles?
This article is shared from the WeChat public account 数据科学与人工智能 (DS_AI_shujuren); author: 数据人网.