## What is the difference between map and flatMap?

Content sourced from Stack Overflow, translated and used under the CC BY-SA 3.0 license.

What does it mean to "flatten" the results, and what is it good for?

### 2 Answers

Here is the difference, shown as a spark-shell session. First, some data: two lines of text.

``````
val rdd = sc.parallelize(Seq("Roses are red", "Violets are blue"))  // lines

rdd.collect

res0: Array[String] = Array("Roses are red", "Violets are blue")
``````

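`map` transforms an RDD of N elements into another RDD of N elements. Here it maps two lines to two line lengths:

``````
rdd.map(_.length).collect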

res1: Array[Int] = Array(13, 16)
``````

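`flatMap`, loosely speaking, turns each of the N elements into a collection and then flattens those collections into a single RDD. Several lines, each with several words, end up as one array of words:

``````
rdd.flatMap(_.split(" ")).collect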

res2: Array[String] = Array("Roses", "are", "red", "Violets", "are", "blue")
``````

Conceptually, flat-mapping a collection of lines to a collection of words looks like this:

``````
["aa bb cc", "", "dd"] => [["aa","bb","cc"],[],["dd"]] => ["aa","bb","cc","dd"]
``````
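
The two-step picture above can be reproduced without Spark. The following is a minimal plain-Python sketch, not Spark's implementation. `flat_map` is a hypothetical helper name, and `str.split()` is called with no argument so that the empty string yields an empty list (in Python, `"".split(" ")` would return `['']` instead).

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to every item, then concatenate the resulting iterables."""
    return list(chain.from_iterable(func(item) for item in items))


lines = ["aa bb cc", "", "dd"]

# Step 1: map alone keeps one output per input, so the result is nested.
nested = [line.split() for line in lines]

# Step 2: flattening removes one level of nesting; empty lists contribute nothing.
flat = flat_map(str.split, lines)

print(nested)
print(flat)
```

The nested list has the same length as the input, while the flattened list has one entry per word.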

Consider the following four lines of text, loaded into an RDD named `data`:

``````
hadoop is fast
hive is sql on hdfs
spark is superfast
spark is awesome
``````

## Using `map`

``````
>>> wc = data.map(lambda line: line.split(" "))
>>> wc.collect()
[[u'hadoop', u'is', u'fast'], [u'hive', u'is', u'sql', u'on', u'hdfs'], [u'spark', u'is', u'superfast'], [u'spark', u'is', u'awesome']]
``````

## Using `flatMap`

``````
>>> fm = data.flatMap(lambda line: line.split(" "))
>>> fm.collect()
[u'hadoop', u'is', u'fast', u'hive', u'is', u'sql', u'on', u'hdfs', u'spark', u'is', u'superfast', u'spark', u'is', u'awesome']
``````

To prepare a word count, pair each word in the `flatMap` RDD `fm` with the value 1:

``````
>>> fm.map(lambda word: (word, 1)).collect()
[(u'hadoop', 1), (u'is', 1), (u'fast', 1), (u'hive', 1), (u'is', 1), (u'sql', 1), (u'on', 1), (u'hdfs', 1), (u'spark', 1), (u'is', 1), (u'superfast', 1), (u'spark', 1), (u'is', 1), (u'awesome', 1)]
``````
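
Once each word is paired with a 1, the counts come from summing the values per key; in PySpark this step is commonly written with `reduceByKey`. The sketch below emulates the whole pipeline in plain Python so it runs without a cluster. The names `lines`, `words`, `pairs`, and `counts` are illustrative and not part of the original answer.

```python
from collections import defaultdict
from itertools import chain

lines = [
    "hadoop is fast",
    "hive is sql on hdfs",
    "spark is superfast",
    "spark is awesome",
]

# flatMap equivalent: split every line and flatten into one list of words.
words = list(chain.from_iterable(line.split(" ") for line in lines))

# map equivalent: pair every word with the count 1.
pairs = [(word, 1) for word in words]

# reduceByKey equivalent: sum the values that share a key.
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

print(dict(counts))
```

Because the words were flattened first, every key is a single word, which is what makes the per-key sum meaningful.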

By contrast, applying `flatMap` to `wc`, the RDD created with `map`, gives the following undesired output:

``````
>>> wc.flatMap(lambda word: (word, 1)).collect()
[[u'hadoop', u'is', u'fast'], 1, [u'hive', u'is', u'sql', u'on', u'hdfs'], 1, [u'spark', u'is', u'superfast'], 1, [u'spark', u'is', u'awesome'], 1]
``````
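
The output above follows from how `flatMap` treats the function's return value: it iterates over it and emits each member separately. A tuple such as `(word, 1)` is itself iterable, so its two members become two separate elements. Below is a plain-Python sketch of that behavior, with `flat_map` as a hypothetical stand-in for the RDD method and a shortened two-line `wc` assumed for brevity.

```python
from itertools import chain


def flat_map(func, items):
    """Emulate flatMap: iterate over each result of func and emit its members."""
    return list(chain.from_iterable(func(item) for item in items))


# wc as produced by map: one list of words per input line.
wc = [["hadoop", "is", "fast"], ["spark", "is", "awesome"]]

# Each (words, 1) tuple is unpacked, so word lists and 1s alternate in the output.
spread = flat_map(lambda words: (words, 1), wc)

# A map equivalent keeps each tuple intact as a single element.
kept = [(words, 1) for words in wc]

print(spread)
print(kept)
```

This is why pairing with 1 belongs in a `map` step applied after the words have been flattened.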

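`map`: returns a new RDD by applying the given function to each element of the RDD. The function passed to `map` returns exactly one item per input element.

`flatMap`: like `map`, it returns a new RDD by applying a function to each element of the RDD, but the output is flattened, so the function may return zero or more items per input element.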