在RDD中扁平化列表是可能的吗?例如convert:
val xxx: org.apache.spark.rdd.RDD[List[Foo]]
至:
val yyy: org.apache.spark.rdd.RDD[Foo]
该怎么做呢?
发布于 2015-01-30 18:27:54
val rdd = sc.parallelize(Array(List(1,2,3), List(4,5,6), List(7,8,9), List(10, 11, 12)))
// org.apache.spark.rdd.RDD[List[Int]] = ParallelCollectionRDD ...
val rddi = rdd.flatMap(list => list)
// rddi: org.apache.spark.rdd.RDD[Int] = FlatMappedRDD ...
// which is same as rdd.flatMap(identity)
// identity is a method defined in Predef object.
// def identity[A](x: A): A
rddi.collect()
// res2: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
发布于 2015-01-30 18:13:53
你只需要扁平化它,但是因为RDD上没有显式的' flatten‘方法,所以你可以这样做:
rdd.flatMap(identity)
https://stackoverflow.com/questions/28233405
复制相似问题