pyspark.RDD API reference: http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.html#pyspark.RDD (the figure is from edureka's introductory PySpark tutorial)
Below we use an RDD we create ourselves: sc.parallelize(range(1, 11), 4)
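As a pure-Python sketch of how those 10 elements end up spread over 4 partitions (this assumes Spark's even, index-based slicing; in a real session `rdd.glom().collect()` would show the actual layout):

```python
# Pure-Python sketch of splitting range(1, 11) into 4 partition-like chunks.
# The even index-based slicing below is an assumption about Spark's behaviour,
# not a call into Spark itself.
data = list(range(1, 11))
num_slices = 4

def slices(seq, n):
    """Split seq into n contiguous chunks of near-equal size."""
    length = len(seq)
    return [seq[(i * length) // n:((i + 1) * length) // n] for i in range(n)]

parts = slices(data, num_slices)
print(parts)                   # 4 contiguous chunks covering 1..10
print(sum(parts, []) == data)  # True: no element lost or duplicated
```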
import os
import pyspark
from pyspark import SparkContext

sc = SparkContext()

# union: merges two RDDs; duplicate elements are not removed
rdd = sc.parallelize([1, 1, 2, 3])
print(rdd.union(rdd).collect())
# [1, 1, 2, 3, 1, 1, 2, 3]
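Because union keeps duplicates, it behaves like plain list concatenation; a minimal pure-Python analogue (no Spark needed):

```python
# union on RDDs concatenates the two datasets without deduplication,
# so on plain lists it is equivalent to +.
a = [1, 1, 2, 3]
print(a + a)
# [1, 1, 2, 3, 1, 1, 2, 3]
```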
# 10. intersection: takes the intersection of two RDDs, and also deduplicates the result
rdd1 = sc.parallelize([1, 1, 2, 3])  # assumed input; the original rdd1 was lost in truncation
rdd2 = sc.parallelize([1, 6, 2, 3, 7, 8])
print(rdd1.intersection(rdd2).collect())
# [1, 2, 3]
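The deduplicating behaviour matches Python's set intersection; a pure-Python analogue (the first list is assumed, since the tutorial's rdd1 was truncated, and the result is sorted here only for display -- RDD results carry no order guarantee):

```python
# intersection deduplicates, matching set semantics on plain lists.
rdd1_data = [1, 1, 2, 3]          # assumed input, not from the tutorial
rdd2_data = [1, 6, 2, 3, 7, 8]
result = sorted(set(rdd1_data) & set(rdd2_data))
print(result)
# [1, 2, 3]
```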
# 11. cartesian: generates the Cartesian product of two RDDs
# ...
# [(0, 1000), (1, 1001), (2, 1002), (3, 1003), (4, 1004)]
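The cartesian example above was truncated; as a sketch, RDD.cartesian pairs every element of one dataset with every element of the other, which itertools.product reproduces on plain lists (the inputs here are made up for illustration):

```python
from itertools import product

# cartesian pairs every element of the first dataset with every element
# of the second; itertools.product yields the same pairs on plain lists.
left = [1, 2]       # hypothetical inputs, not from the tutorial
right = ['a', 'b']
print(list(product(left, right)))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```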
# 13. zipWithIndex: zips the RDD with an increasing sequence that starts at 0
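zipWithIndex pairs each element with its position, yielding (element, index) tuples; a pure-Python analogue is enumerate with the tuple members swapped (note that the element comes first, the index second):

```python
# zipWithIndex yields (element, index) pairs -- element first, index second --
# which is enumerate with the tuple members swapped.
data = ['a', 'b', 'c']
print([(x, i) for i, x in enumerate(data)])
# [('a', 0), ('b', 1), ('c', 2)]
```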