What is the syntax for reversing the sort order of an RDD's takeOrdered() method in Spark?
For bonus points: what is the syntax for a custom ordering of an RDD in Spark?
Posted on 2014-10-16 18:02:20
Reverse order
val seq = Seq(3,9,2,3,5,4)
val rdd = sc.parallelize(seq,2)
rdd.takeOrdered(2)(Ordering[Int].reverse)
The result will be Array(9, 5)
Custom ordering
We'll sort people by age.
case class Person(name:String, age:Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
val rdd = sc.parallelize(people,2)
rdd.takeOrdered(1)(Ordering[Int].reverse.on(x=>x.age))
The result will be Array(Person(ann,32))
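As an aside, `Ordering[Int].reverse.on(x => x.age)` can also be written with `Ordering.by`, which builds the same `Ordering[Person]`. Since `takeOrdered` just consumes an implicit `Ordering`, the construction can be checked on a plain Scala collection without a SparkContext (a quick sketch, not Spark-specific):

```scala
// Same Ordering, demonstrated on a plain Scala collection (no Spark needed)
case class Person(name: String, age: Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))

// Ordering.by(_.age) is equivalent to Ordering[Int].on((p: Person) => p.age)
val byAgeDesc = Ordering.by[Person, Int](_.age).reverse

// Oldest person first, mirroring rdd.takeOrdered(1)(byAgeDesc)
val oldest = people.sorted(byAgeDesc).head
```

Passing `byAgeDesc` to `rdd.takeOrdered(1)(...)` gives the same `Array(Person(ann,32))` as above.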
Posted on 2016-02-01 14:20:06
val rdd1 = sc.parallelize(List(("Hadoop PIG Hive"), ("Hive PIG PIG Hadoop"), ("Hadoop Hadoop Hadoop")))
val rdd2 = rdd1.flatMap(x => x.split(" ")).map(x => (x,1))
val rdd3 = rdd2.reduceByKey((x,y) => (x+y))
// Reverse (descending) order
rdd3.takeOrdered(3)(Ordering[Int].reverse.on(x=>x._2))
Output:
res0: Array[(String, Int)] = Array((Hadoop,5), (PIG,3), (Hive,2))
// Ascending order
rdd3.takeOrdered(3)(Ordering[Int].on(x=>x._2))
Output:
res1: Array[(String, Int)] = Array((Hive,2), (PIG,3), (Hadoop,5))
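Note that Spark's RDD API also offers `rdd.top(n)`, which is implemented as `takeOrdered(n)` with the ordering reversed, so the descending call above could equivalently be `rdd3.top(3)(Ordering[Int].on(x => x._2))`. The `on(...)` ordering itself is plain Scala and can be verified outside Spark (a small sketch on the same counts):

```scala
// The on(...) Ordering used above, checked on a plain Scala collection
val counts = List(("Hadoop", 5), ("PIG", 3), ("Hive", 2))

// Ascending by count, mirroring takeOrdered(3)(Ordering[Int].on(x => x._2))
val asc = counts.sorted(Ordering[Int].on[(String, Int)](_._2))

// Descending by count, mirroring takeOrdered(3)(Ordering[Int].reverse.on(x => x._2))
val desc = counts.sorted(Ordering[Int].reverse.on[(String, Int)](_._2))
```

`asc` is `List((Hive,2), (PIG,3), (Hadoop,5))` and `desc` is the reverse, matching the two RDD outputs above.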
Posted on 2021-02-28 17:21:14
For a word-count style problem on (K, V) pairs: if you want the 10 highest-count entries from your ordered list, negate the count in the key function (PySpark, assuming an existing SparkContext sc):
# takeOrdered returns a Python list; re-parallelize it to get an RDD back
sc.parallelize(wordCounts.takeOrdered(10, key=lambda pair: -pair[1]))
https://stackoverflow.com/questions/26387753