考虑到数百万个对象的列表,如:
case class Point(val name:String, val x:Double, val y:Double)
对于给定的点目标,我需要选择离目标最近的另外10个点。
val target = Point("myPoint", 34, 42)
val points = List(...) // list of several million points
def distance(p1: Point, p2: Point) = ??? // return the distance between two points
val closest10 = points.sortWith((a, b) => {
distance(a, target) < distance(b, target)
}).take(10)
这种方法很有效,但速度很慢。实际上,对于每个目标请求,整个列表都进行了详尽的排序,而在前10个最近的点之后,我真的不关心任何类型的排序。我甚至不需要以正确的顺序返回前10个最近的。
理想情况下,我会寻找一种“先返回10,不注意其余的”的方法。
我能想到的天真的解决方案听起来像这样:按1000桶排序,取第一个桶,按100桶排序,取第一个桶,按10个桶排序,用第一个桶排序,完成第一个桶。
问题是,我想这在CS中肯定是一个很常见的问题,所以在推出基于这种天真方法的解决方案之前,我想知道任何最先进的方法,或者即使已经存在一些标准的方法。
TL;博士如何获得未排序列表的前10项,而不必对整个列表进行排序?
发布于 2017-11-29 13:18:33
下面是所以回答中的一种barebone方法,用于从列表中选择n
最小整数(可以增强以处理更复杂的数据结构):
def nSmallest(n: Int, list: List[Int]): List[Int] = {
def update(l: List[Int], e: Int): List[Int] =
if (e < l.head) (e :: l.tail).sortWith(_ > _) else l
list.drop(n).foldLeft( list.take(n).sortWith(_ > _) )( update(_, _) )
}
nSmallest( 5, List(3, 2, 8, 2, 9, 1, 5, 5, 9, 1, 7, 3, 4) )
// res1: List[Int] = List(3, 2, 2, 1, 1)
请注意输出的顺序是相反的。
发布于 2017-11-30 10:07:52
我看了一下这个,想知道PriorityQueue
是否有用。
import scala.collection.mutable.PriorityQueue
case class Point(val name:String, val x:Double, val y:Double)
val target = Point("myPoint", 34, 42)
val points = List(...) //list of points
def distance(p1: Point, p2: Point) = ??? //distance between two points
//load points-priority-queue with first 10 points
val ppq = PriorityQueue(points.take(10):_*){
case (a,b) => distance(a,target) compare distance(b,target) //prioritize points
}
//step through everything after the first 10
points.drop(10).foldLeft(distance(ppq.head,target))((mxDst,nextPnt) =>
if (mxDst > distance(nextPnt,target)) {
ppq.dequeue() //drop current far point
ppq.enqueue(nextPnt) //load replacement point
distance(ppq.head,target) //return new max distance
} else mxDst)
val result: List[Double] = ppq.dequeueAll //10 closest points
发布于 2017-12-02 03:43:53
如何使用QuickSelect来完成它。我用的是就地QuickSelect。基本上,对于每个目标点,我们计算出所有点与目标之间的距离,并利用QuickSelect得到k个最小距离(k阶统计量)。这是否比使用排序更快,取决于点数、近邻人数和目标数量等因素。在我的3kk随机生成点、10目标点和请求10个最近点的机器中,它比使用排序algo快2倍:
Number of points: 3000000
Number of targets: 10
Number of nearest: 10
QuickSelect: 10737 ms.
Sort: 20763 ms.
Results from QuickSelect are valid
代码:
import scala.annotation.tailrec
import scala.concurrent.duration.Deadline
import scala.util.Random
case class Point(val name: String, val x: Double, val y: Double)
class NearestPoints(val points: Seq[Point]) {
private case class PointWithDistance(p: Point, d: Double) extends Ordered[PointWithDistance] {
def compare(that: PointWithDistance): Int = d.compareTo(that.d)
}
def distance(p1: Point, p2: Point): Double = {
Math.sqrt(Math.pow(p2.x - p1.x, 2) + Math.pow(p2.y - p1.y, 2))
}
def get(target: Point, n: Int): Seq[Point] = {
val pd = points.map(p => PointWithDistance(p, distance(p, target))).toArray
(1 to n).map(i => quickselect(i, pd).get.p)
}
// In-place QuickSelect from https://gist.github.com/mooreniemi/9e45d55c0410cad0a9eb6d62a5b9b7ae
def quickselect[T <% Ordered[T]](k: Int, xs: Array[T]): Option[T] = {
def randint(lo: Int, hi: Int): Int =
lo + scala.util.Random.nextInt((hi - lo) + 1)
@inline
def swap[T](xs: Array[T], i: Int, j: Int): Unit = {
val t = xs(i)
xs(i) = xs(j)
xs(j) = t
}
def partition[T <% Ordered[T]](xs: Array[T], l: Int, r: Int): Int = {
var pivotIndex = randint(l, r)
val pivotValue = xs(pivotIndex)
swap(xs, r, pivotIndex)
pivotIndex = l
var i = l
while (i <= r - 1) {
if (xs(i) < pivotValue) {
swap(xs, i, pivotIndex)
pivotIndex = pivotIndex + 1
}
i = i + 1
}
swap(xs, r, pivotIndex)
pivotIndex
}
@tailrec
def quickselect0[T <% Ordered[T]](xs: Array[T], l: Int, r: Int, k: Int): T = {
if (l == r) {
xs(l)
} else {
val pivotIndex = partition(xs, l, r)
k compare pivotIndex match {
case 0 => xs(k)
case -1 => quickselect0(xs, l, pivotIndex - 1, k)
case 1 => quickselect0(xs, pivotIndex + 1, r, k)
}
}
}
xs match {
case _ if xs.isEmpty => None
case _ if k < 1 || k > xs.length => None
case _ => Some(quickselect0(xs, 0, xs.size - 1, k - 1))
}
}
}
object QuickSelectVsSort {
def main(args: Array[String]): Unit = {
val rnd = new Random(42L)
val MAX_N: Int = 3000000
val NUM_OF_NEARESTS: Int = 10
val NUM_OF_TARGETS: Int = 10
println(s"Number of points: $MAX_N")
println(s"Number of targets: $NUM_OF_TARGETS")
println(s"Number of nearest: $NUM_OF_NEARESTS")
// Generate random points
val points = (1 to MAX_N)
.map(x => Point(x.toString, rnd.nextDouble, rnd.nextDouble))
// Generate target points
val targets = (1 to NUM_OF_TARGETS).map(x => Point(s"Target$x", rnd.nextDouble, rnd.nextDouble))
var start = Deadline.now
val np = new NearestPoints(points)
val viaQuickSelect = targets.map { case target =>
val nearest = np.get(target, NUM_OF_NEARESTS)
nearest
}
var end = Deadline.now
println(s"QuickSelect: ${(end - start).toMillis} ms.")
start = Deadline.now
val viaSort = targets.map { case target =>
val closest = points.sortWith((a, b) => {
np.distance(a, target) < np.distance(b, target)
}).take(NUM_OF_NEARESTS)
closest
}
end = Deadline.now
println(s"Sort: ${(end - start).toMillis} ms.")
// Validate
assert(viaQuickSelect.length == viaSort.length)
viaSort.zipWithIndex.foreach { case (p, idx) =>
assert(p == viaQuickSelect(idx))
}
println("Results from QuickSelect are valid")
}
}
https://stackoverflow.com/questions/47552191
复制相似问题