Q: Lucene sort by score, then by modified date

Stack Overflow user

Asked 2015-12-02 05:15:03

2 answers · 4.9K views · 0 followers · 5 votes

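My documents have three fields:

title
content
modifiedDate

When I search for a term, the results come back sorted by score.

Now I would like to further sort results that have the same score by modifiedDate, i.e. show the most recent documents first among those with equal scores.

I tried sorting by score and then by modifiedDate, but it does not work. Can someone point me in the right direction?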
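To make the requested ordering concrete, here is a minimal plain-Java sketch of it, written without any Lucene dependency. The `Hit` and `ScoreThenDateOrder` names are hypothetical stand-ins introduced for illustration only; the doc-id tie-break at the end is an assumption that keeps the order deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a search hit; not a Lucene class.
final class Hit {
    final int doc;
    final float score;
    final long timestamp;

    Hit(int doc, float score, long timestamp) {
        this.doc = doc;
        this.score = score;
        this.timestamp = timestamp;
    }
}

final class ScoreThenDateOrder {
    // Best hit first: higher score, then newer timestamp, then lower doc id.
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        if (byScore != 0) {
            return byScore;
        }
        int byTime = Long.compare(b.timestamp, a.timestamp);
        if (byTime != 0) {
            return byTime;
        }
        return Integer.compare(a.doc, b.doc);
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BEST_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 1.0f, 1000L));
        hits.add(new Hit(1, 1.0f, 3000L));
        hits.add(new Hit(2, 2.0f, 500L));
        hits.add(new Hit(3, 2.0f, 2000L));
        for (Hit h : sorted(hits)) {
            System.out.println(h.doc + " " + h.score + " " + h.timestamp);
        }
    }
}
```

The timestamp only matters when two scores are exactly equal, so hits whose scores differ even slightly are still ordered purely by score.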

lucene

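Stack Overflow user

Answered 2015-12-02 09:09:44

You can solve this with a custom collector that orders hits by score and then by timestamp. The collector reads each document's timestamp from doc values so that it is available for the secondary sort. See the class below.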

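import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TopDocsCollector;

public class CustomCollector extends TopDocsCollector<ScoreDocWithTime> {

    ScoreDocWithTime pqTop;

    public CustomCollector(int numHits) {
        super(new HitQueueWithTime(numHits, true));
        // HitQueueWithTime pre-populates itself with sentinel objects, so
        // top() is already initialized at this point.
        pqTop = pq.top();
    }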

    @Override
    public LeafCollector getLeafCollector(LeafReaderContext context)
            throws IOException {
        final int docBase = context.docBase;
        final NumericDocValues modifiedDate =
                DocValues.getNumeric(context.reader(), "modifiedDate");

        return new LeafCollector() {
            Scorer scorer;


            @Override
            public void setScorer(Scorer scorer) throws IOException {
                this.scorer = scorer;
            }

            @Override
            public void collect(int doc) throws IOException {
                float score = scorer.score();

                // This collector cannot handle these scores:
                assert score != Float.NEGATIVE_INFINITY;
                assert !Float.isNaN(score);

                totalHits++;
                long timestamp = modifiedDate.get(doc);
                // Skip hits that cannot beat the weakest queued entry: a lower
                // score, or an equal score with a timestamp that is not newer.
                // Docs arrive in increasing doc id order and the queue favors
                // lower doc ids, so a full tie on score and timestamp is rejected too.
                if (score < pqTop.score
                        || (score == pqTop.score && timestamp <= pqTop.timestamp)) {
                    return;
                }
                pqTop.doc = doc + docBase;
                pqTop.score = score;
                pqTop.timestamp = timestamp;
                pqTop = pq.updateTop();
            }

        };
    }

    @Override
    public boolean needsScores() {
        return true;
    }
}

In addition, the secondary sort needs an extra field on ScoreDoc:

import org.apache.lucene.search.ScoreDoc;

public class ScoreDocWithTime extends ScoreDoc {
    public long timestamp;

    public ScoreDocWithTime(long timestamp, int doc, float score) {
        super(doc, score);
        this.timestamp = timestamp;
    }

    public ScoreDocWithTime(long timestamp, int doc, float score, int shardIndex) {
        super(doc, score, shardIndex);
        this.timestamp = timestamp;
    }
}

Then create a custom priority queue that supports it.

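// Lucene's PriorityQueue, not java.util.PriorityQueue.
import org.apache.lucene.util.PriorityQueue;

public class HitQueueWithTime extends PriorityQueue<ScoreDocWithTime> {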

    public HitQueueWithTime(int numHits, boolean b) {
        super(numHits, b);
    }

    @Override
    protected ScoreDocWithTime getSentinelObject() {
        return new ScoreDocWithTime(0, Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
    }

    @Override
    protected boolean lessThan(ScoreDocWithTime hitA, ScoreDocWithTime hitB) {
        if (hitA.score == hitB.score)
            return (hitA.timestamp == hitB.timestamp) ?
                    hitA.doc > hitB.doc :
                    hitA.timestamp < hitB.timestamp;
        else
            return hitA.score < hitB.score;

    }
}
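The bounded-queue idea behind this class can be sketched in plain Java, assuming `java.util.PriorityQueue` as a stand-in for Lucene's queue. `Candidate` and `TopNByScoreThenDate` are hypothetical names, not part of the answer or of Lucene. The sketch keeps the weakest retained hit at the head of the heap and shows that a new hit tying the weakest score can still displace it when its timestamp is newer, so the cutoff check has to look at the timestamp as well as the score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for ScoreDocWithTime; not a Lucene class.
final class Candidate {
    final int doc;
    final float score;
    final long timestamp;

    Candidate(int doc, float score, long timestamp) {
        this.doc = doc;
        this.score = score;
        this.timestamp = timestamp;
    }
}

final class TopNByScoreThenDate {
    // Weakest first, mirroring lessThan: lower score, then older
    // timestamp, then higher doc id.
    static final Comparator<Candidate> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        if (byScore != 0) {
            return byScore;
        }
        int byTime = Long.compare(a.timestamp, b.timestamp);
        if (byTime != 0) {
            return byTime;
        }
        return Integer.compare(b.doc, a.doc);
    };

    // Keeps at most n hits; the heap head is always the weakest one kept.
    static List<Candidate> topN(List<Candidate> hits, int n) {
        List<Candidate> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        PriorityQueue<Candidate> heap = new PriorityQueue<>(WEAKEST_FIRST);
        for (Candidate c : hits) {
            if (heap.size() < n) {
                heap.add(c);
            } else if (WEAKEST_FIRST.compare(c, heap.peek()) > 0) {
                // c beats the weakest kept hit, possibly only on timestamp.
                heap.poll();
                heap.add(c);
            }
        }
        result.addAll(heap);
        result.sort(WEAKEST_FIRST.reversed()); // best hit first
        return result;
    }
}
```

For example, with three hits of equal score and timestamps 1000, 3000 and 2000, `topN(hits, 2)` keeps the hits stamped 3000 and 2000; a cutoff that compared scores alone would have discarded the third hit and kept the oldest one.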

With that in place, you can run searches as needed. See the example below.

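import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class SearchTest {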

    public static void main(String[] args) throws IOException {
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new StandardAnalyzer());
        Directory directory = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

        addDoc(indexWriter, "w1", 1000);
        addDoc(indexWriter, "w1", 3000);
        addDoc(indexWriter, "w1", 500);
        addDoc(indexWriter, "w1 w2", 1000);
        addDoc(indexWriter, "w1 w2", 3000);
        addDoc(indexWriter, "w1 w2", 2000);
        addDoc(indexWriter, "w1 w2", 5000);

        final IndexReader indexReader = DirectoryReader.open(indexWriter, false);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("desc", "w1")), BooleanClause.Occur.SHOULD);
        query.add(new TermQuery(new Term("desc", "w2")), BooleanClause.Occur.SHOULD);

        CustomCollector results = new CustomCollector(100);
        indexSearcher.search(query, results);
        TopDocs search = results.topDocs();
        for (ScoreDoc sd : search.scoreDocs) {
            Document document = indexReader.document(sd.doc);
            System.out.println(document.getField("desc").stringValue() + " " + ((ScoreDocWithTime) sd).timestamp);
        }

    }

    private static void addDoc(IndexWriter indexWriter, String desc, long modifiedDate) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("desc", desc, Field.Store.YES));
        doc.add(new LongField("modifiedDate", modifiedDate, Field.Store.YES));
        doc.add(new NumericDocValuesField("modifiedDate", modifiedDate));
        indexWriter.addDocument(doc);
    }
}

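The program prints each hit's desc text followed by its timestamp, with hits of equal score ordered from newest to oldest.

P.S. This solution targets Lucene 5.1.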

Votes: 4

The original content of this page is provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/34035405
