
Lucene: sort by score, then by modified date

Stack Overflow user
Asked 2015-12-02 05:15:03
2 answers, 4.9K views, 0 followers, 5 votes

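My documents have three fields:

  1. title
  2. content
  3. modifiedDate

When I search for a term, the results come back sorted by score.

Now I want to further sort results that have the same score by modifiedDate, i.e. show the most recent documents first among those with equal scores.

I tried sorting by score and then by modifiedDate, but it did not work. Can anyone point me in the right direction?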

Stack Overflow user

Answered 2015-12-02 09:09:44

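You can solve this with a custom collector that orders results by score first and by timestamp second. The collector has to read each document's timestamp value while collecting so that it can apply the secondary sort. See the class below.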

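import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TopDocsCollector;

public class CustomCollector extends TopDocsCollector<ScoreDocWithTime> {

    // Current weakest entry in the queue; a new hit must beat it to be collected.
    ScoreDocWithTime pqTop;

    public CustomCollector(int numHits) {
        super(new HitQueueWithTime(numHits, true));
        // HitQueueWithTime pre-populates the queue with sentinel objects via
        // getSentinelObject(), so top() is already initialized at this point.
        pqTop = pq.top();
    }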

    @Override
    public LeafCollector getLeafCollector(LeafReaderContext context)
            throws IOException {
        final int docBase = context.docBase;
        final NumericDocValues modifiedDate =
                DocValues.getNumeric(context.reader(), "modifiedDate");

        return new LeafCollector() {
            Scorer scorer;


            @Override
            public void setScorer(Scorer scorer) throws IOException {
                this.scorer = scorer;
            }

            @Override
            public void collect(int doc) throws IOException {
                float score = scorer.score();

                // This collector cannot handle these scores:
                assert score != Float.NEGATIVE_INFINITY;
                assert !Float.isNaN(score);

                totalHits++;
                long timestamp = modifiedDate.get(doc);
                // Reject hits that cannot beat the current weakest entry. On a score
                // tie the newer timestamp wins; if the timestamp ties as well, the
                // queue favors lower doc ids, and docs arrive in increasing doc id
                // order, so the new hit loses.
                if (score < pqTop.score
                        || (score == pqTop.score && timestamp <= pqTop.timestamp)) {
                    return;
                }
                pqTop.doc = doc + docBase;
                pqTop.score = score;
                pqTop.timestamp = timestamp;
                pqTop = pq.updateTop();
            }

        };
    }

    @Override
    public boolean needsScores() {
        return true;
    }
}
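
The collector above keeps only the best numHits hits in a bounded queue whose top is the weakest entry. The following is a minimal plain-Java sketch of that idea, not Lucene code: `TopNByScoreThenTime`, `Hit`, and `topN` are hypothetical names, and it assumes hits arrive in increasing doc id order. It illustrates that once the queue is full, a hit that ties the weakest entry on score still has to be compared by timestamp for the secondary sort to hold.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNByScoreThenTime {

    // Hypothetical stand-in for a scored document with a timestamp.
    static final class Hit {
        final int doc;
        final float score;
        final long timestamp;

        Hit(int doc, float score, long timestamp) {
            this.doc = doc;
            this.score = score;
            this.timestamp = timestamp;
        }
    }

    // Weakest first: lower score, then older timestamp, then higher doc id.
    static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        if (a.score != b.score) {
            return Float.compare(a.score, b.score);
        }
        if (a.timestamp != b.timestamp) {
            return Long.compare(a.timestamp, b.timestamp);
        }
        return Integer.compare(b.doc, a.doc);
    };

    // Returns the n best hits, best first. Requires n >= 1.
    static List<Hit> topN(List<Hit> hitsInDocOrder, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, WEAKEST_FIRST);
        for (Hit hit : hitsInDocOrder) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (WEAKEST_FIRST.compare(hit, heap.peek()) > 0) {
                // The new hit beats the weakest entry, so it takes its place.
                heap.poll();
                heap.add(hit);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(WEAKEST_FIRST.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 1.0f, 1000L));
        hits.add(new Hit(1, 1.0f, 3000L));
        hits.add(new Hit(2, 1.0f, 2000L));
        for (Hit h : topN(hits, 2)) {
            System.out.println(h.doc + " " + h.score + " " + h.timestamp);
        }
    }
}
```

In this sketch all three hits share a score, so only the timestamps decide which two survive in the top 2.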

To support the secondary sort, you also need to add an extra field to ScoreDoc:

import org.apache.lucene.search.ScoreDoc;

public class ScoreDocWithTime extends ScoreDoc {
    public long timestamp;

    public ScoreDocWithTime(long timestamp, int doc, float score) {
        super(doc, score);
        this.timestamp = timestamp;
    }

    public ScoreDocWithTime(long timestamp, int doc, float score, int shardIndex) {
        super(doc, score, shardIndex);
        this.timestamp = timestamp;
    }
}

Then create a custom priority queue that takes the timestamp into account:

import org.apache.lucene.util.PriorityQueue;

public class HitQueueWithTime extends PriorityQueue<ScoreDocWithTime> {

    public HitQueueWithTime(int numHits, boolean prePopulate) {
        super(numHits, prePopulate);
    }

    @Override
    protected ScoreDocWithTime getSentinelObject() {
        return new ScoreDocWithTime(0, Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
    }

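    @Override
    protected boolean lessThan(ScoreDocWithTime hitA, ScoreDocWithTime hitB) {
        if (hitA.score != hitB.score) {
            // Primary key: the lower score is the weaker hit.
            return hitA.score < hitB.score;
        }
        if (hitA.timestamp != hitB.timestamp) {
            // Secondary key: on equal scores, the older document is weaker.
            return hitA.timestamp < hitB.timestamp;
        }
        // Final tie-break: the higher doc id is weaker.
        return hitA.doc > hitB.doc;
    }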
}
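
The ordering that lessThan encodes can also be written as an ordinary comparator. The sketch below is plain Java rather than Lucene code: `HitOrdering`, `Hit`, and `sorted` are hypothetical names, and the scores in main are made-up values chosen only so that the two groups of documents differ. It sorts best first: higher score, then newer timestamp, then lower doc id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HitOrdering {

    // Hypothetical stand-in for ScoreDocWithTime; not a Lucene class.
    static final class Hit {
        final int doc;
        final float score;
        final long timestamp;

        Hit(int doc, float score, long timestamp) {
            this.doc = doc;
            this.score = score;
            this.timestamp = timestamp;
        }
    }

    // Best hit first: higher score, then newer timestamp, then lower doc id.
    static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingLong((Hit h) -> h.timestamp).reversed())
                    .thenComparingInt(h -> h.doc);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BEST_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        // Assumed scores: 1.0 for one group of documents, 2.0 for the other.
        hits.add(new Hit(0, 1.0f, 1000L));
        hits.add(new Hit(1, 1.0f, 3000L));
        hits.add(new Hit(2, 2.0f, 2000L));
        hits.add(new Hit(3, 2.0f, 5000L));
        for (Hit h : sorted(hits)) {
            System.out.println(h.doc + " " + h.score + " " + h.timestamp);
        }
    }
}
```

Note that the tie on score is an exact float comparison, as in lessThan: two documents are ordered by timestamp only when their scores are identical.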

With these classes in place, you can run the search and get the results in the desired order. See the example below.

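import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class SearchTest {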

    public static void main(String[] args) throws IOException {
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new StandardAnalyzer());
        Directory directory = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

        addDoc(indexWriter, "w1", 1000);
        addDoc(indexWriter, "w1", 3000);
        addDoc(indexWriter, "w1", 500);
        addDoc(indexWriter, "w1 w2", 1000);
        addDoc(indexWriter, "w1 w2", 3000);
        addDoc(indexWriter, "w1 w2", 2000);
        addDoc(indexWriter, "w1 w2", 5000);

        final IndexReader indexReader = DirectoryReader.open(indexWriter, false);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("desc", "w1")), BooleanClause.Occur.SHOULD);
        query.add(new TermQuery(new Term("desc", "w2")), BooleanClause.Occur.SHOULD);

        CustomCollector results = new CustomCollector(100);
        indexSearcher.search(query, results);
        TopDocs search = results.topDocs();
        for (ScoreDoc sd : search.scoreDocs) {
            Document document = indexReader.document(sd.doc);
            System.out.println(document.getField("desc").stringValue() + " " + ((ScoreDocWithTime) sd).timestamp);
        }

        // Release index resources.
        indexReader.close();
        indexWriter.close();
        directory.close();
    }

    private static void addDoc(IndexWriter indexWriter, String desc, long modifiedDate) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("desc", desc, Field.Store.YES));
        // Indexed and stored copy of the date.
        doc.add(new LongField("modifiedDate", modifiedDate, Field.Store.YES));
        // Doc-values copy, which CustomCollector reads through DocValues.getNumeric.
        doc.add(new NumericDocValuesField("modifiedDate", modifiedDate));
        indexWriter.addDocument(doc);
    }
}

The program prints the following:

w1 w2 5000
w1 w2 3000
w1 w2 2000
w1 w2 1000
w1 3000
w1 1000
w1 500

P.S. This solution is for Lucene 5.1.

Votes: 4
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/34035405
