A Lucene Getting-Started Example

卡尔曼和玻尔兹曼谁曼
Published 2019-01-22 17:40:05

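The example below is adapted from Lucene in Action (2010 edition). The book's examples target Lucene 3.x, while the latest Lucene release at the time of writing is 4.10.3. The API changed substantially between 2.x and 3.x and again between 3.x and 4.x, so the book's examples do not run unmodified on 4.x.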

The example builds an index from a directory of text files and then runs a search against that index.
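
As a rough mental model of "build an index, then search it", here is a minimal stdlib-only sketch of an inverted index. It is not how Lucene stores or analyzes text; the class name ToyInvertedIndex and the sample document names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lowercase term to the names of the documents that contain it.
// Illustration only; the names here are hypothetical and unrelated to Lucene's API.
class ToyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    // "Indexing": split the text into terms and record the document under each term.
    public void add(String docName, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            Set<String> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<String>(); // sorted, so results come back in a stable order
                postings.put(term, docs);
            }
            docs.add(docName);
        }
    }

    // "Searching": one map lookup per term instead of a scan over every document.
    public List<String> search(String term) {
        Set<String> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<String>() : new ArrayList<String>(docs);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("a.xml", "Buffer analysis of a polygon");
        index.add("b.xml", "Measure the area");
        index.add("c.xml", "Raster buffer process");
        System.out.println(index.search("buffer")); // prints [a.xml, c.xml]
    }
}
```

The point of the sketch is the split of work: the cost of reading and tokenizing the files is paid once while indexing, and each query afterwards is a cheap lookup.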

I use Lucene 4.10.2. The main methods of the book's Indexer and Searcher classes are moved into unit tests written with JUnit 4.

Add the following three JARs to your own project: lucene-core-4.10.2.jar, lucene-analyzers-common-4.10.2.jar, and lucene-queryparser-4.10.2.jar.

First, build the index. The Indexer class does this work.

package cn.tzy.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * It takes two arguments:
 * A path to a directory where we store the Lucene index
 * A path to a directory that contains the files we want to index
 * @author Zhenyu Tan
 */
public class Indexer {
	
	private IndexWriter writer;
	
	public Indexer(String indexDir) throws IOException, ParseException {
		Directory dir = FSDirectory.open(new File(indexDir));
		// Create Lucene IndexWriter; Version.LATEST matches the Lucene release on the classpath
		IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
		writer = new IndexWriter(dir, config);
	}
	
	public void close() throws IOException {
		// Close IndexWriter
		writer.close();
	}
	
	public int index(String dataDir, FileFilter filter) throws Exception {
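		File[] files = new File(dataDir).listFiles();
		if (files == null) {
			// listFiles() returns null when dataDir is missing or is not a directory
			throw new IOException(dataDir + " is not a readable directory");
		}
		for (File file : files) {
			// A null filter means "index every readable file"
			if (!file.isDirectory() && !file.isHidden() && file.exists() && file.canRead() && (filter == null || filter.accept(file))) {
				indexFile(file);
			}
		}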
		// Return number of documents indexed
		return writer.numDocs();
	}
	
	private void indexFile(File file) throws Exception {
		System.out.println("Indexing " + file.getCanonicalPath());
		Document doc = getDocument(file);
		// Add document to Lucene index
		writer.addDocument(doc);
	}
	
	protected Document getDocument(File file) throws Exception {
		Document doc = new Document();
		// Index file content (read from the file, not stored in the index)
		doc.add(new TextField("content", new FileReader(file)));
		// Index and store file name
		doc.add(new TextField("name", file.getName(), Field.Store.YES));
		// Index and store file path
		doc.add(new TextField("path", file.getCanonicalPath(), Field.Store.YES));
		return doc;
	}
	
	public static class TextFileFilter implements FileFilter {
		@Override
		public boolean accept(File pathname) {
			// Index .xml files only
			return pathname.getName().toLowerCase().endsWith(".xml");
		}
	}
}
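
The file-selection test in index() combines several file checks with an optional filter. A minimal stdlib-only sketch of such a check, written so that a null filter means "accept every file", might look like this. The class FileSelection and its methods shouldIndex and xmlOnly are hypothetical helpers, not part of the original code.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper that isolates the "should this file be indexed?" decision.
class FileSelection {

    // True for an existing, readable, non-hidden regular file that the filter accepts.
    // Assumption of this sketch: a null filter is treated as "accept everything".
    public static boolean shouldIndex(File file, FileFilter filter) {
        return !file.isDirectory()
                && !file.isHidden()
                && file.exists()
                && file.canRead()
                && (filter == null || filter.accept(file));
    }

    // A filter that accepts only names ending in .xml, ignoring case.
    public static FileFilter xmlOnly() {
        return new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                return pathname.getName().toLowerCase(Locale.ROOT).endsWith(".xml");
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for real documents
        File xml = File.createTempFile("doc", ".xml");
        File txt = File.createTempFile("doc", ".txt");
        xml.deleteOnExit();
        txt.deleteOnExit();

        System.out.println("xml with filter: " + shouldIndex(xml, xmlOnly()));
        System.out.println("txt with filter: " + shouldIndex(txt, xmlOnly()));
        System.out.println("txt, no filter:  " + shouldIndex(txt, null));
    }
}
```

Keeping the decision in one small method makes the null-filter behaviour easy to test without touching an index.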

Next, search against the index. The Searcher class handles the search:

package cn.tzy.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {
	public static void search(String indexDir, String squery) throws IOException, ParseException {
		// Open the index; try-with-resources closes the reader and directory when the search is done
		try (Directory dir = FSDirectory.open(new File(indexDir));
				IndexReader reader = DirectoryReader.open(dir)) {
			IndexSearcher searcher = new IndexSearcher(reader);

			// Parse the query against the "content" field
			QueryParser parser = new QueryParser("content", new StandardAnalyzer());
			Query query = parser.parse(squery);
			// Search the index, keeping at most the 10 best hits
			long start = System.currentTimeMillis();
			TopDocs hits = searcher.search(query, 10);
			long end = System.currentTimeMillis();

			// Write search status
			System.out.println("Found " + hits.totalHits +
					" document(s) (in " + (end - start) +
					" milliseconds) that matched query '" +
					squery + "':");

			// Retrieve each matching document and print its stored path
			for (ScoreDoc scoreDoc : hits.scoreDocs) {
				Document doc = searcher.doc(scoreDoc.doc);
				System.out.println(doc.get("path"));
			}
		}
	}
}

The test code follows:

package cn.tzy.lucene.test;

import org.junit.Test;

import cn.tzy.lucene.Indexer;
import cn.tzy.lucene.Searcher;

public class LuceneTest {
	// Create Lucene index in this directory
	private String indexDir = "index";
	// Index *.xml files in this directory
	private String dataDir = "document";
	
	@Test
	public void indexerTest() throws Exception {
		long start = System.currentTimeMillis();
		Indexer indexer = new Indexer(indexDir);
		int numIndexed;
		try {
			numIndexed = indexer.index(dataDir, new Indexer.TextFileFilter());
		} finally {
			indexer.close();
		}
		long end = System.currentTimeMillis();
		System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
	}
	
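	// Depends on the index written by indexerTest; JUnit 4 does not guarantee
	// execution order, so run indexerTest first if the index does not exist yet
	@Test
	public void searcherTest() throws Exception {
		String squery = "buffer";
		Searcher.search(indexDir, squery);
	}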
}

The indexerTest method indexes the .xml files under the dataDir folder and writes the index files to the indexDir folder. The output is:

Indexing E:\EclipseWorkSpace\HelloLucene\document\AngleService-angleBetween.xml
Indexing E:\EclipseWorkSpace\HelloLucene\document\AngleService-interiorAngle.xml
...
(middle portion omitted)
...
Indexing E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-measureArea.xml
Indexing E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-measureLength.xml
Indexing 137 files took 777 milliseconds

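The searcherTest method searches for files that contain buffer. The search requests at most 10 hits, so only 10 of the 16 matches are listed. The output is: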

Found 16 document(s) (in 15 milliseconds) that matched query 'buffer':
E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-bufferAnalysis.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterBufferProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\GeoBufferProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterGrowProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\GeoRandomProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterColorsProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterLakeProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterParamscaleProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterRandomProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterSurfcontourProcess.xml
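
Keeping only the best N of all matching documents is commonly done with a bounded min-heap. The following is a minimal stdlib-only sketch of that idea, not Lucene's actual collector code; the class TopNSketch and the scores it uses are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of top-N selection over per-document scores.
class TopNSketch {

    // Returns the indices of the n highest scores, best first.
    public static List<Integer> topN(final float[] scores, int n) {
        // Min-heap ordered by score: the weakest of the current candidates sits at the head
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(Math.max(1, n), new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return Float.compare(scores[a], scores[b]);
            }
        });
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > n) {
                heap.poll(); // evict the weakest candidate so at most n remain
            }
        }
        List<Integer> result = new ArrayList<Integer>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll()); // drained weakest first
        }
        Collections.reverse(result); // best first
        return result;
    }

    public static void main(String[] args) {
        // 16 matching documents with made-up, strictly decreasing scores
        float[] scores = new float[16];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = 1.0f / (i + 1);
        }
        List<Integer> top = topN(scores, 10);
        System.out.println("matched " + scores.length + ", kept " + top.size()); // prints matched 16, kept 10
    }
}
```

Every match is still counted, which is why the total can exceed the number of results that are returned and printed.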

Originally published: 2015-02-04
