With the massive growth of information on local area networks (LANs), personalized, lightweight search engines have attracted the attention and favor of small and medium-sized enterprises and campuses. Building on a study of the basic principles of search engines, this paper implements retrieval of the text content of multiple file types using Lucene, JSP, and Struts2. Test results show that the system extracts and parses the content of HTML, PDF, Word, and txt files within a LAN, and that it is open, extensible, real-time, and secure. The system meets its intended goals.
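The retrieval principle behind such a system can be illustrated with a toy inverted index. The sketch below is not the paper's implementation and does not use the Lucene API; the class name TinyIndex and its methods are hypothetical, and it assumes the text has already been extracted from the HTML, PDF, Word, or txt file by some other component.

```java
import java.util.*;

// Hypothetical illustration of an inverted index; not the paper's code and not Lucene.
public class TinyIndex {
    // Maps each term to the sorted set of document ids that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Tokenizes already-extracted plain text and records each term under docId. */
    public void add(String docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents that contain every query term (AND semantics). */
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> hits = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);   // first term seeds the result
            } else {
                result.retainAll(hits);         // later terms narrow it
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

A production system would hand these steps to a library such as Lucene, which adds analysis, ranking, and on-disk index storage that this sketch omits.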