The rapid development of information technology has brought many conveniences to daily life, while the volume of data files in every field continues to grow. As information accumulates, retrieving the desired information quickly requires that text be classified. Text classification makes data queries more efficient and also enables automated management and categorization of existing data, which eases both storage and retrieval. Building on full-text search implemented with Lucene over Chinese text that has been processed with POI, this paper studies several current mainstream machine learning classification algorithms and uses the open-source tool Weka to classify Chinese text automatically, with the aim of improving the platform's query efficiency. It designs and implements an automatic classification and retrieval platform for Chinese text, which has considerable practical value.
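The idea of classifying documents first and then querying within a category can be illustrated with a minimal, standard-library sketch. It does not use Lucene, Weka, or POI and is not the platform's implementation: the multinomial naive Bayes classifier, the `CategoryIndex` class, the whitespace `tokenize` helper, and the sample English documents are all assumptions made for illustration. Real Chinese text would additionally need a word segmenter in place of `tokenize`.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Placeholder tokenizer (assumption): splits on whitespace.
    # Chinese text would need a proper word segmenter here.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(doc))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for term in tokenize(doc):
                if term in self.vocab:  # terms unseen in training are skipped
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class CategoryIndex:
    """Inverted index whose postings are keyed by (category, term)."""

    def __init__(self, classifier):
        self.classifier = classifier
        self.postings = defaultdict(set)
        self.categories = {}

    def add(self, doc_id, text):
        # Classify at indexing time, then file the terms under that category.
        category = self.classifier.predict(text)
        self.categories[doc_id] = category
        for term in set(tokenize(text)):
            self.postings[(category, term)].add(doc_id)
        return category

    def search(self, query, category):
        # AND query restricted to one category, so only that
        # category's postings are consulted.
        terms = tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get((category, term), set())
            result = set(ids) if result is None else result & ids
        return result


# Hypothetical training data and documents, for illustration only.
train_docs = [
    "stock market shares rise",
    "bank interest rate market",
    "football match team win",
    "team coach match season",
]
train_labels = ["finance", "finance", "sports", "sports"]

clf = NaiveBayes().fit(train_docs, train_labels)
index = CategoryIndex(clf)
index.add("d1", "market shares fall as bank rate moves")
index.add("d2", "the team lost the match")
index.add("d3", "interest rate rise")
```

Calling `index.search("rate", "finance")` consults only the postings filed under the finance category, which is one way a classification step can narrow the search space before retrieval.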