Journal: Journal of computer networking, wireless and mobile communications

A COMPARATIVE COST ANALYSIS OF TEXT BASED ON CLUSTERING AND ENTROPY MODEL

Abstract

The World Wide Web is a very large and fast-growing source of information. Much of this information is unstructured text, which makes it hard for users to extract information through queries. Template extraction techniques are used to simplify queries and improve the accuracy of results. In existing systems, the techniques used to extract data are inefficient and introduce problems such as delay. The proposed system is based on the Minimum Description Length (MDL) principle. It extracts templates from a large number of web documents generated from heterogeneous templates, which helps web applications such as web search improve their performance. In addition, the proposed technique uses clustering to retrieve web documents based on the similarity of their underlying template structures, so that the template for each cluster is extracted simultaneously with a fast approximation of the clustering.
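The abstract does not state the cost function or the clustering procedure, so the following is only an illustrative sketch of the general MDL idea: choose the clustering whose templates plus per-document deviations have the shortest total description. It assumes each document has already been reduced to a set of structural tokens (for example, tag paths), that a cluster's template is the set of tokens shared by more than half of its documents, and that every template token and every deviating token costs one unit. The names `majority_template`, `cluster_cost`, and `mdl_cluster`, the majority-vote template rule, and the unit-cost model are all hypothetical, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def majority_template(docs):
    """Hypothetical template rule: tokens present in more than half of the docs."""
    counts = Counter(tok for d in docs for tok in d)
    return frozenset(t for t, c in counts.items() if c * 2 > len(docs))


def cluster_cost(docs):
    """Toy description length of one cluster (assumed unit costs):
    one unit per template token (model cost), plus one unit per token in which
    a document differs from the template, extra or missing (data cost)."""
    template = majority_template(docs)
    return len(template) + sum(len(frozenset(d) ^ template) for d in docs)


def mdl_cluster(docs):
    """Greedy agglomerative clustering: repeatedly merge the pair of clusters
    whose merge lowers the total description length the most, and stop when
    no merge lowers it. Returns a list of clusters (lists of doc indices)."""
    clusters = [[i] for i in range(len(docs))]

    def cost(cluster):
        return cluster_cost([docs[i] for i in cluster])

    while True:
        best_gain, best_pair = 0, None
        for a, b in combinations(range(len(clusters)), 2):
            merged = clusters[a] + clusters[b]
            gain = cost(clusters[a]) + cost(clusters[b]) - cost(merged)
            if gain > best_gain:  # only strictly shorter descriptions count
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return clusters
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]


if __name__ == "__main__":
    pages = [
        frozenset({"header", "nav", "footer", "article-1"}),
        frozenset({"header", "nav", "footer", "article-2"}),
        frozenset({"logo", "menu", "sidebar", "item-1"}),
        frozenset({"logo", "menu", "sidebar", "item-2"}),
    ]
    for cluster in mdl_cluster(pages):
        print(cluster, sorted(majority_template([pages[i] for i in cluster])))
```

In this toy model, merging two pages that share a layout saves bits because the shared tokens are written once in the template instead of once per page, while merging pages with unrelated layouts saves nothing and is rejected. The exhaustive pairwise search is quadratic in the number of clusters per step; the abstract's "fast approximation of the clustering" is not reproduced here.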