
Faceted Search and Browsing of Indonesian Text Collection Using Shallow Parsing Techniques.


Abstract

Text search is a useful way of retrieving documents from a particular website. The public generally prefers internet search engines to local enterprise search engines, because enterprise content is not cross-linked and does not follow a page-rank algorithm. An enterprise search engine, on the other hand, can use metadata, which lets the user specify conditions that every retrieved document must meet. Searching on metadata is therefore also very useful. This thesis aims to develop an enterprise search engine that uses metadata to provide advanced features such as faceted navigation. The search engine data was extracted from various Indonesian web sources. To support faceted search, each document is tagged with metadata entities: person, organization, location, and sentiment-analysis keywords. A shallow parsing technique, named entity recognition, is used for this purpose. More than 1,500 entities were tagged in this process. The documents were converted into XML format and indexed with Apache Solr, an open-source enterprise search engine with full-text and faceted search capabilities. The entities help users specify conditions and search faster through the large collection of documents. A user who clicks on a metadata condition is assured of getting results. Because the sentiment-analysis keywords are tagged with positive and negative values, social scientists can use the results to check for overlapping or conflicting organizations and ideologies. In addition, this tool is the first of its kind for the Indonesian language. Results are fetched much faster and with better accuracy.
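The indexing and faceting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the field names (`person`, `organization`, `location`, `sentiment`), the helper functions, and the sample values are all assumptions made for illustration. The only parts taken from Solr itself are the XML update message shape (`<add><doc><field name="...">`) and the standard query parameters `q`, `rows`, `facet`, `facet.field`, and `fq`.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical facet field names; the thesis does not list its actual schema.
FACET_FIELDS = ("person", "organization", "location", "sentiment")


def to_solr_add_xml(doc_id, text, entities):
    """Build a Solr XML <add> message for one document.

    `entities` maps a facet field name to the list of values that the
    named entity recognizer tagged in the text.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text
    for field in FACET_FIELDS:
        for value in entities.get(field, []):
            # One <field> element per value, so the field is multi-valued.
            ET.SubElement(doc, "field", name=field).text = value
    return ET.tostring(add, encoding="unicode")


def facet_query_params(query, selected=None):
    """Return the URL query string for a faceted Solr search.

    `selected` maps a field name to the facet value the user clicked on;
    each selection becomes a filter query (fq) that narrows the results.
    """
    params = [("q", query), ("facet", "true"), ("rows", "10")]
    params += [("facet.field", field) for field in FACET_FIELDS]
    for field, value in (selected or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


if __name__ == "__main__":
    # Made-up sample document, for illustration only.
    print(to_solr_add_xml(
        "doc-1",
        "Contoh teks berita",
        {"person": ["Budi"], "location": ["Jakarta"]},
    ))
    print(facet_query_params("ekonomi", {"location": "Jakarta"}))
```

Because facet counts are computed over the documents that match the current query, every facet value offered to the user corresponds to at least one document, which is one way to read the abstract's remark that clicking a metadata condition is assured to return results.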

Bibliographic record

  • Author: Sanaka, Srinivasa Raviteja
  • Author affiliation: Arizona State University
  • Degree-granting institution: Arizona State University
  • Discipline: Computer Science
  • Degree: M.S.
  • Year: 2010
  • Pages: 51 p.
  • Original format: PDF
  • Language: English
