Conference: International e-Conference on Advanced Science and Technology

A Novel Approach for Designing Indian Regional Language Based Raw-Text Extractor and Unicode Font-Mapping Tool

Abstract

Information extraction (IE) is the task of extracting specific information from a collection of documents. In general, information on the Web is well structured in HTML or XML format, and IE from such structured documents basically uses learning techniques for pattern matching in the content. In this paper, we propose a novel approach to interactive information extraction. We describe how this approach enables even a naive user to efficiently extract Indian regional language content from a Web document, in a manner quite similar to using a standard search engine. In effect, it behaves like a pre-programmed information extraction engine.
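
To make the idea of raw-text extraction from a structured Web document concrete, the following is a minimal sketch in Python. It is not the authors' tool: the abstract does not describe their implementation, so the class name `RawTextExtractor`, the function `extract_regional_text`, and the choice of the Devanagari Unicode block (U+0900 to U+097F) as a stand-in for "an Indian regional language" are all assumptions made for illustration. The sketch only collects text nodes and keeps those containing characters from the chosen script; it does not attempt legacy-font-to-Unicode mapping.

```python
from html.parser import HTMLParser

# Assumption: Devanagari (U+0900..U+097F) stands in for the target script.
# Other Indic blocks (e.g. Bengali U+0980..U+09FF) could be passed instead.
DEVANAGARI = (0x0900, 0x097F)


class RawTextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []       # stripped, non-empty text nodes in order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def in_script(ch, block=DEVANAGARI):
    """True if the character's code point lies inside the given block."""
    return block[0] <= ord(ch) <= block[1]


def extract_regional_text(html, block=DEVANAGARI):
    """Return the text chunks that contain at least one character of `block`."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return [c for c in parser.chunks if any(in_script(ch, block) for ch in c)]


if __name__ == "__main__":
    page = (
        "<html><head><style>p{}</style></head>"
        "<body><p>Hello</p><p>नमस्ते दुनिया</p></body></html>"
    )
    for chunk in extract_regional_text(page):
        print(chunk)
```

Filtering by Unicode block is a deliberately simple criterion. It works only for pages already encoded in Unicode; pages that render regional text through legacy, non-Unicode fonts would need a separate font-mapping step, which this sketch leaves out.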
