Text-Mining: Application Development Challenges

Abstract

This paper reviews best practices and challenges for project managers and developers who implement text-mining applications. With a focus on rule-based information extraction, and with references to actual cases, the authors share their experience of developing several text-mining applications in diverse industries. First, project management issues are discussed, including a process for capturing business requirements and mapping them into features and linguistic patterns, development of linguistic rules, rule development standards, performance metrics, and an evaluation methodology. Linguistic representations such as sub-syntactic, syntactic, semantic, and application-specific rules are identified. Special emphasis is placed on processing that follows information extraction, such as improving the relevance of the extracted information, summarization models, techniques for handling typographical errors, resolution of temporal information, anaphora resolution, and a discussion of shallow vs. full parsing. Lastly, the paper discusses various utilities that help with the development of a text-mining application, such as feature analysis, visualization, source document pre-processing, and rule authoring tools.
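To make the idea of a linguistic rule paired with a performance metric concrete, here is a minimal Python sketch. It is not taken from the paper: the acquisition pattern, the sample sentence, the gold annotations, and the names `ACQUISITION_RULE`, `extract`, and `precision_recall` are all hypothetical, and a regular expression stands in for what would normally be a much richer rule language.

```python
import re

# Hypothetical application-specific rule (an assumption, not from the paper):
# a capitalized name, an acquisition verb, then another capitalized name.
ACQUISITION_RULE = re.compile(
    r"\b(?P<buyer>[A-Z][A-Za-z]+) (?:acquired|bought) (?P<target>[A-Z][A-Za-z]+)\b"
)


def extract(text):
    """Return the set of (buyer, target) pairs matched by the rule."""
    return {
        (m.group("buyer"), m.group("target"))
        for m in ACQUISITION_RULE.finditer(text)
    }


def precision_recall(predicted, gold):
    """Compute precision and recall of extracted pairs against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up sample text and gold annotations, for illustration only.
text = "Acme acquired Bolt. Later, Corex bought Dyna. Evon merged with Fara."
gold = {("Acme", "Bolt"), ("Corex", "Dyna"), ("Evon", "Fara")}

predicted = extract(text)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

In this toy run the rule finds the two pairs it was written for and misses the "merged with" phrasing, so precision is high while recall is lower. That is the kind of gap an evaluation methodology is meant to expose before further rules are written.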
