
Aplicação de text mining na deteção de evidência de fraude em documentos de texto

English translation: Application of text mining to the detection of evidence of fraud in text documents


Abstract

This article tests how well selected preprocessing and processing techniques, applied to text whose content is not known in advance, support the automatic discovery of information that may be useful for fraud detection. The methodology applied cleaning techniques, stopword removal, lemmatization, and the creation of a document-term matrix to a set of Diaries of the Assembly of the Portuguese Republic (AR), so that the results could be compared with the subjects of the parliamentary sessions as previously annotated by cataloging professionals. The results indicate that removing specific stopwords makes the extraction of terms and keywords for the subjects addressed in the analyzed texts more efficient. This result may be applicable in a fraud audit scenario that involves selecting a significant number of documents with previously unknown content for reading.
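The pipeline described in the abstract (cleaning, stopword removal, lemmatization, document-term matrix) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the stopword lists, the toy lemma table, the sample sentences, and the function names `preprocess`, `document_term_matrix`, and `top_terms` are all assumptions made for the example, and treating the "specific" stopwords as parliamentary boilerplate words is a guess.

```python
import re
from collections import Counter

# Hypothetical stopword lists; the abstract does not say which words were used.
GENERIC_STOPWORDS = {"a", "o", "de", "da", "do", "e", "que", "em", "para", "os", "as"}
# Assumed corpus-specific stopwords (parliamentary boilerplate).
SPECIFIC_STOPWORDS = {"senhor", "presidente", "deputado", "deputados", "aplausos"}

# Toy lemma lookup standing in for a real Portuguese lemmatizer.
LEMMAS = {"impostos": "imposto", "escolas": "escola", "hospitais": "hospital"}


def preprocess(text, stopwords):
    """Lowercase, keep alphabetic tokens, drop stopwords, then lemmatize."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in stopwords]


def document_term_matrix(docs, stopwords):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    counts = [Counter(preprocess(d, stopwords)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


def top_terms(vocab, row, k=3):
    """The k most frequent terms of one document (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, row), key=lambda p: (-p[1], p[0]))
    return [t for t, n in ranked[:k] if n > 0]


# Invented sample sentences, not taken from the AR Diaries.
docs = [
    "Senhor Presidente, os impostos e os impostos sobre as escolas.",
    "Senhor Deputado, os hospitais e a saúde.",
]

vocab, matrix = document_term_matrix(docs, GENERIC_STOPWORDS | SPECIFIC_STOPWORDS)
for row in matrix:
    print(top_terms(vocab, row))
```

Calling `document_term_matrix(docs, GENERIC_STOPWORDS)` instead leaves words such as "senhor" in the vocabulary, which is the kind of noise the abstract reports removing with specific stopwords.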
