Application of text mining to detecting evidence of fraud in text documents




This article tests the results of applying a set of text preprocessing and processing techniques to documents whose content is not known in advance, with the aim of automatically discovering information that may be usable for fraud detection. The methodology applied text cleaning, stopword removal, lemmatization and the construction of a document-term matrix to a set of Diaries of the Assembly of the Portuguese Republic (Assembleia da República, AR), and compared the extracted terms with the subjects of the parliamentary sessions as previously annotated by professional catalogers. The results show that removing specific stopwords yields greater efficiency in extracting the terms and keywords that describe the subjects addressed in the analyzed texts. This result may be applicable to a fraud audit scenario in which a large number of documents with previously unknown content must be selected for reading.
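The pipeline described above (cleaning, stopword removal, lemmatization and a document-term matrix) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the generic and domain-specific stopword lists, the small lemma table (a stand-in for a real Portuguese lemmatizer) and the two example sentences are all hypothetical. The sketch compares the most frequent terms per document when only generic stopwords are removed with those obtained when procedural parliamentary words are removed as well.

```python
import re
from collections import Counter

# Hypothetical, abbreviated general-purpose Portuguese stopword list.
GENERIC_STOPWORDS = {"a", "o", "os", "as", "de", "da", "do", "das", "dos", "e",
                     "em", "na", "no", "que", "para", "um", "uma", "com", "por",
                     "se", "sobre"}

# Hypothetical domain-specific stopwords: procedural vocabulary that recurs in
# parliamentary sessions regardless of the subject under debate.
DOMAIN_STOPWORDS = {"sr", "srs", "presidente", "deputado", "deputada", "palavra"}

# Toy lemma table standing in for a real Portuguese lemmatizer (assumption).
LEMMAS = {"impostos": "imposto", "combustíveis": "combustível",
          "contratos": "contrato", "públicos": "público", "obras": "obra",
          "exigem": "exigir", "deputados": "deputado"}

# Invented example sentences in the style of session transcripts (hypothetical).
DOCS = [
    "Sr. Presidente, Sr. Deputado, os impostos e o imposto sobre combustíveis, Sr. Presidente.",
    "Sr. Presidente, os contratos públicos e o contrato de obras exigem auditoria, Sr. Presidente.",
]


def clean(text):
    """Lowercase and keep only alphabetic tokens (drops digits and punctuation)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def preprocess(text, stopwords):
    """Clean, lemmatize, then drop stopwords and single-letter tokens."""
    lemmas = [LEMMAS.get(t, t) for t in clean(text)]
    return [t for t in lemmas if t not in stopwords and len(t) > 1]


def document_term_matrix(docs, stopwords):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    counts = [Counter(preprocess(d, stopwords)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def top_terms(vocab, row, k=3):
    """Most frequent terms in one matrix row (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, row), key=lambda p: (-p[1], p[0]))
    return [term for term, n in ranked[:k] if n > 0]


if __name__ == "__main__":
    settings = [("generic only", GENERIC_STOPWORDS),
                ("generic + domain", GENERIC_STOPWORDS | DOMAIN_STOPWORDS)]
    for label, stops in settings:
        vocab, matrix = document_term_matrix(DOCS, stops)
        print(label)
        for i, row in enumerate(matrix):
            print(f"  doc {i}: {top_terms(vocab, row)}")
```

Here lemmatization runs before stopword filtering so that inflected forms of listed words (for example "deputados") are filtered too; the abstract lists the steps in the other order, and either choice is a design decision. With only the generic list, procedural words such as "sr" and "presidente" rank among each document's top terms; adding the hypothetical domain list leaves subject terms such as "imposto" and "contrato" at the top.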


