Conference: AAAI Workshop on Link Analysis

Relational Graph Analysis with Real-World Constraints: An Application in IRS Tax Fraud Detection

Abstract

In this paper, we describe research on and application of relational graph mining in IRS investigations. One key scenario in this domain is the iterative construction of models for identifying tax fraud. For example, an investigator may be interested in understanding variations in schemes involving individuals sending money offshore. This domain lends itself naturally to a graph representation, with entities and their relationships represented as nodes and edges, respectively. Two critical constraints in this application make existing work on relational graph mining unsuitable for it. First, our data set is large (20 million nodes and 20 million edges in 500 GB) and includes multiple types of entities and relationships. Second, because of both the size and the active nature of this data, the mining must be done directly against the database; extracting and maintaining a separate data store would be impractical and costly. We focus on describing our approach to one of the core tasks in this process: allowing the investigator to mine potentially illegal activity by iteratively suggesting and refining loosely defined scenarios. Our current methodology combines three components: (1) a graph representation language that allows flexibility for inexact matches, (2) custom data structures, combined with dynamically generated sequences of SQL queries, that perform efficient mining directly against the database, and (3) cost-based optimization information, exploited to improve our search for results. A prototype solution has been deployed and used by the IRS, and has resulted in both the identification of criminal activity and accolades for ease of use and efficacy.
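The abstract does not give the pattern language or the generated SQL, so the following is only a minimal sketch of the general idea behind components (1) and (2): a loosely defined path pattern is turned into a SQL query that runs directly against the database. Everything here is an assumption made for illustration: the `edges(src, dst, kind)` table, the `build_path_query` helper, the edge kinds, and the sample rows are hypothetical and are not taken from the paper. Each step of the pattern is a set of acceptable edge kinds, so a larger set gives a looser (less exact) match.

```python
import sqlite3


def build_path_query(steps):
    """Build SQL matching a path whose i-th edge has a kind in steps[i].

    Hypothetical helper, not the paper's method. Assumes a table
    edges(src, dst, kind). Returns (sql, params); the result has one
    column per node on the path (n0, n1, ...).
    """
    if not steps or any(len(kinds) == 0 for kinds in steps):
        raise ValueError("pattern needs at least one non-empty step")
    select = ["e0.src AS n0"]
    joins = ["FROM edges AS e0"]
    where, params = [], []
    for i, kinds in enumerate(steps):
        kinds = sorted(kinds)
        if i > 0:
            # Chain edges: the next edge starts where the previous one ended.
            joins.append(f"JOIN edges AS e{i} ON e{i}.src = e{i - 1}.dst")
        select.append(f"e{i}.dst AS n{i + 1}")
        # Edge kinds are passed as bound parameters, never interpolated.
        placeholders = ", ".join("?" for _ in kinds)
        where.append(f"e{i}.kind IN ({placeholders})")
        params.extend(kinds)
    sql = (
        f"SELECT {', '.join(select)} {' '.join(joins)} "
        f"WHERE {' AND '.join(where)}"
    )
    return sql, params


def demo(pattern):
    """Run a pattern against a tiny in-memory database of invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [
            ("alice", "shell_co", "owns"),
            ("shell_co", "offshore_acct", "wire_transfer"),
            ("bob", "trust", "controls"),
            ("trust", "offshore_acct", "check"),
            ("carol", "local_bank", "owns"),
        ],
    )
    sql, params = build_path_query(pattern)
    rows = conn.execute(sql + " ORDER BY n0", params).fetchall()
    conn.close()
    return rows


# A strict scenario, then a loosened one that accepts more edge kinds.
strict = demo([{"owns"}, {"wire_transfer"}])
loose = demo([{"owns", "controls"}, {"wire_transfer", "check"}])
```

Loosening a step only changes the bound parameters of the generated query, which is one way an investigator could refine a scenario iteratively without copying data out of the database. The sketch does not attempt component (3); using the optimizer's cost estimates to order or prune queries is left out.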
