Conference: Database systems for advanced applications

Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms



Abstract

In law, a judgment is a decision by a court that resolves a controversy and determines the rights and liabilities of the parties in a legal action or proceeding. The China Judgments Online system was officially launched in 2013 for record keeping and notification, and it has since recorded over 23 million electronic judgment documents. This large body of judgment documents attests to improvements in judicial justice and openness. Document categorization is becoming increasingly important for indexing judgments and for further analysis. However, manual categorization is almost impossible given the volume of documents and their rapid growth. In this paper, we propose an approach that automatically classifies Chinese judgment documents using machine learning algorithms, including Naive Bayes (NB), Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM). After word segmentation, each judgment document is represented in the vector space model (VSM) using TF-IDF weights. To improve performance, we construct a set of judicial stop words. In addition, because TF-IDF generates a high-dimensional feature vector, which leads to extremely high time complexity, we apply three dimensionality reduction methods. Extensive experiments on 6735 judgment documents demonstrate the effectiveness and high classification performance of the proposed method.
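
The pipeline named in the abstract (word segmentation, stop-word filtering, TF-IDF weighting, then a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the text is already segmented into whitespace-separated words, uses a hypothetical two-word stop list, a smoothed IDF, and a small hand-written multinomial Naive Bayes, and it omits the dimensionality reduction step because the abstract does not name the three methods. The training sentences and the labels "criminal" and "civil" are invented for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical judicial stop words; the paper's actual list is not reproduced here.
STOP_WORDS = {"本院", "认为"}

def tokenize(text):
    """Split already-segmented text on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training set."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """Sparse {term: tf * idf} vector; terms unseen in training are ignored."""
    tf = Counter(tok for tok in doc if tok in idf)
    return {term: count * idf[term] for term, count in tf.items()}

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over TF-IDF weights."""

    def fit(self, vectors, labels):
        self.vocab_size = len({term for vec in vectors for term in vec})
        self.priors = Counter(labels)
        self.weights = defaultdict(lambda: defaultdict(float))
        self.totals = defaultdict(float)
        for vec, label in zip(vectors, labels):
            for term, w in vec.items():
                self.weights[label][term] += w
                self.totals[label] += w
        return self

    def predict(self, vec):
        n_docs = sum(self.priors.values())

        def log_posterior(label):
            # Log prior plus weighted log likelihood of each term under this class.
            score = math.log(self.priors[label] / n_docs)
            denom = self.totals[label] + self.vocab_size
            for term, w in vec.items():
                score += w * math.log((self.weights[label].get(term, 0.0) + 1.0) / denom)
            return score

        return max(self.priors, key=log_posterior)

# Invented toy corpus: (pre-segmented text, label).
TRAIN = [
    ("本院 认为 被告人 盗窃 财物 判处 有期徒刑", "criminal"),
    ("被告人 故意 伤害 判处 有期徒刑", "criminal"),
    ("原告 被告 合同 纠纷 赔偿", "civil"),
    ("原告 被告 借款 合同 利息", "civil"),
]

train_tokens = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(train_tokens)
model = NaiveBayes().fit(
    [tfidf(doc, idf) for doc in train_tokens],
    [label for _, label in TRAIN],
)

def classify(text):
    """Run the full sketch pipeline on one pre-segmented document."""
    return model.predict(tfidf(tokenize(text), idf))
```

In practice a Chinese word segmenter would replace the whitespace split, and a library classifier would likely replace the hand-written one; the sketch only shows how the stages connect.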

Bibliographic details

  • Source
  • Conference location: Suzhou (CN)
  • Author affiliations

    State Key Laboratory for Novel Software Technology, Software Institute, Nanjing University, Nanjing 210093, Jiangsu, China;


  • Conference organizer
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords

    Chinese judgment documents; Text classification; TF-IDF; Support Vector Machine; Naive Bayes; Decision Tree; Random Forest; Judicial stop-words construction; Dimensional reduction;

