Published in: Foundations of Intelligent Systems; Lecture Notes in Artificial Intelligence; 4203

Exploring Phrase-Based Classification of Judicial Documents for Criminal Charges in Chinese

Abstract

Phrases provide a better foundation for indexing and retrieving documents than individual words. The constituents of a phrase disambiguate one another: each word is less ambiguous inside the phrase than when it appears on its own. Intuitively, classifiers that index with phrases should perform better than those that index with words. Although pioneers explored phrase-based indexing of English documents decades ago, relatively few similar attempts have been made for Chinese documents, partly because segmenting Chinese text into words correctly is itself difficult. We build a domain-dependent word list with the help of Chien's PAT-tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that indexing with phrases does allow us to classify judicial documents that closely resemble one another. Using a relatively more efficient algorithm, our classifier performs better than those reported in related work.
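To make the intuition concrete, here is a minimal sketch of word-based versus phrase-based index terms. It is not the authors' method: the abstract does not say how phrases are defined, so this sketch assumes a phrase is an ordered pair of adjacent words, and it swaps in greedy forward maximum matching for segmentation. The word list, the two sentences, and all function names are hypothetical toy values.

```python
from collections import Counter

# Hypothetical toy word list; the paper builds its list with a
# PAT-tree-based method and HowNet, which is not reproduced here.
WORD_LIST = {"被告", "持刀", "抢劫", "伤害", "财物"}


def segment(text, word_list, max_len=4):
    """Greedy forward maximum matching against a word list (assumed segmenter)."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in word_list:
                words.append(candidate)
                i += size
                break
        else:
            # No listed word starts at this position; skip one character.
            i += 1
    return words


def word_terms(words):
    """Word-based index terms with their counts."""
    return Counter(words)


def phrase_terms(words):
    """Phrase-based index terms: ordered pairs of adjacent words (an assumption)."""
    return Counter(zip(words, words[1:]))


# Two toy sentences describing different charges (robbery vs. injury).
doc_a = segment("被告持刀抢劫财物", WORD_LIST)
doc_b = segment("被告持刀伤害", WORD_LIST)

# Terms the two documents share under each indexing scheme.
shared_words = set(word_terms(doc_a)) & set(word_terms(doc_b))
shared_phrases = set(phrase_terms(doc_a)) & set(phrase_terms(doc_b))

# Phrases that appear in only one of the two documents.
only_a = set(phrase_terms(doc_a)) - set(phrase_terms(doc_b))
only_b = set(phrase_terms(doc_b)) - set(phrase_terms(doc_a))
```

In this toy case the word "持刀" (holding a knife) occurs in both sentences and so says little about the charge, whereas the pairs ("持刀", "抢劫") and ("持刀", "伤害") each occur in only one sentence. A classifier built on such terms would still need a weighting and decision step, which the abstract does not describe.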
