Mining Large-scale Comparable Corpora from Chinese-English News Collections

Abstract

In this paper, we explore a CLIR-based approach to construct large-scale Chinese-English comparable corpora, which is valuable for translation knowledge mining. The initial source and target document sets are crawled from news websites and standardized uniformly. Keywords are first extracted from the source document, and then the extracted keywords are translated and combined as query words through certain criteria to retrieve against the index created using the target document set. Meanwhile, the mapping correlations between source and target documents are developed according to the value of similarity calculated by the retrieval tool. Two methods are evaluated to filter the comparable document pairs so as to ensure the quality of the comparable corpora. Experimental results indicate that our approach is effective for the construction of Chinese-English comparable corpora.
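
The pipeline outlined in the abstract (keyword extraction, keyword translation, retrieval against a target-side index, similarity-based filtering) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract does not name the keyword extractor, the translation resource, the retrieval tool, or the two filtering methods, so this toy version uses term frequency, a hypothetical dictionary `TOY_DICT`, a hand-rolled TF-IDF cosine score, and a single hypothetical `threshold`.

```python
import math
from collections import Counter

# Hypothetical toy bilingual dictionary (source keyword -> target translations).
TOY_DICT = {
    "经济": ["economy", "economic"],
    "增长": ["growth"],
    "地震": ["earthquake"],
    "救援": ["rescue"],
}

def extract_keywords(tokens, k=3):
    """Pick the k most frequent source tokens that the dictionary can translate."""
    counts = Counter(t for t in tokens if t in TOY_DICT)
    return [w for w, _ in counts.most_common(k)]

def build_query(keywords):
    """Translate each keyword and merge all translations into one query."""
    return [t for kw in keywords for t in TOY_DICT[kw]]

def build_index(target_docs):
    """Compute smoothed IDF weights and TF-IDF vectors for the target documents."""
    n = len(target_docs)
    df = Counter(term for doc in target_docs.values() for term in set(doc))
    idf = {term: math.log((n + 1) / (d + 1)) + 1.0 for term, d in df.items()}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in Counter(doc).items()}
        for doc_id, doc in target_docs.items()
    }
    return idf, vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, idf, vectors):
    """Rank target documents by similarity to the translated query."""
    q_vec = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
    scored = [(doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)

def align(source_docs, target_docs, threshold=0.3):
    """Keep the top-ranked target for each source document if it passes the threshold."""
    idf, vectors = build_index(target_docs)
    pairs = []
    for src_id, tokens in source_docs.items():
        query = build_query(extract_keywords(tokens))
        ranked = retrieve(query, idf, vectors)
        if ranked and ranked[0][1] >= threshold:
            pairs.append((src_id, ranked[0][0], ranked[0][1]))
    return pairs

# Pre-tokenized toy documents (real news text would need word segmentation first).
source_docs = {
    "zh1": ["经济", "增长", "经济", "报告"],
    "zh2": ["地震", "救援", "地震"],
    "zh3": ["天气", "晴"],
}
target_docs = {
    "en1": ["economy", "growth", "report", "economic"],
    "en2": ["earthquake", "rescue", "team"],
    "en3": ["football", "match", "result"],
}
pairs = align(source_docs, target_docs)
```

In this toy run, `zh3` has no translatable keywords, so its query is empty and it is filtered out, while the other two source documents are paired with their topical counterparts. A real system would presumably replace each stage with a stronger component, for example a full-text search engine for the index.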

Bibliographic record

Source: International Conference on Computational Linguistics | 2010 | 9 pages
Authors: Degen Huang; Lian Zhao; Lishuang Li; Haitao Yu
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature

1. Analysing headlines as a way of downsizing news corpora: Evidence from an Arabic-English comparable corpus of newspaper articles [J]. Haider Ahmad S., Hussein Riyad F. Literary & Linguistic Computing. 2020, No. 4
2. Mining English-Chinese Named Entity Pairs from Comparable Corpora [J]. Lishuang Li, Peng Wang, Degen Huang. ACM Transactions on Asian Language Information Processing. 2011, No. 4
3. Sentiment Mining and Analysis over Text Corpora via Complex Deep Learning Neural Architectures [J]. Teresa Alcamo, Alfredo Cuzzocrea, Giovanni Pilato. Journal of Data Intelligence. 2021, No. 4
4. Mining Large-scale Comparable Corpora from Chinese-English News Collections [C]. Degen Huang, Lian Zhao, Lishuang Li. Workshop on multiword expressions: from theory to application. 2010
5. Mining for evidence in enterprise corpora [D]. Almquist, Brian Alan. 2011
6. Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies [O]. Raphael Cohen, Michael Elhadad, Noémie Elhadad. 2013
7. Document classification of SuDer Turkish news corpora [O]. Mehmet Umut Sen, Berrin Yanikoglu. 2018
