Design and Prototype of a Large-Scale and Fully Sense-Tagged Corpus

Abstract

Sense-tagged corpora play a crucial role in natural language processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential to this work, yet such a corpus is critically lacking at the current stage. This paper aims to design a large-scale, full-text Chinese sense-tagged corpus containing over 110,000 words. The Academia Sinica Balanced Corpus of Modern Chinese (also called the Sinica Corpus) serves as the tagging target, and 56 full texts are extracted from it. The preparation for automatic sense tagging uses N-gram statistics and collocation information, combining machine learning techniques with a probability model. To achieve high precision, the output of automatic sense tagging is then revised manually.
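The abstract does not describe the tagging procedure in detail. As a rough illustration only, the sketch below shows one way collocation counts and a simple probability model could be combined to propose a sense for an ambiguous word before manual revision. The function names (`train`, `tag`), the one-word context window, the add-one smoothing, the naive Bayes style scoring, and the toy English seed data with its sense labels are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Collect sense frequencies and neighbour (collocation) counts.

    Each sentence is a list of (word, sense) pairs; sense is None for
    words that carry no sense tag in the seed data.
    """
    sense_counts = defaultdict(Counter)   # word -> Counter of senses
    colloc_counts = defaultdict(Counter)  # (word, sense) -> Counter of neighbours
    vocab = set()
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        vocab.update(words)
        for i, (word, sense) in enumerate(sentence):
            if sense is None:
                continue
            sense_counts[word][sense] += 1
            for j in (i - 1, i + 1):      # assumed window: one word each side
                if 0 <= j < len(words):
                    colloc_counts[(word, sense)][words[j]] += 1
    return sense_counts, colloc_counts, vocab


def tag(words, i, model):
    """Propose the most probable sense for words[i], or None if unseen."""
    sense_counts, colloc_counts, vocab = model
    word = words[i]
    if word not in sense_counts:
        return None
    neighbours = [words[j] for j in (i - 1, i + 1) if 0 <= j < len(words)]
    total = sum(sense_counts[word].values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sorted(sense_counts[word].items()):
        # log P(sense) + sum of log P(neighbour | sense), add-one smoothed
        score = math.log(count / total)
        context = colloc_counts[(word, sense)]
        denom = sum(context.values()) + len(vocab)
        for n in neighbours:
            score += math.log((context[n] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical seed data; sense labels are invented for illustration.
seed = [
    [("river", None), ("bank", "bank_shore")],
    [("bank", "bank_finance"), ("loan", None)],
    [("bank", "bank_finance"), ("account", None)],
]
model = train(seed)
proposal = tag(["river", "bank"], 1, model)
```

In a workflow like the one the abstract outlines, proposals of this kind would be treated as drafts and passed to human annotators for correction.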

Bibliographic record

Source
International Conference on Large-Scale Knowledge Resources | 2008 | 8 pages
Authors
Sue-jin Ker; Chu-Ren Huang; Jia-Fei Hong; Shi-Yin Liu; Hui-Ling Jian; I-Li Su; Shu-Kai Hsieh
Original format: PDF
Chinese Library Classification: TP311.13
Keywords
word sense disambiguation; sense tagged corpus; natural language processing; bootstrap method

Similar literature

1. Design and performance of a high-pressure xenon gas TPC as a prototype for a large-scale neutrinoless double-beta decay search [J]. S Ban, M Hirose, A K Ichikawa. Prog. Theor. Exp. Phys. 2020, No. 3
2. Oxymoron generation using an association word corpus and a large-scale N-gram corpus [J]. Yamane Hiroaki, Hagiwara Masafumi. Soft computing: A fusion of foundations, methodologies and applications. 2015, No. 4
3. The Anatomy of Prototypes: Prototypes as Filters, Prototypes as Manifestations of Design Ideas [J]. YOUN-KYUNG LIM, ERIK STOLTERMAN, JOSH TENENBERG. ACM Transactions on Computer-Human Interaction. 2008, No. 2
4. Design and Prototype of a Large-Scale and Fully Sense-Tagged Corpus [C]. Sue-jin Ker, Chu-Ren Huang, Jia-Fei Hong. Large-Scale Knowledge Resources: Construction and Application. 2008
5. Prototype For X (PFX): A Prototyping Framework to Support Product Design [D]. Menold, Jessica Dolores. 2017
6. A System-of-Systems Bio-Inspired Design Process: Conceptual Design and Physical Prototype of a Reconfigurable Robot Capable of Multi-Modal Locomotion [O]. Ning Tan, Zhenglong Sun, Rajesh Elara Mohan. 2019
7. Design and performance of a high-pressure xenon gas TPC as a prototype for a large-scale neutrinoless double-beta decay search [O]. S Ban, M Hirose, A K Ichikawa. 2020
