Chinese Sentence Pattern Feature Extraction Based on Massive Data Analysis

Abstract

In the era of Data Technology, data is characterized by huge scale, modal diversity, and rapid growth, and the value of Chinese-language corpora has multiplied accordingly. Extracting and applying Chinese sentence features with data mining and machine learning, built on the Language Technology Platform (LTP), a Chinese language processing system, offers a new perspective and entry point for Chinese information processing. In this paper, dependency grammar is chosen for sentence pattern analysis, and a text representation model consisting of sequences and vectors is established. A specialized “Chinese Sentence Pattern Retrieve Library” containing 1,032,480 sentences and 92,451 distinct sentence patterns is built to provide a sentence pattern database for further studies of special sentence patterns. Using this database, statistics and a preliminary analysis are produced for the sentence patterns of articles in different genres. The results show that Chinese has about 2,000 core sentence patterns and that commonly used patterns are relatively concentrated: the 10 most frequent patterns account for 50% of occurrences. In certain articles, the proportion of some sentence patterns is much higher or lower than in other articles. These results provide a basis for building sentence pattern feature vectors for articles and for later article-level feature extraction and application.
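
The abstract does not spell out how a sentence pattern is derived from a dependency parse. As a minimal sketch, assume a pattern is the sequence of dependency labels attached directly to the sentence's root, in word order. The Python below shows how such patterns could be counted, how a top-k coverage figure like the 10-pattern/50% statistic is computed, and how an article's pattern proportions become a feature vector. The input format (word, 1-based head index, relation label), the pattern definition, and the function names `sentence_pattern`, `top_k_coverage`, and `pattern_vector` are hypothetical, not taken from the paper; the label names (SBV, VOB, HED, RAD, WP) are borrowed from LTP's dependency label set, and the toy parses are hand-written rather than real LTP output.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical input format: one token per tuple (word, head, relation),
# where head is the 1-based index of the governing word and 0 marks the root.
Token = Tuple[str, int, str]


def sentence_pattern(tokens: Sequence[Token]) -> str:
    """Assumed pattern definition (not the paper's): the labels of the root's
    direct dependents in word order, with the root itself written as HED."""
    root = next(i for i, (_, head, _) in enumerate(tokens, start=1) if head == 0)
    labels = []
    for i, (_, head, rel) in enumerate(tokens, start=1):
        if i == root:
            labels.append("HED")
        elif head == root and rel != "WP":  # drop punctuation attached to the root
            labels.append(rel)
    return "-".join(labels)


def top_k_coverage(counts: Counter, k: int) -> float:
    """Share of all sentences covered by the k most frequent patterns."""
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(k)) / total


def pattern_vector(patterns: List[str], vocab: List[str]) -> List[float]:
    """Article feature vector: the proportion of each vocabulary pattern."""
    counts = Counter(patterns)
    return [counts[p] / len(patterns) for p in vocab]


if __name__ == "__main__":
    # Toy, hand-written parses (not real LTP output) for three short sentences.
    article = [
        [("他", 2, "SBV"), ("吃", 0, "HED"), ("苹果", 2, "VOB"), ("。", 2, "WP")],
        [("我", 2, "SBV"), ("喜欢", 0, "HED"), ("读书", 2, "VOB"), ("。", 2, "WP")],
        [("下雨", 0, "HED"), ("了", 1, "RAD"), ("。", 1, "WP")],
    ]
    patterns = [sentence_pattern(s) for s in article]
    counts = Counter(patterns)
    print(counts.most_common(1))  # [('SBV-HED-VOB', 2)]
    print(top_k_coverage(counts, 1))  # 0.6666666666666666
    print(pattern_vector(patterns, ["SBV-HED-VOB", "HED-RAD"]))
    # [0.6666666666666666, 0.3333333333333333]
```

Skipping punctuation and ignoring deeper dependents keeps this hypothetical pattern inventory small; a finer-grained definition would generally yield many more distinct patterns, so totals such as the 92,451 patterns reported above depend on how a pattern is defined.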

Bibliographic Record

Source
IEEE International Conference on Artificial Intelligence and Computer Applications | 2020 | pp. 77-81 | 5 pages
Authors
Zhengchen Cao; Jingchang Pan; Jiahao Li
Original format
PDF
Keywords
Feature extraction; Databases; Syntactics; Grammar; Data models; Data mining

Similar Literature

1. Discovering Chinese sentence patterns for feature-based opinion summarization [J]. Shiu-Li Huang, Wen-Chi Cheng. Electronic Commerce Research and Applications, 2015, no. 1a6.
2. Feature extraction from massive, dynamic computational data based on proper orthogonal decomposition and feature mining [J]. Yi Wang, Jing Qian, Hongjun Song. Journal of Visualization, 2014, no. 4.
3. An Opinion Feature Extraction Approach Based on a Multidimensional Sentence Analysis Model [J]. Jiunn-Liang Guo, Jhih-En Peng, Hei-Chia Wang. Cybernetics and Systems, 2013, no. 5a8.
4. An Analysis of the Sentence Pattern of Preverbal Object in Ancient Chinese: Taking Interrogative Sentence as an Example [C]. Haiyan Li. International Conference on Culture, Education and Financial Development of Modern Society, 2017.
5. Discriminant analysis based feature extraction for pattern recognition [D]. Wei Wu. 2009.
6. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of towards analysis in chronic lymphocytic leukaemia [O]. Ioannis Kavakiotis, Aliki Xochelli, Andreas Agathangelidis. 2016.
7. Massive power device condition monitoring data feature extraction and clustering analysis using MapReduce and graph model [O]. Hongtao Shen, Peng Tao, Pei Zhao. 2019.

Chinese Sentence Pattern Feature Extraction Based on Massive Data Analysis

摘要

著录项

相似文献

相关主题

期刊订阅