Linguistic data mining with complex networks: A stylometric-oriented approach

Stanisz Tomasz; Kwapien Jaroslaw; Drozdz Stanislaw

Abstract

By representing a text as a set of words and their co-occurrences, one obtains a word-adjacency network, which is a reduced representation of a given language sample. This paper studies whether such a network representation can be used to extract information about the individual language styles of literary texts. By determining selected quantitative characteristics of the networks and applying machine learning algorithms, it is possible to distinguish between texts by different authors. Within the studied set of English and Polish texts, properly rescaled weighted clustering coefficients and weighted degrees of only a few nodes in the word-adjacency networks are sufficient to obtain an authorship attribution accuracy of over 90%. A correspondence between text authorship and word-adjacency network structure can therefore be found. The network representation makes it possible to distinguish individual language styles by comparing the way the authors use particular words and punctuation marks. The presented approach can be viewed as a generalization of authorship attribution methods based on simple lexical features.
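The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the tokenizer, linking only immediately adjacent tokens, the Barrat-style definition of the weighted clustering coefficient, and rescaling the weighted degree by the total strength of the network (the abstract does not say how the quantities are "properly rescaled"). The function names `tokenize`, `build_network`, `weighted_degree`, `weighted_clustering`, and `node_features` are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Split text into lowercase words and single punctuation marks (assumed scheme)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_network(tokens):
    """Undirected weighted graph: edge weight = number of times two tokens are adjacent."""
    weights = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from immediate repetitions
            weights[a][b] += 1
            weights[b][a] += 1
    return {node: dict(nbrs) for node, nbrs in weights.items()}


def weighted_degree(net, node):
    """Node strength: sum of the weights of all edges attached to the node."""
    return sum(net.get(node, {}).values())


def weighted_clustering(net, node):
    """Weighted clustering coefficient in the Barrat et al. form (an assumed choice)."""
    nbrs = net.get(node, {})
    k = len(nbrs)
    if k < 2:
        return 0.0
    s = weighted_degree(net, node)
    total = 0.0
    for u, v in combinations(nbrs, 2):
        if v in net[u]:  # the two neighbours close a triangle
            total += (nbrs[u] + nbrs[v]) / 2
    # Factor 2 converts the sum over unordered pairs to ordered pairs.
    return 2 * total / (s * (k - 1))


def node_features(net, nodes):
    """Feature vector: [rescaled weighted degree, weighted clustering] per chosen node."""
    total_strength = sum(weighted_degree(net, n) for n in net) or 1
    features = []
    for n in nodes:
        features.append(weighted_degree(net, n) / total_strength)
        features.append(weighted_clustering(net, n))
    return features
```

Computing `node_features` for the same few frequent words and punctuation marks in every text gives fixed-length vectors that could be passed to any standard classifier. Which nodes to select and which classifier to use are left open here, since the abstract does not specify them.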

Bibliographic details

Source
Information Sciences: An International Journal | 2019, issue 2019 | 20 pages
Authors
Stanisz Tomasz; Kwapien Jaroslaw; Drozdz Stanislaw
Author affiliations

Polish Acad Sci Inst Nucl Phys Complex Syst Theory Dept Ul Radzikowskiego 152 PL-31342 Krakow Poland;

Polish Acad Sci Inst Nucl Phys Complex Syst Theory Dept Ul Radzikowskiego 152 PL-31342 Krakow Poland;

Polish Acad Sci Inst Nucl Phys Complex Syst Theory Dept Ul Radzikowskiego 152 PL-31342 Krakow Poland;

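Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Automatic information theory; Computer applications; Information and knowledge dissemination; Automation and computer technology
Keywords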
Complex networks; Natural language; Data mining; Stylometry; Authorship attribution;
