Procedia Computer Science

Influence of Lexical, Syntactic and Structural Features and their Combination on Authorship Attribution for Telugu Text



Abstract

Authorship attribution (AA) is the task of identifying the author of an unknown text from a set of known authors. It can be viewed as a text classification problem in which documents are classified by the author's writing style rather than by the topic of the text. In this paper, experimental evaluations were carried out on Telugu text for authorship attribution using various types of features and their combinations. Feature vectors were formed for the training set using lexical, syntactic and structural features and their combinations. A learned model was generated for each of these vectors, and its performance was measured using the F1 metric and accuracy. Too many features can slow the model down, so features that are irrelevant or less relevant were eliminated from the feature vectors using the chi-square metric. A Support Vector Machine (SVM) classifier is used to generate the learned model for feature vectors of each dimensionality. This learned model is then used to assign the anonymous text to one of the known authors.
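The chi-square feature elimination step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation used in the paper, so this assumes the common 2x2 contingency form of the statistic over binary feature presence, scored per author and reduced with a maximum. The function names, the placeholder tokens (`w1`, `common`, and so on) and the toy author labels are all hypothetical, and the SVM training step is omitted.

```python
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Score each feature by its chi-square statistic against the author labels.

    docs   : list of sets, each holding the features present in one document
    labels : list of author names, same length as docs
    Returns {feature: highest chi-square value over all authors}.
    """
    n = len(docs)
    author_count = Counter(labels)
    feat_count = Counter()
    joint = defaultdict(Counter)  # joint[feature][author] = docs with both
    for feats, author in zip(docs, labels):
        for f in feats:
            feat_count[f] += 1
            joint[f][author] += 1

    scores = {}
    for f, n_f in feat_count.items():
        best = 0.0
        for author, n_a in author_count.items():
            a = joint[f][author]  # feature present, this author
            b = n_f - a           # feature present, other authors
            c = n_a - a           # feature absent, this author
            d = n - n_f - c       # feature absent, other authors
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:  # a zero margin means the feature cannot discriminate
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[f] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Toy corpus: placeholder tokens stand in for real lexical features.
docs = [
    {"w1", "w2", "common"},
    {"w1", "common"},
    {"w3", "common"},
    {"w3", "w4", "common"},
]
labels = ["author_A", "author_A", "author_B", "author_B"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)  # ['w1', 'w3']
```

In this toy corpus the token that appears in every document scores zero and is dropped, while tokens used by only one author score highest. The reduced feature set would then be passed to a classifier such as an SVM.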
