Conference: 22nd National Online Meeting, May 15-17, 2001, New York

A NOVEL APPROACH TO AUTOMATIC GENRE CLASSIFICATION


Abstract

Natural language processing researchers are increasingly finding that statistical methods can accomplish tasks once thought to require intellectual understanding, yet few fruitful experiments have been reported for genre recognition algorithms. In this context, we examine whether different genres of text can be distinguished by purely statistical means. To illustrate our approach, we report experiments that distinguish news journal articles from government documents using only the relative frequencies of punctuation marks. In our previous study, we applied discriminant analysis and achieved a correct classification rate of about 80%. In the experiments reported in this paper, we used other statistical techniques to improve our methods and raised the correct classification rate to about 90%. The coefficients of the classifying equations may serve as genre signatures. Once stable genre signatures are detected, the methods developed here can be used to classify web pages automatically into different genres.
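The idea of using only relative punctuation frequencies as features can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the set of marks in `PUNCTUATION`, the toy sample texts, and the helper names are all hypothetical, and a simple nearest-centroid rule stands in for the discriminant analysis and the other statistical techniques mentioned in the abstract, which are not specified here.

```python
from collections import Counter

# Assumption: an illustrative set of marks; the abstract does not list
# which punctuation marks were actually counted.
PUNCTUATION = ".,;:!?\"'()-"


def punctuation_profile(text):
    """Return each mark's share of all counted punctuation marks in text."""
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(PUNCTUATION)
    return [counts[mark] / total for mark in PUNCTUATION]


def centroid(profiles):
    """Average a list of equal-length profiles component by component."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def classify(text, centroids):
    """Pick the genre whose centroid is closest in squared Euclidean distance.

    Nearest-centroid is a stand-in classifier chosen for brevity; it is not
    the discriminant analysis used in the study.
    """
    profile = punctuation_profile(text)

    def squared_distance(center):
        return sum((p - q) ** 2 for p, q in zip(profile, center))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy training texts (invented for illustration, not data from the paper).
news_samples = [
    '"We won," she said. "It was close!"',
    'Who knew? "Nobody," he said, smiling.',
]
government_samples = [
    "Section 2(a): the agency shall report; see subsection (b).",
    "Pursuant to 4(c): funds (if any) shall lapse; notice is required.",
]

centroids = {
    "news": centroid([punctuation_profile(t) for t in news_samples]),
    "government": centroid([punctuation_profile(t) for t in government_samples]),
}

print(classify('"Is it over?" he asked. "Yes," she said.', centroids))
```

In this sketch each genre's centroid plays a role loosely analogous to a genre signature: it is a fixed vector of punctuation proportions against which new texts are compared. With real corpora, a proper discriminant analysis would instead estimate coefficients that weight each mark by how well it separates the genres.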
