Identifying Historical Period and Ethnic Origin of Documents Using Stylistic Feature Sets

Abstract

Text classification is an important and challenging research domain. This paper investigates identifying the historical period and ethnic origin of documents using stylistic feature sets. The application domain is Jewish Law articles written in Hebrew-Aramaic. Such documents present several interesting problems for stylistic classification. First, these documents include words from both languages. Second, Hebrew and Aramaic are morphologically richer than English. The classification is done using six different sets of stylistic features: quantitative features, orthographic features, topographic features, lexical features and vocabulary richness. Each set of features includes various baseline features, some of them formalized by us. SVM was chosen as the machine learning method since it has been very successful in text classification. The quantitative set was found to be very successful and superior to all other sets. Its features are domain-independent and language-independent. It will be interesting to apply these feature sets in general and the quantitative set in particular into other domains as well as into other.
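As a rough illustration of what a language-independent quantitative feature might look like, the sketch below computes three simple counts from raw text. The function name quantitative_features and the three features chosen are assumptions made for this example; the abstract does not list the paper's actual quantitative features, and this is not the authors' implementation.

```python
import re
from statistics import mean


def quantitative_features(text: str) -> dict:
    """Compute a few illustrative quantitative features (hypothetical set)."""
    # Rough sentence split on ., ! and ?; a heuristic, not a real sentence splitter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w+ matches Unicode word characters, so Hebrew and Aramaic letters are counted too.
    words = re.findall(r"\w+", text)
    if not words or not sentences:
        return {"num_words": 0.0, "avg_word_length": 0.0, "avg_sentence_length": 0.0}
    return {
        "num_words": float(len(words)),
        "avg_word_length": mean(len(w) for w in words),
        "avg_sentence_length": len(words) / len(sentences),
    }


if __name__ == "__main__":
    print(quantitative_features("One two three. Four five."))
```

A vector of such values per document could then be passed to an SVM classifier, which is the learning method the abstract names; that step is omitted here because the abstract gives no details of the setup.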

Bibliographic details

Source: Discovery Science; Lecture Notes in Artificial Intelligence; 4265 | 2006 | pp. 102-113 | 12 pages
Conference location: Barcelona (ES)
Authors: Yaakov HaCohen-Kerner; Hananya Beck; Elchai Yehudai; Dror Mughaz
Author affiliations:

Department of Computer Science, Jerusalem College of Technology (Machon Lev), 21 Havaad Haleumi St., P.O.B. 16031, 91160 Jerusalem, Israel;

Department of Computer Science, Bar-Ilan University, 52900 Ramat-Gan, Israel

Original format: PDF
Text language: eng
Chinese Library Classification: Artificial intelligence theory
Date indexed: 2022-08-26 13:59:27

Similar literature

1. STYLISTIC FEATURE SETS AS CLASSIFIERS OF DOCUMENTS ACCORDING TO THEIR HISTORICAL PERIOD AND ETHNIC ORIGIN [J]. Yaakov HaCohen-Kerner, Hananya Beck, Elchai Yehudai. Applied Artificial Intelligence. 2010, No. 8a10

2. WORDS AS CLASSIFIERS OF DOCUMENTS ACCORDING TO THEIR HISTORICAL PERIOD AND THE ETHNIC ORIGIN OF THEIR AUTHORS [J]. Yaakov HaCohen-Kerner, Dror Mughaz, Hananya Beck. Cybernetics and Systems. 2008, No. 3

3. Recognizing the orthography changes for identifying the temporal origin on the example of the Balkan historical documents [J]. Brodic Darko, Amelio Alessia. Neural Computing & Applications. 2019, No. 8

4. Identifying Historical Period and Ethnic Origin of Documents Using Stylistic Feature Sets [C]. Yaakov HaCohen-Kerner, Hananya Beck, Elchai Yehudai. International Conference on Discovery Science. 2006

5. Literary theory: Historical origins, current constructs, derivative approaches and Atlantic Provinces Education Foundation document applications [D]. Knox Lush, Linda Majella. 2002

6. Origins in the USA in the 1980s of the warning that smokeless tobacco is not a safe alternative to cigarettes: a historical, documents-based assessment with implications for comparative warnings on less harmful tobacco/nicotine products [O]. Lynn T. Kozlowski. 2012

7. Performance Evaluation and Benchmarking of Six Texture-Based Feature Sets for Segmenting Historical Documents [O]. Mehri, Maroua; Mhiri, Mohamed; Héroux, Pierre. 2014
