Turkish word n-gram analyzing algorithms for a large scale Turkish corpus - TurCo

Abstract

To calculate the statistical properties of a language, you first need samples of that language; such a sample is called a corpus. An unbalanced, large-scale Turkish text corpus (TurCo) of approximately 362 MB and more than 50 million words was prepared from 12 different resources, including Web sites and novels in Turkish. Different algorithms were tested to obtain the n-gram (1 ≤ n ≤ 5) values. The efficiency of each algorithm was examined by applying it to each piece of the corpus in turn. Detailed results are given only for the two algorithms that do not use database tables, because all the other algorithms take more than one day to run, which makes testing them impractical.
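The abstract does not describe the two table-free algorithms themselves, so the following is only a minimal sketch of what in-memory word n-gram counting for 1 ≤ n ≤ 5 can look like, not the authors' method. The names `word_ngrams` and `count_ngrams` are hypothetical, and the sketch assumes whitespace tokenization, no case folding, and n-grams that do not cross line boundaries.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def word_ngrams(words: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive words as a tuple."""
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


def count_ngrams(lines: Iterable[str], max_n: int = 5) -> Dict[int, Counter]:
    """Count word n-grams for 1 <= n <= max_n, one in-memory Counter per n."""
    counts: Dict[int, Counter] = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        # Assumption: tokens are whitespace-separated and used as-is.
        # Case folding is skipped on purpose, since str.lower() does not
        # apply Turkish dotted/dotless i rules.
        words = line.split()
        for n in counts:
            counts[n].update(word_ngrams(words, n))
    return counts


# Tiny illustrative input (not taken from TurCo).
sample = ["bir dil bir insan", "iki dil iki insan"]
counts = count_ngrams(sample)
```

Because `count_ngrams` accepts any iterable of lines, each piece of a corpus could be passed in as an open file object and processed line by line. A corpus of the size described above may not fit comfortably in memory for the larger values of n, which is one reason to treat this as an illustration rather than a scalable design.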


Bibliographic details

Source
Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on | 2004 | pp. 236-240 | 5 pages
Authors
Cebi Y.; Dalkilic G.
Original format: PDF
Chinese Library Classification: Radio electronics; telecommunications technology
Keywords
natural languages; text analysis; dictionaries; linguistics; word n-gram analyzing algorithm; Turkish text corpus; language statistical properties; programming language


Similar literature

1. Oxymoron generation using an association word corpus and a large-scale N-gram corpus [J]. Yamane Hiroaki, Hagiwara Masafumi. Soft Computing: A Fusion of Foundations, Methodologies and Applications. 2015, No. 4
2. Turkish synonym identification from multiple resources: monolingual corpus, mono/bilingual online dictionaries, and WordNet [J]. Tuğba Yıldız, Banu Diri, Savaş Yıldırım. Turkish Journal of Electrical Engineering and Computer Sciences. 2017, No. 2
3. Finite wordlength analyzing for RLS systolic algorithm based on the square-root-free scaled Givens rotations [J]. Xiong Jun, Liao Guisheng, Wu Shunjun. Journal of Electronics (China). 1997, No. 4
4. Turkish word n-gram analyzing algorithms for a large scale Turkish corpus - TurCo [C]. Cebi Y., Dalkilic G. International Conference on Information Technology Coding and Computing. 2004
5. Facing the new Turkey: The Turco-American Treaty of Lausanne, 1900–1927 [D]. Delgadillo, Charles Edward. 2002
6. Reliability and Validity of the Turkish Version of the Glasgow-Edinburgh Throat Scale: Use for a Symptom Scale of Globus Sensation in Turkish Population [O]. Müge Özçelik Korkmaz, Arzu Tüzüner, Melike Bahçecitapar, 2020
