Crawling by Readability Level

Abstract

The availability of annotated corpora for research in Readability Assessment is still very limited. At the same time, researchers increasingly use the Web as a source of written content to build very large and rich corpora, as part of the Web as Corpus (WaC) initiative. This paper proposes a framework for the automatic generation of large corpora classified by readability. It uses a supervised learning method to add to a crawler a readability filter based on features with low computational cost, so that the crawler collects texts targeted at a specific reading level. We evaluate this framework by comparing a readability-assessed web-crawled corpus to a reference corpus (both corpora are available at http://www.inf.ufrgs.br/pln/resource/CrawlingByReadabilityLevel.zip). The results indicate that these features are good at separating texts of level 1 (initial grades) from texts of other levels. As a result of this work, two Portuguese corpora were constructed: the Wikilivros Readability Corpus, classified by grade level, and a crawled WaC corpus classified by readability level.
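
The idea of a readability filter built on cheap features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the three surface features, the nearest-centroid classifier standing in for the supervised learner, the toy training sentences, and all function names are hypothetical.

```python
import re
from statistics import mean


def shallow_features(text):
    """Cheap surface features (assumed, not taken from the paper):
    words per sentence, characters per word, type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return (0.0, 0.0, 0.0)
    return (
        len(words) / len(sentences),
        mean(len(w) for w in words),
        len(set(words)) / len(words),
    )


def train_centroids(labelled_texts):
    """Supervised step: average the feature vectors of each reading level."""
    by_level = {}
    for text, level in labelled_texts:
        by_level.setdefault(level, []).append(shallow_features(text))
    return {
        level: tuple(mean(column) for column in zip(*vectors))
        for level, vectors in by_level.items()
    }


def predict_level(text, centroids):
    """Assign the level whose centroid is nearest (squared Euclidean distance)."""
    feats = shallow_features(text)
    return min(
        centroids,
        key=lambda level: sum(
            (a - b) ** 2 for a, b in zip(feats, centroids[level])
        ),
    )


def readability_filter(pages, centroids, target_level):
    """Crawler hook: keep only fetched pages predicted at the target level."""
    return [p for p in pages if predict_level(p, centroids) == target_level]


# Toy labelled data (hypothetical): level 1 = initial grades, level 3 = advanced.
TRAIN = [
    ("O gato dorme. O sol brilha. A menina ri.", 1),
    ("O cão corre. A bola cai. O pai lê.", 1),
    ("A disponibilidade de corpora anotados para a avaliação automática "
     "da inteligibilidade textual permanece consideravelmente limitada.", 3),
    ("Os pesquisadores utilizam progressivamente conteúdos disponibilizados "
     "publicamente para construir coleções linguísticas extensas e "
     "representativas.", 3),
]
centroids = train_centroids(TRAIN)

SIMPLE_PAGE = "A lua sobe. O rio corre."
COMPLEX_PAGE = (
    "A incorporação de filtros de inteligibilidade em rastreadores "
    "possibilita coletar textos direcionados a determinados níveis de "
    "escolaridade."
)
kept = readability_filter([SIMPLE_PAGE, COMPLEX_PAGE], centroids, target_level=1)
```

In this sketch the features are left unscaled, so words per sentence dominates the distance; a real filter would normalise the features and train on a graded corpus rather than a handful of sentences.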

Bibliographic record

Source: International Workshop on Computational Processing of the Portuguese Language | 2016 | 398p | 13 pages in total
Authors: Jorge A. Wagner Filho; Rodrigo Wilkens; Leonardo Zilio; Marco Idiart; Aline Villavicencio
Original format: PDF
Chinese Library Classification: TP304.6-53
Keywords: Readability assessment; Web as a corpus; Focused crawling
