Exploiting Attribute Redundancy in Extracting Open Source Forge Websites



Abstract

Open Source Forge (OSF) websites provide information on large numbers of open source software projects, and extracting these web data is important for open source research. Traditional extraction methods detect page templates by string matching across pages, which is time-consuming. A recent work published in VLDB exploits entities that are redundant across websites to detect the web page coordinates of those entities, and its experiments give good results when these coordinates are then used to extract other entities from the target site. However, OSF websites share few redundant project entities. This paper proposes a modified version of that redundancy-based method, tailored for OSF websites, which relies on a similar but weaker presumption: that entity attributes, rather than whole entities, are redundant. Like the previous work, we construct a seed database to detect the web page coordinates of the redundancies, but we do so entirely at the attribute level. In addition, we apply attribute name verification to reduce false positives during extraction. The experimental results indicate that our approach is competent at extracting data from OSF websites, a scenario in which the previous method cannot be applied.

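The abstract describes the attribute-level approach only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation: page "coordinates" are modeled as tag/index paths in a well-formed XHTML tree, the seed database is a dict mapping each attribute to a set of known values, coordinates are chosen by majority vote over seed matches, and attribute name verification is modeled as checking the text of the preceding sibling element (e.g. "License:"). All names (`walk`, `learn_coordinates`, `extract`, `SEED_DB`, `ATTR_NAMES`) and the sample pages are hypothetical.

```python
"""Hypothetical sketch of attribute-level redundancy-based extraction."""
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET


def walk(elem, path=None, label=None):
    """Yield (path, element, label) for every element in the tree.

    path  -- tuple of (tag, index-among-same-tag-siblings) steps; this is the
             "page coordinate" in this sketch (an assumed scheme).
    label -- stripped text of the previous sibling, treated as a candidate
             attribute name such as "License:" (None for a first child).
    """
    if path is None:
        path = ((elem.tag, 0),)
    yield path, elem, label
    seen = Counter()
    prev_text = None
    for child in elem:
        idx = seen[child.tag]
        seen[child.tag] += 1
        yield from walk(child, path + ((child.tag, idx),), prev_text)
        prev_text = (child.text or "").strip()


def learn_coordinates(pages, seed_db):
    """Vote, per attribute, for the path where known seed values appear."""
    votes = defaultdict(Counter)
    for page in pages:
        for path, elem, _ in walk(ET.fromstring(page)):
            text = (elem.text or "").strip()
            for attr, values in seed_db.items():
                if text in values:
                    votes[attr][path] += 1
    # Keep the most frequently matched path for each attribute.
    return {attr: c.most_common(1)[0][0] for attr, c in votes.items()}


def extract(page, coords, attr_names):
    """Read each attribute at its learned path if the label names that attribute."""
    found = {}
    for path, elem, label in walk(ET.fromstring(page)):
        for attr, coord in coords.items():
            # Attribute name verification: reject values whose label does not match.
            if path == coord and label and attr_names[attr] in label.lower():
                found[attr] = (elem.text or "").strip()
    return found


SEED_DB = {"license": {"GPLv2", "Apache-2.0"}, "language": {"Java", "Python"}}
ATTR_NAMES = {"license": "license", "language": "language"}
TRAIN = [
    "<html><body><div><span>License:</span><span>GPLv2</span>"
    "<span>Language:</span><span>Java</span></div></body></html>",
    "<html><body><div><span>License:</span><span>Apache-2.0</span>"
    "<span>Language:</span><span>Python</span></div></body></html>",
]
NEW = ("<html><body><div><span>License:</span><span>MIT</span>"
       "<span>Language:</span><span>Rust</span></div></body></html>")
NOISY = ("<html><body><div><span>Downloads:</span><span>1200</span>"
         "<span>Language:</span><span>Go</span></div></body></html>")

if __name__ == "__main__":
    coords = learn_coordinates(TRAIN, SEED_DB)
    print(extract(NEW, coords, ATTR_NAMES))    # -> {'license': 'MIT', 'language': 'Rust'}
    print(extract(NOISY, coords, ATTR_NAMES))  # -> {'language': 'Go'}
```

In this sketch, voting over several seed-matched pages is what lets the weaker attribute-level presumption work: one value match may be coincidental, but a path that repeatedly holds known values is likely a template slot. The label check then discards the value at a learned path when a page's layout differs, as with the "Downloads:" row in `NOISY`.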

Bibliographic Record

Source
2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery | 2012 | pp. 13-20 | 8 pages
Conference location: Sanya (CN)
Authors
Li Xiang; Zhu Yanxu; Yin Gang; Wang Tao; Wang Huaimin
Original format: PDF
Text language: English
Chinese Library Classification: computing technology, computer technology

Similar Literature


1. SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates [J]. Ben J. Hayes, Kjetil Nilsen, Paul R. Berg. Bioinformatics, 2007, no. 13.
2. Exploiting Attribute Redundancy in Extracting Open Source Forge Websites [C]. Li Xiang, Zhu Yanxu, Yin Gang. International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2012.
3. ConCORD: Tracking and Exploiting Cross-Node Memory Content Redundancy in Large-Scale Parallel Systems [D]. Xia, Lei. 2013.
4. Redundancy and Plasticity of Neutralizing Antibody Responses Are Cornerstone Attributes of the Human Immune Response to the Smallpox Vaccine [O]. Mohammed Rafii-El-Idrissi Benhnia, Megan M. McCausland, Hua-Poo Su. 2008.
5. SNP detection exploiting multiple sources of redundancy in large EST collections improves validation rates [O]. Ben J. Hayes, Kjetil Nilsen, Paul R. Berg. 2007.