Who is .com? Learning to Parse WHOIS Records


Abstract

WHOIS is a long-established protocol for querying information about the 280M+ registered domain names on the Internet. Unfortunately, while such records are accessible in a "human-readable" format, they do not follow any consistent schema and thus are challenging to analyze at scale. Existing approaches, which rely on manual crafting of parsing rules and per-registrar templates, are inherently limited in coverage and fragile to ongoing changes in data representations. In this paper, we develop a statistical model for parsing WHOIS records that learns from labeled examples. Our model is a conditional random field (CRF) with a small number of hidden states, a large number of domain-specific features, and parameters that are estimated by efficient dynamic-programming procedures for probabilistic inference. We show that this approach can achieve extremely high accuracy (well over 99%) using modest amounts of labeled training data, that it is robust to minor changes in schema, and that it can adapt to new schema variants by incorporating just a handful of additional examples. Finally, using our parser, we conduct an exhaustive survey of the registration patterns found in 102M com domains.
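As a rough illustration of the kind of model the abstract describes, the sketch below labels each line of a WHOIS record with a linear-chain scoring model and decodes the best label sequence by dynamic programming (Viterbi). It is not the authors' implementation: the label set, the feature names, the hand-set weights in `EMIT`, and the constants `OTHER_BIAS` and `STAY_BONUS` are all assumptions made for this example. A real CRF would estimate its weights from labeled records rather than fix them by hand.

```python
import re

# Hypothetical label set for WHOIS lines (an assumption for this sketch).
LABELS = ["registrar", "registrant", "date", "other"]

def line_features(line):
    """Binary features for one line; the feature names are illustrative."""
    text = line.lower()
    key = text.split(":", 1)[0] if ":" in text else ""
    return {
        "key_has_registrar": "registrar" in key,
        "key_has_registrant": "registrant" in key,
        "key_has_date": "date" in key,
        "has_iso_date": bool(re.search(r"\d{4}-\d{2}-\d{2}", text)),
    }

# Hand-set weights stand in for parameters a CRF would learn from data.
EMIT = {
    ("key_has_registrar", "registrar"): 2.0,
    ("key_has_registrant", "registrant"): 2.0,
    ("key_has_date", "date"): 1.5,
    ("has_iso_date", "date"): 1.5,
}
OTHER_BIAS = 0.25  # small default score for the catch-all label
STAY_BONUS = 0.5   # transition score for repeating the previous label

def emission(feats, label):
    """Score of assigning `label` to a line with features `feats`."""
    score = OTHER_BIAS if label == "other" else 0.0
    for (feat, lab), weight in EMIT.items():
        if lab == label and feats[feat]:
            score += weight
    return score

def transition(prev, cur):
    """Score of moving from label `prev` to label `cur`."""
    return STAY_BONUS if prev == cur else 0.0

def viterbi(lines):
    """Return the highest-scoring label sequence for the given lines."""
    if not lines:
        return []
    feats = [line_features(line) for line in lines]
    # best[t][y]: best score of any label path ending in y at line t.
    best = [{y: emission(feats[0], y) for y in LABELS}]
    back = []  # back[t-1][y]: best predecessor of y at line t
    for t in range(1, len(lines)):
        scores, ptrs = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transition(p, y))
            scores[y] = best[-1][prev] + transition(prev, y) + emission(feats[t], y)
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final label.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for ptrs in reversed(back):
        y = ptrs[y]
        path.append(y)
    return path[::-1]

record = [
    "Registrar: Example Registrar, Inc.",
    "Creation Date: 2015-01-01",
    "Registrant Name: Jane Doe",
    "123 Main Street",
]
print(viterbi(record))
# -> ['registrar', 'date', 'registrant', 'registrant']
```

The last line of the sample record has no key and triggers no features, yet it is labeled `registrant` because the transition score favors continuing the previous label. This use of context is what a sequence model can add over independent per-line rules.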


Bibliographic details

Source: Internet Measurement Conference, 2015, 12 pages
Authors: Suqi Liu; Ian Foster; Stefan Savage; Geoffrey M. Voelker; Lawrence K. Saul
Original format: PDF
Chinese Library Classification: TP393-53
Keywords: WHOIS; Named Entity Recognition; Machine Learning; Information Extraction


