Conference: International Conference on Computational Linguistics

Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models

Abstract

Extracting location names from informal and unstructured social media data requires identifying referent boundaries and partitioning compound names. Variability, particularly systematic variability in location names (Carroll, 1983), challenges the identification task. Some of this variability can be anticipated as operations within a statistical language model, in this case drawn from gazetteers such as OpenStreetMap (OSM), GeoNames, and DBpedia. This permits evaluation of an observed n-gram in Twitter targeted text as a legitimate location name variant from the same location-context. Using n-gram statistics and location-related dictionaries, our Location Name Extraction tool (LNEx) handles abbreviations and automatically filters and augments the location names in gazetteers (handling name contractions and auxiliary contents) to help detect the boundaries of multi-word location names and thereby delimit them in texts. We evaluated our approach on 4,500 event-specific tweets from three targeted streams to compare the performance of LNEx against that of ten state-of-the-art taggers that rely on standard semantic, syntactic and/or orthographic features. LNEx improved the average F-Score by 33-179%, outperforming all taggers. Further, LNEx is capable of stream processing.
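
The gazetteer filtering, augmentation, and n-gram matching described in the abstract can be illustrated with a small Python sketch. This is not the LNEx implementation: it substitutes exact longest-match lookup for the statistical language model, and the gazetteer entries, the abbreviation dictionary, the `OPTIONAL_TAIL_WORDS` list, and all function names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy gazetteer; real entries would come from OSM, GeoNames, or DBpedia.
RAW_GAZETTEER = [
    "Riverside School (Primary)",
    "Chennai Central Railway Station",
    "Anna Nagar",
]

# Assumed abbreviation dictionary (illustrative only).
ABBREVIATIONS = {"stn": "station", "rd": "road", "st": "street"}

# Assumed category words that writers often drop from the end of a name.
OPTIONAL_TAIL_WORDS = {"station", "school", "railway"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def augment(name):
    """Yield token-tuple variants of one gazetteer name."""
    # Filter auxiliary content such as "(Primary)".
    core = re.sub(r"\([^)]*\)", " ", name)
    tokens = tuple(tokenize(core))
    if not tokens:
        return
    yield tokens
    # Contractions: drop trailing category words one at a time.
    while len(tokens) > 1 and tokens[-1] in OPTIONAL_TAIL_WORDS:
        tokens = tokens[:-1]
        yield tokens


def build_gazetteer(names):
    """Collect every variant of every name into one lookup set."""
    variants = set()
    for name in names:
        variants.update(augment(name))
    return variants


def extract_locations(text, gazetteer):
    """Greedy longest-match scan of the text's n-grams against the gazetteer."""
    tokens = [ABBREVIATIONS.get(t, t) for t in tokenize(text)]
    max_n = max((len(v) for v in gazetteer), default=0)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate n-gram first so multi-word names win.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in gazetteer:
                found.append(" ".join(candidate))
                i += n
                break
        else:
            i += 1
    return found
```

Calling `extract_locations("Water logging at Chennai Central Railway stn, avoid Anna Nagar", build_gazetteer(RAW_GAZETTEER))` expands `stn` to `station` before matching, so the four-word name is delimited as one span. A statistical model, as the abstract describes, would additionally score variants that are not exact gazetteer entries.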