NSF/NIJ Symposium on Intelligence and Security Informatics

Automatic Construction of Cross-Lingual Networks of Concepts from the Hong Kong SAR Police Department

Abstract

The tragic events of September 11 have prompted rapid growth in attention to national security and criminal analysis. In the national security world, very large volumes of data and information are generated and gathered. Much of this data and information, written in different languages and stored in different locations, may be seemingly unconnected. Cross-lingual semantic interoperability is therefore a major challenge in generating an overview of this disparate data and information so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query, but users may use keywords that differ from those used in the documents. There are then two different term spaces, one for the users and another for the documents. The problem can be viewed as the creation of a thesaurus that relates terms in the two spaces. The creation of such relationships would allow the system to match queries with relevant documents even though they contain different terms. Apart from this, terrorists and other criminals may communicate in languages other than English. Translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model that crosses the language boundary. However, collecting parallel corpora between European and Oriental languages is not an easy task, owing to the unique linguistic and grammatical structures of Oriental languages. This paper first presents a text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web. It then reports an algorithmic approach to generating a robust knowledge base through statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consists of a thesaurus-like semantic network knowledge base, which can aid semantics-based cross-lingual information management and retrieval.
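
The corpus-based idea described in the abstract can be illustrated with a minimal sketch. It assumes the press releases are already aligned into English/Chinese document pairs and already tokenised, and it scores term pairs with the Dice coefficient over document-level co-occurrence. The Dice coefficient, the function name `cross_lingual_associations`, the `min_score` threshold, and the toy corpus are all assumptions made for illustration; the abstract does not specify which correlation statistic the paper actually uses.

```python
from collections import Counter
from itertools import product


def cross_lingual_associations(aligned_pairs, min_score=0.6):
    """Build a small English -> Chinese association network.

    aligned_pairs: iterable of (english_terms, chinese_terms), one entry per
    aligned document pair. Scores are Dice coefficients of document-level
    co-occurrence (an illustrative choice, not the paper's stated method).
    """
    en_df = Counter()  # aligned pairs whose English side contains the term
    zh_df = Counter()  # aligned pairs whose Chinese side contains the term
    co_df = Counter()  # aligned pairs containing both terms

    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_df.update(en_set)
        zh_df.update(zh_set)
        co_df.update(product(en_set, zh_set))

    network = {}
    for (en, zh), co in co_df.items():
        score = 2 * co / (en_df[en] + zh_df[zh])
        if score >= min_score:  # keep only the stronger links
            network.setdefault(en, {})[zh] = score
    return network


# Hypothetical toy corpus of three aligned press releases.
pairs = [
    (["police", "robbery"], ["警方", "劫案"]),
    (["police", "arrest"], ["警方", "拘捕"]),
    (["robbery", "arrest"], ["劫案", "拘捕"]),
]
network = cross_lingual_associations(pairs)
```

In this toy corpus each English term co-occurs with its Chinese counterpart in every pair where either appears, so only those links pass the threshold. A real corpus would need word segmentation for the Chinese side and far more documents before such scores become meaningful.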