Using canonical forms to develop a dictionary of names in a text
Abstract
Descriptive canonical forms of entity types are created by scanning one or more documents in a database of a computer system to identify proper names that appear in the documents as raw names. Each raw name contains zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings. The raw names are "cleaned" and "split" repeatedly, until no "cleaning and splitting conditions" are met, to obtain a list of clean and split candidate names. Anchor names, which unambiguously represent an entity type, are selected from the list. Each anchor name has one or more entity-type attribute values. Variant names, which are clean and split candidate names sharing one or more attribute values with the anchor name, are combined with the anchor name to create an equivalence group of names that refer to the same entity. A canonical form is generated for the group from a subset of the anchor name's attributes. A canonical form is created in this manner for all of the clean and split candidate names on the list.
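
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The substring lists (`LEADING`, `TRAILING`, `MEDIAL`), the function names, the two attributes (`first`, `last`), and the "Last, First" canonical form are all assumptions made for illustration; the abstract does not specify any of them. The sketch also handles only person-like names, whereas the abstract covers entity types in general.

```python
# Hypothetical substring lists; the abstract does not enumerate them.
LEADING = ("Dr. ", "Mr. ", "Ms. ", "President ")
TRAILING = (" Jr.", " Sr.")
MEDIAL = (" and ", " & ")


def clean_and_split(raw_names):
    """Clean and split raw names until no condition applies any more."""
    pending = list(raw_names)
    done = []
    while pending:
        name = pending.pop().strip()
        # Splitting condition: a medial substring joins two names.
        medial = next((m for m in MEDIAL if m in name), None)
        if medial is not None:
            pending.extend(name.split(medial, 1))
            continue
        # Cleaning condition: a leading substring is present.
        lead = next((p for p in LEADING if name.startswith(p)), None)
        if lead is not None:
            pending.append(name[len(lead):])
            continue
        # Cleaning condition: a trailing substring is present.
        trail = next((s for s in TRAILING if name.endswith(s)), None)
        if trail is not None:
            pending.append(name[:-len(trail)])
            continue
        if name and name not in done:
            done.append(name)
    return done


def attributes(name):
    """Assumed attribute model: a lone token is treated as a last name."""
    tokens = name.split()
    if len(tokens) >= 2:
        return {"first": tokens[0], "last": tokens[-1]}
    return {"last": tokens[0]}


def build_groups(candidates):
    """Map each canonical form to its equivalence group of names."""
    # Assumption: a name with both attributes is unambiguous (an anchor).
    anchors = [n for n in candidates if len(attributes(n)) == 2]
    groups = {}
    for anchor in anchors:
        a = attributes(anchor)
        # Canonical form built from the anchor's attributes.
        canonical = f"{a['last']}, {a['first']}"
        members = groups.setdefault(canonical, [])
        for name in candidates:
            v = attributes(name)
            shared = any(a.get(k) == val for k, val in v.items())
            conflict = any(k in a and a[k] != val for k, val in v.items())
            # A variant shares an attribute value and contradicts none.
            if shared and not conflict and name not in members:
                members.append(name)
    return groups
```

Under these assumptions, calling `build_groups(clean_and_split(["Dr. John Smith and Mary Jones", "Mr. Smith", "Jones"]))` places "Smith" and "John Smith" in the group keyed by "Smith, John", and "Jones" and "Mary Jones" in the group keyed by "Jones, Mary". A fuller treatment would also need to handle a variant that matches more than one anchor, which this sketch does not resolve.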