Extracting ordered list of words from documents comprising text and code fragments, without interpreting the code fragments

Abstract

A computer implemented method is applied to convert a formatted document or text to an ordered list of words. The formatted document is first partitioned into first and second data structures stored in a memory of a computer. The first data structure stores text fragments, and the second data structure stores code fragments of the formatted document. Adjacent text fragments are concatenated to form possible ordered word lists. Possible words are matched against a dictionary of representative words. A best ordered word list having the fewest number of words is selected from the possible ordered word lists.
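The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that code fragments are HTML-style tags, that whitespace always separates words, and that a single text fragment may stand as a word even when it is not in the dictionary. The names `partition`, `fewest_words`, and `extract_words` are hypothetical, and punctuation handling is omitted.

```python
import re

# Assumption: code fragments look like HTML-style tags. The abstract does not
# fix a markup syntax, so this pattern is only a stand-in.
CODE_RE = re.compile(r"<[^>]*>")


def partition(document):
    """Split the document into text fragments and code fragments.

    The code fragments are stored but never interpreted.
    """
    text_fragments, code_fragments = [], []
    pos = 0
    for match in CODE_RE.finditer(document):
        if match.start() > pos:
            text_fragments.append(document[pos:match.start()])
        code_fragments.append(match.group())
        pos = match.end()
    if pos < len(document):
        text_fragments.append(document[pos:])
    return text_fragments, code_fragments


def fewest_words(pieces, dictionary):
    """Return the shortest word list obtainable by joining adjacent pieces.

    A join of two or more pieces is accepted only if the result is in the
    dictionary; a single piece is always accepted (assumed fallback).
    """
    n = len(pieces)
    best = [None] * (n + 1)  # best[i] = shortest word list covering pieces[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            word = "".join(pieces[j:i])
            if i - j > 1 and word.lower() not in dictionary:
                continue
            candidate = best[j] + [word]
            if best[i] is None or len(candidate) < len(best[i]):
                best[i] = candidate
    return best[n]


def extract_words(document, dictionary):
    """Convert a formatted document to an ordered list of words."""
    text_fragments, _code_fragments = partition(document)
    words = []
    # A NUL byte marks each boundary where code used to sit, so that
    # splitting on whitespace keeps code-separated pieces in the same run.
    for run in "\x00".join(text_fragments).split():
        pieces = [p for p in run.split("\x00") if p]
        words.extend(fewest_words(pieces, dictionary))
    return words


if __name__ == "__main__":
    vocab = {"hello", "world", "foo", "bar"}
    print(extract_words("<b>H</b>ello <i>wor</i>ld foo<br>bar", vocab))
```

In this sketch, "H" and "ello" are joined because "hello" is in the dictionary, while "foo" and "bar" stay separate because "foobar" is not. Selecting the candidate with the fewest words is what favors the joined form whenever the dictionary supports it.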

Bibliographic details

Publication number: US6470362B1

Patent type:
Publication date: 2002-10-22

Original format: PDF
Applicant/patentee: COMPAQ COMPUTER CORPORATION

Application number: US19970857458
Inventors: ROBERT ALAN EUSTACE; JEREMY DION

Filing date: 1997-05-16
Classification: G06F70/00
Country: US
Date added to database: 2022-08-22 00:48:29
