Extracting ordered list of words from documents comprising text and code fragments, without interpreting the code fragments
Abstract
A computer-implemented method converts a formatted document or text into an ordered list of words. The formatted document is first partitioned into two data structures stored in a computer's memory: the first stores the document's text fragments, and the second stores its code fragments. Adjacent text fragments are concatenated to form candidate ordered word lists, and the candidate words are matched against a dictionary of representative words. The candidate ordered word list with the fewest words is selected as the best list.
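The abstract does not specify the code-fragment syntax or the search procedure, so the following is only a minimal sketch under stated assumptions. It assumes code fragments are angle-bracket markup tags (the hypothetical `CODE_FRAGMENT` pattern), that candidate word breaks occur only at whitespace or where a code fragment separated two text fragments, and that dictionary lookups are case-insensitive. The helper names `partition`, `runs`, `best_words`, and `extract_words` are hypothetical, and the fallback for runs with no all-dictionary split is an assumption as well. The tags are stored but never interpreted; for each run of adjacent text fragments, a small dynamic program picks the dictionary-valid grouping with the fewest words.

```python
from __future__ import annotations

import re

# Assumption: code fragments are markup tags such as "<b>" or "</td>".
CODE_FRAGMENT = re.compile(r"<[^>]*>")


def partition(document: str) -> tuple[list[str], list[str]]:
    """Split a document into text fragments and code fragments (tags are kept, never interpreted)."""
    return CODE_FRAGMENT.split(document), CODE_FRAGMENT.findall(document)


def runs(text_fragments: list[str]) -> list[list[str]]:
    """Group text into runs of pieces separated only by code fragments, not by whitespace.

    A boundary between two pieces of the same run may or may not be a real word break.
    """
    result: list[list[str]] = []
    current: list[str] = []
    for fragment in text_fragments:
        first, *rest = re.split(r"\s+", fragment)
        current.append(first)  # no whitespace before it, so it continues the current run
        for part in rest:       # whitespace precedes each of these, so start a new run
            result.append(current)
            current = [part]
    result.append(current)
    cleaned = [[piece for piece in run if piece] for run in result]
    return [run for run in cleaned if run]


def best_words(pieces: list[str], dictionary: set[str]) -> list[str]:
    """Join adjacent pieces into dictionary words, keeping the grouping with the fewest words."""
    n = len(pieces)
    best: list[list[str] | None] = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            prefix = best[j]
            word = "".join(pieces[j:i])
            if prefix is None or word.lower() not in dictionary:
                continue
            if best[i] is None or len(prefix) + 1 < len(best[i]):
                best[i] = prefix + [word]
    # Assumed fallback: if no all-dictionary grouping exists, treat the whole run as one word.
    return best[n] if best[n] is not None else ["".join(pieces)]


def extract_words(document: str, dictionary: set[str]) -> list[str]:
    """Return the ordered word list for a document, ignoring the content of code fragments."""
    text_fragments, _code_fragments = partition(document)
    words: list[str] = []
    for run in runs(text_fragments):
        words.extend(best_words(run, dictionary))
    return words
```

For example, with a dictionary of `{"hello", "world"}`, `extract_words("hel<b>lo</b> world", ...)` rejoins the word split by the bold tag and yields `["hello", "world"]`. With `{"cat", "dog"}`, the input `"<td>cat</td><td>dog</td>"` yields `["cat", "dog"]`, because `"catdog"` is not a dictionary word. When both a compound and its parts are in the dictionary, the fewest-words rule prefers the compound.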