Conference: ISTE International Conference on Transdisciplinary Engineering

Using Machine Learning Approach to Identify Synonyms for Document Mining

Abstract

Technical and knowledge documents, such as research papers, patents, and requests for quotation (RFQs), are important knowledge references for multiple purposes. For example, enterprises and R&D institutions often need to conduct literature and patent searches and analyses before, during, and after R&D and commercialization. These knowledge discovery processes help them identify prior art related to current R&D efforts, so that they avoid duplicating research or infringing upon existing intellectual property rights (IPRs). Documents commonly contain many synonyms (i.e., words and phrases with near-identical meanings), which can hinder search results if queries do not account for them. For instance, a "freedom-to-operate" (FTO) patent search may not find all related patents if synonyms are not taken into consideration. This research develops methodologies for generating domain-specific "word" and "phrase" synonym dictionaries using machine learning. Both dictionaries are generated and validated using more than 2,000 solar power related patents as the test document set. The test results show that, in the solar power domain, both the word-level and phrase-level dictionaries identify synonyms effectively and thus significantly improve patent search results.
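To make the recall problem concrete, here is a minimal Python sketch of synonym-based query expansion. It is not the authors' method: the abstract does not say how the dictionaries are learned, so the dictionary here is hand-written, and the names `SYNONYMS`, `DOCS`, `expand_query`, and `search` as well as the sample texts are hypothetical illustrations.

```python
from typing import Dict, List, Set

# Hypothetical phrase-level synonym dictionary (illustrative entries only;
# in the paper such a dictionary would be produced by machine learning).
SYNONYMS: Dict[str, Set[str]] = {
    "solar cell": {"photovoltaic cell", "pv cell"},
}

# Hypothetical patent-like snippets standing in for a document set.
DOCS: List[str] = [
    "A solar cell with an anti-reflective coating.",
    "A photovoltaic cell module mounted on a roof.",
    "A wind turbine blade assembly.",
]


def expand_query(query: str, synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the normalized query term plus any synonyms listed for it."""
    term = query.lower().strip()
    return {term} | synonyms.get(term, set())


def search(query: str, documents: List[str],
           synonyms: Dict[str, Set[str]]) -> List[int]:
    """Return indices of documents containing any expanded query term."""
    terms = expand_query(query, synonyms)
    return [
        i for i, doc in enumerate(documents)
        if any(term in doc.lower() for term in terms)
    ]


if __name__ == "__main__":
    print(search("solar cell", DOCS, {}))        # search without synonyms
    print(search("solar cell", DOCS, SYNONYMS))  # search with synonyms
```

With an empty dictionary the query matches only the first snippet; with the synonym dictionary it also matches the "photovoltaic cell" snippet, which is the kind of missed document the abstract describes. Plain substring matching is used only to keep the sketch short.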