Journal of Computing and Information Science in Engineering

A Framework Based on K-Means Clustering and Topic Modeling for Analyzing Unstructured Manufacturing Capability Data


Abstract

Natural language descriptions of the capabilities of manufacturing companies can be found in multiple locations, including company websites, legacy system databases, and ad hoc documents and spreadsheets. To unlock the value of unstructured capability data and learn from it, advanced quantitative methods supported by machine learning and natural language processing techniques are needed. This research proposes a hybrid unsupervised learning methodology that uses K-means clustering and topic modeling to build clusters of suppliers based on their capabilities, automatically infer topics from the created clusters, and discover nontrivial patterns in manufacturing capability corpora. The capability data is extracted either directly from the websites of manufacturing firms or from their profiles in e-sourcing portals and directories. The feature extraction and dimensionality reduction steps in this work are supported by N-gram extraction and latent semantic analysis (LSA). The proposed clustering method is validated experimentally on a dataset of 150 capability descriptions collected from web-based sourcing directories such as the Thomas Net directory for manufacturing companies. The results of the experiment show that the proposed method creates supplier clusters with high accuracy. Two example applications of the proposed framework, supplier similarity measurement and automated thesaurus creation, are introduced in this paper.
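The clustering stage described in the abstract can be illustrated with a minimal, dependency-free sketch. It covers only N-gram extraction and K-means; the LSA and topic modeling steps are left out to keep the example short. The four capability descriptions are invented for illustration, and the TF-IDF weighting, the farthest-first initialization, and all function names are assumptions rather than details taken from the paper.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max for one description."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Build L2-normalized TF-IDF vectors over the n-gram vocabulary.

    The weighting scheme is an assumption; the abstract does not name one.
    """
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    # Document frequency: number of descriptions containing each n-gram.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        vec = [c[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        centroids.append(
            max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sqdist(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical capability descriptions, not drawn from the paper's dataset.
docs = [
    "cnc machining of aluminum parts",
    "precision cnc machining and milling",
    "sheet metal stamping and welding",
    "metal stamping and welding services",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

In a fuller pipeline, the TF-IDF matrix would be projected into a lower-dimensional latent space (the LSA step) before clustering, and a topic model would then be fitted to the descriptions in each resulting cluster.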
