Multi-level K-means text clustering technique for topic identification for competitor intelligence


Abstract

The proliferation of the web as an easily accessible information resource has led many corporations to gather competitor intelligence from the internet. While such information is easy to collect, collating and structuring it for business decision makers is difficult. Topic identification techniques based on text clustering are expected to be very useful for this application. With appropriate clustering techniques, a competitor intelligence corpus gathered from the web can be divided into topical groups, which makes the information comparatively easier for managers to analyze. This paper presents a study of the effectiveness of the standard K-means text clustering algorithm applied at multiple levels, in a top-down, divide-and-conquer fashion, to a competitor intelligence corpus created from publicly available web sources such as news, blogs, and research papers. The paper also demonstrates the capability of the Multi-level K-means (ML-KM) clustering technique to determine the optimal number of clusters as part of the clustering process. The cluster validity metric used to determine cluster quality is explained, along with the other user-controlled configuration parameters. It is found empirically that ML-KM also addresses one problem of stand-alone standard K-means (S-KM): its bias towards convex, spherical clusters, which results in bigger clusters subsuming smaller ones. This ability to detect smaller clusters makes ML-KM more suitable than stand-alone S-KM for clustering competitor intelligence text corpora, where small, niche clusters can lead to important findings. Experimental results are presented for both ML-KM and stand-alone S-KM on the competitor intelligence corpus as well as the standard Reuters corpus.
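
The top-down, divide-and-conquer idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: each level splits a cluster in two (k = 2), the mean silhouette coefficient stands in for the paper's cluster validity metric (which the abstract does not name), and `min_quality` and `min_size` are hypothetical user-controlled parameters. The 2-D points stand in for document vectors such as TF-IDF.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean (centroid) of a list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(points, iters=20):
    """Standard K-means with k=2 and deterministic farthest-point seeding."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist(p, c0))
    centroids = [c0, c1]
    labels = []
    for _ in range(iters):
        new_labels = [0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
                      for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in (0, 1):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = mean(members)
    return labels

def silhouette(points, labels):
    """Mean silhouette of a two-way split (assumed validity metric)."""
    groups = [[p for p, l in zip(points, labels) if l == j] for j in (0, 1)]
    if not groups[0] or not groups[1]:
        return -1.0  # degenerate split: treat as worst quality
    total = 0.0
    for p, l in zip(points, labels):
        own, other = groups[l], groups[1 - l]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = sum(dist(p, q) for q in other) / len(other)
        total += (b - a) / max(a, b)
    return total / len(points)

def multilevel_kmeans(points, min_quality=0.5, min_size=4):
    """Split recursively while the split is good enough; leaves are clusters."""
    if len(points) < max(min_size, 2):
        return [points]
    labels = two_means(points)
    if silhouette(points, labels) < min_quality:
        return [points]  # splitting would not improve structure: stop here
    clusters = []
    for j in (0, 1):
        members = [p for p, l in zip(points, labels) if l == j]
        clusters.extend(multilevel_kmeans(members, min_quality, min_size))
    return clusters

# Toy data: one larger group and two small groups that lie close together.
docs = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1],
        [20, 0], [20, 1], [21, 0], [21, 1],
        [26, 0], [26, 1], [27, 0], [27, 1]]
clusters = multilevel_kmeans(docs)
print([len(c) for c in clusters])  # [6, 4, 4]
```

In this toy run the first level separates the larger group from the two small ones, and the second level separates the small groups from each other, so the number of clusters is a result of the process instead of an input. Further splits are rejected because their silhouette falls below `min_quality`.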

Bibliographic details

Source: International Conference on Research Challenges in Information Science, 2016, pp. 1-10 (10 pages)
Authors: Swapnajit Chakraborti; Shubhamoy Dey
Original format: PDF
Keywords: Standards; Companies; Clustering algorithms; Internet; Algorithm design and analysis; Decision making

