IEEE International Conference on Data Engineering
Domain-Independent Automated Processing of Free-Form Text Data in Telecom

Abstract

Free-form, unstructured, and semi-structured textual data has become increasingly prevalent in the telecommunications industry, among service and equipment providers alike. Typical examples include textual data from customer care tickets, machine logs, alarm and alerting systems, and diagnostics. There is a growing business need to rapidly and automatically understand the key topics and categories underlying these bulk collections of text. Because the present mode of operation relies on domain experts to analyze textual data, there is a clear need to apply text analytics to automate the process. Difficulties arise from the jargon-filled, fragmented, and incomplete nature of textual data in this field. In this paper, we propose a domain-agnostic, unsupervised approach that deploys a multi-stage text processing pipeline to automatically discover the key topics and categories in free-form text documents. Using anonymized datasets drawn from actual customer care tickets and system logs, we show that our approach outperforms traditional text mining approaches and performs comparably to manual categorization undertaken by domain experts with full system knowledge.
