Benchmarking scripting languages, Microsoft .NET, and databases with a focus on text mining performance.

Abstract

In an increasingly connected world, the ability to quickly extract and process data and turn it into useful information is becoming ever more important. Text mining focuses on extracting information from unstructured data sources. Extraction speed is critical, but the time it takes to implement a viable solution is just as important. This research investigates both of these aspects of text mining.

This research demonstrates that interpreted languages are a feasible alternative to compiled languages for text mining. Seven modern languages were selected, and five experiments were designed to investigate common text mining operations. The languages consisted of four interpreted languages (Ruby, Perl, VBS, and Python), one hybrid .NET language (IronPython), and one compiled language (C#, running on both the .NET 1.1 and 2.0 platforms), all on the Microsoft Windows operating system.

The goal was to establish that interpreted languages could accomplish the same text mining tasks as compiled languages while taking no more than twice as long in wall-clock time and requiring at least 25% fewer lines of code. The research met this goal. Perl, Python, and, in some cases, Ruby proved to be acceptable alternatives in execution speed. In some experiments (string concatenation), the interpreted solutions actually executed faster than the compiled ones. The reduction in lines of code relative to the compiled solutions greatly exceeded 25%: Perl, Python, and Ruby showed reductions of 48%, 58%, and 54%, respectively.

The text mining performance of modern databases was also investigated. Specifically, Microsoft SQL Server 2000 (MSSQL2000), Microsoft SQL Server 2005 (MSSQL2005), and Oracle 10g were evaluated, including Oracle 10g's new regular expression syntax. In general, Oracle 10g currently offers the fastest overall performance for text mining queries. Stored procedures were also compared with issuing the SQL text directly; stored procedures were found to be faster, although the difference was not practically significant.

Finally, this research outlines a benchmarking methodology, focused on text mining, that is applicable to both newly emerging languages and database platforms.
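
The acceptance criteria described in the abstract (no more than twice the compiled wall-clock time, at least 25% fewer lines of code) can be expressed as a small benchmarking harness. The following is a minimal sketch in Python, assuming best-of-N timing with `time.perf_counter`; the function names (`wall_clock`, `loc_reduction`, `meets_criteria`, `concat_plus`, `concat_join`) and the sample workloads are hypothetical and are not taken from the dissertation.

```python
import time


def wall_clock(fn, *args, repeats=5):
    """Best-of-N wall-clock time, in seconds, for calling fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def loc_reduction(interpreted_loc, compiled_loc):
    """Percent fewer lines of code than the compiled baseline."""
    return 100.0 * (compiled_loc - interpreted_loc) / compiled_loc


def meets_criteria(t_interpreted, t_compiled, interpreted_loc, compiled_loc,
                   max_slowdown=2.0, min_loc_reduction=25.0):
    """Check the thresholds stated in the abstract: no more than 2x the
    compiled wall-clock time and at least a 25% reduction in lines of code."""
    slowdown = t_interpreted / t_compiled
    return (slowdown <= max_slowdown
            and loc_reduction(interpreted_loc, compiled_loc) >= min_loc_reduction)


# Hypothetical string-concatenation workloads, standing in for one experiment.
def concat_plus(n):
    s = ""
    for i in range(n):
        s += str(i)  # repeated += concatenation
    return s


def concat_join(n):
    return "".join(str(i) for i in range(n))  # build the result once with str.join


if __name__ == "__main__":
    n = 100_000
    t_plus = wall_clock(concat_plus, n)
    t_join = wall_clock(concat_join, n)
    print(f"+=: {t_plus:.4f} s   join: {t_join:.4f} s")
    # Hypothetical line counts, for illustration only (not from the study).
    print(loc_reduction(50, 100))             # 50.0
    print(meets_criteria(1.5, 1.0, 50, 100))  # True
```

In an actual cross-language comparison, each program (C#, Perl, Ruby, and so on) would more likely be timed as a separate process; the in-process timing here only illustrates how the two thresholds combine into a single pass/fail check.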

Bibliographic details

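  • Author

    Chadwick, Stephen C.

  • Author affiliation

    Colorado Technical University

  • Degree-granting institution: Colorado Technical University
  • Subject: Computer Science
  • Degree: D.C.S.
  • Year: 2007
  • Pages: 137
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology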
