Conference: International computer science and engineering conference

Enhancement of a text clustering technique for the classification of Thai tourism websites


Abstract

Tourism is an industry vital to a country's economic development, and it is continuously publicized and promoted, especially over the Internet. When tourists need more information, they usually search the web with search engines. However, a search can return a huge number of results, many of them uncategorized, incoherent, or unwanted, and the relevant results are not gathered on a single site. Collecting everything a traveler needs from one source, such as where to go, dine, stay, and shop, is therefore time-consuming and inconvenient. We addressed this problem by modifying algorithms for classifying travel websites, combining a Thai text analysis technique with five parts of a website's HTML structure: the title tag, the body tag, the meta description, the meta keywords, and the links to other pages. We then developed algorithms that analyze and categorize websites using 31 combinations of these parts (every non-empty subset of the five, 2^5 - 1 = 31), and measured classification performance with the F-measure. Compared with another technique, our modified technique performed better. To find the best of the 31 combinations, we tested the algorithms on 200 Thai tourism websites in four categories: attractions, accommodation, restaurants, and gift shops. The results showed that the content of the HTML body tag alone was sufficient to classify the sites.
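The evaluation loop described in the abstract (extract five parts from each page, build a text for every combination of them, score each with the F-measure) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `PartExtractor`, `feature_text`, `all_combinations`, `f_measure`, and `macro_f_measure` are hypothetical; "links to other pages" is assumed to mean the `href` values of `<a>` tags; the F-measure is assumed to be the balanced F1, macro-averaged over the four categories; and the Thai word segmentation and the classifier itself are not shown.

```python
from html.parser import HTMLParser
from itertools import combinations

# The five HTML parts named in the abstract.
PARTS = ("title", "body", "meta_description", "meta_keywords", "links")
CATEGORIES = ("attractions", "accommodation", "restaurants", "gift shops")


class PartExtractor(HTMLParser):
    """Hypothetical helper: collect the text of the five page parts."""

    def __init__(self):
        super().__init__()
        self.parts = {p: [] for p in PARTS}
        self._in_title = False
        self._in_body = False
        self._skip = 0  # nesting depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            name = (a.get("name") or "").lower()
            if name == "description":
                self.parts["meta_description"].append(a.get("content") or "")
            elif name == "keywords":
                self.parts["meta_keywords"].append(a.get("content") or "")
        elif tag == "a" and a.get("href"):
            # Assumption: "links" means the link targets (href values).
            self.parts["links"].append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return
        if self._in_title:
            self.parts["title"].append(text)
        elif self._in_body:
            self.parts["body"].append(text)


def all_combinations():
    """Every non-empty subset of the five parts: 2**5 - 1 = 31 combinations."""
    return [c for k in range(1, len(PARTS) + 1) for c in combinations(PARTS, k)]


def feature_text(parts, combo):
    """Join the text of the selected parts into one document, skipping empty parts."""
    chunks = (" ".join(parts[p]) for p in combo)
    return " ".join(c for c in chunks if c)


def f_measure(gold, predicted, category):
    """Balanced F-measure (F1 = 2PR / (P + R)) for one category."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == category and p == category for g, p in pairs)
    fp = sum(g != category and p == category for g, p in pairs)
    fn = sum(g == category and p != category for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_f_measure(gold, predicted, categories=CATEGORIES):
    """Assumption: unweighted mean of the per-category F-measures."""
    return sum(f_measure(gold, predicted, c) for c in categories) / len(categories)


if __name__ == "__main__":
    page = """<html><head><title>Chiang Mai Hotel</title>
<meta name="description" content="Rooms near the old city">
<meta name="keywords" content="hotel, accommodation"></head>
<body><p>Book a room</p><a href="/rooms">Rooms</a></body></html>"""
    ex = PartExtractor()
    ex.feed(page)
    ex.close()
    print(len(all_combinations()))            # 31
    print(feature_text(ex.parts, ("body",)))  # Book a room Rooms
    gold = ["attractions", "accommodation", "restaurants", "gift shops"]
    pred = ["attractions", "accommodation", "restaurants", "restaurants"]
    print(round(macro_f_measure(gold, pred), 3))  # 0.667
```

In a full experiment, each of the 31 feature texts would be segmented into Thai words and passed to the classifier under evaluation; comparing the 31 resulting F-measures identifies the best-performing combination, which the abstract reports to be the body tag alone.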