IEEE Global Engineering Education Conference

A Proposal for a Hybrid Syllabus Search Tool that Combines Keyword Search and Content Based Classification

Abstract

A syllabus is one of the most important clues for analyzing educational activities. Our previous work showed that the course syllabi of computer science (CS) curricula at about 47 universities can reveal interesting structures in those curricula. However, those syllabi were collected manually, which made it difficult to increase their number substantially. Semi-automatic crawling of large numbers of course syllabi is therefore needed for further analysis. We have been studying how to collect syllabus information from the contents of a large number of web pages downloaded from university websites with a general-purpose web crawler. Using a linear support vector machine (linear SVM), we were able to discover the structure of syllabus pages automatically to some extent. For each university, crawling started from the top page of the target department that offers a bachelor's degree in CS. To locate such department pages, we sometimes used Google Search. The Google Custom Search API (hereafter, Google API) is expected to provide an efficient way to gather syllabus information while saving computation time, storage, and other resources.

In this study, we propose a hybrid method that combines the Google API, as a general keyword search engine, with a linear SVM, as a content-based classification model. We developed a system that supports the syllabus collection process. It consists of three subsystems: Crawler, Classifier, and Database. The Crawler combines the Google API with a general-purpose web crawler. It searches university websites for syllabus-related pages by querying the Google API with syllabus-related keywords and the websites' domain names. The Classifier ranks a large number of web pages by the confidence scores of the linear SVM to find pages related to CS syllabi. We trained the linear SVM's decision model on the syllabus pages collected in our earlier studies.
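
The Crawler's keyword query could look roughly like the following minimal sketch. It only builds a request URL for the Google Custom Search JSON API endpoint (`https://www.googleapis.com/customsearch/v1`) and sends nothing over the network. The helper name `build_syllabus_query`, the API key, the search engine ID, the keyword list, and the domain `example.ac.jp` are all hypothetical placeholders. Restricting results to a university's domain through the `siteSearch` parameter is an assumption about how the keywords and domain names are combined, not a detail taken from the paper.

```python
from urllib.parse import urlencode

# Endpoint of the Google Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_syllabus_query(api_key: str, engine_id: str, keywords: list[str],
                         domain: str, num: int = 10, start: int = 1) -> str:
    """Hypothetical helper: return a request URL that searches one
    university domain for syllabus-related pages.

    Assumption: the domain restriction is expressed with siteSearch.
    """
    params = {
        "key": api_key,             # API key (placeholder)
        "cx": engine_id,            # programmable search engine ID (placeholder)
        "q": " ".join(keywords),    # syllabus-related keywords
        "siteSearch": domain,       # restrict results to this domain
        "siteSearchFilter": "i",    # "i" = include only results from siteSearch
        "num": num,                 # results per request, valid range 1 to 10
        "start": start,             # 1-based index of the first result
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


url = build_syllabus_query("YOUR_API_KEY", "YOUR_ENGINE_ID",
                           ["syllabus", "computer science"], "example.ac.jp")
print(url)
```

Because the API returns at most 10 results per request, a single request per university is enough to cover the top nine results used in the evaluation.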
By combining the pages obtained from the Google API and the linear SVM, we found CS syllabus pages for more universities than with either method alone. Combining the top nine Google API results with the top two pages ranked by the linear SVM's decision model, we obtained CS syllabus pages for more than 96.6% of the 58 universities.
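
The hybrid selection and the per-university coverage figure can be illustrated with the minimal sketch below. It assumes a linear SVM has already been trained. The term weights and bias are hypothetical stand-ins for that trained model, and the bag-of-words term-count features are a simplification, since the abstract does not describe the features used. The decision value w·x + b serves as the confidence score. The top nine Google API results are merged with the top two SVM-ranked pages; taking their union is an assumption. A university counts as covered when at least one known CS syllabus page appears among its candidates. All helper names and the toy data are hypothetical.

```python
from collections import Counter


def svm_score(text: str, weights: dict[str, float], bias: float) -> float:
    """Linear SVM decision value w.x + b over simple term-count features."""
    counts = Counter(text.lower().split())
    return sum(weights.get(term, 0.0) * n for term, n in counts.items()) + bias


def hybrid_candidates(google_urls: list[str], pages: dict[str, str],
                      weights: dict[str, float], bias: float,
                      k_google: int = 9, k_svm: int = 2) -> list[str]:
    """Union of the top-k Google results and the top-k SVM-ranked crawled pages."""
    ranked = sorted(pages, key=lambda u: svm_score(pages[u], weights, bias),
                    reverse=True)
    merged = list(google_urls[:k_google])
    for url in ranked[:k_svm]:
        if url not in merged:  # avoid duplicates found by both methods
            merged.append(url)
    return merged


def coverage(candidates_by_univ: dict[str, list[str]],
             true_syllabus: dict[str, set[str]]) -> float:
    """Fraction of universities with at least one true syllabus page among candidates."""
    if not candidates_by_univ:
        return 0.0
    hits = sum(1 for univ, cands in candidates_by_univ.items()
               if true_syllabus.get(univ, set()) & set(cands))
    return hits / len(candidates_by_univ)


# Toy data: hypothetical weights and crawled pages for one university.
weights = {"syllabus": 1.5, "credits": 0.8, "grading": 0.5, "admission": -1.0}
bias = -0.5
pages = {
    "u1/a": "syllabus credits grading",
    "u1/b": "admission news",
    "u1/c": "syllabus of data structures credits",
}
candidates = hybrid_candidates(["u1/b", "u1/x"], pages, weights, bias)
print(candidates)  # ['u1/b', 'u1/x', 'u1/a', 'u1/c']
print(coverage({"u1": candidates, "u2": ["u2/p"]},
               {"u1": {"u1/c"}, "u2": {"u2/q"}}))  # 0.5
```

Keeping the two lists separate until the final merge makes it easy to vary `k_google` and `k_svm` independently and measure how coverage changes with each.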