Improving Webpage Content Extraction by extending a novel single page extraction approach: A case study with Thai websites

Abstract

This paper proposes a Web content extraction technique that uses heuristic rules to work with both single and multiple pages. For multiple-page extraction, an Extracted Content Matching (ECM) technique is proposed to identify noise among the extracted results. To reduce processing time, the technique also uses XPath, file compression, and parallel processing. Performance is assessed by precision, recall, and F-measure, computed from the length of the extracted content. Initial results, obtained by comparing the output of the proposed approach with manually extracted content, are good.
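
The abstract does not give the exact formulas, so the following is only a minimal sketch of how length-based precision, recall, and F-measure might be computed. It assumes that the overlap between the extracted text and the manually extracted reference is measured in characters, using the matching blocks found by Python's `difflib.SequenceMatcher`. The function name `length_based_scores` and this overlap definition are illustrative assumptions, not the authors' method.

```python
from difflib import SequenceMatcher
from typing import Tuple


def length_based_scores(extracted: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) measured in characters.

    Assumption: overlap is the total length of the matching blocks
    between the extracted text and the manually extracted reference.
    """
    matcher = SequenceMatcher(None, extracted, reference, autojunk=False)
    overlap = sum(block.size for block in matcher.get_matching_blocks())

    # Precision: share of the extracted text that is real content.
    precision = overlap / len(extracted) if extracted else 0.0
    # Recall: share of the reference content that was recovered.
    recall = overlap / len(reference) if reference else 0.0

    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # "menu " stands in for navigation noise left in the extracted text.
    p, r, f = length_based_scores("menu main text", "main text")
    print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

Under this definition, leftover noise lowers precision while recall stays high, and missing content lowers recall, which matches the usual reading of the two measures.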

Bibliographic record

Source: ICMLC; International Conference on Machine Learning and Cybernetics | 2012 | pp. 1263-1267 | 5 pages
Authors: Thanadechteemapat Wigrai; Chun Che Fung
Original format: PDF
CLC classification: Automated reasoning, machine learning
Date added to database: 2022-08-26 14:26:11

