An Iterative Approach to Text Segmentation


Abstract

We present divSeg, a novel method for text segmentation that iteratively splits a portion of text at its weakest point, measured by the connectivity strength between the two adjacent parts. To search for the weakest point, we apply two different measures: one based on language modeling of text segmentation and the other on the interconnectivity between two segments. Our solution produces a deep and narrow binary tree, a dynamic object that describes the structure of a text and is fully adaptable to a user's segmentation needs. We treat flattening the tree into a broad and shallow hierarchy as a separate task, done either through supervised learning on a document set or through explicit input of how a text should be segmented. The rich structure of the resulting tree further allows us to segment documents at varying levels such as topic, sub-topic, etc. We evaluated our new solution on a set of 265 articles from Discover magazine, where the topic structures are unknown and need to be discovered. Our experimental results show that the iterative approach has the potential to generate better segmentation results than several leading baselines, and that the separate flattening step allows us to adapt the results to different levels of detail and user preferences.
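The split-then-flatten idea described in the abstract can be illustrated with a minimal sketch. It is not the divSeg implementation: the abstract does not give the two measures in detail, so a plain bag-of-words cosine similarity stands in for "connectivity strength", and a fixed depth cutoff stands in for the learned or user-specified flattening step. The names `bag`, `cosine`, `Node`, `split`, and `flatten` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Tuple


def bag(sentences: List[str]) -> Counter:
    """Lower-cased bag of words for a run of sentences."""
    return Counter(w for s in sentences for w in s.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Node:
    start: int                      # index of the first sentence in the span
    end: int                        # one past the last sentence in the span
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split(sentences: List[str], start: int, end: int) -> Node:
    """Recursively cut [start, end) where the two sides are least similar."""
    node = Node(start, end)
    if end - start < 2:
        return node                 # a single sentence cannot be split further
    # Assumed stand-in measure: the weakest point is the cut with minimal cosine.
    cut = min(
        range(start + 1, end),
        key=lambda i: cosine(bag(sentences[start:i]), bag(sentences[i:end])),
    )
    node.left = split(sentences, start, cut)
    node.right = split(sentences, cut, end)
    return node


def flatten(node: Node, depth: int) -> List[Tuple[int, int]]:
    """Flatten the binary tree into segments by stopping at a fixed depth."""
    if depth == 0 or node.left is None:
        return [(node.start, node.end)]
    return flatten(node.left, depth - 1) + flatten(node.right, depth - 1)


sentences = [
    "cats purr softly",
    "cats chase mice",
    "stocks fell sharply",
    "stocks rose today",
]
tree = split(sentences, 0, len(sentences))
print(flatten(tree, 1))  # coarse, topic-level segments
print(flatten(tree, 2))  # finer, sub-topic-level segments
```

Building the full tree once and flattening it afterwards mirrors the separation the abstract describes: the same tree can be cut at different depths to serve different granularity needs without re-running the splitting step.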


Bibliographic details

Source
Advances in Information Retrieval | 2011 | pp. 629-640 | 12 pages
Conference location: Dublin, Ireland
Authors
Fei Song; William M. Darling; Adnan Duric; Fred W. Kroon
Author affiliations

School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, Ontario, N1G 2W1, Canada;

School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, Ontario, N1G 2W1, Canada;

School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, Ontario, N1G 2W1, Canada;

PryLynx Corporation, 21 Oneida Place, Kitchener, Ontario, N2A 3G2, Canada

Original format: PDF
Language: English
Chinese Library Classification: Information processing
Keywords
text segmentation; language modeling

Date added to database: 2022-08-26 13:47:04

