Rich-text document styling restoration via reinforcement learning

Hongwei LI; Yingpeng HU; Yixuan CAO; Ganbin ZHOU; Ping LUO


Abstract

Richly formatted documents, such as financial disclosures, scientific articles, and government regulations, are widespread on the Web. However, since most of these documents are published only for reading, the styling information inside them is usually missing, which makes them unsuitable or even burdensome to display and edit across different formats and platforms. In this study, we formulate the task of document styling restoration as an optimization problem that aims to identify the styling settings of the document elements (e.g., lines, table cells, text) so that rendering with the output styling settings yields a document in which each element holds exactly, or very nearly, the same position as its counterpart in the original document. Considering that each styling setting is a decision, this problem can be transformed into a multi-step decision-making task over all the document elements and then solved by reinforcement learning. Specifically, Monte-Carlo Tree Search (MCTS) is leveraged to explore the different styling settings, and the policy function is learned under the supervision of the delayed rewards. As a case study, we restore the styling information inside tables, where structural and functional data in documents are usually presented. Experiments show that our best reinforcement method successfully restores the stylings in 87.65% of the tables, a 25.75% absolute improvement over the greedy method. We also discuss the tradeoff between inference time and restoration success rate, and argue that although the reinforcement methods cannot be used in real-time scenarios, they are suitable for offline tasks with high quality requirements. Finally, this model has been applied in a PDF parser to support cross-format display.
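The formulation in the abstract (one styling decision per element, with a reward that arrives only after the whole document is rendered) can be illustrated with a small Python sketch. This is not the authors' method: it replaces MCTS and the learned policy with plain random rollouts, and the candidate list `WIDTHS`, the functions `render`, `reward`, and `rollout_search`, and the toy rule that a column's right edge is the running sum of the chosen widths are all assumptions made up for illustration.

```python
import random

# Hypothetical toy setting: every table column picks one width from WIDTHS,
# and a column's right edge is the running sum of the widths chosen so far.
WIDTHS = [40, 60, 80]

def render(widths):
    """Return the right-edge x position of each column."""
    edges, x = [], 0
    for w in widths:
        x += w
        edges.append(x)
    return edges

def reward(widths, target_edges, tol=1):
    """Delayed reward: 1.0 only if every rendered edge is within tol of its target."""
    edges = render(widths)
    return 1.0 if all(abs(e - t) <= tol for e, t in zip(edges, target_edges)) else 0.0

def rollout_search(target_edges, n_rollouts=200, seed=0):
    """Pick widths one column at a time, scoring each candidate by random rollouts."""
    rng = random.Random(seed)
    n = len(target_edges)
    chosen = []
    for step in range(n):
        best_w, best_score = None, -1.0
        for w in WIDTHS:
            total = 0.0
            for _ in range(n_rollouts):
                # Complete the remaining columns at random, then score the whole table.
                rest = [rng.choice(WIDTHS) for _ in range(n - step - 1)]
                total += reward(chosen + [w] + rest, target_edges)
            score = total / n_rollouts
            if score > best_score:
                best_w, best_score = w, score
        chosen.append(best_w)
    return chosen

restored = rollout_search([60, 100, 180])
print(restored, render(restored))
```

Each candidate is judged only by the final reward of completed tables, which mirrors the delayed-reward setting described above. A tree search such as MCTS would reuse statistics across rollouts instead of sampling from scratch at every step.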

Bibliographic information

Source
Frontiers of Computer Science, 2021, Issue 4, pp. 154328.1-154328.11 (11 pages)
Authors
Hongwei LI; Yingpeng HU; Yixuan CAO; Ganbin ZHOU; Ping LUO
Author affiliations

Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

Search Product Center, WeChat Search Application Department, Tencent, Beijing 100080, China

Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

Original format: PDF
Language: English
Keywords
styling restoration; Monte-Carlo tree search; reinforcement learning; richly formatted documents; tables
