Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network

Mona Kirstin Fehling; Fabian Grosch; Maria Elke Schuster; Bernhard Schick; Jörg Lohscheller

Abstract

The objective investigation of the dynamic properties of vocal fold vibrations requires recording and quantitatively analyzing laryngeal high-speed video (HSV). As a first step, quantifying the vocal fold vibration patterns requires segmenting the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation. In this work we propose, for the first time, a procedure that fully automatically segments not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences of 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions served as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The proposed method requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. Thus, it also allows the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be made freely available to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
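The two performance measures named in the abstract can be illustrated with a minimal Python sketch. The Dice Coefficient follows its standard definition, DC = 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a ground-truth mask B. The landmark measure is an assumption: the abstract does not define "precision", so the sketch takes it to be the mean Euclidean distance in pixels between predicted and annotated landmark positions. The function names `dice_coefficient` and `mean_landmark_error`, the mask layout, and the convention for two empty masks are hypothetical and are not taken from the paper.

```python
from math import dist


def dice_coefficient(pred, truth):
    """Dice Coefficient of two binary masks given as equally sized 2D lists of 0/1."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        # Assumption: two empty masks (e.g. a fully closed glottis) agree perfectly.
        return 1.0
    return 2.0 * overlap / total


def mean_landmark_error(pred_points, true_points):
    """Mean Euclidean distance in pixels between paired (x, y) landmarks.

    Assumption: this distance stands in for the landmark "precision"
    mentioned in the abstract, which the abstract does not define.
    """
    if len(pred_points) != len(true_points) or not pred_points:
        raise ValueError("need the same non-zero number of landmarks")
    return sum(dist(p, t) for p, t in zip(pred_points, true_points)) / len(pred_points)


if __name__ == "__main__":
    predicted = [[0, 0, 0, 0],
                 [0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0]]
    annotated = [[0, 0, 0, 0],
                 [0, 1, 1, 1],
                 [0, 1, 1, 1],
                 [0, 0, 0, 0]]
    print(dice_coefficient(predicted, annotated))                      # 0.8
    print(mean_landmark_error([(0, 0), (3, 4)], [(0, 0), (0, 0)]))     # 2.5
```

In the toy example the predicted mask covers 4 pixels, the annotated mask 6, and they share 4, giving 2 · 4 / (4 + 6) = 0.8. Per-class scores such as those reported for the glottis and each vocal fold would come from applying the same function to one binary mask per class.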


Bibliographic details

Source: PLoS One, 2020, Issue 2, 29 pages
Authors: Mona Kirstin Fehling; Fabian Grosch; Maria Elke Schuster; Bernhard Schick; Jörg Lohscheller
Original format: PDF
Chinese Library Classification: Medicine, Health

