Advanced Science Letters

Development of Modified Analytical Model for Investigating Acceptable Delay of TCP-Based Speech Recognition



Abstract

Many studies have proposed solutions to overcome the degradation of network speech recognition (NSR) caused by packet loss and jitter. The most popular cloud-based speech recognition systems, such as Google speech recognition and Apple Siri, currently employ TCP in cooperation with HTTP. TCP, as a reliable transport protocol, delivers all speech data to the server but may incur delays under unexpected network conditions. In order to achieve satisfactory NSR performance despite TCP delay, an acceptable delay bound should be fulfilled. In this paper, a scheme of TCP-based NSR with a speech segmenter at the client side is proposed, and an analytical model to investigate the acceptable delay is developed on the basis of a study of stored streaming via TCP employing a discrete-time Markov model. The speech segmenter allows TCP to send the speech signal sentence by sentence, so that the resulting text is recognized using a language model. The acceptable delay is defined as the specified length of time required for the server to receive the entire data of a sentence. A negative value of the number of early packets within the acceptable delay bound indicates that the sentence streaming is slow. Our model is validated via ns-3 simulations. Moreover, the model is verified for a distribution of 2500 Indonesian sentences using TANGRAM-II to prove the real-time factor (RTF) of TCP-based speech recognition and to identify its working region. The model indicates that the real-time factor (RTF) is not achieved when the loss rate is 0.014 and the RTT is 100 ms. Streaming over TCP achieves satisfactory performance within an acceptable delay of eight seconds when the loss rate is smaller than 0.05 and the round-trip time is 100 ms. When the round-trip time is doubled, the streaming works within an acceptable delay of 17 seconds.
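The abstract builds its working-region check on a count of "early packets" within the acceptable delay bound, but does not reproduce the paper's discrete-time Markov model. The following minimal Python sketch illustrates one plausible reading of that metric (packets of a sentence arriving before the delay bound minus packets arriving after it); the function name, parameters, and packet timings are hypothetical and are not taken from the paper.

from typing import Sequence

def early_packet_count(arrival_times: Sequence[float],
                       sentence_end: float,
                       acceptable_delay: float) -> int:
    """Packets of one sentence arriving before the delay bound minus those arriving after.

    A negative result corresponds to the abstract's slow-streaming condition under
    this (assumed) reading of the metric.
    arrival_times    -- server-side arrival time of each packet of the sentence (s)
    sentence_end     -- time at which the client finished capturing the sentence (s)
    acceptable_delay -- delay bound within which the server should hold the full sentence (s)
    """
    deadline = sentence_end + acceptable_delay
    early = sum(1 for t in arrival_times if t <= deadline)
    late = len(arrival_times) - early
    return early - late

# Hypothetical sentence of ten packets, captured by time 1.0 s, checked against the
# eight-second acceptable delay quoted in the abstract.
arrivals = [0.9, 1.1, 1.4, 1.9, 2.6, 3.8, 5.2, 7.1, 9.4, 12.0]
print(early_packet_count(arrivals, sentence_end=1.0, acceptable_delay=8.0))  # 6 (positive: within bound)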
