Clustering the Unknown - The Youtube Case

Abstract

Recent stringent end-user security and privacy requirements have caused a dramatic rise in encrypted video streams, among which YouTube encrypted traffic is one of the most prevalent. Despite the encryption, metadata derived from such traffic flows can be used to identify the title of a video, which makes it possible to classify each video stream under a single title from a given set of video titles. Nonetheless, scenarios in which no video title set is available, and a supervised approach is therefore not feasible, are both frequent and challenging. In this paper we go beyond previous studies and demonstrate the feasibility of clustering unknown video streams into subgroups even though no information about the titles is available. We address this problem by exploring Natural Language Processing (NLP) formulations and Word2vec techniques to compose a novel statistical feature, which is then used to cluster unknown video streams. Experimental results on real datasets demonstrate that our methodology is capable of clustering 72 out of 100 video titles from a dataset of 10,000 video streams. Thus, we argue that the proposed methodology could contribute meaningfully to the newly rising and demanding domain of encrypted Internet traffic classification.
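
The abstract does not say how the statistical feature is built, so the following is only a rough sketch of the general idea (treat each stream's metadata as a "sentence", embed its tokens, average them into one feature per stream, then cluster without labels). Everything specific here is an assumption for illustration: the use of burst sizes as metadata, the bucket width, the function names, and the made-up sample data. To stay dependency-free, count-based co-occurrence vectors stand in for Word2vec embeddings, and a simple greedy cosine-similarity grouping stands in for whatever clustering the paper uses.

```python
from math import sqrt


def tokenize(burst_sizes, bucket=100_000):
    """Map each burst size in bytes to a coarse bucket token (assumed scheme)."""
    return [f"b{size // bucket}" for size in burst_sizes]


def cooccurrence_vectors(sentences, window=1):
    """Count-based token vectors; a simple stand-in for Word2vec embeddings."""
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = {tok: [0.0] * len(vocab) for tok in vocab}
    for sent in sentences:
        for i, tok in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][index[sent[j]]] += 1.0
    return vectors


def stream_feature(sentence, vectors):
    """Average the token vectors of one stream into a single feature vector."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for tok in sentence:
        for k, value in enumerate(vectors[tok]):
            total[k] += value
    return [t / len(sentence) for t in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(features, threshold=0.9):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    seeds, labels = [], []
    for feat in features:
        for label, seed in enumerate(seeds):
            if cosine(feat, seed) >= threshold:
                labels.append(label)
                break
        else:
            seeds.append(feat)
            labels.append(len(seeds) - 1)
    return labels


# Made-up burst-size sequences: two streams each of two unknown titles.
streams = [
    [120_000, 450_000, 130_000, 460_000],
    [125_000, 455_000, 135_000, 465_000],
    [800_000, 900_000, 810_000, 910_000],
    [805_000, 905_000, 815_000, 915_000],
]
sentences = [tokenize(s) for s in streams]
vectors = cooccurrence_vectors(sentences)
labels = cluster([stream_feature(s, vectors) for s in sentences])
print(labels)  # [0, 0, 1, 1]
```

In this toy data the bucketing absorbs small size variations, so streams of the same (unknown) title land in the same group without any title set being supplied. A trained Word2vec model and a standard clustering algorithm would replace the stand-ins in practice.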
