The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with an abundance of data and computing power. Large companies can take advantage of a deluge of data that is typically withheld from the research community due to privacy or business sensitivity concerns, and this is particularly true for networking data. Therefore, the lack of high-quality data is often recognized as one of the main factors currently limiting networking research from fully leveraging the potential of AI methodologies. Following numerous requests we received from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and classes, and reaches scales similar to the popular ImageNet dataset commonly used in computer vision literature. To avoid leaking user- and business-sensitive information, we appropriately anonymized the dataset, while empirically showing that it still represents a relevant benchmark for algorithmic research. In this paper, we describe the public dataset and our anonymization process. We hope that AppClassNet can be instrumental for other researchers to address more complex commercial-grade problems in the broad field of traffic classification and management.