Citation: Velarde-Alvarado, P.; Gonzalez, H.; Martínez-Peláez, R.; Mena, L.J.; Ochoa-Brust, A.; Moreno-García, E.; Félix, V.G.; Ostos, R. A novel framework for generating personalized network datasets for NIDS, based on traffic aggregation. Sensors 2022, 22, 1847. https://doi.org/10.3390/s22051847
Academic Editors: Alexios Mylonas and Nikolaos Pitropakis
Received: 30 December 2021
Accepted: 6 February 2022
Published: 26 February 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
A Novel Framework for Generating Personalized Network Datasets for NIDS Based on Traffic Aggregation
Pablo Velarde-Alvarado 1,†, Hugo Gonzalez 2,†, Rafael Martínez-Peláez 3, Luis J. Mena 4,*, Alberto Ochoa-Brust 5, Efraín Moreno-García 6, Vanessa G. Félix 4 and Rodolfo Ostos 4
1 Unidad Académica de Ciencias Básicas e Ingenierías, Universidad Autónoma de Nayarit, Tepic 63000, Mexico; pvelarde@uan.edu.mx
2 Academia de Tecnologías de la Información y Telemática, Universidad Politécnica de San Luis Potosí, San Luis Potosí 78363, Mexico; hugo.gonzalez@upslp.edu.mx
3 Facultad de Ingenierías y Tecnologías, Universidad De La Salle Bajío, Av. Universidad 602, León 37150, Mexico; rmartinezp@delasalle.edu.mx
4 Unidad Académica de Computación, Universidad Politécnica de Sinaloa, Ctra. Libre Mazatlán Higueras Km 3, Mazatlán 82199, Mexico; vfelix@upsin.edu.mx (V.G.F.); rostos@upsin.edu.mx (R.O.)
5 Facultad de Ingeniería Mecánica y Eléctrica, Universidad de Colima, Av. Universidad 333, Colima 28040, Mexico; aochoa@ucol.mx
6 Dirección de Posgrado e investigación, Instituto Tecnológico de Tepic, Tepic 63175, Mexico; emoreno@ittepic.edu.mx
* Correspondence: lmena@upsin.edu.mx
† These authors contributed equally to this work.
Abstract:
In this paper, we addressed the problem of dataset scarcity in network intrusion detection. Our main contribution was a framework that provides a complete process for generating network traffic datasets based on the aggregation of real network traces. In addition, we proposed a set of tools for attribute extraction and labeling of traffic sessions. The framework was used to generate a new dataset of botnet network traffic, which served to assess the proposed method with machine learning algorithms suitable for unbalanced data. Classifier performance was evaluated in terms of the macro-averaged F1-score (0.97) and Matthews Correlation Coefficient (0.94), indicating good overall performance.
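As a hedged illustration of the two metrics reported above, the sketch below computes a macro-averaged F1-score and MCC in plain Python. The abstract does not state how the MCC was averaged, so this sketch assumes that both metrics are computed one-vs-rest for each class label and then averaged with equal weight. The function names and the toy labels are hypothetical and are not taken from the paper's tools or dataset.

```python
import math


def one_vs_rest_counts(y_true, y_pred, label):
    """Return (TP, FP, FN, TN) treating `label` as positive and every other label as negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == label:
            if t == label:
                tp += 1
            else:
                fp += 1
        elif t == label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); returns 0.0 to guard against division by zero.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc_score(tp, fp, fn, tn):
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if any factor is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_f1_and_mcc(y_true, y_pred):
    """Unweighted (macro) mean of per-class F1 and per-class one-vs-rest MCC."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s, mccs = [], []
    for label in labels:
        tp, fp, fn, tn = one_vs_rest_counts(y_true, y_pred, label)
        f1s.append(f1_score(tp, fp, fn))
        mccs.append(mcc_score(tp, fp, fn, tn))
    # Every class gets equal weight, so a small botnet class counts as much as benign traffic.
    return sum(f1s) / len(f1s), sum(mccs) / len(mccs)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    y_true = ["benign", "benign", "benign", "benign", "botnet", "botnet"]
    y_pred = ["benign", "benign", "benign", "botnet", "botnet", "botnet"]
    print(macro_f1_and_mcc(y_true, y_pred))
```

With these toy labels the script prints a macro F1 of about 0.83 and a macro MCC of about 0.71. Macro-averaging is the usual choice for unbalanced data because a classifier that ignores the minority class cannot score well. Note that scikit-learn's `matthews_corrcoef` computes a single multiclass MCC rather than a mean of one-vs-rest scores, so the two approaches can give different values when there are more than two classes.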
Keywords:
intrusion detection; network security; traffic generation; machine learning; unbalanced dataset; botnet detection
1. Introduction and Motivation
Nowadays, cybersecurity plays a fundamental role in ensuring the usability and integrity of information technology and telecommunication infrastructure. These technologies are essential to the everyday activities of organizations, enterprises, and individuals. For example, perimeter security makes it possible to implement layered protection that aims to identify and stop cyber attacks or network anomalies in inbound and outbound traffic by applying network monitoring techniques to a network segment. In recent years, anomaly-based Network Intrusion Detection Systems (NIDS) have incorporated machine learning (ML) and deep learning models to detect malicious network traffic patterns with excellent results [1,2].
NIDS and Intrusion Prevention Systems (IPS) are part of the defense strategies against cybercriminal tactics and attacks. However, to perform this task efficiently, a NIDS should have several desirable features: (1) fault tolerance; (2) minimal human intervention in the administration of the devices; (3) no excessive load on system resources; (4) detection of significant deviations from acceptable behavior; (5) high precision to minimize false positives and false negatives; (6) detection of all kinds of patterns and sophisticated attacks; and (7) fast intrusion detection and an effective response to reduce possible damage [3,4].
Sensors 2022, 22, 1847. https://doi.org/10.3390/s22051847 https://www.mdpi.com/journal/sensors