Citation: Siahaan, D.; Darnoto, B.R.P. A Novel Framework to Detect Irrelevant Software Requirements Based on MultiPhiLDA as the Topic Model. Informatics 2022, 9, 87. https://doi.org/10.3390/informatics9040087
Academic Editors: Sanjay Misra, Robertas Damaševičius and Bharti Suri
Received: 27 August 2022
Accepted: 17 October 2022
Published: 27 October 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
A Novel Framework to Detect Irrelevant Software Requirements Based on MultiPhiLDA as the Topic Model
Daniel Siahaan *,† and Brian Rizqi Paradisiaca Darnoto
Informatics Department, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
* Correspondence: daniel@if.its.ac.id; Tel.: +62-31-5939214
† Current address: Gedung Informatika, Jl. Teknik Kimia, Kampus ITS Sukolilo, Surabaya 60111, Indonesia.
Abstract: Noise is a known defect in software requirements specifications (SRS), and detecting defects at an early stage is crucial in software development. Noise can take the form of irrelevant requirements included within an SRS. A previous study attempted to detect noise in SRS by treating it as an outlier. However, the resulting method demonstrated only moderate reliability because unique action words overshadowed unique actor words in the topic–word distribution. In this study, we propose a framework to identify irrelevant requirements based on the MultiPhiLDA method. The proposed framework models actor words and action words with two separate topic–word distributions, each with its own multinomial probability function. Weights are used to maintain a proportional contribution of actor and action words. We also explore two outlier detection methods, namely percentile-based outlier detection (PBOD) and angle-based outlier detection (ABOD), to distinguish irrelevant requirements from relevant ones. The experimental results show that the proposed framework performed better than previous methods. Furthermore, combining ABOD as the outlier detection method with topic coherence as the approach for estimating the optimal number of topics and iterations outperformed the other combinations, obtaining sensitivity, specificity, F1-score, and G-mean values of 0.59, 0.65, 0.62, and 0.62, respectively.
Keywords: angle-based outlier detection; percentile-based outlier detection; MultiPhiLDA; noise; irrelevant software requirements
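The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions (sensitivity as true-positive rate, specificity as true-negative rate, F1-score as the harmonic mean of precision and sensitivity, G-mean as the geometric mean of sensitivity and specificity). The function name and the counts below are hypothetical and are not taken from the article's experiments.

```python
import math


def evaluation_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, F1-score, G-mean) from confusion counts.

    Assumes the standard definitions; "positive" here stands for an
    irrelevant requirement that the detector should flag.
    """
    sensitivity = tp / (tp + fn)   # share of irrelevant requirements detected
    specificity = tn / (tn + fp)   # share of relevant requirements kept
    precision = tp / (tp + fp)     # share of flagged requirements that are irrelevant
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, f1, g_mean


# Hypothetical counts, chosen only to exercise the function.
sens, spec, f1, g = evaluation_metrics(tp=6, fp=4, tn=16, fn=4)

# Consistency check on the reported values: under the assumed definition,
# the G-mean of sensitivity 0.59 and specificity 0.65 rounds to 0.62.
reported_g_mean = round(math.sqrt(0.59 * 0.65), 2)
```

Under this definition, the reported G-mean of 0.62 agrees with the reported sensitivity and specificity. The F1-score cannot be checked the same way, because precision is not stated in the abstract.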
1. Introduction
The requirements specification process is one of the key stages in a software development project that determines its success. Researchers indicate that 40–60% of software development project failures originate in requirements specification [1,2]. Requirements dictate how the product is designed and implemented in the following stages [3]. Overlooking software requirements during the requirements specification process can cause future threats and failures in the operational phase [4]. Furthermore, the cost of detecting and correcting defects increases exponentially as the software progresses along the software development life cycle (SDLC) [5]. Therefore, it is essential to carry out an effective requirements specification.
In an object-oriented approach, requirements engineers deliver an SRS, which is a document that serves as a guideline for the subsequent processes of software development [6]. Therefore, the SRS should have a set of quality attributes in order to maintain the quality of the end product. Nevertheless, requirements engineers often fail to comply with the quality attributes and produce ambiguity, contradictions, forward references, over-specifications, and noise in SRS [7,8]. Meyer [8] argues that the use of natural language to specify software requirements is the cause of these problems. This study focuses on noise, as it is a mistake