Citation: Liu, X.; Li, G.; Chen, W.; Liu, B.; Chen, M.; Lu, S. Detection of
Dense Citrus Fruits by Combining Coordinated Attention and Cross-Scale
Connection with Weighted Feature Fusion. Appl. Sci. 2022, 12, 6600.
https://doi.org/10.3390/app12136600
Academic Editor: Rubén Usamentiaga
Received: 27 May 2022
Accepted: 27 June 2022
Published: 29 June 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and
conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).
Article
Detection of Dense Citrus Fruits by Combining Coordinated
Attention and Cross-Scale Connection with Weighted Feature Fusion
Xiaoyu Liu 1, Guo Li 1,*, Wenkang Chen 1, Binghao Liu 2, Ming Chen 1 and Shenglian Lu 1,*
1 Guangxi Key Lab of Multisource Information Mining & Security, College of Computer Science & Engineering,
Guangxi Normal University, Guilin 541004, China; xy_liu666@126.com (X.L.);
cwk1031645988@gmail.com (W.C.); mingchen@gxnu.edu.cn (M.C.)
2 Guangxi Citrus Breeding and Cultivation Engineering Technology Center,
Guangxi Academy of Specialty Crops, Guilin 541004, China; liubh-311@126.com
* Correspondence: liguo@gxnu.edu.cn (G.L.); lsl@gxnu.edu.cn (S.L.)
Abstract:
The accurate detection of individual citrus fruits in citrus orchard environments is one of
the key steps in realizing precision agriculture applications such as yield estimation, fruit thinning,
and mechanical harvesting. This study proposes an improved YOLOv5 object detection model to
achieve accurate identification and counting of citrus fruits in an orchard environment. First, the
latest visual attention mechanism, the coordinated attention (CA) module, was inserted into an improved
backbone network to focus on fruit-dense regions and recognize small target fruits. Second, BiFPN, an
efficient two-way cross-scale connection with weighted feature fusion, was used in the neck network to
replace the PANet multiscale feature fusion network, assigning a corresponding weight to each input
feature to fully fuse the high-level and bottom-level features. Finally, the varifocal loss function was used
to calculate the model loss for better model training results. Experiments on four
varieties of citrus trees showed that the improved model proposed in this study could effectively
identify dense small citrus fruits. Specifically, the AP (average precision) reached 98.4%,
and the average recognition time was 0.019 s per image. Compared with the original YOLOv5
(including the different variants n, s, m, l, and x), the increase in the average precision of
the improved YOLOv5 ranged from 7.5% to 0.8% while maintaining a similar average inference time.
Four different citrus varieties were also tested to evaluate the generalization performance of the
improved model. The method can further be used as part of a vision system to provide technical
support for the real-time and accurate detection of multiple fruit targets during mechanical picking
in citrus orchards.
Keywords: computer vision; citrus detection; YOLOv5; small objects; real-time detection
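The weighted feature fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the "fast normalized fusion" rule commonly associated with BiFPN, in which each input feature receives a non-negative learnable weight and the weights are normalized by their sum; the abstract does not state which fusion variant the authors used. The function name `fast_normalized_fusion`, the `eps` value, and the toy feature vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,  # assumed small constant to avoid division by zero
) -> List[float]:
    """Fuse same-length feature vectors as sum_i (w_i / (eps + sum_j w_j)) * f_i."""
    if not features or len(features) != len(raw_weights):
        raise ValueError("need one weight per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all features must have the same length")

    # ReLU keeps every weight non-negative before normalization.
    weights = [max(0.0, w) for w in raw_weights]
    total = sum(weights) + eps

    fused = [0.0] * length
    for w, feat in zip(weights, features):
        scale = w / total
        for k, value in enumerate(feat):
            fused[k] += scale * value
    return fused


# Hypothetical toy inputs: one high-level and one bottom-level feature vector.
high_level = [1.0, 2.0, 3.0]
low_level = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([high_level, low_level], [3.0, 1.0])
```

With weights 3 and 1, the fused vector lies closer to the first input than to the second, which is the intended effect of letting the network learn how much each scale contributes. In a real detector the inputs would be resized feature maps and the weights would be trainable parameters.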
1. Introduction
Early yield estimates of fruits in orchards can help to plan subsequent fertilization and
other operations more accurately. Currently, precision agriculture faces various challenges
to its development. First, the proper selection and application of a model from among the
various available models is important. Moreover, building on advanced techniques in
computer vision and deep learning, the influence of natural environmental factors on
the application of these techniques can be studied [1]. With the rapid progress of artificial
intelligence (AI), AI-driven technical tools and solutions [2] have shown their profitability
and potential in addressing farming problems, including monitoring crop status and
field production management such as pest monitoring, fruit thinning, and mechanical
harvesting [3]. Therefore, developing a low-cost, highly maneuverable computer vision
system for small targets to perform fruit recognition on orchard trees is of great significance
to precision agriculture.
Before deep learning technologies became popular, traditional computer techniques
were often used to detect fruit. Simple visual features such as circular Hough transform