Citation: Huang, L.; Mao, F.; Zhang, K.; Li, Z. Spatial-Temporal Convolutional Transformer Network for Multivariate Time Series Forecasting. Sensors 2022, 22, 841. https://doi.org/10.3390/s22030841
Academic Editors: Yangquan Chen, Subhas Mukhopadhyay, Nunzio Cennamo, M. Jamal Deen, Junseop Lee and Simone Morais
Received: 27 December 2021
Accepted: 19 January 2022
Published: 22 January 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Spatial-Temporal Convolutional Transformer Network for Multivariate Time Series Forecasting
Lei Huang 1,2, Feng Mao 1,2, Kai Zhang 1,3 and Zhiheng Li 1,*
1 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; huangl19@mails.tsinghua.edu.cn (L.H.); mf19@mails.tsinghua.edu.cn (F.M.); zhangkai@sz.tsinghua.edu.cn (K.Z.)
2 Department of Automation, Tsinghua University, Beijing 100086, China
3 Research Institute of Tsinghua, Pearl River Delta, Guangzhou 510530, China
* Correspondence: zhhli@mail.tsinghua.edu.cn
Abstract: Multivariate time series forecasting has long been a research hotspot because of its wide range of application scenarios. However, the dynamic nature and multiple patterns of spatiotemporal dependencies make this problem challenging. Most existing methods suffer from two major shortcomings: (1) they ignore local context semantics when modeling temporal dependencies; (2) they lack the ability to capture spatial dependencies with multiple patterns. To tackle these issues, we propose a novel Transformer-based model for multivariate time series forecasting, called the spatial–temporal convolutional Transformer network (STCTN). STCTN mainly consists of two novel attention mechanisms that model temporal and spatial dependencies, respectively. A local-range convolutional attention mechanism is proposed to focus simultaneously on global and local-context temporal dependencies at the sequence level, which addresses the first shortcoming. A group-range convolutional attention mechanism is designed to model multiple spatial dependency patterns at the graph level while reducing computation and memory complexity, which addresses the second shortcoming. Continuous positional encoding is proposed to link the historical observations and the predicted future values in the positional encoding, which also improves forecasting performance. Extensive experiments on six real-world datasets show that the proposed STCTN outperforms state-of-the-art methods and is more robust to nonsmooth time series data.
Keywords: multivariate time series forecasting; spatiotemporal; convolutional Transformer; attention mechanism
1. Introduction
Time series forecasting has a wide range of applications in transportation, finance, medicine, and other fields. Precise forecasting helps people prepare for future changes and supports production management decisions, and it has proven valuable in traffic jam prevention, financial investment decisions, disease prevention, etc. [1–3].
The challenge of multivariate time series forecasting is the need to simultaneously
capture complex spatiotemporal dependencies, which are mainly reflected in two aspects:
• Dynamic. Because the external environment changes (events, weather, etc.), the spatiotemporal dependencies change dynamically over time.
• Multiple patterns. Both temporal and spatial dependencies have multiple patterns. Temporal dependencies rest not only on the pointwise value of an observation point but also on the local context formed by the surrounding observation points. In the spatial dimension, we need to consider not only local connectivity but also global semantic proximity. For example, in traffic time series, road nodes belonging to the same type of functional area have strong global semantic proximity even though they are not geographically adjacent [4,5].
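The distinction between pointwise value and local context can be made concrete with a small sketch. The code below is only an illustration of that idea and not the attention mechanism of STCTN; the function names, the toy series, and the window length k are assumptions introduced here. It compares two observation points that share the same value, one on a rising edge and one on a falling edge.

```python
def pointwise_score(series, i, j):
    """Similarity based only on the two observed values (0 means identical)."""
    return -abs(series[i] - series[j])


def local_context_score(series, i, j, k=3):
    """Similarity based on the k most recent values ending at i and at j.

    Assumes i >= k - 1 and j >= k - 1 so that both windows are complete.
    """
    window_i = series[i - k + 1 : i + 1]
    window_j = series[j - k + 1 : j + 1]
    return -sum(abs(a - b) for a, b in zip(window_i, window_j))


# Toy series (hypothetical data): index 2 lies on the rising edge,
# index 4 on the falling edge, and both have the value 3.
series = [1, 2, 3, 4, 3, 2, 1]

print(pointwise_score(series, 2, 4))           # 0: the values alone look identical
print(local_context_score(series, 2, 4, k=3))  # -4: the local shapes differ
```

A purely pointwise comparison treats the two points as a perfect match, whereas the windowed comparison separates them. This is the kind of local context that a convolution over neighbouring observations can expose to an attention mechanism.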
Sensors 2022, 22, 841. https://doi.org/10.3390/s22030841 https://www.mdpi.com/journal/sensors