Citation: Yang, X.; Yu, Y.; Wu, X. Double Linear Transformer for Background Music Generation from Videos. Appl. Sci. 2022, 12, 5050. https://doi.org/10.3390/app12105050

Academic Editors: Katia Lida Kermanidis, Phivos Mylonas and Manolis Maragoudakis

Received: 22 April 2022
Accepted: 13 May 2022
Published: 17 May 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Double Linear Transformer for Background Music Generation from Videos
Xueting Yang, Ying Yu * and Xiaoyu Wu
Faculty of Information and Communication Engineering, Communication University of China,
Beijing 100024, China; yangxueting@cuc.edu.cn (X.Y.); wuxiaoyu@cuc.edu.cn (X.W.)
* Correspondence: yuying@cuc.edu.cn; Tel.: +86-10-6577-9427
Abstract: Many music generation studies have achieved strong performance, but few combine the generated music with a given video. We propose a model with two linear Transformers that generates background music for a given video. To enhance the melodic quality of the generated music, we first feed note-related and rhythm-related music features separately into the two Transformer networks, paying particular attention to both the connection and the independence of these music features. Then, to make the generated music match the given video, a state-of-the-art cross-modal inference method is used to establish the relationship between the visual modality and the audio modality. Subjective and objective experiments indicate that the generated background music matches the video well and is also melodious.
Keywords: video background music generation; music feature extraction; linear Transformer
1. Introduction
Music can effectively convey information and express emotion. Compared with a silent video, appropriate background music makes the video content easier to understand and accept. In daily life, however, creating a video soundtrack is often a technical and time-consuming task: it requires selecting suitable pieces from a large music collection and editing the corresponding audio segments with specialized tools. Furthermore, existing methods cannot automatically customize appropriate background music for a given video. To address these problems, this paper proposes an automatic background music generation model with two jointly trained linear Transformers. This method ensures both convenience of use and the uniqueness of the generated music. At the same time, after training on a large amount of data, it ensures both the rhythmicity of the generated music and a high degree of matching with the given video.
Many excellent results have been achieved on tasks related to automatic video background music generation, such as music generation and video–audio matching. However, to the best of our knowledge, most existing works do not consider the association between the generated music and the video. Many works on music generation focus on music generation itself [1,2], and more recent studies have paid attention to controllable music generation [3–5], while few [6] combine music generation with videos. As a result, the generated music cannot meet the background requirements of a given video. Furthermore, since no paired video–background-music dataset exists, the existing video background music generation method [6] skillfully established a correspondence between video features and music elements, and then used the video features to change the music elements for different given videos. Although this approach achieved breakthrough results, it paid less attention to the relationship and the independence of musical elements, which led to weak melodiousness. In this article, the proposed model improves the extraction of musical elements with two jointly trained linear Transformers [7], and uses the above inference method to improve the rhythm of the generated music as well as its match with the given video.
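To make the two-branch idea concrete, the following is a minimal sketch: two branches, one for note-related tokens and one for rhythm-related tokens, each using the linear attention of [7] and conditioned on a video feature vector. This is an illustration only, not the authors' implementation; the exact feature split, the dimensions, the single-head non-causal attention, and the additive video conditioning are all simplifying assumptions made here.

```python
# Minimal sketch of a double linear Transformer (illustrative, not the
# paper's implementation). Linear attention follows Katharopoulos et al. [7]:
# softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V) with phi(x) = elu(x) + 1,
# which reduces attention cost from O(N^2) to O(N) in sequence length.
import torch
import torch.nn as nn
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    q = F.elu(q) + 1                                     # kernel feature map phi(.)
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)              # phi(K)^T V
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)


class LinearTransformerBranch(nn.Module):
    """One branch: embeds its own token stream (note- or rhythm-related)
    and attends over it with linear attention."""

    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, video_feat):
        # Additive video conditioning is an assumption of this sketch.
        x = self.embed(tokens) + video_feat.unsqueeze(1)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return self.out(linear_attention(q, k, v))       # next-token logits


class DoubleLinearTransformer(nn.Module):
    """Two jointly trained branches, one per music-feature group."""

    def __init__(self, note_vocab, rhythm_vocab, d_model=256, video_dim=512):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, d_model)
        self.note_branch = LinearTransformerBranch(note_vocab, d_model)
        self.rhythm_branch = LinearTransformerBranch(rhythm_vocab, d_model)

    def forward(self, note_tokens, rhythm_tokens, video_feat):
        v = self.video_proj(video_feat)                  # (batch, d_model)
        return (self.note_branch(note_tokens, v),
                self.rhythm_branch(rhythm_tokens, v))


# Example usage with dummy shapes (all sizes are placeholders):
model = DoubleLinearTransformer(note_vocab=128, rhythm_vocab=64)
notes = torch.randint(0, 128, (2, 100))    # (batch, sequence) note tokens
rhythm = torch.randint(0, 64, (2, 100))    # rhythm tokens
video = torch.randn(2, 512)                # pooled video feature
note_logits, rhythm_logits = model(notes, rhythm, video)
```

In training, each branch would receive its own cross-entropy loss over shifted targets, with the two losses summed so that the branches are optimized jointly while keeping the note-related and rhythm-related streams independent.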