Citation: Yang, X.; Yu, Y.; Wu, X. Double Linear Transformer for Background Music Generation from Videos. Appl. Sci. 2022, 12, 5050. https://doi.org/10.3390/app12105050

Academic Editors: Katia Lida Kermanidis, Phivos Mylonas and Manolis Maragoudakis

Received: 22 April 2022
Accepted: 13 May 2022
Published: 17 May 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Double Linear Transformer for Background Music Generation from Videos
Xueting Yang, Ying Yu * and Xiaoyu Wu
Faculty of Information and Communication Engineering, Communication University of China,
Beijing 100024, China; yangxueting@cuc.edu.cn (X.Y.); wuxiaoyu@cuc.edu.cn (X.W.)
* Correspondence: yuying@cuc.edu.cn; Tel.: +86-10-6577-9427
Abstract: Many music generation studies have achieved strong performance, but few combine the generated music with a given video. We propose a model with two linear Transformers that generates background music for a given video. To enhance the melodic quality of the generated music, we first feed note-related and rhythm-related music features separately into the two Transformer networks, paying particular attention to both the connection and the independence of these music features. Then, to make the generated music match the given video, a state-of-the-art cross-modal inference method is used to establish the relationship between the visual modality and the audio modality. Subjective and objective experiments indicate that the generated background music matches the video well and is also melodious.
Keywords: video background music generation; music feature extraction; linear Transformer
1. Introduction
Music can effectively convey information and express emotion. Compared with a silent video, appropriate background music makes the video content easier to understand and accept. In daily life, however, creating a video soundtrack is often a technical and time-consuming task: it requires selecting suitable pieces from a large music collection and editing the corresponding audio segments with specialized tools. Furthermore, existing methods cannot automatically customize appropriate background music for a given video. To address these problems, this paper proposes an automatic background music generation model with two jointly trained linear Transformers. This method ensures both convenience of use and the uniqueness of the generated music. At the same time, after training on a large amount of data, it ensures both the rhythmicity of the generated music and a high degree of matching with the given video.
Many excellent results have been achieved on tasks related to automatic video background music generation, such as music generation and video–audio matching. However, to the best of our knowledge, most existing works do not consider the association between the generated music and the video. Many works on music generation focus on music generation itself [1,2], and more recent studies have paid attention to controllable music generation [3–5], while few [6] combine music generation with videos. As a result, the generated music cannot meet the background requirements of a given video. Furthermore, since no paired video–background-music dataset exists, the existing video background music generation method [6] skillfully established a correspondence between video features and music elements, and then used the video features to change the music elements for different given videos. Although this approach achieved breakthrough results, it paid less attention to the relationship and the independence of musical elements, which led to weak melodiousness. In this article, the proposed model improves the extraction of musical elements with two jointly trained linear Transformers [7], and uses the above inference method to improve the rhythm of the generated music as well as its match with the given video.
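To make the two-branch idea concrete, the following is a minimal sketch: two branches, one for note-related tokens and one for rhythm-related tokens, each using the linear attention of [7] and conditioned on a video feature vector. This is an illustration only, not the authors' implementation; the exact feature split, the dimensions, the single-head non-causal attention, and the additive video conditioning are all simplifying assumptions made here.

```python
# Minimal sketch of a double linear Transformer (illustrative, not the
# paper's implementation). Linear attention follows Katharopoulos et al. [7]:
# softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V) with phi(x) = elu(x) + 1,
# which reduces attention cost from O(N^2) to O(N) in sequence length.
import torch
import torch.nn as nn
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    q = F.elu(q) + 1                                     # kernel feature map phi(.)
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)              # phi(K)^T V
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)


class LinearTransformerBranch(nn.Module):
    """One branch: embeds its own token stream (note- or rhythm-related)
    and attends over it with linear attention."""

    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, video_feat):
        # Additive video conditioning is an assumption of this sketch.
        x = self.embed(tokens) + video_feat.unsqueeze(1)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return self.out(linear_attention(q, k, v))       # next-token logits


class DoubleLinearTransformer(nn.Module):
    """Two jointly trained branches, one per music-feature group."""

    def __init__(self, note_vocab, rhythm_vocab, d_model=256, video_dim=512):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, d_model)
        self.note_branch = LinearTransformerBranch(note_vocab, d_model)
        self.rhythm_branch = LinearTransformerBranch(rhythm_vocab, d_model)

    def forward(self, note_tokens, rhythm_tokens, video_feat):
        v = self.video_proj(video_feat)                  # (batch, d_model)
        return (self.note_branch(note_tokens, v),
                self.rhythm_branch(rhythm_tokens, v))


# Example usage with dummy shapes (all sizes are placeholders):
model = DoubleLinearTransformer(note_vocab=128, rhythm_vocab=64)
notes = torch.randint(0, 128, (2, 100))    # (batch, sequence) note tokens
rhythm = torch.randint(0, 64, (2, 100))    # rhythm tokens
video = torch.randn(2, 512)                # pooled video feature
note_logits, rhythm_logits = model(notes, rhythm, video)
```

In training, each branch would receive its own cross-entropy loss over shifted targets, with the two losses summed so that the branches are optimized jointly while keeping the note-related and rhythm-related streams independent.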