Source: arXiv, Artificial Intelligence, 2016 (a year on and still not accepted anywhere?)

Motivation

Use GAN + RNN to handle continuous sequential data, and train the model to generate classical music.

Introduction

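"In this work, we investigate the feasibility of using adversarial training for a sequential model with continuous data, and evaluate it using classical music in freely available midi files." In other words, GAN + RNN is used to process the continuous data in MIDI files.

RNNs are mainly used for sequential data such as natural language, and they have also been brought into music generation [1,2,3], "but to our knowledge they always use a symbolic representation. In contrast, our work demonstrates how one can train a highly flexible and expressive model with fully continuous sequence data for tone lengths, frequencies, intensities, and timing."

The authors also make a point of mentioning LapGAN, which generates images coarse-to-fine. (My own thought: this is very suggestive for music generation. The same goes for two-stage GANs that generate an image from a caption, where one stage produces a low-resolution image with rough outlines and colors and the other adds detail. These ideas should be transferable to music generation.)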

Model

Both G and D in the adversarial network are RNN models. The loss functions are defined as:

[equation image not preserved]
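The loss image did not survive, so the following is only a hedged reconstruction: the paper appears to use the standard GAN losses, written here with assumed notation (m is the minibatch size, z^{(i)} a sequence of uniform random vectors, x^{(i)} a real training sequence). Check the paper itself before relying on the exact form.

```latex
L_G = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

L_D = \frac{1}{m} \sum_{i=1}^{m} \left[ -\log D\left(x^{(i)}\right) - \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]
```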

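The input to each cell in G is a random vector, concatenated with the output of the previous cell. D is a bidirectional recurrent network (LSTM). On the data side, each tone is represented as a four-tuple of tone length, frequency, intensity, and time, and this representation can express polyphonous chords.

Both G and D use 2 LSTM layers. The baseline is a plain RNN generator with the adversarial part removed. The training set consists of classical music files in standard MIDI format downloaded from the web; every "note on" event is read and recorded (together with the note's other attributes: time delay, tone, intensity, and so on). Code: https://github.com/olofmogren/c-rnn-gan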

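Several tricks are used during training:

L2 regularization is applied to the weights of both G and D.
The model was pretrained for 6 epochs with a squared error loss for predicting the next event in the training sequence.
The input to each LSTM cell is a random vector v, concatenated with the output at the previous time step. v is uniformly distributed in [0, 1]^k, and k was chosen to be the number of features in each tone, 4.
During pretraining, the sampled sequence length follows a schedule: it starts with short sequences and grows gradually until the sequences are long.
The freezing trick from [4]: when D or G becomes so strong that the other side's gradient vanishes and training can no longer proceed, the overly strong side is frozen. The rule used here is to freeze A when A's training loss is less than 70% of the training loss of B.
The feature matching trick from [4]: G's objective is replaced by minimizing the difference between the features of real and generated samples:

[equation image not preserved] Here R is the output of D's last layer (before the logistic activation).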
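The freezing rule and the feature matching objective can be sketched in a few lines. This is a minimal illustration in plain Python, not the repository's code: the function names are hypothetical, and the squared-difference form of the feature loss is an assumption based on the description above (the original equation image is missing).

```python
def should_freeze(own_loss, other_loss, ratio=0.7):
    """Return True if this network is 'too strong', i.e. its training
    loss is below `ratio` times the opponent's training loss."""
    return own_loss < ratio * other_loss


def feature_matching_loss(real_features, fake_features):
    """Mean over the batch of the squared distance between D's features
    for real samples, R(x), and for generated samples, R(G(z)).

    Both arguments are lists of equal-length feature vectors.
    """
    m = len(real_features)
    total = 0.0
    for real, fake in zip(real_features, fake_features):
        total += sum((r - f) ** 2 for r, f in zip(real, fake))
    return total / m


# Usage: skip D's update when D is far ahead of G, and vice versa.
d_loss, g_loss = 0.3, 0.5
update_d = not should_freeze(d_loss, g_loss)
update_g = not should_freeze(g_loss, d_loss)
```

With these example losses D is frozen for the step (0.3 is below 70% of 0.5) while G keeps training.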

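Evaluation metrics

Polyphony: whether simultaneous tones start at exactly the same point in time.

Scale consistency: computed by counting the fraction of tones that were part of a standard scale, and reporting the number for the best matching such scale. (What on earth is a "standard scale"?)

Repetitions: the number of repeated bars.

Tone span: the interval between the highest and the lowest tone.

The code for the evaluation tools is also on GitHub.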
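To make the metrics concrete, here is a rough sketch in plain Python. It is not the authors' evaluation code. It assumes tones are given as MIDI note numbers, that "standard scale" means one of the 12 transpositions of the major scale, and that polyphony is measured as the fraction of tones sharing their exact start tick with another tone. All three are assumptions made for illustration.

```python
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}  # semitone offsets from the root


def scale_consistency(pitches):
    """Fraction of tones that fall inside the best-matching major scale."""
    if not pitches:
        return 0.0
    best = 0
    for root in range(12):
        in_scale = sum(1 for p in pitches if (p - root) % 12 in MAJOR_SCALE)
        best = max(best, in_scale)
    return best / len(pitches)


def tone_span(pitches):
    """Number of semitones between the lowest and the highest tone."""
    return max(pitches) - min(pitches)


def polyphony(start_ticks):
    """Fraction of tones whose start tick coincides with another tone's."""
    if not start_ticks:
        return 0.0
    counts = {}
    for tick in start_ticks:
        counts[tick] = counts.get(tick, 0) + 1
    shared = sum(c for c in counts.values() if c > 1)
    return shared / len(start_ticks)
```

For example, a melody using only the white keys from C4 upward is fully consistent with C major, while adding a C#4 to C4, D4, E4 drops the score to 0.75.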

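Conclusion

This is the first paper to generate music through GAN adversarial training. To the ear, the music generated by C-RNN-GAN is nowhere near the real samples. The likely causes are that it relies purely on adversarial training, is single-track, and incorporates no prior music-theory knowledge.

Samples to listen to: http://mogren.one/publications/2016/c-rnn-gan/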

[1] Douglas Eck and Juergen Schmidhuber. Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, pages 747–756. IEEE, 2002.

[2] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1159–1166, 2012.

[3] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. arXiv preprint arXiv:1609.05473, 2016.

[4] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.

Code analysis

Parameters saved for restore:

'num_layers_g': number of layers in G's RNN cell
'num_layers_d': number of layers in D's RNN cell
'meta_layer_size':
'hidden_size_g':
'hidden_size_d':
'biscale_slow_layer_ticks':
'multiscale':
'disable_feed_previous':
'pace_events':
'minibatch_d':
'unidirectional_d':
'feature_matching':
'composer': which composer's style in the training set to train on, e.g. Bach, Beethoven, ...

If do-not-redownload.txt exists, no new MIDI files are downloaded.

The read_data function returns data in the format [genre, composer, song_data].

A sources list is organized here, keyed by genre and then artist:

[image not preserved]

After reading midi_pattern with python-midi, the code iterates over every event of every track and uses NoteOnEvent and NoteOffEvent to record four values for each note:

TICKS_FROM_PREV_START = 0
LENGTH = 1
FREQ = 2
VELOCITY = 3

Finally, all the notes of a song are collected into a song_data list. Each [genre, composer, song_data] holds the feature data of one song, and these entries are appended to loader.songs['validation'], loader.songs['test'], and loader.songs['train'].
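The note-on/note-off pairing can be illustrated without python-midi. The sketch below is an assumption-laden stand-in for the loader: events are hypothetical (abs_tick, kind, pitch, velocity) tuples rather than python-midi objects, and the frequency conversion uses the standard MIDI tuning formula (A4 = note 69 = 440 Hz), which may differ in detail from what the repository does.

```python
import math

# Column indices of one note, as listed above.
TICKS_FROM_PREV_START = 0
LENGTH = 1
FREQ = 2
VELOCITY = 3


def tone_to_freq(tone):
    """Convert a MIDI note number to a frequency in Hz."""
    return 440.0 * math.pow(2.0, (tone - 69.0) / 12.0)


def events_to_song_data(events):
    """Pair note-on with note-off events and emit one 4-value row per note.

    events: iterable of (abs_tick, kind, pitch, velocity), kind "on" or "off".
    """
    open_notes = {}  # pitch -> (start_tick, velocity) for sounding notes
    notes = []       # finished notes: (start_tick, length, freq, velocity)
    for abs_tick, kind, pitch, velocity in sorted(events, key=lambda e: e[0]):
        # A note-on with velocity 0 is conventionally a note-off.
        if kind == "on" and velocity > 0:
            open_notes[pitch] = (abs_tick, velocity)
        elif pitch in open_notes:
            start, vel = open_notes.pop(pitch)
            notes.append((start, abs_tick - start, tone_to_freq(pitch), vel))
    notes.sort(key=lambda n: n[0])

    song_data, prev_start = [], 0
    for start, length, freq, vel in notes:
        song_data.append([start - prev_start, length, freq, vel])
        prev_start = start
    return song_data


# Usage: a two-note chord (A4 + C5) followed by a single C4.
events = [
    (0, "on", 69, 80), (0, "on", 72, 70),
    (480, "off", 69, 0), (480, "off", 72, 0),
    (480, "on", 60, 90), (960, "off", 60, 0),
]
song_data = events_to_song_data(events)
```

Notes of a chord get TICKS_FROM_PREV_START equal to 0 for every note after the first, which is how this representation can express polyphony.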

The placeholders for the training data are:

self._input_songdata = tf.placeholder(shape=[batch_size, songlength, num_song_features], dtype=data_type())

self._input_metadata = tf.placeholder(shape=[batch_size, num_meta_features], dtype=data_type())

songdata_inputs converts _input_songdata into a list of songlength tensors, each of shape [batch_size, num_song_features] (tf.unstack would probably be more convenient here; still to be tested):

songdata_inputs = [tf.squeeze(input_, [1])
                   for input_ in tf.split(self._input_songdata, songlength, 1)]
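What the split-then-squeeze does to the shapes can be shown with nested lists. This is only a shape illustration in plain Python with a hypothetical helper name; in TensorFlow itself, tf.unstack(value, axis=1) should produce the same list of per-step tensors.

```python
def unstack_time(batch):
    """Turn [batch_size][songlength][num_features] into a list of
    songlength items, each of shape [batch_size][num_features]."""
    songlength = len(batch[0])
    return [[song[t] for song in batch] for t in range(songlength)]


# Usage: batch_size=2, songlength=3, num_features=2.
batch = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
steps = unstack_time(batch)
```

Each element of steps holds the same time step for every song in the batch, which is the form a step-by-step RNN loop consumes.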

An L2 regularizer is used when building the model to avoid overfitting: scope.set_regularizer(tf.contrib.layers.l2_regularizer(scale=FLAGS.reg_scale))

Create G's LSTM network:

[image not preserved]

The input noise random_rnninputs has shape [batch_size, songlength, int(FLAGS.random_input_scale*num_song_features)] and is then converted to a list (unstack?):

[image not preserved]

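G is then run step by step as an RNN, one time step per loop iteration. The input at each step is the noise random_rnninput together with the previous step's output generated_point. The two are concatenated into a single tensor, of shape [batch_size, 2*num_song_features] when random_input_scale is 1. For the first step, the "previous output" is sampled from a uniform distribution.

[image not preserved]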
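The feedback loop in G can be sketched as follows. This is a toy illustration for a single sequence, not the repository's TensorFlow graph: toy_cell is a hypothetical stand-in for the LSTM stack plus output projection, and only the wiring (noise concatenated with the previous output, uniform initial output) follows the description above.

```python
import random


def toy_cell(inputs, state):
    """Hypothetical stand-in for LSTM + projection: mixes the old state
    with the mean of the inputs and returns (output, new_state)."""
    mean_in = sum(inputs) / len(inputs)
    new_state = [0.5 * s + 0.5 * mean_in for s in state]
    return list(new_state), new_state


def generate(songlength, num_song_features=4, random_input_scale=1.0, seed=0):
    """Generate `songlength` feature vectors by feeding each output back in."""
    rng = random.Random(seed)
    noise_dim = int(random_input_scale * num_song_features)
    state = [0.0] * num_song_features
    # The "previous output" of the first step is sampled uniformly.
    generated_point = [rng.random() for _ in range(num_song_features)]
    outputs = []
    for _ in range(songlength):
        random_rnninput = [rng.random() for _ in range(noise_dim)]
        # Concatenate noise and previous output along the feature axis.
        concat_inputs = random_rnninput + generated_point
        generated_point, state = toy_cell(concat_inputs, state)
        outputs.append(generated_point)
    return outputs


song = generate(songlength=5)
```

With random_input_scale equal to 1 the concatenated input has 2 * num_song_features values per step, matching the shape mentioned above.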

G also has a pretraining pass, whose inputs are the noise random_rnninputs and the real sample songdata_inputs[i]:

[image not preserved]

The pretraining loss for G is an L2 distance. Note the stacking of the list and the [1, 0, 2] transpose here:

self.rnn_pretraining_loss = tf.reduce_mean(tf.squared_difference(x=tf.transpose(tf.stack(self._generated_features_pretraining), perm=[1, 0, 2]), y=self._input_songdata))

A regularization term is then added to prevent overfitting:

self.rnn_pretraining_loss = self.rnn_pretraining_loss+reg_loss
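Why the transpose is needed becomes clearer with nested lists: the generated features are a time-major list (one entry per step), while the placeholder is batch-major. Below is a plain-Python sketch of the same computation; the function name is hypothetical and reg_loss is simply passed in as a number.

```python
def pretraining_loss(generated_steps, input_songdata, reg_loss=0.0):
    """Mean squared difference between generated and real features.

    generated_steps: [songlength][batch_size][num_features] (time-major)
    input_songdata:  [batch_size][songlength][num_features] (batch-major)
    """
    songlength = len(generated_steps)
    batch_size = len(input_songdata)
    # Same effect as stack + transpose(perm=[1, 0, 2]): go batch-major.
    generated = [[generated_steps[t][b] for t in range(songlength)]
                 for b in range(batch_size)]
    total, count = 0.0, 0
    for gen_song, real_song in zip(generated, input_songdata):
        for gen_note, real_note in zip(gen_song, real_song):
            for g, r in zip(gen_note, real_note):
                total += (g - r) ** 2
                count += 1
    return total / count + reg_loss


# Usage: songlength=2, batch_size=1, num_features=2.
generated_steps = [[[1.0, 0.0]], [[0.0, 0.0]]]
input_songdata = [[[0.0, 0.0], [0.0, 2.0]]]
loss = pretraining_loss(generated_steps, input_songdata, reg_loss=0.5)
```

Without the reordering, step t of one song would be compared against a different song's data whenever batch_size and songlength differ.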

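D uses a multi-layer (two-layer) bidirectional LSTM. Because of TensorFlow version issues, I rewrote an interface for the multi-layer LSTM:

[image not preserved]

Two things to note:

(1) Every call that builds a bidirectional_dynamic_rnn automatically increments the index in the name scope, so I use the layer number as the scope name to pin it down (this cost me a whole day; am I just bad at this, or is TF really that treacherous?).

(2) The input _inputs of each layer must be the concatenation of the (fw, bw) tuple in output. Each tensor has shape [batch_size, song_length, output_dim], where output_dim equals the number of hidden units (state size) of the LSTM; after concatenation the shape is [batch_size, song_length, 2×output_dim].

D then passes the bidirectional LSTM output through a fully connected layer (output num = 1) and a sigmoid to get the real/fake probability. It also returns output as features, which take part in the feature loss.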
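The last step of D can be sketched for one sequence in plain Python. The names are hypothetical, and averaging the per-step probabilities into a single decision is an assumption made here for illustration; the notes above only state that a fully connected layer with one output and a sigmoid are applied.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_head(fw_outputs, bw_outputs, weights, bias):
    """Concatenate forward/backward outputs per step, then apply a
    1-unit fully connected layer and a sigmoid.

    fw_outputs, bw_outputs: [song_length][output_dim] for one sequence.
    weights: 2 * output_dim floats; bias: float.
    Returns (decision, features).
    """
    features, step_probs = [], []
    for fw, bw in zip(fw_outputs, bw_outputs):
        concat = fw + bw  # length 2 * output_dim
        features.append(concat)
        logit = sum(w * v for w, v in zip(weights, concat)) + bias
        step_probs.append(sigmoid(logit))
    # Assumption: average the per-step probabilities into one decision.
    decision = sum(step_probs) / len(step_probs)
    return decision, features


# Usage: song_length=2, output_dim=2, all-zero LSTM outputs.
fw = [[0.0, 0.0], [0.0, 0.0]]
bw = [[0.0, 0.0], [0.0, 0.0]]
decision, features = discriminator_head(fw, bw, [1.0, 1.0, 1.0, 1.0], 0.0)
```

The returned features are the concatenated outputs before the logistic layer, which is the quantity a feature matching loss would compare between real and generated batches.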

Loss computation:

[image not preserved]
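Since the original snippet is missing, here is a hedged sketch of how standard GAN losses could be computed from D's probabilities. It assumes the usual formulation (D minimizes -log D(x) - log(1 - D(G(z))), G minimizes log(1 - D(G(z)))); the clipping constant eps is an addition for numerical safety and is not taken from the repository.

```python
import math


def d_loss(d_real, d_fake, eps=1e-12):
    """Discriminator loss, averaged over the batch.

    d_real: D's probabilities for real samples.
    d_fake: D's probabilities for generated samples.
    """
    m = len(d_real)
    total = 0.0
    for p_real, p_fake in zip(d_real, d_fake):
        total += -math.log(max(p_real, eps)) - math.log(max(1.0 - p_fake, eps))
    return total / m


def g_loss(d_fake, eps=1e-12):
    """Generator loss: mean of log(1 - D(G(z))) over the batch."""
    return sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)


# Usage: an undecided discriminator outputs 0.5 everywhere.
loss_d = d_loss([0.5, 0.5], [0.5, 0.5])
loss_g = g_loss([0.5, 0.5])
```

When feature matching is enabled, G's loss would instead be the feature difference described earlier, while D keeps this loss.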
