Hawthorne, Curtis; Elsen, Erich; Song, Jialin; Roberts, Adam; Simon, Ian; Raffel, Colin; Engel, Jesse; Oore, Sageev; Eck, Douglas


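The method in this paper uses two objectives, onset and frame. During training both losses are minimized simultaneously, and at inference time the onset predictions are used to constrain the frame-level pitch predictions. The results are substantially better than the previous state of the art.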

The source code is also publicly available: https://goo.gl/7zTMPf

Colab demo: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/magenta/onsets_frames_transcription/onsets_frames_transcription.ipynb

The accompanying blog post includes rich demos: https://magenta.tensorflow.org/onsets-frames


Two-task, two-objective learning for piano transcription.

The works we reviewed previously use a neural network only to predict pitch at the frame level. This work predicts both frame-level pitch and note onsets by jointly minimizing the two corresponding losses.
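A minimal sketch of the joint objective, assuming both heads output per-frame, per-pitch probabilities and that each loss is a mean binary cross-entropy. The function names are hypothetical, not taken from the released code, and the frame-loss weighting is omitted.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over a (frames x pitches) grid of nested lists."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count


def joint_loss(onset_probs, onset_labels, frame_probs, frame_labels):
    """Onset loss plus frame loss; both terms are minimized together in training."""
    return (binary_cross_entropy(onset_probs, onset_labels)
            + binary_cross_entropy(frame_probs, frame_labels))
```

Summing the two terms means a single gradient step updates both heads, which is what "jointly minimizing" amounts to here.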

At inference time, the authors add constraints: for example, an activation from the frame detector may only start a note if the onset detector also detects an onset in that frame. In addition, the frame loss is weighted according to the distance between the current frame and the onset frame.
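A minimal sketch of onset-constrained decoding, assuming per-frame, per-pitch probabilities and a single 0.5 threshold for both detectors (both are assumptions, and `decode_notes` is a hypothetical name). A note may only begin in a frame where the onset and frame detectors are both active; it ends when the frame detector turns off or a new onset begins.

```python
def decode_notes(onset_probs, frame_probs, threshold=0.5):
    """Return (pitch, start_frame, end_frame) tuples; end_frame is exclusive.

    onset_probs and frame_probs are indexed as [frame][pitch].
    """
    num_frames = len(frame_probs)
    num_pitches = len(frame_probs[0]) if num_frames else 0
    notes = []
    for pitch in range(num_pitches):
        start = None
        prev_onset = False
        for t in range(num_frames):
            onset = onset_probs[t][pitch] > threshold
            frame = frame_probs[t][pitch] > threshold
            # Close the active note when the frame turns off or a fresh onset arrives.
            if start is not None and (not frame or (onset and not prev_onset)):
                notes.append((pitch, start, t))
                start = None
            # A frame activation can only start a note where an onset is detected.
            if start is None and onset and frame:
                start = t
            prev_onset = onset
        if start is not None:
            notes.append((pitch, start, num_frames))
    return notes
```

Only the rising edge of the onset signal splits an ongoing note, so an onset that stays active for several consecutive frames does not chop one note into many short ones. Frame activity with no onset at all (pitch 1 below) produces no note.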


On all evaluation metrics, namely (1) frame, (2) note, and (3) note with offset, this method substantially outperforms the previous state of the art.
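To make the first metric concrete, here is a minimal sketch of frame-level precision, recall and F1, assuming the prediction and reference are binary (frames x pitches) piano rolls of equal shape. Each (frame, pitch) cell counts once. Note-level metrics instead match whole notes by onset (and, for the third metric, also by offset) within a tolerance, which this sketch does not cover.

```python
def frame_metrics(pred_roll, ref_roll):
    """Frame-level precision, recall and F1 over binary piano rolls."""
    tp = fp = fn = 0
    for pred_row, ref_row in zip(pred_roll, ref_roll):
        for pred, ref in zip(pred_row, ref_row):
            if pred and ref:
                tp += 1  # active in both
            elif pred:
                fp += 1  # predicted but not in the reference
            elif ref:
                fn += 1  # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, `frame_metrics([[1, 0], [1, 1]], [[1, 0], [0, 1]])` finds 2 true positives and 1 false positive, giving precision 2/3, recall 1.0 and F1 0.8.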


(1) The network's input is not a small context window of frames but a 20-second segment.

(2) The output of the onset detector is fed into the frame BLSTM. My guess is that the intuition is to combine onset features with frame features.

(3) The convolutional layers are not shared between the onset and frame branches. Presumably this lets each branch learn features better suited to its own task.

(4) The performance gain in this work comes mainly from two sources: joint onset and frame training, and restricted inference.
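A toy, single-frame sketch of the data flow in points (2) and (3), assuming one dense ReLU layer in place of each convolutional stack and ignoring the BLSTMs and the time axis entirely. All names and sizes are hypothetical. It shows only that the two branches keep separate weights and that the onset predictions are concatenated onto the frame branch's features before the frame output layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights, bias, inputs):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_model(n_features, n_hidden, n_pitches, seed=0):
    rng = random.Random(seed)
    return {
        # Separate (unshared) feature extractors for the onset and frame branches.
        "onset_stack": init_layer(n_features, n_hidden, rng),
        "frame_stack": init_layer(n_features, n_hidden, rng),
        "onset_head": init_layer(n_hidden, n_pitches, rng),
        # The frame head sees its own features plus the onset predictions.
        "frame_head": init_layer(n_hidden + n_pitches, n_pitches, rng),
    }


def forward(model, spectrogram_frame):
    onset_feat = [max(0.0, v) for v in dense(*model["onset_stack"], spectrogram_frame)]
    frame_feat = [max(0.0, v) for v in dense(*model["frame_stack"], spectrogram_frame)]
    onset_probs = [sigmoid(v) for v in dense(*model["onset_head"], onset_feat)]
    frame_input = frame_feat + onset_probs  # concatenate along the feature axis
    frame_probs = [sigmoid(v) for v in dense(*model["frame_head"], frame_input)]
    return onset_probs, frame_probs
```

In the real model the concatenation would happen per time step before a recurrent layer, so the frame branch can use onset evidence from neighboring frames as well.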

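  • Original link: http://kuaibao.qq.com/s/20180501G06OBF00?refer=cp_1026
  • Tencent's "Tencent Cloud Developer Community" is one of the distribution channels for Tencent Content Open Platform accounts (Penguin accounts); this content is reposted under the Tencent Content Open Platform Service Agreement.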