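Georgia Tech's music technology group also has three papers on music performance assessment that do not use deep learning. All three work on the Florida Bandmasters Association dataset. The first two are very similar; the last one uses sparse coding to learn features without supervision.
Frankly, none of the three has much of a highlight. Reviewing them is a bit like gnawing on chicken ribs: little meat, but a pity to throw away.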
Towards the Objective Assessment of Music Performances
ICMPC 2016
This paper can be skipped; go straight to the next two.
Vidwans, Amruta
Gururani, Siddharth
Wu, Chih-Wei
Subramanian, Vinod
Swaminathan, Rupak Vignesh
Lerch, Alexander
Objective descriptors for the assessment of student music performances
http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/06/Vidwans-et-al_2017_Objective-descriptors-for-the-assessment-of-student-music-performances.pdf
DTW aligns the pitch track with the reference score, and the alignment is then used to segment the pitch track into notes. The extracted features fall into two groups: score-based and score-independent.
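The alignment and segmentation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the pitch track is a list of per-frame MIDI pitches and the score is a list of note pitches, uses absolute pitch difference as the local cost, and the names `dtw` and `segment_notes` are hypothetical.

```python
import math

def dtw(seq_a, seq_b):
    """Classic DTW; returns (total cost, warping path of index pairs)."""
    n, m = len(seq_a), len(seq_b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(seq_a[i - 1] - seq_b[j - 1])  # assumed local cost
            acc[i][j] = dist + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the end to recover the optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    path.reverse()
    return acc[n][m], path

def segment_notes(pitch_track, score_pitches):
    """Group pitch-track frame indices by the score note they align to."""
    _, path = dtw(pitch_track, score_pitches)
    segments = [[] for _ in score_pitches]
    for frame, note in path:
        segments[note].append(frame)
    return segments

track = [60.1, 59.9, 60.0, 62.2, 61.9, 64.0, 64.1, 63.9]
print(segment_notes(track, [60, 62, 64]))  # -> [[0, 1, 2], [3, 4], [5, 6, 7]]
```

A score note that ends up with an empty or very short segment is a natural hint that the performer skipped it, which is the kind of information the insertion and deletion ratios capture.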
Features:
(1) note steadiness
(2) duration histogram
(3) DTW-based features: alignment cost normalized by the DTW path length, and slope deviation
(4) note insertion ratio
(5) note deletion ratio
Combining the score-based and score-independent features gives the best performance.
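A few of the features listed above can be sketched once the note segments are available. The definitions here are assumptions made for illustration (the notes do not give the paper's exact formulas): steadiness is taken as the mean within-note standard deviation of pitch, durations are counted in frames, and all function names are hypothetical. Slope deviation and the insertion and deletion ratios are left out because their definitions are not stated here.

```python
from statistics import mean, pstdev

def normalized_dtw_cost(total_cost, path):
    """DTW cost divided by the length of the warping path."""
    return total_cost / len(path)

def note_steadiness(pitch_track, segments):
    """Mean within-note pitch standard deviation (lower means steadier)."""
    devs = [pstdev([pitch_track[f] for f in seg]) for seg in segments if len(seg) > 1]
    return mean(devs) if devs else 0.0

def duration_histogram(segments, max_len):
    """Normalized histogram of note durations in frames, clipped at max_len."""
    hist = [0] * max_len
    for seg in segments:
        if seg:  # empty segments (unplayed notes) carry no duration
            hist[min(len(seg), max_len) - 1] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * max_len

track = [60.1, 59.9, 60.0, 62.2, 61.9, 64.0, 64.1, 63.9]
segments = [[0, 1, 2], [3, 4], [5, 6, 7]]  # frame indices per score note
print(note_steadiness(track, segments))
print(duration_histogram(segments, max_len=4))
```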
Dataset:
A subset of the Florida Bandmasters Association dataset covering 394 students, with three grades and four assessment dimensions.
Wu, Chih-Wei
Lerch, Alexander
Learned Features for the Assessment of Percussive Music Performances
http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/01/Wu_Lerch_2018_Learned-Features-for-the-Assessment-of-Percussive-Music-Performances.pdf
This paper uses sparse coding, an unsupervised method, to learn features for assessing percussive music performances. The features that sparse coding learns from the LHM are fed to an SVR for regression. They perform on par with statistics computed directly from the LHM, and using the two feature sets together works better still.
Dataset: 274 recordings of middle school snare etudes, assessed on musicality and rhythmic accuracy.
Features:
Local histogram matrix (LHM):
(1) IOI (inter-onset interval) histogram vector
(2) Amplitude histogram vector
(3) Average MFCC vector
They split each performance into non-overlapping 10 s segments, compute the local histogram vectors of the three features above for each segment, and concatenate these vectors along both the feature and time dimensions to form a matrix.
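The construction described above can be sketched roughly as follows. This is a simplified illustration under several assumptions: onsets are given as sorted times in seconds with one amplitude in [0, 1) per onset, the bin counts and ranges are arbitrary, the average MFCC vector is omitted, and `local_histogram_matrix` is a hypothetical name.

```python
def histogram(values, n_bins, lo, hi):
    """Normalized histogram of the values that fall in [lo, hi)."""
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            b = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
            counts[b] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def local_histogram_matrix(onsets, amplitudes, duration,
                           seg_len=10.0, n_bins=4, max_ioi=1.0):
    """One column per 10 s segment; each column stacks IOI and amplitude histograms."""
    columns = []
    for s in range(int(duration // seg_len)):
        start, end = s * seg_len, (s + 1) * seg_len
        idx = [i for i, t in enumerate(onsets) if start <= t < end]
        # IOIs between consecutive onsets inside the same segment.
        iois = [onsets[i + 1] - onsets[i] for i in idx[:-1]]
        amps = [amplitudes[i] for i in idx]
        columns.append(histogram(iois, n_bins, 0.0, max_ioi)
                       + histogram(amps, n_bins, 0.0, 1.0))
    return columns  # time runs along the list, features inside each column

onsets = [0.0, 0.5, 1.0, 1.5, 10.0, 10.25, 10.5]
amps = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
for column in local_histogram_matrix(onsets, amps, duration=20.0):
    print(column)
```

Keeping one column per segment preserves how the playing changes over the piece, which a single global histogram would average away.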
Baseline:
(1) a set of hand-designed features
(2) statistics of the LHM features (crest, skewness, ...)
(3) sparse codes of the STFT
Model:
SVR
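The sparse codes that feed the SVR can be illustrated with a small encoder. This sketch makes strong assumptions: the dictionary is taken as given (the unsupervised dictionary learning step is not shown), and the code is computed with ISTA, a standard solver chosen here for brevity rather than one the notes attribute to the paper. The name `sparse_code` and all parameter values are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero; the proximal step for the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sparse_code(x, atoms, lam=0.1, n_iter=200):
    """Minimize 0.5 * ||x - sum_k code[k] * atoms[k]||^2 + lam * ||code||_1 with ISTA."""
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the gradient.
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    code = [0.0] * len(atoms)
    for _ in range(n_iter):
        recon = [sum(c * atom[i] for c, atom in zip(code, atoms))
                 for i in range(len(x))]
        resid = [xi - ri for xi, ri in zip(x, recon)]
        code = [soft_threshold(c + step * sum(a * r for a, r in zip(atom, resid)),
                               step * lam)
                for c, atom in zip(code, atoms)]
    return code

atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy dictionary
print(sparse_code([2.0, 0.05, 0.0], atoms))  # small entries are driven to exactly zero
```

In a pipeline like the one described, each LHM column (or patch) would be encoded this way and the resulting codes pooled into one feature vector per recording before regression.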
Metrics:
correlation coefficient and coefficient of determination (R²)
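For reference, the two metrics can be computed as below. This is a plain sketch that assumes the correlation is Pearson's r (the notes do not say which correlation is used); the function names are hypothetical.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between expert ratings and model predictions."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

ratings = [1.0, 2.0, 3.0, 4.0]
predictions = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but offset by one
print(pearson_r(ratings, predictions))  # -> 1.0
print(r_squared(ratings, predictions))
```

The toy example shows why both are reported: a constant offset leaves the correlation at 1 while R² drops sharply, so R² also penalizes systematic bias in the predicted ratings.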
Results:
The learned features (sparse codes of the LHM) achieve results comparable to the designed features. Combining the designed features with the sparse coding (SC) features gives the highest performance.