%%%%%%%%%%%%%%%%%%%%%%%%

Classification of Consumer Video by Soundtrack

Courtenay Cotton
cvcotton@ee.columbia.edu
11/30/2010

%%%%%%%%%%%%%%%%%%%%%%%%

Introduction:

This package contains audio data and Matlab code to reproduce the
baseline concept classification experiments (single-Gaussian models
only) described in:

K. Lee and D. Ellis (2010). Audio-Based Semantic Concept Classification
for Consumer Video. IEEE Trans. Audio, Speech, and Lang. Proc,
vol. 18 no. 6 pp. 1406-1416, Aug 2010.

%%%%%%%%%%%%%%%%%%%%%%%%

Contents:

- 1873 consumer video soundtracks (in mp3 format)
- train, validation, and test lists for 5 partitions of the data,
  with binary labels over 25 concept classes
- list of concept names
- Matlab code to perform concept classification experiment

%%%%%%%%%%%%%%%%%%%%%%%%

Installation and Use:

To run this code, you just need it to be in your Matlab path.
However, you will need to install three external packages to run
the experiment:

- mp3read: www.ee.columbia.edu/~dpwe/resources/matlab/mp3read.html
  You won't need mp3write.
- rastamat: www.ee.columbia.edu/~dpwe/resources/matlab/rastamat
- LIBSVM's Matlab interface: www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab
  You will need the package that is maintained by the LIBSVM authors
  at National Taiwan University.

To run the experiment:

"runBaseline(distType,plotResults,dataDir)"

distType = 1 for Mahalanobis distance (default), 2 for KL divergence
plotResults = 1 to display bar graph of average precision results
              (default), 0 otherwise
dataDir = location of 'data' folder with mp3s and labels (by default,
          looks in the same directory where "runBaseline.m" resides)

%%%%%%%%%%%%%%%%%%%%%%%%

Experiment overview:

The function "runBaseline.m" computes the mean and covariance of MFCC
features for each soundtrack file. It then performs the following
experiment over each of the 5 partitions of the data files.

The distances between files are computed according to the distType
specified.
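
As a rough illustration of the two distType options, the sketch below
computes a Mahalanobis-style distance and a symmetrized KL divergence
between two single-Gaussian models. It is not taken from runBaseline.m:
all variable names and toy values are hypothetical, the choice of shared
covariance for the Mahalanobis case is an assumption, and the exponential
kernel at the end is only one common way to turn a distance into an SVM
kernel.

```matlab
% Hypothetical 2-D statistics standing in for two soundtracks' MFCC models
mu1 = [0; 0];   S1 = [1 0; 0 1];
mu2 = [1; 0];   S2 = [2 0; 0 2];
d   = numel(mu1);
dmu = mu1 - mu2;

% Mahalanobis-style distance between the means. Which covariance the
% package uses is not stated here; Sshared is a hypothetical placeholder.
Sshared = [1.5 0; 0 1.5];
dMahal  = dmu' * (Sshared \ dmu);

% Symmetrized KL divergence, KL(1||2) + KL(2||1), in closed form for
% Gaussians (the log-determinant terms cancel in the sum).
dKL = 0.5 * (trace(S2 \ S1) + trace(S1 \ S2) - 2*d ...
             + dmu' * (S1 \ dmu + S2 \ dmu));

% One common way to build a kernel value from a distance (assumption):
gammaParam = 0.5;                    % hypothetical kernel width
kernelKL   = exp(-gammaParam * dKL);
```

With these toy values the Mahalanobis-style distance is 2/3 and the
symmetrized KL divergence is 1.25.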
For each of the 25 concepts, an SVM is trained with a kernel created
from these distances. (Actually, optimal SVM parameters {gamma,C} are
first selected by repeatedly training SVMs and testing on the
validation set. For the best parameter settings, a final SVM for the
concept is trained.)

For each concept, files in the test set are ranked according to the
SVM decision values and the performance is evaluated by calculating
the average precision (AP) of this list.

The average AP results over the 5 experiments are reported and
displayed, compared with the expected AP of guessing.

%%%%%%%%%%%%%%%%%%%%%%%%

Results:

If everything is working correctly you should get approximately the
following results (for distType = 1, Mahalanobis):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Average Precision results: %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1.animal: 0.314
2.baby: 0.333
3.beach: 0.274
4.birthday: 0.302
5.boat: 0.189
6.crowd: 0.688
7.graduation: 0.261
8.group of three or more: 0.891
9.group of two: 0.261
10.museum: 0.196
11.night: 0.414
12.one person: 0.416
13.park: 0.275
14.picnic: 0.225
15.playground: 0.146
16.show: 0.588
17.sports: 0.216
18.sunset: 0.300
19.wedding: 0.370
20.dancing: 0.310
21.parade: 0.196
22.singing: 0.589
23.ski: 0.394
24.cheer: 0.619
25.music: 0.864
Mean Average Precision (MAP) over all concepts: 0.385

%%%%%%%%%%%%%%%%%%%%%%%%
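
For readers unfamiliar with the AP figures reported above, the
following is a minimal sketch of how average precision can be computed
from a ranked list. It is not the package's own evaluation code: the
decision values and labels are hypothetical toy data, and using the
fraction of positive files as the "guessing" baseline is an
approximation assumed here for illustration.

```matlab
% Hypothetical toy data: SVM decision values and binary labels for 5 files
decisionValues = [0.9; 0.2; 0.7; -0.4; 0.1];
labels         = [1;   0;   0;    1;   1];    % 1 = concept present

% Rank the files from highest to lowest decision value
[~, order]   = sort(decisionValues, 'descend');
rankedLabels = labels(order);

% Precision at each rank, averaged over the ranks that hold a positive file
hits         = cumsum(rankedLabels);
precisionAtK = hits ./ (1:numel(rankedLabels))';
ap           = sum(precisionAtK .* rankedLabels) / sum(rankedLabels);

% Approximate AP of a random ranking (assumption): fraction of positives
priorAP = mean(labels);
```

Here the positives land at ranks 1, 4, and 5, so the AP is
(1/1 + 2/4 + 3/5) / 3 = 0.7, against a guessing baseline of 0.6.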