C3D is a deep learning tool released by Facebook; it is a modified version of BVLC Caffe that supports 3D convolution and pooling. In the field of human action recognition, the C3D feature of a video clip is a state-of-the-art feature. In this blog, I write down some notes on using this tool in practice.
-j 32
tells make to compile with 32 parallel jobs.
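For example, with the Caffe-style Makefile build:
make all -j 32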
When something goes wrong, search the Internet, find a solution and update your makefile.
Each line of the input list file has the format:
<string_path> <starting_frame> <label>
For example:
/home/yunfeng/dataset/ucf101/ucf101_frm/YoYo/v_YoYo_g23_c01/ 1 100
NOTE: for a video clip, starting_frame starts from 0, but for frame folders, it starts from 1.
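For frame folders, consecutive clips of one video usually advance the starting frame by the clip length (16 frames by default), so an input list often looks like this (paths and label are illustrative):
/home/yunfeng/dataset/ucf101/ucf101_frm/YoYo/v_YoYo_g23_c01/ 1 100
/home/yunfeng/dataset/ucf101/ucf101_frm/YoYo/v_YoYo_g23_c01/ 17 100
/home/yunfeng/dataset/ucf101/ucf101_frm/YoYo/v_YoYo_g23_c01/ 33 100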
The format of the output list file is like this:
<output_prefix>
For example (one prefix per clip; here the video's output folder plus the clip's zero-padded starting frame):
/output/c3d/YoYo/v_YoYo_g23_c01/000001
The feature extractor network definition is prototxt/c3d_sport1m_feature_extractor_frm.prototxt; you can adapt it so that it correctly points to your input list file.
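For reference, the data layer of that prototxt looks roughly like the snippet below; the exact field names are an assumption from my checkout, so verify them against your own version:
layers {
  name: "data"
  type: VIDEO_DATA
  top: "data"
  top: "label"
  image_data_param {
    source: "prototxt/input_list_frm.txt"  # point this at your own input list
    use_image: true                        # true for frame folders, false for video files
    batch_size: 50
    new_length: 16
    ...
  }
}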
We use the tool named extract_image_features.bin in the build/tools directory to extract features. Its usage is:
extract_image_features.bin <feature_extractor_prototxt_file> <c3d_pre_trained_model> <gpu_id> <mini_batch_size> <number_of_mini_batches> <output_prefix_file> <feature_name1> <feature_name2> ...
We can use the command below to extract features:
GLOG_logtostderr=1 ../../build/tools/extract_image_features.bin prototxt/c3d_sport1m_feature_extractor_frm.prototxt conv3d_deepnetA_sport1m_iter_1900000 0 50 1 prototxt/output_list_prefix.txt fc7-1 fc6-1 prob
After extracting the features, we can use the matlab code in the script subdirectory of examples/c3d_feature_extraction to do further work. There are two matlab files in script: read_binary_blob.m and read_binary_blob_preserve_shape.m. They read the extracted binary blob files back into matlab, and we can use these two functions for further analysis of the features.
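For each input clip, the extractor writes one file per requested feature, named <output_prefix>.<feature_name>. A minimal sketch of loading one of them (the path is illustrative):
% Load a single extracted fc6 blob (path is hypothetical).
[s, feat] = read_binary_blob('../output/c3d/YoYo/v_YoYo_g23_c01/000001.fc6-1');
% feat is the 4096-dimensional fc6 activation of this 16-frame clip.
disp(size(feat));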
Since C3D is a fork of Caffe, a fast open framework for deep learning, we can also use C3D to train deep networks. You can train from scratch or fine-tune C3D on your own dataset.
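A minimal fine-tuning sketch, assuming your build ships the old-Caffe finetune_net.bin tool and you have written a solver prototxt (the solver file name here is hypothetical):
GLOG_logtostderr=1 ../../build/tools/finetune_net.bin conv3d_ucf101_finetuning_solver.prototxt conv3d_deepnetA_sport1m_iter_1900000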
After extracting features for the clips of each video with the C3D tools, in order to use an SVM to classify the videos we must build one descriptor per video. We average the c3d features of each video, i.e., we take the mean over all clips of the 4096-dimensional fc6 vectors. I use the matlab code below to do this job (using the provided function read_binary_blob):
function [] = read_ucf101_c3d_feat(output_list_relative)
% Read c3d features (fc6) for videos in ucf101 dataset.
% For each video, average all its features and get a video descriptor.
% Unlike fileread, importdata stores each line separately.
dir_list = importdata(output_list_relative);
dim_feat = 4096;
for i = 1 : size(dir_list, 1)
    dir_str = char(dir_list(i));
    feat_files = dir([dir_str, '/*.fc6-1']);
    num_feat = length(feat_files);
    feat = zeros(num_feat, dim_feat);
    for j = 1 : num_feat
        feat_path = strcat(dir_str, '/', feat_files(j).name);
        [~, feat(j, :)] = read_binary_blob(feat_path);
    end
    avg_feat = mean(feat, 1);
    % libsvm requires that input data must be double
    avg_feat_double = double(avg_feat);
    fID = fopen(strcat(dir_str, '/c3d.fc6'), 'w');
    fwrite(fID, avg_feat_double, 'double');
    fclose(fID);
end
end
The input parameter is a file in which each line is the relative path, from the location of the script, to one video's feature directory. For example:
../output/c3d/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01
It takes about ten minutes to run this script.
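A usage sketch (the list file name is hypothetical):
read_ucf101_c3d_feat('ucf101_video_dir_list.txt');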
When using libsvm to classify the video features, there are two phases: training and testing. The declarations of the training and testing functions are:
model = libsvmtrain(train_label_vector, train_data_matrix, options);
[label, accuracy, prob] = libsvmpredict(test_label_vector, test_data_matrix, model, options);
train_label_vector is an m-by-1 vector whose elements are double values. train_data_matrix is an m-by-n matrix in which each row is the data of one video. The predicting function takes its inputs in the same form.
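As a toy sketch of the expected shapes (the labels and data here are random and purely illustrative; '-t 0' selects a linear kernel):
% Four videos, 4096-dimensional descriptors, two classes.
train_label_vector = double([0; 0; 1; 1]);   % m-by-1 label vector
train_data_matrix = rand(4, 4096);           % m-by-n, one row per video
model = libsvmtrain(train_label_vector, train_data_matrix, '-t 0');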
In order to construct the input data in the right way, I wrote several wrapper functions for training and testing, which are more convenient to run:
% create_svm_input_data.m
function [data_matrix] = create_svm_input_data(output_list_train)
% Read the c3d feature (fc6) of each video and construct libsvm-format data.
dim_feat = 4096;
dir_list = importdata(output_list_train);
num_train_video = size(dir_list, 1);
data_matrix = zeros(num_train_video, dim_feat);
for i = 1 : num_train_video
    feat_path = strcat(char(dir_list(i)), '/c3d.fc6');
    fid = fopen(feat_path, 'r');
    data = fread(fid, 'double');
    fclose(fid);
    % L2-normalize each descriptor before feeding it to the SVM.
    normed_data = data / norm(data);
    data_matrix(i, :) = normed_data;
end
end
The output_list_train parameter is a file that contains the relative path to each video directory, like:
../output/c3d/YoYo/v_YoYo_g07_c02
Then here is the file to train the svm:
%% train_ucf101.m
function [model] = train_ucf101(label_file_path, data_file_path, varargin)
% Train a libsvm model on the averaged c3d descriptors.
label_int = load(label_file_path);
label_double = double(label_int);   % libsvm requires double labels
data = create_svm_input_data(data_file_path);
% Forward any extra libsvm options from the caller.
model = libsvmtrain(label_double, data, varargin{:});
end
The label_file_path is the complete path (including the file name) to the file that contains all the training labels. For example, label_file_path can be:
/data/foo/training_label.txt
Each line in training_label.txt contains exactly one label, for example:
# training_label.txt
0
0
0
0
1
1
1
1
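A hypothetical training call (both paths and the '-t 0' linear-kernel option are illustrative):
model = train_ucf101('/data/foo/training_label.txt', 'train_video_dir_list.txt', '-t 0');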
And here is the matlab file to test the svm:
%% test_ucf101.m
function [label, accuracy, predict_prob] = test_ucf101(test_label_path, test_data_path, model, varargin)
% Evaluate a trained libsvm model on the test set.
label_int = load(test_label_path);
label_double = double(label_int);
% No trailing semicolon: print the sizes as a sanity check.
label_size = size(label_double)
data = create_svm_input_data(test_data_path);
data_size = size(data)
[label, accuracy, predict_prob] = libsvmpredict(label_double, data, model, varargin{:});
end
Note that we can use varargin to pass parameters from a wrapper function through to an internal function.
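A matching hypothetical testing call (paths are again illustrative):
[pred_label, accuracy, ~] = test_ucf101('/data/foo/test_label.txt', 'test_video_dir_list.txt', model);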