This implements "Time Contrastive Networks", which is part of the larger Self-Supervised Imitation Learning project.

Contacts

Maintainers of TCN:
Install the nightly GPU build of TensorFlow:

```
pip install tf-nightly-gpu
```
Run the script that downloads the pretrained InceptionV3 checkpoint:
```
cd tensorflow-models/tcn
python download_pretrained.py
```
Run all the tests:

```
bazel test :all
```
We provide utilities to collect your own multi-view videos in dataset/webcam.py. See the webcam tutorial for an end-to-end example of how to collect multi-view webcam data and convert it to the TFRecord format expected by this library.
We use the tf.data.Dataset API to construct the input pipelines that feed training, evaluation, and visualization.
We define training, evaluation, and inference behavior using the tf.estimator.Estimator API. See estimators/mvtcn_estimator.py for an example of how multi-view TCN training, evaluation, and inference are implemented.
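As a rough, generic sketch of that pattern (this is not the repo's mvtcn_estimator.py; the toy input_fn, feature shapes, and one-layer embedder below are placeholders), a tf.data pipeline feeding a custom tf.estimator.Estimator looks like:

```python
import numpy as np
import tensorflow as tf

def input_fn():
  # Toy stand-in for the real TFRecord-backed tf.data pipeline: random "images"
  # with integer labels standing in for time-step ids.
  images = np.random.rand(64, 32, 32, 3).astype(np.float32)
  labels = np.random.randint(0, 4, size=64).astype(np.int64)
  dataset = tf.data.Dataset.from_tensor_slices((images, labels))
  return dataset.shuffle(64).batch(16).repeat()

def model_fn(features, labels, mode, params):
  # Placeholder embedder; the project's real embedders live in model.py.
  net = tf.layers.flatten(features)
  embeddings = tf.nn.l2_normalize(
      tf.layers.dense(net, params['embedding_size']), axis=1)
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode, predictions={'embeddings': embeddings})
  # Any metric-learning loss works here; semi-hard triplet loss shown.
  loss = tf.contrib.losses.metric_learning.triplet_semihard_loss(labels, embeddings)
  if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(mode, loss=loss)
  train_op = tf.train.AdamOptimizer(1e-3).minimize(
      loss, global_step=tf.train.get_or_create_global_step())
  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn, params={'embedding_size': 32})
estimator.train(input_fn, steps=10)
```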
Different embedder architectures are implemented in model.py. We used the InceptionConvSSFCEmbedder in the pouring experiments, but we're also evaluating other architectures.
We use the tf.contrib.losses.metric_learning library's implementations of triplet loss with semi-hard negative mining and npairs loss. In our experiments, npairs loss has better empirical convergence and produces the best qualitative visualizations, and will likely be our choice for future experiments. See the paper for details on the algorithm.
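For reference, both losses are available from tf.contrib.losses.metric_learning in TF 1.x. A minimal usage sketch, assuming anchor/positive embeddings come from co-occurring frames in two views (all tensor names below are placeholders, not code from this repo):

```python
import tensorflow as tf

batch_size, embedding_size = 32, 128

# Hypothetical embeddings: "anchor" frames from view 1 and the co-occurring
# "positive" frames from view 2, with one unique label per time step.
anchor_emb = tf.placeholder(tf.float32, [batch_size, embedding_size])
positive_emb = tf.placeholder(tf.float32, [batch_size, embedding_size])
pair_labels = tf.placeholder(tf.int64, [batch_size])

# npairs loss over the (anchor, positive) pairs.
npairs = tf.contrib.losses.metric_learning.npairs_loss(
    labels=pair_labels,
    embeddings_anchor=anchor_emb,
    embeddings_positive=positive_emb,
    reg_lambda=0.002)

# Triplet loss with semi-hard negative mining takes one batch of l2-normalized
# embeddings plus labels; here both views are stacked into a single batch.
all_emb = tf.nn.l2_normalize(tf.concat([anchor_emb, positive_emb], axis=0), axis=1)
all_labels = tf.concat([pair_labels, pair_labels], axis=0)
triplet = tf.contrib.losses.metric_learning.triplet_semihard_loss(
    labels=all_labels, embeddings=all_emb, margin=1.0)
```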
We support 3 modes of inference for trained TCN models:

- See labeled_eval.py for a usage example.
- See generate_videos.py for a usage example.
- See estimators/base_estimator.py for details.
Data pipelines, training, eval, and visualization are all configured using key-value parameters passed as YAML files. Configurations can be nested, e.g.:
```
learning:
  optimizer: 'adam'
  learning_rate: 0.001
```
YAML configs are converted to a LuaTable-like T object (see utils/luatables.py), which behaves like a python dict but allows you to use dot notation to access (nested) keys. For example, we could access the learning rate in the above config snippet via config.learning.learning_rate.
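A minimal sketch of the idea (illustrative only; the real T implementation in utils/luatables.py may differ in details):

```python
import yaml

class T(dict):
    """A dict whose (nested) keys can also be read with dot notation."""
    def __getattr__(self, key):
        value = self[key]
        return T(value) if isinstance(value, dict) else value

config = T(yaml.safe_load("""
learning:
  optimizer: 'adam'
  learning_rate: 0.001
"""))

assert config.learning.learning_rate == 0.001
assert config['learning']['optimizer'] == 'adam'
```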
Multiple configs can be passed to the various binaries as a comma-separated list of config paths via the --config_paths flag. This allows us to specify a default config that applies to all experiments (e.g. how often to write checkpoints, default embedder hyperparams) and one config per experiment holding just the hyperparams specific to that experiment (path to data, etc.).
See configs/tcn_default.yml for an example of our default config and configs/pouring.yml for an example of how we define the pouring experiments.
Configs are applied left to right. For example, consider two config files:
default.yml:

```
learning:
  learning_rate: 0.001  # Default learning rate.
  optimizer: 'adam'
```
myexperiment.yml:

```
learning:
  learning_rate: 1.0  # Experiment learning rate (overwrites default).
data:
  training: '/path/to/myexperiment/training.tfrecord'
```
Running

```
bazel run train.py --config_paths='default.yml,myexperiment.yml'
```

results in a final merged config called final_training_config.yml:

```
learning:
  optimizer: 'adam'
  learning_rate: 1.0
data:
  training: '/path/to/myexperiment/training.tfrecord'
```
which is created automatically and stored in the experiment log directory alongside model checkpoints and tensorboard summaries. This gives us a record of the exact configs that went into each trial.
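The merging behavior can be pictured as a left-to-right recursive dictionary merge. A minimal sketch of that idea (illustrative only; the repo's actual config-loading code may differ):

```python
import yaml

def deep_merge(base, override):
    """Recursively merge `override` into `base`; values in `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_configs(comma_separated_paths):
    """Mimics --config_paths: e.g. 'default.yml,myexperiment.yml', merged left to right."""
    config = {}
    for path in comma_separated_paths.split(','):
        with open(path) as f:
            config = deep_merge(config, yaml.safe_load(f) or {})
    return config
```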
We usually look at two validation metrics during training: KNN classification error and multi-view alignment.
In cases where we have labeled validation data, we can compute the average cross-sequence KNN classification error (1.0 - recall@k=1) over all embedded labeled images in the validation set. See labeled_eval.py.
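A minimal sketch of that metric (illustrative only, not the repo's labeled_eval.py), assuming arrays of embeddings, integer phase labels, and sequence ids for the labeled validation images:

```python
import numpy as np

def cross_sequence_knn_error(embeddings, labels, seq_ids):
    """1.0 - recall@k=1, where each query's nearest neighbor must come from a
    different sequence than the query itself.

    embeddings: [N, D] float array; labels, seq_ids: [N] int arrays.
    """
    errors = 0
    for i in range(len(embeddings)):
        dists = np.linalg.norm(embeddings - embeddings[i], axis=1)
        dists[seq_ids == seq_ids[i]] = np.inf  # Exclude same-sequence frames.
        nearest = int(np.argmin(dists))
        errors += int(labels[nearest] != labels[i])
    return errors / float(len(embeddings))
```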
In cases where there is no labeled validation data, we can look at how well our model aligns multiple views of the same embedded validation sequences. That is, for each embedded validation sequence, for all cross-view pairs, we compute the scaled absolute distance between ground truth time indices and KNN time indices. See alignment.py.
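A minimal sketch of that alignment metric for a single sequence seen from two views (illustrative only, not the repo's alignment.py; it assumes both views have the same number of frames):

```python
import numpy as np

def alignment_error(emb_view1, emb_view2):
    """Scaled absolute distance between ground-truth and nearest-neighbor time
    indices for one cross-view pair. emb_view1, emb_view2: [T, D] embeddings of
    the same sequence filmed from two views.
    """
    num_frames = emb_view1.shape[0]
    errors = []
    for t in range(num_frames):
        dists = np.linalg.norm(emb_view2 - emb_view1[t], axis=1)
        t_nn = int(np.argmin(dists))
        errors.append(abs(t - t_nn) / float(num_frames))
    return float(np.mean(errors))
```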
We visualize the embedding space learned by our models in two ways: nearest neighbor imitation videos and PCA/T-SNE.
One of the easiest ways to evaluate the understanding of your model is to see how well it can semantically align two videos via nearest neighbors in embedding space.
Consider the case where we have multiple validation demo videos of a human or robot performing the same task. For example, in the pouring experiments, we collected many different multi-view validation videos of a person pouring the contents of one container into another, then setting the container down. If we'd like to see how well our embeddings generalize across viewpoint, object/agent appearance, and background, we can construct what we call "Nearest Neighbor Imitation" videos, by embedding some validation query sequence i from view 1, and finding the nearest neighbor for each query frame in some embedded target sequence j filmed from view 1. Here's an example of the final product.
See generate_videos.py for details.
We can also embed a set of images taken randomly from validation videos and visualize the embedding space using PCA projection and T-SNE in the tensorboard projector. See
visualize_embeddings.py for details.
Here we give an end-to-end example of how to collect your own multi-view webcam videos and convert them to the TFRecord format expected by training.
Note: This was tested with up to 8 concurrent Logitech c930e webcams extended with Plugable 5 Meter (16 Foot) USB 2.0 Active Repeater Extension Cables.
Go to dataset/webcam.py
If the --seqname flag isn't set, the script will name the first sequence '0', the second sequence '1', and so on (meaning you can just keep rerunning step 3). When you are finished, you should see an output viddir with the following structure:

```
videos/0_view0.mov
videos/0_view1.mov
...
videos/0_viewM.mov
videos/1_viewM.mov
...
videos/N_viewM.mov
```

for N sequences and M webcam views.
Run dataset/videos_to_tfrecords.py to convert the directory of videos into a directory of TFRecords files, one per multi-view sequence.
```
viddir=/tmp/tcn/videos
dataset=tutorial
mode=train
videos=$viddir/$dataset

bazel build -c opt videos_to_tfrecords && \
bazel-bin/videos_to_tfrecords --logtostderr \
--input_dir $videos/$mode \
--output_dir ~/tcn_data/$dataset/$mode \
--max_per_shard 400
```
--max_per_shard > 0 allows you to shard training data. We've observed that sharding long training sequences provides better performance in terms of global steps/sec.
This should be left at the default of 0 for validation / test data.
You should now have a directory of TFRecords files with the following structure:
```
output_dir/0.tfrecord
...
output_dir/N.tfrecord
```

1 TFRecord file for each of N multi-view sequences.
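As a quick sanity check (a sketch assuming the ~/tcn_data/$dataset/$mode output path used above, with dataset=tutorial and mode=train, and a TF 1.x install), you can count the serialized examples in each shard:

```python
import glob
import os
import tensorflow as tf

record_dir = os.path.expanduser('~/tcn_data/tutorial/train')  # output_dir from the step above.
for path in sorted(glob.glob(os.path.join(record_dir, '*.tfrecord'))):
    num_records = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print('%s: %d examples' % (path, num_records))
```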
Now we're ready to move on to part II: training, evaluation, and visualization.
Here we give an end-to-end example of how to train, evaluate, and visualize the embedding space learned by TCN models.
We will be using the 'Multiview Pouring' dataset, which can be downloaded using the download.sh script here.
The rest of the tutorial will assume that you have your data downloaded to a folder at ~/tcn_data:
```
mkdir ~/tcn_data
mv ~/Downloads/download.sh ~/tcn_data
./download.sh
```
You should now have the following path containing all the data:
```
ls ~/tcn_data/multiview-pouring
labels  README.txt  tfrecords  videos
```
If you haven't already, run the script that downloads the pretrained InceptionV3 checkpoint:
For our experiment, we create 2 configs:
configs/tcn_default.yml: This contains all the default hyperparameters that generally don't vary across experiments.
configs/pouring.yml: This contains all the hyperparameters that are specific to the pouring experiment.
Important note about
Run the training binary:
```
logdir=/tmp/tcn/pouring
c=configs
configs=$c/tcn_default.yml,$c/pouring.yml

bazel build -c opt --copt=-mavx --config=cuda train && \
bazel-bin/train \
--config_paths $configs --logdir $logdir
```
Run the binary that computes running validation loss. Set
export CUDA_VISIBLE_DEVICES= to run on CPU.
```
bazel build -c opt --copt=-mavx eval && \
bazel-bin/eval \
--config_paths $configs --logdir $logdir
```
Run the binary that computes running validation cross-view sequence alignment. Set
export CUDA_VISIBLE_DEVICES= to run on CPU.
```
bazel build -c opt --copt=-mavx alignment && \
bazel-bin/alignment \
--config_paths $configs --checkpointdir $logdir --outdir $logdir
```
Run the binary that computes running labeled KNN validation error. Set
export CUDA_VISIBLE_DEVICES= to run on CPU.
```
bazel build -c opt --copt=-mavx labeled_eval && \
bazel-bin/labeled_eval \
--config_paths $configs --checkpointdir $logdir --outdir $logdir
```
Run tensorboard:

```
tensorboard --logdir=$logdir
```

After a bit of training, you should see curves that look like this:
To visualize the embedding space learned by a model, we generate nearest neighbor imitation videos and view PCA / T-SNE projections in the tensorboard projector. First, define the following variables:
```
# Use the automatically generated final config file as config.
configs=$logdir/final_training_config.yml
# Visualize checkpoint 40001.
checkpoint_iter=40001
# Use validation records for visualization.
records=~/tcn_data/multiview-pouring/tfrecords/val
# Write videos to this location.
outdir=$logdir/tcn_viz/imitation_vids
```
Run the binary that generates imitation videos:

```
bazel build -c opt --config=cuda --copt=-mavx generate_videos && \
bazel-bin/generate_videos \
--config_paths $configs \
--checkpointdir $logdir \
--checkpoint_iter $checkpoint_iter \
--query_records_dir $records \
--target_records_dir $records \
--outdir $outdir
```
After the script completes, you should see a directory of imitation videos that look like this:
Run the binary that generates embeddings and metadata.
```
outdir=$logdir/tcn_viz/embedding_viz

bazel build -c opt --config=cuda --copt=-mavx visualize_embeddings && \
bazel-bin/visualize_embeddings \
--config_paths $configs \
--checkpointdir $logdir \
--checkpoint_iter $checkpoint_iter \
--embedding_records $records \
--outdir $outdir \
--num_embed 1000 \
--sprite_dim 64
```
Run tensorboard, pointed at the embedding viz output directory (tensorboard --logdir=$outdir).
You should see something like this in tensorboard.