ASCI A21 Multimedia Retrieval 2011 - Image/Video Retrieval Tuesday
During today's lab you will develop a simple object classification system in Python. We will be using Python on both Tuesday and Wednesday of the ASCI course, so it is worthwhile to spend some time on it.
Introduction to Python
For those of you who have no experience with Python whatsoever, a general Python introduction can be found here. I suggest you quickly go over it to learn the basics before you go any further. At the start of the lab, I will also present this tutorial. A second, more extensive introduction, which includes a comparison to Matlab, can be found here; it is a bit terse on the basic Python constructs, however.
Installing Python (Windows)
After extracting Portable Python to
C:\Temp\PortablePython, browse to that folder in Windows Explorer. There you will find a
Python-Portable.exe, which starts a simple interactive Python prompt. There is also
PyScripter-Portable.exe, which provides a GUI and text editor from which you can write, run and debug scripts. The GUI also runs a Python interpreter. Using the GUI is recommended, as it makes it easy to save and run code snippets.
Bag-of-words steps 1 and 2: Extracting features
Go to ColorDescriptors.com (or go directly to Color Descriptors on my website) and download the ColorDescriptor software. This software can be used to extract visual features from an image. To extract features from a video frame, first extract the frame from the video using other software and store it as an image file.
To extract SIFT descriptors at Harris-Laplace keypoints in an image (i.e. the first two steps in a bag-of-words pipeline: point sampling and descriptor extraction), we would need to run
colorDescriptor.exe image.jpg --detector harrislaplace --descriptor sift
If you need an image, you can get one here. From the ColorDescriptor download, you will need the
colorDescriptor.exe from the
i386-win-vc folder, as we are on 32-bit Windows.
However, this command does not store any of the descriptors on disk, so additionally use the
--output option with a filename to store the descriptors. If you open this file with a text editor, you will see many points and descriptors in a text format. To read these output files in Python, a DescriptorIO.py module is provided. If you download this file and put it in the same folder as your other Python code, you can do:
import DescriptorIO
print DescriptorIO.readDescriptors("myfilename.txt")
You can also copy and paste all the code to a script in PyScripter, and run it with the filename of the file to read as a command-line argument. To give command-line parameters to a script, check out the Run menu and then Command-line Parameters...
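Inside a script, command-line parameters arrive as the list sys.argv. A minimal sketch of reading the filename argument (the usage message and file name here are just examples):

```python
import sys

def main(argv):
    # argv[0] is the script name; argv[1] should be the descriptor file to read
    if len(argv) < 2:
        return "usage: %s <descriptorfile>" % argv[0]
    return "reading %s" % argv[1]

if __name__ == "__main__":
    print(main(sys.argv))
```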
Starting another program from Python
As was shown in the tutorial, we can run other programs from Python. A few examples are given below (note that these only work in the
Python-Portable.exe environment; from the GUI, a shell window will open and close too quickly to see the output).
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.system("echo hi")
hi
0
>>>
You can use your new-found knowledge of invoking other software to run the ColorDescriptor software from Python to extract descriptors given an image filename, and after that read the descriptors from disk.
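For example, a sketch along these lines could work; the executable path is a placeholder for wherever you extracted the software, and DescriptorIO.py must sit next to your script:

```python
import subprocess

# Placeholder path -- adjust to where you put the ColorDescriptor software
# (e.g. the i386-win-vc folder on 32-bit Windows).
COLORDESCRIPTOR = r"C:\Temp\colorDescriptor.exe"

def build_command(image, output, detector="harrislaplace", descriptor="sift"):
    """Build the argument list for one extraction run."""
    return [COLORDESCRIPTOR, image,
            "--detector", detector,
            "--descriptor", descriptor,
            "--output", output]

def extract_descriptors(image, output):
    """Run the extractor, then read the descriptors back from disk."""
    subprocess.check_call(build_command(image, output))
    import DescriptorIO  # provided with the lab; imported here so the sketch loads without it
    return DescriptorIO.readDescriptors(output)

print(build_command("image.jpg", "image.descr.txt"))
```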
NumPy: arrays ("matrices and vectors") in Python
Besides this tutorial, there is an additional quickstart.
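As a quick taste of what NumPy gives you: arrays with a shape, and vectorized arithmetic instead of explicit loops.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2x2 "matrix"
v = np.array([1.0, 0.0])                 # length-2 "vector"

print(a.shape)          # (2, 2)
print(a.dot(v))         # matrix-vector product: [1. 3.]
print((a ** 2).sum())   # elementwise square, then sum: 30.0
print(a.sum(axis=0))    # column sums: [4. 6.]
```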
Bag-of-words step 3: Vector quantization
As part of the bag-of-words model, each descriptor in an image is quantized against a codebook of prototypical descriptors. We have prepared a codebook for you here, which is in a format that can be read by DescriptorIO.
Task: implement a function which finds, for each descriptor in an image, the closest codebook element. Base this on the Euclidean distance (here is a module with a Euclidean distance function).
Task: count how many times a codebook element is closest for a descriptor, and create a frequency histogram out of this. This is a fixed-length feature vector that represents the image.
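Assuming the descriptors and the codebook are NumPy arrays with one vector per row (check what readDescriptors actually returns for you), the two tasks above could be sketched like this:

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor (row) to its nearest codebook element under
    the Euclidean distance, and return the frequency histogram."""
    # Pairwise squared distances between every descriptor and every codeword
    diff = descriptors[:, np.newaxis, :] - codebook[np.newaxis, :, :]
    dist2 = (diff ** 2).sum(axis=2)       # shape: (n_descriptors, n_codewords)
    nearest = dist2.argmin(axis=1)        # index of the closest codeword per descriptor
    # Count how often each codeword wins: the fixed-length feature vector
    return np.bincount(nearest, minlength=len(codebook))

# Toy example: 2-D "descriptors" against a 3-element codebook
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descr = np.array([[1.0, 1.0], [9.0, 1.0], [0.5, 0.2], [1.0, 9.0]])
print(quantize(descr, codebook))  # [2 1 1]
```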
Alternative: instead of implementing vector quantization yourself, it is also possible to use the
--codebook codebook.bin option of the ColorDescriptor software. It will then write a feature vector to the output file instead of descriptors.
Define an object classification/discrimination task
In general, an object classification system would be trained in a one-vs-all fashion: images of the target object are marked as positive, all other images are marked as negative. For this lab, we will work on a simpler task: discriminating between two object categories. In voc2007part1.zip you will find 250 images of 20 different object categories. In voc2007annotations.zip you will find text files with labels for each category: 1 = positive, -1 = negative and 0 = difficult (but would be positive). You can safely ignore (exclude) difficult images for this task. Select two object categories you would like to discriminate, and select 20 images from each as training material. Also create a holdout set with 20 images of each category. If there are not enough positive images for your objects in the first ZIP file, you can download another 750 in voc2007part2.zip.
Side-task (you can also do it by hand): can you automatically create your train and holdout set in Python, given the annotation text files?
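A possible sketch for this side-task. It assumes each annotation line holds an image name and a label ("imagename label"); check the actual files, the format may differ:

```python
import random

def split_train_holdout(annotation_text, n_train=20, n_holdout=20, seed=0):
    """Pick train and holdout images for one category from an annotation file.

    Assumed format (verify against the real files!): one "imagename label"
    pair per line, with 1 = positive, -1 = negative, 0 = difficult.
    Difficult (0) and negative (-1) images are excluded here.
    """
    positives = []
    for line in annotation_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, label = parts[0], int(parts[1])
        if label == 1:
            positives.append(name)
    random.Random(seed).shuffle(positives)  # deterministic shuffle for reproducibility
    return positives[:n_train], positives[n_train:n_train + n_holdout]

# Toy annotation: three positives, one negative, one difficult image
toy = "img1 1\nimg2 -1\nimg3 1\nimg4 0\nimg5 1"
train, holdout = split_train_holdout(toy, n_train=2, n_holdout=1)
print(train, holdout)
```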
Task: given a textfile with a list of image names, extract features and perform vector quantization to get a feature vector for each image.
Train an object classifier
Given the feature vectors and image labels for our train set, we can now train a classifier to discriminate between the two. I suggest to use SciKit Learn or LibSVM to train a model. For LibSVM, you will need to download the LibSVM package, and put
svm.py in the same folder as your scripts.
See this page on how to fill in X (training feature vectors), Y (the labels) and Z (holdout feature vectors).
from sklearn import svm
cls = svm.SVC(probability=False)
cls.fit(X, Y)
print cls.predict(Z)
Task: train an SVM model on the train set
Task: apply the model to your holdout set
Task: by default, SVM software gives binary labels as output. Compute accuracy from this output
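Accuracy is simply the fraction of holdout images whose predicted label matches the true label; a minimal sketch:

```python
def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels (1 / -1)."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / float(len(truth))

print(accuracy([1, -1, 1, 1], [1, -1, -1, 1]))  # 3 of 4 correct -> 0.75
```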
Task: get the SVM to output probabilities (or, more accurately, likelihoods), and use these to rank the output images. Compute average precision over this ranking (or precision@5, @10 and recall@5, @10)
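One way to compute average precision over such a ranking. The convention assumed here is that a higher score means "more likely positive"; adapt this to whatever your SVM software actually outputs:

```python
def average_precision(scores, labels):
    """Average precision of the ranking by descending score.

    scores: one number per image (higher = more likely positive);
    labels: 1 for positive, -1 for negative.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / float(rank)   # precision at this rank
    return total / max(1, labels.count(1))

# Positives at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
print(average_precision([0.9, 0.8, 0.7, 0.1], [1, -1, 1, -1]))
```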
Task: visualize your ranking: what goes wrong, what goes right? Do you think the model is trained well enough? Does it work better with fewer/more examples?
Difficulty of object discrimination
Task: investigate additional pairs of object categories, and how well they can be discriminated. For this to work, you will need to construct a (new) holdout set which is common to all pairs.
Task: does training a one-vs-all classifier (i.e. including negatives from many object categories) improve precision on the new holdout set? And on the old holdout set with just two categories? Does adding negatives of other categories help there?
Option: Add a spatial pyramid
For this, look into the
--pointSelector option of the ColorDescriptor software, together with
--codebook. The latter option uses the vector quantization built into the ColorDescriptor software. What is the speed difference between your implementation and that of the software?
Option: Create your own codebook
Have a look at the exampleConstructCodebook.py script included with the ColorDescriptor software, and try creating your own codebook tailored to your classification task.
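Codebooks are typically built by clustering descriptors pooled from many training images, e.g. with k-means. A minimal k-means sketch (the included script is likely better tuned and faster for real SIFT data; the farthest-point initialization here is just a simple deterministic choice):

```python
import numpy as np

def kmeans_codebook(descriptors, k, iterations=10):
    """Cluster descriptors (rows of a NumPy array) into k codewords."""
    # Farthest-point initialization: spreads the initial centers out
    # (a production implementation would rather use k-means++ or random restarts)
    centers = [descriptors[0]]
    for _ in range(1, k):
        d2 = np.min([((descriptors - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(descriptors[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iterations):
        # Assign every descriptor to its nearest center (Euclidean)
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors
        for j in range(k):
            members = descriptors[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

# Toy data: two well-separated clusters -> codewords near their means
data = np.array([[0.0, 0.1], [0.1, 0.0], [10.0, 9.9], [9.9, 10.0]])
print(kmeans_codebook(data, 2))
```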