quantombone edited this page Jan 27, 2012 · 10 revisions

Framing exemplars is an important pre-processing step. "Framing" an exemplar means choosing a HOG feature size and location given an input image I and a ground-truth bounding box bb. I have successfully used two separate framing strategies, both described below. In both cases, the full HOG feature pyramid is computed and a slice is selected from it. For each exemplar, I maintain both the slice-bb and the "ground-truth" gt-bb. Suppose that during detection we get a HOG template match at location detection-bb. I then estimate the transformation between slice-bb and detection-bb and apply that transformation to gt-bb to get a better localization of the bounding box.

By design, this means that running the Exemplar-SVM on the image from which the exemplar originated will return a perfect detection. Since framing is done by choosing a slice from the pyramid, and detection uses the exact same pyramid, we avoid the border issues that can arise when the input image is manipulated to coerce a feature descriptor at location bb.
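
The box transfer described above can be sketched as follows. This is a minimal illustration, not the library's code: it assumes boxes are stored as [x1 y1 x2 y2] and that the transformation is a per-axis scale plus translation. The variable names and the numeric values are made up for the example.

```matlab
% Assumed box format: [x1 y1 x2 y2]. All values below are illustrative.
slice_bb     = [10 10 90 90];      % where the HOG slice sits in the exemplar image
gt_bb        = [20 15 80 95];      % ground-truth box in the exemplar image
detection_bb = [110 60 270 220];   % where the template matched in a new image

% Per-axis scale that maps slice_bb onto detection_bb
sx = (detection_bb(3) - detection_bb(1)) / (slice_bb(3) - slice_bb(1));
sy = (detection_bb(4) - detection_bb(2)) / (slice_bb(4) - slice_bb(2));

% Translation that aligns the top-left corners after scaling
tx = detection_bb(1) - sx * slice_bb(1);
ty = detection_bb(2) - sy * slice_bb(2);

% Apply the same transformation to gt_bb to predict the object box
predicted_bb = [sx * gt_bb(1) + tx, sy * gt_bb(2) + ty, ...
                sx * gt_bb(3) + tx, sy * gt_bb(4) + ty];

% Sanity check: the transformation sends slice_bb exactly onto detection_bb
mapped_slice = [sx * slice_bb(1) + tx, sy * slice_bb(2) + ty, ...
                sx * slice_bb(3) + tx, sy * slice_bb(4) + ty];
```

When detection-bb equals slice-bb (the exemplar's own image), the scale is 1 and the translation is 0, so the predicted box is exactly gt-bb, which matches the "perfect detection" property noted above.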

Finds a slice from the pyramid that overlaps the ground-truth (GT) box and has GoalNCells cells (the goal_ncells parameter). Zero cells (from the boundary of the HOG descriptor) are simply left out. There is no need for an explicit mask, since the size of the region is not fixed.

init_params.sbin = 8;            % HOG cell size in pixels
init_params.goal_ncells = 100;   % target number of cells in the selected slice
init_params.init_function = @esvm_initialize_goalsize_exemplar;
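
To give a feel for what a cell budget implies, here is a rough sketch of how a template size could be derived from goal_ncells and the aspect ratio of the ground-truth box. This is an assumption for illustration only; esvm_initialize_goalsize_exemplar selects an actual slice from the pyramid and may well proceed differently. The box coordinates and helper variable names are hypothetical.

```matlab
% Only sbin and goal_ncells come from the settings above; the rest is illustrative.
sbin        = 8;
goal_ncells = 100;
gt_bb       = [50 40 210 120];     % assumed [x1 y1 x2 y2]: 160 px wide, 80 px tall

bb_w   = gt_bb(3) - gt_bb(1);
bb_h   = gt_bb(4) - gt_bb(2);
aspect = bb_h / bb_w;              % height-to-width ratio of the box

% Solve h * w = goal_ncells with h / w = aspect, then round to whole cells
w_cells = round(sqrt(goal_ncells / aspect));
h_cells = round(sqrt(goal_ncells * aspect));

% Image scale at which the box spans w_cells cells of sbin pixels each
scale = w_cells * sbin / bb_w;
```

For this box the sketch yields a template 7 cells high and 14 cells wide (98 cells), so the cell count is close to the goal while the aspect ratio of the box is preserved.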

Outside-image cells: A mask is maintained that indicates which cells lie within the image, and learning is performed only over those cells. Experiments indicate that without this mask, the exemplar tries to find matches that have zeros in the same region.

init_params.sbin = 8;            % HOG cell size in pixels
init_params.hg_size = [8 8];     % fixed template size in HOG cells
init_params.init_function = @esvm_initialize_fixedframe_exemplar;
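
The outside-image mask can be sketched as below. This is a minimal illustration under stated assumptions, not the library's implementation: the grid size, the frame position, and the 31-channel feature depth are hypothetical, and restricting learning to in-image cells is modeled here by simply zeroing the template outside the mask.

```matlab
hg_size   = [8 8];      % fixed frame size in cells, as in the settings above
grid_size = [20 30];    % hypothetical HOG grid of one pyramid level [rows cols]
top_left  = [-1 25];    % hypothetical frame origin (row, col), 1-indexed;
                        % this frame sticks out past the top and right borders

% Cell coordinates covered by the frame
rows = top_left(1) + (0:hg_size(1) - 1);
cols = top_left(2) + (0:hg_size(2) - 1);

% A cell is valid only if both its row and its column fall inside the grid
row_ok = rows >= 1 & rows <= grid_size(1);
col_ok = cols >= 1 & cols <= grid_size(2);
mask   = bsxfun(@and, row_ok', col_ok);        % 8x8 logical mask

% Stand-in template with an assumed 31 feature channels per cell
w = ones(hg_size(1), hg_size(2), 31);

% Keep template weights only where the mask is true
w_masked = bsxfun(@times, w, double(mask));
```

In this example only a 6-by-6 block of the 8-by-8 frame lies inside the image, so 36 cells would take part in learning and the remaining 28 stay at zero.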