Person re-identification (re-id) is an emerging problem in visual surveillance that deals with maintaining the identities of individuals as they traverse various locations across a camera network. From a visual perspective, re-id is challenging due to significant changes in the appearance of individuals across cameras with different pose, illumination, and calibration.
Researchers approach these difficulties by designing distinctive view-invariant person representations and by learning effective distance/similarity metrics. In line with these two directions, we propose two algorithms: one designs a novel appearance model that accounts for visual pattern co-occurrence across different views; the other formulates the problem in a global structured matching setting.
Person Re-identification with Visual Word Co-occurrence Model
Summary: We propose a novel visual word co-occurrence model to deal with appearance variations across different views. We first map each pixel of an image to a visual word using a codebook, which is learned in an unsupervised manner. The appearance transformation between camera views is encoded by a co-occurrence matrix of visual word joint distributions in probe and gallery images. Our appearance model naturally accounts for spatial similarities and for variations caused by pose, illumination, and configuration changes across camera views. Linear SVMs are then trained as classifiers on these co-occurrence descriptors. On the VIPeR and CUHK Campus benchmark datasets, our method achieves 83.86% and 85.49% at rank-15 on the Cumulative Match Characteristic (CMC) curves, beating the state-of-the-art results by 10.44% and 22.27%, respectively.
Illustration of codeword co-occurrence in positive image pairs (i.e., the two images from different camera views in each column belong to the same person) and negative image pairs (i.e., the two images from different camera views in each column belong to different persons). For positive (or negative) pairs, the enclosed regions in each row are assigned the same codeword.
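The pipeline above (codebook assignment followed by a co-occurrence descriptor fed to a linear SVM) can be sketched in a simplified form. This is an illustrative toy, not the authors' implementation: the function names and shapes are assumptions, a random codebook stands in for one learned by unsupervised clustering, and co-occurrences are counted only at identical pixel positions, whereas the actual model also aggregates over spatial neighborhoods.

```python
import numpy as np

def assign_codewords(features, codebook):
    """Map each pixel feature to the index of its nearest codeword.

    features: (num_pixels, D) array of per-pixel descriptors
    codebook: (K, D) array of codewords (e.g. from unsupervised k-means)
    """
    # Squared Euclidean distance from every pixel to every codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)  # (num_pixels,) codeword indices

def cooccurrence_descriptor(words_probe, words_gallery, K):
    """K x K matrix counting joint codeword occurrences at the same
    spatial position in a probe/gallery image pair, normalized and
    flattened so it can be fed to a linear SVM."""
    M = np.zeros((K, K))
    for wp, wg in zip(words_probe, words_gallery):
        M[wp, wg] += 1
    return (M / M.sum()).ravel()

# Toy example: 3 codewords, 4 pixel positions, 8-D pixel features.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(3, 8))   # stands in for a learned codebook
probe = rng.normal(size=(4, 8))
gallery = rng.normal(size=(4, 8))
wp = assign_codewords(probe, codebook)
wg = assign_codewords(gallery, codebook)
desc = cooccurrence_descriptor(wp, wg, K=3)
print(desc.shape)  # (9,)
```

A trained linear SVM then scores such descriptors, separating positive pairs (same person) from negative pairs (different persons).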
Person Re-identification via Structured Matching
Summary: From a visual perspective, re-id is challenging due to significant changes in the appearance of individuals across cameras with different pose, illumination, and calibration. Globally, the challenge arises from the need to maintain consistent matches among all individual entities across different camera views. We propose PRISM, a structured matching method that jointly accounts for these challenges. We view the global problem as a weighted graph matching problem, and learn the edge weights (pairwise similarity scores) from the co-occurrence of visual patterns in the training examples. These co-occurrence-based scores in turn account for appearance changes by inferring likely and unlikely visual co-occurrences appearing in training instances. We implement PRISM in both single-shot and multi-shot scenarios. PRISM uniformly outperforms the state of the art by as much as 10%-30% in matching rate while remaining computationally efficient.
This is an overview of our method, PRISM, consisting of two levels, where (a) entity-level structured matching is imposed on top of (b) image-level visual word deformable matching. In (a), each color represents an entity, and the example illustrates the general situation in real-world re-id, including single-shot, multi-shot, and no matches. In (b), the idea of visual word co-occurrence for measuring image similarities is illustrated in a probabilistic way, where l_1, l_2 denote the person entities, u_1, u_2, v_1, v_2 denote different visual words, and h_1, h_2 denote two locations.
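The entity-level matching idea can be illustrated with a toy weighted assignment. This sketch is an assumption-laden simplification of PRISM, not its formulation: the similarity matrix S is fabricated, the one-to-one constraint stands in for the general single-shot/multi-shot/no-match setting, and the brute-force search over permutations (fine at toy sizes) stands in for the structured solver.

```python
import itertools
import numpy as np

def best_one_to_one_matching(S):
    """Return the gallery permutation maximizing total similarity,
    so that each probe entity matches a distinct gallery entity."""
    n = S.shape[0]
    best, best_score = None, float("-inf")
    for perm in itertools.permutations(range(n)):
        score = sum(S[i, j] for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

# Toy learned similarity scores: rows = probe entities, cols = gallery.
S = np.array([[0.9, 0.8, 0.1],
              [0.9, 0.2, 0.3],
              [0.2, 0.5, 0.7]])
match, total = best_one_to_one_matching(S)
print(match)  # (1, 0, 2)
```

Note that matching each probe independently would assign gallery entity 0 to both probes 0 and 1 (each scores 0.9); the global matching resolves the conflict, which is the consistency that entity-level structured matching is meant to enforce.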