Place Recognition by Per-Location Classifiers
Type of document
Doctoral thesis
Author
Gronát, Petr
Supervisor
Pajdla, Tomáš
Šivic, Josef
Field of study
Artificial Intelligence and Biocybernetics
Study program
Electrical Engineering and Informatics
Degree-granting institution
Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics
Abstract
Place recognition is formulated as a task of finding the location where the query image
was captured. This is an important task that has many practical applications in robotics,
autonomous driving, augmented reality, 3D reconstruction, or systems that organize imagery
in a geographically structured manner. Place recognition is typically done by finding
a reference image in a large structured geo-referenced database.
In this work, we first address the problem of building a geo-referenced dataset for
place recognition. We describe a framework for building the dataset from the street-side
imagery of Google Street View, which provides panoramic views captured along streets in
many cities and rural areas worldwide. Besides downloading the panoramic views and
transforming them into sets of perspective images, the framework can also retrieve the
underlying scene depth information.
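The panorama-to-perspective step can be illustrated with a minimal sketch; this is not the thesis code, and the function name, parameters and nearest-neighbour sampling below are illustrative assumptions rather than the framework's actual interface.

```python
import numpy as np

def panorama_to_perspective(pano, yaw_deg, pitch_deg, fov_deg=90.0, out_size=(640, 480)):
    """Cut a pinhole-camera view out of an equirectangular panorama (H x W x 3 array)."""
    H, W = pano.shape[:2]
    w, h = out_size
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels

    # Rays through every output pixel, in the virtual camera's coordinate frame.
    x, y = np.meshgrid(np.arange(w) - w / 2.0, np.arange(h) - h / 2.0)
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate the rays to the requested viewing direction (yaw, then pitch).
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    rays = rays @ (Ry @ Rx).T

    # Spherical angles of each ray map linearly to panorama pixel coordinates.
    lon = np.arctan2(rays[..., 0], rays[..., 2])          # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))     # latitude in [-pi/2, pi/2]
    u = ((lon / (2.0 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return pano[v, u]                                      # nearest-neighbour sampling
```

For example, calling panorama_to_perspective(pano, yaw_deg=90, pitch_deg=0) would produce one of several perspective cut-outs that together cover the panorama.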
Second, we aim to localize a query photograph by finding other images depicting
the same place in a large geotagged image database. This is a challenging task due
to changes in viewpoint, imaging conditions and the large size of the image database.
The contribution of this work is two-fold: (i) we cast the place recognition problem as a
classification task and use the available geotags to train a classifier for each location in the
database in a similar manner to per-exemplar SVMs in object recognition, and (ii) as only
a few positive training examples are available for each location, we propose two methods
to calibrate all the per-location SVM classifiers without the need for additional positive
training data. The first method relies on p-values from statistical hypothesis testing and
uses only the available negative training data. The second method performs an affine
calibration by appropriately normalizing the learned classifier hyperplane and does not
need any additional labeled training data. We test the proposed place recognition method
with the bag-of-visual-words and Fisher vector image representations suitable for
large-scale indexing.
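The per-location classification and p-value calibration can be sketched as follows. This is not the author's implementation: scikit-learn's LinearSVC stands in for whatever solver was used, the function names, the positive up-weighting and the held-out split are illustrative assumptions, and the second (affine) calibration from the abstract is not shown.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_per_location_svm(positive_desc, negative_descs, C=0.1):
    """Exemplar-style linear SVM for one database location:
    a single positive descriptor against many negatives."""
    X = np.vstack([positive_desc[None, :], negative_descs])
    y = np.array([1] + [0] * len(negative_descs))
    # Up-weight the lone positive so it is not swamped by the negatives.
    clf = LinearSVC(C=C, class_weight={1: len(negative_descs), 0: 1})
    clf.fit(X, y)
    return clf.coef_.ravel(), clf.intercept_[0]

def make_p_value_calibration(w, b, heldout_negatives):
    """Map a raw SVM score to an empirical p-value: the fraction of held-out
    negative images scoring at least as high. Uses only negative data."""
    neg_scores = np.sort(heldout_negatives @ w + b)

    def p_value(score):
        below = np.searchsorted(neg_scores, score, side="left")  # negatives scoring lower
        return 1.0 - below / len(neg_scores)

    return p_value

# At query time, every location's classifier scores the query descriptor and the
# locations are ranked by ascending p-value: a small p-value means the score is
# unlikely under the negative score distribution, i.e. a strong match.
```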
Experiments are performed on three datasets: 25,000 and 55,000 geotagged street
view images of Pittsburgh, and the 24/7 Tokyo benchmark containing 76,000 images with
varying illumination conditions. The results show improved place recognition accuracy of
the learned image representation over direct matching of raw image descriptors.