6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference

Mai Bui, Tolga Birdal, Haowen Deng, Shadi Albarqouni, Leonidas Guibas, Slobodan Ilic & Nassir Navab

Stanford University & Technical University of Munich & Siemens AG

Multimodal 6D Camera Pose Predictions: In a highly ambiguous environment, similar-looking views can easily confuse current camera pose regression models and lead to incorrect localization results. Given a query RGB image, our aim is instead to predict the possible modes as well as the associated uncertainties, which we model by the parameters of Bingham and Gaussian mixture models.


We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined on the manifold of camera poses. In highly ambiguous environments, which can easily arise due to symmetries and repetitive structures in the scene, computing one plausible solution (what most state-of-the-art methods currently regress) may not be sufficient. Instead, we predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction. Towards this aim, we use Bingham distributions to model the orientation of the camera pose, and a multivariate Gaussian to model the position, with an end-to-end deep neural network. By incorporating a Winner-Takes-All training scheme, we finally obtain a mixture model that is well suited for explaining ambiguities in the scene, yet does not suffer from mode collapse, a common problem with mixture density networks. We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments and exhaustively evaluate our method on synthetic as well as real data, on both ambiguous scenes and non-ambiguous benchmark datasets.
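To make the two ingredients above concrete, the following is a minimal NumPy sketch, not the paper's implementation: an unnormalized Bingham log-density over unit quaternions (the normalization constant, a confluent hypergeometric function, is omitted) and a relaxed Winner-Takes-All loss in which the best hypothesis receives most of the gradient weight while the remaining ones keep a small share, which is one common way to avoid mode collapse. All function names and the relaxation parameter `eps` are illustrative assumptions.

```python
import numpy as np

def bingham_log_density_unnorm(q, M, Z):
    """Unnormalized Bingham log-density of a unit quaternion q.

    M: 4x4 orthogonal matrix whose columns are the dispersion directions
       (the first column is the mode).
    Z: diag(0, z1, z2, z3) with z_i <= 0 controlling the concentration.
    The density is antipodally symmetric: p(q) = p(-q), matching the
    double cover of SO(3) by unit quaternions.
    """
    return float(q @ M @ Z @ M.T @ q)

def wta_loss(hypotheses, target, loss_fn, eps=0.05):
    """Relaxed Winner-Takes-All over a set of pose hypotheses.

    The hypothesis with the smallest loss gets weight (1 - eps);
    the others share eps equally, so every branch keeps receiving
    a small training signal instead of dying off.
    """
    losses = np.array([loss_fn(h, target) for h in hypotheses])
    weights = np.full(len(losses), eps / (len(losses) - 1))
    weights[np.argmin(losses)] = 1.0 - eps
    return float(weights @ losses)

# Toy check: axis-aligned dispersion, mode at the identity rotation.
M = np.eye(4)
Z = np.diag([0.0, -10.0, -10.0, -10.0])
q_mode = np.array([1.0, 0.0, 0.0, 0.0])   # identity quaternion
print(bingham_log_density_unnorm(q_mode, M, Z))  # 0.0 at the mode
```

In a trained network, `M` and `Z` for each mixture component would be regressed per image, and `loss_fn` would be the negative log-likelihood of the component rather than the toy scalar loss used here.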


We created a synthetic dataset that is specifically designed to contain repetitive structures and introduce highly ambiguous viewpoints.

Qualitative results on synthetic scenes

In addition, we create highly ambiguous real scenes using Google Tango and a graph-based SLAM approach. We acquire RGB images as well as distinct ground-truth camera trajectories for training and testing.

Qualitative results on real scenes

In comparison to current state-of-the-art methods, the proposed model is able to capture plausible but diverse modes as well as the associated uncertainties for each pose hypothesis.


More information and details can be found in our paper.

Our ambiguous relocalization dataset can be downloaded here.


The implementation of our work can be found here.


Video - 6D Continuous Multimodal Inference


@inproceedings{bui2020relocalization,
  title={6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference},
  author={Bui, Mai and Birdal, Tolga and Deng, Haowen and Albarqouni, Shadi and Guibas, Leonidas and Ilic, Slobodan and Navab, Nassir},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}


This joint effort is supported by BaCaTec, the Bavaria California Technology Center.

Interested in Collaborating with Us?

We would like this project to evolve into a repository of methods for handling challenging multimodal problems in 3D computer vision. We are therefore looking for contributors and collaborators with strong coding and mathematics skills as well as good knowledge of 3D vision and machine (deep) learning.