"MOVE TOWARDS THE BIG BLACK PIANO": HOW FINE-GRAINED FEATURES AFFECT THE GOAL OF NAVIGATION Improving salient landmark features in an end-to-end system
Date
2024-10-30
Abstract
Navigational instructions like “Move towards the big black piano” or “Head past the green armchair” are
intuitive for humans, as they rely on salient landmarks to guide movement through space. This thesis explores
how fine-grained features, such as spatial location, shape, and color, influence the salience of landmarks in
navigation systems. Through linguistic analysis of textual descriptions and object recognition using Faster
R-CNN implemented with a bottom-up attention mechanism, we captured key attributes that enhance the
clarity of instructions.
Our experiments were conducted using the Room-to-Room dataset (Anderson et al., 2018), which provides
human instructions for indoor navigation, and the Matterport3D environment (Chang et al., 2017), offering
egocentric visual data. By clustering nouns and attributes based on frequency and semantic similarity, we
identified important objects and attributes that guide users efficiently.
By examining object distribution in skyboxes and mapping instructions to visual scenes, we evaluated
whether accessing multiple skybox views (top, back, left, front, right, and bottom) instead of a single, centered
view provides additional contextual value in goal-oriented navigation systems.
Finally, we extended previous research by applying a bi-directional boost attention mask over salient landmarks
within the Seq2Seq LSTM model of Anderson et al. (2018), and our experiments showed improved navigation
performance. Notably, the variant with dynamic attention weights achieved success rates of 37.65% on seen
data and 22.22% on unseen data, outperforming the baseline. By using linguistic salience to guide visual
attention, we improve the navigation task and demonstrate how language refines the model’s focus.
Future work should continue refining the attention mechanism and explore other strategies, such as integrating
additional views, to provide richer contextual information and improve navigation accuracy.
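
The idea of boosting attention over salient landmarks can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not specify how the bi-directional mask or the dynamic weights are computed, so the function names, the additive form of the boost, and the value `boost=1.0` are all assumptions chosen for illustration. The sketch only shows the general effect of raising the attention scores of regions that contain a landmark named in the instruction.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boosted_attention(scores, salient_mask, boost=1.0):
    """Add `boost` to the raw score of each salient region, then normalise.

    scores       -- raw attention scores, one per visual region
    salient_mask -- 1 if the region contains a landmark mentioned in the
                    instruction, else 0 (hypothetical input)
    boost        -- assumed additive bonus; not a value from the thesis
    """
    if len(scores) != len(salient_mask):
        raise ValueError("scores and salient_mask must have the same length")
    adjusted = [s + boost * m for s, m in zip(scores, salient_mask)]
    return softmax(adjusted)


# Three regions with equal raw scores; only the second is marked salient.
plain = softmax([0.5, 0.5, 0.5])
boosted = boosted_attention([0.5, 0.5, 0.5], [0, 1, 0], boost=1.0)
```

In this toy case the boosted weights still sum to one, but the salient region receives a larger share of the attention than it does under the plain softmax, at the expense of the other two regions.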
Keywords
salience, clustering, machine learning