Le Glouanec, Bérénice
2024-10-30
https://hdl.handle.net/2077/83907
Navigational instructions like "Move towards the big black piano" or "Head past the green armchair" are intuitive for humans, as they rely on salient landmarks to guide movement through space. This thesis explores how fine-grained features, such as spatial location, shape, and color, influence the salience of landmarks in navigation systems. Through linguistic analysis of textual descriptions and object recognition using Faster R-CNN implemented with a bottom-up attention mechanism, we captured key attributes that enhance the clarity of instructions. Our experiments were conducted using the Room-to-Room dataset (Anderson et al., 2018), which provides human instructions for indoor navigation, and the Matterport3D environment (Chang et al., 2017), which offers egocentric visual data. By clustering nouns and attributes based on frequency and semantic similarity, we identified important objects and attributes that guide users efficiently. By examining object distribution in skyboxes and mapping instructions to visual scenes, we evaluated whether accessing multiple skybox views (top, back, left, front, right, and bottom) instead of a single, centered view provides additional contextual value in goal-oriented navigation systems. Finally, we extended previous research by applying a bi-directional boost attention mask over salient landmarks within the Seq2Seq LSTM model of Anderson et al. (2018), and our experiments demonstrated significant improvements. Notably, the dynamic weights in the attention class achieved success rates of 37.65% on seen data and 22.22% on unseen data, outperforming the baseline. Therefore, by using linguistic salience to guide visual attention, we improve the navigation task and demonstrate how language refines the model's focus.
Future work should continue refining the attention mechanism and explore further strategies, such as integrating additional views, to provide even richer contextual information and further boost navigation accuracy.
eng
salience, clustering, machine learning
"MOVE TOWARDS THE BIG BLACK PIANO": HOW FINE-GRAINED FEATURES AFFECT THE GOAL OF NAVIGATION
Improving salient landmark features in an end-to-end system
Text