"MOVE TOWARDS THE BIG BLACK PIANO": HOW FINE-GRAINED FEATURES AFFECT THE GOAL OF NAVIGATION Improving salient landmark features in an end-to-end system

dc.contributor.author: Le Glouanec, Bérénice
dc.contributor.department: University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science [eng]
dc.contributor.department: Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori [swe]
dc.date.accessioned: 2024-10-30T13:02:52Z
dc.date.available: 2024-10-30T13:02:52Z
dc.date.issued: 2024-10-30
dc.description.abstract: Navigational instructions like “Move towards the big black piano” or “Head past the green armchair” are intuitive for humans because they rely on salient landmarks to guide movement through space. This thesis explores how fine-grained features, such as spatial location, shape, and color, influence the salience of landmarks in navigation systems. Through linguistic analysis of textual descriptions and object recognition with Faster R-CNN combined with a bottom-up attention mechanism, we captured key attributes that make instructions clearer. Our experiments used the Room-to-Room dataset (Anderson et al., 2018), which provides human instructions for indoor navigation, and the Matterport3D environment (Chang et al., 2017), which provides egocentric visual data. By clustering nouns and attributes based on frequency and semantic similarity, we identified the objects and attributes that guide users most efficiently. By examining the distribution of objects in skyboxes and mapping instructions to visual scenes, we evaluated whether access to multiple skybox views (top, back, left, front, right, and bottom), instead of a single centered view, provides additional contextual value in goal-oriented navigation systems. Finally, we extended previous research by applying a bi-directional boost attention mask over salient landmarks in the Seq2Seq LSTM model of Anderson et al. (2018), which yielded significant improvements in our experiments. Notably, the configuration with dynamic weights in the attention class achieved success rates of 37.65% on seen data and 22.22% on unseen data, outperforming the baseline. By using linguistic salience to guide visual attention, we thus improve performance on the navigation task and show how language can refine the model’s focus. 
Future work should continue refining the attention mechanism and explore further strategies, such as integrating additional views, to provide richer contextual information and further improve navigation accuracy. [sv]
dc.identifier.uri: https://hdl.handle.net/2077/83907
dc.language.iso: eng [sv]
dc.setspec.uppsok: HumanitiesTheology
dc.subject: salience, clustering, machine learning [sv]
dc.title: "MOVE TOWARDS THE BIG BLACK PIANO": HOW FINE-GRAINED FEATURES AFFECT THE GOAL OF NAVIGATION Improving salient landmark features in an end-to-end system [sv]
dc.title.alternative: "MOVE TOWARDS THE BIG BLACK PIANO": HOW FINE-GRAINED FEATURES AFFECT THE GOAL OF NAVIGATION Improving salient landmark features in an end-to-end system [sv]
dc.type: Text
dc.type.degree: Student essay
dc.type.uppsok: H2

Files

Original bundle

Name: Thesis_Berenice_Le_Glouanec.pdf
Size: 3.6 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 4.68 KB
Format: Item-specific license agreed upon to submission