Comparative Perspective of Visual Attention: From Human Focus to Visual Transformers—An In-Depth Review

  • Luis Guillermo Oliveros Piñero
  • Miguel Carrasco
  • José Aranda
  • César González-Martín

Research output: Contribution to journal › Article › peer-review

Abstract

Although neuroscience has made considerable progress in recent decades by proposing robust models that explain the mechanisms of attention and perception in humans, emulating this capability using computational techniques remains complex. It was not until the development of models such as Visual Transformers (ViT) that it became possible to partially replicate this uniquely human trait. The main objective of this study was to explore the extent to which attention models, such as ViT, can reproduce the manner in which people distribute their visual attention when exposed to various stimuli, particularly in the context of handcrafted objects. Human fixations (i.e., attention) were recorded using an eye tracker, while the ViT model processed the same images to generate attention maps to evaluate the degree of similarity between the two patterns. For this purpose, heatmaps were constructed, and quantitative metrics were applied to assess their similarity. The results revealed areas of convergence and significant differences, highlighting the current limitations of computational models in capturing the more subtle aspects of human perception. This comparison not only helps us better understand the capabilities of ViT but also provides a foundation for reflecting on future improvements in automated attention models and their potential applications in contexts where visual interpretation is crucial.
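The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the metrics used, so Pearson correlation (CC) and histogram intersection (SIM), two common saliency-map measures, are assumed here. The function names, the Gaussian width `sigma`, and the toy fixation coordinates are all hypothetical, and the "model" map is a stand-in for a ViT attention map that has already been resized to the same grid as the human heatmap.

```python
import math


def fixation_heatmap(fixations, height, width, sigma=1.5):
    """Build a heatmap by summing an isotropic Gaussian at each (row, col) fixation.

    Assumption: sigma is an illustrative smoothing width, not a value from the paper.
    """
    heat = [[0.0] * width for _ in range(height)]
    for fr, fc in fixations:
        for r in range(height):
            for c in range(width):
                d2 = (r - fr) ** 2 + (c - fc) ** 2
                heat[r][c] += math.exp(-d2 / (2.0 * sigma ** 2))
    return heat


def flatten(grid):
    """Turn a 2D list into a flat list of values, row by row."""
    return [v for row in grid for v in row]


def pearson_cc(a, b):
    """Pearson correlation between two equally sized maps (assumed metric)."""
    x, y = flatten(a), flatten(b)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant map has no defined correlation
    return cov / (sx * sy)


def histogram_intersection(a, b):
    """SIM: sum of element-wise minima after each map is normalised to sum to 1."""
    x, y = flatten(a), flatten(b)
    tx, ty = sum(x), sum(y)
    if tx == 0 or ty == 0:
        return 0.0
    return sum(min(p / tx, q / ty) for p, q in zip(x, y))


# Toy data: hypothetical human fixations and a stand-in for a model attention map.
human = fixation_heatmap([(2, 2), (5, 6)], height=8, width=8)
model = fixation_heatmap([(2, 3), (6, 6)], height=8, width=8)
print("CC :", pearson_cc(human, model))
print("SIM:", histogram_intersection(human, model))
```

CC ranges from -1 to 1 and SIM from 0 to 1, with higher values indicating closer agreement between the two maps. In practice the model map would come from the transformer's attention weights rather than from synthetic fixations.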

Original language: English
Pages (from-to): 172230-172244
Number of pages: 15
Journal: IEEE Access
Volume: 13
DOIs
State: Published - 2025
Externally published: Yes

Keywords

  • Attention
  • comparison
  • experiments
  • eye-tracker
  • human attention
  • multihead attention
  • transformer
  • vision computer
  • vision transformers
  • visual transformers
