"In
shorter time, more will be known about the most remote objects, namely the stars,
than about the most nearby topic, namely perception" -- Aristotle (384--322 BC)
The general research field
Human vision research aims at understanding the neuro-cognitive process
that takes the light in our eyes as input and that enables us to
perceive scenes as structured wholes consisting of objects arranged in
space. This perceptual organization process is believed to be one of
the autonomous brain processes that underlie consciousness and, thereby,
virtually every impression we experience and virtually every action we
undertake. In everyday life, we take this process for granted. We take
vision as a reliable source of information about the world, even though
vision is not 100% veridical (i.e., truthful). For instance, eyewitnesses
often give different, if not
contradictory, accounts of the same event. Furthermore,
the next figure gives one of the many visual illusions showing that
"what we see" is not always "what we look at". See also Kaleidoscope
for a class of arresting motion and velocity illusions.
In fact, it is amazing that vision is usually sufficiently
veridical to guide action. Aristotle already
realized that our eyes are not windows through which we see objects as
they are, but that, conversely, we take objects to be as we see them.
The point is that vision does not start with objects but with millions
of tiny light receptors in
the retina of each eye. At best, these light receptors can be said to
provide a two-dimensional image of colored patches. Yet, after
"automagic" processing in our brain, we experience a world of
three-dimensional objects. This automagic process, from images to
objects, is the subject of vision research. The next figure illustrates
the difference between vision as a tool and vision as a topic.
To study this automagic process of vision, the following three methodological
distinctions (none of which has clear-cut borders) are useful for positioning
scientific findings on subquestions within the total field of vision research:
(1) The total field of vision research may be divided into three subfields (see the right-hand picture above):
- low-level vision --- which concerns the extraction of image properties from the retinal image.
- middle-level vision --- which concerns the integration of image properties into perceptual organizations.
- high-level vision --- which concerns the everyday functionality of perceptual organizations.
(2) Observations and phenomena may be analyzed at three levels of description:
- the computational level --- which focuses on the nature of the mental
representations that result from cognitive processes.
- the algorithmic level --- which focuses on the processing mechanisms of these cognitive processes.
- the implementational level --- which focuses on the neural realization of representations and cognitive processes.
(3) Theories and models may be enhanced, revised, or rejected via three cycles of research:
- the theoretical cycle --- which aims to assess the conceptual plausibility of ideas and assumptions.
- the empirical cycle --- which aims to test ideas and assumptions by way of controlled experiments.
- the tractability cycle --- which aims to assess whether ideas and assumptions allow for feasible implementations.
For more details on the methodological principles guiding my research, see
Marr's levels, Research cycles, and Metaphors of cognition.
My specific research field
Passing through all three research cycles, my research focuses on middle-level vision (with low-level and
high-level offshoots) and on the computational and algorithmic
levels of description (with implementational offshoots).
That is, the core of my research concerns ideas about representational and
processing aspects of the integration of image properties into perceptual
organizations.
The integration
of image properties into perceptual organizations is an intriguing
problem. The visual system performs this integration very rapidly and
fairly veridically, even though it is faced with a fundamental
ambiguity. That is, in everyday situations, the retinal image may form
a rich source of information but it is nevertheless just a 2-D
projection of a 3-D scene. This implies that the retinal image
underdetermines the 3-D scene, yielding a fundamental ambiguity that is illustrated in the next
figure.
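The underdetermination can be made concrete with a small sketch. It assumes a simple pinhole (perspective) projection, which is only a textbook idealization of the eye's optics, and all names and coordinates are hypothetical illustrations. Moving each point of a scene along its own line of sight changes the 3-D scene but leaves the 2-D image untouched.

```python
from fractions import Fraction

def project(point, focal_length=1):
    """Pinhole projection of a 3-D point (x, y, z) onto a 2-D image plane.

    Assumption: an idealized pinhole camera, not a model of the real eye.
    Exact fractions are used so that equal projections compare as equal.
    """
    x, y, z = point
    return (Fraction(focal_length * x, z), Fraction(focal_length * y, z))

# Scene A: the four corners of a flat square facing the viewer at depth 2.
scene_a = [(-1, -1, 2), (1, -1, 2), (1, 1, 2), (-1, 1, 2)]

# Scene B: every corner is pushed along its own line of sight by a
# different factor, giving a non-planar quadrilateral at depths 2 to 8.
depth_factors = [1, 2, 3, 4]
scene_b = [(k * x, k * y, k * z)
           for k, (x, y, z) in zip(depth_factors, scene_a)]

image_a = [project(p) for p in scene_a]
image_b = [project(p) for p in scene_b]

print(scene_a != scene_b)   # the 3-D scenes differ
print(image_a == image_b)   # yet their 2-D images coincide
```

Because any positive depth factor can be chosen for each point independently, one and the same image is compatible with infinitely many 3-D scenes.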
It is pretty clear that the top-left pattern is interpreted as a 3-D
cube, and the top-right pattern as a 2-D "pie". But, then, the question
is why. For instance, why is the top-left pattern not interpreted as a
2-D "pie", and why is the top-right pattern not interpreted as a 3-D
cube? Both alternative interpretations are possible but are apparently
not selected by the visual system.
An open question is whether the visual system indeed considers all possible interpretations
of a stimulus in order to select a specific one. Yet, in my research, vision is modeled as if it
performs such a selection, to gain
more insight into how the visual system solves the ambiguity problem.
Then,
pressing questions are, for instance:
- What is the selection criterion?
- How can the selection be performed so rapidly?
- Why does the selection criterion yield such veridical
outcomes?
Such are the questions my research focuses on, by means of mathematical
formalizations, computer implementations, cognitive models, and
psychophysical experiments.
For more details on my specific research topics, see Research.