Early Vision as a Symbolic and
Computational Process, by Anthony Peter Iannini, 2000
Essay Overview: A technical essay on the nature of early vision, one
of the conceptual stages of visual perception.
Zenon Pylyshyn, in his book Computation and Cognition, objects to the
computational analysis of low-level or early vision systems presented by David Marr.
Pylyshyn's argument states that if the domain and range of a function are finite or
non-combinatorial, then it qualifies as an instantiated function.
The complexity of the
function, says Pylyshyn, is irrelevant in the sense that it may be a complex property of
the functional architecture that does not have any semantically valued or representational
content. Above all, the system at hand must go through representational levels in order to
be a computational, rather than a transductive, function.
If such a function is bound by
its input (the photon stimuli of the retina, in the case of early vision), then, argues
Pylyshyn, the function is instantiated by the functional architecture rather than by a
process involving representations. As such, the early vision systems "...may be
merely complex neural processes that instantiate pieces of functional architecture - in
this case, transducers" (Pylyshyn, 1984, p.215).
Pylyshyn takes any appeal to the ability to realize
such early visual functions on a digital computer to be irrelevant,
as "...the computer may simply be simulating brain processes that are not cognitive,
any more than are simulations of the endocrine system or the dynamics of the musculature.
The reason I feel they are probably not cognitive is that capturing regularities of these
processes seems not to require appeal to semantic properties" (Pylyshyn, 1984,
p.215).
To summarize the argument, Pylyshyn's following three criteria for a transductive
function encompass his view of early visual processes. Firstly, a transducer is
non-symbolic, in that the function is carried out by the functional architecture.
Secondly, the transducer is stimulus-bound, and data-driven by the distal environment.
Thirdly, the transducer is a physical-to-symbolic function that (a) can be described in the
language of physics, (b) yields output that is computationally available as discrete atomic
symbols, and (c) captures the regularities of the system in the change from physical to
symbolic (Pylyshyn, 1984, pp.153-171).
The early visual system, as described by Marr, is a combinatorial process that builds
upon representations in which rules of formation and rules of transformation construct a
three-dimensional image.
More specifically, Marr's model of vision includes (1) the primal sketch, the initial representation of changes in illumination, intensity
values, and surfaces, (2) the 2½-D sketch, involving depth and viewer orientation,
and (3) the 3-D sketch, which is a volumetric, surface-shaped spatial configuration
(Marr, 1982, p.37).
At the level of the primal sketch, an initial representation is built
up, using the rules of formation, in the context of a computationalist, or symbolic
scheme. The primal sketch must be rich in representational content, as the formation of
visual information must be derived from this initial process. From the symbolic
representation of the primal sketch, information can be passed to the 2½-D sketching
programs, and finally to the 3-D representation of the original, distal structure.
At the
heart of Marr's computationalist theory of early vision is the notion of a
combinatorial process that is able to pass representational information from one modular
program to another in such a way that the information is sorted and selected and maintains
its integrity throughout.
Pylyshyn's argument is that the model presented by Marr (and the whole of early
visual processes) is a property of the functional architecture, rather than a
computational function. However, there is evidence that would suggest that, at the level
of the primal sketch, the retinal information must become symbolic.
In order for the
visual system to select information from the distal environment, that is, to
differentiate among what Marr terms intensity values (differences in
illumination from one part of the visual field to the next), certain algorithmic functions
(or some form of computational process) must be involved.
As Marr writes, "...an
algorithm is likely to be understood more readily by understanding the nature of the
problem being solved than by examining the mechanism (and the hardware) in which it is
embodied" (Marr, 1982, p. 27). This notion is integral to Marr's distinction
among three levels of explanation: (i) computational theory, (ii)
algorithm, and (iii) implementation.
According to Marr, a theory of why and how a function works must be understood in terms of its purpose and its ability to carry out that
purpose. Both of these must be explained first at the level of computation and second at
the level of algorithms (Marr, 1982, pp. 23-25).
The level of implementation, which is the physical realization of the computational theory and the algorithm, cannot, in
Marr's view, explain the why and how of a function. This is in direct
conflict with Pylyshyn's view that "They [functions of the mental architecture]
are functions instantiated in the biological medium...they are functions that are
themselves explainable biologically, rather than in terms of rules and
representations" (Pylyshyn, 1980, p.126). Pylyshyn extends this view of the
functional architecture to the realm of early vision. In order to see why, I will look at
each of Pylyshyn's criteria for a transducer and examine it in the light of early
visual processes.
Pylyshyn's first criterion for a transducer, that it be non-symbolic or
non-representational, seems to be violated if Marr's analysis of vision is sound. At
the stage of the primal sketch, information must be dealt with in a certain way.
After this initial processing of raw retinal information, the information which comes from
the primal sketch and is passed to the 2½-D sketching programs must be handled in another
way.
If all retinal information were handled identically, it seems implausible that we
could form a three-dimensional image of something in the environment. Marr's view is
that "Vision is a process that produces from images of the external world a
description that is useful to the viewer and not cluttered with irrelevant
information" (Marr, 1982, p. 31).
Figure 1 (stock photography image of woman's face)
In order to better define the concept of symbolic information, I
will look at one aspect of the primal sketch, namely zero-crossings, that begins
the representational process in Marr's theory of early vision.
In Figure 1, the left
image of the woman's face was input into a sub-program of a graphics editor (1), and its algorithms
found the points at which there is a difference between the light and dark
parts of the image. The greater the discontinuity of illumination in the original image,
the greater the thickness of the resulting line.
What results is a representation of the original image, which now has its own symbolic structure. Both the left and right
images are represented by values, within the context of a coordinate plane (2), but each
has a different set of values.
The information processed in the left image results
in the information in the right image, which is composed of selected information based on,
among other factors, assumptions (3).
If this process of discrimination in early vision
were merely a part of the functional architecture, then access to this symbolic
information would remain impenetrable until the early vision process was complete and it
was available to the utilization area of cognition.
The important point about this
example is that models based on the computationalist approach have more explanatory force
than the alternatives. Objections to the computationalist approach, as in the case of
Pylyshyn's objection to Marr's analysis of early vision, do not help
to explain the way in which such processes occur.
In defense of his approach,
Marr writes, "The levels idea is crucial, and perception cannot be understood without
it - never by thinking just about synaptic vesicles or about neurons and axons, just
as flight cannot be understood by studying only feathers" (Marr, 1982, p. 336).
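To make the zero-crossing step discussed above more concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the filter used to produce Figure 1 and not Marr's own implementation. It assumes the common formulation in which the image is smoothed with a Gaussian, a discrete Laplacian is applied, and an edge is marked wherever the result changes sign between neighbouring pixels. The function names, the `sigma` parameter, and the `min_contrast` threshold are hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel covering about three standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(image, sigma):
    """Separable Gaussian blur: filter every row, then every column."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    # Repeat border pixels so the output keeps the input's shape.
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def laplacian(image):
    """Discrete 4-neighbour Laplacian (sum of second differences)."""
    p = np.pad(image, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])


def zero_crossings(image, sigma=1.0, min_contrast=0.02):
    """Boolean map of sign changes in the Laplacian of the smoothed image.

    min_contrast is an assumed threshold on the jump across a sign change;
    it discards crossings produced by noise or rounding error.
    """
    response = laplacian(smooth(image, sigma))
    edges = np.zeros(response.shape, dtype=bool)

    # Sign change between horizontally adjacent pixels.
    left, right = response[:, :-1], response[:, 1:]
    edges[:, :-1] |= (left * right < 0) & (np.abs(left - right) > min_contrast)

    # Sign change between vertically adjacent pixels.
    up, down = response[:-1, :], response[1:, :]
    edges[:-1, :] |= (up * down < 0) & (np.abs(up - down) > min_contrast)
    return edges


# A dark left half and a bright right half: one vertical intensity step.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
edge_map = zero_crossings(step, sigma=1.0, min_contrast=0.05)
```

For the step image, the marked pixels line up along the boundary between the dark and bright halves, while a uniformly grey image yields no crossings. The output is a new array of values derived from the input by a rule, not a copy of the input intensities, which is the sense in which the right image in Figure 1 has its own structure.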
Pylyshyn's second criterion for a transductive function is that it be
stimulus-bound or data-driven. Within the explanatory model of early vision proposed by
Marr, there is a transformation from physical to symbolic during the process. On
the one hand, it seems as though the whole early visual process is dependent upon
and driven by the data or stimuli presented to it. However, it also seems as though the
process must be symbolic.
The tension here lies in the distinction between a
function that is cognitively impenetrable and one that is symbolic. Within the domain of
early vision, each module's processing of information seems to need to be symbolic and
penetrable by other early visual modules. But it also appears that we do not have any
conscious cognitive control over our early visual systems.
My point is that cognitive
impenetrability does not necessarily entail a non-symbolic or non-representational system,
and therefore, I think that Pylyshyn's criterion that a transducer be stimulus-bound
has no bearing on the computationalist approach to early vision.
The third criterion for a transductive function, as outlined by Pylyshyn, requires that
the transducer be able to change from the physical to the symbolic in such a way that
the change can be described in the language of physics and accounts for the computationally
available, discrete, atomic symbols.
The semantic content of the primal sketch, it seems,
cannot be captured in the language of physics (which includes such notions as force,
momentum, speed, energy, etc.). As noted above, Pylyshyn's argument rests on
the notion that early vision does not seem to rely on an appeal to semantic properties.
But if we look closer at Marr's staged analysis of vision, it seems as though the
construction of a 3-D sketch must rely on the semantic content of symbols.
The alternative to this view would require a kind of "pick-up" model of visual
acquisition, like the one put forth by J.J. Gibson.
However, Gibson, though on the right
track, never answered the question of how humans can construct a visual
representation given a constantly changing environment (Gardner, 1985, pp.308-315).
Therefore, the computationalist view of early vision seems to win out in the domain of
explanatory value, which is integral to any model in science.
A computationalist model of early vision makes an attempt at explaining how the
functions involved, using a symbolic and representational process, can create a
three-dimensional visual representation of the world. Viewing the early visual processes as
instantiated in the functional architecture, and therefore explainable in terms of
biology, has given little explanation of how this process actually occurs.
Endnotes:
_________________
(1) The graphics editor used was Adobe Photoshop® 4.0. A filter ("find edge") was used to simulate the zero-crossing algorithm proposed by Marr
in the initial visual process.
(2) Each value is referred to as a pixel, which is one point in the x-y
coordinate plane of the computer screen.
(3) The existence of assumptions in the algorithmic functions of the
visual system suggests that the building up of a representation cannot be merely
transductive. These assumptions, as evolutionary necessities in programming within certain
environmental circumstances, are an integral part of the ways in which information
is selected and passed in the symbolic representations through rules of formation.