
Early Vision as a Symbolic and Computational Process, by Anthony Peter Iannini, 2000  

Essay Overview: A technical essay on the nature of early vision, one of the conceptual stages of visual perception.



Zenon Pylyshyn, in his book Computation and Cognition, objects to the computational analysis of low-level or early vision systems presented by David Marr. Pylyshyn’s argument states that if the domain and range of a function are finite or non-combinatorial, then it qualifies as an instantiated function.

The complexity of the function, says Pylyshyn, is irrelevant in the sense that it may be a complex property of the functional architecture that does not have any semantically valued or representational content. Above all, the system at hand must go through representational levels in order to be a computational, rather than a transductive, function.

If such a function is bound by its input (the photon stimuli of the retina, in the case of early vision), then, argues Pylyshyn, the function is instantiated by the functional architecture rather than by a process involving representations. As such, the early vision systems "...may be merely complex neural processes that instantiate pieces of functional architecture—in this case, transducers." (Pylyshyn, 1984, p.215).

Pylyshyn takes the possibility of realizing such early visual functions on a digital computer to be irrelevant, since "...the computer may simply be simulating brain processes that are not cognitive, any more than are simulations of the endocrine system or the dynamics of the musculature. The reason I feel they are probably not cognitive is that capturing regularities of these processes seems not to require appeal to semantic properties." (Pylyshyn, 1984, p.215).

Summarizing the argument, the following three criteria for a transductive function encompass Pylyshyn’s view of early visual processes. First, a transducer is non-symbolic, in that the function is carried out by the functional architecture. Second, the transducer is stimulus-bound: it is data-driven by the distal environment.

Third, the transducer is a physical-to-symbol function that (a) can be described in the language of physics, (b) makes its output computationally available as discrete atomic symbols, and (c) captures the regularities of the system by mapping the physical onto the symbolic (Pylyshyn, 1984, pp.153-171).
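These criteria can be made concrete with a toy sketch. The Python below is only an illustration under stated assumptions: the luminance thresholds, the symbol names, and the function `transduce` are hypothetical and come from neither Pylyshyn nor Marr. It shows a function that is stimulus-bound (its output depends only on the present physical input) and that emits discrete atomic symbols from a finite alphabet.

```python
from bisect import bisect_right

# Hypothetical thresholds on a physical magnitude (luminance, arbitrary units).
THRESHOLDS = (0.2, 0.7)
# Finite alphabet of atomic symbols: none of them has internal structure.
SYMBOLS = ("DARK", "DIM", "BRIGHT")


def transduce(luminance: float) -> str:
    """Map a physical magnitude to one atomic symbol.

    The mapping is fixed and stateless: no stored representation,
    belief, or goal influences which symbol is produced.
    """
    return SYMBOLS[bisect_right(THRESHOLDS, luminance)]


if __name__ == "__main__":
    for value in (0.05, 0.5, 0.9):
        print(value, "->", transduce(value))
```

Because the range is a small finite set and the mapping is fixed, such a function could equally be wired in as a lookup table. That is roughly the sense in which Pylyshyn treats it as instantiated by the functional architecture rather than computed over representations.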

The early visual system, as described by Marr, is a combinatorial process that builds upon representations in which rules of formation and rules of transformation construct a three-dimensional image.

More specifically, Marr’s model of vision includes (1) the primal sketch, the initial representations of changes in illumination, intensity values, and surfaces, (2) the 2½-D sketch, involving depth and viewer orientation, and (3) the 3-D sketch, which is a volumetric, surface-shaped spatial configuration (Marr, 1982, p.37).

At the level of the primal sketch, an initial representation is built up, using the rules of formation, in the context of a computationalist, or symbolic scheme. The primal sketch must be rich in representational content, as the formation of visual information must be derived from this initial process. From the symbolic representation of the primal sketch, information can be passed to the 2½-D sketching programs, and finally to the 3-D representation of the original, distal structure.

At the heart of Marr’s computationalist theory of early vision is the notion of a combinatorial process that is able to pass representational information from one modular program to another in such a way that the information is sorted, selected, and maintains its integrity throughout.

Pylyshyn’s argument is that the model presented by Marr (and the whole of early visual processes) is a property of the functional architecture, rather than a computational function. However, there is evidence that would suggest that, at the level of the primal sketch, the retinal information must become symbolic.

In order for the visual system to select information from the distal environment by differentiating among what Marr terms intensity values (differences in illumination from one part of the visual field to the next), certain algorithmic functions (or some form of computational process) must be involved.

As Marr writes, ". . .an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied." (Marr, 1982, p. 27). This notion is integral to Marr’s distinction between three distinct and important levels of explanation: (i) computational theory, (ii) algorithm, and (iii) implementation.

According to Marr, a theory of why and how a function works must be understood in terms of its purpose and its ability to carry out that purpose. Both must be explained first at the level of computational theory and second at the level of algorithm (Marr, 1982, pp. 23-25).
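The difference between the first two levels can be illustrated with a toy example in the spirit of Marr's own arithmetic illustration. The sketch below is an assumption-laden illustration, not Marr's formulation: the function names `add_positional`, `add_tally`, and `tally` are hypothetical. Both adders satisfy the same computational-level description (the sum of two non-negative integers), but each relies on a different representation and a different algorithm.

```python
def add_positional(a: str, b: str) -> str:
    """Add two decimal numerals digit by digit, propagating a carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits, carry = [], 0
    for x, y in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(x) + int(y) + carry, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


def tally(n: int) -> str:
    """Represent a non-negative integer as n tally marks."""
    return "|" * n


def add_tally(a: str, b: str) -> str:
    """Add two tally-mark numerals by simple concatenation."""
    return a + b


if __name__ == "__main__":
    print(add_positional("278", "945"))           # decimal representation
    print(len(add_tally(tally(278), tally(945))))  # same sum, tally representation
```

What is being computed, and why, is the same in both cases. Only the representation and the procedure differ, and neither function says anything about the hardware on which it runs.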

The level of implementation, which is the physical realization of the computational theory and the algorithm, cannot, in Marr’s view, explain the why and how of a function. This is in direct conflict with Pylyshyn’s view that "They [functions of the mental architecture] are functions instantiated in the biological medium...they are functions that are themselves explainable biologically, rather than in terms of rules and representations." (Pylyshyn, 1980, p.126). Pylyshyn extends this view of the functional architecture to the realm of early vision. In order to see why, I will look at each of Pylyshyn’s criteria for a transducer and examine it in the light of early visual processes.

Pylyshyn’s first criterion for a transducer, that it be non-symbolic or non-representational, seems to be violated if Marr’s analysis of vision is sound. At the stage of the primal sketch, information must be dealt with in a certain way. After this initial processing of raw retinal information, the information that comes from the primal sketch and is passed to the 2½-D sketching programs must be handled in another way.

If all retinal information were handled identically, it seems implausible that we could form a three-dimensional image of something in the environment. Marr’s view is that "Vision is a process that produces from images of the external world a description that is useful to the viewer and not cluttered with irrelevant information." (Marr, 1982, p. 31).

Figure 1: Zero crossings (stock photography image of woman's face)

In order to better define the concept of symbolic information, I will look at one aspect of the primal sketch, namely zero-crossings, that begins the representational process in Marr’s theory of early vision.

In Figure 1, the left image of the woman's face was input into a sub-program of a graphics editor (1), and algorithms were able to find the points at which there is a difference between the light and dark parts of the image. The greater the discontinuity of illumination in the original image, the greater the thickness of the resulting line.

What results is a representation of the original image, which now has its own symbolic structure. Both the left and right images are represented by values, within the context of a coordinate plane (2), but each has a different set of values.

The information processed in the left image results in the information in the right image, which is composed of selected information based on, among other factors, assumptions (3).
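The idea behind zero-crossings can be sketched in one dimension. The following is a minimal illustration under stated assumptions, not Marr's algorithm as published and not the graphics-editor filter used for Figure 1: it replaces Marr's Gaussian smoothing with a simple 1-2-1 averaging kernel, works on a single row of intensity values, and uses function names invented for this example. It smooths the row, takes a discrete second derivative, and reports where that derivative changes sign.

```python
def smooth(row):
    """Blur with a 1-2-1 kernel, repeating the end values at the borders."""
    padded = [row[0]] + list(row) + [row[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
            for i in range(len(row))]


def second_difference(row):
    """Discrete second derivative; the two border entries are set to 0."""
    inner = [row[i - 1] - 2 * row[i] + row[i + 1]
             for i in range(1, len(row) - 1)]
    return [0] + inner + [0]


def zero_crossings(row):
    """Return (position, strength) pairs for a row of intensity values.

    Position i means the sign change lies between samples i and i + 1.
    Strength is the size of the jump in the second derivative there.
    Entries that are exactly zero are not treated as crossings in this sketch.
    """
    d2 = second_difference(smooth(row))
    return [(i, abs(d2[i] - d2[i + 1]))
            for i in range(len(d2) - 1)
            if d2[i] * d2[i + 1] < 0]


if __name__ == "__main__":
    strong_edge = [10] * 5 + [30] * 5   # large change in intensity
    weak_edge = [10] * 5 + [20] * 5     # smaller change at the same place
    print(zero_crossings(strong_edge))  # [(4, 10.0)]
    print(zero_crossings(weak_edge))    # [(4, 5.0)]
    print(zero_crossings([10] * 10))    # [] (uniform row, no edge)
```

The output is a short list of positions and strengths rather than a row of intensities, which is the sense in which the result has its own symbolic structure. The larger step in intensity yields the larger strength, loosely echoing the thicker lines in Figure 1.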

If this process of discrimination in early vision were merely a part of the functional architecture, then access to this symbolic information would remain impenetrable until the early vision process was complete and it was available to the utilization area of cognition.

The important point about this example is that models based on the computationalist approach have more explanatory force than the alternatives. Objections to the computationalist approach, such as Pylyshyn’s objection to Marr’s analysis of early vision, are not helpful in trying to explain the way in which such processes occur.

In defense of his approach, Marr writes, "The levels idea is crucial, and perception cannot be understood without it— never by thinking just about synaptic vesicles or about neurons and axons, just as flight cannot be understood by studying only feathers." (Marr, 1982, p. 336).

Pylyshyn’s second criterion for a transductive function is that it be stimulus-bound or data-driven. Within the explanatory model of early vision proposed by Marr, there is a transformation from physical to symbolic during the process. On the one hand, it seems as though the whole early visual process is dependent upon and driven by the data or stimuli presented to it. On the other hand, it also seems as though the process must be symbolic.

The tension here lies in the distinction between a function that is cognitively impenetrable and one that is symbolic. Within the domain of early vision, each module's processing of information seems to need to be symbolic and penetrable by other early visual modules. Yet it also appears that we do not have any conscious cognitive control over our early visual systems.

My point is that cognitive impenetrability does not necessarily entail a non-symbolic or non-representational system, and therefore I think that Pylyshyn’s criterion that a transducer be stimulus-bound has no bearing on the computationalist approach to early vision.

The third criterion for a transductive function, as outlined by Pylyshyn, requires that the transducer be able to map from the physical to the symbolic in such a way that it can be described in the language of physics and yields computationally available, discrete, atomic symbols.

The semantic content of the primal sketch, it seems, cannot be captured in the language of physics (which includes such notions as force, momentum, speed, energy, etc.). As noted earlier, Pylyshyn’s argument rests on the notion that early vision does not seem to rely on an appeal to semantic properties.

But if we look more closely at Marr’s staged analysis of vision, it seems as though the construction of a 3-D sketch must rely on the semantic content of symbols. The alternative to this view would require a kind of "pick-up" model of visual acquisition, like the one put forth by J.J. Gibson.

However, Gibson, though on the right track, never answered the question of how humans can construct a visual representation given a constantly changing environment (Gardner, 1985, pp.308-315). Therefore, the computationalist view of early vision seems to win out in the domain of explanatory value, which is integral to any model in science.

A computationalist model of early vision attempts to explain how the functions involved, using a symbolic and representational process, can create a three-dimensional visual representation of the world. Viewing the early visual processes as instantiated in the functional architecture, and therefore explainable in terms of biology, has given little explanation of how this process actually occurs.


Endnotes:
_________________

(1) The graphics editor used was Adobe Photoshop® 4.0. A filter ("find edge") was used to simulate the zero-crossing algorithm proposed by Marr in the initial visual process.

(2) Each value is referred to as a pixel, which is one point in the x-y coordinate plane of the computer screen.

(3) The existence of assumptions in the algorithmic functions of the visual system suggests that the building up of a representation cannot be merely transductive. These assumptions, as evolutionary necessities in programming within certain environmental circumstances, are an integral part of the ways in which information is selected and passed in the symbolic representations through rules of formation.

 

 


All content on hiartx.com is by Anthony Peter Iannini © Copyright | All Rights Reserved. If properly attributed and referenced, all images and excerpts of written content from this site may be used for non-profit and/or educational purposes freely. Please provide a hyper link back to the website page where the images or text was found. E-mail contact regarding all uses of content on this site is appreciated. For all other uses of content on this site, please e-mail me at: apiannini@yahoo.com.