Reasoning about Space and Images

Much work in computer vision in the 1970s and 1980s aimed at the development of high-level vision, in which numerical processes feed a symbolic level of knowledge that allows an agent to interpret the world. These early attempts were frustrated by the lack, at the time, of efficient algorithms for dealing with uncertainty and of tractable knowledge representation formalisms, as well as by the rudimentary state of image-processing algorithms. Since then, important advances in Artificial Intelligence (AI) suggest that we may be in a position to bridge the gap between AI and Computer Vision. One possible way of bridging this gap is the development of Qualitative Spatial Reasoning (QSR) methods based on sensor data. This idea is intrinsically connected to the tradition of logic-based image interpretation. This paper presents a brief introduction to these fields and discusses a possible research agenda for the future.


Qualitative Spatial Reasoning (QSR)
The goal of Qualitative Spatial Reasoning (QSR) in Artificial Intelligence is to formalize spatial knowledge using elementary entities such as spatial regions, directions and line segments, among others [1]. Traditionally, however, QSR formalisms are independent of an observer's viewpoint, which makes them not directly applicable to computer vision or robotics problems. There is nevertheless a growing interest in dynamic formalisms about space in which the qualitative changes observed by a mobile robot are the building blocks of the system [2][3][4].
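As a concrete illustration of a QSR formalism whose elementary entities are spatial regions, the sketch below computes the eight base relations of the Region Connection Calculus (RCC-8), a well-known QSR calculus, between two regions. It is not taken from [1]: it assumes, for simplicity, that regions are closed axis-aligned rectangles with non-zero width and height, and the names `Rect` and `rcc8` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Closed axis-aligned rectangle, assumed to satisfy x1 < x2 and y1 < y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def _within(a: Rect, b: Rect) -> bool:
    # True if the closed rectangle a lies inside the closed rectangle b.
    return b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2


def _touches_edge(a: Rect, b: Rect) -> bool:
    # True if some edge of a lies on the corresponding edge of b.
    return a.x1 == b.x1 or a.x2 == b.x2 or a.y1 == b.y1 or a.y2 == b.y2


def rcc8(a: Rect, b: Rect) -> str:
    """Return the RCC-8 base relation holding between rectangles a and b."""
    # The closed rectangles share no point at all: disconnected.
    if a.x1 > b.x2 or b.x1 > a.x2 or a.y1 > b.y2 or b.y1 > a.y2:
        return "DC"
    # They share boundary points but their interiors do not overlap.
    if a.x1 >= b.x2 or b.x1 >= a.x2 or a.y1 >= b.y2 or b.y1 >= a.y2:
        return "EC"
    if a == b:
        return "EQ"
    # a is a (tangential or non-tangential) proper part of b.
    if _within(a, b):
        return "TPP" if _touches_edge(a, b) else "NTPP"
    # b is a proper part of a (the inverse relations).
    if _within(b, a):
        return "TPPi" if _touches_edge(a, b) else "NTPPi"
    return "PO"
```

For example, two unit squares sharing only a corner are externally connected (`EC`), while a square nested strictly inside a larger one is a non-tangential proper part (`NTPP`). For arbitrary segmented image regions the same eight relations apply, but the boundary and interior tests would have to be computed on the pixel sets themselves.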
The development of viewpoint-based QSR formalisms is not only of theoretical interest; it is also an essential step towards equipping robots with the capability of interpreting their sensor data using high-level knowledge, since the notion of space is ubiquitous in our knowledge of the external world. In dos Santos et al. [5], the formalism developed by Soutchanski [4] was applied to image understanding from the robot's point of view, where the notions of approach and separation of visual objects were used to describe the observed scenes. Along similar lines, the perception of cast shadows was recently formalized into a spatial reasoning system that helps a robot make sense of its environment [6].
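The viewpoint-dependent notions discussed above can be illustrated by a minimal sketch that labels the relative approach or separation of two visual objects between two consecutive frames. It assumes that each object is summarized by the centroid of its image region and uses a hypothetical tolerance `eps` (in pixels) to absorb image noise; the function names are illustrative and this is not the formalism of [4] or [5].

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # image-plane centroid (x, y) in pixels


def centroid_distance(p: Point, q: Point) -> float:
    # Euclidean distance between two image-plane centroids.
    return math.hypot(p[0] - q[0], p[1] - q[1])


def relative_motion(a_prev: Point, b_prev: Point,
                    a_curr: Point, b_curr: Point,
                    eps: float = 1.0) -> str:
    """Qualitatively label how two visual objects move relative to each other
    between two consecutive frames, as seen from the camera.

    eps is a hypothetical noise tolerance: changes in apparent distance
    smaller than eps are not treated as a qualitative change.
    """
    delta = centroid_distance(a_curr, b_curr) - centroid_distance(a_prev, b_prev)
    if delta < -eps:
        return "approaching"
    if delta > eps:
        return "separating"
    return "stable"
```

Because only image-plane coordinates are used, two objects that are static in the world may still be labelled as approaching or separating when the observer moves, which is precisely the kind of viewpoint dependency such formalisms have to account for.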
This sub-field of QSR, which investigates viewpoint-dependent qualitative spatial reasoning, is related to the long tradition of logic-based image interpretation, as discussed in the next section.

Logic-based Scene Interpretation
Modern research on scene interpretation is largely concerned with probabilistic methods, motivated by the need to deal with sensor noise and image uncertainty [7]. Probabilistic methods, however, are propositional, which restricts both their capability to represent general domain knowledge and their applicability to problems containing a possibly unbounded number of objects (Tran and Davis, 2008). Logic-based image interpretation, on the other hand, tackles precisely the problem of effectively representing general facts about the domain, and of generalizing these facts to problems with an unbounded number of variables [8]. Research on logic-based image interpretation therefore does not preclude the use of probabilistic methods; rather, it complements them by making the knowledge content of a domain explicit.
The first framework for a logic-based scene interpretation system was proposed by Reiter and Mackworth [9], in which three sets of axioms were defined to constrain the set of possible interpretations of the observed scenes. Scene interpretation is thereby reduced to a constraint satisfaction problem. The SIGMA system [10] successfully applied the ideas proposed by Reiter and Mackworth to aerial image interpretation. Some properties of the formalism used by SIGMA were further developed by Schroeder and Neumann [11] and were recently revisited and incorporated into a description logic setting [12].
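The reduction of scene interpretation to constraint satisfaction can be sketched as a small backtracking search that enumerates every consistent labelling of image regions. Unary constraints give the candidate labels of each region, and binary constraints list which label pairs may be assigned to adjacent regions. The function name, regions, labels and compatibility pairs are hypothetical; they do not reproduce the axioms of [9] or of the SIGMA system [10].

```python
from typing import Dict, List, Set, Tuple

Assignment = Dict[str, str]  # region name -> label


def interpretations(regions: List[str],
                    candidates: Dict[str, Set[str]],
                    adjacent: List[Tuple[str, str]],
                    compatible: Set[Tuple[str, str]]) -> List[Assignment]:
    """Enumerate every labelling of the regions that satisfies all constraints.

    candidates: unary constraints (labels each region may take).
    adjacent:   pairs of regions that touch in the image.
    compatible: label pairs allowed for touching regions (checked in both orders).
    """
    def consistent(assign: Assignment) -> bool:
        # Only check constraints whose two regions are already labelled.
        for r, s in adjacent:
            if r in assign and s in assign:
                pair = (assign[r], assign[s])
                if pair not in compatible and pair[::-1] not in compatible:
                    return False
        return True

    solutions: List[Assignment] = []

    def backtrack(i: int, assign: Assignment) -> None:
        if i == len(regions):
            solutions.append(dict(assign))  # store a copy of the full labelling
            return
        region = regions[i]
        for label in sorted(candidates[region]):
            assign[region] = label
            if consistent(assign):
                backtrack(i + 1, assign)
            del assign[region]  # undo before trying the next label

    backtrack(0, {})
    return solutions
```

For instance, with candidates `{"r1": {"road", "river"}, "r2": {"bridge", "house"}}`, adjacency `[("r1", "r2")]` and compatible pairs `("road", "bridge")`, `("river", "bridge")` and `("road", "house")`, the search returns three interpretations and rules out a house adjacent to a river.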
Within this tradition, a branch of research follows the work of Shanahan [13], in which a logical formalism is developed to rigorously define the information obtained from a robot's sensors in terms of symbols hypothesizing the existence, location and shape of the observed objects. Following these ideas, Santos and Shanahan [14], Santos [3] and dos Santos et al. [5] presented a theory aimed at automatic scene understanding from a robot's viewpoint. In particular, Santos [3] presents a formalism capable of interpreting events such as approaching, receding, or coalescing from pairs of consecutive images obtained by a mobile robot's stereo camera pair. To further interpret these image-related events, an abductive procedure was developed to hypothesize the possible changes in the domain objects that could explain them.
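The abductive step can be illustrated with a toy sketch: each domain-level hypothesis is mapped to the image events it would produce, and an explanation is a subset-minimal set of hypotheses whose predicted events match the observed events exactly. The rule table, the hypothesis names and this closed-world reading of the observations are assumptions made for illustration; this is not the procedure of [3].

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical causal rules: each domain-level hypothesis is mapped to the
# image events it would produce in a pair of consecutive frames.
RULES: Dict[str, FrozenSet[str]] = {
    "a_moves_towards_b": frozenset({"approaching(a,b)"}),
    "b_moves_towards_a": frozenset({"approaching(a,b)"}),
    "a_moves_away_from_b": frozenset({"receding(a,b)"}),
    "a_passes_in_front_of_b": frozenset({"approaching(a,b)", "coalescing(a,b)"}),
}


def explanations(observed: Set[str],
                 rules: Dict[str, FrozenSet[str]] = RULES) -> List[Set[str]]:
    """Return the subset-minimal sets of hypotheses whose predicted events
    are exactly the observed events (a closed-world assumption)."""
    found: List[Set[str]] = []
    names = sorted(rules)
    # Search by increasing size so that minimal explanations are found first.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            hyp = set(combo)
            if any(prev <= hyp for prev in found):
                continue  # a smaller explanation is already contained in hyp
            predicted = set().union(*(rules[h] for h in combo))
            if predicted == observed:
                found.append(hyp)
    return found
```

Observing only `approaching(a,b)` yields two alternative explanations (either object moving towards the other), whereas additionally observing `coalescing(a,b)` leaves `a_passes_in_front_of_b` as the only minimal explanation. The exhaustive search is exponential in the number of hypotheses and is only meant to make the notion of abductive explanation concrete.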
In another work, dos Santos et al. [5] proposed a framework capable of interpreting events over arbitrarily long image sequences. In this case, events such as the rotation of an object around a reference point, one object following another, and one object passing another were formally defined within a dynamic logic and then used in a scene interpretation procedure involving perceptually indistinguishable objects. This system, however, could not handle uncertainty in the scenes. In Fenelon et al. [15] and Santos et al. [16], a spatial reasoning system was developed within a probabilistic logic and applied to the interpretation of traffic scenes from the viewpoint of a camera at the driver's position [17]. This work, together with others [18][19], is among the first attempts at developing computer vision methods that are both robust to uncertainty and capable of automated high-level reasoning. However, much work remains to be done on this front.
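Events defined over longer image sequences can likewise be given a simple numerical reading. The test below is a hypothetical operationalization of "one object following another" on image trajectories: object a follows b with a lag of `lag` frames if, from that frame onwards, a stays within a tolerance `tol` of the position b occupied `lag` frames earlier, and b actually moves. The function name and parameters are assumptions for illustration, not the dynamic-logic definition of [5], and the hard tolerance ignores the uncertainty handling addressed in [15][16].

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # image-plane position (x, y) in a given frame


def follows(a: List[Point], b: List[Point], lag: int, tol: float) -> bool:
    """Hypothetical test for 'a follows b' over two aligned trajectories."""
    # Trajectories must cover the same frames and be longer than the lag.
    if len(a) != len(b) or lag < 1 or len(a) <= lag:
        return False
    # If b stays put, "following" it is not meaningful.
    if math.dist(b[0], b[-1]) <= tol:
        return False
    # a must occupy, at frame t, (roughly) where b was at frame t - lag.
    return all(math.dist(a[t], b[t - lag]) <= tol for t in range(lag, len(a)))
```

For two objects moving one pixel per frame along the same line, with a two pixels behind b, `follows(a, b, lag=2, tol=0.1)` holds while `follows(b, a, lag=2, tol=0.1)` does not, so the relation is asymmetric as intended.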

Conclusion
Spatial reasoning is present in almost all human interactions with the real world, from tying a shoelace to executing complicated navigation tasks or simply interpreting visual scenes. However, current computer vision systems have largely overlooked recent developments in the field of Qualitative Spatial Reasoning (QSR) in Artificial Intelligence. To foster cross-fertilization between QSR and Computer Vision, this article has briefly presented the main QSR formalisms whose aim is the automatic interpretation of scenes using high-level concepts. We set as a challenge the further development of QSR methods, grounded on computer vision data (and robust to uncertainty), that are capable of making sense of images using our knowledge of the external world¹.