After 7 years of writing, the Book
is finished. Check it out!
In this paper we analyze in some detail the geometry of a pair of cameras,
i.e. a stereo rig. Contrary to what has been done in the past and is still
done currently, for example in stereo or motion analysis, we do not
assume that the intrinsic parameters of the cameras are known (coordinates of
the principal points, pixel aspect ratio and focal lengths). This is important
for two reasons. First, it is more
realistic in applications where these parameters may vary according to the
task (active vision). Second, the general case considered here captures all
the relevant information that is necessary for establishing correspondences
between two pairs of images. This information is fundamentally
projective and is hidden in a confusing manner in the commonly used formalism
of the Essential matrix introduced by Longuet-Higgins.
This paper clarifies the projective nature of the
correspondence problem in stereo and shows that the epipolar geometry can
be summarized in one 3x3 matrix of rank 2 which we propose to call
the Fundamental matrix.
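For readers who want the algebra behind this statement, the relations below are the standard ones satisfied by a Fundamental matrix. The symbols $m$, $m'$, $e$, $e'$, $A$ and $A'$ are notation assumed here for illustration and do not come from the abstract itself.

```latex
% m, m' : homogeneous pixel coordinates of corresponding points in images 1 and 2
% e, e' : epipoles in images 1 and 2
% A, A' : intrinsic parameter matrices of the two cameras (assumed notation)
\begin{align}
  m'^{\top} F\, m &= 0 && \text{(epipolar constraint)} \\
  F e = 0, \quad F^{\top} e' &= 0 && \text{(the epipoles span the null spaces)} \\
  \det F &= 0 && \text{(rank 2)} \\
  E &= A'^{\top} F A && \text{(Essential matrix, available only when } A, A' \text{ are known)}
\end{align}
```

Since $F$ is defined up to scale and has zero determinant, it carries 7 degrees of freedom.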
After this theoretical analysis, we embark on the task of estimating
the Fundamental matrix from point correspondences, a task which is of practical
importance. We analyze theoretically,
and compare experimentally using synthetic and real data,
several methods of estimation. The problem of the stability of the
estimation is studied from two complementary viewpoints. First we show that
there is an interesting relationship between the Fundamental matrix and
three-dimensional planes which induce homographies between the images and
create instabilities in the estimation procedures. Second, we point to a deep
relation between the instability of the estimation procedure and the presence
in the scene of so-called critical surfaces which have been studied in
the context of motion analysis. Finally, we conclude by stressing our
belief that the Fundamental matrix will play a crucial role in
future applications of three-dimensional Computer Vision by greatly increasing
its versatility, robustness and hence applicability to real difficult problems.
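As an illustration of estimation from point correspondences, here is a minimal sketch of a linear (eight-point style) estimate with coordinate normalization and rank-2 enforcement. It is not claimed to be one of the specific methods compared in the paper; the function names `normalize_points` and `estimate_fundamental` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def normalize_points(pts):
    """Shift points to zero centroid and scale so the mean distance to it is sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def estimate_fundamental(pts1, pts2):
    """Linear estimate of F with x2^T F x1 = 0 from N >= 8 pixel correspondences."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each correspondence yields one linear equation in the 9 entries of F (row-major).
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    # The least-squares solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization so that F applies to the original pixel coordinates.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

With noisy data, a linear estimate of this kind is usually treated as a starting point for a nonlinear refinement rather than as a final answer.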
ERRATUM:
In equation (19) the error function is
(k0/k0' - k1/k1')^2 + (k1/k1' - k2/k2')^2 + (k0/k0' - k2/k2')^2
We address the problem of estimating three-dimensional motion, and structure
from motion with an
uncalibrated moving camera. We show that point correspondences between
three images, and the Fundamental matrices computed from these point
correspondences, are sufficient to recover the internal orientation of the
camera (its calibration), the motion parameters, and to compute coherent
perspective projection matrices which enable us to reconstruct 3-D structure up to a
similarity. In contrast with other methods, no calibration object with a known 3-D
shape is needed, and no limitations are put upon the unknown motions to be
performed or the parameters to be recovered, as long as they define a
projective camera.
The theory of the method, which is based on the constraint that the observed
points are part of a static scene, thus allowing us to link the intrinsic
parameters and the Fundamental matrix via the absolute conic, is first
detailed. Several algorithms are then presented, and their performances
compared by means of extensive simulations. An application of the method to a
binocular or trinocular stereo rig is also considered. It is illustrated by
several experiments with real images which conclude the paper.
This work is in the context of motion and stereo analysis. It presents a new
unified representation which will be useful when dealing with multiple views
in the case of uncalibrated cameras. Several levels of information might
be considered, depending on the availability of information.
Among other things, an algebraic description of the
epipolar geometry of N views is introduced, as well as a framework for
camera self-calibration, calibration updating, and structure from motion in
an image sequence taken by a camera which is zooming and moving at the same
time.
We show how a special decomposition of a set of two or three general projection
matrices, called "canonical", enables us
to build geometric descriptions for a system of
cameras which are invariant with respect to a given group of transformations.
These representations are minimal and capture completely the properties of
each level of
description considered: Euclidean (in the context of calibration, and in the
context of structure from motion, which we distinguish clearly), affine, and
projective, which we also relate to each other. In the last case, a
new decomposition of the well-known Fundamental matrix is obtained.
Dependencies, which appear when three or more views are available, are studied
in the context of the canonical decomposition, and new composition formulas
are established. The theory is illustrated by tutorial examples with real images.
Note: this is the last paper authored by Yvan Leclerc, who passed
away in Oct 2002.
Despite having been forced out of work by his illness and multiple
treatments, Yvan put together this final version during his last remission
in June 2002. If you were a colleague of his in the
computer vision community, please visit
Yvan Leclerc's memorial page.
The self-consistency methodology is a new paradigm for evaluating
certain vision algorithms without relying extensively on ground truth.
We demonstrate its effectiveness in the case of point-correspondence
algorithms and use our approach to predict their accuracy.
For point-correspondence algorithms, our methodology consists in
applying independently the algorithm to subsets of images obtained by
varying the camera geometry while keeping 3-D object geometry
constant. Matches that should correspond to the same surface element
in 3-D are collected to create statistics that are then used as a
measure of the accuracy and reliability of the algorithm. These
statistics can then be used to predict the accuracy and reliability of
the algorithm applied to new images of new scenes.
An effective representation for these statistics is a scatter diagram
along two dimensions: A normalized distance and a matching score. The
normalized distance makes the statistics invariant to camera geometry,
while the matching score allows us to predict the accuracy of
individual matches. We introduce a new matching score based on
Minimum Description Length (MDL) theory, which is shown to be a better
predictor of the quality of a match than the traditional Sum of
Squared Differences (SSD) score.
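For reference, the traditional SSD score mentioned above can be sketched as follows. This is only the baseline score, not the MDL-based score introduced in the paper. The function names `ssd_score` and `best_disparity` are hypothetical, and the search assumes rectified images so that matches lie on the same row (an assumption made here for brevity).

```python
import numpy as np

def ssd_score(window_a, window_b):
    """Sum of squared intensity differences between two equally sized windows."""
    a = np.asarray(window_a, dtype=float)
    b = np.asarray(window_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("windows must have the same shape")
    return float(np.sum((a - b) ** 2))

def best_disparity(left, right, row, col, half=2, max_disp=16):
    """Find the disparity d in [0, max_disp] minimizing SSD along one image row.

    Assumes rectified images: the match of left[row, col] is right[row, col - d].
    The caller must keep the reference window inside the left image.
    """
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_score = None, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:
            break  # candidate window would leave the image
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ssd_score(ref, cand)
        if score < best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

A lower score means a closer intensity match. The score alone says nothing about how distinctive the window is, which is one reason a different predictor of match quality can be useful.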
We demonstrate the potential of our methodology in two different
application areas. First, we compare different point-correspondence
algorithms, matching scores, and window sizes. Second, we detect
changes in terrain elevation between 3-D terrain models reconstructed
from two sets of images taken at different times.
We finish by discussing the application of self-consistency to other
vision problems.
We introduce a methodology for radiometric reconstruction, the
simultaneous recovery of multiple illuminants and surface albedos from multiple
views, assuming that the geometry of the scene and of the cameras is known. We
formulate the linear theory of multiple illuminants and show its similarities
with the theory of geometric recovery of multiple views. Linear and non-linear
implementations are proposed; simulation results are discussed; and, finally,
results on real images are presented.
We propose a new approach for vision-based longitudinal and lateral vehicle
control. The novel feature of this approach is the use of binocular vision.
We integrate two modules consisting of a new, domain-specific, efficient
binocular stereo algorithm, and a lane marker detection algorithm, and show
that the integration results in improved performance for each of the
modules.
Longitudinal control is supported by detecting and measuring the distances to
leading vehicles using binocular stereo. The knowledge of the camera geometry
with respect to the locally planar road is used to map the images of the road
plane in the two camera views into alignment. This allows us to separate
image features into those lying in the road plane, e.g. lane markers, and
those due to other objects which are dynamically integrated into an obstacle
map. Therefore, in contrast with previous work, we can cope with the
difficulties arising from occlusion of lane markers by other vehicles. The
detection and measurement of the lane markers provides us with the positional
parameters and the road curvature which are needed for lateral vehicle
control. Moreover, this information is also used to update the camera
geometry with respect to the road, therefore allowing us to cope with the
problem of vibrations and road inclination to obtain consistent results from
binocular stereo.
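The alignment of the road plane between the two views can be illustrated with the standard plane-induced homography. The sketch below is not the paper's algorithm. It assumes known intrinsic matrices `K1` and `K2`, a relative pose with `X2 = R @ X1 + t`, and a road plane written as `n . X1 = d` in the first camera frame; all names are hypothetical.

```python
import numpy as np

def plane_homography(K1, K2, R, t, n, d):
    """Homography sending image-1 pixels of points on the plane n.X1 = d to image 2.

    Convention (an assumption of this sketch): camera-2 coordinates are
    X2 = R @ X1 + t, and the plane is expressed in the camera-1 frame.
    """
    n = np.asarray(n, dtype=float).reshape(3)
    t = np.asarray(t, dtype=float).reshape(3)
    H = K2 @ (R + np.outer(t, n) / d) @ np.linalg.inv(K1)
    return H / np.linalg.norm(H)

def apply_homography(H, pts):
    """Apply H to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    h = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return h[:, :2] / h[:, 2:3]
```

Features whose transferred position agrees with their observed position in the second image are consistent with the road plane. Features with a large transfer residual lie off the plane and are candidates for the obstacle map.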
We propose a methodology to sketch the 3D geometry
of an outdoor scene consisting of natural terrain. The method
requires only a pair of uncalibrated images, but it produces a
sketch in which the ordering along the dimensions of height
above the ground plane and depth is correct. A dense
representation is generated as a set of profile lines which overlays
the original images.
The Fundamental matrix
Note: if you cite a reference for the F-matrix, please use
the following journal paper.
In particular the ECCV 1992 papers of Faugeras as well as
Faugeras-Luong-Maybank merely hint at the concept.
Q.-T. Luong and O.D. Faugeras.
Intl. Journal of Computer Vision, 17(1):43--76, 1996
postscript 1970K
acrobat pdf 1924
Q.-T. Luong, R. Deriche, O.D. Faugeras, and T. Papadopoulo.
Technical Report RR-1894, INRIA, 1993.
postscript 1170K
Q.-T. Luong and O.D. Faugeras.
In Proc. Conference on Computer Vision and Pattern Recognition,
pages 489--494, New York, 1993.
postscript 1634K
Q.-T. Luong and O.D. Faugeras.
In Proc. European Conference on Computer Vision, pages
577--588, Stockholm, Sweden, 1994.
postscript 92K
Z. Zhang, R. Deriche, O. Faugeras, Q.-T. Luong.
Artificial Intelligence Journal, Vol. 78, pages 87--119, October 1995.
Shorter version
In Proc. European Conference on Computer Vision, pages
567--576, Stockholm, Sweden, 1994.
postscript 5970K
Q.-T. Luong and O.D. Faugeras.
Computer Vision and Image Understanding, 71(1):1--18, 1998.
postscript 280K
Self-calibration
Q.-T. Luong and O.D. Faugeras.
Intl. Journal of Computer Vision, 22(3):261--289, 1997.
postscript 854K
acrobat pdf 916K
O.D. Faugeras, Q.-T. Luong, and S.J. Maybank.
In Proc. European Conference on Computer Vision, pages
321--334, Santa Margherita Ligure, Italy, 1992.
postscript 85K
Q.-T. Luong and O.D. Faugeras.
In A. Grun and T.S. Huang, editors, Calibration and orientation
of cameras in computer vision. Springer-Verlag, 1996
To appear. Also presented at XVII ISPRS, Washington, and INRIA Tech
Report RR-2014.
postscript 1148K
Z. Zhang, Q.-T. Luong, and O.D. Faugeras.
IEEE Trans. Robotics and Automation, 12(1):103--113, 1995.
Also INRIA Tech RR-2079.
postscript 1685K
Q.-T. Luong and O.D. Faugeras.
In Proc. International Conference on Pattern Recognition,
pages A--248--252, Jerusalem, Israel, 1994.
postscript 490K
Stratification
Note: The two following papers contain good ideas, and I believe that they
represented significant advances at the time they were written. However, I
later realized that they suffer from a few imprecisions, both in the
presentation and in the technical details. I wrote a better exposition, but
it was not used in the book, so if you're really interested in this
material, email me.
Q.-T. Luong and T. Vieville.
Computer Vision and Image Understanding, 64(2):193--229, 1996.
Also Technical Report UCB/CSD-93-772, University of California at
Berkeley, Sept 1993, Revised July 1994.
Shorter version
In Proc. European Conference on Computer Vision, pages
589--599, Stockholm, Sweden, 1994.
postscript 504K
acrobat pdf 681K
T. Vieville, O.D. Faugeras, and Q.-T. Luong.
Intl. Journal of Computer Vision, 17(1):7--42, 1996.
postscript 773K
acrobat pdf 807K
Self-consistency, change detection, multiple image radiometry
Y. G. Leclerc, Q.-T. Luong, and P. Fua.
Intl. Journal of Computer Vision. In press 2002.
acrobat pdf 793K
Y. G. Leclerc, Q.-T. Luong, and P. Fua.
Proceedings of the European Conference on Computer Vision (ECCV2000),
(Dublin, Ireland), June 2000.
acrobat pdf 927K
Y. G. Leclerc, Q.-T. Luong, and P. Fua.
Proceedings of the Conference on Computer Vision and Pattern
Recognition (CVPR2000), (Hilton Head, South Carolina), June 2000.
acrobat pdf 333K
Q.-T. Luong, P. Fua and Y. G. Leclerc.
IEEE Trans PAMI. Feb 2002.
acrobat pdf 1908K
Q.-T. Luong, P. Fua and Y. G. Leclerc
Proceedings of the European Conference on Computer Vision (ECCV2002),
(Copenhagen, Denmark), May 2002
acrobat pdf 508K
Various topics
Q.-T. Luong and O.D. Faugeras.
In Proc. European Conference on Artificial Intelligence, pages
800--802, Wien, Austria, 1992.
postscript 235K
Q.-T. Luong, J. Weber, D. Koller, and J. Malik.
In Proc. Fifth International Conf. on Computer Vision, pages 52--57,
Cambridge, MA, 1995.
postscript 898K
Q.-T. Luong.
In Proc. International Conference on Pattern
Recognition, pages I-51-55, Brisbane, Australia, 1998.
postscript 713K
Surveys
Q.-T. Luong.
In C.H. Chen, L.F. Pau, and P.S.P. Wang, editors, Handbook of
pattern recognition and computer vision, pages 311--368. World Scientific,
1993. Also available in French, Traitement du Signal, 8(1):3--34, 1991,
and INRIA Technical Report RR-1251.
G. Healey and Q.-T. Luong.
In C.H. Chen, L.F. Pau, and P.S.P. Wang, editors, Handbook of
pattern recognition and computer vision, pages 283--312. World Scientific,
second edition, 1999.
D. Koller, Q.-T. Luong, J. Weber, J. Malik.
In C.H. Chen, L.F. Pau, and P.S.P. Wang, editors, Handbook of
pattern recognition and computer vision, pages 817--854. World Scientific,
second edition, 1999. postscript 2486K