CS231A Lecture 5: Epipolar Geometry
[HZ] Chapter 4: “Estimation – 2D Projective Transformations”
[HZ] Chapter 9: “Epipolar Geometry and the Fundamental Matrix”
[HZ] Chapter 11: “Computation of the Fundamental Matrix F”
[FP] Chapter 7: “Stereopsis”
[FP] Chapter 8: “Structure from Motion”
Why is stereo useful?
Recovering structure from a single view
Why is it so difficult?
Intrinsic ambiguity of the mapping from 3D to image (2D)
Multi (stereo)-view geometry
Epipolar geometry
Example: Parallel image planes
Example: Forward translation
Epipolar constraints
- Two views of the same object
- Given a point in the left image, how can I find the corresponding point in the right image?
The Essential Matrix
Cross product as matrix multiplication
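A quick numerical check of this identity: for $a = (a_1, a_2, a_3)$, the skew-symmetric matrix $[a]_\times$ satisfies $[a]_\times b = a \times b$. The sketch below assumes Python with NumPy; the helper name `skew` is our own, not something from the lecture.

```python
import numpy as np

def skew(a):
    """Return [a]_x, the 3x3 skew-symmetric matrix with skew(a) @ b == np.cross(a, b)."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3, a2],
                     [a3, 0.0, -a1],
                     [-a2, a1, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 0.5])
print(skew(a) @ b)     # (4, 11.5, -9)
print(np.cross(a, b))  # (4, 11.5, -9)
```

Since $[a]_\times a = a \times a = 0$, the matrix has rank 2 for any nonzero $a$; this is where the rank-2 property of the essential matrix comes from.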
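In “14 Lectures on Visual SLAM” (《视觉SLAM十四讲》), pixel points are written in homogeneous coordinates to express a projective relation: $s\mathrm p$ and $\mathrm p$ are related by projection, so they are equal in the homogeneous sense, or equivalently, equal up to scale.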
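Q: What is a canonical camera?
A: TODO (it seems to be a definition: a camera whose intrinsics are known can be converted into a canonical camera, i.e. one whose intrinsic matrix is the identity, $K = I$, by replacing each pixel point $p$ with $K^{-1}p$.)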
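To make the canonical-camera setting concrete, the sketch below (Python with NumPy) checks the essential-matrix constraint on one synthetic point. It assumes camera 1 is $[I \mid 0]$ and camera 2 is $[R \mid t]$, so that $p' \simeq Rp + t$ and the constraint reads $p'^T E p = 0$ with $E = [t]_\times R$; the slides may place $R$, $t$ and the primes differently. The pose and the point are made-up values, and `skew` / `rotation_z` are hypothetical helpers. The last line also illustrates the homogeneous-coordinate remark: scaling $p$ or $p'$ does not change whether the constraint holds.

```python
import numpy as np

def skew(v):
    """Hypothetical helper: 3x3 matrix [v]_x with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotation_z(theta):
    """Hypothetical helper: rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

# Assumed setup: canonical cameras (K = I); camera 1 is [I | 0], camera 2 is [R | t].
R = rotation_z(0.1)
t = np.array([1.0, 0.2, 0.0])
E = skew(t) @ R                    # essential matrix under this convention

P = np.array([0.5, -0.3, 4.0])     # made-up 3D point in camera-1 coordinates
p = P / P[2]                       # image point in camera 1, third coordinate 1
P2 = R @ P + t                     # the same point in camera-2 coordinates
p2 = P2 / P2[2]                    # image point in camera 2

print(p2 @ E @ p)                  # ~0 up to rounding: p'^T E p = 0
print((3.0 * p2) @ E @ (7.0 * p))  # still ~0: s*p and p are the same image point
```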
The Fundamental Matrix
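- The slides for this part closely mirror those for the Essential Matrix, but here the camera intrinsics $K$ are taken into account, which gives the relation between a pixel point $p$ and the corresponding point $p_c$ of the canonical camera, $p_c = K^{-1}p$.
- “14 Lectures on Visual SLAM” derives the final equation directly from the pixel points $p$ and then names the middle parts of that equation the fundamental matrix $F$ and the essential matrix $E$ (so its explanation is less clear than this course's).
- The essential matrix is a special case of the fundamental matrix: it is the fundamental matrix in normalized image coordinates.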
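The relation between the two matrices can be checked numerically. Assuming pixel points satisfy $p = K p_c$ and $p' = K' p'_c$, substituting into $p_c'^T E p_c = 0$ gives $p'^T F p = 0$ with $F = K'^{-T} E K^{-1}$. The sketch below (Python with NumPy) assumes camera 2 is $[R \mid t]$ relative to camera 1, so that $E = [t]_\times R$, and uses made-up intrinsics `K1`, `K2`; it is an illustration, not the course's code.

```python
import numpy as np

def skew(v):
    """Hypothetical helper: 3x3 matrix [v]_x with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])   # rotation about y
t = np.array([1.0, 0.0, 0.1])
E = skew(t) @ R

# Made-up intrinsics (focal lengths and principal points are illustrative only).
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[700.0, 0.0, 300.0], [0.0, 700.0, 250.0], [0.0, 0.0, 1.0]])

F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)   # F = K'^{-T} E K^{-1}

P = np.array([0.4, -0.2, 5.0])       # made-up 3D point in camera-1 coordinates
p = K1 @ P
p = p / p[2]                         # pixel coordinates in image 1
P2 = R @ P + t
p2 = K2 @ P2
p2 = p2 / p2[2]                      # pixel coordinates in image 2

p_c = np.linalg.inv(K1) @ p          # canonical coordinates, p_c = K^{-1} p
p2_c = np.linalg.inv(K2) @ p2

print(p2 @ F @ p)        # ~0: constraint in pixel coordinates
print(p2_c @ E @ p_c)    # ~0: constraint in normalized coordinates
```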
Why is F useful?
- Suppose $F$ is known
- No additional information about the scene and camera is given
- Given a point in the left image, we can compute the corresponding epipolar line in the second image
- $F$ captures information about the epipolar geometry of 2 views + camera parameters
- MORE IMPORTANTLY: $F$ gives constraints on how the scene changes under view point transformation (without reconstructing the scene!)
- Powerful tool in:
  - 3D reconstruction
  - Multi-view object/scene matching
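As an illustration of the epipolar-line point, the line in the second image is $l' = Fp$, and a correct match $p'$ must satisfy $p'^T l' = 0$. The sketch below (Python with NumPy) uses the parallel-image-plane case, assuming $K = I$, $R = I$ and $t = (1, 0, 0)$, for which $F = [t]_\times$ and every epipolar line is horizontal; `epipolar_line` is a hypothetical helper name.

```python
import numpy as np

def epipolar_line(F, p):
    """Epipolar line l' = F p in the second image, as (a, b, c) with a^2 + b^2 = 1.

    With this scaling, l' @ q is the signed distance from q = (x, y, 1) to the line.
    """
    l = F @ p
    return l / np.hypot(l[0], l[1])

# Parallel image planes (assumed): K = I, R = I, t = (1, 0, 0), so F = [t]_x.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

p = np.array([0.25, 0.1, 1.0])        # point in the left image
l2 = epipolar_line(F, p)              # (0, -1, 0.1), i.e. the horizontal line y = 0.1
on_line = np.array([0.5, 0.1, 1.0])   # candidate on the same row
off_line = np.array([0.5, 0.3, 1.0])  # candidate 0.2 away from the line
print(l2 @ on_line, l2 @ off_line)    # 0.0 and -0.2 (up to rounding)
```

Scaling $l'$ so that its first two components have unit norm makes $l'^T q$ the signed point-to-line distance, which is a convenient score when searching for a match along the line.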
Estimating F
The Eight-Point Algorithm
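A minimal sketch of the (unnormalized) eight-point algorithm, assuming Python with NumPy and the convention $p_i'^T F p_i = 0$. Each correspondence $p_i = (u_i, v_i, 1)$, $p'_i = (u'_i, v'_i, 1)$ contributes the row $(u'_iu_i, u'_iv_i, u'_i, v'_iu_i, v'_iv_i, v'_i, u_i, v_i, 1)$ of $W$; the row-major entries of $F$ are taken as the right singular vector of $W$ with the smallest singular value, and rank 2 is then enforced by zeroing the smallest singular value of $F$. The function name `eight_point` and the synthetic data (pose, point cloud, $K = I$) are our own assumptions.

```python
import numpy as np

def eight_point(p1, p2):
    """Estimate F from N >= 8 correspondences with p2[i] @ F @ p1[i] == 0.

    p1, p2: (N, 3) arrays of homogeneous image points (u, v, 1).
    """
    # One row of W per correspondence: kron(p', p) = [u'u, u'v, u', v'u, v'v, v', u, v, 1],
    # which multiplies the row-major entries of F.
    W = np.array([np.kron(q2, q1) for q1, q2 in zip(p1, p2)])
    # Minimize ||W f|| subject to ||f|| = 1: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(W)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt


def skew(v):
    """Hypothetical helper: 3x3 matrix [v]_x with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


# Made-up noise-free data in canonical coordinates (K = I); camera 2 is [R | t].
rng = np.random.default_rng(0)
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.1, 0.2])
F_true = skew(t) @ R

P = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(12, 3))  # points in front of both cameras
p1 = P / P[:, 2:3]
P2 = P @ R.T + t
p2 = P2 / P2[:, 2:3]

F = eight_point(p1, p2)
residuals = np.einsum("ij,jk,ik->i", p2, F, p1)   # p2_i^T F p1_i for every i
```

The demo deliberately uses canonical coordinates; with raw pixel coordinates the same code is exposed to the conditioning problem described below.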
Problems with the 8-Point Algorithm
The Normalized Eight-Point Algorithm
Example of normalization
The Normalized Eight-Point Algorithm
Course notes 3 give a more detailed explanation:
The main problem of the standard Eight-Point Algorithm stems from the fact that $W$ is ill-conditioned for SVD. For SVD to work properly, $W$ should have one singular value equal to (or near) zero, with the other singular values being nonzero. However, the correspondences $p_i = (u_i, v_i, 1)$ will often have extremely large values in the first and second coordinates due to the pixel range of a modern camera (e.g. $p_i = (1832, 1023, 1)$). If the image points used to construct $W$ lie in a relatively small region of the image, then the vectors $p_i$ (and likewise $p'_i$) will generally be very similar. Consequently, the constructed $W$ matrix will have one very large singular value, with the rest relatively small.
To solve this problem, we normalize the points in each image before constructing $W$. That is, we pre-condition $W$ by applying a translation and a scaling to the image coordinates.
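A minimal sketch of the normalized variant in Python with NumPy. It assumes one common normalization: translate each image's points so their centroid is at the origin, then scale them so that the mean squared distance from the origin is 2 (HZ's normalized algorithm instead scales so that the average distance is $\sqrt{2}$). With $q_i = Tp_i$ and $q'_i = T'p'_i$, the eight-point algorithm is run on the normalized points to get $F_q$, and the result is mapped back with $F = T'^T F_q T$, since $q'^T F_q q = p'^T T'^T F_q T p$. The function names, intrinsics and pose are hypothetical.

```python
import numpy as np

def normalizing_transform(pts):
    """3x3 transform T: centroid -> origin, mean squared distance from origin -> 2 (assumed)."""
    uv = pts[:, :2] / pts[:, 2:3]
    centroid = uv.mean(axis=0)
    msd = np.mean(np.sum((uv - centroid) ** 2, axis=1))
    scale = np.sqrt(2.0 / msd)
    return np.array([[scale, 0.0, -scale * centroid[0]],
                     [0.0, scale, -scale * centroid[1]],
                     [0.0, 0.0, 1.0]])


def normalized_eight_point(p1, p2):
    """Estimate F with p2[i] @ F @ p1[i] == 0 from (N, 3) homogeneous pixel points."""
    T1, T2 = normalizing_transform(p1), normalizing_transform(p2)
    q1 = p1 @ T1.T                      # q_i = T1 p_i, written for row vectors
    q2 = p2 @ T2.T
    # Plain eight-point algorithm on the normalized points.
    W = np.array([np.kron(b, a) for a, b in zip(q1, q2)])
    _, _, Vt = np.linalg.svd(W)
    Fq = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(Fq)
    S[2] = 0.0                          # enforce rank 2
    Fq = U @ np.diag(S) @ Vt
    # Undo the normalization: q2^T Fq q1 = p2^T (T2^T Fq T1) p1.
    return T2.T @ Fq @ T1


def skew(v):
    """Hypothetical helper: 3x3 matrix [v]_x with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


# Made-up noise-free data in pixel coordinates (values in the hundreds).
rng = np.random.default_rng(1)
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.15), np.sin(0.15)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.0, 0.1])
K_inv = np.linalg.inv(K)
F_true = K_inv.T @ skew(t) @ R @ K_inv

P = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
p1 = P @ K.T
p1 = p1 / p1[:, 2:3]
p2 = (P @ R.T + t) @ K.T
p2 = p2 / p2[:, 2:3]

F = normalized_eight_point(p1, p2)
```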