CV / Pinhole Camera [Java]

This practical work illustrates the pinhole camera model and proposes the implementation of a camera calibration tool based on the OpenCV library, used from Java through JavaCV.

The pinhole camera model

The pinhole camera model describes a camera as a single point (the optical center) through which a 3D scene is projected onto a 2D plane by a perspective transformation. The model relies on two sets of parameters:

Intrinsic parameters, which are related to the camera itself
Extrinsic parameters, which do not depend on the camera's own characteristics

Determining these two sets of parameters is called camera calibration.

Intrinsic parameters

Intrinsic parameters represent the camera's internal settings, such as the focal length $f$, the coordinates of the principal point in the image, and the distortion parameters.

Camera matrix

In OpenCV, the focal lengths and the principal point are represented by a $3\times{}3$ camera matrix:

$$K\ =\ \begin{bmatrix}f_{x} & 0 & c_{x}\\ 0 & f_{y} & c_{y}\\ 0 & 0 & 1 \end{bmatrix}$$

where:

$f_{x}$, $f_{y}$ are the focal lengths along the $X$ and $Y$ axes (most of the time $f_{x}\ =\ f_{y}$)
$(c_{x}, c_{y})$ are the coordinates of the principal point in the image coordinate frame
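To make the layout of $K$ concrete, here is a minimal sketch in plain Java that stores the matrix as a row-major array and applies it to a point. The class and method names are hypothetical and the numeric values are arbitrary; this is not the OpenCV or JavaCV API.

```java
// Illustrative sketch only: CameraMatrixSketch is a hypothetical helper,
// not a class from OpenCV or JavaCV.
final class CameraMatrixSketch {

    /** Builds the 3x3 camera matrix K as a row-major array. */
    static double[][] cameraMatrix(double fx, double fy, double cx, double cy) {
        return new double[][] {
            {fx,  0.0, cx},
            {0.0, fy,  cy},
            {0.0, 0.0, 1.0}
        };
    }

    /** Multiplies a 3x3 matrix by a 3-element column vector. */
    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[3];
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                out[row] += m[row][col] * v[col];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed values: focal lengths of 1200 px, principal point at (960, 720).
        double[][] k = cameraMatrix(1200.0, 1200.0, 960.0, 720.0);
        // A point on the optical axis, (0, 0, 1), lands on the principal point.
        double[] p = multiply(k, new double[] {0.0, 0.0, 1.0});
        System.out.println(p[0] + ", " + p[1] + ", " + p[2]);
    }
}
```

The point on the optical axis maps to $(c_{x}, c_{y})$, which is one way to read the last column of $K$.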

Most computer vision libraries that work with digital images express focal lengths in pixels rather than in millimeters (mm). A metric focal length can be converted to focal lengths in pixels as follows:

$$\begin{cases}f_{x}\ =\ \dfrac{f\times{}i_{w}}{s_{w}} \\ \\ f_{y}\ =\ \dfrac{f\times{}i_{h}}{s_{h}} \end{cases}$$

where:

$f$ is the metric focal length, expressed in millimeters (mm)
$i_{w}$, $i_{h}$ are the image width and the image height respectively, expressed in pixels (px)
$s_{w}$, $s_{h}$ are the camera sensor width and the camera sensor height respectively, expressed in millimeters (mm)

The ratios $p_{w} = \dfrac{s_{w}}{i_{w}}$ and $p_{h} = \dfrac{s_{h}}{i_{h}}$ represent the width and the height of an image pixel on the camera sensor, in millimeters. With these definitions, the focal length relations can be written as:

$$\begin{cases}f_{x}\ =\ \dfrac{f}{p_{w}} \\ \\ f_{y}\ =\ \dfrac{f}{p_{h}} \end{cases}$$

Note that for a camera with square pixels, $p_{w} = p_{h} = p$, and therefore:

$$f_{x}\ =\ f_{y}\ =\ \dfrac{f}{p}$$
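The conversion from millimeters to pixels can be sketched in a few lines of Java. The helper below is hypothetical (it is not part of OpenCV or JavaCV), and the lens and sensor figures are assumed values chosen only for illustration.

```java
// Illustrative sketch only: FocalLengthConverter is a hypothetical helper.
final class FocalLengthConverter {

    /**
     * Converts a metric focal length to pixels along one axis:
     * f_px = f_mm * imageSizePx / sensorSizeMm.
     */
    static double toPixels(double focalMm, double imageSizePx, double sensorSizeMm) {
        if (sensorSizeMm <= 0.0) {
            throw new IllegalArgumentException("sensor size must be positive");
        }
        return focalMm * imageSizePx / sensorSizeMm;
    }

    /** Size of one pixel on the sensor, in millimeters, along one axis. */
    static double pixelPitchMm(double sensorSizeMm, double imageSizePx) {
        if (imageSizePx <= 0.0) {
            throw new IllegalArgumentException("image size must be positive");
        }
        return sensorSizeMm / imageSizePx;
    }

    public static void main(String[] args) {
        // Assumed example: 4 mm lens, 6.4 mm x 4.8 mm sensor, 1920 x 1440 px image.
        double fx = toPixels(4.0, 1920.0, 6.4);
        double fy = toPixels(4.0, 1440.0, 4.8);
        System.out.println("fx = " + fx + " px, fy = " + fy + " px");
        System.out.println("pixel width  = " + pixelPitchMm(6.4, 1920.0) + " mm");
        System.out.println("pixel height = " + pixelPitchMm(4.8, 1440.0) + " mm");
    }
}
```

With these assumed values the pixels are square, so both focal lengths come out at about 1200 px, up to floating-point rounding.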

Camera distortion

The distortion is represented as a row vector that contains the distortion coefficients. Depending on the distortion model, the vector takes one of the following forms:

$D\ =\ \begin{bmatrix}k_{1} & k_{2} & p_{1} & p_{2} \end{bmatrix}$
$D\ =\ \begin{bmatrix}k_{1} & k_{2} & p_{1} & p_{2} & k_{3} \end{bmatrix}$ (the most common)
$D\ =\ \begin{bmatrix}k_{1} & k_{2} & p_{1} & p_{2} & k_{3} & k_{4} & k_{5} & k_{6}\end{bmatrix}$
$D\ =\ \begin{bmatrix}k_{1} & k_{2} & p_{1} & p_{2} & k_{3} & k_{4} & k_{5} & k_{6} & s_{1} & s_{2} & s_{3} & s_{4} \end{bmatrix}$
$D\ =\ \begin{bmatrix}k_{1} & k_{2} & p_{1} & p_{2} & k_{3} & k_{4} & k_{5} & k_{6} & s_{1} & s_{2} & s_{3} & s_{4} & \tau_{x} & \tau_{y} \end{bmatrix}$

where:

$k_{1},\ldots{},\ k_{6}$ are the radial distortion coefficients
$p_{1},\ p_{2}$ are the tangential distortion coefficients
$s_{1},\ s_{2},\ s_{3},\ s_{4}$ are the thin prism distortion coefficients
$\tau_{x}, \tau_{y}$ are the tilt (Scheimpflug) distortion coefficients
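Since every shorter form is a prefix of the 14-element one, a short vector can be read as a long one whose missing coefficients are zero. The sketch below illustrates that ordering in plain Java; the class, its methods, and the sample coefficient values are hypothetical and are not part of OpenCV or JavaCV.

```java
import java.util.Arrays;

// Illustrative sketch only: DistortionLayout is a hypothetical helper.
final class DistortionLayout {

    /** Coefficient names in the order used by the 14-element vector. */
    private static final String[] NAMES = {
        "k1", "k2", "p1", "p2", "k3", "k4", "k5", "k6",
        "s1", "s2", "s3", "s4", "tauX", "tauY"
    };

    /** True if n is one of the vector lengths listed above (4, 5, 8, 12 or 14). */
    static boolean isSupportedLength(int n) {
        return n == 4 || n == 5 || n == 8 || n == 12 || n == 14;
    }

    /** Returns a 14-element copy of d, with the missing trailing coefficients set to zero. */
    static double[] expand(double[] d) {
        if (!isSupportedLength(d.length)) {
            throw new IllegalArgumentException("unsupported length: " + d.length);
        }
        return Arrays.copyOf(d, NAMES.length);
    }

    /** Name of the coefficient stored at the given index. */
    static String nameOf(int index) {
        return NAMES[index];
    }

    public static void main(String[] args) {
        // Assumed sample values for the common 5-coefficient form.
        double[] full = expand(new double[] {0.1, -0.2, 0.001, 0.002, 0.05});
        for (int i = 0; i < full.length; i++) {
            System.out.println(nameOf(i) + " = " + full[i]);
        }
    }
}
```

Note the position of $k_{3}$: it comes after the two tangential coefficients, not directly after $k_{2}$.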

Projecting a point $(x,\ y,\ z)$ from the 3D scene into a point $(u,\ v)$ on an image is then obtained by:

1. Computation of $(x_{p}^{h}, y_{p}^{h}, z_{p}^{h}, w)$ as the projection of the $(x,\ y,\ z)$ point onto the plane located at $z\ =\ 1$

$$\begin{bmatrix} x_{p}^{h} \\ y_{p}^{h} \\ z_{p}^{h} \\ w \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \\ z \end{bmatrix}$$

2. Passing from homogeneous to Euclidean coordinates

$$\begin{bmatrix} x_{p} \\ y_{p} \\ z_{p} \end{bmatrix} = \begin{bmatrix} x_{p}^{h}/w \\ y_{p}^{h}/w \\ z_{p}^{h}/w \end{bmatrix}=\begin{bmatrix} x_{p}^{h}/z \\ y_{p}^{h}/z \\ z_{p}^{h}/z \end{bmatrix}=\begin{bmatrix} x/z \\ y/z \\ 1 \end{bmatrix}$$

3. Distortion computation (only affects the 2D $x$ and $y$ coordinates)

$$\begin{bmatrix} x_{d} \\ y_{d} \\ z_{d} \end{bmatrix} = \begin{bmatrix} x_{p} \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2 p_1 x_{p} y_{p} + p_2(r^2 + 2 x_{p}^2) + s_1 r^2 + s_2 r^4 \\ y_{p} \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y_{p}^2) + 2 p_2 x_{p} y_{p} + s_3 r^2 + s_4 r^4 \\ 1 \end{bmatrix},\ r^{2}\ =\ x_{p}^{2}+y_{p}^{2},\ z_{d}\ =\ z_{p}\ =\ 1$$

4. Changing to the image coordinate frame

$$\begin{bmatrix} u^{h} \\ v^{h} \\ t \end{bmatrix} = K\begin{bmatrix} x_{d} \\ y_{d} \\ 1 \end{bmatrix}=\begin{bmatrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_{d} \\ y_{d} \\ 1 \end{bmatrix}=\begin{bmatrix} f_x x_{d} + c_x \\ f_y y_{d} + c_y \\ 1 \end{bmatrix}$$

5. Passing from homogeneous to Euclidean coordinates

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u^{h}/t \\ v^{h}/t \end{bmatrix}=\begin{bmatrix} f_x x_{d} + c_x \\ f_y y_{d} + c_y \end{bmatrix}$$
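The five steps above can be sketched in plain Java. This is a minimal illustration, not OpenCV's implementation: the class and method names are hypothetical, and it assumes the common 5-coefficient model $(k_{1}, k_{2}, p_{1}, p_{2}, k_{3})$, so $k_{4}$ to $k_{6}$, the thin prism terms and the tilt terms are taken as zero.

```java
// Illustrative sketch only: PinholeProjection is a hypothetical helper,
// restricted to the 5-coefficient distortion model (an assumption).
final class PinholeProjection {

    /**
     * Projects the 3D point (x, y, z), given in the camera frame, to pixel
     * coordinates (u, v). d holds {k1, k2, p1, p2, k3}.
     */
    static double[] project(double x, double y, double z,
                            double fx, double fy, double cx, double cy,
                            double[] d) {
        if (d.length != 5) {
            throw new IllegalArgumentException("expected 5 distortion coefficients");
        }
        if (z <= 0.0) {
            throw new IllegalArgumentException("point must be in front of the camera (z > 0)");
        }
        // Steps 1 and 2: perspective division onto the plane z = 1.
        double xp = x / z;
        double yp = y / z;

        // Step 3: radial and tangential distortion; r2 is the squared radius.
        double r2 = xp * xp + yp * yp;
        double r4 = r2 * r2;
        double r6 = r4 * r2;
        double k1 = d[0], k2 = d[1], p1 = d[2], p2 = d[3], k3 = d[4];
        double radial = 1.0 + k1 * r2 + k2 * r4 + k3 * r6;
        double xd = xp * radial + 2.0 * p1 * xp * yp + p2 * (r2 + 2.0 * xp * xp);
        double yd = yp * radial + p1 * (r2 + 2.0 * yp * yp) + 2.0 * p2 * xp * yp;

        // Steps 4 and 5: apply K; the homogeneous scale t is 1, so no division is needed.
        return new double[] {fx * xd + cx, fy * yd + cy};
    }

    public static void main(String[] args) {
        // Assumed camera: fx = fy = 1200 px, principal point (960, 720), no distortion.
        double[] noDistortion = {0.0, 0.0, 0.0, 0.0, 0.0};
        double[] uv = project(1.0, 2.0, 4.0, 1200.0, 1200.0, 960.0, 720.0, noDistortion);
        System.out.println("u = " + uv[0] + ", v = " + uv[1]);
    }
}
```

With all coefficients at zero the distortion step is the identity, so the example point $(1, 2, 4)$ is first reduced to $(0.25, 0.5)$ on the plane $z = 1$ and then mapped to $u = 1200 \times 0.25 + 960 = 1260$ and $v = 1200 \times 0.5 + 720 = 1320$.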

Extrinsic parameters

