Nerf Source Code Explain

Basic Conception

Neual Networks

CNN, namely Convolutional(畳み込み) Neural Networks
for example, if you want to recognize an image “X”, you can change the image into matrix, then you can make three matrices $$\begin{bmatrix}-1 & -1 & 1\\ -1 & 1 & -1\\ 1 & -1 & -1 \end{bmatrix}$$ $$\begin{bmatrix}1 & -1 & 1\\ -1 & 1 & -1\\ 1 & -1 & -1 \end{bmatrix}$$ $$\begin{bmatrix}1 & -1 & -1\\ -1 & 1 & -1\\ -1 & -1 & 1 \end{bmatrix}$$

which have the specific feature of “X”, left-above, center, and right above.

Sperical harmonics

It is a type of function expansion in 3D space. It is a basis fuction on a spherical surface, similar to how Fourier series perform an orthogonal expansion on a 1D interval.

spherical harmonics can indeed be seen as a 3D basis function expansion:

Polynomial expansion: Uses $x^3, x^2, x^1, x^0$ to construct polynomial basis functions.
Fourier expansion: Uses sine and cosine functions as basis functions.
Spherical harmonic expansion: Uses $Y_l^m(θ,ϕ)$ as basis functions for the surface of a sphere.

Hierarchical Sampling

Coarse(粗い) Network and Fine(細い) Net. Coarse sample is uniform(一様) sampling, which cannot sapling points of edge very clearly and will lead to issues of wasted sampling points in empty spaces ineffectively. So after coarse sampling, there is a fine samling, which sample more points in rapidly changed place, like edge.

Representing a 3D space with a continuous field

Continuous Field: This is a mathematical concept representing a mapping function where every point $(x,y,z)$ is associated with a property or value.

Camera

The position and orientation of the camera are determined by its extrinsic matrix, while the projection properties are determined by its intrinsic matrix.

extrinsic matrix is also called w2c(world-to-camera) matrix.

the inverse matrix(逆行列) of w2c is c2w matrix, which is widely used in Nerf.

the function of c2w is transform points from camera axis to world axis.

$$\begin{bmatrix} R & T\\ 0 & 1\end{bmatrix} = \begin{bmatrix} r11 & r12 & r13 & t1 \\ r21 & r22 & r23 & t2 \\ r31 & r32 & r33 & t3 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

the intrinsic matrix of camera K is below:

$$\begin{bmatrix}f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1 \end{bmatrix}$$

$f_x$ and $f_y$ represent the vertical focal length and horizontal focal length

How to Get Camera parameters

In Nerf, Intrinsic and Extrinsic parameters should be given before you start, but how can we get these parameters?

Well, there are two ways, the first way is using data from rendering software like blender.
The second way is using real data from our own real camera.

there are some tech help us, such as COLMAP: ColMap Link, which is a sfm algorithm.

3DGS use colmap to initialize a point cloud