Last summer, a non-deep-learning method for novel view synthesis entered the game: 3D Gaussian Splatting. It is a method to represent a scene in 3D and to render images of it in real time from any viewing direction. Some even say it is replacing NeRFs, the predominant method for novel view synthesis and implicit scene representation at the time. I think that is debatable, since NeRFs are much more than image renderers. But that is not what we care about today… Today we only care about crisp-looking 3D models, and that is where 3D Gaussian Splatting shines.
In this post we will very briefly look into Gaussian Splatting, and then I’ll switch gears and show you how you can turn yourself into a 3D model.
Bonus: At the end I’ll show you how you can then embed your model in an interactive viewer on any website.
So, let’s go!
- What are Gaussian Splats?
- Let’s Turn Ourselves into a 3D Gaussian Splatting
- Conclusion and Further Resources
3D Gaussian Splatting is a technique to represent a scene in 3D. It is actually just one of many ways to do so: for example, you could also represent a scene as a set of points, a mesh, voxels, or an implicit representation like Neural Radiance Fields (aka NeRFs).
The foundation of 3D Gaussian Splatting has been around for quite some time, dating back to 2001 and a classical approach from computer graphics called surface splatting.
But how does 3D Gaussian Splatting actually represent a scene?
3D Representation
In 3D Gaussian Splatting a scene is represented by a set of points. Each point has certain attributes associated with it that parameterize an anisotropic 3D Gaussian. When an image is rendered, these Gaussians overlap to form the image. The actual parameterization takes place during the optimization phase, which fits the parameters such that rendered images are as close as possible to the original input images.
A 3D Gaussian is parameterized with
- its mean µ, which is the x,y,z coordinate in 3D space.
- its covariance matrix Σ, which can be interpreted as the spread of the Gaussian. Since the Gaussian is anisotropic, it can be stretched differently along each direction.
- a color, usually represented with spherical harmonics. Spherical harmonics allow the Gaussian splats to take on different colors when viewed from different directions, which drastically improves the quality of the renders: it allows rendering non-Lambertian effects like the specular highlights of metallic objects.
- an opacity 𝛼 that determines how transparent the Gaussian is.
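To make that parameterization concrete, here is a rough sketch of what a single splat carries. This is my own illustration, not the reference implementation’s data structure (which stores a scale vector and a rotation quaternion and derives the covariance from them):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    mean: np.ndarray        # (3,)   x, y, z position in world space
    covariance: np.ndarray  # (3, 3) symmetric matrix, controls the anisotropic spread
    sh_coeffs: np.ndarray   # (3, 16) spherical harmonics coefficients per RGB channel (degree 3)
    opacity: float          # alpha in [0, 1]

# a single, roughly spherical, slightly transparent splat
splat = GaussianSplat(
    mean=np.array([0.0, 1.5, -2.0]),
    covariance=0.01 * np.eye(3),
    sh_coeffs=np.zeros((3, 16)),
    opacity=0.8,
)
```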
The image below shows the influence of a 3D Gaussian splat with respect to a point p. Spoiler: that point p is the one that matters when we render the image.
How do you get an image out of this representation?
Image Rendering
Like NeRFs, 3D Gaussian Splatting uses 𝛼-blending along a ray that is cast from the camera through the image plane and into the scene. This basically means that, through integration along the ray, all intersected Gaussians contribute to the final pixel’s color.
The image below shows the conceptual difference between the most basic NeRF (for simplicity) and Gaussian Splatting.
While conceptually similar, there is a large difference in the implementation. In Gaussian Splatting we don’t have a deep learning model like the multi-layer perceptron (MLP) in NeRFs. Hence we don’t need to evaluate the implicit function approximated by the MLP for each point (which is relatively time consuming); instead we overlap many partially transparent Gaussians of different size and color. We still need to cast at least one ray per pixel of the image to render the final image.
So basically, through the blending of all those Gaussians, the illusion of a perfect image emerges. If you remove the transparency from the splats, you can actually see the individual Gaussians of varying size and orientation.
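To make the blending concrete, here is a minimal sketch of front-to-back compositing for a single pixel, assuming the per-Gaussian colors and opacities at that pixel have already been evaluated and sorted by depth (the real implementation does this in a tile-based CUDA rasterizer):

```python
import numpy as np

def composite_pixel(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Front-to-back alpha blending of depth-sorted Gaussian contributions.

    colors: (N, 3) RGB of each Gaussian evaluated at this pixel
    alphas: (N,)   opacity of each Gaussian at this pixel, in [0, 1]
    """
    pixel = np.zeros(3)
    transmittance = 1.0  # how much light still passes through to this point
    for color, alpha in zip(colors, alphas):
        pixel += transmittance * alpha * color
        transmittance *= (1.0 - alpha)
        if transmittance < 1e-4:  # early stop: everything behind is fully occluded
            break
    return pixel
```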
And how is it optimized?
Optimization
The optimization is theoretically straightforward and easy to understand. But of course, as always, the success lies in the details.
To optimize the Gaussian splats, we need an initial set of points and images of the scene. The authors of the paper suggest using structure from motion (SfM) to obtain the initial point cloud. During training, the scene is rendered with the estimated camera poses and camera intrinsics obtained from SfM. The rendered image and the original image are compared, a loss is calculated, and the parameters of each Gaussian are optimized with stochastic gradient descent (SGD).
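To make the render-compare-update cycle tangible, here is a self-contained toy in PyTorch that fits a handful of isotropic 2D Gaussians to a dummy target image with gradient descent. It is only an illustration of the idea: the actual method uses anisotropic 3D Gaussians, a differentiable CUDA rasterizer with proper alpha compositing, and a loss that combines L1 with a D-SSIM term.

```python
import torch

# Toy example: fit N isotropic 2D Gaussians to a dummy target image by gradient descent.
H, W, N = 32, 32, 16
ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing="ij")
target = (xs / W).unsqueeze(-1).repeat(1, 1, 3)   # stand-in for a real input image

means   = (torch.rand(N, 2) * torch.tensor([H, W], dtype=torch.float32)).requires_grad_()
scales  = torch.full((N,), 4.0, requires_grad=True)   # isotropic std. dev. in pixels
colors  = torch.rand(N, 3, requires_grad=True)
opacity = torch.zeros(N, requires_grad=True)          # passed through a sigmoid below

opt = torch.optim.SGD([means, scales, colors, opacity], lr=0.05)
for step in range(500):
    # evaluate every Gaussian at every pixel: (H, W, N)
    d2 = (ys[..., None] - means[:, 0]) ** 2 + (xs[..., None] - means[:, 1]) ** 2
    w = torch.sigmoid(opacity) * torch.exp(-d2 / (2 * scales ** 2))
    # naive normalized blend of the per-Gaussian colors: (H, W, 3)
    rendered = (w[..., None] * colors).sum(2) / (w.sum(2, keepdim=True) + 1e-6)
    loss = (rendered - target).abs().mean()   # plain L1; the paper adds a D-SSIM term
    opt.zero_grad()
    loss.backward()
    opt.step()
```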
One important detail worth mentioning is the adaptive densification scheme. SGD is only capable of adjusting the parameters of existing Gaussians; it cannot spawn new ones or destroy existing ones. This can lead to holes in the scene or a lack of fine-grained detail if there are too few points, and to unnecessarily large point clouds if there are too many. To overcome this, the adaptive densification scheme splits points with large gradients and removes points whose opacity has converged to a very low value.
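Here is a simplified sketch of that densify-and-prune bookkeeping on flat parameter arrays, run every few hundred iterations. The thresholds and the naive “duplicate with jitter” split are placeholders; the actual method distinguishes cloning from splitting and also adjusts the scales:

```python
import numpy as np

def adaptive_density_control(means, opacities, pos_grads,
                             grad_threshold=0.0002, min_opacity=0.005):
    """Simplified densify-and-prune sketch on flat parameter arrays.

    means:     (N, 3) Gaussian centers
    opacities: (N,)   current opacities
    pos_grads: (N,)   magnitude of the accumulated positional gradients
    """
    # prune: drop Gaussians that have become (nearly) fully transparent
    keep = opacities >= min_opacity
    means, opacities, pos_grads = means[keep], opacities[keep], pos_grads[keep]

    # densify: regions with large positional gradients are badly reconstructed
    dense = pos_grads > grad_threshold
    jitter = np.random.normal(scale=0.01, size=means[dense].shape)
    means = np.concatenate([means, means[dense] + jitter])
    opacities = np.concatenate([opacities, opacities[dense]])
    return means, opacities
```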
Having covered some theoretical basics, let’s now switch gears and jump into the practical part of this post, where I show you how to create a 3D Gaussian Splatting of yourself.
Note: The authors suggest using a GPU with at least 24 GB of VRAM, but you can still create your 3D Gaussian splats using some tricks I’ll mention when they need to be applied. I have an RTX 2060 mobile with 6 GB.
These are the steps we will cover:
- Installation
- Capture a Video
- Obtain point cloud and camera poses
- Run the Gaussian Splatting Algo
- Post processing
- (Bonus) Embed your model on a website in an interactive viewer
Installation
For the installation you can either jump over to the official 3D Gaussian Splatting repository and follow their instructions, or head over to The NeRF Guru on YouTube, who does an excellent job of showing how to install everything you need. I recommend the latter.
I personally chose to install COLMAP on Windows because I was not able to build COLMAP from source with GPU support in my WSL environment, and for Windows there is a pre-built installer. The optimization for the 3D Gaussian Splatting was done on Linux. But it does not really matter: the commands I show you are the same on Windows and Linux.
Capture a Video
Ask someone to capture a video of you. You must stand as still as possible while the other person walks around you, trying to capture you from every angle.
Some Hints:
- Choose a pose where it is easy for you not to move. E.g. holding your hands up for 1 minute without moving is not that easy.
- Choose a high framerate for capturing the video to reduce motion blur. E.g. 60fps.
- If you have a small GPU, don’t film in 4K, otherwise the optimizer is likely to crash with an out-of-memory exception.
- Ensure there is sufficient light, so your recording is crisp and clear.
- If you have a small GPU, prefer indoor scenes over outdoor scenes. Outdoor scenes have a lot of “high frequency” content, i.e. small things close to each other like grass and leaves, which leads to many Gaussians being spawned during the adaptive densification.
Once you have recorded your video, move it to your computer and extract single frames using ffmpeg.
ffmpeg -i <PATH_VIDEO> -qscale:v 1 -qmin 1 -vf fps=<FRAMES_PER_SEC> <PATH_OUTPUT>/%04d.jpg
This command takes the video and converts it into high-quality JPG images with low compression (only JPG works). I usually use between 4 and 10 frames per second. The output files will be named with an incrementing four-digit number.
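For example, with a hypothetical recording me.mp4 and 5 frames per second, extracting straight into the input folder used in the later steps (the folder has to exist already) could look like this:

ffmpeg -i me.mp4 -qscale:v 1 -qmin 1 -vf fps=5 ./gaussian-splatting/data/me/input/%04d.jpg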
You should then end up with a folder full of single frame images like so:
Some hints for better quality:
- Remove blurry images; otherwise they lead to a haze around you and spawn “floaters”.
- Remove images where your eyes are closed; otherwise they lead to blurry eyes in the final model.
Obtain Point Cloud and Camera Poses
As mentioned earlier, the Gaussian Splatting algorithm needs to be initialized. One way is to initialize each Gaussian’s mean with the location of a point in 3D space. We can use the tool COLMAP, which implements structure from motion (SfM), to obtain a sparse point cloud from images only. Luckily, the authors of the 3D Gaussian Splatting paper provide code to simplify the process.
So head over to the Gaussian Splatting repo you cloned, activate your environment and call the convert.py script.
python .\convert.py -s <ROOT_PATH_OF_DATA> --resize
The root path to your data is the directory that contains the “input” folder with all the input images. In my case I created a subfolder within the repo: ./gaussian-splatting/data/<NAME_OF_MODEL>. The argument --resize will output additional images downsampled by factors of 2, 4, and 8. This is important in case you run out of memory with high-resolution images, so you can simply switch to a lower resolution.
Note: I had to set the environment variable CUDA_VISIBLE_DEVICES=0 for the GPU to be used by COLMAP.
Depending on the number of images you have, this process might take a while, so either grab a cup of coffee or stare at the progress like I sometimes do, wasting a lot of time.
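After convert.py has finished, your data folder should look roughly like this (the exact contents may differ slightly between versions of the repo):

data/<NAME_OF_MODEL>/
  input/      (your extracted frames)
  images/     (undistorted full-resolution images)
  images_2/   (downsampled by 2, from --resize)
  images_4/
  images_8/
  sparse/0/   (the COLMAP reconstruction: cameras, images, points3D)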
You can then type colmap gui into your command line and inspect the sparse point cloud.
To open the point cloud, click on “File > Import model”, navigate to <ROOT_PATH_DATA>/sparse/0 and open that folder.
The red objects are the cameras the SfM algorithm estimated from the input frames. They represent the position and orientation of the camera at the moment each frame was captured. SfM further provides the intrinsic camera calibration, which is important for the 3D Gaussian Splatting algorithm so the Gaussians can be rendered into a 2D image during optimization.
Run the Gaussian Splatting Optimizer
Everything up until now has been preparation for the actual 3D Gaussian splatting algorithm.
The script to train the 3D Gaussian splat is train.py. I usually like to wrap these Python scripts in a shell script so I can add comments and easily change the parameters of a run. Here is what I use:
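A minimal sketch of such a wrapper; the paths and the run name are placeholders, and the flags correspond to train.py’s command-line arguments:

```bash
#!/bin/bash
# Train the 3D Gaussian splat; adjust the placeholder paths to your setup.
DATA="./data/<NAME_OF_MODEL>"     # folder containing input/, images/ and sparse/
OUT="./output/<RUN_NAME>"         # where checkpoints and point clouds are written

python train.py \
    -s "$DATA" \
    -m "$OUT" \
    --data_device cpu             # keep the training images in CPU RAM instead of VRAM
```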
Except for data_device=cpu, all arguments are set to their defaults. If you run into memory issues, you can try tweaking the following arguments (a combined example follows after the list):
- resolution: the downsampling factor of the image resolution; 1 means full resolution and 2 means half resolution. Since we used --resize for convert.py during the sparse point cloud generation, you can test with 1, 2, 4, and 8. Before lowering the resolution, I recommend trying to lower sh_degree first.
- sh_degree: sets the maximum degree of the spherical harmonics, with 3 being the maximum. Lowering this value has a large impact on the memory footprint. Remember that the spherical harmonics control the view-dependent color rendering. In my experience, sh_degree=1 usually still looks good.
- densify_*_iter: controls the span of iterations during which adaptive densification is performed. Tweaking these arguments might result in fewer points being spawned and hence a lower memory footprint, but note that this can have a big impact on quality.
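Putting those memory-saving options together, a run for a small GPU could look roughly like this (the exact values are just a starting point):

python train.py -s ./data/<NAME_OF_MODEL> -m ./output/<RUN_NAME> --data_device cpu --resolution 2 --sh_degree 1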
If everything turns out well, you hopefully end up with a scene as shown below. In the next section we jump into the visualization and postprocessing.
You can actually see quite nicely the Gaussian shape of individual splats in low-density regions.
Post Processing
Even though the Gaussian splatting repo comes with its own visualizer, I prefer to use Super Splat since it is much more intuitive and you can directly edit your scene.
So to get started, head over to the Super Splat editor and open your PLY file, located under ./output/<RUN_NAME>/point_cloud/iteration_xxxx.
I usually start by removing most of the background points using a sphere, as indicated below.