Neural Rendering

  • SIGGRAPH 2021 Course: Advances in Neural Rendering link

Introduction

  • Michael Zollhoefer from Facebook Reality Labs Research
  • Two alternatives for realistic image synthesis
    • Photo-realistic rendering: Lots of manual work + Full control of scene parameters
    • Generative ML: Lots of data + Automatic training + Interactive inference/rendering
  • Motivation: Creating photorealistic assets is challenging using classical CG techniques
  • Neural Rendering?: 3 Components
    • Generative networks that synthesize raw pixel output
    • Controllable by interpretable parameters or by video/audio input
    • Illumination, camera, pose, geometry, appearance, or semantic structure controllable
  • Why neural rendering?:
    • Can we learn (part of the) scene representation and/or (part of the) CG function?
    • img
  • Neural Rendering Zoo
    • “Regress it” (GQN): 1D code => 2D generative network => 2D image
    • “Make it more real” (DVP, DNR): 3D mesh/points + 1D codes => CG (3D to 2D) => 2D encoder/decoder => 2D image
    • “Regress and render” (Neural Volumes): 1D code => 2D generative network => 3D mesh/points + 2D texture + 3D volume => CG (3D to 2D) => 2D image
    • “Step, sample and blend” (NeRF; super popular in the community): 3D space => Coordinates => MLP => CG (3D to 2D) => 2D image

Loss functions for neural rendering

  • Jun-Yan Zhu, CMU CS
  • Problem statement?
    • img
    • argmin_G L(G(x_input), y_output)
    • So, what is a good objective loss function L?
  • Designing loss function?
    • L2 regression: averages over plausible outputs, which minimizes L2 distance but blurs local image structure
    • Classification loss: cross-entropy objective with a colorfulness term
    • Feature/perceptual loss: deep feature-space matching objective (see the sketch after this list)
      • img
      • img
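
A minimal sketch of a feature/perceptual loss, assuming a PyTorch setup with a recent torchvision and a pretrained VGG16 as the feature extractor (the cut-off layer is an illustrative choice, not prescribed by the talk):

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(nn.Module):
    """Compare images in a pretrained deep feature space instead of pixel space."""
    def __init__(self):
        super().__init__()
        # Features up to relu3_3; which layer(s) to match is a design choice.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16]
        self.vgg = vgg.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)  # the loss network stays frozen

    def forward(self, pred, target):
        # Both inputs: (N, 3, H, W) images normalized like ImageNet inputs.
        return F.mse_loss(self.vgg(pred), self.vgg(target))
```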
  • Loss function in Generative Adversarial Networks?
    • Distinguish whether an image is real or fake
    • img
    • Using human annotation is expensive, so replace the human judge with a classifier (the discriminator)!
    • img
    • img
    • Check pix2pix, edge2cat
    • img
    • What can pix2pix do?
      • grayscale => automatic colorization
      • sketch => photo
      • But it needs paired training data, which is expensive to collect.
    • Check cycle-consistent adversarial networks (CycleGAN): horse to zebra, orange to apple!
      • How to train with unpaired data (e.g., horse photos plus zebra photos with different shapes/poses)?
      • Cycle-consistency loss: horse (X) to zebra (G(x)), and zebra back to horse (F(G(x))); see the sketch after this list
      • img
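
A minimal sketch of the cycle-consistency term, assuming the two generators G: X→Y and F: Y→X are already defined (PyTorch; named F_net to avoid clashing with torch.nn.functional):

```python
import torch.nn.functional as F

def cycle_consistency_loss(G, F_net, x, y, lam=10.0):
    """F(G(x)) should recover x, and G(F(y)) should recover y.

    G maps domain X (e.g. horses) to Y (zebras); F_net maps Y back to X.
    This term is added to the usual adversarial losses on G(x) and F_net(y).
    """
    loss_x = F.l1_loss(F_net(G(x)), x)  # X -> Y -> X round trip
    loss_y = F.l1_loss(G(F_net(y)), y)  # Y -> X -> Y round trip
    return lam * (loss_x + loss_y)
```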
    • Patch-based contrastive loss using cosine similarity of small patches of X and G(X) (sketched after the summary below)
    • Summary
      • img
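
A sketch of the patch-based contrastive idea in the spirit of CUT (Park et al. 2020), assuming patch features have already been extracted at matching spatial locations; details such as multi-layer features and the projection head are omitted:

```python
import torch
import torch.nn.functional as F

def patch_contrastive_loss(feat_x, feat_gx, tau=0.07):
    """feat_x, feat_gx: (num_patches, dim) features of patches from X and G(X)
    at the same spatial locations. The co-located patch is the positive;
    every other patch of X serves as a negative."""
    feat_x = F.normalize(feat_x, dim=1)
    feat_gx = F.normalize(feat_gx, dim=1)
    logits = feat_gx @ feat_x.t() / tau           # pairwise cosine similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)        # positives lie on the diagonal
```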

Generative Adversarial Networks with 3D Control

  • Ayush Tewari, Max Planck Institute for Informatics
  • GAN?
    • img
  • Supervised training of GAN for neural rendering?
    • Add scene parameters (illumination/pose) to the input data and do supervised training
    • img
    • Training with synthetic datasets (so no need for lots of labeled real data pairs)
    • img
    • Training with supervised pairs? Use annotation tools to label latents, fit a hyperplane for each semantic property (pose/gender/expression), and move along its normal direction in the latent space (see the sketch at the end of this list).
    • img
    • img
    • Add non-linearity for high quality control!
    • img
    • Inverse graphics & 3D control: adjust the semantic property gradually!
    • img
    • img
    • img
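
A minimal sketch of hyperplane-based latent editing (InterFaceGAN-style). The generator `G`, the latent `w`, and the fitted `normal` are assumed to exist; in practice the normal comes from e.g. an SVM fit on annotated latent codes:

```python
import torch

def edit_latent(w, normal, alpha):
    """Move a latent code along the (unit) normal of a semantic hyperplane.

    w:      (1, latent_dim) latent code of the image to edit
    normal: (latent_dim,) normal of e.g. the pose or expression hyperplane
    alpha:  signed step size; larger magnitude = stronger edit
    """
    return w + alpha * normal / normal.norm()

# edited_img = G(edit_latent(w, pose_normal, alpha=2.0))  # hypothetical usage
```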
  • Unsupervised methods?
    • img
    • Training the generator for controllability
    • Projecting real images to latent vectors using optimization-based methods (see the sketch after this list)
    • img
    • Editing the projected latent vector
    • img
    • Regularization
    • Transformation
    • img
    • Learning-based methods for projection (instead of optimization): use an encoder!
    • img
    • img
    • img
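
A minimal sketch of optimization-based projection, assuming a pretrained generator `G` and a target image; real projectors (e.g. for StyleGAN) add perceptual losses, noise regularization, and often optimize in W+ space:

```python
import torch

def project(G, target, latent_dim=512, steps=500, lr=0.05):
    """Find a latent z whose rendering G(z) matches the target image."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = ((G(z) - target) ** 2).mean()  # pixel loss only, for brevity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()  # edit this latent, then re-render with G
```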
  • Challenges?
    • What can be edited?: Widen adjustable control parameters!
    • What can be projected?: Trade-off between projection quality, reconstruction quality, and realistic editing
  • 3D GANs?
    • Viewpoints can be controlled explicitly!
    • img

Neural Scene Representation and Rendering

  • Gordon Wetzstein, Stanford EE/CS, www.computationalimaging.org
  • Self-supervised scene representation learning approach
    • img
  • Modeling a 3D object with a neural network
    • img
    • img
  • Then came NeRF: Mildenhall et al., ECCV2020
    • NN is more compact than 3D voxel or mesh
    • Use SIREN (periodic sine activations) instead of ReLU (see the sketch after this list)
    • img
    • img
    • img
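
A sketch of a SIREN layer (sine activation with the initialization scheme from Sitzmann et al. 2020), in PyTorch:

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by sin(w0 * x), as used in SIREN."""
    def __init__(self, in_features, out_features, w0=30.0, is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_features, out_features)
        # SIREN's initialization keeps activations well-distributed with depth.
        with torch.no_grad():
            bound = 1.0 / in_features if is_first else (6.0 / in_features) ** 0.5 / w0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))
```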
  • Pi-GAN
    • img
    • img
  • Neural volume rendering is slow! (NeRF, Pi-GAN)
    • Works by defining a camera, shooting rays through the scene, and computing an integral along each ray (via approximate numerical integration).
    • Need a fast and efficient integration technique!
    • Instead of numerical integration, use the anti-derivative!
    • AutoInt (sketched after this list):
    • img
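
A toy sketch of the AutoInt idea: train a network Φ so that its derivative matches the integrand, then evaluate any definite integral as Φ(b) − Φ(a) with just two forward passes. The integrand here is a stand-in function, not the actual NeRF volume-rendering integrand:

```python
import torch
import torch.nn as nn

def integrand(t):                      # stand-in target function
    return torch.exp(-t) * torch.sin(5.0 * t)

phi = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)

for _ in range(2000):
    t = torch.rand(256, 1, requires_grad=True)
    # dPhi/dt via autograd; fit it to the integrand ("grad network" training)
    dphi_dt = torch.autograd.grad(phi(t).sum(), t, create_graph=True)[0]
    loss = ((dphi_dt - integrand(t)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

a, b = torch.zeros(1, 1), torch.ones(1, 1)
integral = (phi(b) - phi(a)).item()    # ≈ ∫_0^1 integrand(t) dt, no quadrature
```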
  • Neural lumigraph rendering: real-time rendering at inference time
  • Summary
    • img

Novel View Synthesis for Objects and Scenes

  • Goal: Given 2 images, generate the views in between (with arbitrary camera positions).
    • img
    • If you have only one image, use prior-based reconstruction
  • What method?
    • Voxel-based methods: DeepVoxels, Neural Volumes, HoloGAN
    • Neural implicit approaches: Scene Representation Networks, Differentiable Volumetric Rendering, NeRF, Implicit Differentiable Renderer
    • Hybrid implicit/explicit: Neural Sparse Voxel Fields, PIFu, GRF, pixelNeRF, MVSNeRF, Unconstrained Scene Generation with Locally Conditioned Radiance Fields
    • Multi-plane images
    • Image-based: Stable view synthesis, IBRNet
    • img
    • img

Neural Volumetric Rendering: NeRF, etc

  • Ben Mildenhall, Google Research (bmild.github.io)
  • What is neural volumetric rendering?
    • Rendering?: Querying the radiance value along rays through 3D space
    • Volumetric?: Continuous, differentiable, rendering model without concrete ray/surface intersections
    • Neural: Using a neural network as a scene representation, rather than a voxel grid of data
    • Inputs: sparse, unstructured photographs of a scene
    • Outputs: representation allowing us to render new views of that scene
  • Volumetric rendering math
    • Traditional methods?: rooted in optical physics => adapted for visualizing medical data and linked to alpha compositing => modern path tracers use sophisticated Monte Carlo methods to render volumetric effects
    • Volumetric rendering and ML?: various volume-rendering-style methods were devised for 3D shape reconstruction; scaled up to higher-resolution voxel grids, these ML methods can achieve excellent view synthesis results
    • Volumetric formulation for NeRF
      • Scene is a cloud of tiny colored particles
      • If a ray traveling through the scene hits a particle at t, we return its color c(t)
      • This notion is probabilistic: Chance that ray stops in a small interval around t is sigma(t)dt. Sigma is known as the “volume density”
      • To determine if t is the first hit, need to know T(t): probability that the ray didn’t hit any particles earlier. T(t) is called “transmittance”. We assume sigma is known and want to use it to calculate T
      • P[no hits before t] = T(t)
      • P[hit at t] = sigma(t)dt
      • P[no hits before t+dt] = P[no hits before t] × P[no hit at t]
      • T(t+dt) = T(t)(1 − sigma(t)dt)
      • Taking dt → 0 gives the ODE T′(t) = −sigma(t)T(t), so T(t) = exp(−∫ sigma(s) ds), integrating s from the ray origin t_0 to t
      • img
      • img
      • Weighting the integrand by the color function c(t) gives the expected color: C = ∫ T(t) sigma(t) c(t) dt
      • Approximating the nested integral? Use quadrature: split the ray into n segments with endpoints {t_1, t_2, …, t_(n+1)} and lengths delta_i = t_(i+1) − t_i.
      • Assume that volume density and color are roughly constant within each interval; this yields C ≈ Σ_i T_i (1 − exp(−sigma_i delta_i)) c_i with T_i = exp(−Σ_{j<i} sigma_j delta_j) (see the sketch after this list)
      • img
      • Remember that piecewise-constant density/color does not imply constant transmittance! It is important to account for how the early part of a segment blocks the later part when sigma_i is high.
      • img
      • img
      • img
      • Connection to alpha compositing: with alpha_i = 1 − exp(−sigma_i delta_i), the sum becomes standard alpha compositing, C ≈ Σ_i (Π_{j<i} (1 − alpha_j)) alpha_i c_i
        • img
      • img
      • Next question? How do we store the values of color and sigma at each point in space?
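
The quadrature above maps directly to code. A minimal sketch for one ray, given sampled densities, colors, and segment lengths (PyTorch):

```python
import torch

def composite_ray(sigmas, colors, deltas):
    """sigmas: (n,), colors: (n, 3), deltas: (n,) segment lengths.

    Returns C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with T_i = exp(-sum_{j<i} sigma_j * delta_j).
    """
    alphas = 1.0 - torch.exp(-sigmas * deltas)           # per-segment opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=0)   # transmittance after segment i
    trans = torch.cat([torch.ones(1), trans[:-1]])       # shift: T_i before segment i
    weights = trans * alphas                             # exactly alpha compositing
    return (weights[:, None] * colors).sum(dim=0)
```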
  • Neural networks as representations for spatial data
    • Toy problem: storing 2D image data
      • Usually we store an image as a 2D grid of RGB color values
      • What if we train a simple fully-connected network (MLP) to do this instead?
      • Problem: Standard coordinate-based MLPs cannot represent high-frequency functions
      • Solution: Pass the input coordinates through a high-frequency mapping first
      • img
      • Input coordinate mapping?: Spatial position (x and y) to positional encoding features using sin/cos functions
      • img
      • Scaling the frequency matrix B traverses the underfitting-overfitting curve, so the optimal scale lies between the extremes (see the sketch after this list).
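
A sketch of the coordinate mapping as random Fourier features; the matrix shape and the scale factor are illustrative:

```python
import math
import torch

def fourier_features(x, B):
    """gamma(x) = [sin(2*pi*B x), cos(2*pi*B x)] for coordinates x of shape (N, d).

    B is an (m, d) frequency matrix; increasing its scale moves the fit from
    underfitting (too smooth) toward noisy overfitting, as noted above.
    """
    proj = 2.0 * math.pi * x @ B.t()                    # (N, m)
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

B = 10.0 * torch.randn(256, 2)   # Gaussian frequencies; 10.0 is the tunable scale
feats = fourier_features(torch.rand(1024, 2), B)        # feed these to the MLP
```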
  • Neural Radiance Fields (NeRF)
    • NeRF = Volume rendering + Coordinate-based network
    • Neural network replaces large N-D array
      • (x, y, z, theta, phi) => NN => (r, g, b, sigma)
      • (theta, phi) inputs let the network model view-dependent effects (see the field-network sketch after this list)
      • Train network to reproduce input views of scene using gradient descent
      • img
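
A minimal sketch of the NeRF field network: encoded position and view direction in, color and density out. Layer sizes and the skip/branch structure of the original paper are simplified here:

```python
import torch
import torch.nn as nn

class NeRFField(nn.Module):
    def __init__(self, pos_dim, dir_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(pos_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma = nn.Linear(hidden, 1)          # density depends on position only
        self.rgb = nn.Sequential(                  # color also sees the view direction
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, pos_enc, dir_enc):
        h = self.trunk(pos_enc)
        sigma = torch.relu(self.sigma(h))          # keep density non-negative
        rgb = self.rgb(torch.cat([h, dir_enc], dim=-1))
        return rgb, sigma                          # feed into the compositing above
```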
  • NeRF improvements and extensions
    • NeRF problems
      • Scene representation is not anti-aliased
      • Rendering is very slow: KiloNeRF, FastNeRF …
      • Network must be retrained for every scene: GRF, IBRNet, pixelNeRF
      • Requires many input images
      • Needs the scene to be static with fixed lighting