“Make it more real” (DVP, DNR): 3D mesh/points + 1D codes => CG (3D to 2D) => 2D encoder/decoder => 2D image
“Regress and render” (Neural Volumes): 1D code => 2D generative network => 3D mesh/points + 2D texture + 3D volume => CG (3D to 2D) => 2D image
“Step, sample and blend” (NeRF=super popular in the community): 3D space => Coordinates => MLP => CG (3D to 2D) => 2D image
Loss functions for neural rendering
Jun-Yan Zhu, CMU CS
Problem statement?
argmin_G Loss(G(x_input),y_output)
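A minimal sketch of this objective as a gradient-based training loop (PyTorch assumed; the network, data, and L2 loss here are toy stand-ins, not the talk's actual setup):

    import torch

    # Toy stand-ins: G maps a 64-dim input to a 3-dim output; data is random.
    G = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
    opt = torch.optim.Adam(G.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()  # placeholder for "a good loss L" (the question below)

    x_input = torch.randn(16, 64)   # stand-in inputs
    y_output = torch.randn(16, 3)   # stand-in paired targets

    for step in range(100):
        opt.zero_grad()
        loss = loss_fn(G(x_input), y_output)  # Loss(G(x_input), y_output)
        loss.backward()
        opt.step()                            # gradient step toward argmin_G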
So, what is a good objective loss function L?
Designing loss function?
L2 regression: minimizing L2 distance tends to average over plausible outputs, so it is not good at getting each local image region right
Classification loss - cross-entropy objective with a colorfulness term
Feature/Perceptual loss - Deep feature space matching objective
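A minimal sketch of a feature/perceptual loss, assuming torchvision's pretrained VGG16 as the feature extractor (the layer cut and weights flag are illustrative choices, not prescribed by the talk):

    import torch
    import torchvision

    # Early VGG16 conv features serve as the deep feature space (illustrative cut).
    vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:9].eval()
    for p in vgg.parameters():
        p.requires_grad_(False)  # feature extractor stays fixed

    def perceptual_loss(pred, target):
        # Match deep features instead of raw pixels, avoiding L2's averaging effect.
        return torch.nn.functional.mse_loss(vgg(pred), vgg(target))

    pred = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in generator output
    target = torch.rand(1, 3, 224, 224)                    # stand-in ground truth
    perceptual_loss(pred, target).backward()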
Loss function in Generative Adversarial Networks?
Distinguish whether an image is real or fake?
Using human annotation is expensive, so replace the human with a learned classifier (the discriminator)!
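A minimal sketch of that learned-classifier idea as a GAN objective (binary cross-entropy form; the tiny linear networks are stand-ins):

    import torch

    # Tiny stand-in networks; real models are convolutional.
    G = torch.nn.Linear(16, 32)   # generator: code -> "image"
    D = torch.nn.Linear(32, 1)    # discriminator: "image" -> real/fake logit
    bce = torch.nn.BCEWithLogitsLoss()

    real = torch.randn(8, 32)     # stand-in real images
    fake = G(torch.randn(8, 16))  # generated images

    # The discriminator is the classifier that replaces human annotation:
    d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
    # The generator tries to make its fakes get classified as real:
    g_loss = bce(D(fake), torch.ones(8, 1))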
Check pix2pix, edges2cats
What can pix2pix do?
grayscale => automatic colorization
sketch => photo
But it needs paired training data, which is expensive to collect.
Check Cycle-consistent adversarial networks: horse to zebra, orange to apple!
How to train with unpaired data (e.g., keep the horse's shape while transferring zebra texture)?
Cycle-consistency loss: horse (x) to zebra (G(x)), and zebra back to horse (F(G(x)))
Patch-based contrastive loss using cosine similarity between small patches of x and G(x)
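A minimal sketch of the cycle-consistency term with stand-in mappings G: X -> Y and F: Y -> X (L1 reconstruction as in CycleGAN; the adversarial terms are omitted):

    import torch

    # Stand-in translators; the real ones are image-to-image networks.
    G = torch.nn.Linear(32, 32)  # X -> Y (horse -> zebra)
    F = torch.nn.Linear(32, 32)  # Y -> X (zebra -> horse)

    x = torch.randn(8, 32)  # unpaired samples from domain X
    y = torch.randn(8, 32)  # unpaired samples from domain Y

    # F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y (L1 penalty).
    cycle_loss = (F(G(x)) - x).abs().mean() + (G(F(y)) - y).abs().mean()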
Summary
Generative Adversarial Networks with 3D Control
Ayush Tewari, Max Planck Institute for Informatics
GAN?
Supervised training of GAN for neural rendering?
Add scene parameters (illumination/pose) to the input data and do supervised training
Training with synthetic datasets (so there is no need for lots of labeled real pairs)
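One way to picture the supervised setup: concatenate the scene parameters with the latent code at the generator input. A sketch under that assumption (dimensions and architecture are hypothetical):

    import torch

    # Hypothetical conditional generator: latent code z plus scene parameters.
    class ConditionalG(torch.nn.Module):
        def __init__(self, z_dim=64, param_dim=6, out_dim=128):
            super().__init__()
            self.net = torch.nn.Linear(z_dim + param_dim, out_dim)

        def forward(self, z, params):
            # Conditioning on params makes pose/illumination explicit inputs.
            return self.net(torch.cat([z, params], dim=-1))

    G = ConditionalG()
    z = torch.randn(4, 64)      # random latent codes
    params = torch.randn(4, 6)  # e.g. pose angles + illumination coefficients
    img = G(z, params)          # supervised against renders with known params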
Training with supervised pairs? Use annotation tools, then move along the normal direction of a semantic-property (pose/gender/expression) hyperplane in the latent space.
Add non-linearity for high-quality control!
Inverse graphics & 3D control: adjust the semantic property gradually!
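The linear version of such an edit is a single step along the hyperplane normal in latent space (a sketch; in practice `normal` comes from a classifier fit on annotated latents, here it is random):

    import torch

    w = torch.randn(512)             # latent code of an image
    normal = torch.randn(512)        # hypothetical hyperplane normal (e.g. for pose)
    normal = normal / normal.norm()  # unit direction

    alpha = 2.0                      # edit strength; sweep it for gradual control
    w_edited = w + alpha * normal    # move along the semantic direction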
Unsupervised methods?
Training generator for controllability
Projecting real images to latent vectors using optimization-based methods (see the sketch after this list)
Editing the projected latent vector
Regularization
Transformation
Learning-based methods for projection (not optimization): Using encoder!
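A minimal sketch of the optimization-based projection mentioned above, with a frozen stand-in generator (real systems add perceptual losses and latent regularization):

    import torch

    G = torch.nn.Linear(64, 128).eval()  # stand-in pretrained generator (frozen)
    for p in G.parameters():
        p.requires_grad_(False)

    target = torch.randn(128)            # "real image" to project
    w = torch.zeros(64, requires_grad=True)
    opt = torch.optim.Adam([w], lr=0.05)

    for step in range(200):
        opt.zero_grad()
        loss = (G(w) - target).pow(2).mean()  # reconstruction loss in image space
        loss.backward()
        opt.step()
    # The converged w can then be edited (e.g. moved along a semantic direction).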
Challenges?
What can be edited?: Widen adjustable control parameters!
What can be projected?: Trade-off between projection/reconstruction quality and realistic editing
Rendering?: Querying the radiance value along rays through 3D space
Volumetric?: Continuous, differentiable rendering model without concrete ray/surface intersections
Neural: Using a neural network as a scene representation, rather than a voxel grid of data
Inputs: sparse, unstructured, photographs of a scene
Outputs: representation allowing us to render new views of that scene
Volumetric rendering math
Traditional method?: using optical physics => adapted for visualizing medical data and linked to alpha compositing => modern path tracers use sophisticated Monte Carlo methods to render volumetric effects
Volumetric rendering and ML?: various volume-rendering-esque methods were devised for 3D shape reconstruction; scaled up to higher-resolution voxel grids, ML methods can achieve excellent view-synthesis results
Volumetric formulation for NeRF
Scene is a cloud of tiny colored particles
If a ray traveling through the scene hits a particle at t, we return its color c(t)
This notion is probabilistic: Chance that ray stops in a small interval around t is sigma(t)dt. Sigma is known as the “volume density”
To determine if t is the first hit, need to know T(t): probability that the ray didn’t hit any particles earlier. T(t) is called “transmittance”. We assume sigma is known and want to use it to calculate T
P[no hits before t] = T(t)
P[hit at t] = sigma(t)dt
P[no hits before t+dt] = P[no hits before t] × P[no hit at t]
T(t+dt) = T(t) × (1 - sigma(t)dt)
You can weight the integrand with the color function c(t)
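Written out, the recurrence above is a differential equation whose solution gives transmittance in closed form; weighting by c(t) then yields the expected color along the ray:

    T'(t) = -\sigma(t)\,T(t)
    \;\Rightarrow\;
    T(t) = \exp\!\left(-\int_{t_0}^{t} \sigma(s)\,ds\right),
    \qquad
    C = \int_{t_0}^{t_1} T(t)\,\sigma(t)\,c(t)\,dt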
Approximating the nested integral? Use quadrature to approximate the nested integral, splitting the ray into n segments with endpoints {t_1, t_2, …, t_(n+1)} and lengths delta_i = t_(i+1) - t_i.
Assume that volume density and color are roughly constant within each interval
Note that piecewise-constant density/color does not imply constant transmittance! It is important to account for how the early part of a segment blocks the later part when sigma_i is high.
Connection to alpha compositing
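A minimal sketch of that quadrature, assuming per-segment densities sigma_i, colors c_i, and lengths delta_i; it reduces exactly to alpha compositing with alpha_i = 1 - exp(-sigma_i * delta_i):

    import torch

    n = 64
    sigma = torch.rand(n)           # volume density per segment (stand-in MLP outputs)
    color = torch.rand(n, 3)        # RGB per segment
    delta = torch.full((n,), 0.01)  # segment lengths t_(i+1) - t_i

    alpha = 1.0 - torch.exp(-sigma * delta)  # opacity of each segment
    # Transmittance: probability the ray survives all earlier segments.
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = T * alpha                        # contribution of each segment
    C = (weights[:, None] * color).sum(dim=0)  # final pixel color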
Next question? How do we store the values of color and sigma at each point in space?
Neural networks as representations for spatial data
Toy problem: storing 2D image data
Usually we store an image as a 2D grid of RGB color values
What if we train a simple fully-connected network (MLP) to do this instead?
Problem: Standard coordinate-based MLPs cannot represent high-frequency functions
Solution: Pass input coordinates through a high frequency mapping first
Input coordinate mapping?: Spatial position (x and y) to positional encoding features using sin/cos functions
Scaling the frequency matrix B traverses an underfitting-overfitting curve, so the optimal scale lies between the extremes.
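A minimal sketch of the sin/cos coordinate mapping with a random frequency matrix B, whose scale is exactly the knob described above (dimensions are illustrative):

    import math
    import torch

    scale = 10.0                     # frequency scale: too low underfits, too high overfits
    B = torch.randn(256, 2) * scale  # random frequency matrix for 2D coordinates

    def positional_encoding(coords):
        # coords: (N, 2) in [0, 1]; output: (N, 512) high-frequency features.
        proj = 2 * math.pi * coords @ B.T
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

    xy = torch.rand(1024, 2)            # sample pixel coordinates
    features = positional_encoding(xy)  # feed these to the MLP instead of raw (x, y)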