Ting-Chun Wang

Research Scientist

NVIDIA

Santa Clara, CA

Email:

tingchunw at nvidia dot com

 

GitHub | Google Scholar

 

I'm a research scientist at NVIDIA, working on computer vision, machine learning, and computer graphics.
I received my PhD from the University of California, Berkeley in 2017, advised by Professors Ravi Ramamoorthi and Alexei A. Efros.

I'm looking for interns to work on GAN-related problems. If you're interested, please send me your resume and a brief introduction about yourself.



News

Publications


Video-to-Video Synthesis

 

Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

 

Advances in Neural Information Processing Systems (NIPS), 2018

 

project | paper | arXiv | github | YouTube | abstract | bibtex

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.
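
The spatio-temporal objective can be illustrated with a minimal PyTorch sketch (a hypothetical module, not the released vid2vid code): alongside a per-frame image discriminator, a clip discriminator scores short stacks of consecutive frames, so the generator is penalized for temporal flicker in addition to per-frame artifacts.

import torch
import torch.nn as nn

class VideoDiscriminator(nn.Module):
    """Scores a short clip by stacking K consecutive frames along channels.
    Illustrative sketch of a temporal discriminator, not the paper's exact model."""
    def __init__(self, frames=3, channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(frames * channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, padding=1),  # patch-wise real/fake score map
        )

    def forward(self, clip):                        # clip: (B, K, C, H, W)
        b, k, c, h, w = clip.shape
        return self.net(clip.reshape(b, k * c, h, w))

# Usage idea: the generator is trained against both an image discriminator on
# single frames and this clip discriminator on sliding windows of K generated frames.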

@inproceedings{wang2018vid2vid,
   author    = {Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Guilin Liu
                and Andrew Tao and Jan Kautz and Bryan Catanzaro},
   title     = {Video-to-Video Synthesis},
   booktitle = {Advances in Neural Information Processing Systems (NIPS)},   
   year      = {2018},
}
      

Domain Stylization: A Strong, Simple Baseline for Synthetic to Real Image Domain Adaptation

 

Aysegul Dundar, Ming-Yu Liu, Ting-Chun Wang, John Zedlewski, Jan Kautz

 

arXiv preprint arXiv:1807.09384

 

arXiv | abstract | bibtex

Deep neural networks have largely failed to effectively utilize synthetic data when applied to real images due to the covariate shift problem. In this paper, we show that by applying a straightforward modification to an existing photorealistic style transfer algorithm, we achieve state-of-the-art synthetic-to-real domain adaptation results. We conduct extensive experimental validations on four synthetic-to-real tasks for semantic segmentation and object detection, and show that our approach exceeds the performance of any current state-of-the-art GAN-based image translation approach as measured by segmentation and object detection metrics. Furthermore, we offer a distance-based analysis of our method, which shows a dramatic reduction in the Fréchet Inception Distance between the source and target domains, offering a quantitative metric that demonstrates the effectiveness of our algorithm in bridging the synthetic-to-real gap.
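
The Fréchet Inception Distance mentioned above compares Gaussians fitted to Inception features from the two domains; a minimal NumPy/SciPy sketch of the metric itself (not the paper's evaluation code) is:

import numpy as np
from scipy import linalg

def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians fitted to Inception features.

    mu*, cov*: feature mean vector and covariance matrix for each domain.
    A sketch of the metric used to quantify the synthetic-to-real gap.
    """
    diff = mu1 - mu2
    covmean = linalg.sqrtm(cov1.dot(cov2))
    if np.iscomplexobj(covmean):        # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return diff.dot(diff) + np.trace(cov1 + cov2 - 2.0 * covmean)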

@article{dundar2018domain,
  title={Domain Stylization: A Strong, Simple Baseline for Synthetic to Real Image Domain Adaptation},
  author={Dundar, Aysegul and Liu, Ming-Yu and Wang, Ting-Chun and Zedlewski, John and Kautz, Jan},
  journal={arXiv preprint arXiv:1807.09384},
  year={2018}
}
      

Image Inpainting for Irregular Holes Using Partial Convolutions

 

Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro

 

European Conference on Computer Vision (ECCV), 2018

 

project | arXiv | YouTube | abstract | bibtex

Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, with convolutional filter responses conditioned on both valid pixels and the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness. Post-processing is usually used to reduce such artifacts, but it is expensive and may fail. We propose the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. We further include a mechanism to automatically generate an updated mask for the next layer as part of the forward pass. Our model outperforms other methods for irregular masks. We show qualitative and quantitative comparisons with other methods to validate our approach.
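
The core operation can be sketched in a few lines of NumPy (a simplified single-channel version, not the released implementation): the filter response is computed over valid pixels only, renormalized by the fraction of valid pixels under the window, and the mask is updated so that any window containing at least one valid pixel becomes valid for the next layer.

import numpy as np

def partial_conv2d(image, mask, kernel):
    """Single-channel partial convolution (stride 1, zero padding).

    image:  (H, W) float array, corrupted input
    mask:   (H, W) float array, 1 for valid pixels, 0 for holes
    kernel: (k, k) float array of filter weights
    Returns the filtered image and the updated mask.
    """
    k = kernel.shape[0]
    pad = k // 2
    img_p = np.pad(image * mask, pad)       # zero out hole pixels before filtering
    msk_p = np.pad(mask, pad)
    out = np.zeros_like(image)
    new_mask = np.zeros_like(mask)
    window_area = float(k * k)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            m = msk_p[i:i + k, j:j + k]
            valid = m.sum()
            if valid > 0:
                x = img_p[i:i + k, j:j + k]
                # renormalize by the fraction of valid pixels under the window
                out[i, j] = (kernel * x * m).sum() * (window_area / valid)
                new_mask[i, j] = 1.0        # window saw at least one valid pixel
    return out, new_mask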

          
@inproceedings{liu2018image,
   author    = {Liu, Guilin and Reda, Fitsum A and Shih, Kevin J 
                and Wang, Ting-Chun and Tao, Andrew and Catanzaro, Bryan},
   title     = {Image inpainting for irregular holes using partial convolutions},
   booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},   
   year      = {2018},
}
      

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

 

Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro

 

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (oral presentation)

 

project | paper | arXiv | slides |
github | YouTube | abstract | bibtex

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). We generate visually appealing results at 2048 × 1024 resolution with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively.
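
As a rough illustration of the multi-scale discriminator idea (a PyTorch sketch with hypothetical module names, not the released pix2pixHD code): the same patch-based discriminator is applied to the image at several downsampled scales, and the adversarial losses from all scales are summed.

import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """A small patch-based discriminator (simplified)."""
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        layers, ch = [], in_channels
        for out_ch in (base, base * 2, base * 4):
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=1)]  # patch-wise real/fake scores
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Run identical discriminators on the image at 1x, 1/2x, 1/4x scales."""
    def __init__(self, num_scales=3):
        super().__init__()
        self.discs = nn.ModuleList([PatchDiscriminator() for _ in range(num_scales)])
        self.downsample = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)

    def forward(self, x):
        outputs = []
        for d in self.discs:
            outputs.append(d(x))
            x = self.downsample(x)
        return outputs                      # one score map per scale; losses are summed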

@inproceedings{wang2018pix2pixHD,
   author    = {Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu 
                and Andrew Tao and Jan Kautz and Bryan Catanzaro},
   title     = {High-Resolution Image Synthesis and Semantic Manipulation 
                with Conditional GANs},
   booktitle = {Proceedings of the IEEE Conference on 
                Computer Vision and Pattern Recognition (CVPR)},   
   year      = {2018},
}
      

Beyond Photo-Consistency: Shape, Reflectance, and Material Estimation Using Light-Field Cameras

 

Ting-Chun Wang

 

PhD Thesis, 2017

 


Light Field Video Capture Using a Learning-Based Hybrid Imaging System

 

Ting-Chun Wang, Jun-Yan Zhu, Nima Khademi Kalantari, Alexei Efros, Ravi Ramamoorthi

 

ACM Transactions on Graphics (SIGGRAPH), 2017

 

paper | lo-res pdf | abstract | YouTube | bibtex | project page

Capturing light fields requires huge bandwidth to record the data: a modern light field camera can take only three images per second. Temporal interpolation at such an extreme scale is infeasible, as too much information would be entirely missing between adjacent frames. Instead, we develop a hybrid imaging system, adding a standard video camera to capture the temporal information. Given a 3 fps light field sequence and a standard 30 fps 2D video, our system can then generate a full light field video at 30 fps. We adopt a learning-based approach, which can be decomposed into two steps: spatio-temporal flow estimation and appearance estimation. The flow estimation propagates the angular information from the light field sequence to the 2D video, so we can warp input images to the target view. The appearance estimation then combines these warped images to produce the final pixels. The whole process is trained end-to-end using convolutional neural networks.
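
The warping step that feeds the appearance network can be sketched with PyTorch's grid_sample (an illustrative helper, not the paper's implementation): given a dense flow field toward the target view or frame, each input image is backward-warped before a CNN fuses the warped candidates.

import torch
import torch.nn.functional as F

def warp_by_flow(image, flow):
    """Backward-warp an image with a dense flow field.

    image: (B, C, H, W) source frame (e.g., a light-field view or a 2D video frame)
    flow:  (B, 2, H, W) per-pixel displacement, in pixels, toward the target view/frame
    Returns the warped image; a sketch of the warping step only.
    """
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(image)  # (1, 2, H, W)
    coords = grid + flow
    # normalize sampling coordinates to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)             # (B, H, W, 2)
    return F.grid_sample(image, sample_grid, align_corners=True)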

@article{wang2017light,
   author  = {Ting-Chun Wang and Jun-Yan Zhu and Nima Khademi Kalantari 
              and Alexei A. Efros and Ravi Ramamoorthi},
   title   = {Light Field Video Capture Using a Learning-Based Hybrid 
              Imaging System},
   journal = {ACM Transactions on Graphics (Proceedings of SIGGRAPH)},
   volume  = {36},
   number  = {4},
   year    = {2017},
}
      

SVBRDF-Invariant Shape and Reflectance Estimation from Light-Field Cameras

 

Ting-Chun Wang, Manmohan Chandraker, Alexei Efros, Ravi Ramamoorthi

 

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017

 

paper | abstract | bibtex

In this paper, we derive a spatially-varying (SV)BRDF-invariant theory for recovering 3D shape and reflectance from light-field cameras. Our key theoretical insight is a novel analysis of diffuse plus single-lobe SVBRDFs under a light-field setup. We show that, although direct shape recovery is not possible, an equation relating depths and normals can still be derived. Using this equation, we then propose using a polynomial (quadratic) shape prior to resolve the shape ambiguity. Once shape is estimated, we also recover the reflectance. We present extensive results on synthetic data covering the entire MERL BRDF dataset, as well as a number of real examples, to validate the theory; in these experiments we simultaneously recover shape and BRDFs from a single image taken with a Lytro Illum camera.

@article{wang2017svbrdf,
   title={{SVBRDF}-Invariant Shape and Reflectance 
   Estimation from Light-Field Cameras},
   author={Wang, Ting-Chun and Chandraker, Manmohan
   and Efros, Alexei and Ramamoorthi, Ravi},
   journal={IEEE Transactions on Pattern 
   Analysis and Machine Intelligence (TPAMI)},
   year={2017},
}
      

Learning-Based View Synthesis for Light Field Cameras

 

Nima Khademi Kalantari, Ting-Chun Wang, Ravi Ramamoorthi

 

ACM Transactions on Graphics (SIGGRAPH Asia), 2016

 

paper | abstract | bibtex | project page

With the introduction of consumer light field cameras, light field imaging has recently become widespread. However, there is an inherent trade-off between the angular and spatial resolution, and thus, these cameras often sparsely sample in either spatial or angular domain. In this paper, we use machine learning to mitigate this trade-off. Specifically, we propose a novel learning-based approach to synthesize new views from a sparse set of input views. We build upon existing view synthesis techniques and break down the process into disparity and color estimation components. We use two sequential convolutional neural networks to model these two components and train both networks simultaneously by minimizing the error between the synthesized and ground truth images. We show the performance of our approach using only four corner sub-aperture views from the light fields captured by the Lytro Illum camera. Experimental results show that our approach synthesizes high-quality images that are superior to the state-of-the-art techniques on a variety of challenging real-world scenes. We believe our method could potentially decrease the required angular resolution of consumer light field cameras, which allows their spatial resolution to increase.
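
A rough PyTorch sketch of the two-stage pipeline (hypothetical layer sizes, not the released code): a first CNN predicts disparity at the novel view, the four corner views are warped with that disparity, and a second CNN fuses the warped views into the final color image; both networks are trained jointly against the ground-truth sub-aperture image.

import torch
import torch.nn as nn

class DisparityNet(nn.Module):
    """Predicts a disparity map at the novel view from features of the corner views."""
    def __init__(self, in_channels, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 7, padding=3), nn.ReLU(True),
            nn.Conv2d(base, base, 5, padding=2), nn.ReLU(True),
            nn.Conv2d(base, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class ColorNet(nn.Module):
    """Fuses the four warped corner views into the final novel-view image."""
    def __init__(self, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4 * 3, base, 7, padding=3), nn.ReLU(True),
            nn.Conv2d(base, base, 5, padding=2), nn.ReLU(True),
            nn.Conv2d(base, 3, 3, padding=1),
        )

    def forward(self, warped_views):        # warped_views: (B, 4, 3, H, W)
        b, n, c, h, w = warped_views.shape
        return self.net(warped_views.reshape(b, n * c, h, w))

# Training sketch: warp the corner views to the target view using the predicted
# disparity, feed them to ColorNet, and minimize the error against the ground-truth
# sub-aperture image; gradients flow through both networks.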

@article{Kalantari16ViewSynthesis,
   author={Nima Khademi Kalantari and Ting-Chun Wang 
   and Ravi Ramamoorthi},
   title={Learning-Based View Synthesis for Light 
   Field Cameras},
   journal={ACM Transactions on Graphics (Proceedings 
   of SIGGRAPH Asia 2016)},
   year={2016},
}
      

A 4D Light-Field Dataset and CNN Architectures for Material Recognition

 

Ting-Chun Wang, Jun-Yan Zhu, Ebi Hiroaki, Manmohan Chandraker, Alexei Efros, Ravi Ramamoorthi

 

European Conference on Computer Vision (ECCV), 2016

 

paper | abstract | HTML comparison | bibtex | dataset (2D thumbnail) | full dataset (15.9G)

We introduce a new light-field dataset of materials, and take advantage of the recent success of deep learning to perform material recognition on the 4D light-field. Our dataset contains 12 material categories, each with 100 images taken with a Lytro Illum, from which we extract about 30,000 patches in total. Since recognition networks have not been trained on 4D images before, we propose and compare several novel CNN architectures to train on light-field images. In our experiments, the best performing CNN architecture achieves a 7% boost compared with 2D image classification (70% to 77%).
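
One of the simplest ways to feed a 4D light field to a standard 2D CNN is to stack the sub-aperture views along the channel dimension; the sketch below (hypothetical sizes, assuming a 7×7 angular resolution, and not one of the paper's exact architectures) illustrates the idea.

import torch
import torch.nn as nn

class AngularStackCNN(nn.Module):
    """Material classifier that stacks all sub-aperture views along channels.
    A hypothetical baseline sketch for 4D light-field input."""
    def __init__(self, views=7 * 7, channels=3, num_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(views * channels, 64, 3, padding=1), nn.ReLU(True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, lf):                  # lf: (B, U*V, C, H, W) sub-aperture patches
        b, n, c, h, w = lf.shape
        x = self.features(lf.reshape(b, n * c, h, w))
        return self.classifier(x.flatten(1))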

@inproceedings{wang2016dataset,
   title={A {4D} light-field dataset and {CNN} 
   architectures for material recognition},
   author={Wang, Ting-Chun and Zhu, Jun-Yan 
   and Hiroaki, Ebi and Chandraker, Manmohan 
   and Efros, Alexei and Ramamoorthi, Ravi},
   booktitle={Proceedings of European Conference on 
   Computer Vision (ECCV)},
   year={2016}
}
      

SVBRDF-Invariant Shape and Reflectance Estimation from Light-Field Cameras

 

Ting-Chun Wang, Manmohan Chandraker, Alexei Efros, Ravi Ramamoorthi

 

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 (oral presentation)

 

paper | abstract | supplementary | HTML comparison | bibtex

In this paper, we derive a spatially-varying (SV)BRDF-invariant theory for recovering 3D shape and reflectance from light-field cameras. Our key theoretical insight is a novel analysis of diffuse plus single-lobe SVBRDFs under a light-field setup. We show that, although direct shape recovery is not possible, an equation relating depths and normals can still be derived. Using this equation, we then propose using a polynomial (quadratic) shape prior to resolve the shape ambiguity. Once shape is estimated, we also recover the reflectance. We present extensive results on synthetic data covering the entire MERL BRDF dataset, as well as a number of real examples, to validate the theory; in these experiments we simultaneously recover shape and BRDFs from a single image taken with a Lytro Illum camera.

@inproceedings{wang2016svbrdf,
   title={SVBRDF-invariant shape and reflectance 
   estimation from light-field cameras},
   author={Wang, Ting-Chun and Chandraker, Manmohan 
   and Efros, Alexei and Ramamoorthi, Ravi},
   booktitle={Proceedings of the IEEE Conference on 
   Computer Vision and Pattern Recognition (CVPR)},
   year={2016}
}
      

Depth from Semi-Calibrated Stereo and Defocus

 

Ting-Chun Wang, Manohar Srikanth, Ravi Ramamoorthi

 

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

 

paper | abstract | HTML comparison | bibtex

In this work, we propose a multi-camera system where we combine a main high-quality camera with two low-res auxiliary cameras. The auxiliary cameras are well calibrated and act as a passive depth sensor by generating disparity maps. The main camera has an interchangeable lens and can produce good quality images at high resolution. Our goal is, given the low-res depth map from the auxiliary cameras, generate a depth map from the viewpoint of the main camera. The advantage of our system, compared to other systems such as light-field cameras or RGBD sensors, is the ability to generate a high-resolution color image with a complete depth map, without sacrificing resolution and with minimal auxiliary hardware.

@inproceedings{wang2016semi,
   title={Depth from semi-calibrated stereo and defocus},
   author={Wang, Ting-Chun and Srikanth, Manohar
   and Ramamoorthi, Ravi},
   booktitle={Proceedings of the IEEE Conference on 
   Computer Vision and Pattern Recognition (CVPR)},
   year={2016}
}
      

Depth Estimation with Occlusion Modeling Using Light-field Cameras

 

Ting-Chun Wang, Alexei Efros, Ravi Ramamoorthi

 

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016

 

paper | abstract | bibtex

In this paper, an occlusion-aware depth estimation algorithm is developed; the method also enables identification of occlusion edges, which may be useful in other applications. It can be shown that although photo-consistency is not preserved for pixels at occlusions, it still holds in approximately half the viewpoints. Moreover, the line separating the two view regions (occluded object vs. occluder) has the same orientation as that of the occlusion edge in the spatial domain. By ensuring photo-consistency in only the occluded view region, depth estimation can be improved.
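
The key observation can be turned into a tiny NumPy sketch (illustrative only, not the paper's algorithm): at a candidate depth, the angular samples of an edge pixel are split by a line with the occlusion edge's orientation, and only the more consistent half contributes to the photo-consistency cost.

import numpy as np

def occlusion_aware_cost(angular_patch, edge_angle):
    """Photo-consistency cost for one pixel at one depth hypothesis.

    angular_patch: (U, V) array of the pixel's values across the refocused
                   angular views at this depth hypothesis.
    edge_angle:    orientation (radians) of the occlusion edge at this pixel,
                   e.g. from an edge detector on the central view.
    The angular samples are split by a line of the same orientation; only the
    more consistent half (the unoccluded side) contributes to the cost.
    """
    u, v = angular_patch.shape
    uu, vv = np.meshgrid(np.arange(u) - (u - 1) / 2.0,
                         np.arange(v) - (v - 1) / 2.0, indexing="ij")
    side = uu * np.cos(edge_angle) + vv * np.sin(edge_angle) >= 0
    half1, half2 = angular_patch[side], angular_patch[~side]
    # the side that mixes occluder and background has high variance;
    # keep the lower-variance half as the photo-consistency measure
    return min(half1.var(), half2.var())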

@article{wang2016depth,
   title={Depth estimation with occlusion modeling 
   using light-field cameras},
   author={Wang, Ting-Chun and Efros, Alexei and 
   Ramamoorthi, Ravi},
   journal={IEEE Transactions on Pattern 
   Analysis and Machine Intelligence (TPAMI)},
   volume={38},
   number={11},
   pages={2170--2181},
   year={2016},
}
      

Occlusion-aware depth estimation using light-field cameras

 

Ting-Chun Wang, Alexei Efros, Ravi Ramamoorthi

 

International Conference on Computer Vision (ICCV), 2015

 

paper | abstract | supplementary | bibtex | code | dataset (3.3GB)

In this paper, we develop a depth estimation algorithm for light field cameras that treats occlusion explicitly; the method also enables identification of occlusion edges, which may be useful in other applications. We show that, although pixels at occlusions do not preserve photo-consistency in general, they are still consistent in approximately half the viewpoints.

@inproceedings{wang2015occlusion,
  title={Occlusion-aware depth estimation 
  using light-field cameras.},
  author={Wang, Ting-Chun and 
  Efros, Alexei and Ramamoorthi, Ravi},
  booktitle={Proceedings of the IEEE International 
  Conference on Computer Vision (ICCV)},
  year={2015}
}
      

Depth estimation and specular removal for glossy surfaces using point and line consistency with light-field cameras

 

Michael Tao, Jong-Chyi Su, Ting-Chun Wang, Jitendra Malik, Ravi Ramamoorthi

 

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015

 

paper | abstract | bibtex

Light-field cameras have now become available in both consumer and industrial applications, and recent papers have demonstrated practical algorithms for depth recovery from a passive single-shot capture. However, current light-field depth estimation methods are designed for Lambertian objects and fail or degrade for glossy or specular surfaces. In this paper, we present a novel theory of the relationship between light-field data and reflectance from the dichromatic model.

@article{tao2015depth,
   title={Depth Estimation and Specular Removal for Glossy Surfaces 
          Using Point and Line Consistency with Light-Field Cameras},
   author={Tao, Michael and Su, Jong-Chyi and Wang, Ting-Chun 
           and Malik, Jitendra and Ramamoorthi, Ravi},
   journal={IEEE Transactions on Pattern 
            Analysis and Machine Intelligence (TPAMI)},
   year={2015},
}
      

Depth estimation for glossy surfaces with light-field cameras

 

Michael Tao, Ting-Chun Wang, Jitendra Malik, Ravi Ramamoorthi

 

ECCV workshop on Light Fields for Computer Vision (L4CV), 2014

 

paper | abstract | bibtex

Light-field cameras have now become available in both consumer and industrial applications, and recent papers have demonstrated practical algorithms for depth recovery from a passive single-shot capture. In this paper, we develop an iterative approach to use the benefits of light-field data to estimate and remove the specular component, improving the depth estimation. The approach enables light-field data depth estimation to support both specular and diffuse scenes.

@inproceedings{tao2014depth,
title={Depth estimation for glossy 
surfaces with light-field cameras},
author={Tao, Michael W and Wang, Ting-Chun 
and Malik, Jitendra and Ramamoorthi, Ravi},
booktitle={Computer Vision-ECCV 2014 Workshops},
pages={533--547},
year={2014},
organization={Springer}
}
      

Software

Light Field Video: light field video applications (e.g., video refocusing, changing aperture and viewpoint).


Talks

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (pix2pixHD)

CVPR (2018)

 

Beyond Photo-Consistency: Shape, Reflectance, and Material Estimation Using Light-Field Cameras

Dissertation talk (2017)

 

Light Field Video Capture Using a Learning-Based Hybrid Imaging System

SIGGRAPH (2017)

 

SVBRDF-Invariant Shape and Reflectance Estimation from Light-Field Cameras

CVPR (2016)

