
We present a method that decomposes, or “unwraps”, an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video. For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases, giving us a consistent parameterization of the video, along with an associated alpha (opacity) value.
Importantly, we design our atlases to be interpretable and semantic, which facilitates easy and intuitive editing in the atlas domain, with minimal manual work required. Edits applied to a single 2D atlas (or input video frame) are automatically and consistently mapped back to the original video frames, while preserving occlusions, deformation, and other complex scene effects such as shadows and reflections. Our method employs a coordinate-based Multilayer Perceptron (MLP) representation for mappings, atlases, and alphas, which are jointly optimized on a per-video basis, using a combination of video reconstruction and regularization losses. By operating purely in 2D, our method does not require any prior 3D knowledge about scene geometry or camera poses, and can handle complex dynamic real-world videos. We demonstrate various video editing applications, including texture mapping, video style transfer, image-to-video texture transfer, and segmentation/labeling propagation, all automatically produced by editing a single 2D atlas image.
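To make the representation described above concrete, the following is a minimal sketch of the forward model only: a video pixel (x, y, t) is mapped to a 2D coordinate in each atlas, an alpha value blends the layers, and the result is compared against the observed color. It is an illustration under stated assumptions, not the implementation used in this work. The two-layer (foreground/background) setup, the network sizes, the random untrained weights, the tanh/sigmoid output ranges, and every name (`make_mlp`, `build_model`, `render_pixel`, `reconstruction_loss`) are hypothetical choices made for the example; the regularization losses and the optimization loop are omitted.

```python
import math
import random


def make_mlp(sizes, rng):
    """Build a small MLP as a list of (weights, biases) layers (random, untrained)."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-1.0, 1.0) * scale for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def run_mlp(layers, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def build_model(seed=0):
    """Assumed setup: one mapping MLP and one atlas MLP per layer, plus an alpha MLP."""
    rng = random.Random(seed)
    return {
        "map_fg": make_mlp([3, 16, 2], rng),    # (x, y, t) -> (u, v) in foreground atlas
        "map_bg": make_mlp([3, 16, 2], rng),    # (x, y, t) -> (u, v) in background atlas
        "alpha": make_mlp([3, 16, 1], rng),     # (x, y, t) -> foreground opacity
        "atlas_fg": make_mlp([2, 16, 3], rng),  # (u, v) -> RGB
        "atlas_bg": make_mlp([2, 16, 3], rng),  # (u, v) -> RGB
    }


def render_pixel(model, x, y, t):
    """Reconstruct one pixel by alpha-blending the colors sampled from both atlases."""
    p = [x, y, t]
    uv_fg = [math.tanh(v) for v in run_mlp(model["map_fg"], p)]  # atlas coords in [-1, 1]
    uv_bg = [math.tanh(v) for v in run_mlp(model["map_bg"], p)]
    alpha = sigmoid(run_mlp(model["alpha"], p)[0])               # opacity in (0, 1)
    rgb_fg = [sigmoid(v) for v in run_mlp(model["atlas_fg"], uv_fg)]
    rgb_bg = [sigmoid(v) for v in run_mlp(model["atlas_bg"], uv_bg)]
    rgb = [alpha * f + (1.0 - alpha) * b for f, b in zip(rgb_fg, rgb_bg)]
    return rgb, alpha, uv_fg, uv_bg


def reconstruction_loss(model, samples):
    """Mean squared RGB error over samples of the form (x, y, t, target_rgb)."""
    total = 0.0
    for x, y, t, target in samples:
        rgb, _, _, _ = render_pixel(model, x, y, t)
        total += sum((c - g) ** 2 for c, g in zip(rgb, target)) / 3.0
    return total / len(samples)


if __name__ == "__main__":
    model = build_model(seed=0)
    rgb, alpha, uv_fg, uv_bg = render_pixel(model, 0.25, -0.5, 0.1)
    print("rgb:", rgb, "alpha:", alpha, "uv_fg:", uv_fg, "uv_bg:", uv_bg)
```

In this sketch, editing the video would amount to changing what the atlas functions return at given (u, v) coordinates while leaving the mapping and alpha networks fixed, so that every frame sampling those coordinates picks up the edit.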