Trending Papers

October 28

  • Alias-Free Generative Adversarial Networks

    10:20 Author: xiaoxingxing

    We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.
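
    The core mechanism alluded to in the abstract is to treat every feature map as samples of a continuous signal, so that pointwise nonlinearities are applied on a temporarily finer grid and low-pass filtered before returning to the original resolution. The sketch below is a minimal illustration of that upsample, nonlinearity, filter, downsample pattern, assuming PyTorch and a crude averaging kernel in place of proper windowed-sinc filters; it is not the released StyleGAN3 implementation.

    ```python
    # Minimal sketch of an "alias-free" pointwise nonlinearity: apply the
    # nonlinearity on an upsampled grid, then low-pass filter and downsample.
    # The bilinear upsampling and averaging kernel are illustrative stand-ins.
    import torch
    import torch.nn.functional as F

    def filtered_leaky_relu(x: torch.Tensor, up: int = 2) -> torch.Tensor:
        """x: (N, C, H, W) feature map interpreted as samples of a continuous signal."""
        _, c, _, _ = x.shape
        # 1) Upsample so the nonlinearity's new high frequencies stay below
        #    the (finer) Nyquist limit.
        x = F.interpolate(x, scale_factor=up, mode="bilinear", align_corners=False)
        # 2) Apply the pointwise nonlinearity on the fine grid.
        x = F.leaky_relu(x, negative_slope=0.2)
        # 3) Low-pass filter (crude averaging kernel) and downsample back.
        kernel = torch.full((c, 1, up, up), 1.0 / (up * up), device=x.device, dtype=x.dtype)
        return F.conv2d(x, kernel, stride=up, groups=c)  # back to (N, C, H, W)

    if __name__ == "__main__":
        feat = torch.randn(1, 8, 16, 16)
        print(filtered_leaky_relu(feat).shape)  # torch.Size([1, 8, 16, 16])
    ```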


October 25

  • Parameter Prediction for Unseen Deep Architectures

    00:00 Author: xiaoxingxing

    Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study if we can use deep learning to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures - DeepNets-1M - and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50 achieving a 60% accuracy on CIFAR-10. On ImageNet, top-5 accuracy of some of our networks approaches 50%. Our task along with the model and results can potentially lead to a new, more computationally efficient paradigm of training networks. Our model also learns a strong representation of neural architectures enabling their analysis.
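
    To make the hypernetwork idea concrete, the sketch below shows a toy version of the described setup: the operations of a target architecture's computational graph are embedded, one message-passing step mixes neighbour information, and a linear decoder emits a flat parameter vector per node that is reshaped to each layer's weight shape in a single forward pass. The graph encoding, layer sizes, and fixed maximum parameter count are illustrative assumptions; the paper's actual model is considerably more elaborate.

    ```python
    # Toy graph hypernetwork: op-type embeddings + one message-passing step,
    # then a per-node linear decoder that predicts that layer's weights.
    import math
    import torch
    import torch.nn as nn

    class TinyGraphHyperNet(nn.Module):
        def __init__(self, num_op_types: int = 8, hidden: int = 64,
                     max_params: int = 16 * 16 * 3 * 3):
            super().__init__()
            self.embed = nn.Embedding(num_op_types, hidden)  # one embedding per op type
            self.msg = nn.Linear(hidden, hidden)             # single message-passing step
            self.decode = nn.Linear(hidden, max_params)      # flat weight vector per node

        def forward(self, op_types: torch.Tensor, adj: torch.Tensor, shapes):
            """op_types: (V,) op ids; adj: (V, V) adjacency; shapes: target weight shapes."""
            h = self.embed(op_types)                          # (V, hidden)
            h = torch.relu(h + adj @ self.msg(h))             # aggregate neighbour messages
            flat = self.decode(h)                             # (V, max_params)
            # Cut and reshape each node's prediction to its layer's weight shape.
            return [flat[i, : math.prod(s)].reshape(s) for i, s in enumerate(shapes)]

    if __name__ == "__main__":
        # A toy 3-node graph (conv -> conv -> linear) with hypothetical op ids.
        op_types = torch.tensor([0, 0, 1])
        adj = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
        shapes = [(16, 3, 3, 3), (16, 16, 3, 3), (10, 16)]
        preds = TinyGraphHyperNet()(op_types, adj, shapes)
        print([p.shape for p in preds])
    ```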


September 23

  • Layered Neural Atlases for Consistent Video Editing

    00:00 Author: xiaoxingxing

    We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video. For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases, giving us a consistent parameterization of the video, along with an associated alpha (opacity) value. Importantly, we design our atlases to be interpretable and semantic, which facilitates easy and intuitive editing in the atlas domain, with minimal manual work required. Edits applied to a single 2D atlas (or input video frame) are automatically and consistently mapped back to the original video frames, while preserving occlusions, deformation, and other complex scene effects such as shadows and reflections. Our method employs a coordinate-based Multilayer Perceptron (MLP) representation for mappings, atlases, and alphas, which are jointly optimized on a per-video basis, using a combination of video reconstruction and regularization losses. By operating purely in 2D, our method does not require any prior 3D knowledge about scene geometry or camera poses, and can handle complex dynamic real world videos. We demonstrate various video editing applications, including texture mapping, video style transfer, image-to-video texture transfer, and segmentation/labeling propagation, all automatically produced by editing a single 2D atlas image.
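
    The coordinate-based MLP setup described in the abstract can be sketched roughly as follows: one MLP maps a pixel's (x, y, t) coordinate to an atlas coordinate and an alpha value, a second MLP maps atlas coordinates to colour, and everything is optimized with a reconstruction loss against the video. Two layers, tiny networks, and the omission of positional encoding and the regularization losses are simplifying assumptions rather than the authors' full training setup.

    ```python
    # Minimal layered-atlas sketch: mapping MLPs (x, y, t) -> atlas uv, an alpha MLP,
    # and atlas MLPs uv -> rgb, fitted with a simple reconstruction loss.
    import torch
    import torch.nn as nn

    def mlp(d_in: int, d_out: int, hidden: int = 64) -> nn.Sequential:
        return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                             nn.Linear(hidden, hidden), nn.ReLU(),
                             nn.Linear(hidden, d_out))

    mapping_fg = mlp(3, 2)   # (x, y, t) -> foreground atlas uv
    mapping_bg = mlp(3, 2)   # (x, y, t) -> background atlas uv
    alpha_net  = mlp(3, 1)   # (x, y, t) -> opacity of the foreground layer
    atlas_fg   = mlp(2, 3)   # uv -> rgb of the foreground atlas
    atlas_bg   = mlp(2, 3)   # uv -> rgb of the background atlas

    def reconstruct(xyt: torch.Tensor) -> torch.Tensor:
        """xyt: (B, 3) normalized pixel coordinates and time; returns (B, 3) rgb."""
        alpha = torch.sigmoid(alpha_net(xyt))                          # (B, 1) in [0, 1]
        rgb_fg = torch.sigmoid(atlas_fg(torch.tanh(mapping_fg(xyt))))  # sample foreground atlas
        rgb_bg = torch.sigmoid(atlas_bg(torch.tanh(mapping_bg(xyt))))  # sample background atlas
        return alpha * rgb_fg + (1.0 - alpha) * rgb_bg                 # alpha-composite the layers

    if __name__ == "__main__":
        params = (list(mapping_fg.parameters()) + list(mapping_bg.parameters()) +
                  list(alpha_net.parameters()) + list(atlas_fg.parameters()) +
                  list(atlas_bg.parameters()))
        opt = torch.optim.Adam(params, lr=1e-3)
        xyt = torch.rand(256, 3) * 2 - 1            # random coordinates in [-1, 1]
        target = torch.rand(256, 3)                 # stand-in for sampled video colours
        loss = ((reconstruct(xyt) - target) ** 2).mean()  # reconstruction loss
        loss.backward()
        opt.step()
        print(float(loss))
    ```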

