Parameter Prediction for Unseen Deep Architectures


Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient.

We study whether we can use deep learning to directly predict these parameters by exploiting past knowledge gained from training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures – DeepNets-1M – and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50, achieving 60% accuracy on CIFAR-10. On ImageNet, the top-5 accuracy of some of our networks approaches 50%. Our task, along with the model and results, can potentially lead to a new, more computationally efficient paradigm for training networks. Our model also learns a strong representation of neural architectures, enabling their analysis.
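To make the idea concrete, here is a minimal toy sketch of the pipeline the abstract describes: encode an unseen architecture as a computational graph, run a few rounds of graph message passing over its layer nodes, and decode each node's embedding into parameters in a single forward pass. This is NOT the authors' actual GHN model; all dimensions, the layer-type encoding, and the weight names are hypothetical, and real parameter shapes would vary per layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy computational graph of an unseen 4-layer architecture: each node is a
# layer. Hypothetical type encoding: 0 = conv3x3, 1 = conv1x1, 2 = linear.
node_types = np.array([0, 0, 1, 2])
adj = np.zeros((4, 4))                  # directed edges between layers
adj[0, 1] = adj[1, 2] = adj[2, 3] = 1.0

D = 16                                        # hypernetwork embedding width
type_embed = rng.normal(size=(3, D))          # one embedding per layer type
W_msg = rng.normal(size=(D, D)) / np.sqrt(D)  # message-passing weights
W_dec = rng.normal(size=(D, 9)) / np.sqrt(D)  # decoder: embedding -> 3x3 slice

# 1) Encode layer nodes, 2) propagate along the graph, 3) decode parameters.
h = type_embed[node_types]                # (4, D) initial node states
for _ in range(2):                        # two rounds of message passing
    msgs = adj.T @ (h @ W_msg)            # aggregate messages from predecessors
    h = np.tanh(h + msgs)                 # update node states

params = h @ W_dec                        # one 3x3 kernel slice per layer
print(params.shape)                       # (4, 9)
```

A trained version of such a model would be optimized so that the decoded parameters, when loaded into the target network, minimize the task loss; here the weights are random, so the sketch only illustrates the single-forward-pass dataflow.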
