2018-ECCV-Progressive Neural Architecture Search

  • Johns Hopkins University && Google AI && Stanford
  • GitHub: 300+ stars
  • Citation: 504


Current techniques usually fall into one of two categories: evolutionary algorithms (EA) or reinforcement learning (RL).


Although both EA and RL methods have been able to learn network structures that outperform manually designed architectures, they require significant computational resources.



We describe a method that requires 5 times fewer model evaluations during the architecture search.


We propose to use heuristic search to search the space of cell structures, starting with simple (shallow) models and progressing to complex ones, pruning out unpromising structures as we go.


Since this process is expensive, we also learn a model, or surrogate function, which can predict the performance of a structure without needing to train it.


Several advantages:

First, the simple structures train faster, so we get some initial results to train the surrogate quickly.


Second, we only ask the surrogate to predict the quality of structures that are slightly different (larger) from the ones it has seen


Third, we factorize the search space into a product of smaller search spaces, allowing us to potentially search models with many more blocks.


We show that our approach is 5 times more efficient than the RL method of [41] in terms of number of models evaluated, and 8 times faster in terms of total compute.



Search Space

We first learn a cell structure, and then stack this cell a desired number of times, in order to create the final CNN.



A cell takes an H×W×F tensor as input. If the cell has stride 1, it outputs an H×W×F tensor; if it has stride 2, it outputs an H/2 × W/2 × 2F tensor.
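The shape rule above can be sketched as a small helper (a minimal sketch; the function name is illustrative):

```python
def cell_output_shape(h, w, f, stride):
    """Shape rule from the text: a stride-1 cell preserves the tensor shape;
    a stride-2 cell halves the spatial dimensions and doubles the filters."""
    if stride == 1:
        return (h, w, f)
    return (h // 2, w // 2, 2 * f)

print(cell_output_shape(32, 32, 64, 1))  # (32, 32, 64)
print(cell_output_shape(32, 32, 64, 2))  # (16, 16, 128)
```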


A cell consists of B blocks. Each block has 2 inputs and 1 output and can be specified by a 5-tuple \(\left(I_{1}, I_{2}, O_{1}, O_{2}, C\right)\). The output of the c-th cell is denoted \(H^c\), and the output of the b-th block of the c-th cell is denoted \(H^c_b\).


The candidate inputs to each block are the outputs of all earlier blocks in the current cell, together with the outputs of the previous cell and of the cell before that.
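The block search space can be enumerated directly. A sketch, assuming the paper's 8 candidate operations, fixing the combination operator C to addition, and ignoring the symmetry of swapping the two (input, op) pairs, so the raw counts are upper bounds:

```python
from itertools import product

NUM_OPS = 8  # the paper uses 8 candidate operations

def candidate_blocks(b):
    """Enumerate all (I1, I2, O1, O2) choices for the b-th block.
    Inputs may be the outputs of the two previous cells (indices -2, -1)
    or of any earlier block in the current cell (indices 0 .. b-2),
    giving b+1 possible inputs for block b. The combiner C is fixed
    to addition here for simplicity."""
    inputs = list(range(-2, b - 1))
    ops = list(range(NUM_OPS))
    return list(product(inputs, inputs, ops, ops))

# Block 1 has 2 possible inputs: 2 * 2 * 8 * 8 = 256 raw candidates.
print(len(candidate_blocks(1)))  # 256
print(len(candidate_blocks(2)))  # 576
```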




We stack a predefined number of copies of the basic cell (with the same structure, but untied weights), using either stride 1 or stride 2, as shown in Figure 1 (right).


The number of stride-1 cells between stride-2 cells is then adjusted accordingly, with up to N repeats.

The number of Normal cells (stride = 1) is controlled by the hyperparameter N.

We only use one cell type: we do not distinguish between Normal and Reduction cells, but instead emulate a Reduction cell by using a Normal cell with stride 2.
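The stacking pattern above can be sketched as a stride schedule (the function and its defaults are illustrative assumptions, not the paper's code):

```python
def cell_schedule(num_reductions=2, N=2):
    """Stride pattern for stacking copies of the learned cell:
    N stride-1 (Normal) cells between each stride-2 (Reduction) cell,
    emulated here by a Normal cell with stride 2."""
    strides = []
    for _ in range(num_reductions):
        strides += [1] * N + [2]
    strides += [1] * N  # trailing Normal cells before the classifier head
    return strides

print(cell_schedule())  # [1, 1, 2, 1, 1, 2, 1, 1]
```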

Progressive Neural Architecture Search

Many previous approaches directly search in the space of full cells, or worse, full CNNs.


While this is a more direct approach, we argue that it is difficult to directly navigate in an exponentially large search space, especially at the beginning where there is no knowledge of what makes a good model.
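The progressive search loop can be sketched as follows. This is a hedged reconstruction of Algorithm 1: `expand`, `train_and_eval`, and `predictor` are hypothetical helpers standing in for the paper's components, not its actual API.

```python
def pnas_search(B, K, train_and_eval, expand, predictor):
    """Progressive search, sketched with hypothetical helpers:
      expand(cells)         -> every cell grown by one more block
      train_and_eval(cells) -> true validation accuracies (expensive)
      predictor             -> surrogate with .fit() and .predict()"""
    cells = expand([[]])                 # enumerate all 1-block cells
    accs = train_and_eval(cells)         # train and evaluate them
    predictor.fit(cells, accs)           # fit the surrogate on (cell, acc)
    for b in range(2, B + 1):
        candidates = expand(cells)                    # grow by one block
        scores = predictor.predict(candidates)        # cheap surrogate scores
        ranked = sorted(zip(scores, candidates),
                        key=lambda sc: sc[0], reverse=True)
        cells = [c for _, c in ranked[:K]]            # prune to the K best
        accs = train_and_eval(cells)                  # train only survivors
        predictor.fit(cells, accs)                    # refine the surrogate
    return cells
```

Only the K surviving candidates at each level are ever trained, which is where the savings over searching the full space come from.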



Performance Prediction with Surrogate Model

Requirements of the Predictor

  • Handle variable-sized inputs
  • Correlated with true performance
  • Sample efficiency
  • The requirement that the predictor be able to handle variable-sized strings immediately suggests the use of an RNN.

Two Predictor Methods

RNN and MLP (multilayer perceptron)
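For the MLP, a variable-length cell must first be turned into a fixed-size vector. A minimal sketch of one way to do this by embedding tokens and mean-pooling; the vocabulary size, dimension, and embedding table here are illustrative assumptions, not the paper's exact encoding:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 32, 16
EMBED = rng.normal(size=(VOCAB, DIM))  # hypothetical token embedding table

def mlp_features(cell):
    """Encode a cell (a list of 5-tuple blocks) as a fixed-size vector
    by embedding every token and averaging, so cells with different
    numbers of blocks map to the same feature size."""
    tokens = np.array([t for block in cell for t in block]) % VOCAB
    return EMBED[tokens].mean(axis=0)

# Cells of different sizes yield features of identical shape.
f1 = mlp_features([(0, 1, 2, 3, 0)])
f2 = mlp_features([(0, 1, 2, 3, 0), (1, 2, 4, 5, 0)])
```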

However, since the sample size is very small, we fit an ensemble of 5 predictors. We observed empirically that this reduced the variance of the predictions.
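The ensemble trick can be sketched as follows. The 5-way split (each predictor trained on a different 4/5 of the data) follows the paper; the least-squares model is just a stand-in for an actual predictor:

```python
import numpy as np

def fit_linear(X, y):
    # Simple least-squares fit standing in for one predictor.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ensemble_predict(X, y, X_new, n_models=5):
    """Fit n_models predictors, each on a different (n-1)/n split of the
    data, and average their predictions to reduce variance."""
    n = len(X)
    folds = np.array_split(np.random.default_rng(0).permutation(n), n_models)
    preds = []
    for held_out in folds:
        keep = np.setdiff1d(np.arange(n), held_out)
        preds.append(X_new @ fit_linear(X[keep], y[keep]))
    return np.mean(preds, axis=0)
```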



Performance of the Surrogate Predictors

we train the predictor on the observed performance of cells with up to b blocks, but we apply it to cells with b+1 blocks.


We therefore consider predictive accuracy both for cells with sizes that have been seen before (but which have not been trained on), and for cells which are one block larger than the training data.
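Predictive accuracy in this setting is about ranking: the surrogate only needs to order candidates correctly, not predict exact accuracies. The paper reports rank correlations for its predictors; a minimal numpy sketch (assuming no ties):

```python
import numpy as np

def spearman(pred, true):
    """Spearman rank correlation between predicted and true accuracies:
    Pearson correlation computed on the ranks of each list."""
    r_pred = np.argsort(np.argsort(pred))
    r_true = np.argsort(np.argsort(true))
    return np.corrcoef(r_pred, r_true)[0, 1]

print(spearman([0.1, 0.2, 0.3], [70.0, 80.0, 90.0]))  # 1.0 (same ordering)
```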


From the set of all cells with B = b, randomly select 10K cells to form the dataset \(U_{b,1:R}\), and train each for 20 epochs.

Randomly select K = 256 models (each of size b) from \(U_{b,1:R}\) to generate a training set \(S_{b,t,1:K}\).



We now use this random dataset to evaluate the performance of the predictors using the pseudocode in Algorithm 2, where A(H) returns the true validation set accuracies of the models in some set H.



We see that the predictor performs well on models from the training set, but not so well when predicting larger models. However, performance does increase as the predictor is trained on more (and larger) cells.



We see that for predicting the training set, the RNN does better than the MLP, but for predicting the performance on unseen larger models (which is the setting we care about in practice), the MLP seems to do slightly better.



The main contribution of this work is to show how we can accelerate the search for good CNN structures by using progressive search through the space of increasingly complex graphs, combined with a learned prediction function to efficiently identify the most promising models to explore.

A learned predictor is used to identify the most promising networks. (A predictor network P is introduced to search for the best structure of the target network, e.g. network C searches for the best structure of network B, which in turn searches for the best structure of network A: a nested search.)

The resulting models achieve the same level of performance as previous work but with a fraction of the computational cost.


