Provable Advantage of Curriculum Learning on Boolean Targets
Experimental results have shown that curriculum learning, i.e., presenting simpler examples before more complex ones, can improve the efficiency of learning. In this talk, we prove a separation result in the number of training steps between curriculum and standard learning on a common sample distribution. We first focus on parity targets and show that, if the data distribution is a mixture of sparse and dense inputs, there exists a regime in which a 2-layer ReLU neural network trained by a curriculum noisy-GD (or SGD) algorithm that uses sparse examples first can learn parities of sufficiently large degree, whereas any fully connected neural network of possibly larger width or depth trained by noisy-GD on the unordered samples cannot learn without additional training steps. Beyond parities, we show that, under additional constraints on the sparsity parameters, using sparse examples first facilitates the learning of any k-sparse Boolean target. On the experimental side, we present results supporting the qualitative separation beyond the specific regime of the theoretical results, as well as a separation in terms of sample complexity. Based on two joint works with E. Abbé, A. Lotfi and E. Mossel.
Area: CS61 - Deep learning theory (Alberto Bietti)
Keywords: Learning, k-sparse Boolean targets
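The following is a minimal, self-contained NumPy sketch of the setup described in the abstract, not the construction used in the proofs: a k-sparse parity target over {-1,+1}^d, a mixture of "sparse" and "dense" inputs, and a 2-layer ReLU network trained by noisy SGD either with the sparse examples first (curriculum) or on the shuffled mixture (standard). All concrete values (d, k, width m, bias p_minus, step size, noise level, sample sizes) are illustrative assumptions, as is the interpretation of "sparse" as inputs whose coordinates are biased toward +1.

# Illustrative sketch (assumed parameters throughout), not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

d, k = 30, 4                     # input dimension and parity degree (assumed)
S = np.arange(k)                 # support of the parity target
m, lr, noise = 256, 0.05, 0.01   # width, step size, gradient-noise scale (assumed)

def sample(n, p_minus):
    # Inputs in {-1,+1}^d: each coordinate is -1 with prob p_minus (0.5 = dense/uniform).
    x = np.where(rng.random((n, d)) < p_minus, -1.0, 1.0)
    y = np.prod(x[:, S], axis=1)          # parity label chi_S(x)
    return x, y

def init():
    return {"W": rng.normal(0, 1 / np.sqrt(d), (m, d)),
            "b": np.zeros(m),
            "a": rng.normal(0, 1 / np.sqrt(m), m)}

def train(theta, x, y, epochs=3):
    # Noisy SGD on the square loss for f(x) = a . relu(Wx + b).
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            h = np.maximum(theta["W"] @ x[i] + theta["b"], 0.0)
            err = theta["a"] @ h - y[i]
            ga = err * h
            gh = err * theta["a"] * (h > 0)
            theta["a"] -= lr * (ga + noise * rng.normal(size=m))
            theta["b"] -= lr * (gh + noise * rng.normal(size=m))
            theta["W"] -= lr * (np.outer(gh, x[i]) + noise * rng.normal(size=(m, d)))
    return theta

def accuracy(theta, x, y):
    h = np.maximum(x @ theta["W"].T + theta["b"], 0.0)
    return np.mean(np.sign(h @ theta["a"]) == y)

# Mixture sample: half sparse (biased toward +1), half dense (uniform).
xs, ys = sample(2000, p_minus=0.1)
xd, yd = sample(2000, p_minus=0.5)

# Curriculum: sparse examples first, then dense; standard: same samples shuffled together.
cur = train(train(init(), xs, ys), xd, yd)
mix_x, mix_y = np.vstack([xs, xd]), np.concatenate([ys, yd])
perm = rng.permutation(len(mix_x))
std = train(init(), mix_x[perm], mix_y[perm], epochs=3)   # same total step count

xt, yt = sample(2000, p_minus=0.5)   # evaluate on dense (uniform) inputs
print("curriculum acc:", accuracy(cur, xt, yt), " standard acc:", accuracy(std, xt, yt))

The toy scales above are far from the regime of the theorem, so the script is meant only to make the curriculum-vs-standard protocol concrete, not to reproduce the separation.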