11:00 Alain Rossier: Asymptotic analysis of deep residual networks
Abstract: Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
Further, we relate the training of deep residual networks to the field of stochastic optimal control and describe the backward equation for the loss function. We study the similarities and differences between the backward equation and the classical backpropagation algorithm.
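As a point of reference for the ResNet-ODE correspondence discussed above, here is a minimal sketch (not the speaker's code) of a toy residual network whose update x_{k+1} = x_k + (1/L) tanh(W_k x_k) is an explicit-Euler step of dx/dt = tanh(W(t) x) whenever the weights W_k trace out a smooth function of k/L. The 1/L depth scaling and the smooth weight profile are assumptions of this sketch, not claims about the trained networks studied in the talk.

    import numpy as np

    def resnet_forward(x, weights, depth):
        # Toy residual network: x_{k+1} = x_k + (1/depth) * tanh(W_k @ x_k).
        # With step size h = 1/depth and weights W_k sampled from a smooth
        # function W(k/depth), each layer is one explicit-Euler step of the
        # ODE dx/dt = tanh(W(t) x).
        h = 1.0 / depth
        for W in weights:
            x = x + h * np.tanh(W @ x)
        return x

    # Example: weights given by a smooth function of the layer index,
    # W_k = cos(pi k/L) A + sin(pi k/L) B, so a neural-ODE limit applies.
    rng = np.random.default_rng(0)
    dim, depth = 4, 64
    A, B = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
    weights = [np.cos(np.pi * k / depth) * A + np.sin(np.pi * k / depth) * B
               for k in range(depth)]
    x0 = rng.normal(size=dim)
    print(resnet_forward(x0, weights, depth))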
11:45 Matheus Manzatto De Castro: Existence and Uniqueness of Quasi-stationary and Quasi-ergodic Measures for Absorbing Markov Processes.
Abstract: A central question for absorbing Markov processes is the existence and uniqueness of quasi-stationary and quasi-ergodic measures, but known sufficient conditions remain quite restrictive in a general setting. In this talk, we motivate and establish the existence and uniqueness of quasi-stationary and quasi-ergodic measures for almost-surely absorbed Markov processes under mild conditions on their evolution.
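For readers unfamiliar with the terminology, the standard definitions (in the usual notation, with $(X_t)$ the process and $\tau$ its absorption time; these are not taken from the abstract) are: a probability measure $\nu$ is quasi-stationary if, for every measurable set $A$ and every $t \ge 0$,
\[
  \mathbb{P}_\nu\left(X_t \in A \mid t < \tau\right) = \nu(A),
\]
and a probability measure $m$ is quasi-ergodic if, for the relevant initial conditions $x$,
\[
  \lim_{t \to \infty} \mathbb{E}_x\!\left[\frac{1}{t}\int_0^t \mathbf{1}_A(X_s)\,\mathrm{d}s \;\middle|\; t < \tau\right] = m(A).
\]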
2:00-3:00 Patrick Kidger: Neural Differential Equations in Machine Learning
Abstract: Neural Differential Equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures (e.g. residual networks, recurrent networks, StyleGAN2, coupling layers) are discretisations. By treating differential equations as a learnt component of a differentiable computation graph, NDEs extend current physical modelling techniques whilst integrating tightly with current deep learning practice.
NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. They are particularly suitable for tackling dynamical systems, time series problems, and generative problems.
This talk will offer a dedicated introduction to the topic, with examples including neural ordinary differential equations (e.g. to model unknown physics), neural controlled differential equations ("continuous recurrent networks"; e.g. to model functions of time series), and neural stochastic differential equations (e.g. to model time series themselves). If time allows, I will discuss other recent work, such as novel numerical neural differential equation solvers. This talk includes joint work with Ricky T.Q. Chen, Xuechen Li, James Foster, and James Morrill.
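As an illustration of the "continuous recurrent network" viewpoint mentioned above, the following is a minimal sketch (not the speaker's implementation, and deliberately avoiding any particular NDE library) of an explicit-Euler discretisation of a neural controlled differential equation dz = f(z) dX(t), with a fixed random map standing in for a trained vector field:

    import numpy as np

    def neural_cde_euler(z0, X, f):
        # Explicit-Euler sketch of the neural CDE dz = f(z) dX(t):
        #   z_{k+1} = z_k + f(z_k) @ (X_{k+1} - X_k),
        # where X is a (T, c) array of control-path values (e.g. an
        # interpolated time series) and f maps R^d to (d, c) matrices.
        z = z0
        for k in range(len(X) - 1):
            z = z + f(z) @ (X[k + 1] - X[k])
        return z

    # Toy vector field: a fixed random linear-then-tanh map standing in
    # for a trained network (an assumption of this sketch).
    rng = np.random.default_rng(0)
    d, c, T = 8, 3, 50
    W = rng.normal(size=(d * c, d)) / np.sqrt(d)
    f = lambda z: np.tanh(W @ z).reshape(d, c)

    X = np.cumsum(rng.normal(size=(T, c)), axis=0)  # a rough control path
    z0 = np.zeros(d)
    print(neural_cde_euler(z0, X, f))

Driving the hidden state by increments of the data path X, rather than by time alone, is what gives the "continuous recurrent network" its ability to ingest irregularly sampled time series.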