# Electives

During terms 2 and 3 of their first year, students are required to take three elective courses from a selection provided by Oxford and Imperial.

**List of elective courses (Term 2: Jan-March 2020)**

## Oxford Mathematical Institute

**Course Overview:**

This course will serve as an introduction to optimal transportation theory, its application in the analysis of PDE, and its connections to the macroscopic description of interacting particle systems.

**Learning Outcomes:**

Students will become familiar with the Monge-Kantorovich problem and transport distances; the derivation of macroscopic models via the mean-field limit and their analysis based on contractivity of transport distances; the dynamic interpretation and geodesic convexity; and a brief introduction to gradient flows, with examples.
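For orientation, the Monge-Kantorovich problem and the transport distances it induces can be written as follows (standard notation, not taken from the course materials):

```latex
% Kantorovich's formulation: minimise the average transport cost over
% all couplings \pi of the marginals \mu and \nu.
W_p(\mu,\nu) \;=\; \left( \inf_{\pi \in \Pi(\mu,\nu)}
    \int_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^p \,\mathrm{d}\pi(x,y) \right)^{1/p},
\qquad 1 \le p < \infty ,
```

where \(\Pi(\mu,\nu)\) denotes the set of probability measures on \(\mathbb{R}^d \times \mathbb{R}^d\) with marginals \(\mu\) and \(\nu\); the minimisers are the optimal transport plans, and \(W_p\) is the transport (Wasserstein) distance.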

**Course Synopsis:**

- Interacting Particle Systems & PDE (2 hours)
  - Granular Flow Models and McKean-Vlasov Equations.
  - Nonlinear Diffusion and Aggregation-Diffusion Equations.
- Optimal Transportation: The metric side (4 hours)
  - Functional Analysis tools: weak convergence of measures. Prokhorov's Theorem. Direct Method of Calculus of Variations. (1 hour)
  - Monge Problem. Kantorovich Duality. (1.5 hours)
  - Transport distances between measures: properties. The real line. Probabilistic Interpretation: couplings. (1.5 hours)
- Mean Field Limit & Couplings (4 hours)
  - Dobrushin approach: derivation of the Aggregation Equation. (1.5 hours)
  - Sznitman Coupling Method for the McKean-Vlasov equation. (1.5 hours)
  - Boltzmann Equation for Maxwellian molecules: Tanaka Theorem. (1 hour)
- Gradient Flows: Aggregation-Diffusion Equations (6 hours)
  - Brenier's Theorem and Dynamic Interpretation of optimal transport. Otto's calculus. (2 hours)
  - McCann's Displacement Convexity: Internal, Interaction and Confinement Energies. (2 hours)
  - Gradient Flow approach: Minimizing movements for the (McKean-)Vlasov equation. Properties of the variational scheme. Connection to mean-field limits. (2 hours)

**Reading List:**

- F. Golse, *On the Dynamics of Large Particle Systems in the Mean Field Limit*, Lecture Notes in Applied Mathematics and Mechanics 3, Springer, 2016.
- L. C. Evans, *Weak Convergence Methods for Nonlinear Partial Differential Equations*, CBMS Regional Conference Series in Mathematics 74, AMS, 1990.
- F. Santambrogio, *Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling*, Progress in Nonlinear Differential Equations and Their Applications, Birkhäuser, 2015.
- C. Villani, *Topics in Optimal Transportation*, AMS Graduate Studies in Mathematics, 2003.

*Please note that e-book versions of many books in the reading lists can be found on SOLO*

**Further Reading:**

- L. Ambrosio, G. Savaré, *Handbook of Differential Equations: Evolutionary Equations*, Volume 3-1, 2007.
- C. Villani, *Optimal Transport: Old and New*, Springer, 2009.

**General Prerequisites:** Basic linear algebra (such as eigenvalues and eigenvectors of real matrices), multivariate real analysis (such as norms, inner products, multivariate linear and quadratic functions, basis) and multivariable calculus (such as Taylor expansions, multivariate differentiation, gradients).

**Course Overview:** The solution of optimal decision-making and engineering design problems in which the objective and constraints are nonlinear functions of potentially (very) many variables is required on an everyday basis in the commercial and academic worlds. A closely related subject is the solution of nonlinear systems of equations, as well as least-squares and data-fitting problems, which occur in almost every instance where observations or measurements are available for modelling a continuous process or phenomenon, such as in weather forecasting. The mathematical analysis of such optimization problems and of classical and modern methods for their solution is fundamental for understanding existing software and for developing new techniques for practical optimization problems at hand.

**More details:** https://courses.maths.ox.ac.uk/node/42762

**Learning Outcomes:**

Students will learn how some of the various different ensembles of random matrices are defined. They will encounter some examples of the applications these have in Data Science, modelling Complex Quantum Systems, Mathematical Finance, Network Models, Numerical Linear Algebra, and Population Dynamics. They will learn how to analyse eigenvalue statistics, and see connections with other areas of mathematics and physics, including combinatorics, number theory, and statistical mechanics.

**Course Synopsis:**

Introduction to matrix ensembles, including Wigner and Wishart random matrices, and the Gaussian and Circular Ensembles. Overview of connections with Data Science, Complex Quantum Systems, Mathematical Finance, Network Models, Numerical Linear Algebra, and Population Dynamics. (1 lecture)

Statement and proof of Wigner’s Semicircle Law; statement of Girko’s Circular Law; applications to Population Dynamics (May’s model). (3 lectures)
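As a quick numerical illustration of the semicircle law (a sketch, not part of the course materials): the eigenvalues of an n × n GOE matrix, scaled by √n, fill out the interval [-2, 2] with semicircular density.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# GOE-style symmetric matrix: i.i.d. Gaussian entries, symmetrised.
a = rng.standard_normal((n, n))
h = (a + a.T) / np.sqrt(2)

# Wigner scaling: the spectrum of H / sqrt(n) concentrates on [-2, 2].
eigs = np.linalg.eigvalsh(h) / np.sqrt(n)

inside = np.mean(np.abs(eigs) <= 2.1)
print(f"fraction of eigenvalues in [-2.1, 2.1]: {inside:.3f}")
```

The empirical histogram of `eigs` approaches the density sqrt(4 - x^2) / (2π); for instance, the fraction of eigenvalues in [-1, 1] should be close to 0.609, the semicircle mass of that interval.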

Statement and proof of the Marchenko-Pastur Law using the Stieltjes and R-transforms; applications to Data Science and Mathematical Finance. (3 lectures)

Derivation of the Joint Eigenvalue Probability Density for the Gaussian and Circular Ensembles; method of orthogonal polynomials; applications to eigenvalue statistics in the large-matrix limit; behaviour in the bulk and at the edge of the spectrum; universality; applications to Numerical Linear Algebra and Complex Quantum Systems. (5 lectures)

Dyson Brownian Motion. (2 lectures)

Connections to other problems in mathematics, including the longest increasing subsequence problem; distribution of zeros of the Riemann zeta-function; topological genus expansions. (2 lectures)

**Reading List:**

- M.L. Mehta, *Random Matrices* (Elsevier, Pure and Applied Mathematics Series)
- G.W. Anderson, A. Guionnet, O. Zeitouni, *An Introduction to Random Matrices* (Cambridge Studies in Advanced Mathematics)
- E.S. Meckes, *The Random Matrix Theory of the Classical Compact Groups* (Cambridge University Press)
- G. Akemann, J. Baik & P. Di Francesco, *The Oxford Handbook of Random Matrix Theory* (Oxford University Press)
- G. Livan, M. Novaes & P. Vivo, *Introduction to Random Matrices* (Springer Briefs in Mathematical Physics)

*Please note that e-book versions of many books in the reading lists can be found on SOLO*

**Further Reading:**

- T. Tao, *Topics in Random Matrix Theory* (AMS Graduate Studies in Mathematics)

**General Prerequisites:** Integration and measure theory, martingales in discrete and continuous time, stochastic calculus. Functional analysis is useful but not essential.

**Course Overview:** Stochastic analysis and partial differential equations are intricately connected. This is exemplified by the celebrated deep connections between Brownian motion and the classical heat equation, but this is only a very special case of a general phenomenon. We explore some of these connections, illustrating the benefits to both analysis and probability.

**Course Synopsis:** Feller processes and semigroups. Resolvents and generators. Hille-Yosida Theorem (without proof). Diffusions and elliptic operators, convergence and approximation. Stochastic differential equations and martingale problems. Duality. Speed and scale for one dimensional diffusions. Green's functions as occupation densities. The Dirichlet and Poisson problems. Feynman-Kac formula.
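The Brownian-motion/heat-equation connection in the overview can be demonstrated numerically (a sketch under standard conventions, not course material): the solution of u_t = ½ u_xx with u(0, x) = f(x) is u(t, x) = E[f(x + B_t)], which a Monte Carlo average approximates directly.

```python
import numpy as np

rng = np.random.default_rng(1)

def heat_mc(f, x, t, n_samples=200_000):
    """Monte Carlo estimate of u(t, x) = E[f(x + B_t)], solving u_t = u_xx / 2."""
    b_t = rng.normal(0.0, np.sqrt(t), size=n_samples)
    return f(x + b_t).mean()

# For f(x) = exp(-x^2) the exact solution is a Gaussian convolution:
# u(t, x) = exp(-x^2 / (1 + 2 t)) / sqrt(1 + 2 t).
f = lambda y: np.exp(-y**2)
x, t = 0.3, 0.5
approx = heat_mc(f, x, t)
exact = np.exp(-x**2 / (1 + 2 * t)) / np.sqrt(1 + 2 * t)
print(approx, exact)
```

With 200,000 samples the statistical error is on the order of 10^-3, so the estimate matches the closed-form solution to about two decimal places.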

**More details:** https://courses.maths.ox.ac.uk/node/42876

**General Prerequisites:** Part B Graph Theory and Part A Probability. C8.3 Combinatorics is not an essential prerequisite for this course, though it is a natural companion for it.

**Course Overview:** Probabilistic combinatorics is a very active field of mathematics, with connections to other areas such as computer science and statistical physics. Probabilistic methods are essential for the study of random discrete structures and for the analysis of algorithms, but they can also provide a powerful and beautiful approach for answering deterministic questions. The aim of this course is to introduce some fundamental probabilistic tools and present a few applications.

**Course Synopsis:** First-moment method, with applications to Ramsey numbers, and to graphs of high girth and high chromatic number. Second-moment method, threshold functions for random graphs. Lovász Local Lemma, with applications to two-colourings of hypergraphs, and to Ramsey numbers. Chernoff bounds, concentration of measure, Janson's inequality. Branching processes and the phase transition in random graphs. Clique and chromatic numbers of random graphs.
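The first-moment argument for Ramsey numbers can be checked mechanically (a sketch of the standard Erdős bound, not course material): if C(n, k) · 2^(1 - C(k, 2)) < 1, the expected number of monochromatic K_k's in a uniformly random 2-colouring of K_n is below 1, so some colouring has none and R(k, k) > n.

```python
from math import comb

def ramsey_lower_bound(k):
    """Largest n for which the expected number of monochromatic K_k's in a
    random 2-colouring of K_n is below 1, giving R(k, k) > n."""
    n = k
    while comb(n, k) * 2 ** (1 - comb(k, 2)) < 1:
        n += 1
    return n - 1

print(ramsey_lower_bound(4))  # prints 6: the bound gives R(4,4) > 6
```

The bound grows roughly like 2^(k/2), far below the true Ramsey numbers for small k (R(4, 4) = 18), which is exactly the gap the course's sharper tools address.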

**More details:** https://courses.maths.ox.ac.uk/node/42891

**General Prerequisites:** Analysis: basic knowledge of differential equations, measure and integration, basic complex analysis, conformal map theory (you might consider taking the C4.8 course in the first term or reading the lecture notes). Probability: martingales, Itô formula. Some knowledge of lattice models such as percolation, loop-erased random walk, the Ising model etc. will be beneficial but is not required. All the necessary parts will be covered in the lectures.

**Course Overview:** The Schramm-Loewner Evolution (SLE) was introduced in 1998 in order to describe all possible conformally invariant scaling limits that appear in many lattice models of statistical physics. Since then the subject has received a lot of attention and developed into a thriving area of research in its own right, with many interesting connections to other areas of mathematics and physics. Beyond the aforementioned lattice models, it is now related to many other areas including the theory of 'loop soups', the Gaussian Free Field, and Liouville Quantum Gravity. The emphasis of the course will be on the basic properties of SLE and how SLE can be used to prove the existence of a conformally invariant scaling limit for lattice models.

**Course Synopsis:**

1) (2 lectures) A quick recap of the necessary background from complex analysis. We will go through the Riemann mapping theorem and basic properties of univalent functions both in the unit disc and in the upper half plane. In these lectures I will give the main results and connections between them but not the proofs.

2) (4 lectures) Half-plane capacity, Beurling estimates and (deterministic) Loewner Evolution. We will show that any 'nice' curve can be described by a Loewner Evolution and will study the main properties of a Loewner Evolution driven by a measure.

3) (6 lectures) Definition of Schramm-Loewner Evolution and Schramm's principle stating that SLEs are the only conformally invariant random curves satisfying the so-called 'domain Markov property'. We will study the main properties of SLE. In particular, we will study its phase transitions and two special cases in which SLE has the 'locality' property and the 'restriction' property.

4) (4 lectures) We will show that the crossing probability for critical percolation on the triangular lattice has a conformally invariant scaling limit (Cardy's formula), and prove that this implies that the percolation interfaces converge to SLE curves.

**More details:** https://courses.maths.ox.ac.uk/node/44462

**General Prerequisites:** Part A Probability and Part A Integration are required. B8.1 (Measure, Probability and Martingales), B8.2 (Continuous Martingales and Stochastic Calculus) and C8.1 (Stochastic Differential Equations) are desirable, but not essential.

**Course Overview:** The convergence theory of probability distributions on path space is an essential part of modern probability and stochastic analysis, allowing the development of diffusion approximations and the study of scaling limits in many settings. The theory of large deviations is an important aspect of limit theory in probability, as it enables a description of the probabilities of rare events. The emphasis of the course will be on the development of the necessary tools for proving various limit results and for the analysis of large deviations; these tools have universal value. The topics are fundamental within probability and stochastic analysis and have extensive applications in current research on random systems, statistical mechanics, functional analysis, PDEs, quantum mechanics, quantitative finance and other areas.

**Course Synopsis:**

1) (2 lectures) We will recall metric spaces and introduce Polish spaces and probability measures on metric spaces. Weak convergence of probability measures and tightness; Prohorov's theorem on tightness of probability measures; Skorohod's representation theorem for weak convergence.

2) (2 lectures) The criterion of pre-compactness for distributions on continuous path spaces, martingales and compactness.

3) (4 lectures) Skorohod's topology and metric on the space D[0,∞) of right-continuous paths with left limits; basic properties such as completeness and separability; weak convergence and pre-compactness of distributions on D[0,∞). D. Aldous' pre-compactness criterion via stopping times.

4) (4 lectures) First examples - Cramér's theorem for finite dimensional distributions, Sanov's theorem. Schilder's theorem for the large deviation principle for Brownian motion in small time, law of the iterated logarithm for Brownian motion.

5) (4 lectures) General tools in large deviations. Rate functions, good rate functions, large deviation principles, weak large deviation principles and exponential tightness. Varadhan's contraction principle, functional limit theorems.

**More details:** https://courses.maths.ox.ac.uk/node/44461

**Syllabus**

Lecture 1: Modelling: least squares, matrix completion, sparse inverse covariance estimation, sparse principal components, sparse plus low rank matrix decomposition, support vector machines.

Lecture 2: Further modelling: logistic regression, deep learning. Mathematical preliminaries: Global and local optimisers, convexity, subgradients, optimality conditions.

Lecture 3: Preliminaries: Proximal operators, convergence rates.

Lecture 4: Steepest descent method and its convergence analysis in the general case, the convex case and the strongly convex case.

Lecture 5: Prox-gradient methods.

Lecture 6: Accelerating gradient methods: heavy ball method, Nesterov acceleration.

Lecture 7: Oracle complexity and the stochastic gradient descent algorithm.

Lecture 8: Variance reduced stochastic gradient descent.
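The steepest-descent iteration from Lecture 4 can be sketched on the least-squares model problem from Lecture 1 (illustrative only; the step size 1/L is the standard choice for an L-smooth objective, and the problem data here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares model problem: minimise f(x) = 0.5 * ||A x - b||^2.
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

# f is L-smooth with L = largest eigenvalue of A^T A.
L = np.linalg.eigvalsh(A.T @ A).max()

x = np.zeros(10)
for _ in range(2000):
    grad = A.T @ (A @ x - b)   # gradient of f at x
    x -= grad / L              # steepest descent with fixed step 1/L

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))
```

Because A has full column rank, f is strongly convex and the iterates converge linearly to the least-squares solution, matching the strongly convex rate analysed in Lecture 4.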

**Literature**

- S.J. Wright, “Optimization Algorithms for Data Analysis”, http://www.optimization-online.org/DB_FILE/2016/12/5748.pdf
- L. Bottou, F.E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning”, SIAM Review, 59(1): 65-98, 2017.
- Z. Allen-Zhu, “Katyusha: The first direct acceleration of stochastic gradient methods”, The Journal of Machine Learning Research, 18(1): 8194-8244, 2017.

## Department of Statistics, University of Oxford

The aim of the lectures is to introduce modern stochastic models in mathematical population genetics and give examples of real world applications of these models.

Stochastic and graph theoretic properties of coalescent and genealogical trees are studied in the first eight lectures.

Diffusion processes and extensions to model additional key biological phenomena are studied in the second eight lectures.

**Aims and Objectives:** Many data come in the form of networks, for example friendship data and protein-protein interaction data. As the data usually cannot be modelled using simple independence assumptions, their statistical analysis poses many challenges. The course will give an introduction to the main problems and the main statistical techniques used in this field. The techniques are applicable to a wide range of complex problems. The statistical analysis benefits from insights which stem from probabilistic modelling, and the course will combine both aspects.

**Synopsis:**

Exploratory analysis of networks. The need for network summaries. Degree distribution, clustering coefficient, shortest path length. Motifs.

Probabilistic models: Bernoulli random graphs, geometric random graphs, preferential attachment models, small world networks, inhomogeneous random graphs, exponential random graphs.

Small subgraphs: Stein’s method for normal and Poisson approximation. Branching process approximations, threshold behaviour, shortest path between two vertices.

Statistical analysis of networks: Sampling from networks. Parameter estimation for models. Inference from networks: vertex characteristics and missing edges. Nonparametric graph comparison: subgraph counts, subsampling schemes, MCMC methods. A brief look at community detection.

Examples: protein interaction networks, social ego-networks.
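Two of the network summaries above, the degree distribution and the global clustering coefficient, can be computed for a Bernoulli random graph in a few lines (a toy sketch, not course material); for an Erdős-Rényi graph the clustering coefficient concentrates around the edge probability p.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 0.1

# Bernoulli random graph: each edge present independently with probability p.
upper = rng.random((n, n)) < p
adj = np.triu(upper, 1)
adj = (adj | adj.T).astype(int)

degrees = adj.sum(axis=1)

# Global clustering coefficient: 3 * triangles / connected triples.
a2 = adj @ adj
triangles = np.trace(a2 @ adj) / 6            # trace(A^3) counts each triangle 6 times
triples = (a2.sum() - np.trace(a2)) / 2       # paths of length 2
clustering = 3 * triangles / triples

print(f"mean degree {degrees.mean():.1f}, clustering {clustering:.3f}")
```

Here the mean degree is close to (n - 1)p ≈ 29.9 and the clustering coefficient close to p = 0.1; real social networks typically show much higher clustering than this Bernoulli baseline, which is one motivation for the richer models in the synopsis.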

**Recommended Prerequisites:** The course requires a good level of mathematical maturity. Students are expected to be familiar with core concepts in statistics (regression models, bias-variance tradeoff, Bayesian inference), probability (multivariate distributions, conditioning) and linear algebra (matrix-vector operations, eigenvalues and eigenvectors). Previous exposure to machine learning (empirical risk minimisation, dimensionality reduction, overfitting, regularisation) is highly recommended. Students would also benefit from being familiar with the material covered in the following courses offered by the Statistics department: SB2.1 (formerly SB2a) Foundations of Statistical Inference and SB2.2 (formerly SB2b) Statistical Machine Learning.

**Aims and Objectives:** Machine learning is widely used across many scientific and engineering disciplines to construct methods that find interesting patterns and predict accurately in large datasets. This course introduces several widely used machine learning techniques and describes their underpinning statistical principles and properties. The course studies both unsupervised and supervised learning, and several advanced topics are covered in detail, including some state-of-the-art machine learning techniques. The course will also cover computational considerations of machine learning algorithms and how they can scale to large datasets.

**Synopsis:**

Convex optimisation and support vector machines. Loss functions. Empirical risk minimisation.

Kernel methods and reproducing kernel Hilbert spaces. Representer theorem. Representation of probabilities in RKHS.

Nonlinear dimensionality reduction: kernel PCA, spectral clustering.

Probabilistic and Bayesian machine learning: mixture modelling, information theoretic fundamentals, EM algorithm, Probabilistic PCA. Variational Bayes. Laplace Approximation.

Collaborative filtering models, probabilistic matrix factorisation.

Gaussian processes for regression and classification. Bayesian optimisation.

*+ Latent Dirichlet allocation [if time allows]*

**More details:** SC4 Advanced Topics in Statistical Machine Learning

## Imperial College London

This course develops the analysis of boundary value problems for elliptic and parabolic PDEs using the variational approach. It is a follow-up to ‘Function spaces and applications’ but is open to other students as well, provided they have sufficient command of analysis. An introductory partial differential equations course is not required, although certainly useful.

The course consists of three parts. The first part (divided into two chapters) develops further tools needed for the study of boundary value problems, namely distributions and Sobolev spaces. The following two parts are devoted to elliptic and parabolic equations on bounded domains. They present the variational approach and spectral theory of elliptic operators, as well as their use in the existence theory for parabolic problems. The aim of the course is to expose students to some important aspects of partial differential equation theory, aspects that will be most useful to those who go on to work with partial differential equations, whether on the theoretical side or on the numerical one.

The syllabus of the course is as follows:

- Distributions: The space of test functions. Definition and examples of distributions. Differentiation. Convolution. Convergence of distributions.
- Sobolev spaces: The space H^1. Density of smooth functions. Extension lemma. Trace theorem. The space H^1_0. Poincaré inequality. The Rellich-Kondrachov compactness theorem (without proof). Sobolev imbedding (in the simple case of an interval of R). The space H^m. Compactness and Sobolev imbedding for arbitrary dimension (statement without proof).
- Linear elliptic boundary value problems: Dirichlet and Neumann boundary value problems via the Lax-Milgram theorem. The maximum principle. Regularity (stated without proofs). Classical examples: elasticity system, Stokes system.
- Spectral Theory: compact operators in Hilbert spaces. The Fredholm alternative. Spectral decomposition of compact self-adjoint operators in Hilbert spaces. Spectral theory of linear elliptic boundary value problems.
- Linear parabolic initial-boundary value problems: Existence and uniqueness by spectral decomposition on the eigenbasis of the associated elliptic operator. Classical examples (Navier-Stokes equation).

The goal of the module is to develop a thorough understanding of how trades occur in financial markets. The main market types will be described, as well as traders' main motives for trading.

Market manipulation and high-frequency trading strategies have received a lot of attention in the press recently, so the module will illustrate them and examine recent developments in regulations that aim to limit them. Liquidity is a key theme in market microstructure, and the students will learn how to measure it and to recognise the recent increase in liquidity fragmentation and hidden, “dark” liquidity.

The Flash Crash of 6 May 2010 will be analysed as a case study of sudden loss of liquidity.

The remaining part of the module focuses on statistical analysis of market microstructure, concentrating on statistical modelling of tick-by-tick data, measurement of price impact and volatility estimation using high-frequency data.

- Electronic Markets and the Limit Order Book
- Stochastic Optimal Control (a review)
- Optimal Execution with Continuous Trading
- Optimal Execution with Limit and Market Orders
- Market Making
- Statistical Arbitrage in High-Frequency Settings (if time permits)

This is an introductory course on the theory and applications of random dynamical systems and ergodic theory. Random dynamical systems are (deterministic) dynamical systems driven by a random input. The goal will be to present a solid introduction and, time permitting, touch upon several more advanced developments in this field. The contents of the module are:

- Random dynamical systems; definition in terms of skew products and elementary examples (including iterated function systems, discrete time dynamical systems with bounded noise and stochastic differential equations).
- Introduction to random dynamical systems theory in iterated function systems context.
- Background on measure theory and probability theory.
- Introduction to Ergodic Theory: Birkhoff Ergodic Theorem and Oseledets Ergodic Theorem.
- Dynamics of random circle maps: synchronisation.
- Chaos in random dynamical systems.

Markov processes are widely used to model random evolutions with the Markov property 'given the present, the future is independent of the past'. The theory connects with many other subjects in mathematics and has vast applications.

This course is an introduction to Markov processes. We aim to build intuitions and foundations for further studies in stochastic analysis and in stochastic modelling.

Prerequisites: Measure and Integration (M345P19) is strongly recommended. A good knowledge of real analysis would be helpful (M2PM1).

Abstract:

Numerical simulations are nowadays used on a daily basis to model physical, biological, chemical or financial systems. The question of how to quantify the uncertainty in the output of these simulations is crucial in order to use these models with confidence. The objective of these lectures, at the interface between applied probability, statistics and numerical analysis, is to introduce mathematical methods to model, characterise and analyse this uncertainty. Various deterministic and stochastic sampling techniques will be introduced. We will also discuss metamodels used to build response surfaces; these are useful, for example, for analysing the sensitivity of the output of a numerical model. Finally, rare event sampling techniques will also be presented.

Here is a preliminary schedule:

- Introduction and basics about Monte Carlo methods.

- Parameter estimation: Maximum Likelihood Estimator, Bayesian approaches.

- Building metamodels using regression, Gaussian process regression.

- Sensitivity analysis.

- Reduced basis techniques and proper orthogonal decomposition.

- Low rank approximation and greedy algorithms, proper generalized decomposition.

- Risk analysis: FORM/SORM methods, quantile estimation, extreme value theory.

- Monte Carlo methods for rare events: importance sampling and splitting methods.
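The importance-sampling idea from the last item can be shown on a minimal example (a sketch using standard exponential tilting, not course material): estimating the Gaussian tail probability P(X > 4), which naive Monte Carlo almost never hits, by sampling from a shifted proposal and reweighting.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
n, a = 100_000, 4.0

# Naive Monte Carlo: only a handful of samples land in the rare event {X > a}.
naive = (rng.standard_normal(n) > a).mean()

# Importance sampling: draw from N(a, 1) and reweight by the likelihood ratio
# phi(y) / phi(y - a) = exp(-a * y + a**2 / 2).
y = rng.standard_normal(n) + a
weights = np.exp(-a * y + a**2 / 2)
is_est = (weights * (y > a)).mean()

exact = 0.5 * erfc(a / sqrt(2))   # true tail probability, about 3.17e-5
print(naive, is_est, exact)
```

Shifting the proposal mean to the boundary of the rare event makes roughly half the samples contribute, reducing the relative error from order 1 to below a percent at the same sample size.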

Exercise sheets will be provided to practice the concepts introduced in the lectures. The assessment will be done through projects, with numerical experiments to be conducted on simple examples.

Prerequisites: Ordinary differential equations, partial differential equations, real analysis, probability theory.

The course offers a bespoke introduction to the stochastic calculus required to cover the classical theoretical results of nonlinear filtering, as well as some modern numerical methods for solving the filtering problem. The first part of the course will equip the students with the necessary knowledge (e.g., Ito calculus, stochastic integration by parts, Girsanov's theorem) and skills (solving linear stochastic differential equations, analysing continuous martingales, etc.) to handle a variety of applications. The focus will be on the use of stochastic calculus in the theory and numerical solution of nonlinear filtering.

1. Martingales in Continuous Time (Doob-Meyer decomposition, L_p bounds, Brownian motion, exponential martingales, semi-martingales, local martingales, Novikov's condition)

2. Stochastic Calculus (Ito’s isometry, chain rule, integration by parts)

3. Stochastic Differential Equations (well posedness, linear SDEs, the Ornstein-Uhlenbeck process, Girsanov's Theorem)

4. Stochastic Filtering (definition, mathematical model for the signal process and the observation process)

5. The Filtering Equations (well-posedness, the innovation process, the Kalman-Bucy filter)

6. Numerical Methods (the Extended Kalman Filter, Sequential Monte Carlo methods).
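The simplest member of this family of filters can be sketched in discrete time (illustrative only; the course treats the continuous-time Kalman-Bucy analogue): a scalar Kalman filter tracking a random walk from noisy observations.

```python
import numpy as np

rng = np.random.default_rng(0)
steps, q, r = 200, 0.01, 0.5    # process and observation noise variances

# Signal: a scalar random walk.  Observation: the signal plus Gaussian noise.
x = np.cumsum(rng.normal(0, np.sqrt(q), steps))
obs = x + rng.normal(0, np.sqrt(r), steps)

m, p = 0.0, 1.0                 # filter mean and variance (prior)
means = []
for y in obs:
    p += q                      # predict: variance grows by the process noise
    k = p / (p + r)             # Kalman gain
    m += k * (y - m)            # update the mean with the innovation y - m
    p *= 1 - k                  # update the variance
    means.append(m)

means = np.array(means)
print(np.mean((means - x) ** 2), np.mean((obs - x) ** 2))
```

The filtered estimate has markedly smaller mean-squared error than the raw observations, since the gain balances the innovation against the accumulated state uncertainty.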

Malliavin Calculus is an extremely powerful tool in stochastic analysis, extending the classical notion of derivative to the space of stochastic processes.

A number of results arising from this theory turn out to provide the right framework to analyse several problems in mathematical finance.

The module will be divided into two parts. The first will concentrate on developing the theoretical tools of Malliavin Calculus, including analysis on Wiener space, the Wiener chaos decomposition, the Ornstein-Uhlenbeck semigroup and hypercontractivity, the Malliavin derivative and divergence operators, and Sobolev spaces and equivalence of norms.

The second part of the module will focus on understanding how these tools can be used to price and hedge financial derivatives, and to compute their sensitivities.

Rough path theory was developed in the 1990s in order to understand the structure and information content of a given path (be it a financial time series, a hand-drawn character or the route taken by a vehicle).

It turned out to be one of the key developments in stochastic analysis over the past 20 years and has allowed for a better understanding of (and new proofs for) many problems in this field.

The goal of this module is to provide students with a flavour of this powerful theory and to understand how it can be applied efficiently in machine learning, one of the fastest-developing techniques in the financial industry. One of the key elements in this exploration is the so-called signature of a path, whose algebraic properties, faithfulness, and inversion and asymptotic properties we shall study. We shall further see how this signature is in fact a feature set in machine learning, and illustrate these results in mathematical finance (in particular in predicting financial time series), as well as in other areas (handwriting recognition, computer vision, classification problems in medical data).
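The first two levels of the signature mentioned above can be computed directly for a piecewise-linear path (a small sketch, not course material): level 1 is the total increment, and the level-2 iterated integrals accumulate across segments via Chen's relation.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: (n_points, d) array.  Level 1 is the total increment; level 2
    accumulates iterated integrals segment by segment (Chen's relation):
    s2 += outer(s1_so_far, dx) + 0.5 * outer(dx, dx).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return s1, s2

# For a single straight segment, level 2 is half the outer product of the
# increment, so its antisymmetric part (the Levy area) vanishes.
s1, s2 = signature_level2([[0.0, 0.0], [1.0, 2.0]])
print(s1, s2)
```

The antisymmetric part of the level-2 term, the Levy area, is the first signature entry that distinguishes paths with the same endpoints, which is what makes the signature a richer feature set than the raw increments.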

## Taught Course Centre

The course runs 10-12 on Thursdays in the Taught Course Centre for 8 weeks.

Students may participate in the course in person at Imperial or via live video link in Oxford.