# Postgraduate Opportunities

## Current PhD Projects

Project title:

**Real time forecasting and monitoring of high frequency data**

Supervisor:

**Álvaro Faria**

With recent technological advances, there has been an increasing demand for statistical forecasting models that can detect and quantify patterns, assess uncertainties, produce forecasts and monitor changes in data from real-time high-frequency processes in various areas. These include short-term electricity load forecasting in energy generation, as well as wireless telemetric biosensing in healthcare, where monitoring of patients in their natural environment is desirable. Many such processes are well modelled by non-linear auto-regressive (NLAR) models that are dynamic and can be applied sequentially in real time. A number of NLAR forecasting models have been proposed in the literature, but most are non-dynamic and/or not appropriate for real-time applications.

Forecasting and monitoring data from high-frequency processes can be a multivariate non-linear time series problem. This project takes a Bayesian approach to the problem, building on recently proposed analytical state-space dynamic smooth transition autoregressive (DSTAR) models that approximate process non-linearities. DSTAR models have been shown to be promising for forecasting certain non-linear processes (as described in the reference listed below), but issues remain before the models can be usefully adopted for assimilation of high-frequency data in practice. This project aims to tackle some of the outstanding issues, such as:

· How can information from covariates be included in the DSTAR models without compromising real-time applicability?

· How can multiple cyclic behaviours of different orders be modelled effectively?

· How do alternative approximations to non-linearities improve on the existing polynomial ones? Would sequential simulation methods such as particle filtering provide appropriate answers?
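For readers unfamiliar with smooth transition autoregression, the following is a minimal sketch of a plain logistic smooth transition AR(1) process. It is not the state-space DSTAR model referred to above: the parameters here are fixed rather than dynamic, and every name and value (`phi_low`, `phi_high`, `gamma`, `c`) is an assumption chosen purely for illustration.

```python
import math
import random


def transition(s, gamma=5.0, c=0.0):
    """Logistic transition function, taking values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-gamma * (s - c)))


def star_mean(y_prev, phi_low=0.8, phi_high=-0.5, gamma=5.0, c=0.0):
    """One-step-ahead conditional mean of a logistic STAR(1) process.

    The AR coefficient moves smoothly from phi_low (when y_prev is well
    below c) to phi_high (when y_prev is well above c).
    """
    f = transition(y_prev, gamma, c)
    return ((1.0 - f) * phi_low + f * phi_high) * y_prev


def simulate(n, sigma=1.0, seed=1):
    """Simulate n observations with Gaussian noise (illustrative values only)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(star_mean(y[-1]) + rng.gauss(0.0, sigma))
    return y


if __name__ == "__main__":
    series = simulate(200)
    # Forecast the next value from the last observed one.
    print("last value:", series[-1], "forecast:", star_mean(series[-1]))
```

The non-linearity sits entirely in the transition function; the project's questions concern how such non-linearities are approximated and updated sequentially, which this static sketch does not attempt.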

Hourly electricity load data for a region in Brazil are available for the project. There may also be biosensing data available. The project will involve theoretical developments in statistical methodology, as well as a large amount of practical work requiring good statistical programming skills: current software for these models is written in R.

Project title:

**Forecasting and monitoring traffic network flows**

Supervisor:

**Catriona Queen**

Road congestion is a worldwide issue causing environmental, health and economic problems. On-line traffic data can be used as part of a traffic management system to monitor traffic flows at different locations across a network over time and to reduce congestion by taking actions such as imposing variable speed limits or diverting traffic onto alternative routes. Reliable short-term forecasting and monitoring models of traffic flows are crucial for the success of any traffic management system: this project will develop such models.

Forecasting and monitoring the traffic flows at different locations across a network over time is a multivariate time series problem. This project takes a Bayesian approach to the problem, using dynamic graphical models. These models break the multivariate problem into separate, simpler subproblems, so that model computation is simplified, even for very complex road networks. Dynamic graphical models have been shown to be promising for short-term forecasting of traffic flows (as described in the references listed below), but issues remain before the models can be used for an on-line traffic management system in practice. This project aims to tackle some of the outstanding issues, such as the following.

· Any change in traffic flows is often associated with an incident, such as a road traffic accident. Can a monitor be developed which can detect any unexpected changes in traffic flow? And can a monitor detect when a road is reaching capacity, so that congestion is likely to occur?

· When traffic is flowing freely, upstream flows affect flows downstream. In times of congestion or when there is a road block, queuing vehicles can cause the relationships between flows at different locations to change so that downstream flows can affect upstream flows. How can a dynamic graphical model accommodate these changing relationships over time? And how can these changes be detected?
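To make the monitoring idea concrete, here is a minimal sketch of one simple subproblem: a downstream flow regressed on an upstream flow with a slowly varying coefficient, updated by the standard dynamic linear model recursions, with an alarm raised when the standardised one-step forecast error is large. This is not the dynamic graphical model of the project. The known observation variance, the discount factor, the alarm threshold and the simulated flows are all assumptions made for illustration.

```python
import math
import random


def dlm_monitor(upstream, downstream, m0=0.5, c0=0.1, obs_var=4.0,
                discount=0.95, threshold=3.0):
    """Regression DLM: downstream[t] = theta[t] * upstream[t] + noise.

    theta follows a random walk handled by a discount factor.
    Returns the times at which the standardised one-step forecast
    error exceeds the threshold, plus the final coefficient estimate.
    """
    m, c = m0, c0  # posterior mean and variance of theta
    alarms = []
    for t, (x, y) in enumerate(zip(upstream, downstream)):
        r = c / discount           # prior variance after discounting
        f = x * m                  # one-step forecast
        q = x * x * r + obs_var    # one-step forecast variance
        e = y - f                  # forecast error
        if abs(e) / math.sqrt(q) > threshold:
            alarms.append(t)
        # Standard update; a practical monitor might instead intervene
        # rather than update on a flagged observation.
        a = r * x / q              # adaptive gain
        m = m + a * e
        c = r - a * a * q
    return alarms, m


def simulate_flows(n=60, incident_at=40, seed=0):
    """Hypothetical flows: 60% of upstream traffic reaches the downstream
    site, except at one 'incident' time when only 30% does."""
    rng = random.Random(seed)
    upstream = [100.0 + 10.0 * math.sin(t / 5.0) for t in range(n)]
    downstream = []
    for t, x in enumerate(upstream):
        share = 0.3 if t == incident_at else 0.6
        downstream.append(share * x + rng.gauss(0.0, 2.0))
    return upstream, downstream


if __name__ == "__main__":
    alarms, coefficient = dlm_monitor(*simulate_flows())
    print("alarm times:", alarms, "final coefficient:", coefficient)
```

A monitor of this kind reacts to a single unexpected observation; detecting a road approaching capacity, or a reversal in the direction of dependence between sites, would need more than this sketch provides.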

Minute-by-minute traffic flow data at a number of different locations at the intersection of three busy motorways near Manchester, UK, are available for the project (kindly supplied by the Highways Agency in England: http://www.highways.gov.uk/). The project will involve theoretical developments in statistical methodology, as well as a large amount of practical work requiring good statistical programming skills: current software for these models is written in R.

Project title:

**Choosing from the Menu of Distributional Families**

Supervisor:

**Chris Jones**

Families of univariate continuous distributions with parameters controlling skewness and tail weight have an underacknowledged role in modern statistical modelling: by building flexible yet parsimonious distributions into complex statistical models as marginal and conditional distributions, robustness of inference for parameters of interest can be achieved. Several general methods for generating such families of distributions already exist; the aim of this project is to compare, contrast and, if necessary, improve them.

Let g and G be the density and distribution functions of a random variable X following a symmetric distribution on ℝ, b a density on (0, 1), and W a transformation function usually satisfying a particular important constraint, with w = W'. Families of distributions on ℝ can be generated from these ingredients in various different ways. Recipes for their densities might look, for example, like:

· 2w(x)g(x) "skew-symmetric" distributions;

· w(x)g(W(x)) transformations of X;

· 2g(W^{-1}(x)) "transformation of scale" distributions, which include "two-piece" distributions;

· g(x)b(G(x)) "b-g" distributions.
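As a rough numerical illustration of two of these recipes, the sketch below takes g and G to be the standard normal density and distribution function. For the skew-symmetric recipe it assumes the skewing function w(x) = G(λx), which gives the skew-normal density; for the b-g recipe it assumes b is a Beta(a, b) density. These choices, the parameter values and the function names are illustrative assumptions and are not taken from the project description.

```python
import math


def g(x):
    """Standard normal density: the symmetric base density g."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def G(x):
    """Standard normal distribution function G."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_symmetric(x, lam=2.0):
    """2 w(x) g(x), with the assumed choice w(x) = G(lam * x)."""
    return 2.0 * G(lam * x) * g(x)


def beta_density(u, a, b):
    """Beta(a, b) density on [0, 1]; this sketch assumes a >= 1 and b >= 1."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)


def b_g(x, a=2.0, b=3.0):
    """g(x) b(G(x)), with b taken to be a Beta(a, b) density."""
    return g(x) * beta_density(G(x), a, b)


def integrate(f, lo=-10.0, hi=10.0, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    # Both recipes should give proper densities (total mass close to 1).
    print("skew-symmetric mass:", integrate(skew_symmetric))
    print("b-g mass:", integrate(b_g))
    # With lam > 0 the skew-symmetric density is skewed to the right.
    print("skew-symmetric mean:", integrate(lambda x: x * skew_symmetric(x)))
```

Setting `lam=0` in the first recipe, or `a=b=1` in the second, recovers the symmetric base density g, which is a quick check that each family contains its starting point.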

The menu of families of distributions is long, but which are best to consume? How do the properties of the various approaches compare? Can one reparametrise some families to enhance interpretability and estimability? Are these distributions, or transformations thereof, useful on other supports? How best should they be deployed in regression contexts? Is there a need for new multivariate versions of them? Do particular families have advantages for use in the Bayesian context? This project will start with a substantial review and comparison of existing methods and can then be taken in a subset of the above directions, or in others. The project is principally theoretical/methodological, but should also be approached with a view towards practical application.