# Scientific

## Conditional Sampling with Block-Triangular Transport Maps

Generative models such as Generative Adversarial Nets (GANs), Variational Autoencoders and Normalizing Flows have been very successful in the unsupervised learning task of generating samples from a high-dimensional probability distribution. However, the task of conditioning a high-dimensional distribution from limited empirical samples has attracted less attention in the literature, even though it is a central problem in Bayesian inference and supervised learning. In this talk we will discuss some ideas in this direction by viewing generative modelling as a measure transport problem. In particular, we present a simple recipe using block-triangular maps and monotonicity constraints that enables standard models such as the original GAN to perform conditional sampling. We demonstrate the effectiveness of our method on various examples ranging from synthetic test sets to image in-painting and function space inference in porous medium flow.

## The Connection Between RDEs and PDEs

Recursive distributional equations (RDEs) are ubiquitous in probability. For example, the standard Gaussian distribution can be characterized as the unique fixed point of the following RDE

$$
X = (X_1 + X_2) / \sqrt{2}
$$

among the class of centered random variables with standard deviation of 1. (The equality in the equation is in distribution; the random variables $X$, $X_1$, and $X_2$ must all be identically distributed; and $X_1$ and $X_2$ must be independent.)
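The fixed-point characterization can be checked numerically. The following is a minimal sketch, not taken from the talk: it assumes a Rademacher (±1) starting distribution, which is centered with standard deviation 1, and applies the recursion a fixed number of times. The function names `sample_rde` and `excess_kurtosis` are hypothetical. Excess kurtosis is used as a rough measure of distance from the Gaussian, for which it is 0.

```python
import math
import random
import statistics


def sample_rde(depth, rng):
    """Draw one sample after `depth` applications of X <- (X1 + X2) / sqrt(2).

    Depth 0 is the assumed starting law: a Rademacher variable (+1 or -1),
    which has mean 0 and standard deviation 1. The two recursive calls use
    fresh randomness, so X1 and X2 are independent and identically distributed.
    """
    if depth == 0:
        return rng.choice((-1.0, 1.0))
    x1 = sample_rde(depth - 1, rng)
    x2 = sample_rde(depth - 1, rng)
    return (x1 + x2) / math.sqrt(2)


def excess_kurtosis(samples):
    """Empirical excess kurtosis; it is 0 for a Gaussian distribution."""
    mean = statistics.fmean(samples)
    var = statistics.fmean((x - mean) ** 2 for x in samples)
    fourth = statistics.fmean((x - mean) ** 4 for x in samples)
    return fourth / var ** 2 - 3.0


if __name__ == "__main__":
    rng = random.Random(0)
    for depth in (0, 2, 4, 6):
        draws = [sample_rde(depth, rng) for _ in range(4000)]
        print(depth, round(statistics.pvariance(draws), 3),
              round(excess_kurtosis(draws), 3))
```

Each application of the recursion preserves the mean and variance and halves the excess kurtosis, so the printed kurtosis values should move from about -2 toward 0 as the depth grows, up to sampling noise.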

Recently, it has been discovered that the dynamics of certain recursive distributional equations can be solved using tools from numerical analysis on the convergence of approximation schemes for PDEs. In particular, the framework for studying stability and convergence for viscosity solutions of nonlinear second-order equations, due to Crandall-Lions, Barles-Souganidis, and others, can be used to prove distributional convergence for certain families of RDEs, which can be interpreted as tree-valued stochastic processes. I will survey some of these results, as well as the (current) limitations of the method, and our hope for further interplay between these two research areas.

## A reproducing kernel Hilbert space framework for functional classification

The intrinsic infinite-dimensional nature of functional data creates a bottleneck in the application of traditional classifiers to functional settings. These classifiers are generally either unable to generalize to infinite dimensions or have poor performance due to the curse of dimensionality. To address this concern, we propose building a distance-weighted discrimination (DWD) classifier on scores obtained by projecting data onto one specific direction. We choose this direction by minimizing, over a reproducing kernel Hilbert space, an empirical risk function containing the DWD classifier loss function. Our proposed classifier avoids overfitting and enjoys the appealing properties of DWD classifiers. We further extend this framework to accommodate functional data classification problems where scalar covariates are involved. In contrast to previous work, we establish a non-asymptotic estimation error bound on the relative misclassification rate. Through simulation studies and a real-world application, we demonstrate that the proposed classifier performs favourably relative to other commonly used functional classifiers in terms of prediction accuracy in finite-sample settings.

## Optimal Study Design for Reducing Variances of Coefficient Estimators in Change-Point Models

In longitudinal studies, we measure the same variables at multiple time-points to track their change over time. The exact data collection schedules (i.e., time of participants' visits) are often pre-determined to accommodate the ease of project management and compliance. Therefore, it is common to schedule those visits at equally spaced time intervals. However, recent publications based on simulated experiments indicate that the power of studies and the precision of model parameter estimators are related to the participants' visiting scheme. So, in this work, we investigate how to schedule participants' visits to better study the accelerated cognitive decline of senior adults, where a broken-stick model is often applied. We formulate the optimal design problem of scheduling participants' visits as a high-dimensional optimization problem and derive its approximate solution by adding reasonable constraints. Based on this approximation, we propose a novel design of the visiting scheme that aims to maximize the power (i.e., reduce the variance of estimators) in identifying the onset of accelerated decline. Using both simulation studies and evidence from real data, we demonstrate that our design outperforms the standard equally spaced one when we have strong prior knowledge of the change-points. This novel design helps researchers plan their longitudinal studies with improved power in detecting pattern change without collecting extra data. Also, this individual-level scheduling system helps monitor seniors' cognitive function and, therefore, benefits the development of personal-level treatment for cognitive decline, which agrees with the trend of the health care system.

## Variational Autoencoders: an introduction to new applications and a new regularization approach

In this presentation, we discuss the Variational Autoencoder (VAE): a latent variable model emerging from the machine learning community. To begin, we introduce the theoretical foundations of the model and its relationship with well-established statistical models. Then, we discuss how we used VAEs to solve two widely different problems. First, we tackled a classic statistical problem, survival analysis, and then classic machine learning problems, image analysis and image generation. We conclude with a short discussion of our latest research project, where we establish a new metric for the evaluation or regularization of latent variable models such as Gaussian Mixture Models and VAEs.

## Programmable Human Organoids via Genetic Design and Engineering

Synthetic biology offers bottom-up engineering strategies that aim to understand complex systems via design-build-test cycles. In development, gene regulatory networks give rise to collective cellular behaviors with multicellular forms and functions. Here, I will introduce a synthetic developmental biology approach for tissue engineering. It involves building developmental trajectories in stem cells via programmed gene circuits and network analysis. The outcome of our approach is to decode our own development and to create programmable organoids with both natural and artificial designs and augmented functions.

## High-Order Accuracy Computation of Coupling Functions for Strongly Coupled Oscillators

We develop a general framework for identifying phase reduced equations for finite populations of coupled oscillators that is valid far beyond the weak coupling approximation. This strategy represents a general extension of the theory from [Wilson and Ermentrout, Phys. Rev. Lett. 123, 164101 (2019)] and yields coupling functions that are valid to higher-order accuracy in the coupling strength for arbitrary types of coupling (e.g., diffusive, gap-junction, chemical synaptic). These coupling functions can be used to understand the behavior of potentially high-dimensional, nonlinear oscillators in terms of their phase differences. The proposed formulation accurately replicates nonlinear bifurcations that emerge as the coupling strength increases and is valid in regimes well beyond those that can be considered using classic weak coupling assumptions. We demonstrate the performance of our approach through two examples. First, we use a diffusively coupled complex Ginzburg-Landau (CGL) model and demonstrate that our theory accurately predicts bifurcations far beyond the range of existing coupling theory. Second, we use a realistic conductance-based model of a thalamic neuron and show that our theory correctly predicts asymptotic phase differences for non-weak synaptic coupling. In both examples, our theory accurately captures model behaviors that weak coupling theories cannot.

### Speaker Biography

Youngmin Park, Ph.D., is currently a PIMS Postdoc at the University of Manitoba under the supervision of Prof. Stéphanie Portet. He received his PhD in Mathematics from the University of Pittsburgh in 2018, where he applied dynamical systems methods to problems in neuroscience. His first postdoc involved auditory neuroscience research at the University of Pennsylvania in the Department of Otorhinolaryngology, before moving on to his next postdoc researching molecular motor dynamics in the Department of Mathematics at Brandeis University. He is now at Manitoba, continuing to apply dynamical systems methods to biological questions related to molecular motor transport and neural oscillators.

## Footnotes to Turing (1952): Some Modern Challenges in Pattern Formation

Motivated by recent work with biologists, I will showcase some mathematical results on Turing instabilities in complex domains. This is scientifically related to understanding developmental tuning in a variety of settings such as mouse whiskers, human fingerprints, bat teeth, and more generally pattern formation on multiple scales and evolving domains. Some of these problems are natural extensions of classical reaction-diffusion models, amenable to standard linear stability analysis, whereas others require the development of new tools and approaches. These approaches also help close the vast gap between the simple theory of diffusion-driven pattern formation, and the messy reality of biological development, though there is still much work to be done in validating even complex theories against the rich pattern dynamics observed in nature. I will emphasize throughout the role that Turing's 1952 paper had in these developments, and how much of our modern progress (and difficulties) were predicted in this paper. I will close by discussing a range of open questions, many of which fall well beyond the extensions I will discuss, but at least some of which were known to Turing.

## Large Systems of Interacting Particles and their Applications in Optimization

Large systems of interacting particles (or agents) are widely used to investigate self-organization and collective behavior. They frequently appear in modeling phenomena such as biological swarms, crowd dynamics, self-assembly of nanoparticles and opinion formation. Similar particle models are also used in metaheuristics, which provide empirically robust solutions to tackle hard optimization problems with fast algorithms. In this talk I will start by introducing some generic particle models and their underlying mean-field equations. Then we will focus on a specific particle model that belongs to the class of Consensus-Based Optimization (CBO) methods, and we show that it is able to perform essentially as well as ad hoc state-of-the-art methods on challenging problems in signal processing and machine learning.
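To illustrate the kind of particle dynamics behind CBO, here is a minimal sketch of a generic scheme; it is not the specific model from the talk. It assumes an Euler-Maruyama discretization with component-wise (anisotropic) noise. The parameter values, the initialization box `[-3, 3]`, and the names `consensus_point`, `cbo_minimize` and `shifted_quadratic` are all illustrative assumptions. Each particle drifts toward a weighted average of the swarm (the consensus point) and is perturbed by noise proportional to its distance from that point.

```python
import math
import random


def consensus_point(particles, objective, alpha):
    """Weighted average of particles with Gibbs weights exp(-alpha * f(x))."""
    values = [objective(p) for p in particles]
    best = min(values)  # subtract the minimum so the largest weight is 1
    weights = [math.exp(-alpha * (v - best)) for v in values]
    total = sum(weights)
    dim = len(particles[0])
    return [sum(w * p[d] for w, p in zip(weights, particles)) / total
            for d in range(dim)]


def cbo_minimize(objective, dim, n_particles=100, steps=300,
                 alpha=30.0, lam=1.0, sigma=0.7, dt=0.05, seed=0):
    """Generic CBO sketch; all default parameters are illustrative assumptions."""
    rng = random.Random(seed)
    particles = [[rng.uniform(-3.0, 3.0) for _ in range(dim)]
                 for _ in range(n_particles)]
    for _ in range(steps):
        v = consensus_point(particles, objective, alpha)
        for p in particles:
            for d in range(dim):
                diff = p[d] - v[d]
                # Drift toward the consensus point, plus noise that scales with
                # the distance to it, so exploration vanishes as particles agree.
                p[d] += (-lam * dt * diff
                         + sigma * math.sqrt(dt) * diff * rng.gauss(0.0, 1.0))
    return consensus_point(particles, objective, alpha)


def shifted_quadratic(p):
    """Hypothetical test objective with its minimum at (1, ..., 1)."""
    return sum((x - 1.0) ** 2 for x in p)


if __name__ == "__main__":
    print(cbo_minimize(shifted_quadratic, dim=2, n_particles=200))
```

Larger values of `alpha` concentrate the consensus point on the best particles. In the mean-field limit of many particles, this system is described by a nonlinear Fokker-Planck-type equation, which is the sense in which particle models and mean-field equations are linked in the abstract.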

### Speaker Biography

Hui Huang, Ph.D., is currently a PIMS Postdoc at the University of Calgary under the supervision of Prof. Jinniao Qiu. Before moving to Calgary, he worked as a postdoctoral researcher in the Chair for Applied Numerical Analysis at the Technical University of Munich, Germany. Prior to being at TUM he was an Alan Mekler Postdoctoral Fellow in the Department of Mathematics at Simon Fraser University. In 2017, he received his PhD in Mathematics from Tsinghua University. His doctoral dissertation was conducted in consultation with Prof. Jian-Guo Liu from Duke University, where he studied as a joint PhD student from 2014 to 2016. His research has been focused on complex dynamical systems and their related kinetic equations.

Read more about Hui Huang on the PIMS Medium blog.

## Finite sample rates for optimal transport estimation problems

The theory of optimal transport (OT) gives rise to distance measures between probability distributions that take the geometry of the underlying space into account. OT is often used in the analysis of point cloud data, for example in domain adaptation problems, computer graphics, and trajectory analysis of single-cell RNA-Seq data. However, from a statistical perspective, straightforward plug-in estimators for OT distances and couplings suffer from the curse of dimensionality. One way of alleviating this problem is to employ regularized statistical procedures, either by changing the transport objective or by exploiting additional structure in the underlying probability distributions or ground truth couplings. In this talk, I will outline the problem and give an overview of recent solution approaches, in particular those employing entropically regularized optimal transport or imposing smoothness assumptions on the ground truth transport map.