Schedule

	Sun 4	Mon 5	Tue 6	Wed 7	Thu 8	Fri 9	Sat 10
9:00 - 10:30		Schölkopf Kernel Methods (1/3)	Schölkopf Kernel Methods (3/3)	Doucet Monte Carlo Methods (1/3)	Green Bayesian Inference (1/3)	Green Bayesian Inference (2/3)	Social day Saint-Emilion or Medoc
11:00 - 12:30		Schölkopf Kernel Methods (2/3)	Schapire Boosting (2/3)	Doucet Monte Carlo Methods (2/3)	Doucet Monte Carlo Methods (3/3)	Green Bayesian Inference (3/3)
15:15 - 16:45		Peters Machine Learning for Robotics (1/3)	Peters Machine Learning for Robotics (2/3)	Peters Machine Learning for Robotics (3/3)	Practical	Practical
17:15 - 18:45		Schapire Boosting (1/3)	Schapire Boosting (3/3)	Munos Reinforcement Learning (1/3)	Munos Reinforcement Learning (2/3)	Munos Reinforcement Learning (3/3)
Evening	Registration		Candès Low-rank modeling		Poster session	Dupoux Early language bootstrapping

	Sun 11	Mon 12	Tue 13	Wed 14	Thu 15	Fri 16	Sat 17
9:00 - 10:30		Teh Bayesian Nonparametrics (1/3)	Teh Bayesian Nonparametrics (3/3)	Vandenberghe Convex Optimization (2/3)	Cesa-Bianchi Learning Theory (2/3)		Departure
11:00 - 12:30		Teh Bayesian Nonparametrics (2/3)	Gribonval Sparse Methods (3/3)	Vandenberghe Convex Optimization (3/3)	Cesa-Bianchi Learning Theory (3/3)	Wainwright Graphical Models (2/3)
15:15 - 16:45		Gribonval Sparse Methods (1/3)	Practical	Practical	Practical	Practical
17:15 - 18:45		Gribonval Sparse Methods (2/3)	Vandenberghe Convex Optimization (1/3)	Cesa-Bianchi Learning Theory (1/3)	Wainwright Graphical Models (1/3)	Wainwright Graphical Models (3/3)
Evening				Poster session	Banquet

Google Calendar for MLSS 2011 Bordeaux.

Lectures

Kernel Methods

Bernhard Schölkopf, MPI Tuebingen

The course will start with basic ideas of machine learning, followed by some elements of learning theory. It will also introduce positive definite kernels and their associated feature spaces, and show how to use them for kernel mean embeddings, SVMs, and kernel PCA.

Download slides

Machine Learning for Robotics

Jan Peters, MPI Tuebingen

These lectures will discuss both challenges and opportunities for a machine learning researchers who are willing to enter the area of robot learning. First, we will discuss the generic problems of the domain of robotics including the core technical challenges, tools that lead to efficient development processes in robot learning, key insights from classical robotics as well as core points of view of the robotics community. Subsequently, we will focus on three core learning problems: (i) Model Learning, (ii) Policy Acquisition and (iii) Robot Self-Improvement. The lecture on Model Learning, we will give an overview on supervised learning problems in robot control which includes both solved and unsolved problems. The lecture on policy acquisition will start with a review on imitation learning with a strong focus on using dynamic systems motor primitives. The lecture on robot self-improvement will highlight the successes in robot reinforcement learning using either optimal control with learned models, value functions approximation or policy search approaches.

We discuss learning on three different levels of abstraction, i.e., learning for accurate control is needed to execute, learning of motor primitives is needed to acquire simple movements, and learning of the task-dependent "hyperparameters" of these motor primitives allows learning of complex tasks. Empirical evaluations on a several robot systems illustrate the effectiveness and applicability to learning control on anthropomorphic robots.

Download slides

Theory and Applications of Boosting

Robert Schapire, Princeton

Boosting is a general method for producing a very accurate classification rule by combining rough and moderately inaccurate "rules of thumb." While rooted in a theoretical framework of machine learning, boosting has been found to perform quite well empirically. This tutorial will focus on the boosting algorithm AdaBoost, and will explain the underlying theory of boosting, including explanations that have been given as to why boosting often does not suffer from overfitting, as well as interpretations based on game theory, optimization, statistics, and maximum entropy. Some practical applications and extensions of boosting will also be described.

Download slides

Monte Carlo Methods

Arnaud Doucet, University of Oxford

We will first review the Monte Carlo principle and standard Monte Carlo methods including rejection sampling, importance sampling and standard Markov chain Monte Carlo (MCMC) methods. We will then discuss more advanced MCMC methods such as adaptive MCMC methods and auxiliary variable methods such as parallel tempering, particle MCMC methods and slice sampling.

Download slides

Introduction to Reinforcement Learning

Rémi Munos, INRIA Lille

Part 1: Introduction to Reinforcement Learning and Dynamic Programming
Settting, Examples
Dynamic programming: value iteration, policy iteration
RL algorithms: Temporal difference, Q-learning.

Part 2: Approximate dynamic programming
Max-norm performance bounds
Sample-based algorithms: Least Squares TD, Bellman Residual, Fitted-Value Iteration

Part 3: Exploration-Exploitation tradeoffs
The stochastic bandit: UCB
The adversarial bandit: EXP3
Populations of bandits: Tree search, Nash equilibrium (Applications to Go, Poker).

Download slides: Part I - Part II - Part III

Bayesian Inference

Peter Green, University of Bristol

Inference is the process of discovering from data about mechanisms that may have caused or generated that data, or at least explain it. The goals are varied - perhaps simply predicting future data, or more ambitiously drawing conclusions about scientific or societal truths. In the language of applied mathematics, these are inverse problems. Bayesian inference is about using probability to do all this. One of its strengths is that all sources of uncertainty in a problem can be simultaneously and coherently considered. It is model-based (in the language of machine learning, these are generative models), and we can use Bayesian methods to choose and criticise the models we use.

My talks will cover some underlying ideas needed from probability, and the basic principles and concepts of Bayesian analysis. I will go on to talk about modelling in principle and practice, and say a little about computing Bayesian inferences. There will be some discussion of subjective and objective theories, and sensitivity to assumptions, and I will conclude with some more substantial applications.

Download slides: Part I - Part II - Part III

Bayesian Nonparametrics

Yee Whye Teh, University College London

Download slides

Sparse Methods for Under-determined Inverse Problems

Rémi Gribonval, INRIA Rennes

Download slides: Part 1 Part 2 Part 3 Part 4

Convex Optimization

Lieven Vandenberghe, UC Los Angeles

The lectures will give an introduction to the theory and applications of convex optimization, and an overview of recent developments in algorithms.
The first lecture will cover the basics of convex analysis, focusing on the results that are most useful for convex modeling, i.e., recognizing and formulating convex optimization problems in applications. We will introduce conic optimization, and the two most widely studied types of conic optimization problems, second-order cone and semidefinite programs. The material will be illustrated with applications to robust optimization, convex relaxations in nonconvex optimization, and convex techniques for sparse optimization.
Lecture 2 will cover interior-point methods for conic optimization, including path-following methods and symmetric primal-dual methods, and the numerical implementation of interior-point methods.
Lecture 3 will focus on first-order algorithms for large-scale convex optimization, including recent developments in the area of proximal gradient methods, and on dual decomposition and multiplier methods.

Download slides

Learning Theory: statistical and game-theoretic approaches

Nicolò Cesa-Bianchi, University of Milan

The theoretical foundations of machine learning have a double nature: statistical and game-theoretic. In this course we take advantage of both paradigms to introduce and investigate a number of basic topics, including mistake bounds and risk bounds, empirical risk minimization, online linear optimization, compression bounds, overfitting and regularization.
The goal of the course is to provide a sound mathematical framework within which one can investigate basic questions in learning theory, such as the dependence of the predictive performance of a model on the complexity of the model class and on the amount of training information.

Graphical Models and message-passing algorithms

Martin Wainwright, UC Berkeley

Download slides

Some recent advances in the theory of low-rank modeling

Emmanuel Candès, Stanford

Inspired by the success of compressive sensing, the last three years have seen an explosion of research in the theory of low-rank modeling. By now, we have results stating that it is possible to recover certain low-rank matrices from a minimal number of entries -- or of linear functionals -- by tractable convex optimization. We further know that these methods are robust vis a vis additive noise and even outliers. In a different direction, researchers have developed computationally tractable methods for clustering high-dimensional data points that are assumed to be drawn from multiple low-dimensional linear subspaces. This talk will survey some exciting results in these areas.

A computational approach to early language bootstrapping

Emmanuel Dupoux, EHESS

Human infants learn spontaneously and effortlessly the language(s) spoken in their environments, despite the extraordinary complexity of the task. In the past 30 years, tremendous progress has been made regarding the empirical investigation of the linguistic achievements of infants during their first two years of life. In that short period of their life, infants learn in an essentially unsupervised fashion the basic building blocks of the phonetics, phonology, lexical and syntactic organization of their native language (see Jusczyk, 1987). Yet, little is known about the mechanisms responsible for such acquisitions. Do infants rely on general statistical inference principles? Do they rely on specialized algorithms devoted to language?
Here, I will present an overview of the early phases of language acquisition and focus on one area where a modeling approach is currently being conducted, using tools of signal processing and automatic speech recognition: the unsupervized acquisition of phonetic categories. It is known that during the first year of life, before they are able to talk, infants construct a detailed representation of the phonemes of their native language and loose the ability to distinguish nonnative phonemic contrasts (Werker & Tees, 1984). It will be shown that the only mechanism that has been proposed so far, that is, unsupervised statistical clustering (Maye, Werker and Gerken, 2002), does not converge on the inventory of phonemes, but rather on contextual allophonic units or subunits (Varadarajan, 2008). An information-theoretic algorithm wil be presented: it groups together allophonic variants based on three sources of information: the statistical distribution of their contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs (Peperkamp et al., 2006; Martin et al, submitted). It is shown that each of the three sources of information can be acquired without presupposing the others. This algorithm is then tested on several natural speech corpora.
The more general proposal is that early language bootrapping does not rely on learning principles necessarily specific to language. What is presumably unique to language though, is the way in which these principles are combined in a particular ways to optimize the emergence of linguistic categories after only a few months of unsupervized exposure to speech signals.

Jusczyk, P. (1997). The discovery of spoken language. Cambridge, MA: MIT Press.
Martin, A., Peperkamp, S., & Dupoux, E. (submitted). Learning phonemes with a pseudo-lexicon.Maye, J., Werker, J., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101–B111. Peperkamp, S., Le Calvez, R., Nadal, J.P. and Dupoux, E. (2006). The acquisition of allophonic rules: statistical learning with linguistic constraints. Cognition, 101, B31-B41Varadarajan, B., Khudanpur, S. & Dupoux, E. (2008). Unsupervised Learning of Acoustic Subword Units, in Proceedings of ACL-08: HLT, 165-168. Werker, J.F., & Tees, R.C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.

Practical sessions

Thu 8	Fri 9	Tue 13	Wed 14	Thu 15	Fri 16
		Introduction to reinforcement learning Manuel Lopes		Learning by imitation and inverse reinforcement learning Manuel Lopes
Parametric and nonparametric Bayesian clustering François Caron		(Partially observable) Markov Decision Processes Matthijs Spaan
Introduction to regression and classification Luis Montesano		Gaussian Processes Ruben Martinez-Cantin		Active Learning Ruben Martinez-Cantin
Convex optimization: small-scale problems Mark Schmidt		Convex optimization: large-scale problems Mark Schmidt

Introduction to classification and regression

Luis Montesano, University of Zaragoza

This lab is designed for people who started recently to work in machine learning problems. In the first part of this lab, we will implement from scratch basic classification techniques and apply them to several datasets. In the second part, we will briefly review linear regression and then study and apply LWPR, a non-linear function approximation technique based on multiple local linear regressors.

Convex optimization

Mark Schmidt, INRIA Rocquencourt

Part I: "black-box" software for solving general small-scale problems
Part II: writing custom "white-box" algorithms for solving specific large-scale problems

For both parts, we will use some relevant machine learning problems (such as LASSO, large-scale logistic regression, structured sparsity, conditional random fields, inference in graphical models, structure learning, etc.) as illustrative examples. The lab is as self-contained as possible, so no prior knowledge beyond basic calculus/linear-algebra/probability will be required. Required software: Matlab and CVX for the first part, but the second part can be done in any language.

Supporting material

Parametric and nonparametric Bayesian clustering

François Caron, INRIA Bordeaux

This lab will present model-based Bayesian methods for static and dynamic clustering. The first part will consider parametric Bayesian methods. The second part will consider Bayesian nonparametric methods, based on (dependent) Dirichlet Process Mixtures, for static and dynamic clustering.

(Inverse) Reinforcement learning

Manuel Lopes, INRIA Bordeaux

This lab will present an hands-on (using matlab) on markov decision processes (markov processes, mdps, model estimation) and standard methods to solve them (VI, Q-Learing, ...). A second, more adavanced part will address inverse reinforcement learning and apprentiship learning.

(Partially observable) Markov Decision Processes

Matthijs Spaan, Instituto Superior Tecnico Lisbon

Decision making is an important skill of autonomous agents. This lab course focuses on decision making under uncertainty in sensing and acting, common in many real-world systems. The associated planning problems can be formalized as partially observable Markov decision processes (POMDPs). This lab course will provide an introduction to the POMDP model and its solution methods, and the practical part will consists of implementing and testing different MDP and POMDP solving techniques. Finally, attention will be given to the problem of
decision making under uncertainty with multiple, interacting agents.

Gaussians Processes and Active Learning

Ruben Martinez-Cantin, University of Zaragoza

During the first part of the lab, we will overview some of the capabilities and methods to do learning and inference using Gaussian processes (GPs). A second, more advanced, part of the lab will cover the problems of active learning, experimental design, Bayesian optimization and submodular optimization based on GPs. During both labs, we will illustrate each part with realistic learning problems such as: robot kinematics, handwritten digits recognition or optimal sensor placement. Required software: Matlab or Octave.

Supporting material