MLSS 2011, Bordeaux, France, September 4-17, 2011


## Schedule
- Google Calendar for MLSS 2011 Bordeaux.
## Lectures

## Kernel Methods

Bernhard Schölkopf, MPI Tuebingen

The
course will start with basic ideas of machine learning, followed by
some elements of learning theory. It will also introduce positive
definite kernels and their associated feature spaces, and show how to
use them for kernel mean embeddings, SVMs, and kernel PCA.

Download slides

## Machine Learning for Robotics

Jan Peters, MPI Tuebingen

These
lectures will discuss both challenges and opportunities for machine
learning researchers who are willing to enter the area of robot
learning. First, we will discuss the generic problems of the domain of
robotics including the core technical challenges, tools that lead to
efficient development processes in robot learning, key insights from
classical robotics as well as core points of view of the robotics
community. Subsequently, we will focus on three core learning problems:
(i) Model Learning, (ii) Policy Acquisition and (iii) Robot
Self-Improvement. In the lecture on Model Learning, we will give an
overview of supervised learning problems in robot control, covering
both solved and unsolved problems. The lecture on policy acquisition
will start with a review of imitation learning with a strong focus on
using dynamic systems motor primitives. The lecture on robot
self-improvement will highlight the successes in robot reinforcement
learning using either optimal control with learned models, value
function approximation or policy search approaches.

We discuss learning on three different levels of abstraction: learning for accurate control is needed for execution, learning of motor primitives is needed to acquire simple movements, and learning of the task-dependent "hyperparameters" of these motor primitives allows learning of complex tasks. Empirical evaluations on several robot systems illustrate the effectiveness and applicability to learning control on anthropomorphic robots.

Download slides

## Theory and Applications of Boosting

Robert Schapire, Princeton

Boosting is a general method for producing a very accurate
classification rule by combining rough and moderately inaccurate
"rules of thumb." While rooted in a theoretical framework of
machine learning, boosting has been found to perform quite well
empirically. This tutorial will focus on the boosting algorithm
AdaBoost, and will explain the underlying theory of boosting,
including explanations that have been given as to why boosting
often does not suffer from overfitting, as well as
interpretations based on game theory, optimization, statistics,
and maximum entropy. Some practical applications and extensions
of boosting will also be described.

Download slides

## Monte Carlo Methods

Arnaud Doucet, University of Oxford

We
will first review the Monte Carlo principle and standard Monte Carlo
methods including rejection sampling, importance sampling and standard
Markov chain Monte Carlo (MCMC) methods. We will then discuss more
advanced MCMC methods, including adaptive MCMC methods and auxiliary
variable methods such as parallel tempering, particle MCMC methods and
slice sampling.
Download slides

## Introduction to Reinforcement Learning

Rémi Munos, INRIA Lille

Part 1: Introduction to Reinforcement Learning and Dynamic Programming
- Setting, examples
- Dynamic programming: value iteration, policy iteration
- RL algorithms: temporal difference, Q-learning

Part 2: Approximate dynamic programming
- Max-norm performance bounds
- Sample-based algorithms: Least Squares TD, Bellman Residual, Fitted-Value Iteration

Part 3: Exploration-exploitation tradeoffs
- The stochastic bandit: UCB
- The adversarial bandit: EXP3
- Populations of bandits: tree search, Nash equilibrium (applications to Go, Poker)

Download slides: Part I - Part II - Part III

## Bayesian Inference

Peter Green, University of Bristol

Inference is the process of learning from data about the mechanisms
that may have caused or generated that data, or at least explain
it. The goals are varied - perhaps simply predicting future data,
or more ambitiously drawing conclusions about scientific or
societal truths. In the language of applied
mathematics, these are inverse problems. Bayesian
inference is about using probability to do all this. One of its
strengths is that all sources of uncertainty in a problem can be
simultaneously and coherently considered. It is
model-based (in the language of machine learning, these
are generative models), and we can use Bayesian methods
to choose and criticise the models we use.

My talks will cover some underlying ideas needed from probability, and the basic principles and concepts of Bayesian analysis. I will go on to talk about modelling in principle and practice, and say a little about computing Bayesian inferences. There will be some discussion of subjective and objective theories, and sensitivity to assumptions, and I will conclude with some more substantial applications.

Download slides: Part I - Part II - Part III

## Bayesian Nonparametrics

Yee Whye Teh, University College London

Download slides

## Sparse Methods for Under-determined Inverse Problems

Rémi Gribonval, INRIA Rennes

Download slides: Part 1 - Part 2 - Part 3 - Part 4

## Convex Optimization

Lieven Vandenberghe, UC Los Angeles

The lectures will give an introduction to the theory and
applications of convex optimization, and an overview of recent developments in algorithms.

The first lecture will cover the basics of convex analysis, focusing on the results that are most useful for convex modeling, i.e., recognizing and formulating convex optimization problems in applications. We will introduce conic optimization, and the two most widely studied types of conic optimization problems, second-order cone and semidefinite programs. The material will be illustrated with applications to robust optimization, convex relaxations in nonconvex optimization, and convex techniques for sparse optimization.

Lecture 2 will cover interior-point methods for conic optimization, including path-following methods and symmetric primal-dual methods, and the numerical implementation of interior-point methods.

Lecture 3 will focus on first-order algorithms for large-scale convex optimization, including recent developments in the area of proximal gradient methods, and on dual decomposition and multiplier methods.

Download slides

## Learning Theory: statistical and game-theoretic approaches

Nicolò Cesa-Bianchi, University of Milan

The theoretical foundations of
machine learning have a double nature: statistical and game-theoretic.
In this course we take advantage of both paradigms to introduce and
investigate a number of basic topics, including mistake bounds and risk
bounds, empirical risk minimization, online linear optimization,
compression bounds, overfitting and regularization. The goal of the course is to provide a sound mathematical framework within which one can investigate basic questions in learning theory, such as the dependence of the predictive performance of a model on the complexity of the model class and on the amount of training information.

## Graphical Models and message-passing algorithms

Martin Wainwright, UC Berkeley

Download slides

## Some recent advances in the theory of low-rank modeling

Emmanuel Candès, Stanford

Inspired
by the success of compressive sensing, the last three years have seen
an explosion of research in the theory of low-rank modeling. By now, we
have results stating that it is possible to recover certain low-rank
matrices from a minimal number of entries -- or of linear functionals
-- by tractable convex optimization. We further know that these methods
are robust vis-à-vis additive noise and even outliers. In a different
direction, researchers have developed computationally tractable methods
for clustering high-dimensional data points that are assumed to be
drawn from multiple low-dimensional linear subspaces. This talk will
survey some exciting results in these areas.

## A computational approach to early language bootstrapping

Emmanuel Dupoux, EHESS

Human infants learn
spontaneously and effortlessly the language(s) spoken in their
environments, despite the extraordinary complexity of the task. In the
past 30 years, tremendous progress has been made regarding the empirical
investigation of the linguistic achievements of infants during their
first two years of life. In that short period of their life, infants
learn in an essentially unsupervised fashion the basic building blocks
of the phonetics, phonology, lexical and syntactic organization of their
native language (see Jusczyk, 1997). Yet, little is known about the
mechanisms responsible for such acquisitions. Do infants rely on general
statistical inference principles? Do they rely on specialized
algorithms devoted to language?

Here, I will present an overview of the early phases of language acquisition and focus on one area where a modeling approach is currently being pursued, using tools of signal processing and automatic speech recognition: the unsupervised acquisition of phonetic categories. It is known that during the first year of life, before they are able to talk, infants construct a detailed representation of the phonemes of their native language and lose the ability to distinguish nonnative phonemic contrasts (Werker & Tees, 1984). It will be shown that the only mechanism that has been proposed so far, that is, unsupervised statistical clustering (Maye, Werker & Gerken, 2002), does not converge on the inventory of phonemes, but rather on contextual allophonic units or subunits (Varadarajan et al., 2008).

An information-theoretic algorithm will be presented: it groups together allophonic variants based on three sources of information: the statistical distribution of their contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs (Peperkamp et al., 2006; Martin et al., submitted). It is shown that each of the three sources of information can be acquired without presupposing the others. This algorithm is then tested on several natural speech corpora.

The more general proposal is that early language bootstrapping does not rely on learning principles necessarily specific to language. What is presumably unique to language, though, is the way in which these principles are combined in particular ways to optimize the emergence of linguistic categories after only a few months of unsupervised exposure to speech signals.

- Jusczyk, P. (1997). The discovery of spoken language. Cambridge, MA: MIT Press.
- Martin, A., Peperkamp, S., & Dupoux, E. (submitted). Learning phonemes with a pseudo-lexicon.
- Maye, J., Werker, J., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101-B111.
- Peperkamp, S., Le Calvez, R., Nadal, J.P., & Dupoux, E. (2006). The acquisition of allophonic rules: statistical learning with linguistic constraints. Cognition, 101, B31-B41.
- Varadarajan, B., Khudanpur, S., & Dupoux, E. (2008). Unsupervised learning of acoustic subword units. In Proceedings of ACL-08: HLT, 165-168.
- Werker, J.F., & Tees, R.C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.

## Practical sessions
## Introduction to classification and regression

Luis Montesano, University of Zaragoza

This lab is
designed for people who have recently started working on machine learning
problems. In the first part of this lab, we will implement from scratch
basic classification techniques and apply them to several datasets. In
the second part, we will briefly review linear regression and then study
and apply LWPR, a non-linear function approximation technique based on
multiple local linear regressors.

## Convex optimization

Mark Schmidt, INRIA Rocquencourt

- Part I: "black-box" software for solving general small-scale problems
- Part II: writing custom "white-box" algorithms for solving specific large-scale problems

For both parts, we will use some relevant machine learning problems (such as LASSO, large-scale logistic regression, structured sparsity, conditional random fields, inference in graphical models, structure learning, etc.) as illustrative examples. The lab is as self-contained as possible, so no prior knowledge beyond basic calculus, linear algebra and probability will be required.

Required software: Matlab and CVX for the first part; the second part can be done in any language.

Supporting material

## Parametric and nonparametric Bayesian clustering

François Caron, INRIA Bordeaux

This lab will present model-based Bayesian methods for static and dynamic clustering. The first part will consider parametric Bayesian methods. The second part will consider Bayesian nonparametric methods, based on (dependent) Dirichlet Process Mixtures, for static and dynamic clustering.

## (Inverse) Reinforcement learning

Manuel Lopes, INRIA Bordeaux

This
lab will present a hands-on session (using Matlab) on Markov decision
processes (Markov processes, MDPs, model estimation) and standard
methods to solve them (VI, Q-learning, ...). A second, more advanced
part will address inverse reinforcement learning and apprenticeship
learning.

## (Partially observable) Markov Decision Processes

Matthijs Spaan, Instituto Superior Tecnico Lisbon

Decision
making is an important skill of autonomous agents. This lab course
focuses on decision making under uncertainty in sensing and acting,
common in many real-world systems. The associated planning problems can
be formalized as partially observable Markov decision processes
(POMDPs). This lab course will provide an introduction to the POMDP
model and its solution methods, and the practical part will consist of
implementing and testing different MDP and POMDP solving techniques.
Finally, attention will be given to the problem of decision making under uncertainty with multiple, interacting agents.

## Gaussian Processes and Active Learning

Ruben Martinez-Cantin, University of Zaragoza

During the first part of the lab, we will give an overview of some of the
capabilities and methods to do learning and inference using Gaussian
processes (GPs). A second, more advanced, part of the lab will cover the
problems of active learning, experimental design, Bayesian optimization
and submodular optimization based on GPs. During both labs, we will
illustrate each part with realistic learning problems such as robot
kinematics, handwritten digit recognition or optimal sensor placement.
Required software: Matlab or Octave.
Supporting material