
INSEAD PhD Course:
Foundations of Machine Learning and AI

T. Evgeniou
Professor of Decision Sciences and Technology Management, INSEAD

N. Vayatis
Professor, Ecole Normale Superieure Paris-Saclay




"Another thing I must point out is that you cannot prove a vague theory wrong. [...] Also, if the process of computing the consequences is indefinite, then with a little skill any experimental result can be made to look like the expected consequences."

Richard Feynman

"I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk."

Enrico Fermi


Course Description

AI and Machine Learning have become central topics in the popular press after more than 50 years of development in academia, first by computer scientists and, more recently, by mathematicians and statisticians. These fields are expected to have a major impact on potentially every area of research and business: from basic sciences such as the life sciences, to Decision Sciences and Finance, to Sociology, Economics, and the other Social Sciences.

However, while one can be a "reasonable" user of some popular machine learning and AI methods without much background, gaining an edge in research and practice, and taking full advantage of the capabilities these technologies offer, requires a more fundamental understanding of the principles behind these booming fields.

The goal of this course is to provide this more fundamental understanding of the principles behind Machine Learning and AI.

The course will be run as a combination of lectures, discussions of important papers, exercises, coding (in R or Python), and a class project. Participants are required to know the material covered in the core Probability and Statistics (I and II) courses.


Recommended books

While we will not follow any specific book, the following are some of the "classics" in the field; we will also use a few chapters from them.


V. N. Vapnik, Statistical Learning Theory, Wiley, 1998.
L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, 1996.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd Ed., Springer, 2009.
V. K. Ivanov, V. V. Vasin, and V. P. Tanana, Theory of Linear Ill-Posed Problems and Its Applications, 1978 (revised English edition, VSP, 2002).
T. Cover and J. Thomas, Elements of Information Theory, Wiley, 1991 (2nd Ed., 2006).

These are some other, more recent books:

C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006 (another approach to Machine Learning).
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press, 2016.

Grading

20% Class Participation and Paper Presentation

30% Exercises: two exercise sets, combining mathematical and hands-on application exercises

50% Class Project: "Develop Your Own Machine Learning Method and Share the Code on GitHub".

Class Project: You will work either alone or with at most one other colleague on this project.

Course Sessions

Sessions 1-2: Introduction and Setup: AI and the Machine Learning Problem

In these sessions we will first provide a brief history of AI and Machine Learning and outline the fundamental problems these fields aim to solve. We will then shift to the theoretical foundations of Machine Learning and provide an overview of the field, of some popular machine learning methods, and of applications of Machine Learning and AI, as well as a summary of this course. The central Empirical Risk Minimization formulation is written out after the readings below. Main concepts: Symbolic AI, Connectionism, Statistical Learning, Approximation Theory, Bias-Variance, Empirical Risk Minimization, Hypothesis Spaces, Loss Functions, Generalization Error, Learnability, Consistency Properties. Background Readings:
FMLAI General Introduction Handouts
Sessions 1-2 Handouts
Exercise Set 1 (to prepare before Sessions 5-6)
An Interview with Vladimir Vapnik
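
For reference (a standard formulation, not taken from the handouts), the Empirical Risk Minimization problem named above can be written, for a hypothesis space \(\mathcal{H}\), a loss \(\ell\), and \(n\) training examples \((x_i, y_i)\) drawn i.i.d. from an unknown distribution \(P\), as

\[
\hat{f} \;=\; \arg\min_{f \in \mathcal{H}} \hat{R}_n(f),
\qquad
\hat{R}_n(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i),\, y_i\big),
\]

while the quantity of real interest is the generalization error \(R(f) = \mathbb{E}_{(x,y) \sim P}\,\ell\big(f(x), y\big)\). Learnability and consistency concern when minimizing \(\hat{R}_n\) over \(\mathcal{H}\) also drives \(R(\hat{f})\) toward its minimum; the bias-variance trade-off appears because enlarging \(\mathcal{H}\) reduces approximation error but makes \(\hat{R}_n\) a less reliable proxy for \(R\).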

Sessions 3-4: From Classical Statistics to Machine Learning

In these sessions we will develop and analyze some of the most common machine learning methods, which are also the closest to classical statistical/econometric methods. We will also discuss the relations between Machine Learning and other important fields such as optimization theory, regularization theory for ill-posed problems, and signal processing. A minimal Ridge/Lasso example follows the readings below. Main concepts: Regularization Theory, Ridge Regression, Lasso, Support Vector Machines, Kernels, Sparsity, Model Selection, Cross-Validation, Matrix Completion, Recommender Systems. Background Readings:
Sessions 3-4 Handouts
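
To make the regularization and model-selection concepts above concrete, here is a minimal sketch in Python (assuming scikit-learn is available; the data is synthetic and purely illustrative):

import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic sparse regression problem: only 5 of 50 coefficients are nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_coef = np.zeros(50)
true_coef[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Choose the regularization strength alpha by 5-fold cross-validation.
for model in (Ridge(), Lasso()):
    search = GridSearchCV(model, {"alpha": np.logspace(-3, 2, 20)}, cv=5)
    search.fit(X, y)
    n_zero = int(np.sum(np.abs(search.best_estimator_.coef_) < 1e-6))
    print(type(model).__name__, "best alpha:", search.best_params_["alpha"],
          "| coefficients set to zero:", n_zero)

The Lasso's L1 penalty tends to set most of the 45 irrelevant coefficients exactly to zero (sparsity and variable selection), while Ridge's L2 penalty only shrinks them; cross-validation is the model-selection step that picks the penalty weight.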

Sessions 5-6: Data Representations, Feature Learning, and Applications

In these sessions we will revisit the problem of machine learning, this time from the point of view of finding good data ("world") representations. We will revisit and discuss topics such as sparse representations, kernels, and learning data representations with deep learning methods. We will then discuss a number of applications of machine learning, ranging from text mining to time series prediction and the analysis of network and graph data. A short text-representation sketch follows the readings below. Main concepts: Sparsity, Variable Selection, Feature Learning, Kernels, Sparse PCA, Low Rank Representations, Dictionary Learning, Text Mining, Time Series, Network Data. Background Readings:
Sessions 5-6 Handouts
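
As a small illustration of the representation viewpoint (a sketch in Python with scikit-learn; the three toy documents are made up for illustration), a bag-of-words TF-IDF representation followed by a low-rank projection:

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Three toy documents, purely illustrative.
docs = [
    "machine learning learns representations from data",
    "deep learning learns hierarchical representations",
    "markets and prices move with economic news",
]

# Step 1: a sparse TF-IDF ("world") representation of the documents.
X = TfidfVectorizer().fit_transform(docs)   # shape: (3 documents, vocabulary size)

# Step 2: a rank-2 representation via truncated SVD, the same low-rank idea
# behind latent semantic analysis and matrix completion.
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
print(Z)  # the two learning-related documents land close together

The point is the pipeline, not the specific method: sparse coding, kernels, or a deep network could replace either step as the learned representation.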

Sessions 7-8: Deep Learning and Recent Mysteries in AI

In these sessions we will discuss some of the most common Deep Learning methods and touch upon some current open problems in Machine Learning and AI. A more general framework for machine learning and AI will also be discussed, and some recent applications of these tools will be presented. A tiny network trained with stochastic gradient descent and back-propagation is sketched after the readings and exercise below. Main concepts: Perceptron, Feed-forward Neural Networks, Convolutional Neural Networks, Stochastic Gradient Descent, Back-propagation, Hierarchical Learning, Feature Learning. Background Readings:
Sessions 7-8 Handouts
A lecture on Theories of Deep Learning, by Tomaso Poggio
Exercise Set 2 (due Sessions 13-14): Explore the website of the course Data Science for Business and work on Assignment 2 in that course (under Sessions 5-6), called Credit Card Default.
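
To make stochastic gradient descent and back-propagation concrete, here is a from-scratch sketch in Python/numpy (toy data; in practice one would use a deep learning library):

import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (XOR-like), purely illustrative.
X = rng.normal(size=(256, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(float).reshape(-1, 1)

# One hidden layer of 16 tanh units, sigmoid output, logistic loss.
W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.1

for step in range(2000):
    i = rng.integers(0, len(X), size=32)        # random mini-batch (the "S" in SGD)
    xb, yb = X[i], y[i]

    # Forward pass: compute the hidden features and the prediction.
    h = np.tanh(xb @ W1 + b1)                   # learned features
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # predicted probability

    # Backward pass: propagate the loss gradient layer by layer.
    dz2 = (p - yb) / len(xb)                    # gradient of logistic loss at the output
    dW2 = h.T @ dz2; db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T
    dz1 = dh * (1.0 - h ** 2)                   # tanh'(z) = 1 - tanh(z)^2
    dW1 = xb.T @ dz1; db1 = dz1.sum(axis=0)

    # SGD update: a small step against the gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

h = np.tanh(X @ W1 + b1)
print("training accuracy:",
      ((1 / (1 + np.exp(-(h @ W2 + b2))) > 0.5) == (y > 0.5)).mean())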

Sessions 9-10: Ensemble Methods and Other Algorithms

In these sessions we will discuss some well-known approaches to combining machine learning methods. Combinations of methods, much like combinations of diverse expert opinions, are known to improve the accuracy of models/groups. We will discuss some theoretical underpinnings of ensemble methods as well as some further machine learning methods such as Classification and Regression Trees, Random Forests, Bagging and Boosting, and Neural Networks. We will also start exploring machine learning software packages; a small ensemble example follows the readings below. Main concepts: Bagging, Boosting, Random Forests, Boosted Trees, Neural Networks. Background Readings:
Sessions 9-10 Handouts
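
A minimal sketch of the ensemble idea (in Python with scikit-learn, on one of its built-in datasets), comparing a single tree with bagged and boosted ensembles of trees:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest (bagging + feature subsampling)":
        RandomForestClassifier(n_estimators=200, random_state=0),
    "boosted trees": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated accuracy: ensembles of many "diverse" trees
# typically beat the single tree, mirroring the diverse-experts intuition.
for name, model in models.items():
    print(name, "accuracy: %.3f" % cross_val_score(model, X, y, cv=5).mean())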

Sessions 11-12: Theoretical Foundations of Machine Learning

In these sessions we will introduce the main mathematical tools and intuitions that can help us better understand why and when machine learning methods work. We will also discuss some of the main theorems that explain the predictive performance of machine learning methods. It is these theorems, together with advances in computing power, storage, and the availability of (big) data, that led to the recent important breakthroughs of AI and Machine Learning across scientific and business areas. A representative concentration-based bound is written out after the readings below. Main concepts: Concentration Inequalities, Complexity Measures, Learning Rates and Bounds, VC-Dimension, Structural Risk Minimization, Stability, Rademacher Complexity, Estimation and Generalization/Prediction Error, Approximation Theory. Background Readings:
Sessions 11-12 Handouts
Exercise Set 3 (Optional)
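
For reference, two standard results of the kind covered here (general statements, not specific to the handouts): for a fixed \(f\) and a loss bounded in \([0,1]\), Hoeffding's inequality gives

\[
\Pr\big(\,|R(f) - \hat{R}_n(f)| \ge \varepsilon\,\big) \;\le\; 2\, e^{-2 n \varepsilon^2},
\]

and a uniform, VC-type bound over a class \(\mathcal{H}\) of VC-dimension \(d\) states that, with probability at least \(1 - \delta\),

\[
\sup_{f \in \mathcal{H}} \big|R(f) - \hat{R}_n(f)\big| \;\le\; C \sqrt{\frac{d \log(n/d) + \log(1/\delta)}{n}}
\]

for a universal constant \(C\). Bounds of this form explain when Empirical Risk Minimization generalizes: the complexity \(d\) must be small relative to the sample size \(n\).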

Sessions 13-14: Other Topics and Paper Presentations

In these sessions participants will present a number of papers selected during the course. We will also discuss topics not covered in the earlier sessions. More online resources will be shared during the course, and participants are also expected to contribute such resources on the course website throughout the course. Example concepts: Deep Reinforcement Learning, Fairness in AI, Independent Component Analysis, Generative Adversarial Networks, Compressed Sensing, Random Matrix Theory, Wavelets, High Dimensional Statistics, Information Theory, Compression, Gaussian Processes, Graphical Models, Approximation Theory, Splines, Reproducing Kernel Hilbert Spaces, Bootstrap, Clustering, Matrix Estimation, Matrix Completion, Low Rank, Active Learning, Experimental Design, Change Point Detection, Natural Language Processing, Text Mining, etc. Example papers:

Some other articles on the broader topic of "Humans and Machines"