THE COURSE

Together with the ZHAW, I am offering a university course on deep learning. It is based on the book and is open to everyone.

TARGET AUDIENCE

This course is for people who want to learn what deep learning is and are not scared by a challenge. You should have some background in programming (not necessarily in Python) and some background in mathematics. If you are a beginner, that is fine. We will cover the basics you need, but you may need to work a bit more on your own during the classroom lectures. I will try to help you and direct you toward the right material as much as I can. In our two introductory weeks we will also cover what we need in terms of mathematics and Python. This course is also for you if you are an experienced data scientist but have not worked with neural networks before. You will be able to focus on the more specific deep learning topics and not spend time on the basics, using that time to dig deeper into the topics explained. I am happy to help you go deeper than what we will be able to cover in the hours we have during the lectures.

COURSE FORMAT

Advanced training course with certificate. No credits (ECTS).
Optional homework assignments.
Certificate for the 3 best end-of-course projects.
Course language: English (support in lab sessions can also be given in German)

The course is held in the late afternoon (17:15-20:00) to make it accessible to people with full-time jobs and is held once a week.

DURATION AND PRICE

2 weeks: preparatory lessons (3 hrs a week): optional
9 weeks: lessons (3 hrs a week): at least 75% presence required
3 weeks: project (3 hrs a week): optional, with extra certificate

Normal Price: 760 CHF (including course book)
Student Price (with proof): 360 CHF (including course book)
Number of places is limited.

The course will be held in Zürich from 23.10 to mid-February. More details are on the registration page.

DETAILED COURSE CONTENT

Review of Python and in particular numpy and its philosophy.
Matplotlib and visualisation.
Review of linear algebra: matrix multiplication, inverse, element-wise multiplication
Computational graphs, introduction to tensorflow ("construction" and "evaluation" phases)
Linear regression with tensorflow
Python environment setup, development of a linear regression example in tensorflow
Network with one neuron
Logistic and linear regression with one neuron
Preparation of a real dataset
Neural Networks with many layers
Overfitting concept explanation

Weights initialisation (Xavier and He)
Gradient descent algorithm
Dynamical learning rate decay
Optimizers (Momentum, RMSProp, Adam)
Regularisation: L1, L2 and dropout
Metric analysis
Explanation of why we need train, dev and test datasets
How to split datasets in the deep learning context
Strategies to solve and identify different dataset problems (overfitting, data from different sources or distributions, etc.)
Hyperparameter Tuning
Grid Search
Random Search
Bayesian Optimization
Coarse to fine optimization
Parameter search on a logarithmic scale
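
To give a flavour of the last topics in the list, here is a minimal sketch of random search on a logarithmic scale. It is only an illustration, not course material: the function name sample_log_uniform and the learning-rate range 1e-4 to 1e-1 are assumptions made for this example. The idea is to draw the exponent uniformly, so that every decade of the range gets roughly the same number of candidates, whereas drawing the value itself uniformly would place almost all candidates in the top decade.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw one value between low and high, uniform in log10 space."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical search range for a learning rate: 1e-4 to 1e-1 (three decades).
log_samples = [sample_log_uniform(1e-4, 1e-1, rng) for _ in range(1000)]
lin_samples = [rng.uniform(1e-4, 1e-1) for _ in range(1000)]

# Share of candidates that fall in the lowest decade (below 1e-3).
log_share = sum(s < 1e-3 for s in log_samples) / len(log_samples)
lin_share = sum(s < 1e-3 for s in lin_samples) / len(lin_samples)

print(f"log-scale sampling, share below 1e-3: {log_share:.2f}")
print(f"linear sampling,    share below 1e-3: {lin_share:.2f}")
```

With log-scale sampling about a third of the candidates land in each decade. With linear sampling only about one in a hundred falls below 1e-3, so small learning rates are hardly explored at all.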

DETAILED COURSE DESCRIPTION

This course offers a case-based introduction to deep learning, based on the book:

U. Michelucci, Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks, APRESS, ISBN: 978-1-4842-3789-2

Umberto Michelucci about this course:
Why offer a course on applied deep learning? After all, try a Google search on the subject and you will be overwhelmed by the huge number of results. The problem is that there is no course, blog or book that teaches, in a consolidated and beginner-friendly way, advanced subjects such as regularization, advanced optimisers like Adam or RMSProp, mini-batch gradient descent, dynamical learning rate decay, dropout, hyperparameter search, Bayesian optimisation, metric analysis and so on.

I found material (typically of very bad quality) only on how to implement very basic models on very simple datasets. If you want to learn how to classify the MNIST dataset of handwritten digits (10 classes), you are in luck: almost everyone with a blog has done that, mostly by copying the code you find on the tensorflow website. Looking for something else, for example to learn how logistic regression works? Not so easy. How to prepare a dataset to perform an interesting binary classification? Even more difficult.

I felt the need to fill this gap. I spent hours trying to debug models for reasons as dumb as having the labels wrong: instead of 0 and 1 I had 1 and 2, but no blog warned me about that. It is important to do a proper metric analysis when developing your models, but nobody teaches you how (at least not in easy-to-access material). I find that covering more complex examples, from data preparation to error analysis, is a very efficient and fun way to learn the right techniques. In this course, I will always cover complete and complex examples to explain concepts that are not so easy to understand in any other way.
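
The label problem mentioned above is easy to guard against with a few lines of code. The following is a minimal sketch, not code from the book: the helper name remap_binary_labels and the sample labels are assumptions made for illustration. It checks that a dataset really has two classes and maps them onto 0 and 1, which is what a sigmoid output trained with a cross-entropy cost expects.

```python
def remap_binary_labels(labels):
    """Map a two-class label list onto 0 and 1 (the smaller value becomes 0)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError(f"expected exactly 2 classes, found {classes}")
    mapping = {classes[0]: 0, classes[1]: 1}
    return [mapping[y] for y in labels]


# Hypothetical labels as they might come out of a raw data file.
raw_labels = [1, 2, 2, 1, 2]
labels = remap_binary_labels(raw_labels)

# Cheap sanity check to run before any training starts.
assert set(labels) <= {0, 1}
print(labels)  # [0, 1, 1, 0, 1]
```

A check like this costs one line before training and can save hours of debugging a model that was never the problem.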

It is not possible to understand why it is important to choose the right learning rate if you don’t see, for example, what can happen when you select the wrong value. Note that the goal of this course is not to make you a Python or tensorflow expert, or someone who can develop new complex algorithms. Python and tensorflow are simply tools that are very well suited to developing models and getting results quickly. Therefore, I use them. I could have used other tools, but those are the ones most used by practitioners, so it makes sense to choose them.

The goal of this course is to let you see more advanced material with new eyes. I cover the mathematical background as much as I can because I feel it is necessary for a complete comprehension of the difficulties and reasoning behind many concepts. You cannot understand why a big learning rate will make your model (strictly speaking the cost function) diverge, if you don’t know how the gradient descent algorithm works mathematically. In all real-life projects, you will not have to calculate partial derivatives or complex sums, but you need to understand them to be able to evaluate what can work and what cannot (and especially why).
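
The effect of the learning rate can be seen on the simplest possible example. The sketch below is an illustration under stated assumptions, not material from the book: it uses a one-parameter cost J(w) = w^2, whose derivative is 2w, and a hypothetical helper called gradient_descent. Each update w <- w - learning_rate * 2w multiplies w by (1 - 2 * learning_rate), so the cost shrinks when that factor has absolute value below 1 and grows without bound when it is above 1.

```python
def gradient_descent(learning_rate, w0=1.0, steps=20):
    """Run plain gradient descent on J(w) = w**2 and return the cost history."""
    w = w0
    costs = []
    for _ in range(steps):
        costs.append(w ** 2)           # cost before the update
        grad = 2.0 * w                 # dJ/dw for J(w) = w**2
        w = w - learning_rate * grad   # gradient descent update
    costs.append(w ** 2)               # cost after the last update
    return costs


small = gradient_descent(learning_rate=0.1)  # factor 1 - 0.2 = 0.8
large = gradient_descent(learning_rate=1.1)  # factor 1 - 2.2 = -1.2

print(f"learning rate 0.1: cost {small[0]:.3f} -> {small[-1]:.6f}")
print(f"learning rate 1.1: cost {large[0]:.3f} -> {large[-1]:.1f}")
```

With the small learning rate the cost decreases at every step. With the large one, w overshoots the minimum and flips sign each time, and the cost increases at every step.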

This course is structured in 14 weeks (see above). Each lesson is divided into a theory part and a lab part, where we will work on Jupyter Notebooks together to implement and apply what we learnt in the theory part. The lab sessions will not give you enough time to develop really complex models, so optional homework assignments are prepared to give you a chance to try something more complex. The last 3 weeks are planned for work on an end-of-course project on medical images that we will define together with the ETH spin-off 4quant. At the end of the 3 weeks each group will have a chance to present its results and ideas, and the 3 best projects will get a certificate. The projects will be judged by Umberto Michelucci, Thomas Ott and Kevin Mader (the CTO and co-founder of 4quant). We will work with Kaggle InClass (https://www.kaggle.com/about/inclass/overview) and structure the project as a Kaggle competition. Not only the numerical results will be judged, but also other aspects such as research design, clear explanation of results and so on.

The code is developed in Python and we will use the tensorflow library to develop our models. The course is not specifically on tensorflow, but on deep learning and neural networks. So we will not look at how to use special features of tensorflow, or at derived libraries such as Keras. Instead, we will use the library to implement the algorithms from scratch as much as we can, to really understand how they work.