Applied Deep Learning

 

Download the code

In the Apress repository you can find the code I used for the book, together with additional material that will help you understand the concepts explained in it. The repository is free and can be accessed by anyone. I am continuously adding new material, including notebooks with exercises, to expand the book's content.

BOOK PHILOSOPHY
Why write a book on applied deep learning? After all, try a Google search on the subject and you will be overwhelmed by the huge number of results. The problem is that no course, blog, or book teaches, in a consolidated and beginner-friendly way, advanced subjects like regularization, advanced optimizers such as Adam or RMSProp, mini-batch gradient descent, dynamic learning rate decay, dropout, hyperparameter search, Bayesian optimization, metric analysis, and so on.

I found material (typically of very poor quality) only for implementing very basic models on very simple datasets. If you want to learn how to classify the MNIST dataset of 10 handwritten digits, you are in luck: almost everyone with a blog has done that, mostly copying the code found on the TensorFlow website. Looking for something else, say how logistic regression works? Not so easy. How to prepare a dataset for an interesting binary classification problem? Even more difficult.

I felt the need to fill this gap. I spent hours trying to debug models for reasons as dumb as having the wrong labels: instead of 0 and 1 I had 1 and 2, but no blog warned me about that. It is important to do a proper metric analysis when developing your models, but nobody teaches you how (at least not in easily accessible material). This gap needed to be filled. I find that covering complete examples, from data preparation to error analysis, is a very efficient and fun way to learn the right techniques. In this book I always use complete and complex examples to explain concepts that are hard to understand in any other way. For example, it is not possible to understand why it is important to choose the right learning rate if you never see what can happen when you select the wrong value. Note that the goal of this book is not to make you a Python or TensorFlow expert, or someone who can develop new complex algorithms. Python and TensorFlow are simply tools that are very well suited to developing models and getting results quickly, and that is why I use them. I could have used other tools, but these are the ones most widely used by practitioners, so it makes sense to choose them.

DETAILED CONTENT
  • Computational graphs, introduction to TensorFlow ("construction" and "evaluation" phases; see the sketch after this list)
  • Linear regression with TensorFlow
  • Python environment setup, development of a linear regression example in TensorFlow
  • Network with one neuron
  • Logistic and linear regression with one neuron
  • Preparation of a real dataset
  • Neural networks with many layers
  • Explanation of the overfitting concept
  • Weight initialization (Xavier and He)
  • The gradient descent algorithm
  • Dynamic learning rate decay
  • Optimizers (Momentum, RMSProp, Adam)
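To make the "construction" and "evaluation" phases concrete, here is a minimal sketch, assuming TensorFlow 1.x (the version used in the book): the graph is first defined, and only afterwards executed inside a session.

    import tensorflow as tf  # assumes TensorFlow 1.x

    # Construction phase: define the computational graph.
    # Nothing is computed at this point.
    x = tf.placeholder(tf.float32, name='x')
    y = tf.placeholder(tf.float32, name='y')
    z = x * y + 2.0

    # Evaluation phase: run the graph inside a session.
    with tf.Session() as sess:
        print(sess.run(z, feed_dict={x: 3.0, y: 4.0}))  # prints 14.0
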
UNIVERSITY COURSE

The motivation for the course is the same as for the book (see the Book Philosophy section above): consolidated, beginner-friendly material on advanced subjects such as regularization, modern optimizers, learning rate decay, dropout, hyperparameter search, and metric analysis is hard to find, and the course fills this gap with complete, complex examples.

The goal of this course is to let you see more advanced material with new eyes. I cover the mathematical background as much as I can, because I feel it is necessary for a complete comprehension of the difficulties and reasoning behind many concepts. You cannot understand why a big learning rate will make your model (strictly speaking, the cost function) diverge if you don't know how the gradient descent algorithm works mathematically. In real-life projects you will rarely have to calculate partial derivatives or complex sums by hand, but you need to understand them to be able to evaluate what can work and what cannot (and especially why).
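To see this concretely, here is a minimal sketch (a toy example, not code from the course) of gradient descent on the one-dimensional cost function J(w) = w^2. The update w ← w − γ·dJ/dw multiplies w by (1 − 2γ) at every step, so any learning rate γ > 1 makes the iterates, and therefore the cost, grow without bound.

    # Gradient descent on J(w) = w**2, whose gradient is 2*w.
    def gradient_descent(lr, steps=10, w=1.0):
        for _ in range(steps):
            w = w - lr * 2.0 * w  # equivalent to w *= (1 - 2*lr)
        return w

    print(gradient_descent(lr=0.1))  # ~0.107: converges towards the minimum at 0
    print(gradient_descent(lr=1.5))  # 1024.0: |1 - 2*lr| = 2, so |w| doubles each step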

This course is structured over 14 weeks (see above). Each lesson is divided into a theory part and a lab part, where we will work on Jupyter notebooks together to implement and apply what we learnt in the theory part. The lab sessions will not give you enough time to develop really complex models, so optional homework assignments are prepared to give you a chance to try something more ambitious. The last 3 weeks are reserved for an end-of-course project that we will define together with the ETH spin-off 4quant, working on medical images. At the end of the 3 weeks each group will have the chance to present their results and ideas, and the 3 best projects will receive a certificate. The projects will be judged by Umberto Michelucci, Thomas Ott and Kevin Mader (the CTO and co-founder of 4quant). We will work with Kaggle InClass (https://www.kaggle.com/about/inclass/overview) and structure the project as a Kaggle competition. Not only the numerical results will be judged, but also aspects such as research design, clarity of the result explanations, and so on.

The code is developed in Python, and we will use the TensorFlow library to build our models. The course is not specifically about TensorFlow, but about deep learning and neural networks. So we will not look at special features of TensorFlow, or at derived libraries such as Keras; instead we will use the library to implement the algorithms from scratch as much as we can, to really understand how they work.
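To give a taste of this from-scratch approach, here is a minimal sketch (assuming TensorFlow 1.x and made-up toy data) of linear regression in which the gradient descent update rule is written out explicitly instead of calling a built-in optimizer:

    import numpy as np
    import tensorflow as tf  # assumes TensorFlow 1.x

    # Toy data for y = 2x + 1 (hypothetical, for illustration only).
    x_train = np.linspace(0.0, 1.0, 100).astype(np.float32)
    y_train = 2.0 * x_train + 1.0

    # Construction phase: model, cost, and a hand-written update rule.
    X = tf.placeholder(tf.float32, [None])
    Y = tf.placeholder(tf.float32, [None])
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    cost = tf.reduce_mean(tf.square(w * X + b - Y))  # mean squared error

    lr = 0.5  # learning rate
    grad_w, grad_b = tf.gradients(cost, [w, b])
    train_step = tf.group(w.assign(w - lr * grad_w),
                          b.assign(b - lr * grad_b))

    # Evaluation phase: run the training loop in a session.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(500):
            sess.run(train_step, feed_dict={X: x_train, Y: y_train})
        print(sess.run([w, b]))  # should approach [2.0, 1.0]

Writing the update rule by hand like this is exactly the kind of exercise the course favours: once you have done it, replacing it with Momentum, RMSProp, or Adam is a small, well-understood step.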

WHERE YOU CAN FIND MORE…

On the book's GitHub repository you can find code, information, bug reports, and much more related to the book. At the beginning the repository will be empty, so please have patience: as soon as more material is ready, it will appear there.

ABOUT ME

I am Umberto Michelucci. I have studied theoretical physics and mathematics since a young age. Graduating with the highest grades, "cum laude", kicked off my scientific life. Always enjoying the challenge of big and difficult problems, especially using those marvellous laboratories that are computers, I spent several years doing research, first in the United States at the George Washington University and then in Germany at the University of Augsburg, working in areas ranging from laser trapping of atoms to high-temperature superconductivity.
I felt that science should be applied to the real world and decided to start working for clients in the data-warehousing and data-management branch. I helped big companies develop complex solutions to use their data efficiently and discover new insights.

I became more and more interested in how machines can help us solve problems that seem impossible to solve. Prediction, learning, and AI were the areas that piqued my interest more and more.

Research in data science and machine learning now enriches my life, both professional and otherwise. I believe we are facing a revolution: we have problems so complex that we currently have no chance of solving them. We need to find a new paradigm in how we approach the real world and how we can make it better.

The science we know so far is not enough.

MEET ME

In case you are interested in talking to me, don't hesitate to get in touch using the contact form.

UMBERTO MICHELUCCI

Author, Lecturer