Annotated Bibliography

Books

Neural Networks

The books by M. A. Nielsen [Nie15] (free), E. Charniak [Cha19], F. Fleuret [Fle24] (free), and C.-A. Azencott [Aze22] (in French, with a free PDF version) provide a very good introduction to neural networks and deep learning.

Courses

  • "Speech and Natural Language Processing" course from the MVA master's program (Mathématiques, Vision, Apprentissage) at ENS Paris-Saclay [BCW+24]

  • "Deep Learning" course by François Fleuret at the University of Geneva [Fle22]

  • Stanford's "Natural Language Processing with Deep Learning" course (CS224N) [cou24]

  • "Apprendre les langues aux machines" course by Benoît Sagot at the Collège de France, as part of the annual chair "Informatique et sciences numériques" [Sag23]

Tutorials

Neural Networks: Zero to Hero

Andrej Karpathy is an engineer working on neural networks. He publishes many educational resources on the subject, such as a one-hour introduction to LLMs, Intro to Large Language Models [AndrejKarpathy23c], and a very comprehensive series of tutorials for building LLMs, Neural Networks: Zero to Hero [Kar], listed below:

  • The spelled-out intro to neural networks and backpropagation: building micrograd [AndrejKarpathy22f]

  • The spelled-out intro to language modeling: building makemore [AndrejKarpathy22e]

  • Building makemore Part 2: MLP [AndrejKarpathy22a]

  • Building makemore Part 3: Activations & Gradients, BatchNorm [AndrejKarpathy22b]

  • Building makemore Part 4: Becoming a Backprop Ninja [AndrejKarpathy22c]

  • Building makemore Part 5: Building a WaveNet [AndrejKarpathy22d]

  • Let's build GPT: from scratch, in code, spelled out [AndrejKarpathy23a]

  • Let's build the GPT Tokenizer [AndrejKarpathy24a]

  • Let's reproduce GPT-2 (124M) [AndrejKarpathy24b]

Following these tutorials and learning to build neural network architectures for NLP from scratch is a very rewarding experience. As Andrej puts it:

These 94 lines of code are everything that is needed to train a neural network. Everything else is just efficiency.

This is my earlier project Micrograd. It implements a scalar-valued auto-grad engine. You start with some numbers at the leafs (usually the input data and the neural network parameters), build up a computational graph with operations like + and * that mix them, and the graph ends with a single value at the very end (the loss). You then go backwards through the graph applying chain rule at each node to calculate the gradients. The gradients tell you how to nudge your parameters to decrease the loss (and hence improve your network).

Sometimes when things get too complicated, I come back to this code and just breathe a little. But ok ok you also do have to know what the computational graph should be (e.g. MLP -> Transformer), what the loss function should be (e.g. autoregressive/diffusion), how to best use the gradients for a parameter update (e.g. SGD -> AdamW) etc etc. But it is the core of what is mostly happening.

The 1986 paper from Rumelhart, Hinton, Williams that popularized and used this algorithm (backpropagation) for training neural nets [RHW86], micrograd on Github [Kar24] and my (now somewhat old) YouTube video where I very slowly build and explain: [AndrejKarpathy22f]

—Andrej Karpathy on X, June 2024.
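
To make the quote concrete, here is a minimal sketch of such a scalar-valued autograd engine in the spirit of micrograd. The names used here (Value, backward, _backward_fn) are illustrative assumptions made for this example, not Karpathy's actual API; see the micrograd repository [Kar24] for his implementation.

    # Illustrative sketch only: a tiny scalar autograd engine in the spirit of micrograd.
    # The names (Value, backward, _backward_fn) are assumptions made for this example.
    class Value:
        """A scalar that remembers how it was computed, so gradients can flow back."""

        def __init__(self, data, parents=()):
            self.data = data                  # scalar value at this node
            self.grad = 0.0                   # d(loss)/d(this node), filled by backward()
            self._parents = parents           # nodes that produced this one
            self._backward_fn = lambda: None  # local application of the chain rule

        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data, (self, other))
            def backward_fn():
                self.grad += out.grad         # d(a+b)/da = 1
                other.grad += out.grad        # d(a+b)/db = 1
            out._backward_fn = backward_fn
            return out

        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data, (self, other))
            def backward_fn():
                self.grad += other.data * out.grad  # d(a*b)/da = b
                other.grad += self.data * out.grad  # d(a*b)/db = a
            out._backward_fn = backward_fn
            return out

        def backward(self):
            # Topologically sort the graph, then apply the chain rule from the loss backwards.
            order, visited = [], set()
            def visit(node):
                if node not in visited:
                    visited.add(node)
                    for parent in node._parents:
                        visit(parent)
                    order.append(node)
            visit(self)
            self.grad = 1.0                   # d(loss)/d(loss) = 1
            for node in reversed(order):
                node._backward_fn()

    # Usage: two leaf parameters, a tiny computational graph, one gradient step.
    w, x = Value(2.0), Value(-3.0)
    loss = w * x + w                          # forward pass builds the graph
    loss.backward()                           # backward pass fills the gradients
    print(w.grad, x.grad)                     # -2.0 and 2.0
    w.data -= 0.1 * w.grad                    # nudge the parameter to decrease the loss

Karpathy's actual micrograd adds a few more operations (power, ReLU) and a small MLP module on top, but the backward pass above is essentially the mechanism the quote describes.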

References

[cou24]

Natural Language Processing with Deep Learning (CS224N). 2024. URL: https://web.stanford.edu/class/cs224n/.

[Aze22]

Chloé-Agathe Azencott. Introduction au Machine Learning. Dunod, second edition, 2022. URL: https://www.dunod.com/sciences-techniques/introduction-au-machine-learning-1.

[BCW+24]

Rachel Bawden, Chloé Clavel, Guillaume Wisniewski, Benoît Sagot, and Djamé Seddah. Cours du MVA "Speech and Natural Language Processing". 2024. URL: https://github.com/rbawden/MVA_2024_SL.

[Cha19]

Eugene Charniak. Introduction to Deep Learning. The MIT Press, 2019. URL: https://mitpress.mit.edu/9780262039512/introduction-to-deep-learning/.

[Fle22]

François Fleuret. Deep Learning Course. 2022. URL: https://fleuret.org/dlc/.

[Fle24]

François Fleuret. The Little Book of Deep Learning. May 2024. URL: https://fleuret.org/francois/lbdl.html.

[Kar]

Andrej Karpathy. Neural Networks: Zero to Hero. URL: http://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ.

[Kar24]

Andrej Karpathy. Micrograd. June 2024. URL: https://github.com/karpathy/micrograd.

[Nie15]

Michael A. Nielsen. Neural Networks and Deep Learning. Determination Press, 2015. URL: http://neuralnetworksanddeeplearning.com.

[RHW86]

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, October 1986. URL: https://doi.org/10.1038/323533a0.

[AndrejKarpathy22a]

Andrej Karpathy. Building makemore Part 2: MLP. September 2022. URL: https://www.youtube.com/watch?v=TCH_1BHY58I.

[AndrejKarpathy22b]

Andrej Karpathy. Building makemore Part 3: Activations & Gradients, BatchNorm. October 2022. URL: https://www.youtube.com/watch?v=P6sfmUTpUmc.

[AndrejKarpathy22c]

Andrej Karpathy. Building makemore Part 4: Becoming a Backprop Ninja. October 2022. URL: https://www.youtube.com/watch?v=q8SA3rM6ckI.

[AndrejKarpathy22d]

Andrej Karpathy. Building makemore Part 5: Building a WaveNet. November 2022. URL: https://www.youtube.com/watch?v=t3YJ5hKiMQ0.

[AndrejKarpathy22e]

Andrej Karpathy. The spelled-out intro to language modeling: building makemore. September 2022. URL: https://www.youtube.com/watch?v=PaCmpygFfXo.

[AndrejKarpathy22f]

Andrej Karpathy. The spelled-out intro to neural networks and backpropagation: building micrograd. August 2022. URL: https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&index=1&pp=iAQB.

[AndrejKarpathy23a]

Andrej Karpathy. Let's build GPT: from scratch, in code, spelled out. January 2023. URL: https://www.youtube.com/watch?v=kCc8FmEb1nY.

[AndrejKarpathy23b]

Andrej Karpathy. State of GPT | BRK216HFS. May 2023. URL: https://www.youtube.com/watch?v=bZQun8Y4L2A.

[AndrejKarpathy23c]

Andrej Karpathy. [1hr Talk] Intro to Large Language Models. November 2023. URL: https://www.youtube.com/watch?v=zjkBMFhNj_g&list=PLAqhIrjkxbuW9U8-vZ_s_cjKPT_FqRStI&index=1&pp=iAQB.

[AndrejKarpathy24a]

Andrej Karpathy. Let's build the GPT Tokenizer. February 2024. URL: https://www.youtube.com/watch?v=zduSFxRajkE.

[AndrejKarpathy24b]

Andrej Karpathy. Let's reproduce GPT-2 (124M). June 2024. URL: https://www.youtube.com/watch?v=l8pRSuU81PU.