#61: Prof. YANN LECUN: Interpolation, Extrapolation and Linearisation (w/ Dr. Randall Balestriero)

Shared on 2022-01-04
We are now sponsored by Weights and Biases! Please visit our sponsor link: wandb.me/MLST

Patreon: www.patreon.com/mlst
Discord: discord.gg/ESrGqhf5CB

Yann LeCun thinks it's specious to say neural network models are interpolating, because in high dimensions everything is extrapolation. Recently, Dr. Randall Balestriero, Dr. Jerome Pesenti and Prof. Yann LeCun released their paper "Learning in High Dimension Always Amounts to Extrapolation". This discussion has completely changed how we think about neural networks and their behaviour.
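The paper's core claim can be illustrated with a minimal Monte Carlo sketch (all sample sizes here are illustrative, not from the paper): a point inside the convex hull of a training set must also lie inside the per-dimension [min, max] bounding box of that set, so the fraction of test points falling outside the box is a lower bound on the fraction requiring extrapolation. Even at modest dimensionality, that fraction approaches one.

```python
import random

def extrapolation_rate(d, n_train=100, n_test=2000, seed=0):
    """Fraction of uniform test points falling outside the per-dimension
    [min, max] range of the training set. Being inside this bounding box
    is a *necessary* condition for convex-hull membership, so this rate
    lower-bounds the true extrapolation rate."""
    rng = random.Random(seed)
    train = [[rng.random() for _ in range(d)] for _ in range(n_train)]
    lo = [min(p[i] for p in train) for i in range(d)]
    hi = [max(p[i] for p in train) for i in range(d)]
    outside = 0
    for _ in range(n_test):
        x = [rng.random() for _ in range(d)]
        if any(not (lo[i] <= x[i] <= hi[i]) for i in range(d)):
            outside += 1
    return outside / n_test

# With 100 training points, almost every new point is an extrapolation
# once the dimension reaches a few hundred.
for d in (2, 20, 200):
    print(d, extrapolation_rate(d))
```

For uniform data the per-coordinate containment probability is (n-1)/(n+1), so the in-box probability decays as ((n-1)/(n+1))^d, which is why the convex-hull notion of interpolation becomes vacuous in high dimensions.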

[00:00:00] Pre-intro
[00:11:58] Intro Part 1: On linearisation in NNs
[00:28:17] Intro Part 2: On interpolation in NNs
[00:47:45] Intro Part 3: On the curse
[00:57:41] LeCun intro
[00:58:18] Why is it important to distinguish between interpolation and extrapolation?
[01:03:18] Can DL models reason?
[01:06:23] The ability to change your mind
[01:07:59] Interpolation - LeCun steelman argument against NNs
[01:14:11] Should extrapolation be over all dimensions
[01:18:54] On the morphing of MNIST digits, is that interpolation?
[01:20:11] Self-supervised learning
[01:26:06] View on data augmentation
[01:27:42] TangentProp paper with Patrice Simard
[01:29:19] LeCun has no doubt that NNs will be able to perform discrete reasoning
[01:38:44] Discrete vs continuous problems?
[01:50:13] Randall introduction
[01:50:13] Are the interpolation people barking up the wrong tree?
[01:53:48] Could you steel man the interpolation argument?
[01:56:40] The definition of interpolation
[01:58:33] What if extrapolation was being outside the sample range on every dimension?
[02:01:18] On spurious dimensions and correlations: they don't an extrapolation make
[02:04:13] Making clock faces interpolative and why DL works at all?
[02:06:59] We discount all the human engineering which has gone into machine learning
[02:08:01] Given the curse, NNs still seem to work remarkably well
[02:10:09] Interpolation doesn't have to be linear though
[02:12:21] Does this invalidate the manifold hypothesis?
[02:14:41] Are NNs basically compositions of piecewise linear functions?
[02:17:54] How does the predictive architecture affect the structure of the latent?
[02:23:54] Spline theory of deep learning, and the view of NNs as piecewise linear decompositions
[02:29:30] Neural Decision Trees
[02:30:59] Continuous vs discrete (Keith's favourite question!)
[02:36:20] MNIST is in some sense, a harder problem than Imagenet!
[02:45:26] Randall debrief
[02:49:18] LeCun debrief

Pod version: anchor.fm/machinelearningstreettalk/episodes/061-I…

Our special thanks to:
- Francois Chollet (buy his book! www.manning.com/books/deep-learning-with-python-se…)
- Alexander Mattick (Zickzack)
- Rob Lange
- Stella Biderman

References:
Learning in High Dimension Always Amounts to Extrapolation [Randall Balestriero, Jerome Pesenti, Yann LeCun]
arxiv.org/abs/2110.09485

A Spline Theory of Deep Learning [Randall Balestriero, Richard Baraniuk]
proceedings.mlr.press/v80/balestriero18b.html

Neural Decision Trees [Dr. Balestriero]
arxiv.org/pdf/1702.07360.pdf

Interpolation of Sparse High-Dimensional Data [Dr. Thomas Lux]
tchlux.github.io/papers/tchlux-2020-NUMA.pdf

If you are an old fart and offended by the background music, here is the intro (first 60 mins) with no background music. drive.google.com/file/d/16bc7XJjKJzw4YdvL5rYdRZZB1…

Comments (21)
  • Thanks for posting this episode! And as "that guy" at 2:08:19, I'm happy to say I found the discussion very interesting and it's changed my mind :)
  • @DanElton
    I’m literally working on a blog post about how deep learning is interpolation only, based on double descent phenomena and distribution shift issues, and then this drops!! Lol
  • This is incredible! Ms. Coffee Bean's dream came true: the extrapolation interpolation beef explained in a verbal discussion! 🤯 Cannot wait to watch this. So happy about a new episode from MLST. You kept us waiting.
  • Just spent 5 hours watching this 3-hour video. This is both dense and profound. Great job, best episode yet in my book!
  • I wait for your videos in more excitement than I wait for my favorite tv shows' new seasons. Looks amazing!
  • @BenuTuber
    Starting off the new year with a bang. Tim, Keith and Yannic - thank you so much for this quality work. You can clearly tell how much love and dedication goes into every episode. Also the intros just continue to amaze me - the level of understanding you approach the variety of topics with is extremely inspiring.
  • Couple of minutes into the video and you break some of the fundamentals assumptions I had about deep learning/Neural nets, Jeez man. Excited for this 3hrs long video. And as usual the production quality of the videos keeps getting better. Happy New Year Guys
  • Sooo what are the odds we can get a conversation between LeCun and Chollet? Would love to watch them have a discussion on this.
  • You guys really kept us waiting. Thank you! MLST for this one.
  • Love these long form videos -- really appreciate the effort you guys are putting in!!
  • The content in this channel is just mind blowing. But the main reason I come back is the thoughtful editing and introductions and reflections of the content by dr Tim. I cannot keep up yet in grasping all the content in real time but that is exactly why it's so awesome. Thanks!
  • Thank you guys, I've not been more amazed by anything in AI than this completely brand new revelation of neural network's internal working. Insanely interesting and beautiful.
  • WOHOOOO! I'm so so stoked to see this video!  Time to drop everything and watch another epic interview by the MLST team!
  • Just got done watching it. Grateful for the great work the team has done. Cheers :)
  • @tchlux
    Thanks for the shoutout at 38:01 Tim! The Discord channel rocks 😆 An additional note on extrapolation that people might find interesting:
    - In effect, the ReLU activation function prevents value extrapolation to the left. So when these are stacked, they serve as "extrapolation inhibitors".
    - This clipping could be applied to other activation functions to improve generalization (or forewarn of excessive extrapolation)!
    - I.e., clipping the inputs to all activation functions within a neural network to the range seen at the end of training will reduce large extrapolation errors at evaluation time (and counting the number of times an input point is clipped throughout the network could indicate how far "outside the relevant convex hull" it is).
    The clipping shouldn't be introduced until training is done (because we don't have a reason to assume the initialization vectors are "good" at identifying the relevant parts of the convex hull). But I'd be willing to bet that this "neuron input clipping" could improve generalization for many problems, is part of why ReLU works well for so many problems, and can prevent predictions from being made at all for adversarial inputs.
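  • The "neuron input clipping" idea in the comment above can be sketched in a few lines of numpy. Everything here is illustrative (a toy two-layer ReLU net with random weights, no actual training): record per-neuron pre-activation min/max on the training data, then clip pre-activations to that range at evaluation time so no neuron can be driven beyond what it saw during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer ReLU network with fixed random weights (illustrative only;
# a real network would have trained weights at this point).
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def forward(x, clip_ranges=None):
    pre = W1 @ x + b1                       # pre-activation inputs
    if clip_ranges is not None:             # clip to range seen on training data
        lo, hi = clip_ranges
        pre = np.clip(pre, lo, hi)
    return W2 @ np.maximum(pre, 0.0) + b2   # ReLU, then linear readout

# 1) After training is done, record per-neuron pre-activation min/max
#    over the training set.
train = rng.normal(size=(200, 4))
pres = np.array([W1 @ x + b1 for x in train])
ranges = (pres.min(axis=0), pres.max(axis=0))

# 2) At evaluation time, clipping stops an out-of-distribution input from
#    pushing any neuron beyond its observed pre-activation range.
x_far = 100.0 * rng.normal(size=4)          # far outside the training cloud
y_raw = forward(x_far)
y_clipped = forward(x_far, clip_ranges=ranges)
print(y_raw, y_clipped)
```

    Counting how many neurons get clipped for a given input would then serve as the "how far outside the relevant convex hull" signal the comment describes.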
  • Great episode! These long deep dives are amazing, I get a lot of intuition from them and they are a great point to start reading more papers on the topic (who except Yann can keep up with axiv these days...) Really appreciate the effort and have a great 2022 :)
  • @Artula55
    I think I have seen this video over a dozen times, but every time I keep learning something new. Thx MLST!
  • Fantastic discussion and explanation of the thinking behind interpolation, extrapolation and linearisation. This has really helped shift the needle towards the ultimate problem we all face, helping decipher what input is relevant to the task. If possible, please do V.2 covering some of the other concepts Prof LeCun was talking about. Could be a series on its own as so good! Mike Nash - The AI finder
  • I keep coming back to this. One of the best MLSTs.