#52 - Unadversarial Examples (Hadi Salman, MIT)

Performing reliably on unseen or shifting data distributions is a difficult challenge for modern vision systems: even slight corruptions or transformations of images are enough to slash the accuracy of state-of-the-art classifiers. When an adversary is allowed to modify an input image directly, models can be manipulated into predicting anything, even when there is no perceptible change. This is known as an adversarial example. The ideal definition of an adversarial example is when humans consistently say two pictures are the same but a machine disagrees. Hadi Salman, a Ph.D. student at MIT (ex-Uber and Microsoft Research), started thinking about how adversarial robustness could be leveraged beyond security. He realised that the phenomenon of adversarial examples could be turned upside down to make models more robust instead of breaking them. Hadi used the brittleness of neural networks to design unadversarial examples, or robust objects: objects designed specifically to be robustly recognized by neural networks.

Introduction [00:00:00]
DR KILCHER'S PHD HAT [00:11:18]
Main Introduction [00:11:38]
Hadi's Introduction [00:14:43]
More robust models == transfer better [00:46:41]
Features not bugs paper [00:49:13]
Manifolds [00:55:51]
Robustness and Transferability [00:58:00]
Do non-robust features generalize worse than robust? [00:59:52]
The unreasonable predicament of entangled features [01:01:57]
We can only find adversarial examples in the vicinity [01:09:30]
Certifiability of models for robustness [01:13:55]
Carlini is coming for you! And we are screwed [01:23:21]
Distribution shift and corruptions are a bigger problem than adversarial examples [01:25:34]
All roads lead to generalization [01:26:47]
Unadversarial examples [01:27:26]

About the Podcast