Hacking Language Models
Language models are everywhere today: they run in the background of Google Translate and other translation tools; they help power voice assistants like Alexa and Siri; and, most interestingly, they are available through several experimental projects that try to emulate natural conversation, such as OpenAI’s GPT-3 and Google’s LaMDA. Can these models be hacked to extract the sensitive information they learned from their training data?