“’Prompt injection is a type of security vulnerability that can be exploited to control the behavior of a ChatGPT instance,’ Github explained. A prompt injection can be as simple as telling the LLM to ignore the pre-programmed instructions. It could ask specifically for a nefarious action or to circumvent filters to create incorrect responses.”
It’s worth a read, including for the reference to a study of how hackers could “hypnotize” an AI large language model (LLM) to generate wrong answers or leak sensitive information. See our report on a student who “tricked” a chatbot into providing credit card numbers.
It makes us wonder if making AI more human-like could impact the system’s own ability to fend off cyber attacks based on human psychology.