If you build it, people will try to break it. Sometimes, the people building a thing are the ones who break it. Such is the case with Anthropic and its latest research, which demonstrates an interesting vulnerability in current LLM technology. More or less, if you keep pressing with follow-up questions, you can wear down a model's guardrails and get a large language model to tell you things it isn't designed to say. Like how to build a bomb.
Of course, given the progress in open-source AI, you can run your own LLM locally and ask it whatever you want, but for consumer-facing products, this is a topic worth considering. The fun thing about AI today is the rapid pace at which it is advancing and how well (or not) we are doing as a species at understanding what we are building.
If I may speculate, I wonder whether we'll see more questions and problems of the kind Anthropic describes as LLMs and other new types of AI models get smarter and bigger (which may be repeating myself). But the closer we get to more generalized artificial intelligence, the more it should look like a thinking entity and not a computer we can program, right? If so, might we have more difficulty nailing down edge cases, to the point where such work becomes infeasible? Anyway, let's talk about what Anthropic recently shared.