Self-Replicating AI Malware is Here😱 #ComPromptMized – Security Boulevard

6 minutes, 4 seconds Read
Researchers worm themselves into your nightmares.

Generative AI can be fooled into stealing info, sending spam and spreading disinformation. Researchers have demonstrated how AI-powered agents can be maliciously subverted with self-replicating prompts. That allows the bad behavior to be made into a zero-click worm, so it spreads auto­matically.

Think about it: That’s frightening. In today’s SB Blogwatch, we try not to.

Your humble blogwatcher curated these bloggy bits for your enter­tainment. Not to mention: More of the same.

Skrik

What’s the craic? Matt Burgess broke the story—“Here Come the AI Worms”:

Two to three years
As generative AI systems … become more advanced, … companies are building AI agents and ecosystems … that can complete boring chores for you. … But as the tools are given more freedom, it also increases the potential ways they can be attacked. Now, … a group of researchers has created one of what they claim are the first generative AI worms—which can spread from one system to another, potentially stealing data or deploying malware.

The researchers … attack a generative AI email assistant to steal data from emails and send spam messages—breaking some security protections in ChatGPT and Gemini in the process. … The researchers turned to [an] “adversarial self-replicating prompt,” [which] triggers the generative AI model to output, in its response, another prompt … broadly similar to traditional SQL injection and buffer overflow attacks. [It] “poisons” the database of an email assistant using retrieval-augmented generation (RAG)—a way for LLMs to pull in extra data from outside its system.

While generative AI worms haven’t been spotted in the wild yet, multiple researchers say they are a security risk that startups, developers, and tech companies should be concerned about [and] should take seriously. This particularly applies when AI applications are given permission to take actions on someone’s behalf—such as sending emails or booking appointments. [The] researchers say they anticipate seeing generative AI worms in the wild in the next two to three years.

Why did they nickname it “Morris II”? Nate Nelson knows—“New Malware, Old Problem”:

Clever hackers
Most of today’s most advanced threats to AI models are just new versions of the oldest security problems in computing. … Part of what made the Morris worm novel in its time, three decades ago, was the fact that it figured out how to jump the data space into the part of the computer that exerts controls, enabling a Cornell grad student to escape the confines of a regular user and influence what a targeted computer does.

Clever hackers today use GenAI prompts largely to the same effect. And so, just like software developers before them, for defense, AI developers will need some way to ensure their programs don’t confuse user input [and] machine output. Developers can offload some of this responsibility to API rules, but a deeper solution might involve breaking up the gen AI models themselves into constituent parts. This way, data and control aren’t living side-by-side in the same big house.

Horse’s mouth? Stav Cohen, Ron Bitton and Ben Nassi—“ComPromptMized: Unleashing Zero-click Worms”:

Severe outcomes
This research is intended to serve as a whistleblower to the possibility of creating GenAI worms in order to prevent their appearance. … Given the widespread integration of GenAI capabilities by numerous companies into their products, transforming existing applications like personal assistants and email applications into an interconnected network of … GenAI-powered agents, it is imperative … to comprehend the security and privacy risks.

In this paper, … we show how attackers can leverage adversarial machine learning and jailbreaking techniques to create the first malware (worm) that exploits GenAI services to spread malware. … The first class of attack steers the flow of a GenAIpowered application toward a desired target, and the second class poisons the database … in inference time. Both attacks are applied in zero-click. … The malicious activity can be:
To exfiltrate … confidential data,
To distribute propaganda,
To generate toxic content, …
To spam the user, …
To perform a phishing or spear-phishing attack.

However, we believe that the impact of the malicious activity … will be more severe soon with the integration of GenAI capabilities into operating systems, smartphones and automotive. Such … agents can give rise to … severe payloads (e.g., ransomware, remote-code execution, wiper) and … severe outcomes (e.g., financial, operational, and safety).

What should devs do? jszymborski has two suggestions:

LLMs have very real limitations and should only be used within well-defined scopes. … There are two fundamental vulnerabilities here to my mind that I think are worth learning from:
1. Sanitize LLM output
2. Always outline to the user and make them confirm what actions a chat assistant is going to do.

Keeping humans in the loop—ensuring AI agents aren’t allowed to take actions without approval—is a crucial mitigation.

Good luck with that. HBI thinks similar:

I wish them luck. Protecting against attack in a system humans don’t fully understand is by definition impossible.

Isn’t this research inherently dangerous? No way, thinks u/Talvara:

It’s important to keep in mind … it’s preferred to have researchers find this **** out, publicize and disclose it—rather than have criminals or other bad actors develop it in silence.

But surely the AI devs have considered this already? DCStone doesn’t think so:

It’s one of the things the researchers are warning about: … Some people will implement [AI agents] without considering security risks.

We’re part way there already with existing virtual assistants and chat clients. It would not surprise me if, for example, someone decided to juice up their email auto-response system by hooking it to an agent to provide immediate AI-generated responses to basic support requests.

Not to be all, like, “Told you so,” but not2b is all, like, “Told you so”:

It isn’t surprising at all that this can be done. I figured that as soon as people started building LLM based agents that can read and send email we would see these.

There is no solid barrier to having the agent see something in an email as a command. It can be limited with prompts and training but no one has found a bulletproof way to constrain the behavior.

There’s nothing new under the sun. iAmWaySmarterThanYou remembers their “first year compsci project”:

One of the requirements our Professor gave us was to accept user generated input without breaking or doing something stupid. All input had to be filtered for safety no matter how psychotic the input. Anyone’s program that did stupid **** or crashed with malicious or unexpected input was automatically dropped a full letter grade.

Apparently the OpenAI and Google devs didn’t learn this lesson.

Meanwhile, u/Do-you-see-it-now needs some deyodarant: [You’re fired—Ed.]

Begun, the AI wars have.

And Finally:

Cohen et al illustrate shonkily

[embedded content]

Previously in And Finally


You have been reading SB Blogwatch by Richi Jennings. Richi curates the best bloggy bits, finest forums, and weirdest websites … so you don’t have to. Hate mail may be directed to @RiCHi, @richij or [email protected]. Ask your doctor before reading. Your mileage may vary. Past per­formance is no guarantee of future results. Do not stare into laser with remaining eye. E&OE. 30.

Image sauce: Edvard Munch, via the National Gallery of Norway (leveled and cropped)

This post was originally published on 3rd party site mentioned in the title of this site

Similar Posts