Sanket Badhe, a researcher at Rutgers University, created ScamAgent, an autonomous, multi-turn AI framework that shows how large language models (LLMs) can be used to place fully automated scam calls. The system successfully bypasses existing AI safety guardrails to simulate highly realistic social engineering attacks by combining goal-driven planning, contextual memory, and real-time text-to-speech (TTS) synthesis.

The architecture of ScamAgent diverges from traditional prompt injection: a central orchestrator manages conversational state and deception strategies across multiple interaction turns. When given a malicious objective, the agent uses goal decomposition to break the target down into a sequence of seemingly benign sub-goals, mirroring how human fraudsters gradually build trust with their victims.

ScamAgent System Architecture (source: arxiv.org)
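To make the orchestration pattern concrete, the following minimal Python sketch shows what such a goal-decomposing agent loop can look like structurally. Everything here, including the class, the hard-coded sub-goals, and the llm_call stub, is an illustrative assumption rather than the paper's actual code.

```python
from dataclasses import dataclass, field

def llm_call(prompt: str, history: list) -> str:
    # Stub standing in for a model API; returns canned text so the
    # sketch runs without any network access (hypothetical helper).
    return f"[model reply to: {prompt}]"

@dataclass
class Orchestrator:
    """Central controller that tracks dialogue state and feeds the
    model one benign-looking sub-goal per turn."""
    objective: str
    sub_goals: list = field(default_factory=lambda: [
        "establish a plausible persona",   # each step looks harmless...
        "build rapport and trust",
        "introduce urgency",
        "make the sensitive request",      # ...only the sequence is malicious
    ])
    memory: list = field(default_factory=list)
    step: int = 0

    def next_turn(self, victim_reply: str) -> str:
        self.memory.append(("victim", victim_reply))
        goal = self.sub_goals[min(self.step, len(self.sub_goals) - 1)]
        # The prompt is framed as roleplay, so a filter that inspects
        # this single turn sees nothing overtly harmful.
        reply = llm_call(f"Stay in character. Current step: {goal}", self.memory)
        self.memory.append(("agent", reply))
        self.step += 1
        return reply
```

The key structural point is that the malicious objective never appears in any individual prompt; it exists only in the orchestrator's plan.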

ScamAgent conceals its prompts inside roleplay framings to evade the safety filters of models such as GPT-4 and LLaMA3-70B, which makes it hard for standard single-turn moderation tools to recognize the overall malicious intent. In tests covering five common fraud scenarios, ScamAgent reliably circumvented standard safety protocols and model alignment.
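A toy example illustrates the blind spot. Assuming a simple phrase-matching filter (the SIGNALS table and threshold below are invented for illustration, not drawn from the paper), each turn of a scam dialogue scores as low-risk on its own, while the accumulated conversation does not:

```python
# Toy risk signals; a production system would use a trained classifier.
SIGNALS = {
    "persona_claim": ("this is", "calling from", "security team"),
    "identity_probe": ("confirm", "verify", "last four"),
    "credential_request": ("code", "password", "pin"),
}

def turn_risk(message: str) -> int:
    """Count how many signal categories a single message touches."""
    msg = message.lower()
    return sum(
        any(phrase in msg for phrase in phrases)
        for phrases in SIGNALS.values()
    )

turns = [
    "Hi, this is Alex from your bank's security team.",
    "Could you confirm the last four digits of your card?",
    "Please read me the code we just texted you.",
]

# Judged one turn at a time, each message carries a single weak signal
# and clears a per-turn threshold of, say, 2:
print([turn_risk(t) for t in turns])     # [1, 1, 1]

# Accumulated across the dialogue, the same signals cross the threshold:
print(sum(turn_risk(t) for t in turns))  # 3
```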

ScamAgent's approach rests on four techniques, each with a corresponding mitigation:

- Goal decomposition: attackers break a malicious goal into small steps that look harmless in isolation. Countering this requires monitoring conversations continuously rather than screening individual prompts.
- Deception and roleplay: fabricated scenarios or official-sounding personas disguise harmful requests. Blocking impersonation and restricting AI personas can help.
- Contextual memory: the system retains what was said earlier and adapts its scam plan accordingly. Limiting how much history the agent can remember lowers this risk (see the sketch after this list).
- Real-time TTS: the generated text is synthesized into a convincing voice call. Screening content before it is rendered as audio can help stop abuse.
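A minimal sketch of the memory-limiting mitigation, assuming a rolling window over dialogue turns (the BoundedMemory class and its window size are hypothetical):

```python
from collections import deque

class BoundedMemory:
    """Cap how much conversational history an agent can carry between
    turns, so a long deceptive arc cannot be sustained."""

    def __init__(self, max_turns: int = 4):
        # deque silently drops the oldest turns once the window is full
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "text": text})

    def context(self) -> list:
        # Only the most recent turns are handed back to the model, so
        # earlier trust-building steps fall out of scope.
        return list(self.turns)
```

The obvious trade-off is that an overly small window also degrades legitimate long-running assistants, so the cap would need tuning per use case.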

In the evaluation, direct malicious prompts were refused 84% to 100% of the time. Under the agentic framework, refusal rates dropped to between 17% and 32%, because the harmful intent was spread across the conversation rather than stated in any single prompt.

Comparison of refusal rates for GPT-4, Claude 3.7, and LLaMA 3 70B under single-prompt and ScamAgent conditions (source: arxiv.org)

Meta's LLaMA3-70B recorded the highest full-dialogue completion rate, finishing 74% of job identity fraud simulations with every sub-task completed and no safety-triggered interruptions. The researchers argue that protecting against autonomous generative threats requires security systems to move beyond filtering individual prompts toward continuously monitoring and interpreting user intent across entire conversations.
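One way to picture that shift is a monitor that re-scores the whole transcript after every turn instead of each message in isolation. The sketch below assumes a pluggable scoring function standing in for a trained sequence classifier; the class, threshold, and toy_classifier are all illustrative:

```python
def toy_classifier(history: list) -> int:
    # Toy stand-in for a trained sequence classifier: count risk
    # phrases anywhere in the accumulated transcript.
    joined = " ".join(history).lower()
    return sum(p in joined for p in ("security team", "confirm", "code"))

class ConversationMonitor:
    """Re-scores the entire dialogue after every turn, so intent spread
    across many benign-looking messages still accumulates."""
    RISK_THRESHOLD = 3

    def __init__(self, score_fn):
        self.transcript = []
        self.score_fn = score_fn

    def observe(self, message: str) -> bool:
        """Record a turn; return True once the conversation as a whole
        should be escalated or blocked."""
        self.transcript.append(message)
        return self.score_fn(self.transcript) >= self.RISK_THRESHOLD

monitor = ConversationMonitor(toy_classifier)
for turn in ["Hi, this is the security team.",
             "Please confirm your account number.",
             "Now read me the code we texted you."]:
    if monitor.observe(turn):
        print("flagged: cumulative intent crossed the threshold")
```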

AI platform providers and security teams should deploy multi-layered defenses, including sequence classifiers that predict long-term conversational outcomes and strict controls over memory retention.