Apex AI Penetration Testing Agent

Apex is an AI-powered penetration testing agent that operates autonomously against live applications in black-box mode. It needs no access to source code, no hints, and no predefined attack paths, which lets it discover, chain, and validate real-world vulnerabilities at the speed modern software development demands.
Apex grew out of a structural breakdown in how software security is practiced. AI coding agents now write and integrate code at scale. Stripe's coding agents, for example, merge 1,300 pull requests every week, and some engineering teams spend more than $1,000 per engineer per day on AI tokens, with no human code review in the loop.
Traditional scanners and human-led assessments cannot keep pace with this velocity. Apex was built to be the adversarial verification layer: an independent agent that attacks the running application the way a real attacker would, surfacing weaknesses before they become breaches.
Apex can be deployed in three ways. In CI pipelines, it tests every deployment against a sandboxed copy of the application, mapping the attack surface and attempting exploitation before code merges. Against production, it runs continuously and surfaces exploitable weaknesses in real time. On demand, it can test any target, replacing the quarterly engagement that ends in a PDF report with a feedback loop that moves as fast as modern threats.
PensarAI created Argus, an open-source benchmark of 60 self-contained, Dockerized vulnerable web applications designed specifically to test offensive security agents. The most widely used benchmark suite, XBOW's 104-challenge set, is 70% PHP and covers only single-vulnerability targets. It includes no GraphQL, JWT algorithm confusion, race condition, prototype pollution chain, WAF bypass, or multi-tenant isolation scenarios. Argus, by contrast, spans Node.js/Express (40%), Python/Flask/Django (20%), and multi-service architectures (25%), plus Go, Java/Spring Boot, and PHP.
Argus also adds categories that no other benchmark covers: WAF and IDS evasion, multi-step exploit chains requiring up to 7 chained vulnerabilities, multi-tenant isolation failures, race conditions and business logic flaws, modern authentication bypasses (JWT, OAuth, SAML, MFA), and attacks on cloud and Kubernetes infrastructure.
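JWT algorithm confusion is one of the scenarios absent from XBOW's set. It arises when a server lets the token's own `alg` header choose the verification algorithm. The Python sketch below is a simplified, stdlib-only illustration and is not code from Argus or Apex. The key PEM is a placeholder, and RSA verification is replaced by a hypothetical stub (`verify_rs256`) because the point here is algorithm selection, not RSA. An attacker who knows the server's RS256 public key signs a token with HS256, using that public key as the HMAC secret, and the naive verifier accepts it.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url_encode(data: bytes) -> str:
    # JWT segments use unpadded base64url.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    signing_input = b64url_encode(json.dumps(header).encode()) + "." + b64url_encode(json.dumps(payload).encode())
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(sig)


# Placeholder key material; in an RS256 deployment this PEM is public.
PUBLIC_KEY_PEM = b"-----BEGIN PUBLIC KEY-----\nEXAMPLE-PLACEHOLDER\n-----END PUBLIC KEY-----\n"


def verify_rs256(signing_input: bytes, sig: bytes, public_key: bytes) -> bool:
    # Hypothetical stub: a real server would verify an RSA signature here.
    return False


def naive_verify(token: str, key: bytes) -> Optional[dict]:
    header_b64, payload_b64, sig_b64 = token.split(".")
    alg = json.loads(b64url_decode(header_b64)).get("alg")
    signing_input = (header_b64 + "." + payload_b64).encode()
    sig = b64url_decode(sig_b64)
    # BUG: the attacker-controlled header picks the algorithm,
    # so the RSA public key gets reused as an HMAC secret.
    if alg == "HS256":
        ok = hmac.compare_digest(hmac.new(key, signing_input, hashlib.sha256).digest(), sig)
    elif alg == "RS256":
        ok = verify_rs256(signing_input, sig, key)
    else:
        ok = False
    return json.loads(b64url_decode(payload_b64)) if ok else None


def pinned_verify(token: str, key: bytes) -> Optional[dict]:
    header_b64, payload_b64, sig_b64 = token.split(".")
    # FIX: the server, not the token, decides the algorithm.
    if json.loads(b64url_decode(header_b64)).get("alg") != "RS256":
        return None
    ok = verify_rs256((header_b64 + "." + payload_b64).encode(), b64url_decode(sig_b64), key)
    return json.loads(b64url_decode(payload_b64)) if ok else None


# Attacker forges an admin token with HS256, keyed with the public PEM.
forged = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"sub": "attacker", "role": "admin"}, PUBLIC_KEY_PEM)
print(naive_verify(forged, PUBLIC_KEY_PEM))   # {'sub': 'attacker', 'role': 'admin'}
print(pinned_verify(forged, PUBLIC_KEY_PEM))  # None
```

Mature JWT libraries generally require the caller to pass an explicit allow-list of algorithms at verification time. That is the same idea as `pinned_verify`.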
Argus's 60 apps are divided into 2 easy, 27 medium, and 31 hard challenges and contain 271 vulnerabilities in total. We pointed Apex at all 60 Argus challenges in full black-box mode using Claude Haiku 4.5, the smallest and cheapest model available.
Using a small model helped show how much of the performance comes from the agent's architecture rather than from raw model capability. Apex achieved a 35% pass rate, ahead of PentestGPT (30%) and Raptor (27%). The gap widened on the 10 hardest challenges, run with Claude Opus 4.6: Apex solved 80%, PentestGPT 70%, and Raptor 60%. Across the entire run, Apex found 271 distinct vulnerabilities.
These included SQL injection, SSRF, NoSQL injection, prototype pollution, SSTI, XXE, race conditions, IDOR, auth bypass, CORS misconfigurations, command injection, and path traversal.
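The reported figures can be checked with simple arithmetic. The Python sketch below copies the numbers from this article; the variable and function names are illustrative. It converts each pass rate into the nearest whole number of solved challenges, assuming the published percentages were rounded.

```python
# Figures as reported in the article; nothing here is measured independently.
N_APPS = 60
N_VULNS = 271
difficulty = {"easy": 2, "medium": 27, "hard": 31}

# Reported pass rates (%): the full 60-challenge run (Apex on Claude Haiku 4.5)
# and the 10 hardest challenges (Claude Opus 4.6).
full_run = {"Apex": 35, "PentestGPT": 30, "Raptor": 27}
hardest_10 = {"Apex": 80, "PentestGPT": 70, "Raptor": 60}


def implied_solves(pct: int, n: int) -> int:
    """Nearest whole number of challenges behind a (presumably rounded) percentage."""
    return round(pct * n / 100)


assert sum(difficulty.values()) == N_APPS  # 2 + 27 + 31 == 60
print(round(N_VULNS / N_APPS, 2))  # 4.52 vulnerabilities per app on average
print({name: implied_solves(p, N_APPS) for name, p in full_run.items()})
# {'Apex': 21, 'PentestGPT': 18, 'Raptor': 16}
print({name: implied_solves(p, 10) for name, p in hardest_10.items()})
# {'Apex': 8, 'PentestGPT': 7, 'Raptor': 6}
```

The implied counts are estimates. 35% and 30% of 60 correspond exactly to 21 and 18 challenges. 16 of 60 is about 26.7%, which rounds to Raptor's reported 27%.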
The average cost per challenge was about $8, and the entire 60-challenge run on Haiku cost less than $500. Notable solves completed in under 15 minutes included a 7-step race-condition double-spend in a fintech transfer endpoint, a multi-tenant SSRF chain that abused a shared cache to pull API keys from neighboring tenants, and SpEL injection leading to RCE in a Java Spring Boot application. Apex's documented failure modes are also instructive.
The biggest gap was last-mile execution: completing the final credential-extraction step after a successful SSRF chain. The agent was also fooled by decoy flags twice, and complex multi-step chains such as CI/CD pipeline poisoning and Kubernetes compromise ran past the 30-minute time budget. Apex and the Argus benchmark are both available for free on GitHub.
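Race conditions like the double-spend Apex found in a fintech transfer endpoint typically come down to a check-then-act (time-of-check to time-of-use) gap. The following is a minimal, hypothetical Python sketch using threads and an in-memory account. It is a single-step simplification, not Argus's challenge code or Apex's exploit. Two concurrent withdrawals both pass the balance check before either one debits. A `threading.Barrier` is used purely to make the race reproducible.

```python
import threading
from typing import Callable, List, Tuple


class Account:
    """Hypothetical in-memory account standing in for a transfer endpoint's data store."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = threading.Lock()


def withdraw_racy(acct: Account, amount: int, paid: List[int], window: threading.Barrier) -> None:
    # Vulnerable: the balance check and the debit are separate steps.
    if acct.balance >= amount:
        window.wait()  # demo only: hold every request inside the race window
        with acct.lock:  # locking just the debit does NOT close the gap
            acct.balance -= amount
        paid.append(amount)


def withdraw_atomic(acct: Account, amount: int, paid: List[int], window: threading.Barrier) -> None:
    # Fixed: check and debit happen in one critical section (window is unused).
    with acct.lock:
        if acct.balance >= amount:
            acct.balance -= amount
            paid.append(amount)


def run_concurrently(worker: Callable, balance: int, amount: int, requests: int = 2) -> Tuple[int, List[int]]:
    acct = Account(balance)
    paid: List[int] = []
    window = threading.Barrier(requests)
    threads = [threading.Thread(target=worker, args=(acct, amount, paid, window)) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acct.balance, paid


print(run_concurrently(withdraw_racy, balance=100, amount=100))    # (-100, [100, 100])
print(run_concurrently(withdraw_atomic, balance=100, amount=100))  # (0, [100])
```

In a database-backed service, the equivalent fix is usually a row lock (for example `SELECT ... FOR UPDATE`) or a single conditional `UPDATE` that checks and debits in one statement.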