BlackIce: A Container-Based Red Teaming Toolkit for Testing AI Security

Databricks has released BlackIce, an open-source, containerized toolkit designed to streamline red teaming and AI security testing. First presented at CAMLIS Red 2025, BlackIce bundles 14 widely used AI security tools into a single, reproducible Docker container environment, addressing key problems that security practitioners face when testing AI systems.

The Problem BlackIce Solves

AI red teamers face four major challenges when performing security assessments.

First, each tool requires its own configuration and setup process, which consumes valuable testing time. Second, tools often have conflicting dependencies, which adds operational complexity and calls for separate runtime environments. Third, managed notebook environments typically expose only one Python interpreter per kernel, which limits testing flexibility.

Finally, the rapidly growing landscape of AI security tools makes it hard for newcomers to navigate and choose appropriate testing frameworks. To address these obstacles, BlackIce follows a model similar to Kali Linux, the well-known penetration testing distribution. With BlackIce's ready-to-run container image, security teams can skip time-consuming setup and focus on conducting thorough security assessments.

The toolkit includes 14 curated open-source tools spanning adversarial machine learning, security testing, and responsible AI evaluation. Tools were selected for this release based on security relevance and community adoption; they include LM Eval Harness (EleutherAI), Promptfoo, CleverHans (CleverHans Lab), Garak (NVIDIA), ART (IBM), Giskard, CyberSecEval (Meta), and PyRIT (Microsoft), among others.

To ensure broad coverage of key attack vectors, these capabilities are systematically mapped to the Databricks AI Security Framework (DASF) and the MITRE ATLAS framework. BlackIce covers prompt injection and jailbreak testing, supply-chain security scanning, LLM data leakage detection, hallucination stress testing, adversarial example generation, and indirect prompt injection via untrusted content. BlackIce organizes its tools into two practical categories: static and dynamic.

Static tools provide evaluation capabilities through command-line interfaces that require little programming knowledge. Dynamic tools offer similar functionality but also support advanced Python-based customization, letting security experts write their own attack code and scenarios. Each static tool is installed in its own isolated Python virtual environment or Node.js project with independent dependencies, and can be invoked directly from the CLI.
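Giving each static tool its own environment also sidesteps the one-interpreter-per-kernel limitation, because a notebook can call out to a different interpreter. The following is a minimal sketch of that isolation pattern using Python's standard `venv` module; it is an illustration under stated assumptions, not BlackIce's actual build process. The tool name `static-tool-a` and the helpers `create_tool_env`, `tool_python`, and `run_in_tool_env` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_tool_env(root: Path, tool: str) -> Path:
    """Create an isolated virtual environment for one hypothetical static tool."""
    env_dir = root / tool
    # with_pip=False keeps the sketch fast; a real image would pip-install the tool here.
    # Symlinks on POSIX mirror what `python -m venv` does by default.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir


def tool_python(env_dir: Path) -> Path:
    """Return the interpreter inside a venv (bin/ on POSIX, Scripts/ on Windows)."""
    bindir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return env_dir / bindir / exe


def run_in_tool_env(env_dir: Path, args: list[str]) -> str:
    """Run a command with the tool's own interpreter and return its stdout."""
    result = subprocess.run(
        [str(tool_python(env_dir)), *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_tool_env(Path(tmp), "static-tool-a")
        prefix = run_in_tool_env(env, ["-c", "import sys; print(sys.prefix)"])
        # The two prefixes differ: the tool runs in its own isolated environment.
        print("kernel interpreter prefix:", sys.prefix)
        print("tool interpreter prefix:  ", prefix)
```

Because each tool's packages live only in its own environment, two static tools can pin incompatible versions of the same library without affecting each other or the notebook kernel.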

Dynamic tools are installed in the global Python environment, and dependency conflicts among them are managed through a central requirements file. Custom patches were applied to certain tools to enable seamless integration with Databricks Model Serving endpoints and workspaces. The BlackIce container image is published on Docker Hub under the databricksruntime organization.
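One way to picture a central requirements file is as the merge of each dynamic tool's pins, with an explicit override wherever two tools disagree. The following is a simplified sketch of that idea, assuming exact `name==version` pins only; the tool names, package names, versions, and helper functions are hypothetical, and real resolvers such as pip handle far richer version specifiers.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Normalize a package name per PEP 503 (lowercase, runs of -_. become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines: list[str]) -> dict[str, str]:
    """Parse simple 'name==version' lines, ignoring blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"only exact pins are handled in this sketch: {line!r}")
        pins[normalize(name.strip())] = version.strip()
    return pins


def merge_requirements(per_tool: dict[str, list[str]], overrides: dict[str, str]) -> list[str]:
    """Merge per-tool pins into one list; conflicting pins must be settled by an override."""
    wanted = defaultdict(set)
    for reqs in per_tool.values():
        for name, version in parse_pins(reqs).items():
            wanted[name].add(version)
    resolved = {normalize(k): v for k, v in overrides.items()}
    merged, unresolved = [], []
    for name in sorted(wanted):
        if name in resolved:
            # The central file's choice wins for this package.
            merged.append(f"{name}=={resolved[name]}")
        elif len(wanted[name]) == 1:
            merged.append(f"{name}=={next(iter(wanted[name]))}")
        else:
            unresolved.append(name)
    if unresolved:
        raise ValueError(f"conflicting pins need an override: {unresolved}")
    return merged


if __name__ == "__main__":
    per_tool = {
        "dynamic-tool-a": ["shared_lib==1.2.0", "lib-a==0.4.1"],
        "dynamic-tool-b": ["Shared.Lib==1.3.0", "lib-b==2.0.0"],
    }
    print(merge_requirements(per_tool, overrides={"shared-lib": "1.3.0"}))
    # ['lib-a==0.4.1', 'lib-b==2.0.0', 'shared-lib==1.3.0']
```

Failing loudly on an unresolved conflict, rather than silently picking a version, keeps every compromise visible in one reviewed file.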

Users can pull the latest version with the following command:

docker pull databricksruntime/blackice:17.3-LTS

To integrate BlackIce into a Databricks workspace, teams use Databricks Container Services to configure compute and specify the BlackIce image as the cluster's Docker image. Once the cluster is created, security experts can use the demo notebooks to orchestrate multiple AI security tools for vulnerability testing, such as prompt injection and jailbreak assessments. The complete implementation, including tool documentation, examples for Databricks-hosted models, and Docker build artifacts, is available on GitHub.
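For teams that script cluster setup, the Container Services step amounts to setting the cluster's `docker_image.url` to the BlackIce image in a Clusters API request. The sketch below builds such a payload with the standard library only and prepares, without sending, a request to the `clusters/create` endpoint. It assumes Container Services is enabled for the workspace; the cluster name, runtime version, node type, host, and token are placeholders, and the helper functions are hypothetical.

```python
import json
import urllib.request

BLACKICE_IMAGE = "databricksruntime/blackice:17.3-LTS"


def blackice_cluster_spec(spark_version: str, node_type_id: str, num_workers: int = 1) -> dict:
    """Build a Clusters API payload that points the cluster at the BlackIce image."""
    return {
        "cluster_name": "blackice-red-team",  # hypothetical name
        "spark_version": spark_version,  # placeholder: a runtime that supports Container Services
        "node_type_id": node_type_id,  # placeholder: cloud-specific instance type
        "num_workers": num_workers,
        "docker_image": {"url": BLACKICE_IMAGE},
    }


def build_create_request(host: str, token: str, spec: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Clusters API create endpoint."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    spec = blackice_cluster_spec("<runtime-version>", "<node-type>")
    req = build_create_request("https://<workspace-host>", "<token>", spec)
    print(req.get_method(), req.full_url)
    print(json.dumps(spec, indent=2))
```

Sending the request is a single `urllib.request.urlopen(req)` call once real values are filled in; the cluster creation UI exposes the same Docker image setting for teams that prefer not to script it.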

The accompanying CAMLIS Red paper provides further technical detail on tool selection and container architecture to help organizations build comprehensive AI security testing programs.