The Canary Issue No.1
The LLM Canary Benchmark

December 2023
On November 30, 2022, ChatGPT was released to the general public and amassed over one million users in the first five days. Google and Meta raced to release their large language models too. The growth trend continued throughout this breakout year of AI; however so did new data leaks and security breaches. Sadly, security considerations seem to have not kept pace with this grand expansion, and there has been a lack of focus on protecting sensitive and personal information from exposure and misuse.

This is why we created LLM Canary, an open-source security benchmark and test suitebased on the OWASP LLM Top Ten. LLM Canary provides the AI community an accessible, trusted, and easy to use benchmarking tool to assess and report on the security posture of customized and fine-tuned LLMs. The testing suite can also be integrated into the AI development workflow for continuous vulnerability evaluation.

Foundation models are trained on vast amounts of data and are particularly susceptible to attack. Developers looking to incorporate LLMs into their organizations by training models with sensitive information can use this tool to better understand the potential vulnerabilities and security trade offs between different models.

The tests are designed to cover a variety of risk levels, and sophisticated attack techniques. LLMs are non-deterministic, and produce inconsistent responses. The LLM Canary takes into account duplicate prompts and repetition. The test suites can be expanded or customized, and testing can be integrated into development workflows.

In producing the initial benchmark, we ran multiple rounds of testing per LLM, to calculate the cumulative average as the overall LLM score. We ran 125 test runs per OWASP vulnerability group and per LLM, for the top 3 scoring LLMS. This amounted to almost 16k total tests.

We have shared some charts illustrating the results of the initial benchmark exercise. For instance, GPT-4 outperformed other LLMs in the benchmark for both vulnerability types tested. GPT-3.5 and Llama presented some more interesting results between the two test groups, both performing less consistently than the other LLMs tested.
Get Started
Benchmark Test Samples: Prompt Injection
Benchmark Test Samples: Sensitive Information Disclosure
The LLM Canary project is not only a test tool but an open-source initiative that addresses AI security and privacy challenges across the ecosystem. Today two key vulnerability groups are supported: Prompt Injection and Sensitive Information Disclosure, both critical LLM vulnerability types published by OWASP.

In a world where AI is becoming increasingly prevalent, prioritizing security is not just a choice; it's a necessity. LLM Canary fills a crucial gap by providing efficient, bench-marked LLM testing capabilities to accelerate our collective journey toward trustworthy AI systems and responsible innovation.

Be happy, safe and productive,

The LLM Canaries
llmcanarybenchmark@gmail.com