Cloudflare Saves 77% on Security Audit Costs After Integrating Kimi K2.5, Processing 7 Billion Tokens Daily

動區BlockTempo

Cloudflare’s Workers AI platform officially integrates Moonshot AI’s Kimi K2.5, supporting 256K context windows, multi-turn tool calls, and visual input. Cloudflare’s internal security audit agent processes over 7 billion tokens daily, and switching to this model reduces costs by 77% compared to mid-tier commercial models.
(Background: Cursor used Kimi K2.5 for training but didn’t disclose it; developers uncovered this through packet capture, the related prompts were deleted, and the rapid official reversal was documented.)
(Additional info: Cloudflare, which helps block web crawlers, launched a “One-Click Whole Site Crawler API” that supports RAG, incremental updates, and model training.)

Table of Contents


  • Security Agent processing 7 billion tokens a day
  • Cloudflare introduces three platform improvements
  • Underlying inference engine: Infire support, not just off-the-shelf frameworks

Cloudflare’s Workers AI platform quietly made a major move. According to the official Cloudflare blog, Moonshot’s Kimi K2.5 is now the default model for the Agents SDK starter. Cloudflare engineers are also using it for real security audits, cutting costs substantially.

Kimi K2.5 is one of the few open-source models that supports a frontier-grade feature set: a 256K context window, multi-turn tool calling, visual input, and structured output. For long-context agentic reasoning tasks, these capabilities are highly practical.
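These features map onto a standard OpenAI-style chat request. A minimal sketch of how a tool-calling request with a structured-output schema would be shaped, assuming a hypothetical model identifier (the exact name on Workers AI may differ) and an invented `read_file` tool:

```python
import json

# Assumed model identifier for illustration; check Workers AI docs for the real one.
MODEL = "@cf/moonshotai/kimi-k2.5"

payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a security-audit assistant."},
        {"role": "user", "content": "List the open ports referenced in this config."},
    ],
    # Tool definitions the model may call across multiple turns.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "read_file",  # hypothetical tool, not a Cloudflare API
                "description": "Read a file from the repository under audit.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }
    ],
    # Structured output: constrain the reply to a JSON schema.
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "open_ports",
            "schema": {
                "type": "object",
                "properties": {
                    "ports": {"type": "array", "items": {"type": "integer"}}
                },
                "required": ["ports"],
            },
        },
    },
}

body = json.dumps(payload)  # POST this to an OpenAI-compatible chat endpoint
```

The 256K context window matters here because the `messages` array of a multi-turn audit session, plus file contents returned by tools, can grow very large before the model must answer.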

Security Agent processing 7 billion tokens a day

Cloudflare engineers use Kimi K2.5 directly as the primary coding agent in the OpenCode environment, and have deployed a public code-review agent called “Bonk” that is integrated into automated pipelines.

Even more striking is the internal security-audit scenario: this agent processes over 7 billion tokens daily. Running the same workload on mid-tier commercial models would cost about $2.4 million per year; with Kimi K2.5, costs drop by 77%, a saving of nearly $1.85 million.
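The savings figure can be sanity-checked with back-of-the-envelope arithmetic (the per-million-token rate derived at the end is purely illustrative, not a published price):

```python
# Figures quoted in the article.
baseline_annual_cost = 2_400_000     # USD/year on a mid-tier commercial model
savings_rate = 0.77                  # 77% cheaper with Kimi K2.5
tokens_per_day = 7_000_000_000       # tokens processed by the audit agent

saved = baseline_annual_cost * savings_rate       # ~= 1,848,000 -> "nearly $1.85M"
remaining = baseline_annual_cost - saved          # ~= 552,000 USD/year

# Implied blended baseline rate (illustrative only):
tokens_per_year = tokens_per_day * 365
baseline_per_million = baseline_annual_cost / (tokens_per_year / 1_000_000)
# works out to under $1 per million tokens, i.e. heavy cache discounting
# or batch pricing is already assumed in the baseline figure
```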

This isn’t advertising copy; the figure was shared directly by Cloudflare engineers on the official blog.

Cloudflare introduces three platform improvements

Just changing the model isn’t enough. Cloudflare also rolled out three platform-level improvements aimed at cutting costs and boosting efficiency in long-conversation scenarios:

  • Prefix Caching: Tokens already processed in multi-turn conversations are no longer charged repeatedly; cached tokens enjoy discounted rates. Over long tasks, this saves a significant amount of money.
  • Session Affinity Header: Adds an x-session-affinity request header that routes requests from the same session to the same model instance, increasing cache hit rates. OpenCode and the Agents SDK starter support this natively.
  • Asynchronous Batch Inference API: Requests exceeding synchronous rate limits can be queued and processed asynchronously. Internal tests typically complete within 5 minutes. Suitable for code scanning and research-oriented agent tasks that don’t require immediate responses.
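Why session affinity raises prefix-cache hit rates can be sketched with a toy simulation. The replica count, hashing scheme, and cache model below are invented for illustration and are not Cloudflare’s actual routing logic:

```python
import hashlib

REPLICAS = 4  # assumed replica count, for illustration only

def route(session_id: str) -> int:
    """Deterministically map a session to one replica (sticky routing)."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return digest[0] % REPLICAS

class Replica:
    """Each replica keeps its own prefix cache."""
    def __init__(self):
        self.prefix_cache = set()
        self.hits = 0
        self.misses = 0

    def process(self, prompt_prefix: str):
        if prompt_prefix in self.prefix_cache:
            self.hits += 1       # cached tokens billed at the discounted rate
        else:
            self.misses += 1     # full-price tokens; warm the cache
            self.prefix_cache.add(prompt_prefix)

replicas = [Replica() for _ in range(REPLICAS)]

# Three turns of one conversation share the same system-prompt + history prefix.
session = "audit-session-42"
for _ in range(3):
    replicas[route(session)].process("shared conversation prefix of " + session)

target = replicas[route(session)]
# With sticky routing, only the first turn misses; the rest hit the warm cache.
```

Without the affinity header, each turn could land on a different replica whose cache has never seen the prefix, paying full price every time.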

Underlying inference engine: Infire support, not just off-the-shelf frameworks

Cloudflare didn’t use off-the-shelf inference frameworks. Instead, they built a customized serving core on their in-house Infire inference engine, employing data parallelism, tensor parallelism, and expert parallelism, combined with a disaggregated prefill (prefix-processing) architecture.

Currently, Kimi K2.5 is the first large-model inference deployment of its kind on Workers AI, demonstrating Cloudflare’s ambitions in AI infrastructure: tightly integrated with its web platform, and cost-effective.
