If you’re an API product manager, you may already be seeing traffic from AI agents — nifty AI applications that act autonomously based on decisions formed from their own reasoning and contextual knowledge. These AI agents depend on APIs to function, with many making massive volumes of API calls, often in unpredictable or unusual patterns.
Considering that 96% of IT leaders responding to Cloudera's "The Future of Enterprise AI Agents" survey plan to expand their use of AI agents in the next 12 months, and that APIs are key to unlocking their potential, you'll need to figure out how to effectively manage API usage for these AI-powered consumers, and fast.
So today, we’re looking at the unique API usage challenges of AI agents, different rate limiting approaches, and ways API providers could implement them. We’re also highlighting how API gateway providers are responding to the increasing deployment of AI agents.
AI Agents Bring Unique API Usage Challenges

The most common way to control API usage is through rate limiting. However, traditional rate limiting methods were created for browsers and apps used by humans. They were not built for AI agents that make high-volume, bursty, or unpredictable calls to APIs, the same kinds of calls that many malicious botnets make.
Some malicious botnets now use AI to precisely mimic legitimate API traffic, while the traffic from authentic AI agents can appear inauthentic. For example, an AI agent suddenly making millions of legitimate requests to an API could look like a distributed denial of service (DDoS) attack, while a botnet conducting an actual DDoS attack against that same API could fly under the radar, thanks to AI.
AI agents behave differently from human users, and they share many of the same characteristics as malicious botnets. Plus, API traffic will be far spikier than you’re accustomed to. So, traditional rate limiting methods won’t do.
Traditional API Rate Limiting Methods Won't Cut It

Historically, rate limiting an API has meant implementing an algorithm that limits the number of requests a consumer can make based on predefined rules and parameters. Common rate limiting algorithms include leaky bucket, fixed window, and token bucket.
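To ground the traditional approach, here's a minimal token bucket sketch in Python; the class name and parameters are illustrative, not taken from any particular library:

```python
import time

class TokenBucket:
    """Classic token bucket: tokens refill at a fixed rate up to a fixed cap."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum burst size, in requests
        self.refill_rate = refill_rate  # tokens (requests) added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow_request(self, cost: int = 1) -> bool:
        """Refill based on elapsed time, then spend tokens if available."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # reject; the API would typically return HTTP 429

# Allow bursts of up to 100 requests, refilling at 10 requests per second
bucket = TokenBucket(capacity=100, refill_rate=10.0)
```

The bucket absorbs short bursts up to its capacity, but notice that both the capacity and the refill rate are fixed when the limiter is created.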
These methods are based on fixed limits, which can work well for predictable API traffic. However, traditional rate limiting algorithms don’t consider user behavior and can’t distinguish between legitimate high-volume consumers like AI agents and malicious botnets. API providers need to consider new approaches that go beyond static rate limits, adjusting limits dynamically in real time to accommodate AI agent consumers.
What API Rate Limiting Approaches Will Work for AI Agents?

Traditional approaches to rate limiting APIs won't work effectively for AI agent consumers, so some API providers have shifted to adaptive rate limiting (ARL). For example, DeepSeek currently takes a more dynamic and adaptive approach to rate limiting its API than most other LLM API providers.
The concept of adaptive rate limiting isn't new, but it's evolving to address new API usage scenarios that include AI agents. Modern ARL involves a set of principles, tools, and techniques that allow systems to adjust rate limits dynamically based on context and real-time insights, combining several approaches rather than relying on a single fixed algorithm.
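As a rough illustration of these principles, here's a minimal Python sketch of adaptive rate limiting. Everything here (the trust heuristic, the thresholds, the class and parameter names) is an illustrative assumption, not a production design or any vendor's implementation:

```python
import time
from collections import deque

class AdaptiveRateLimiter:
    """Sketch of ARL: a short-term limit that scales with observed consumer
    behavior, paired with a long-term quota that is always preserved."""

    def __init__(self, base_rate: float, burst_factor: float, hourly_quota: int):
        self.base_rate = base_rate        # baseline requests per second
        self.burst_factor = burst_factor  # max multiple of baseline for trusted consumers
        self.hourly_quota = hourly_quota  # long-term ceiling, enforced regardless of bursts
        self.trust = 0.5                  # 0..1 behavior score (illustrative heuristic)
        self.tokens = base_rate
        self.last_refill = time.monotonic()
        self.request_log = deque()        # timestamps of requests in the last hour

    def record_outcome(self, ok: bool) -> None:
        """Nudge trust up for well-behaved traffic, down for errors or abuse signals."""
        self.trust = min(1.0, self.trust + 0.01) if ok else max(0.0, self.trust - 0.1)

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Long-term quota: expire entries older than an hour, then enforce the cap.
        while self.request_log and now - self.request_log[0] > 3600:
            self.request_log.popleft()
        if len(self.request_log) >= self.hourly_quota:
            return False
        # Short-term limit: refill rate (and burst capacity) grow with earned trust.
        rate = self.base_rate * (1 + self.trust * (self.burst_factor - 1))
        self.tokens = min(rate, self.tokens + (now - self.last_refill) * rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            self.request_log.append(now)
            return True
        return False
```

A well-behaved AI agent gradually earns a larger burst allowance, while error-prone or abusive traffic is throttled back toward the baseline, and the hourly quota holds either way.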
As more AI-powered applications consume APIs, API providers will need to accommodate their unique needs with flexible and dynamic usage policies: allow burst traffic for AI agents when needed while preserving long-term request limits, and ensure AI-powered consumers don't lose access during legitimate spikes in API usage. So, how could you implement adaptive rate limiting for your APIs in practice?
You could build an autonomous AI agent that applies adaptive rate limiting automatically using various third-party tools, or you could set up ARL approaches through an API gateway. However, at the time of writing, no API gateway offers every ARL approach out of the box, though gateways like Kong, Apache APISIX, and KrakenD offer plugins you can use to implement many ARL methods.
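As one hedged example of the gateway route: Kong's rate-limiting plugin can be reconfigured at runtime through Kong's Admin API, so an external process that watches traffic could adjust limits dynamically. The sketch below assumes Kong's Admin API is reachable at localhost:8001 and that a rate-limiting plugin already exists; its ID is a placeholder you'd fill in:

```python
import requests

KONG_ADMIN = "http://localhost:8001"     # assumed Admin API address
PLUGIN_ID = "<rate-limiting-plugin-id>"  # ID of an existing rate-limiting plugin

def set_minute_limit(limit: int) -> None:
    """Patch the plugin's per-minute limit via Kong's Admin API."""
    resp = requests.patch(
        f"{KONG_ADMIN}/plugins/{PLUGIN_ID}",
        json={"config": {"minute": limit}},
    )
    resp.raise_for_status()

# A supervising process could loosen the limit during a verified AI agent's
# legitimate burst, then tighten it back to baseline afterwards:
# set_minute_limit(1000)
# set_minute_limit(100)
```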
While robust adaptive rate limiting may not be a default feature of API gateways (yet!), many providers are moving towards the AI agent future.
How API Gateway Providers Are Responding to the Rise of AI Agents

Many providers are responding to the rapid mainstream adoption of large language models (LLMs) and the increasing deployment of AI agents by adapting their existing API gateways to accommodate AI workloads.
For example, gateways like Kong and Apache APISIX now include LLM-specific features such as token-aware plugins and management of multiple AI models. These upgrades allow existing API gateways to also serve as AI gateways. As the use of AI agents and LLMs increases, API gateways will become more AI-focused. It would behoove API gateway providers to make implementing ARL approaches quick and easy, as those approaches are critical to effectively managing API usage for AI agent consumers.
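To make "token-aware" concrete, here's a minimal sketch of limiting LLM token consumption per window instead of raw request counts; the class, the fixed one-minute window, and the worst-case cost estimate are all illustrative assumptions, and the token counts would come from whatever tokenizer your models use:

```python
import time

class TokenAwareLimiter:
    """Limits LLM tokens per window rather than request counts, since one
    request can consume wildly different amounts of model capacity."""

    def __init__(self, tokens_per_minute: int):
        self.budget = tokens_per_minute
        self.used = 0
        self.window_start = time.monotonic()

    def allow(self, prompt_tokens: int, max_completion_tokens: int) -> bool:
        now = time.monotonic()
        if now - self.window_start >= 60:  # fixed one-minute window
            self.window_start, self.used = now, 0
        # Budget against the worst case: prompt plus the maximum completion.
        cost = prompt_tokens + max_completion_tokens
        if self.used + cost > self.budget:
            return False
        self.used += cost
        return True

limiter = TokenAwareLimiter(tokens_per_minute=90_000)
# e.g., a 1,200-token prompt allowed up to 800 completion tokens:
# allowed = limiter.allow(prompt_tokens=1200, max_completion_tokens=800)
```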
Adapting Rate Limiting for an AI Agent Future

AI agents are already here and consuming a rapidly growing number of APIs, creating new and unique challenges for API providers. Given AI agents' propensity to make high volumes of calls in spiky, unpredictable ways, API providers (and API gateway providers) must rethink how they approach rate limiting. Fixed limits still have a role to play, but adaptive rate limiting offers AI agents the dynamic flexibility they need to accomplish their goals.