Forum Discussion
Configuring semantic caching on F5 AI Gateway
The semantic caching feature is mentioned on the F5 AI Gateway introduction page, but I couldn't find any documentation on how to use it. Is there a guide available for this?
Also, I'm curious whether token-based rate limiting will be supported in the future.
3 Replies
I can't speak to the semantic caching feature, but token-based rate limiting can be done with the Authorization header on XC, as I showed in F5 XC Session tracking with User Identification Policy | DevCentral.
On NGINX you can use the njs JavaScript module to build rate limits that are not keyed on source IP, and on BIG-IP an iRule should do the trick:
GitHub - nginx/njs-examples: NGINX JavaScript examples
3.1.2. Lab 2 - HTTP Throttling
The idea is to place XC, NGINX, or BIG-IP in front of the AI Gateway: the AI Gateway handles AI-specific protections, while general API protection is done on the usual platforms such as XC, NGINX, or BIG-IP.
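To make the header-keyed idea concrete, here is a minimal sketch in plain JavaScript of a rate limiter keyed on the Authorization header value rather than the source IP. This is a generic illustration, not F5 product code: in njs (which supports a subset of JavaScript) this logic would sit inside a handler such as one registered via `js_content`, and the function names here are hypothetical.

```javascript
// Fixed-window rate limiter: allow `limit` requests per `windowMs`
// for each key (e.g. the Authorization header value).
function makeRateLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, count }
  return function allow(key, now) {
    const entry = counters.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request, or the previous window expired: start a new window.
      counters.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Usage: key requests by the Authorization header instead of client IP.
const allow = makeRateLimiter(2, 60000); // 2 requests per minute
const auth = "Bearer abc123"; // hypothetical client credential
console.log(allow(auth, 0));     // true  (1st request in window)
console.log(allow(auth, 1000));  // true  (2nd request in window)
console.log(allow(auth, 2000));  // false (3rd request, over the limit)
console.log(allow(auth, 61000)); // true  (window expired, counter reset)
```

Because the key is the credential rather than the IP, many clients behind one NAT are limited independently, which is the main reason to avoid source-IP-based limits here.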
- devopssong
Hi. Thanks for the reply.
Actually, I was asking about the input/output text tokens used in LLM API pricing, not JWT tokens.
For AI Gateway use cases, I believe token-based rate limiting would be more effective than traditional request-based limits.
If the token count is not in a header but in the request body, then extracting it and rate limiting on it will be a little harder. BIG-IP with iRules or NGINX with the njs JavaScript module could do it, but it will be complex.
https://clouddocs.f5.com/training/community/nginx/html/class3/module1/module12.html
https://github.com/nginx/njs-examples
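As a rough illustration of what body-based token limiting involves, here is a plain-JavaScript sketch that parses an OpenAI-style chat request body, estimates its token cost, and charges it against a per-client token budget. Everything here is an assumption for illustration: the ~4-characters-per-token estimate is a crude heuristic (not a real tokenizer), the body shape is the common `messages`/`max_tokens` format, and in njs the body would be read inside a request handler rather than passed in as a string.

```javascript
// Estimate the token cost of an OpenAI-style chat request.
// Heuristic only: roughly one token per 4 characters of message text,
// plus the requested max_tokens as the worst-case output cost.
function estimateTokens(body) {
  const req = JSON.parse(body);
  const text = (req.messages || []).map(m => m.content).join(" ");
  return Math.ceil(text.length / 4) + (req.max_tokens || 0);
}

// Fixed-window token budget: allow up to `tokensPerWindow` tokens
// per `windowMs` for each key (e.g. an API key or user id).
function makeTokenBudget(tokensPerWindow, windowMs) {
  const budgets = new Map(); // key -> { windowStart, used }
  return function allow(key, tokens, now) {
    const b = budgets.get(key);
    if (!b || now - b.windowStart >= windowMs) {
      budgets.set(key, { windowStart: now, used: tokens });
      return tokens <= tokensPerWindow;
    }
    b.used += tokens;
    return b.used <= tokensPerWindow;
  };
}

// Usage: charge each request's estimated tokens against the budget.
const body = JSON.stringify({
  messages: [{ role: "user", content: "Summarize this document please." }],
  max_tokens: 100,
});
const allow = makeTokenBudget(150, 60000); // 150 tokens per minute
const cost = estimateTokens(body);         // ~108 with this heuristic
console.log(allow("api-key-1", cost, 0)); // first request fits the budget
```

A production version would count real tokens from the model's usage response (e.g. the `usage` field many LLM APIs return) and debit the budget after the response, rather than relying on a character-count guess up front.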