Building a Rate-Limit-Proof IPinfo IP Lookup Proxy in Rust
What It Actually Does
At the highest level it's an HTTP proxy. You send it the exact same requests you'd send to ipinfo.io and it forwards them upstream with a valid API key attached. From your application's perspective it's just an IPinfo endpoint that doesn't rate limit you.
Under the hood there are three things happening:
- Round-robin key rotation — it has a pool of API keys and cycles through them. If a key gets rate limited (429) or has an auth issue (401), that key gets put in a timeout and the next available one is used instead.
- Singleflight coalescing — if 50 requests come in at the same time for the same IP address, only one of them actually hits upstream. The other 49 wait for that one to finish and then all get the same result back. This was one of those things I didn't think about until I actually started looking at the upstream call count and went "why is this number so high."
- In-memory caching — once you've looked up an IP, the result gets stored in memory for a configurable TTL (default 5 minutes). Repeat lookups within that window never touch upstream at all.
Each of these layers solves a different problem. The cache alone doesn't help you if you're looking up thousands of unique IPs. The key rotation alone doesn't help you if you're making redundant calls for IPs you've already seen. Together they make a big difference.
Project Structure
The codebase is pretty small. Six modules:
- main.rs — entry point, sets up tracing, binds the Axum server
- config.rs — loads all config from env vars, handles the .env file
- rotator.rs — the key pool and round-robin logic
- cache.rs — Moka cache wrapper with stats tracking
- stats.rs — global request counters
- proxy.rs — all the route handlers, the singleflight group, the actual proxy logic
AppState is the shared state struct that gets cloned into every Axum handler. It holds an Arc to each of the above components, so they're all shared across requests:
proxy.rs:

```rust
pub struct AppState {
    pub config: Arc<Config>,
    pub rotator: Arc<Rotator>,
    pub cache: Arc<Cache>,
    pub stats: Arc<Stats>,
    pub client: reqwest::Client,
    pub inflight: Arc<Group<CacheEntry, u16>>,
}
```

Arc means we're reference counting — each handler gets a cheap clone of the pointers, not a deep copy of the data. Axum does this automatically when you use State<AppState> in a handler.
The Key Rotator
This is probably the most interesting part. The Rotator struct holds a Vec of API keys where each key is wrapped in its own parking_lot::Mutex. There's also an AtomicUsize counter that drives the round-robin:
rotator.rs:

```rust
pub struct Rotator {
    keys: Vec<Arc<Mutex<KeyState>>>,
    counter: AtomicUsize,
    cooldown: Duration,
}

struct KeyState {
    key: String,
    cooling_until: Option<Instant>,
}
```

When next_key() is called, it increments the counter and uses that as the starting index. Then it scans forward through the key list looking for one that's available (not in cooldown):
rotator.rs:

```rust
pub fn next_key(&self) -> Option<String> {
    let n = self.keys.len();
    let start = self.counter.fetch_add(1, Ordering::Relaxed) % n;
    for i in 0..n {
        let idx = (start + i) % n;
        let state = self.keys[idx].lock();
        if state.is_available() {
            return Some(state.key.clone());
        }
    }
    None
}
```

If it scans through all keys and none are available, it returns None — which the proxy handler turns into a 503 with a Retry-After header set to the cooldown duration so the client knows when to try again.
Cooldown recovery is self-healing — there's no background task or timer that "unbans" a key. The is_available() check just compares the current time to the stored timestamp:
rotator.rs:

```rust
fn is_available(&self) -> bool {
    match self.cooling_until {
        None => true,
        Some(until) => Instant::now() >= until,
    }
}
```

The moment the cooldown window passes, the next request that lands on that key will see it as available and start using it again. Clean and simple.
One detail: I used parking_lot::Mutex instead of the standard library one. The reason is that std::sync::Mutex can "poison" if a thread panics while holding the lock, which means you have to handle that error case everywhere you lock it. parking_lot doesn't have that behavior, and it's also faster in low-contention scenarios, which is what we have here — the lock is only held for the duration of reading or writing a couple of struct fields.
Singleflight — The Part I Didn't Know I Needed
I added caching early on and thought that would be enough. Then I noticed that on a cold start, or after the cache gets cleared, you can get a burst of requests all for the same IP hitting upstream simultaneously. The cache doesn't help you there because they all miss at the same time before any of them has populated the cache entry.
This is sometimes called a "thundering herd" problem. The fix is singleflight — a pattern where concurrent requests for the same work are deduplicated so only one actually executes.
The async_singleflight crate handles this. You call .work() with a key and a future. If there's already in-flight work for that key, your future is dropped and you wait for the existing one to resolve. If you're the first, your future runs and everyone waiting gets your result:
proxy.rs:

```rust
let (ok, err, _) = {
    let s2 = s.clone();
    s.inflight
        .work(&cache_key, async move {
            fetch_and_cache(s2, upstream_url, cache_key2, path).await
        })
        .await
};
```

The fetch_and_cache function does the actual upstream HTTP call and then inserts the result into the Moka cache. So after the singleflight resolves, every subsequent request for that IP hits the cache directly.
The singleflight key is the same as the cache key: method + path + query string. So GET /8.8.8.8 and GET /8.8.8.8?fields=city are treated as separate keys, which is correct since they return different data.
Caching With Moka
The cache is backed by Moka, an async-native Rust caching library. The main reason I picked it over rolling my own is that it uses TinyLFU as the eviction policy. TinyLFU tracks both how recently an entry was accessed and how frequently — so a one-time lookup won't keep displacing a frequently-accessed entry just because it's newer. For an IP lookup cache where some IPs get queried a lot more than others, that's a better fit than pure LRU.
Cache entries store the response body bytes, the Content-Type header, and the HTTP status code:
cache.rs:

```rust
pub struct CacheEntry {
    pub body: Bytes,
    pub content_type: String,
    pub status: u16,
}
```

One thing I added that I think is important: a maximum body size limit for cached responses. If a response body is larger than CACHE_MAX_BODY_BYTES (default 32 KiB), it doesn't get cached:
cache.rs:

```rust
if entry.body.len() > s.config.cache_max_body_bytes {
    debug!("skipping cache: response body exceeds size limit");
} else {
    s.cache.insert(cache_key, ...).await;
}
```

Without this, one large response could take up a disproportionate amount of cache space and cause a bunch of smaller, frequently-accessed entries to get evicted. The limit keeps the cache full of useful small entries.
TTL and max capacity are both configurable. Moka handles TTL expiry internally so there's no cleanup task to worry about.
What Gets Cached and What Doesn't
This is where I had to think carefully. Not every endpoint should be treated the same way.
GET /:ip and GET /:ip/:field — these are fully cached and go through singleflight. This is the hot path, the whole point of the service.
GET / — this one returns info about the requesting client's own IP, based on whatever IP is making the request to ipinfo.io — which, once proxied, is always the proxy's own egress IP. It's cacheable and goes through singleflight. Earlier in development I had this bypassing the cache, which was wrong — upstream always sees the same IP, so GET / always returns the same result.
GET /me — this returns the API key's own account information: plan, rate limit status, usage counts. This bypasses both cache and singleflight. The reason is that different API keys return different data here. If two requests for /me get coalesced, one of them might get back account info for a key they weren't supposed to see. And caching it would mean a stale key gets served the wrong account state.
POST /batch — bypasses everything. The request body is part of the lookup, so a proper cache key would need to hash the body. Didn't bother with that since batch responses can be large and variable. Goes direct every time.
Clients can also send Cache-Control: no-cache to force a fresh upstream fetch on any GET endpoint. The proxy checks for that header and bypasses both cache and singleflight when it's present.
Observability
Three internal endpoints:
GET /health — tells you how many keys are active vs in cooldown, and gives an overall "ok" or "degraded" status. Degraded means all keys are currently rate limited.
GET /stats — full counter dump. Looks like this:
```json
{
  "requests_total": 10482,
  "requests_proxied": 832,
  "requests_cached": 9650,
  "upstream_errors": 0,
  "keys_exhausted": 0,
  "cache": {
    "hits": 9650,
    "misses": 832,
    "size": 741,
    "evictions": 12
  },
  "keys": {
    "total": 3,
    "active": 3,
    "cooling": 0
  }
}
```

DELETE /cache — immediately invalidates all cache entries. Useful when upstream data has changed and you don't want to wait out the TTL.
All the counters are AtomicU64 with Relaxed ordering. No locks, no mutex contention — stats tracking adds negligible overhead to request handling.
Structured logging via tracing logs every request with method, path, whether it was a cache hit or miss, the upstream status code, and the latency in microseconds. Set LOG_FORMAT=json if you're shipping logs to something like Datadog or Loki.
Configuration
Everything is configurable via environment variables. A .env file in the working directory gets picked up automatically via dotenvy. The only required value is IPINFO_KEYS:
.env.example:

```text
# required, comma-separated
IPINFO_KEYS=key1,key2,key3
PORT=8080
HOST=0.0.0.0
# how long a key stays benched after a 429/401
COOLDOWN_SECONDS=60
# upstream request timeout
REQUEST_TIMEOUT_MS=5000
# how long cache entries live
CACHE_TTL_SECONDS=300
# max number of entries before eviction
CACHE_MAX_ENTRIES=10000
# responses bigger than this won't be cached
CACHE_MAX_BODY_BYTES=32768
# text or json
LOG_FORMAT=text
# override for local testing
IPINFO_BASE_URL=https://ipinfo.io
```

The IPINFO_BASE_URL override is particularly useful for tests — the integration test suite spins up a wiremock server and points the proxy at it so no real network calls happen.
Stack
- axum 0.7 for the HTTP server and routing
- tokio 1 as the async runtime
- reqwest 0.12 for the upstream HTTP client
- moka 0.12 for the async cache
- async_singleflight 0.4 for request coalescing
- parking_lot for the faster mutex in the rotator
- tracing + tracing-subscriber for structured logs
- dotenvy for .env support
- serde + serde_json for JSON serialization
- wiremock 0.6 + tower 0.5 for integration tests
Wrapping Up
Honestly this started as a "let me just throw something together quickly" project and ended up being a nice exercise in thinking about layered caching and concurrency in Rust. The singleflight part in particular — I'd read about the pattern before but hadn't implemented it, and adding it and watching the upstream call count drop significantly on load was satisfying.
If you're using IPinfo or any other rate-limited API heavily, this pattern is worth knowing. The combination of key pooling, request coalescing, and smart caching means you're not just multiplying your quota, you're also drastically reducing how often you actually need to use it.
Source Code https://github.com/zeljkovranjes/ipinfo-round-robin-api