---
title: "Traffic Shaping with Digital Inductors"
description: "Distributed system burst suppression and rate limiting, with less configuration."
author: "Adam Fillion"
date: "2026-03-21"
categories: [distributed-systems, software-engineering]
draft: false
---
```{python}
#| label: setup
#| code-fold: true
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
np.random.seed(42)
# Bursty traffic signal: silence, a single burst, then silence
# (the same shape the interactive charts below generate)
duration = 30.0
dt_step = 0.05
t = np.arange(0, duration, dt_step)
def _traffic_rate(tv):
    """Piecewise rate: silence, a 3-second burst at 80, then silence."""
    if tv < 5:
        return 0
    elif tv < 8:
        return 80
    return 0
traffic = np.array([_traffic_rate(ti) for ti in t])
```
Traffic shaping is a core topic in distributed systems. If your system has users, or if it's built out of imperfect physical systems (spoiler: it is), then it's exposed to chaos that needs to be contained (traffic shaped).
Chaos is a regional outage from an asteroid, or a flood, or a war, or it's the Super Bowl halftime show, and everything in between. The invariant: you don't control it.
This chaos tends to bump into the practical limits of the system, and when it does, the system degrades: increased latency, reduced throughput, outright failures. Engineers scramble to draw boxes around the chaos to constrain it. In practice, this looks like rate limiters, throttlers, load shedders, quotas, and other statically configured components sprinkled across the distributed system.
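As a concrete example of such a component, a token bucket is only a few lines of code, but both of its knobs are static configuration someone has to choose and maintain. A minimal sketch (the class and parameter names are mine):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: tokens refill at a constant
    rate, each admitted request spends one, and stored tokens let
    short bursts through."""

    def __init__(self, capacity, fill_rate, now=None):
        self.capacity = capacity    # static config: max burst size
        self.fill_rate = fill_rate  # static config: tokens per second
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A burst of six requests against a bucket of five: the sixth is rejected.
bucket = TokenBucket(capacity=5, fill_rate=1.0, now=0.0)
decisions = [bucket.allow(now=0.0) for _ in range(6)]
```

Multiply `capacity` and `fill_rate` by every service, region, and dependency, and the maintenance burden described below follows.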
Years pass, engineers come and go, and the system ends up with a huge number of limits that need to be continuously maintained. An unfortunate, albeit worthy, tradeoff for resilience.
During my time at AWS, many of the availability issues I investigated were caused by millisecond-scale bursts. For a short period of time, the system would burst to 10x its usual capacity, exhaust any limits we (or our dependencies) had, and return failures or [worse](https://adamfillion.com/appendix/metastability/). At the very least, we had to provision around these bursts, often doubling or tripling capacity requirements to maintain multiple-9s availability targets.
Identifying each burst mode and limiting every one of them sounds appealing at first, until you multiply the service out to dozens of regions, each at a different scale, account for the dynamic nature of the service itself, and realize you need a sensible limit not just for today, but for all days in the future. Usually, this doesn't happen, and you end up with a codebase full of configuration that is stale and sub-optimal, but too risky to change without thorough analysis.
To add some realism to this discussion, let's look at how different rate limiters can shape an input rate:
```{ojs}
//| label: rate-limiter-controls
//| echo: false
viewof config = {
function slider(min, max, value, step, label) {
const id = `rl-${label.replace(/\W/g, '')}`;
const el = html`<div style="display:flex;align-items:center;gap:4px;margin:1px 0;">
<label for="${id}" style="font-size:10px;color:#666;min-width:55px;white-space:nowrap;">${label}</label>
<input id="${id}" type="range" min="${min}" max="${max}" value="${value}" step="${step}"
style="flex:1;height:12px;margin:0;cursor:pointer;">
<span style="font-size:10px;color:#888;min-width:24px;text-align:right;">${value}</span>
</div>`;
const inp = el.querySelector("input");
const out = el.querySelector("span");
inp.addEventListener("input", () => { out.textContent = inp.value; });
el._input = inp;
return el;
}
const tb1 = slider(5, 80, 30, 1, "Fill Rate");
const tb2 = slider(5, 100, 20, 1, "Capacity");
const sw1 = slider(0.5, 10, 2, 0.5, "Window");
const sw2 = slider(10, 200, 60, 5, "Max Reqs");
const lb1 = slider(5, 80, 30, 1, "Drain Rate");
const form = html`<div style="display:grid;grid-template-columns:1fr 1fr 1fr;gap:6px 12px;margin-bottom:4px;">
<div>
<div style="font-weight:600;color:#10b981;font-size:11px;margin-bottom:2px;">Token Bucket</div>
${tb1}${tb2}
</div>
<div>
<div style="font-weight:600;color:#f59e0b;font-size:11px;margin-bottom:2px;">Sliding Window</div>
${sw1}${sw2}
</div>
<div>
<div style="font-weight:600;color:#ef4444;font-size:11px;margin-bottom:2px;">Leaky Bucket</div>
${lb1}
</div>
</div>`;
function update() {
form.value = {
tbFillRate: +tb1._input.value,
tbCapacity: +tb2._input.value,
swWindow: +sw1._input.value,
swLimit: +sw2._input.value,
lbDrainRate: +lb1._input.value,
};
form.dispatchEvent(new Event("input", {bubbles: true}));
}
for (const s of [tb1, tb2, sw1, sw2, lb1])
s._input.addEventListener("input", update);
update();
return form;
}
```
```{ojs}
//| label: rate-limiter-chart
//| code-fold: true
//| code-summary: "Show rate limiter simulation code"
{
const dt = 0.05;
const duration = 30;
const steps = Math.floor(duration / dt);
// Generate traffic: silence, a single burst, then silence
let times = [], input = [];
for (let i = 0; i < steps; i++) {
let t = i * dt;
times.push(t);
let rate;
if (t < 5) rate = 0;
else if (t < 8) rate = 80;
else rate = 0;
input.push(rate);
}
// Token Bucket: refills tokens at constant rate, spends one per request.
// Stored tokens allow short bursts through; excess is queued and
// released as tokens refill — no traffic is dropped.
let tbTokens = config.tbCapacity;
let tbQueue = 0;
let tbOut = [];
for (let i = 0; i < steps; i++) {
let arriving = input[i] * dt;
tbTokens = Math.min(config.tbCapacity, tbTokens + config.tbFillRate * dt);
let fromQueue = Math.min(tbQueue, tbTokens);
tbQueue -= fromQueue;
tbTokens -= fromQueue;
let passed = Math.min(arriving, tbTokens);
tbTokens -= passed;
tbQueue += (arriving - passed);
tbOut.push((fromQueue + passed) / dt);
}
// Sliding Window: counts served requests in a rolling time window.
// Serves up to the limit per window; excess is queued and released
// as the window slides — no traffic is dropped.
let swServed = [];
let swQueue = 0;
let swOut = [];
for (let i = 0; i < steps; i++) {
let arriving = input[i] * dt;
let windowSteps = Math.floor(config.swWindow / dt);
let inWindow = 0;
for (let j = Math.max(0, i - windowSteps); j < i; j++) {
inWindow += swServed[j];
}
let canServe = Math.max(0, config.swLimit - inWindow);
let fromQueue = Math.min(swQueue, canServe);
swQueue -= fromQueue;
canServe -= fromQueue;
let passed = Math.min(arriving, canServe);
swQueue += (arriving - passed);
let totalServed = fromQueue + passed;
swServed.push(totalServed);
swOut.push(totalServed / dt);
}
// Leaky Bucket: all traffic enters an unbounded queue that drains at
// a constant rate — no traffic is dropped.
let lbQueue = 0;
let lbOut = [];
for (let i = 0; i < steps; i++) {
let arriving = input[i] * dt;
lbQueue += arriving;
let drained = Math.min(lbQueue, config.lbDrainRate * dt);
lbQueue -= drained;
lbOut.push(drained / dt);
}
// All series on a single chart
let data = [];
for (let i = 0; i < steps; i++) {
data.push({t: times[i], rate: input[i], type: "Input Rate"});
data.push({t: times[i], rate: tbOut[i], type: "Token Bucket"});
data.push({t: times[i], rate: swOut[i], type: "Sliding Window"});
data.push({t: times[i], rate: lbOut[i], type: "Leaky Bucket"});
}
return Plot.plot({
marks: [
Plot.areaY(data.filter(d => d.type === "Input Rate"), {
x: "t", y: "rate", fill: "#6366f1", fillOpacity: 0.08
}),
Plot.lineY(data.filter(d => d.type === "Input Rate"), {
x: "t", y: "rate", stroke: "#6366f1", strokeWidth: 2
}),
Plot.lineY(data.filter(d => d.type !== "Input Rate"), {
x: "t", y: "rate", stroke: "type", strokeWidth: 2,
strokeOpacity: 0.6, strokeDasharray: "6 3"
}),
Plot.ruleY([0])
],
color: {
domain: ["Input Rate", "Token Bucket", "Sliding Window", "Leaky Bucket"],
range: ["#6366f1", "#10b981", "#f59e0b", "#ef4444"],
legend: true
},
x: {label: "Time (arbitrary unit)"},
y: {label: "Rate (arbitrary unit)", domain: [0, 100]},
width: width,
height: 400,
marginLeft: 50
});
}
```
These configurations need to be maintained throughout the lifecycle of the system the component is deployed to. If all we wanted to do was soften bursts, we could deploy one of these components, but we would inherit the configuration overhead as well, since as the system changes, your definition of a "burst" will likely change too.
For this use case, it would be nice if there existed a software component whose definition of a "burst" was derived from an invariant (or nearly invariant) property of the system.
The best place to start is with an idealized notion of how this component should behave:
```{ojs}
//| label: idealized-controls
//| echo: false
viewof tau = {
const id = "rl-tau";
const el = html`<div style="display:flex;align-items:center;gap:4px;max-width:320px;margin:1px 0;">
<label for="${id}" style="font-size:10px;color:#666;min-width:55px;white-space:nowrap;">Reactivity</label>
<input id="${id}" type="range" min="0.5" max="10" value="3" step="0.25"
style="flex:1;height:12px;margin:0;cursor:pointer;">
<span style="font-size:10px;color:#888;min-width:24px;text-align:right;">3</span>
</div>`;
const inp = el.querySelector("input");
const out = el.querySelector("span");
inp.addEventListener("input", () => {
out.textContent = inp.value;
el.value = +inp.value;
el.dispatchEvent(new Event("input", {bubbles: true}));
});
el.value = +inp.value;
return el;
}
```
```{ojs}
//| label: idealized-behavior
//| code-fold: true
//| code-summary: "Show burst suppression simulation code"
{
const dt = 0.05;
const duration = 30;
const steps = Math.floor(duration / dt);
// Same input signal as the rate limiter chart
let times = [], input = [];
for (let i = 0; i < steps; i++) {
let t = i * dt;
times.push(t);
if (t < 5) input.push(0);
else if (t < 8) input.push(80);
else input.push(0);
}
// EWMA burst suppressor: smooths the rate estimate, queues events
// when the instantaneous rate exceeds the smoothed rate.
// tau comes from the slider above.
let smoothedRate = 0;
let queue = 0;
let bsOut = [];
for (let i = 0; i < steps; i++) {
let arriving = input[i] * dt;
let currentRate = input[i];
// Time-aware EWMA smoothing factor (same α from the article)
let alpha = 1 - Math.exp(-dt / tau);
smoothedRate = alpha * currentRate + (1 - alpha) * smoothedRate;
// Queue excess above the smoothed rate, drain queue at smoothed rate
queue += arriving;
let canServe = smoothedRate * dt;
let served = Math.min(queue, canServe);
queue -= served;
bsOut.push(served / dt);
}
let data = [];
for (let i = 0; i < steps; i++) {
data.push({t: times[i], rate: input[i], type: "Input Rate"});
data.push({t: times[i], rate: bsOut[i], type: "Allowed Output Rate"});
}
return Plot.plot({
marks: [
Plot.areaY(data.filter(d => d.type === "Input Rate"), {
x: "t", y: "rate", fill: "#6366f1", fillOpacity: 0.08
}),
Plot.lineY(data.filter(d => d.type === "Input Rate"), {
x: "t", y: "rate", stroke: "#6366f1", strokeWidth: 2
}),
Plot.lineY(data.filter(d => d.type === "Allowed Output Rate"), {
x: "t", y: "rate", stroke: "#10b981", strokeWidth: 2,
strokeOpacity: 0.7, strokeDasharray: "6 3"
}),
Plot.ruleY([0])
],
color: {
domain: ["Input Rate", "Allowed Output Rate"],
range: ["#6366f1", "#10b981"],
legend: true
},
x: {label: "Time (arbitrary unit)"},
y: {label: "Rate (arbitrary unit)", domain: [0, 100]},
width: width,
height: 350,
marginLeft: 50
});
}
```
As the input instantaneously jumps, the allowed rate ramps up slowly, reducing the peak load on the system by roughly 40% with the default reactivity setting.
The only configuration needed for a component like this is a single constant which represents the reactivity of the system. This will be closely related to the "time scale" of the system, and likely won't change throughout its lifecycle. Low latency systems will have higher reactivity to bursts, and higher latency systems will have lower reactivity.
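One way to build intuition for this constant is the EWMA's step response: with the time-aware smoothing factor used throughout this article, the estimate reaches about 63% (1 − 1/e) of a step input after τ seconds, no matter how finely time is discretized. A quick sketch (function name and numbers are mine):

```python
import math

def ewma_step_response(tau, dt, t_end):
    """Smooth a unit step with a time-aware EWMA and return the
    estimate after t_end seconds. alpha = 1 - exp(-dt / tau) makes
    the result depend on elapsed time, not on the tick size dt."""
    alpha = 1 - math.exp(-dt / tau)
    estimate = 0.0
    for _ in range(round(t_end / dt)):
        estimate = alpha * 1.0 + (1 - alpha) * estimate
    return estimate

# After tau seconds the estimate sits at ~63% of the step,
# regardless of how small dt is.
coarse = ewma_step_response(tau=3.0, dt=0.05, t_end=3.0)
fine = ewma_step_response(tau=3.0, dt=0.005, t_end=3.0)
```

So "reactivity" has a concrete meaning here: τ is the time the component takes to absorb roughly two-thirds of a sustained change in rate.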
This component would ideally have minimal memory and CPU footprint and handle any scale, so that it can be "sprinkled" throughout the codebase with minimal analysis.
My formal education is in electrical engineering, so naturally an inductor seems like the right solution here: a "memoryless" (quantum experts might disagree) component that suppresses bursts of electrons, while being invisible to consistent streams (DC).
In the digital realm, we can do the same thing: track a smoothed estimate of the current event rate, and when events arrive faster than the estimate, queue them and release at the smoothed rate. Events that arrive at or below the expected rate pass straight through — the component is invisible to steady-state traffic, just like an inductor is invisible to DC.
The core of the solution is an exponentially weighted moving average (EWMA) with a time-aware smoothing factor: `α = 1 - exp(-dt / τ)`, where `dt` is the time since the last update and `τ` is the reactivity time constant.
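Putting the estimator and the queue together, the behavior in the chart above can be sketched as a fixed-timestep loop (a toy model mirroring the simulation, not a production implementation):

```python
import math

def suppress_burst(input_rates, dt=0.05, tau=3.0):
    """Digital-inductor sketch: smooth the observed rate with a
    time-aware EWMA, queue all arriving traffic, and drain the
    queue at the smoothed rate. Returns the shaped output rates."""
    alpha = 1 - math.exp(-dt / tau)
    smoothed, queue, out = 0.0, 0.0, []
    for rate in input_rates:
        smoothed = alpha * rate + (1 - alpha) * smoothed
        queue += rate * dt                  # arrivals this tick
        served = min(queue, smoothed * dt)  # drain at the smoothed rate
        queue -= served
        out.append(served / dt)
    return out

# The single burst from the earlier charts: silence, 80 req/s for 3s, silence.
shaped = suppress_burst([0.0] * 100 + [80.0] * 60 + [0.0] * 440)
```

At steady state the smoothed rate converges to the input rate and the queue stays empty, so the component is effectively invisible, which is the DC-invisibility property of the inductor analogy.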
::: {.callout-tip collapse="true"}
## Full pseudocode
```
algorithm INDUCTOR_BURST_SUPPRESSION:
state:
smoothed_interval ← null # EWMA of inter-arrival time
last_arrival_time ← null
last_output_time ← null
queue ← [] # bounded FIFO
parameter:
τ # time constant (seconds)
on_arrival(event, now):
if last_arrival_time ≠ null:
dt ← now - last_arrival_time
if smoothed_interval = null:
smoothed_interval ← dt
else:
α ← 1 - exp(-dt / τ)
smoothed_interval ← α · dt + (1 - α) · smoothed_interval
last_arrival_time ← now
if can_forward(now):
forward(event, now)
else if queue is not full:
queue.push(event)
schedule_poll(now + smoothed_interval)
else:
drop(event)
can_forward(now):
if last_output_time = null:
return true
return (now - last_output_time) ≥ smoothed_interval
forward(event, now):
last_output_time ← now
send event downstream
on_poll(now):
    if queue is not empty and can_forward(now):
        forward(queue.pop_front(), now)
    if queue is not empty:
        schedule_poll(now + smoothed_interval)
```
:::
This is too academic for most, so let's do some discrete-event analysis using [happy simulator](https://github.com/adamfilli/happy-simulator/tree/main):
```{mermaid}
%%| fig-width: 6
flowchart LR
A["Input Events<br/>(variable rate)"] --> B["EWMA<br/>Estimator"]
B -->|"smoothed rate"| C{"rate ><br/>estimate?"}
C -->|yes| D["Queue<br/>(bounded FIFO)"]
C -->|no| F["Output<br/>(shaped rate)"]
D -->|"drain at<br/>smoothed rate"| F
```
```{python}
#| label: simulation
#| code-fold: true
from dataclasses import dataclass
from happysimulator import Simulation, Source, Event, Entity, Instant
from happysimulator.components.rate_limiter.inductor import Inductor
from happysimulator.instrumentation.collectors import ThroughputTracker
from happysimulator.instrumentation.data import Data
from happysimulator.instrumentation.probe import Probe
from happysimulator.load.profile import Profile
class InputCounter(Entity):
"""Counts arrivals then forwards downstream."""
def __init__(self, name: str, downstream: Entity) -> None:
super().__init__(name)
self.data = Data()
self._downstream = downstream
def handle_event(self, event: Event) -> list[Event]:
self.data.add_stat(1.0, event.time)
return [self.forward(event, self._downstream)]
@dataclass(frozen=True)
class InductorShowcaseProfile(Profile):
base_rate: float = 10.0
def get_rate(self, time: Instant) -> float:
t = time.to_seconds()
if t < 10.0:
return self.base_rate
if t < 28.0:
return 50.0
if t < 35.0:
return self.base_rate
if t < 60.0:
fraction = (t - 35.0) / 25.0
return self.base_rate + fraction * (50.0 - self.base_rate)
if t < 67.0:
return self.base_rate
if 70.0 <= t < 75.0 or 79.0 <= t < 84.0:
return 80.0
return self.base_rate
output_tracker = ThroughputTracker("OutputTracker")
inductor = Inductor("Inductor", downstream=output_tracker, time_constant=2.0)
input_counter = InputCounter("InputCounter", downstream=inductor)
source = Source.with_profile(
profile=InductorShowcaseProfile(),
target=input_counter,
event_type="Request",
poisson=False,
name="LoadGenerator",
)
rate_probe, rate_data = Probe.on(inductor, "estimated_rate", interval=0.1)
depth_probe, depth_data = Probe.on(inductor, "queue_depth", interval=0.1)
sim = Simulation(
sources=[source],
entities=[input_counter, inductor, output_tracker],
probes=[rate_probe, depth_probe],
duration=90.0,
)
summary = sim.run()
print(f"Simulated {summary.total_events_processed} events in {summary.wall_clock_seconds:.2f}s")
print(f"Inductor stats: {inductor.stats}")
```
```{python}
#| label: throughput-chart
#| code-fold: true
# Input vs Output throughput
input_rate = input_counter.data.rate(window_s=1.0)
output_rate = output_tracker.data.rate(window_s=1.0)
fig = go.Figure()
fig.add_trace(go.Scatter(
x=input_rate.times(), y=input_rate.raw_values(),
mode='lines', name='Input',
line=dict(color='#6366f1', width=1.5),
))
fig.add_trace(go.Scatter(
x=output_rate.times(), y=output_rate.raw_values(),
mode='lines', name='Output (after inductor)',
line=dict(color='#10b981', width=2),
))
# Phase annotations
fig.add_vrect(x0=0, x1=28, fillcolor="#f0f0ff", opacity=0.3, line_width=0,
annotation_text="Phase 1: Step-up", annotation_position="top left")
fig.add_vrect(x0=35, x1=60, fillcolor="#f0fff0", opacity=0.3, line_width=0,
annotation_text="Phase 2: Ramp", annotation_position="top left")
fig.add_vrect(x0=67, x1=90, fillcolor="#fff0f0", opacity=0.3, line_width=0,
annotation_text="Phase 3: Periodic bursts", annotation_position="top left")
fig.update_layout(
title="Input vs Output Throughput",
xaxis_title="Time (arbitrary unit)",
yaxis_title="Throughput (req/s)",
height=450,
legend=dict(yanchor="top", y=0.99, xanchor="right", x=0.99),
)
fig.show()
```
```{python}
#| label: ewma-chart
#| code-fold: true
# EWMA rate estimate — the inductor's internal state
ewma_bucketed = rate_data.bucket(window_s=0.5)
fig = go.Figure()
fig.add_trace(go.Scatter(
x=ewma_bucketed.times(), y=ewma_bucketed.means(),
mode='lines', name='EWMA Rate Estimate',
line=dict(color='#f59e0b', width=2),
))
fig.add_vrect(x0=0, x1=28, fillcolor="#f0f0ff", opacity=0.3, line_width=0)
fig.add_vrect(x0=35, x1=60, fillcolor="#f0fff0", opacity=0.3, line_width=0)
fig.add_vrect(x0=67, x1=90, fillcolor="#fff0f0", opacity=0.3, line_width=0)
fig.update_layout(
title="EWMA Rate Estimate (internal state)",
xaxis_title="Time (arbitrary unit)",
yaxis_title="Estimated rate (req/s)",
height=350,
showlegend=False,
)
fig.show()
```
In Phase 1, the inductor smoothly ramps up output throughput rather than instantly passing through the 5x spike. The queue absorbs the burst while the EWMA estimate gradually adapts.
In Phase 2, the linear ramp passes through almost unimpeded — the rate of change is slow enough to go mostly unnoticed by the inductor.
In Phase 3, the two bursts are close enough that the EWMA retains memory from the first burst when the second arrives. The inductor reacts faster to the second spike because it hasn't fully decayed back to baseline.
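To make this memory effect concrete, here is a toy trace of just the estimator with τ = 2 (matching the simulation's `time_constant`). After a burst ends, the estimate decays back toward baseline exponentially, so a second burst arriving within a few τ meets an already-elevated estimate:

```python
import math

def ewma_trace(rates, dt=0.05, tau=2.0):
    """Rate estimate at every tick for a fixed-timestep input."""
    alpha = 1 - math.exp(-dt / tau)
    est, trace = 0.0, []
    for r in rates:
        est = alpha * r + (1 - alpha) * est
        trace.append(est)
    return trace

# A 5s burst at 80 req/s, then 4s back at a 10 req/s baseline.
trace = ewma_trace([80.0] * 100 + [10.0] * 80)
```

Four seconds (2τ) after the burst ends, the estimate is still well above the 10 req/s baseline, so a second burst at that moment is admitted faster than the first was.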
Admittedly, I haven't implemented this at large scale anywhere in practice. I suspect it would be most useful in conjunction with distributed throttlers, which are often required to provide deterministic limits to external users but have convergence times that allow bursts to sneak through and wreak havoc.