In Search of Intuition: Transforms

probability · statistics · simulation
From guitar tuners to discrete event simulators
Author: Adam Fillion

Published: January 23, 2026

As an electrical engineering undergrad, I completed a minor project: an automatic guitar tuner. An inductive sensor detects the guitar string's movement, we convert this to a digital signal, analyze it, manipulate the string tension with some motors we found in a box somewhere, and hopefully end up with an accurately tuned guitar.

Automatic Guitar Tuner

Steve Jobs probably wouldn’t have been fond of our industrial design, but this was a scrappy prototype.

At the core of this project is digital signal analysis — converting a time series signal to the frequency domain so that we can easily identify the dominant frequency and make adjustments.

The raw signal looked something like this:

import numpy as np
import matplotlib.pyplot as plt

sample_rate = 4000
duration = 0.1
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)  # uniform 1/sample_rate spacing

fundamental = 82.41  # E2 string
signal = (np.sin(2*np.pi*fundamental*t) +
          0.5*np.sin(2*np.pi*2*fundamental*t) +
          0.3*np.sin(2*np.pi*3*fundamental*t) +
          1*np.random.randn(len(t)))

plt.figure(figsize=(12, 3))
plt.plot(t*1000, signal, 'b-', linewidth=0.8)
plt.xlabel('Time (ms)')
plt.ylabel('Amplitude')
plt.title('Raw signal')
plt.xlim(0, 50)
plt.show()

Although it’s easy for humans to visually spot the dominant frequency with a fuzzy peak-to-peak analysis, it’s much more difficult to do so programmatically in the time domain. All the software sees is a stream of numbers representing an amplitude, and devising algorithms for this peak analysis can quickly become unwieldy due to noise or multiple simultaneous string plucks. I challenge you to write an algorithm for this that can run on a low-power Arduino circuit board.

So instead of struggling with time series data points, we filter the signal with a bandpass filter, then apply a Fourier transform [2] to “transform” the X-axis from time to frequency.

from scipy.fft import fft, fftfreq
from scipy.signal import butter, lfilter

# Butterworth bandpass filter for environmental noise
low, high = 10, 500
nyq = sample_rate / 2
b, a = butter(4, [low/nyq, high/nyq], btype='band')
filtered_signal = lfilter(b, a, signal)

N = len(t)
frequencies = fftfreq(N, 1/sample_rate)[:N//2]
spectrum = np.abs(fft(filtered_signal))[:N//2] * 2/N

plt.figure(figsize=(12, 3))
plt.plot(frequencies, spectrum, 'purple', linewidth=1.5)
plt.xlabel('Frequency (Hz)')
plt.ylabel('Magnitude')
plt.title('After transform')
plt.xlim(0, 500)
plt.axvline(x=82.41, color='red', linestyle='--', alpha=0.7, label='E2 = 82.41 Hz')
plt.legend()
plt.show()

The spike at ~82 Hz is our E string; the smaller spikes are its harmonics [5]. From here, the tuning algorithm is trivial.
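To make that "trivial" final step concrete, here is a self-contained sketch. It uses a toy stand-in for the FFT magnitude spectrum rather than the actual output of the block above, but the tuning step itself is just an argmax:

```python
import numpy as np

# Stand-in for the magnitude spectrum above: a peak at the E2
# fundamental (82.41 Hz) plus two smaller harmonics.
frequencies = np.linspace(0, 500, 2001)
spectrum = (1.0 * np.exp(-(frequencies - 82.41) ** 2 / 2)
            + 0.5 * np.exp(-(frequencies - 164.82) ** 2 / 2)
            + 0.3 * np.exp(-(frequencies - 247.23) ** 2 / 2))

# The tuning step: the dominant frequency is the argmax bin.
dominant = frequencies[np.argmax(spectrum)]

# Deviation from the E2 target in cents (100 cents = one semitone);
# the sign tells the motors which way to turn the peg.
cents_off = 1200 * np.log2(dominant / 82.41)
```

In practice the frequency resolution is limited by the FFT bin width, so a real tuner would interpolate around the peak, but the core decision is this one line.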

This is indistinguishable from magic (if you’re a Renaissance era academic).

Years Later…

Since my undergrad, I haven't used transforms much; it turns out they rarely come up in distributed systems. Fortunately, I recently needed one to build a component of a discrete event simulator I am working on.

When running simulations, you may want to manipulate the simulation using a time-based profile: for example, to change a variable as a function of time or, in my case, to generate events at a certain rate [3].

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 24, 1000)  # 24 hour period

def rate(t):
    # Base rate with morning ramp, evening cooldown
    base = 5 + 10 * np.exp(-((t - 12)**2) / 20)  # Peak at noon
    # Traffic spikes
    spike1 = 15 * np.exp(-((t - 9)**2) / 0.5)   # 9am spike
    spike2 = 20 * np.exp(-((t - 17)**2) / 0.3)  # 5pm spike
    return base + spike1 + spike2

plt.figure(figsize=(12, 3))
plt.plot(t, rate(t), 'r-', linewidth=2)
plt.fill_between(t, 0, rate(t), alpha=0.2, color='red')
plt.xlabel('Time (hours)')
plt.ylabel('Rate (events/sec)')
plt.title('Example: Event rate profile over 24 hours')
plt.xlim(0, 24)
plt.show()

One thing you need to know about many simulations is that they are event-based: every event has a timestamp, and this defines the ordering of the simulation. Nothing "happens" in between events; the event itself is what moves the simulation time forward.

The simulation core is a time-ordered heap: events are pushed onto and popped from the heap as the simulation progresses and comes to life.
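A minimal sketch of that core, using Python's standard `heapq` (the event names are illustrative, not the happy-simulator API):

```python
import heapq

# The heap orders events by timestamp; popping an event is what
# advances simulation time -- there are no ticks in between.
events = []
heapq.heappush(events, (5.0, "cooldown"))
heapq.heappush(events, (1.0, "request"))
heapq.heappush(events, (3.0, "retry"))

now = 0.0
log = []
while events:
    now, name = heapq.heappop(events)  # time jumps straight to the event
    log.append((now, name))

# log is [(1.0, "request"), (3.0, "retry"), (5.0, "cooldown")]
```

Note that `now` jumps from 1.0 to 3.0 to 5.0 with nothing executed in the gaps, which is exactly why polling-based approaches don't fit this model.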

The other thing to know is that, like in real life, many processes are random and uncorrelated, and this also needs to be modeled by the simulation.

So the problem statement is:

Problem Statement

Given an arbitrary rate function \(\lambda(t)\) defining the Poisson rate of event \(E\) at time \(t\):

  1. Generate events with inter-arrival times following the variable (non-homogeneous) Poisson process [7]
  2. Cannot pre-generate all events upfront: there may be billions over the simulation lifetime, exceeding memory constraints
  3. Must be statistically accurate
  4. No “ticks” or polling—nothing happens between events, so no opportunity to periodically wake up and check the rate

(Approach 1) Pre-generate events

Generate all events upfront before the simulation starts.

Why it fails: memory. There may be billions of events to pre-generate.

(Approach 2) Event Daisy Chaining

At every event, obtain the rate from the function and sample an inter-arrival time [6]: \(T_{next} = T_{current} + \frac{E}{\lambda(T_{current})}\) where \(E\) is a random number from an exponential distribution.

Why it fails:

  1. Rate blindness. You sample based on the rate at the current event. If the rate defined by \(\lambda(t)\) changes during the inter-arrival interval, this is not captured. Therefore, it's inaccurate.

  2. Zero rate deadlock. If \(\lambda(t) = 0\) at any point, the inter-arrival time is \(E / 0 = \infty\). The simulation sleeps forever.
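Both failure modes are easy to reproduce in a few lines. This sketch uses a hypothetical rate profile with a zero-rate window (the function names are mine, not from happy-simulator):

```python
import numpy as np

rng = np.random.default_rng(42)

def rate(t):
    # Hypothetical profile: 1 event/sec, except a dead window 1 <= t < 2
    return 0.0 if 1.0 <= t < 2.0 else 1.0

def next_event_naive(t_now):
    lam = rate(t_now)           # rate is read once, at the current event
    if lam == 0.0:
        return float("inf")     # zero-rate deadlock: sleep forever
    return t_now + rng.exponential(1.0 / lam)

# An event that lands in the dead window never schedules a successor:
stuck = next_event_naive(1.5)   # infinity

# And an event at t = 0 samples purely from rate(0) = 1 -- it is
# blind to the window (and to any spike) ahead of it.
blind = next_event_naive(0.0)
```

The first call shows the deadlock directly; the second shows rate blindness, since the sampled gap can happily land inside or beyond the dead window without the profile having any say.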

(Approach 3) Inverse Transform [3] [4]

The problem is that we’re in the wrong domain to solve this problem.

Instead of asking “given rate \(\lambda\), what’s the next time?”, ask: “at what time \(T_{next}\) does the accumulated rate (the area under the rate curve) reach a random threshold (a random variable I sample)?”

\[\int_{t_{current}}^{T_{next}} \lambda(s) \, ds = E\]

where \(E\) is a random number sampled from an exponential distribution [6] (the “target area” to accumulate).

In terms of code implementation, all I need to do is draw a random number from an exponential distribution, representing the desired area to accumulate, and solve the integral from \(t_{current}\) (known) to \(T_{next}\) (unknown).

Here is my best attempt to visualize the process. The implementation can be found in happy-simulator.

This handles both problems: the integral accounts for all rate changes between events, and zero-rate regions simply contribute zero area—the search continues until the integral reaches the target.
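A minimal sketch of that search, assuming fixed-step numerical integration and the same hypothetical zero-rate-window profile as before (the actual happy-simulator implementation may solve the integral differently):

```python
import numpy as np

rng = np.random.default_rng(0)

def rate(t):
    # Hypothetical profile: 5 events/sec, except a silent window 1 <= t < 2
    return 0.0 if 1.0 <= t < 2.0 else 5.0

def next_event_time(t_now, rate_fn, dt=1e-3):
    """Accumulate area under rate_fn from t_now until it reaches a
    fresh Exp(1) target, then return that time."""
    target = rng.exponential(1.0)   # E, the target area
    area, t = 0.0, t_now
    while area < target:
        area += rate_fn(t) * dt     # left-Riemann slice of the integral
        t += dt                     # zero-rate regions add no area
    return t

# Daisy chain a handful of events through the profile.
t, arrivals = 0.0, []
for _ in range(20):
    t = next_event_time(t, rate)
    arrivals.append(t)
```

Because the silent window contributes no area, no arrival ever lands inside it; the search simply carries the remaining target area across to \(t = 2\) and beyond, which is exactly the behavior the naive daisy chain could not produce.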

Events can now be daisy chained together and we can incrementally solve for the next event throughout the simulation, minimizing the number of items in the heap while still modelling a statistically accurate rate function [4].

References

  1. Fillion, Adam. “happy-simulator.” GitHub.

  2. Lyons, Richard G. “Understanding Digital Signal Processing.” Pearson Education.

  3. Lewis, P. A. W. & Shedler, G. S. “Simulation Methods for Poisson Processes in Nonstationary Systems.” Proceedings of the 1978 Winter Simulation Conference.

  4. Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E. & Frank, L. M. “The Time-Rescaling Theorem and Its Application to Neural Spike Train Data Analysis.” Neural Computation.

  5. “How FFT Works in Pitch Detection.” Pitch Detector.

  6. Goldsman, David. “Random Variate Generation.” Georgia Institute of Technology.

  7. Papp, Tamas. “Input Modelling for Non-Stationary Discrete-Event Simulation.” Lancaster University.

  8. Fillion, Adam. “Transforms Reference.” Appendix.