Introduction
Fraud detection in transaction data is mostly SQL. Not machine learning, not graph databases. SQL, run against the right tables, with the right joins, looking for the right shapes. This post covers six patterns that work across credit cards, healthcare claims, e-commerce, and government benefits.
1. Velocity
The simplest pattern: someone with a stolen card wants to drain it fast. They hit the card repeatedly in a short window.
SELECT
cardholder_id,
date_trunc('hour', timestamp) AS hour_bucket,
count(*) AS tx_count,
min(timestamp) AS first_tx,
max(timestamp) AS last_tx
FROM transactions
WHERE timestamp >= current_date - INTERVAL '30 days'
GROUP BY 1, 2
HAVING count(*) > 10;
Tune two knobs: window size and count threshold. Run 1-minute, 5-minute, and 1-hour versions in parallel. Different fraud shows up at different scales—a card-testing ring hits in seconds; a benefits-trafficking ring might take an afternoon.
For sliding-window velocity:
SELECT
cardholder_id,
timestamp,
count(*) OVER (
PARTITION BY cardholder_id
ORDER BY timestamp
RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
) AS tx_in_last_5min
FROM transactions
QUALIFY tx_in_last_5min >= 5
ORDER BY cardholder_id, timestamp;
QUALIFY works in Snowflake, BigQuery, Databricks, Teradata. For Postgres, wrap in a CTE and filter outside.
2. Impossible Travel
If a card swipes in Chicago and seven minutes later in Los Angeles, one of those swipes is fake. The card is cloned.
WITH ordered_tx AS (
SELECT
cardholder_id,
timestamp,
location,
LAG(timestamp) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_ts,
LAG(location) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_loc
FROM transactions
)
SELECT
cardholder_id,
prev_ts AS first_tx,
timestamp AS second_tx,
prev_loc AS first_location,
location AS second_location,
EXTRACT(EPOCH FROM (timestamp - prev_ts)) / 60 AS minutes_apart,
haversine(prev_loc, location) AS miles_apart
FROM ordered_tx
WHERE prev_ts IS NOT NULL
AND prev_loc <> location
AND haversine(prev_loc, location)
/ nullif(EXTRACT(EPOCH FROM (timestamp - prev_ts)), 0)
* 3600 > 600;
haversine is the great-circle distance function. Most warehouses ship one. The 600 mph threshold is roughly commercial jet cruise speed (575 mph). Tighten to 100 mph to catch suspiciously-fast ground travel, but expect more false positives.
Other variants: two distant cities same state inside 5 minutes (local cloning rings), multiple ZIP codes inside an hour (skimmer rings), border crossings inside 10 minutes (international rings).
3. Amount Anomalies
Certain amounts appear disproportionately in fraud and almost never in normal use.
SELECT cardholder_id, timestamp, amount, merchant_id
FROM transactions
WHERE
(amount >= 99.50 AND amount < 100.00)
OR (amount >= 499.50 AND amount < 500.00)
OR amount IN (1.00, 5.00, 10.00)
ORDER BY cardholder_id, timestamp;
Round dollar amounts at small values ($1.00, $5.00, $10.00) are almost always card tests. Real cardholders almost never buy something for exactly $1.00—coffee is $4.73, gas is $52.81. Amounts just below a threshold ($99.99, $499.99) suggest someone knows the rules and is staying under them (ID check at $100, ATM cap at $500).
4. Suspicious Merchants
When a skimmer compromises a card reader, you get dozens of fraud cases from that merchant. The symptom: an unusual number of unrelated cards spending more than usual in a short window.
SELECT
merchant_id,
date_trunc('hour', timestamp) AS hour_bucket,
count(DISTINCT cardholder_id) AS unique_cards,
count(*) AS total_tx,
sum(amount) AS total_amount
FROM transactions
WHERE timestamp >= current_date - INTERVAL '7 days'
GROUP BY 1, 2
HAVING count(DISTINCT cardholder_id) > 20
AND sum(amount) > 5000
ORDER BY total_amount DESC;
Static thresholds don't account for merchant size—a Costco does that in 90 seconds, a used bookshop never. Better version compares each merchant against itself:
WITH merchant_hourly AS (
SELECT
merchant_id,
date_trunc('hour', timestamp) AS hour_bucket,
count(DISTINCT cardholder_id) AS unique_cards
FROM transactions
WHERE timestamp >= current_date - INTERVAL '60 days'
GROUP BY 1, 2
),
with_baseline AS (
SELECT *,
avg(unique_cards) OVER (
PARTITION BY merchant_id
ORDER BY hour_bucket
ROWS BETWEEN 168 PRECEDING AND 1 PRECEDING
) AS rolling_avg_cards
FROM merchant_hourly
)
SELECT *,
unique_cards / nullif(rolling_avg_cards, 0) AS spike_ratio
FROM with_baseline
WHERE unique_cards > rolling_avg_cards * 3
ORDER BY spike_ratio DESC;
The 168 is trailing seven days of hourly buckets. Three times normal is the starting threshold.
5. Off-Hours
Most people don't buy gas at 3am. If their card does, it's either stolen or they're traveling—and travel produces other signals.
WITH cardholder_hour_pattern AS (
SELECT
cardholder_id,
EXTRACT(HOUR FROM timestamp) AS hour_of_day,
count(*) AS tx_count
FROM transactions
WHERE timestamp >= current_date - INTERVAL '90 days'
GROUP BY 1, 2
),
cardholder_normal AS (
SELECT
cardholder_id,
min(hour_of_day) FILTER (WHERE tx_count >= 2) AS earliest_hour,
max(hour_of_day) FILTER (WHERE tx_count >= 2) AS latest_hour
FROM cardholder_hour_pattern
GROUP BY 1
)
SELECT t.cardholder_id, t.timestamp, t.amount, t.merchant_id
FROM transactions t
JOIN cardholder_normal cn USING (cardholder_id)
WHERE EXTRACT(HOUR FROM t.timestamp) NOT BETWEEN cn.earliest_hour AND cn.latest_hour
ORDER BY t.timestamp DESC;
The "two or more in that hour" filter prevents one stray late-night purchase from becoming part of the cardholder's normal hours. Drawback: doesn't work until you have history (90 days).
6. Window Functions for Chained Signals
This isn't a pattern—it's a setup that makes the other five composable.
SELECT
cardholder_id,
timestamp,
amount,
merchant_id,
timestamp - LAG(timestamp) OVER w AS time_since_last,
CASE WHEN merchant_id <> LAG(merchant_id) OVER w
THEN 'changed' ELSE 'same' END AS merchant_change,
sum(amount) OVER (
PARTITION BY cardholder_id
ORDER BY timestamp
RANGE BETWEEN INTERVAL '24 hours' PRECEDING AND CURRENT ROW
) AS running_24h_total,
ROW_NUMBER() OVER (
PARTITION BY cardholder_id, date(timestamp)
ORDER BY timestamp
) AS tx_of_day
FROM transactions
WINDOW w AS (PARTITION BY cardholder_id ORDER BY timestamp)
ORDER BY cardholder_id, timestamp;
Once materialized, fraud rules collapse to filter expressions. For card-testing rings: lots of small charges at different merchants within minutes:
SELECT *
FROM tx_with_windows
WHERE tx_of_day >= 5
AND time_since_last < INTERVAL '60 seconds'
AND merchant_change = 'changed';
Three filters. Now analysts can express new fraud hypotheses as SQL filters instead of engineering tickets.
Putting It Together
None of these alone is enough. Velocity has false positives (vending operators). Geographic impossibility misses single-metro fraud. Amount anomalies don't apply outside card-test contexts. Off-hours needs history.
Run them all and score each transaction across the signals. A transaction failing on three or four is almost always fraud. One failure might be your grandma being weird with her debit card on vacation.
Start with pattern 1—it surfaces useful fraud with very little legitimate activity, and it's cheap to run. If you've already got 1-5, invest in pattern 6: those window-function primitives. Every analyst will use them, and adding the next fraud pattern stops being a project.
Things Left Out
- NULL handling: Real tables use sentinel values like
9999-12-31for "no end date." Always check conventions. - False positives: Every rule flags legitimate weirdness. Human review with feedback loops is required.
- Privacy: Comply with data-use policies. Use de-identified data first.
- Cost: Window functions can be expensive. Filter date range first, then apply windows. Don't run
LAG()across two years before addingWHERE.


