Introduction

Fraud detection in transaction data is mostly SQL. Not machine learning, not graph databases. SQL, run against the right tables, with the right joins, looking for the right shapes. This post covers six patterns that work across credit cards, healthcare claims, e-commerce, and government benefits.

1. Velocity

The simplest pattern: someone with a stolen card wants to drain it fast. They hit the card repeatedly in a short window.

SELECT
  cardholder_id,
  date_trunc('hour', timestamp) AS hour_bucket,
  count(*) AS tx_count,
  min(timestamp) AS first_tx,
  max(timestamp) AS last_tx
FROM transactions
WHERE timestamp >= current_date - INTERVAL '30 days'
GROUP BY 1, 2
HAVING count(*) > 10;

Tune two knobs: window size and count threshold. Run 1-minute, 5-minute, and 1-hour versions in parallel. Different fraud shows up at different scales—a card-testing ring hits in seconds; a benefits-trafficking ring might take an afternoon.

For sliding-window velocity:

SELECT
  cardholder_id,
  timestamp,
  count(*) OVER (
    PARTITION BY cardholder_id
    ORDER BY timestamp
    RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
  ) AS tx_in_last_5min
FROM transactions
QUALIFY tx_in_last_5min >= 5
ORDER BY cardholder_id, timestamp;

QUALIFY works in Snowflake, BigQuery, Databricks, Teradata. For Postgres, wrap in a CTE and filter outside.

2. Impossible Travel

If a card swipes in Chicago and seven minutes later in Los Angeles, one of those swipes is fake. The card is cloned.

WITH ordered_tx AS (
  SELECT
    cardholder_id,
    timestamp,
    location,
    LAG(timestamp) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_ts,
    LAG(location)  OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_loc
  FROM transactions
)
SELECT
  cardholder_id,
  prev_ts  AS first_tx,
  timestamp AS second_tx,
  prev_loc  AS first_location,
  location  AS second_location,
  EXTRACT(EPOCH FROM (timestamp - prev_ts)) / 60 AS minutes_apart,
  haversine(prev_loc, location)                  AS miles_apart
FROM ordered_tx
WHERE prev_ts IS NOT NULL
  AND prev_loc <> location
  AND haversine(prev_loc, location)
      / nullif(EXTRACT(EPOCH FROM (timestamp - prev_ts)), 0)
      * 3600 > 600;

haversine is the great-circle distance function. Most warehouses ship one. The 600 mph threshold is roughly commercial jet cruise speed (575 mph). Tighten to 100 mph to catch suspiciously-fast ground travel, but expect more false positives.

Other variants: two distant cities same state inside 5 minutes (local cloning rings), multiple ZIP codes inside an hour (skimmer rings), border crossings inside 10 minutes (international rings).

3. Amount Anomalies

Certain amounts appear disproportionately in fraud and almost never in normal use.

SELECT cardholder_id, timestamp, amount, merchant_id
FROM transactions
WHERE
  (amount >= 99.50  AND amount < 100.00)
  OR (amount >= 499.50 AND amount < 500.00)
  OR amount IN (1.00, 5.00, 10.00)
ORDER BY cardholder_id, timestamp;

Round dollar amounts at small values ($1.00, $5.00, $10.00) are almost always card tests. Real cardholders almost never buy something for exactly $1.00—coffee is $4.73, gas is $52.81. Amounts just below a threshold ($99.99, $499.99) suggest someone knows the rules and is staying under them (ID check at $100, ATM cap at $500).

4. Suspicious Merchants

When a skimmer compromises a card reader, you get dozens of fraud cases from that merchant. The symptom: an unusual number of unrelated cards spending more than usual in a short window.

SELECT
  merchant_id,
  date_trunc('hour', timestamp) AS hour_bucket,
  count(DISTINCT cardholder_id) AS unique_cards,
  count(*) AS total_tx,
  sum(amount) AS total_amount
FROM transactions
WHERE timestamp >= current_date - INTERVAL '7 days'
GROUP BY 1, 2
HAVING count(DISTINCT cardholder_id) > 20
  AND sum(amount) > 5000
ORDER BY total_amount DESC;

Static thresholds don't account for merchant size—a Costco does that in 90 seconds, a used bookshop never. Better version compares each merchant against itself:

WITH merchant_hourly AS (
  SELECT
    merchant_id,
    date_trunc('hour', timestamp) AS hour_bucket,
    count(DISTINCT cardholder_id) AS unique_cards
  FROM transactions
  WHERE timestamp >= current_date - INTERVAL '60 days'
  GROUP BY 1, 2
),
with_baseline AS (
  SELECT *,
    avg(unique_cards) OVER (
      PARTITION BY merchant_id
      ORDER BY hour_bucket
      ROWS BETWEEN 168 PRECEDING AND 1 PRECEDING
    ) AS rolling_avg_cards
  FROM merchant_hourly
)
SELECT *,
  unique_cards / nullif(rolling_avg_cards, 0) AS spike_ratio
FROM with_baseline
WHERE unique_cards > rolling_avg_cards * 3
ORDER BY spike_ratio DESC;

The 168 is trailing seven days of hourly buckets. Three times normal is the starting threshold.

5. Off-Hours

Most people don't buy gas at 3am. If their card does, it's either stolen or they're traveling—and travel produces other signals.

WITH cardholder_hour_pattern AS (
  SELECT
    cardholder_id,
    EXTRACT(HOUR FROM timestamp) AS hour_of_day,
    count(*) AS tx_count
  FROM transactions
  WHERE timestamp >= current_date - INTERVAL '90 days'
  GROUP BY 1, 2
),
cardholder_normal AS (
  SELECT
    cardholder_id,
    min(hour_of_day) FILTER (WHERE tx_count >= 2) AS earliest_hour,
    max(hour_of_day) FILTER (WHERE tx_count >= 2) AS latest_hour
  FROM cardholder_hour_pattern
  GROUP BY 1
)
SELECT t.cardholder_id, t.timestamp, t.amount, t.merchant_id
FROM transactions t
JOIN cardholder_normal cn USING (cardholder_id)
WHERE EXTRACT(HOUR FROM t.timestamp) NOT BETWEEN cn.earliest_hour AND cn.latest_hour
ORDER BY t.timestamp DESC;

The "two or more in that hour" filter prevents one stray late-night purchase from becoming part of the cardholder's normal hours. Drawback: doesn't work until you have history (90 days).

6. Window Functions for Chained Signals

This isn't a pattern—it's a setup that makes the other five composable.

SELECT
  cardholder_id,
  timestamp,
  amount,
  merchant_id,
  timestamp - LAG(timestamp) OVER w AS time_since_last,
  CASE WHEN merchant_id <> LAG(merchant_id) OVER w
    THEN 'changed' ELSE 'same' END AS merchant_change,
  sum(amount) OVER (
    PARTITION BY cardholder_id
    ORDER BY timestamp
    RANGE BETWEEN INTERVAL '24 hours' PRECEDING AND CURRENT ROW
  ) AS running_24h_total,
  ROW_NUMBER() OVER (
    PARTITION BY cardholder_id, date(timestamp)
    ORDER BY timestamp
  ) AS tx_of_day
FROM transactions
WINDOW w AS (PARTITION BY cardholder_id ORDER BY timestamp)
ORDER BY cardholder_id, timestamp;

Once materialized, fraud rules collapse to filter expressions. For card-testing rings: lots of small charges at different merchants within minutes:

SELECT *
FROM tx_with_windows
WHERE tx_of_day >= 5
  AND time_since_last < INTERVAL '60 seconds'
  AND merchant_change = 'changed';

Three filters. Now analysts can express new fraud hypotheses as SQL filters instead of engineering tickets.

Putting It Together

None of these alone is enough. Velocity has false positives (vending operators). Geographic impossibility misses single-metro fraud. Amount anomalies don't apply outside card-test contexts. Off-hours needs history.

Run them all and score each transaction across the signals. A transaction failing on three or four is almost always fraud. One failure might be your grandma being weird with her debit card on vacation.

Start with pattern 1—it surfaces useful fraud with very little legitimate activity, and it's cheap to run. If you've already got 1-5, invest in pattern 6: those window-function primitives. Every analyst will use them, and adding the next fraud pattern stops being a project.

Things Left Out