Pandoc Lua Filters: Zero-Dependency AST Manipulation

Pandoc has long supported filters that manipulate its abstract syntax tree (AST) between the parsing and writing phases. Traditional filters exchange the AST as JSON: they read a JSON representation from stdin and write the modified JSON to stdout. This adds serialization overhead and depends on an external interpreter plus a library for handling the JSON AST.

Starting with pandoc 2.0, a Lua interpreter and a Lua library for writing filters are built into the pandoc executable, so Lua filters need no external dependencies. Current releases embed Lua 5.4; the early 2.x releases shipped Lua 5.3. Pandoc data types are marshaled directly to Lua, avoiding JSON serialization.

Performance Numbers

Benchmark converting the pandoc manual (MANUAL.txt) to HTML:

Command                                          Time
pandoc                                           1.01s
pandoc --filter ./smallcaps (compiled Haskell)   1.36s
pandoc --filter ./smallcaps.py (Python)          1.40s
pandoc --lua-filter ./smallcaps.lua              1.03s

The Lua filter adds only 0.02s of overhead, compared with 0.35s to 0.39s for the JSON filters.

Filter Structure

A Lua filter is a table whose keys are element names and whose values are functions that receive the matching element. A function returns nil to leave the element unchanged, a single element to replace it, or a list of elements to splice in its place. This example converts strong emphasis to small caps:

return {
  Strong = function (elem)
    return pandoc.SmallCaps(elem.content)
  end,
}

Equivalently, if the script returns no table, pandoc builds the filter from global functions named after element types:

function Strong(elem)
  return pandoc.SmallCaps(elem.content)
end

Run with: pandoc --lua-filter=smallcaps.lua
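
The three return conventions described above can be sketched in a single function. This is an illustrative sketch rather than an example from the pandoc documentation; the marker strings "TODO" and "(c)" are hypothetical and chosen only for the demonstration.

```lua
-- Hypothetical example: the marker strings "TODO" and "(c)" are assumptions.
function Str(elem)
  if elem.text == "TODO" then
    -- An empty list splices in nothing, which deletes the element.
    return {}
  elseif elem.text == "(c)" then
    -- A single element replaces the original.
    return pandoc.Str("©")
  end
  -- Returning nil (or nothing at all) leaves the element unchanged.
  return nil
end
```

Deleting a Str this way does not touch the neighboring Space elements, so a real filter would probably also want to tidy those up.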

Traversal Order

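The traverse field (or global variable) selects the traversal order: 'typewise' (the default) or 'topdown'. Typewise traversal applies all Inline functions, then the Inlines function, then all Block functions, then the Blocks function, then Meta, then Pandoc. Topdown traversal visits the tree depth-first starting from the root, and a function can skip an element's children by returning false as its second value: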

traverse = 'topdown'
function Note (n)
  return n, false  -- exclude footnote contents
end
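
A slightly larger sketch of the same mechanism, assuming a hypothetical class name no-upper: Str elements are upper-cased everywhere except inside Div blocks that carry that class. Note that string.upper only handles ASCII letters, so this is a demonstration of traversal control, not a robust case converter.

```lua
traverse = 'topdown'

-- Skip the whole subtree of any Div tagged with the (hypothetical) class.
function Div(div)
  if div.classes:includes('no-upper') then
    return div, false   -- false: do not descend into the children
  end
  -- Returning nothing keeps the Div and lets traversal continue into it.
end

-- Applied to every Str that the traversal still reaches.
function Str(s)
  return pandoc.Str(s.text:upper())
end
```

With the default typewise order this would not work as intended, because every Str would be processed before any Div function runs.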

Global Variables

  • FORMAT: output format (e.g., 'html5').
  • PANDOC_VERSION: version as array-like table (e.g., {2, 7, 3}).
  • PANDOC_API_VERSION: pandoc-types API version.
  • PANDOC_SCRIPT_FILE: path to the filter script.
  • PANDOC_READER_OPTIONS and PANDOC_WRITER_OPTIONS: options passed to the reader and the writer (PANDOC_WRITER_OPTIONS since pandoc 2.17).
  • lpeg and re: the bundled LPeg parsing-expression-grammar library and its regex-like re module.
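
As a rough sketch of how these globals are typically used, the filter below branches on FORMAT and records the version in the metadata. The page-break policy and the metadata field names output-format and pandoc-version are hypothetical, chosen only for illustration.

```lua
-- Hypothetical policy: turn horizontal rules into page breaks for LaTeX
-- output only, and leave every other output format untouched.
function HorizontalRule()
  if FORMAT:match('latex') then
    return pandoc.RawBlock('latex', '\\newpage')
  end
end

-- Expose the output format and pandoc version to templates.
-- The field names are assumptions, not standard pandoc metadata.
function Meta(meta)
  meta['output-format'] = FORMAT
  meta['pandoc-version'] = tostring(PANDOC_VERSION)
  return meta
end
```

Checking FORMAT inside the function keeps a single filter file usable for several output formats.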

Pandoc Module

The pandoc module provides element creators (e.g., pandoc.Str, pandoc.Para) and functions like walk_block, walk_inline, read, pipe, and access to pandoc.mediabag and pandoc.utils.
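
A minimal sketch combining element constructors with pandoc.utils.stringify: it prepends a paragraph that repeats the document title in emphasis. The behavior is invented for illustration, and it assumes the title is stored in the standard title metadata field.

```lua
-- Prepend an emphasized paragraph containing the document title.
function Pandoc(doc)
  local title = doc.meta.title
  if title == nil then
    return nil  -- no title metadata: leave the document unchanged
  end
  -- stringify flattens the metadata value to plain text.
  local text = pandoc.utils.stringify(title)
  local para = pandoc.Para({ pandoc.Emph({ pandoc.Str(text) }) })
  table.insert(doc.blocks, 1, para)
  return doc
end
```

Stringifying discards any inline formatting in the title, which keeps the sketch short at the cost of fidelity.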

Practical Example: Remove Spaces Before Citations

function Inlines(inlines)
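  -- Remove the space before each citation. Iterate from the end so that
  -- removing an item does not shift the indices still to be visited.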
  for i = #inlines-1, 1, -1 do
    if inlines[i].t == 'Space' and inlines[i+1].t == 'Cite' then
      inlines:remove(i)
    end
  end
  return inlines
end
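
To see what the loop does without running pandoc, the same logic can be tried on plain Lua tables. This is only a stand-in: the hypothetical helper remove_space_before_cite mimics the filter body, and the sample items carry just the t field rather than being real pandoc elements.

```lua
-- Stand-in for the filter body, operating on plain tables.
local function remove_space_before_cite(items)
  for i = #items - 1, 1, -1 do
    if items[i].t == 'Space' and items[i + 1].t == 'Cite' then
      table.remove(items, i)
    end
  end
  return items
end

-- Mock inline list: Str Space Cite Space Cite
local sample = {
  { t = 'Str' }, { t = 'Space' }, { t = 'Cite' }, { t = 'Space' }, { t = 'Cite' },
}
remove_space_before_cite(sample)

local types = {}
for _, item in ipairs(sample) do
  types[#types + 1] = item.t
end
print(table.concat(types, ' '))  -- prints "Str Cite Cite"
```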

Why It Matters for Developers

If you use pandoc for document generation (for example, converting Markdown to PDF, HTML, or LaTeX), Lua filters let you customize output without managing external scripts or dependencies. Because the interpreter is built into pandoc, a filter runs wherever pandoc does. In CI/CD pipelines that convert many documents the saving adds up: in the benchmark above, the Lua filter cost about 0.02s per run, against 0.35s or more for the JSON filters.

Getting Started

Save a filter as a .lua file and pass it to pandoc with --lua-filter. To chain several filters, repeat the option; they are applied in the order given on the command line. The pandoc Lua filters documentation covers all element types and advanced usage.