Pandoc Lua Filters: Zero-Dependency AST Manipulation
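Pandoc has long supported filters that manipulate its abstract syntax tree (AST) between the parsing and writing stages. Traditional filters exchange the AST as JSON: pandoc writes the serialized AST to the filter's stdin and reads the modified AST back from the filter's stdout. This round trip adds serialization overhead, and whether a filter runs at all depends on the interpreters and libraries installed in the user's environment.
Starting with version 2.0, pandoc embeds a Lua interpreter (Lua 5.4 in current releases; early 2.x releases shipped Lua 5.3) together with a Lua library for writing filters, so Lua filters need no external dependencies. Pandoc data types are marshaled directly to Lua, which avoids the JSON round trip entirely.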
Performance Numbers
Benchmark converting the pandoc manual (MANUAL.txt) to HTML:
| Command | Time |
|---|---|
| pandoc | 1.01s |
| pandoc --filter ./smallcaps (compiled Haskell) | 1.36s |
| pandoc --filter ./smallcaps.py (Python) | 1.40s |
| pandoc --lua-filter ./smallcaps.lua | 1.03s |
The Lua filter adds only 0.02s over the unfiltered run, compared with 0.35s to 0.39s for the JSON filters.
Filter Structure
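A Lua filter is a table whose keys are AST element names and whose values are functions. Each function receives a matching element and returns nil to leave it unchanged, a single element to replace it, or a list of elements to splice in its place (an empty list deletes the element). The following filter converts strong emphasis to small caps: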
return {
  Strong = function (elem)
    return pandoc.SmallCaps(elem.content)
  end,
}
Or equivalently, using top-level functions:
function Strong(elem)
  return pandoc.SmallCaps(elem.content)
end
Run with: pandoc --lua-filter=smallcaps.lua
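The return conventions described above can be sketched with a single Str handler. This is an illustrative sketch, not part of the pandoc documentation: the placeholder tokens "{{TODO}}", "(c)", and "pandoc+lua" are hypothetical and chosen only to exercise each kind of return value.

```lua
-- Sketch of the return conventions for filter functions.
-- The tokens matched below are hypothetical placeholders.
function Str(elem)
  if elem.text == "{{TODO}}" then
    -- Empty list: the element is deleted.
    return {}
  elseif elem.text == "(c)" then
    -- Single element: replaces the original.
    return pandoc.Str("©")
  elseif elem.text == "pandoc+lua" then
    -- List of elements: spliced in place of the original.
    return { pandoc.Str("pandoc"), pandoc.Space(), pandoc.Str("Lua") }
  end
  -- nil: the element is left unchanged.
  return nil
end
```

Because the function is defined at the top level, pandoc picks it up by name, exactly as with the Strong example.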
Traversal Order
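Set the global variable traverse to 'typewise' (the default) or 'topdown'; support for this setting was added in pandoc 2.17. A typewise traversal runs the functions for Inline elements first, then the Inlines list function, then the functions for Block elements, then the Blocks list function, then Meta, and finally Pandoc. A topdown traversal walks the tree depth-first from the root, and a function can skip an element's children by returning false as its second value: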
traverse = 'topdown'

function Note (n)
  return n, false -- exclude footnote contents
end
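The same cut-off can be scoped to one filter table instead of the whole script. The sketch below assumes pandoc 2.17 or later, where elements have a walk method that accepts a filter table and honors its traverse field; the name skip_notes and the choice to uppercase text are illustrative only.

```lua
-- Illustrative filter table: uppercase all text except footnote contents.
local skip_notes = {
  traverse = 'topdown',
  -- Returning false as the second value stops descent into the note.
  Note = function(n)
    return n, false
  end,
  -- string.upper only affects ASCII letters.
  Str = function(s)
    return pandoc.Str(s.text:upper())
  end,
}

-- Apply the table to the whole document (assumes doc:walk, pandoc >= 2.17).
function Pandoc(doc)
  return doc:walk(skip_notes)
end
```

Keeping traverse inside the table leaves any other functions in the script on the default typewise order.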
Global Variables
FORMAT: the output format (e.g., 'html5').
PANDOC_VERSION: the pandoc version as a Version object that behaves like an array-like table (e.g., {2, 7, 3}).
PANDOC_API_VERSION: the pandoc-types API version.
PANDOC_SCRIPT_FILE: the path to the filter script.
PANDOC_READER_OPTIONS and PANDOC_WRITER_OPTIONS: the reader and writer options (PANDOC_WRITER_OPTIONS since pandoc 2.17).
lpeg and re: the built-in LPeg parsing module and its regex-like companion module.
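A common use of these globals is to make a filter depend on the output format. The following is a minimal sketch, assuming you want horizontal rules to become page breaks in LaTeX output only; the mapping itself is an illustrative choice, not something pandoc does by default.

```lua
-- Sketch: turn horizontal rules into page breaks for LaTeX output only.
function HorizontalRule(elem)
  -- FORMAT holds the output format name, e.g. 'latex' or 'html5'.
  if FORMAT:match('latex') then
    return pandoc.RawBlock('latex', '\\newpage')
  end
  -- Any other format: leave the rule unchanged.
  return nil
end
```

Because FORMAT is read when the function runs, the same script can be shared across HTML and PDF builds.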
Pandoc Module
The pandoc module provides element constructors (e.g., pandoc.Str, pandoc.Para), functions such as walk_block, walk_inline, read, and pipe, and access to the submodules pandoc.mediabag and pandoc.utils.
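As a small illustration of combining a constructor with a utility function, the sketch below uppercases level-1 headings using pandoc.utils.stringify and pandoc.Str. It is a simplification made for the example: stringify flattens the heading to plain text, so any inline formatting inside the heading is lost.

```lua
-- Sketch: uppercase the text of level-1 headings.
function Header(elem)
  if elem.level == 1 then
    -- Flatten the heading's inlines to a plain string.
    local text = pandoc.utils.stringify(elem.content)
    -- string.upper only affects ASCII letters.
    elem.content = { pandoc.Str(text:upper()) }
    return elem
  end
  -- Other heading levels are left unchanged.
  return nil
end
```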
Practical Example: Remove Spaces Before Citations
function Inlines(inlines)
  -- Remove any space that directly precedes a citation.
  -- Iterate backwards so removals do not shift unvisited indices.
  for i = #inlines-1, 1, -1 do
    if inlines[i].t == 'Space' and inlines[i+1].t == 'Cite' then
      inlines:remove(i)
    end
  end
  return inlines
end
Why It Matters for Developers
If you use pandoc for document generation (e.g., converting Markdown to PDF, HTML, or LaTeX), Lua filters let you customize output without managing external scripts or dependencies. Because the interpreter is built into the pandoc executable, a filter runs the same way wherever pandoc runs. For CI/CD pipelines that convert many documents, the savings add up: in the benchmark above, the Lua filter added 0.02s per run, against 0.35s to 0.39s for the JSON filters.
Getting Started
Save a filter as a .lua file and pass it to pandoc with --lua-filter. To chain multiple filters, repeat the option; the filters are applied in the order given on the command line. The pandoc Lua filters documentation covers all element types and advanced usage.
