ast.unparse multi-line docstrings corrupt under external string re-indent
Problem
Code that generates Python source by (a) injecting a docstring into a function via
ast, (b) ast.unparse-ing that snippet to text, then (c) string-concatenating the
text into a larger function while indenting every line, silently corrupts the
docstring value for any multi-line docstring. A later ast.get_docstring(..., clean=False) over the generated source returns the docstring with injected leading
whitespace on every line after the first — so an integrity comparison against the
original value fails.
In SPEC-547 this surfaced as a deploy-time DocstringBindingError: a rule’s
enshrined text (World-Agent-distilled prose, routinely multi-sentence/multi-line)
was injected as the applies() docstring, and the deploy’s assert_docstring_binding
re-parsed the module and compared the deployed docstring to the enshrined value — and
mismatched, failing the build before deposit. Single-line fixtures (and the E2E)
all passed, so it escaped until review.
Symptoms:
- assert_docstring_binding raised only for multi-line statement/applies_when.
- Reproduced: statement="line one\nline two" → deployed docstring "line one\n line two".
- The generated module still executed correctly; only the docstring value drifted.
Root cause
ast.unparse emits a multi-line string docstring as a triple-quoted literal with
real newlines in the output (its _write_docstring special-cases the leading
string-expr of a function/module). When that unparsed block is then passed through a
naive line-by-line indenter ("\n".join(INDENT + line ...)) to nest it inside a
wrapper function, the indenter prepends spaces to the docstring’s continuation lines
too — which are inside the triple-quoted literal. Those spaces become part of the
string value. inspect.cleandoc (what ast.get_docstring(clean=True) uses) would
strip them, but a clean=False exact comparison sees the corruption.
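The mechanism can be reproduced in isolation. The following is a minimal sketch, not the project's code: DOC, INDENT, naive_indent, and the wrapper name rule are made up for illustration, and a four-space indent is assumed.

```python
import ast

DOC = "line one\nline two"  # hypothetical multi-line docstring value
INDENT = "    "             # assumed indent width


def naive_indent(text: str) -> str:
    """Prefix every line with INDENT, including lines inside string literals."""
    return "\n".join(INDENT + line for line in text.splitlines())


# (a) inject the docstring via ast, (b) unparse the snippet to text
tree = ast.parse("def applies(context):\n    return True\n")
tree.body[0].body.insert(0, ast.Expr(ast.Constant(DOC)))
ast.fix_missing_locations(tree)
snippet = ast.unparse(tree)

# (c) nest the text inside a wrapper by re-indenting every line
wrapped = (
    f"def rule(context):\n{naive_indent(snippet)}\n"
    f"{INDENT}return applies(context)\n"
)

# Re-parse the generated source and read the nested docstring back
nested = ast.parse(wrapped).body[0].body[0]
raw = ast.get_docstring(nested, clean=False)
cleaned = ast.get_docstring(nested, clean=True)

print(repr(raw))      # exact value, as an integrity check would see it
print(repr(cleaned))  # value after inspect.cleandoc normalization
```

Here raw carries the injected spaces in front of "line two", while cleaned still equals DOC, which is why a clean=True comparison would hide the problem.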
Solution
Do not string-indent already-unparsed source that may contain a multi-line docstring.
Build the whole wrapper as a single AST and ast.unparse it once — unparse
then emits all indentation correctly and the docstring’s continuation lines stay at
column 0 inside the literal, so the value round-trips verbatim.
Code changes
# Before (corrupts multi-line docstrings)
def _with_docstring(source, *, docstring, rule_id):
    tree = ast.parse(source)
    applies = _find_applies(tree, rule_id)
    applies.body.insert(0, ast.Expr(ast.Constant(docstring)))  # or replace existing
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

def _wrapper(fn_name, body_source):
    return f"def {fn_name}(context):\n{_indent(body_source)}\n    return applies(context)\n"
# _indent prepends spaces to EVERY line, including docstring continuation lines.

# After (AST-built wrapper, unparse once)
def _build_wrapper(fn_name, predicate_source, *, docstring, rule_id):
    tree = ast.parse(predicate_source)
    applies = _find_applies(tree, rule_id)  # raise on missing
    doc = ast.Expr(ast.Constant(docstring))
    if ast.get_docstring(applies, clean=False) is not None:
        applies.body[0] = doc
    else:
        applies.body.insert(0, doc)
    wrapper = ast.FunctionDef(
        name=fn_name,
        args=ast.arguments(
            posonlyargs=[], args=[ast.arg(arg="context")],
            kwonlyargs=[], kw_defaults=[], defaults=[],
        ),
        body=[
            *tree.body,
            ast.Return(ast.Call(
                ast.Name("applies", ast.Load()),
                [ast.Name("context", ast.Load())],
                [],
            )),
        ],
        decorator_list=[],
    )
    module = ast.Module(body=[wrapper], type_ignores=[])
    ast.fix_missing_locations(module)
    return ast.unparse(module) + "\n"

Implementation notes
- The predicate snippet’s own module-level imports become function-local (nested in the wrapper) — same semantics as the old text-nesting approach.
- ast.unparse escapes \r as a \r escape sequence (no raw CR byte in the output), so a downstream newline-normalizer (e.g. a content-addressed bundle that rewrites \r\n → \n) does not touch the docstring literal, and the value also round-trips through deposit + reload.
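The carriage-return note can be checked with a short sketch. The docstring value below is invented, and the str.replace call is only a stand-in for the downstream normalizer, whose real behaviour is not shown in this document.

```python
import ast

# Hypothetical docstring containing a CRLF pair and a trailing newline
DOC = "first line\r\nsecond line\n"

tree = ast.parse("def applies(context):\n    return True\n")
tree.body[0].body.insert(0, ast.Expr(ast.Constant(DOC)))
ast.fix_missing_locations(tree)
source = ast.unparse(tree)

# The CR is written as the two-character escape backslash + r, not as a raw byte
has_raw_cr = "\r" in source

# Stand-in for a bundle step that rewrites CRLF to LF in stored text
normalized = source.replace("\r\n", "\n")

reloaded = ast.get_docstring(ast.parse(normalized).body[0], clean=False)
print(has_raw_cr, reloaded == DOC)
```

Because the generated text contains no raw CR, the normalizer has nothing to rewrite and the reloaded value compares equal to DOC.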
Prevention
Best practices
- When generating code that embeds arbitrary text in a docstring, assemble the
  enclosing function/module as AST and ast.unparse once. Never re-indent unparsed
  source line-by-line.
- For any “the generated docstring must equal X” integrity check, test with a
  multi-line value (and a trailing newline), not just a single-line fixture.
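A round-trip check along these lines might look like the sketch below. It is illustrative only: round_trip and CASES are hypothetical names, and the helper builds a simplified wrapper rather than calling the project's generate_module_source or assert_docstring_binding.

```python
import ast

# Hypothetical fixture values: single-line, multi-line, trailing newline,
# and a continuation line that is itself indented.
CASES = [
    "single line",
    "line one\nline two",
    "line one\nline two\n",
    "line one\n    indented continuation\n\nlast line",
]


def round_trip(docstring: str) -> str:
    """Build the wrapper as one AST, unparse once, and read the docstring back."""
    inner = ast.parse("def applies(context):\n    return True\n").body[0]
    inner.body.insert(0, ast.Expr(ast.Constant(docstring)))
    wrapper = ast.FunctionDef(
        name="rule",
        args=ast.arguments(
            posonlyargs=[], args=[ast.arg(arg="context")],
            kwonlyargs=[], kw_defaults=[], defaults=[],
        ),
        body=[
            inner,
            ast.Return(ast.Call(
                ast.Name("applies", ast.Load()),
                [ast.Name("context", ast.Load())],
                [],
            )),
        ],
        decorator_list=[],
    )
    module = ast.Module(body=[wrapper], type_ignores=[])
    ast.fix_missing_locations(module)
    source = ast.unparse(module)
    nested = ast.parse(source).body[0].body[0]
    return ast.get_docstring(nested, clean=False)


for value in CASES:
    # Exact comparison (clean=False), matching the integrity check's strictness
    assert round_trip(value) == value, repr(value)
```

Comparing with clean=False matters here: a clean=True comparison would normalize the indentation away and let the single-line-only blind spot persist.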
Warning signs
- A round-trip assertion that passes for single-line strings but fails for multi-line.
- Mixing ast.unparse output with f-string/textwrap-style manual indentation.
Related tests
tests/worlds/application/deploy/test_composition.py::TestStatementDocstring::test_multiline_statement_round_trips_through_the_docstring_binding
pins a multi-line statement + applies_when through generate_module_source +
assert_docstring_binding.
References
- SPEC-547 review fix commit f46ffd2; merged bb6e888.
- src/spectral/worlds/application/deploy/composition.py (_build_wrapper, assert_docstring_binding).