ast.unparse multi-line docstrings corrupt under external string re-indent

Problem

Code that generates Python source by (a) injecting a docstring into a function via ast, (b) ast.unparse-ing that snippet to text, then (c) string-concatenating the text into a larger function while indenting every line, silently corrupts the docstring value for any multi-line docstring. A later ast.get_docstring(..., clean=False) over the generated source returns the docstring with injected leading whitespace on every line after the first — so an integrity comparison against the original value fails.

In SPEC-547 this surfaced as a deploy-time DocstringBindingError. A rule’s enshrined text (World-Agent-distilled prose, routinely multi-sentence/multi-line) was injected as the applies() docstring; the deploy’s assert_docstring_binding then re-parsed the module, compared the deployed docstring to the enshrined value, and found a mismatch, failing the build before deposit. Single-line fixtures (and the E2E) all passed, so the bug escaped until review.

Symptoms:

  • assert_docstring_binding raised only for multi-line statement / applies_when.
  • Reproduced: statement="line one\nline two" → deployed docstring "line one\n line two".
  • The generated module still executed correctly — only the docstring value drifted.

Root cause

ast.unparse emits a multi-line string docstring as a triple-quoted literal with real newlines in the output (its _write_docstring special-cases the leading string-expr of a function/module). When that unparsed block is then passed through a naive line-by-line indenter ("\n".join(INDENT + line ...)) to nest it inside a wrapper function, the indenter prepends spaces to the docstring’s continuation lines too — which are inside the triple-quoted literal. Those spaces become part of the string value. inspect.cleandoc (what ast.get_docstring(clean=True) uses) would strip them, but a clean=False exact comparison sees the corruption.
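
The mechanism can be reproduced in a few lines. This is a minimal sketch, not the project's code: the 4-space INDENT, the naive_indent helper, and the rule/applies snippet are assumptions made for illustration, and it needs Python 3.9+ for ast.unparse.

```python
import ast

INDENT = "    "  # assumed width; the original indenter's width is not shown


def naive_indent(text: str) -> str:
    # Prepends INDENT to every line, including lines inside string literals.
    return "\n".join(INDENT + line for line in text.splitlines())


original = "line one\nline two"

# (a) inject the docstring via ast, (b) unparse the snippet to text.
tree = ast.parse("def applies(context):\n    return True\n")
tree.body[0].body.insert(0, ast.Expr(ast.Constant(original)))
ast.fix_missing_locations(tree)
snippet = ast.unparse(tree)

# (c) nest the text inside a wrapper by re-indenting every line.
nested = (
    "def rule(context):\n"
    + naive_indent(snippet)
    + "\n    return applies(context)\n"
)

# Re-parse and read the nested function's docstring back.
inner = ast.parse(nested).body[0].body[0]
deployed = ast.get_docstring(inner, clean=False)  # exact value
cleaned = ast.get_docstring(inner, clean=True)    # passed through inspect.cleandoc

print(repr(deployed))  # 'line one\n    line two'
print(repr(cleaned))   # 'line one\nline two'
```

The clean=True read hides the drift because cleandoc removes the common leading whitespace of the continuation lines, which is why only an exact comparison catches it.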

Solution

Do not string-indent already-unparsed source that may contain a multi-line docstring. Build the whole wrapper as a single AST and ast.unparse it once: unparse then emits all indentation correctly, and the docstring’s continuation lines stay at column 0 inside the literal, so the value round-trips verbatim.

Code changes

# Before (corrupts multi-line docstrings)
def _with_docstring(source, *, docstring, rule_id):
    tree = ast.parse(source)
    applies = _find_applies(tree, rule_id)
    applies.body.insert(0, ast.Expr(ast.Constant(docstring)))  # or replace existing
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

def _wrapper(fn_name, body_source):
    return f"def {fn_name}(context):\n{_indent(body_source)}\n    return applies(context)\n"
# _indent prepends spaces to EVERY line, including docstring continuation lines.

# After (AST-built wrapper, unparse once)
def _build_wrapper(fn_name, predicate_source, *, docstring, rule_id):
    tree = ast.parse(predicate_source)
    applies = _find_applies(tree, rule_id)  # raise on missing
    doc = ast.Expr(ast.Constant(docstring))
    if ast.get_docstring(applies, clean=False) is not None:
        applies.body[0] = doc
    else:
        applies.body.insert(0, doc)
    wrapper = ast.FunctionDef(
        name=fn_name,
        args=ast.arguments(posonlyargs=[], args=[ast.arg(arg="context")],
                           kwonlyargs=[], kw_defaults=[], defaults=[]),
        body=[*tree.body, ast.Return(ast.Call(ast.Name("applies", ast.Load()),
                                              [ast.Name("context", ast.Load())], []))],
        decorator_list=[],
    )
    module = ast.Module(body=[wrapper], type_ignores=[])
    ast.fix_missing_locations(module)
    return ast.unparse(module) + "\n"
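
To see the effect of the single-unparse approach in isolation, the sketch below condenses it to the essentials and checks the round trip. nest_via_ast is a hypothetical stand-in for _build_wrapper: it assumes the snippet's first statement is def applies(context) with no existing docstring, and it skips the _find_applies lookup entirely.

```python
import ast


def nest_via_ast(fn_name: str, predicate_source: str, docstring: str) -> str:
    # Hypothetical condensed stand-in for _build_wrapper (see assumptions above).
    tree = ast.parse(predicate_source)
    applies = tree.body[0]
    applies.body.insert(0, ast.Expr(ast.Constant(docstring)))
    wrapper = ast.FunctionDef(
        name=fn_name,
        args=ast.arguments(posonlyargs=[], args=[ast.arg(arg="context")],
                           kwonlyargs=[], kw_defaults=[], defaults=[]),
        body=[*tree.body,
              ast.Return(ast.Call(ast.Name("applies", ast.Load()),
                                  [ast.Name("context", ast.Load())], []))],
        decorator_list=[],
    )
    module = ast.Module(body=[wrapper], type_ignores=[])
    ast.fix_missing_locations(module)
    return ast.unparse(module) + "\n"  # unparse exactly once


statement = "line one\nline two\n"  # multi-line, with a trailing newline
source = nest_via_ast(
    "rule", "def applies(context):\n    return context > 0\n", statement
)

# Re-parse the generated text and read the nested docstring back, uncleaned.
inner = ast.parse(source).body[0].body[0]
round_tripped = ast.get_docstring(inner, clean=False)

# The generated module also still executes.
namespace = {}
exec(source, namespace)
result = namespace["rule"](1)
```

Here round_tripped equals statement exactly, because no text-level indentation pass ever touches the lines inside the triple-quoted literal.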

Implementation notes

  • The predicate snippet’s own module-level imports become function-local (nested in the wrapper) — same semantics as the old text-nesting approach.
  • ast.unparse writes \r as the two-character escape sequence \r (no raw CR byte in the output), so a downstream newline-normalizer (e.g. a content-addressed bundle that rewrites \r\n to \n) does not touch the docstring literal, and the value also round-trips through deposit + reload.
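
The carriage-return behaviour can be checked directly. In this sketch the normalizer is an assumed plain str.replace standing in for whatever the bundle step actually does; the rest uses only the standard ast module.

```python
import ast

value = "first\r\nsecond"  # docstring containing a CR LF pair

tree = ast.parse("def applies(context):\n    return True\n")
tree.body[0].body.insert(0, ast.Expr(ast.Constant(value)))
ast.fix_missing_locations(tree)
text = ast.unparse(tree)

has_raw_cr = "\r" in text   # is there a raw carriage-return character?
has_escape = "\\r" in text  # is there a backslash followed by 'r'?

# Assumed stand-in for a downstream CRLF-to-LF normalizer.
normalized = text.replace("\r\n", "\n")

# Reload the normalized source and read the docstring back, uncleaned.
reloaded = ast.get_docstring(ast.parse(normalized).body[0], clean=False)
```

Because the output holds no raw CR, the normalizer leaves the text unchanged and reloaded still equals value.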

Prevention

Best practices

  • When generating code that embeds arbitrary text in a docstring, assemble the enclosing function/module as AST and ast.unparse once. Never re-indent unparsed source line-by-line.
  • For any “the generated docstring must equal X” integrity check, test with a multi-line value (and a trailing newline) — not just a single-line fixture.
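
The second point can be illustrated by running three fixtures through a deliberately buggy generator. buggy_generate and binding_holds are hypothetical helpers written for this sketch (4-space indent assumed); they are not the project's generate_module_source or assert_docstring_binding.

```python
import ast


def buggy_generate(docstring: str) -> str:
    # Hypothetical generator with the bug: unparse, then re-indent every line.
    tree = ast.parse("def applies(context):\n    return True\n")
    tree.body[0].body.insert(0, ast.Expr(ast.Constant(docstring)))
    ast.fix_missing_locations(tree)
    body = "\n".join("    " + line for line in ast.unparse(tree).splitlines())
    return f"def rule(context):\n{body}\n    return applies(context)\n"


def binding_holds(source: str, expected: str) -> bool:
    # Exact (clean=False) comparison of the nested function's docstring.
    inner = ast.parse(source).body[0].body[0]
    return ast.get_docstring(inner, clean=False) == expected


fixtures = {
    "single-line": "only one line",
    "multi-line": "line one\nline two",
    "trailing newline": "line one\n",
}
results = {
    name: binding_holds(buggy_generate(value), value)
    for name, value in fixtures.items()
}
print(results)
# {'single-line': True, 'multi-line': False, 'trailing newline': False}
```

The single-line fixture passes even though the generator is broken, which matches how the bug escaped the original test suite.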

Warning signs

  • A round-trip assertion that passes for single-line strings but fails for multi-line.
  • Mixing ast.unparse output with f-string/textwrap-style manual indentation.

Regression test: tests/worlds/application/deploy/test_composition.py::TestStatementDocstring::test_multiline_statement_round_trips_through_the_docstring_binding pins a multi-line statement + applies_when through generate_module_source + assert_docstring_binding.

References

  • SPEC-547 review fix commit f46ffd2; merged bb6e888.
  • src/spectral/worlds/application/deploy/composition.py (_build_wrapper, assert_docstring_binding).