Status
Extraction
- Source: FY-2025-26-Budget-Amendments.pdf (88 pages, sha256 21927d6405a47805...)
Counts
- Total sections: 161
- Index nodes: 12
- Leaf sections: 149
- Inlined (too small for own file): 0
Known gaps
- No empty pages detected during extraction.
- Hand-crafted section plan. The source PDF has zero PDF bookmarks,
so pdf-doctree's outline-driven planner produced an empty plan. The
plan at .extracted/section-plan.json is generated by
scripts/build_amendments_plan.mjs, which parses the page-1 TOC and the
^<councilor> NN – <title> amendment headings across all 88 pages.
Result: 12 councilor indexes, 149 amendment leaves.
- Page granularity. Multiple amendments often share a single PDF page
(e.g. Avalos 01/02/03 all sit on p. 2). Each amendment has its own leaf
file, but their TL;DRs may overlap because the enrichment input is the
full page text. The leaf body always shows the complete page, so nothing
is hidden.
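As a rough illustration of the heading scan described in the notes above, the sketch below matches lines of the form ^<councilor> NN – <title> in per-page text. It is not the code in scripts/build_amendments_plan.mjs. The function name findAmendmentHeadings, the input shape (an array of page strings plus a list of councilor names taken from the TOC), the two-digit NN, and the tolerance for a plain hyphen in place of the en dash are all assumptions made for the example.

```javascript
// Hypothetical sketch only: names, input shape, and output shape are assumptions,
// not the actual behavior of scripts/build_amendments_plan.mjs.

// Escape regex metacharacters so councilor names are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// pages: array of page texts (index 0 is PDF page 1).
// councilors: names as they appear in the page-1 TOC (assumed to be known already).
function findAmendmentHeadings(pages, councilors) {
  const names = councilors.map(escapeRegExp).join("|");
  // Line-anchored pattern: <councilor> NN – <title>; accepts an en dash or a hyphen.
  const heading = new RegExp(`^(${names})\\s+(\\d{2})\\s+[–-]\\s+(.+)$`, "gm");
  const leaves = [];
  pages.forEach((text, i) => {
    for (const m of text.matchAll(heading)) {
      leaves.push({
        councilor: m[1],
        number: m[2],
        title: m[3].trim(),
        page: i + 1, // several headings may report the same page
      });
    }
  });
  return leaves;
}

// Usage with made-up page text (the titles are placeholders, not real amendments):
const samplePages = [
  "Table of contents",
  "Avalos 01 – Placeholder title A\nbody text\nAvalos 02 – Placeholder title B\nmore text",
];
console.log(findAmendmentHeadings(samplePages, ["Avalos"]));
```

Because each match records only the page it was found on, two amendments on the same page end up with the same page number, which is the page-granularity overlap noted above.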
Rebuild
Regenerate the plan (needed only if the source PDF changes, or if you want to refine the parser):
node ../scripts/build_amendments_plan.mjs
Then run the normal pipeline:
./scripts/run_budget_2025_26_amendments.sh