Before / after review. Audio is generated by Amazon Polly (neural engine, newscaster speaking style). Each clip is the real plugin output for the text shown.
red strike-through = narrated before, now removed · green = fixed by a pronunciation rule
1. Block content parsing
The old code path stripped the tags from the raw HTML and sent the result to Polly. As a result, table cells, repeated pull-quotes, and captions were all read aloud, and words fused across block boundaries. The new extractor walks the block tree and keeps only narratable blocks.
Before — raw HTML strip
Blackstone has agreed to buy a logistics portfolio spanning 37m sq m for A$20bn, the largest industrial deal of the year.Blackstone has agreed to buy a logistics portfolio for A$20bn.AssetPriceYieldSydney hubA$8bn5.2%The Sydney logistics hub at the centre of the deal.The transaction closes in the fourth quarter; spreads on the financing tightened 240bp.
355 characters. Note the duplicated pull-quote sentence, the table cells running together (AssetPriceYieldSydney hub…), and sentences fusing with no gap (year.Blackstone, 5.2%The).
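The fusing comes from deleting tags without putting any separator in their place. A minimal Python sketch of that failure mode, assuming a plain regex tag strip and hypothetical markup for the pull-quote and table (an illustration, not the plugin's old code):

```python
import re

def naive_strip(html):
    # Delete every tag and insert nothing in its place, so text from
    # adjacent blocks and table cells is glued together.
    return re.sub(r"<[^>]+>", "", html)

# Hypothetical markup: a paragraph, a pull-quote, then a table header row.
html = (
    "<p>the largest industrial deal of the year.</p>"
    "<blockquote>Blackstone has agreed to buy a logistics portfolio for A$20bn.</blockquote>"
    "<table><tr><th>Asset</th><th>Price</th><th>Yield</th></tr></table>"
)

print(naive_strip(html))
# the largest industrial deal of the year.Blackstone has agreed to buy a logistics portfolio for A$20bn.AssetPriceYield
```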
After parsing — structure cleaned
Blackstone has agreed to buy a logistics portfolio spanning 37m sq m for A$20bn, the largest industrial deal of the year.
The Sydney logistics hub at the centre of the deal.
The transaction closes in the fourth quarter; spreads on the financing tightened 240bp.
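A minimal sketch of the block-tree walk, in Python. The `Block` type, the kind names (`paragraph`, `heading`, `table`, `pullquote`, `group`) and `extract_narration` are hypothetical, not the plugin's API. Skipped kinds are dropped together with their whole subtree, and narratable text is emitted one block per line so sentences cannot fuse. For an article shaped like the example above, it returns the three cleaned lines.

```python
from dataclasses import dataclass, field

# Hypothetical block model: each block has a kind, its own text, and children.
@dataclass
class Block:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)

# Assumed kind names. Skipped kinds are dropped with their subtree; only
# narratable kinds contribute their own text; anything else is descended into.
SKIPPED = {"table", "pullquote"}
NARRATABLE = {"paragraph", "heading"}

def extract_narration(blocks):
    """Depth-first walk that keeps only narratable text, one block per line."""
    parts = []

    def walk(block):
        if block.kind in SKIPPED:
            return  # drop the block and everything nested inside it
        if block.kind in NARRATABLE and block.text.strip():
            parts.append(block.text.strip())
        for child in block.children:
            walk(child)

    for block in blocks:
        walk(block)
    # One block per line, so "year." and "Blackstone" can never touch.
    return "\n".join(parts)

article = [
    Block("paragraph", "Blackstone has agreed to buy a logistics portfolio spanning "
          "37m sq m for A$20bn, the largest industrial deal of the year."),
    Block("pullquote", "Blackstone has agreed to buy a logistics portfolio for A$20bn."),
    Block("table", children=[Block("paragraph", "Asset"), Block("paragraph", "Price")]),
    Block("group", children=[
        Block("paragraph", "The Sydney logistics hub at the centre of the deal."),
    ]),
    Block("paragraph", "The transaction closes in the fourth quarter; "
          "spreads on the financing tightened 240bp."),
]

print(extract_narration(article))
```

Using an allow-list for narratable kinds means an unknown block kind contributes no text of its own, though narratable blocks nested inside it are still read.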
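2. Pronunciation rules
An ordered set of text transforms runs over the extracted text before it is sent to Polly. These fix cases a single-token pronunciation dictionary can't handle, because they depend on context (the "m" in "37m" means million, but in "sq m" it means metres) or on variable digits (37m, 240bp).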
After parsing + rules — full pipeline
Blackstone has agreed to buy a logistics portfolio spanning 37 million square metres for 20 billion Australian dollars, the largest industrial deal of the year.
The Sydney logistics hub at the centre of the deal.
The transaction closes in the fourth quarter; spreads on the financing tightened 240 basis points.
What changed vs the clip above: 37m sq m → "37 million square metres" · A$20bn → "20 billion Australian dollars" (not US dollars) · 240bp → "240 basis points" (not "B P").
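The ordered transform step can be sketched as a list of regular-expression rules applied in sequence. The rule table and `apply_rules` below are hypothetical: they cover the three cases listed above, plus one extra bare `m` = million rule included only to show why order matters. Real rules would need more care (not every `5m` means five million).

```python
import re

# Hypothetical rule table of (pattern, replacement) pairs, applied in order.
# Order matters: "37m sq m" must be handled before the bare "37m" = million
# rule, or it would come out as "37 million sq m".
RULES = [
    (re.compile(r"\b(\d+(?:\.\d+)?)m sq m\b"), r"\1 million square metres"),
    (re.compile(r"\b(\d+(?:\.\d+)?)m\b"), r"\1 million"),
    (re.compile(r"A\$(\d+(?:\.\d+)?)bn\b"), r"\1 billion Australian dollars"),
    (re.compile(r"\b(\d+)bp\b"), r"\1 basis points"),
]

def apply_rules(text, rules=RULES):
    """Run each transform over the whole text, in list order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

print(apply_rules("spanning 37m sq m for A$20bn, the largest industrial deal"))
# spanning 37 million square metres for 20 billion Australian dollars, the largest industrial deal
print(apply_rules("spreads on the financing tightened 240bp."))
# spreads on the financing tightened 240 basis points.
```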
3. Editorial no-audio gate
Pricings (CMA/ABA) and Market Monitor articles consist of tables and live dashboards, so narrating them produces meaningless audio. They now generate no audio. Below is the kind of text a pricings article would have narrated without the gate.
Before — raw strip of a pricings page
Illustrative pricing table (fabricated; real articles follow the same format):
Northwind Funding Trust, 2026-2 Priced: April 9 Amount: $185 million Collateral: Auto loans Seller: Northwind Financial Bookrunners: Example Bank, Sample Securities ClassRatingAmountYieldWALSpreadBench AAA A72.500 5.05 1.90 +110 I-Curve BA A41.200 5.40 2.85 +150 I-Curve CBBB 28.000 6.10 3.95 +210 I-Curve …
After — gated, no audio
Audio narration is switched off for the "Initial pricings" format. These articles are built around tables and live dashboards, which do not translate to audio, so none is generated.
No Polly call is made, so there is no cost. The editor metabox shows this note instead of the generation controls. The gate is enforced on publish, on save, in the CLI, on force-regenerate, and when the playable audio URL is resolved; if a post becomes excluded, any audio already generated for it is cleaned up.
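One way to enforce the gate at every entry point is to route them all through a single predicate, so no path can drift out of sync. A Python sketch under stated assumptions: `Post`, `EXCLUDED_FORMATS`, the format slugs, `generate_audio` and `playable_url` are hypothetical names, and `synthesize` stands in for the Polly call.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical format slugs for the excluded article types.
EXCLUDED_FORMATS = {"initial-pricings", "market-monitor"}

@dataclass
class Post:
    post_id: int
    article_format: str
    audio_url: Optional[str] = None

def audio_allowed(post):
    """Single predicate shared by publish, save, CLI, force-regenerate and URL lookup."""
    return post.article_format not in EXCLUDED_FORMATS

def generate_audio(post, synthesize, force=False):
    # force only skips the "already generated" shortcut; it never bypasses the gate.
    if not audio_allowed(post):
        # Cleanup: clear audio generated before the post became excluded.
        # (A real plugin would presumably also delete the stored file.)
        post.audio_url = None
        return None
    if post.audio_url and not force:
        return post.audio_url
    post.audio_url = synthesize(post)
    return post.audio_url

def playable_url(post):
    # Checked again at lookup time, so a stale URL is never served.
    return post.audio_url if audio_allowed(post) else None
```

Checking again in `playable_url` covers a post that was excluded after its audio was generated but before any cleanup ran.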