Trust spine

Sources

Every article on ClawBlog draws from a fixed pool of pre-fetched, pre-classified URLs. Here’s how that pool is assembled, how each URL gets a tier, and what we do before we let a writer reach for it.

Why a fixed pool

Language models can write fluent prose around a URL that doesn’t exist. They can also confidently misattribute a real fact to a real-looking but invented citation. The fix is the simplest possible one: we don’t let the writer reach for an arbitrary URL. We pre-fetch a curated set of sources, hand them to the writer as the only material it’s allowed to cite, and reject any draft that quotes a URL not in that set.

The set is called a source pack. One pack per drafting run. The pack is also the basis of the verification pass that runs before any article is published.
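The gate described above is simple to state in code. A minimal sketch (the type and function names here are illustrative, not the engine's real API):

```typescript
// A packed source as the writer receives it: the URL, its tier,
// and the page title captured at fetch time (used later for drift checks).
type Tier = "official" | "reputable" | "community" | "unknown";

interface PackedSource {
  url: string;
  tier: Tier;
  title: string;
}

// Return every cited URL that is not in the pack.
// A non-empty result means the draft is rejected.
function citationsOutsidePack(cited: string[], pack: PackedSource[]): string[] {
  const allowed = new Set(pack.map((s) => s.url));
  return cited.filter((url) => !allowed.has(url));
}
```

Because the check is a plain set-membership test, a hallucinated URL fails it no matter how plausible the surrounding prose is.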

The four tiers

Every URL in a source pack is classified into one of four tiers, by domain. The tier is shown on each citation in the article’s sources panel.

Official
Primary-source domains. Vendor blogs, official docs, CVE and GHSA databases. The strongest tier — when an official source confirms a claim, the claim doesn’t need a second source. Examples for our beat: anthropic.com, openai.com, openclaw.ai, cve.mitre.org, and the docs subdomains across the ecosystem.
Reputable
Established editorial outlets with a track record of fact-checked AI coverage. Strong second-source tier — sufficient for most claims.
Community
Discussion and aggregator surfaces — subreddits, Hacker News, GitHub issue threads. Useful as pointers toward primary sources; weak as standalone evidence. A claim resting only on a community source is flagged in the sources panel.
Unknown
Anything not in the curated lists. Off-topic unknowns are rejected at pack-build time; an unknown domain that actually overlaps the brief’s topic keywords can still survive into the pack, but its tier badge tells you we haven’t vetted the domain itself.
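In code, the tiering is a lookup against curated host lists, falling through to Unknown. A sketch under assumptions (exact-host matching for brevity; the real classifier in domain-tiers.ts also handles subdomains and path overrides, and the Reputable list lives in source-tiers.config.ts):

```typescript
type Tier = "official" | "reputable" | "community" | "unknown";

// Hosts taken from the examples in the prose above.
const OFFICIAL = new Set(["anthropic.com", "openai.com", "openclaw.ai", "cve.mitre.org"]);
const REPUTABLE = new Set<string>([]); // populated from the curated config, omitted here
const COMMUNITY = new Set(["reddit.com", "news.ycombinator.com", "github.com"]);

function tierFor(url: string): Tier {
  const host = new URL(url).hostname.replace(/^www\./, "");
  if (OFFICIAL.has(host)) return "official";
  if (REPUTABLE.has(host)) return "reputable";
  if (COMMUNITY.has(host)) return "community";
  return "unknown";
}
```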

github.com is a useful example of why the bare host isn’t enough. Issue threads, README views, code browsers, and discussions are Community by default — weak evidence on their own. The top-level GitHub advisories DB (github.com/advisories/GHSA-...) and the top-level security pages are bumped to Official via path overrides. Per-repo release notes ride the Community default; an editor can promote a specific release page by hand when they judge it primary-source quality.
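A path override can be sketched as a prefix match that runs before the host-level default. The prefixes below are assumptions drawn from the examples above, not the engine's actual rule set:

```typescript
// Path-based overrides: specific paths on a Community-tier host
// get lifted to Official.
const PATH_OVERRIDES: Array<{ host: string; prefix: string }> = [
  { host: "github.com", prefix: "/advisories/" }, // GHSA advisory DB
  { host: "github.com", prefix: "/security/" },   // top-level security pages
];

// Returns "official" when an override matches, otherwise null
// so the caller falls back to the host's default tier.
function overrideTier(url: string): "official" | null {
  const u = new URL(url);
  const host = u.hostname.replace(/^www\./, "");
  const hit = PATH_OVERRIDES.find(
    (o) => o.host === host && u.pathname.startsWith(o.prefix)
  );
  return hit ? "official" : null;
}
```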

The full classifier and the live domain lists are in the public engine repo at engine/modules/source-pack/lib/domain-tiers.ts and titles/clawblog/config/source-tiers.config.ts.

Pre-publish verification

Before any article goes live, every cited URL is re-fetched and matched against the source pack it was built from. The verification pass:

  1. Re-fetches the URL with the same safe-fetch rules used at scout time (no localhost, no private ranges, size + timeout caps).
  2. Compares the fetched page’s title to the title stored when the source pack was built. A large drift suggests the page changed under the writer’s feet and the citation may no longer say what the article claims it says.
  3. Records the result on the post and on a source_verifications row. The verification badge on the article reads its state from this record.
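The title-drift check in step 2 can be sketched as a word-overlap comparison. Both the Jaccard measure and the threshold here are stand-in assumptions; the real logic lives in verify-pack.ts:

```typescript
function normalize(t: string): string {
  return t.toLowerCase().replace(/\s+/g, " ").trim();
}

// Drift in [0, 1]: 0 = identical word sets, 1 = no words in common.
function titleDrift(stored: string, fetched: string): number {
  const a = new Set(normalize(stored).split(" "));
  const b = new Set(normalize(fetched).split(" "));
  const inter = [...a].filter((w) => b.has(w)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : 1 - inter / union;
}

// Assumed cutoff: above this, the citation is flagged as drifted.
const DRIFT_THRESHOLD = 0.5;
```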

If verification fails, the post does not auto-publish — it stays at status=approved and the editor reviews. The full state machine and the specific failure reasons are at engine/modules/source-pack/lib/verify-pack.ts.
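The publish gate reduces to a small state transition: only a clean verification moves an approved post forward. A hypothetical sketch (status names beyond approved are assumptions):

```typescript
type PostStatus = "draft" | "approved" | "published";
type VerifyResult = "verified" | "blocked";

// A blocked verification never advances the post; it stays at
// "approved" and waits for an editor.
function nextStatus(current: PostStatus, verify: VerifyResult): PostStatus {
  if (current === "approved" && verify === "verified") return "published";
  return current;
}
```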

What you see on an article

  • Verified · N sources — the verification pass re-fetched and matched every cited URL.
  • Verification blocked— one or more cited URLs failed re-fetch or title-match. The post stays in editorial review until resolved.
  • Retroactively verified— the post shipped before the source-pack module landed and has since passed a manual verification.
  • Legacy— pre-source-pack content that we have not yet retroactively verified. We label it honestly rather than quietly omit the badge.
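The four badge states form a closed set, which a discriminated union captures cleanly (a sketch; the variant names are assumptions):

```typescript
type Badge =
  | { kind: "verified"; sourceCount: number }
  | { kind: "blocked" }
  | { kind: "retro_verified" }
  | { kind: "legacy" };

function badgeLabel(b: Badge): string {
  switch (b.kind) {
    case "verified": return `Verified · ${b.sourceCount} sources`;
    case "blocked": return "Verification blocked";
    case "retro_verified": return "Retroactively verified";
    case "legacy": return "Legacy";
  }
}
```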

Below every article, the Sources panel lists each citation with its tier and per-URL verification note. When a URL has rotted (the page now 404s, redirects to a different topic, or is otherwise gone), it moves into the “Links that rotted after publication” section at the foot of the panel — visible and dated, never silently scrubbed.

Did we get something wrong?

We’d rather hear about a bad citation than ship a confidently-wrong sentence. Two paths:

  • Send a correction — reach the editor directly. Articles also carry a “Spot something wrong?” link that pre-fills the subject for you.
  • Submit a tip — if you have a primary source we missed, drop the URL. Tips that an editor approves flow into the next drafting run automatically.

Companion page: Methodology → covers the full pipeline (scout, writer, QC, verification, publish), cost-safety, and the Lean → Full operating-mode graduation.