Methodology — orbyd

The pipeline, in one paragraph

A five-stage pipeline ingests 30 days of news and four quarters of earnings transcripts for the book, the watchlist, and the momentum universe, and a frontier language model synthesises a quality grade and a full dossier per candidate. Once a week a mechanical radar clusters the tape into momentum themes; the model then curates the triggered clusters into scored theme-bets — shaping or rejecting each — while sizing and exits stay machine-owned. The output is a dossier per ticker, a regime-tagged journal entry, a macro view, sector rotation, a watchlist, and a public, Brier-scored forecast record. The reasoning is what gets published — and this is a forward experiment, not a proven edge.

The five stages

Liquidity screen. ~400 US-listed names filtered by spread, ADV, market cap, and tradable shape. Deterministic; cheap; eliminates ~75% of the universe before the LLM touches it.
Momentum + narrative scoring. Survivors ranked by a composite of price structure × volume × news density × theme-cluster strength. Themes are treated as primary, not afterthoughts — the system reads narrative basket behaviour, not isolated ticker action.
The model's quality read. Claude Opus reads news, earnings transcripts, and filings for each candidate and produces a quality grade (an internal grade from strongest to weakest), along with the dossier you see on this site: thesis, invalidation, bull case, bear case, setup, catalysts, what-would-change-our-mind, correlations.
Theme-bet curation. Once a week, Opus reasons across the radar's triggered momentum clusters side-by-side using the 1M-token context — every candidate dossier in one pass — and either rejects a cluster (index-twin, merger-arb, binary biotech — logged and scored) or shapes it into a theme-bet: concentrated in its one or two narrative-central names, with a stated probability, narrative and invalidation. Every decision is an immutable, public, Brier-scored forecast. Position sizing and exits are machine-owned — the model cannot veto them.
Mechanical outcome grading. Every forecast is graded against the invalidation trigger it published in advance — mechanically, no discretion — and folded into the public Brier score. Pre-registered, tighten-only kill gates decide whether the strategy scales, de-scales, or is declared dead.
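
The first two stages can be sketched in a few lines of Python. This is a minimal illustration, not orbyd's implementation: the `Candidate` fields, the threshold values, and the assumption that each stage-2 factor is already normalised to the range 0 to 1 are all hypothetical, and the "tradable shape" check is left out because the text does not define it. It shows a deterministic screen followed by a multiplicative composite rank.

```python
from dataclasses import dataclass

# All thresholds below are illustrative placeholders, not orbyd's settings.
MAX_SPREAD_BPS = 20.0
MIN_ADV_USD = 10_000_000
MIN_MARKET_CAP_USD = 1_000_000_000


@dataclass(frozen=True)
class Candidate:
    ticker: str
    spread_bps: float
    adv_usd: float
    market_cap_usd: float
    # Stage-2 factors, assumed pre-normalised to the range [0, 1].
    price_structure: float
    volume: float
    news_density: float
    theme_strength: float


def passes_liquidity_screen(c: Candidate) -> bool:
    """Stage 1: deterministic filter, no model involved."""
    return (
        c.spread_bps <= MAX_SPREAD_BPS
        and c.adv_usd >= MIN_ADV_USD
        and c.market_cap_usd >= MIN_MARKET_CAP_USD
    )


def composite_score(c: Candidate) -> float:
    """Stage 2: product of the four factors, so one weak factor drags the score down."""
    return c.price_structure * c.volume * c.news_density * c.theme_strength


def rank_survivors(universe: list[Candidate]) -> list[Candidate]:
    """Drop names that fail the screen, then sort the rest best-first."""
    survivors = [c for c in universe if passes_liquidity_screen(c)]
    return sorted(survivors, key=composite_score, reverse=True)
```

A product, rather than a weighted sum, is used here because the text writes the composite with multiplication signs; how the real factors are scaled and combined is not specified.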

The schedule

Windows are computed off the US market calendar in New-York-relative minutes, so they stay DST-correct year-round, including the weeks each spring and autumn (roughly three to four per year in total) when US and EU clocks are out of step and naive Berlin-time schedules silently fail. The theme-bet loop is weekly; the reads that keep the record honest run daily:

Weekly radar — the mechanical cluster scan that emits candidate themes and triggers; nothing enters without a mechanical trigger.
Curation pass — the model shapes or rejects the triggered clusters into scored theme-bets.
Daily exits — machine-owned rules only (sustained trend break, min-hold, profit ladder, catastrophic stop); the model cannot override them.
Daily reads — premarket universe scan, the close-of-day regime call, the EOD journal, and a post-close reconcile of the record.
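
The DST point can be shown with Python's standard `zoneinfo` module. This is a sketch under assumptions: `window_start` is a hypothetical helper, "New-York-relative minutes" is read here as minutes after New York midnight, and US market holidays are ignored. It anchors a window to New York wall-clock time and converts to Berlin only for display.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")
BERLIN = ZoneInfo("Europe/Berlin")

OPEN = 9 * 60 + 30  # 09:30 New York, the regular US cash open


def window_start(day: date, minutes_after_ny_midnight: int) -> datetime:
    """Anchor a window to New York wall-clock time on the given day."""
    hours, minutes = divmod(minutes_after_ny_midnight, 60)
    return datetime.combine(day, time(hours, minutes), tzinfo=NEW_YORK)


in_gap = window_start(date(2024, 3, 15), OPEN)    # US already on DST, EU not yet
after_gap = window_start(date(2024, 4, 2), OPEN)  # both on DST

print(in_gap.astimezone(BERLIN).strftime("%H:%M"))     # 14:30
print(after_gap.astimezone(BERLIN).strftime("%H:%M"))  # 15:30
```

A schedule hard-coded to 15:30 Berlin time would fire an hour late on the first date, which is the silent failure the New York anchoring avoids.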

What's in a dossier

Each dossier follows the same skeleton — designed so a reader can audit the reasoning, not just the conclusion:

Current thesis — one paragraph, what the bet is.
Invalidation trigger — the explicit kill criterion.
Bull case — sourced bullets, dated, no hand-waving.
Bear case — same construction, equal weight.
Setup & price structure — MAs, RSI, levels, basing pattern.
Catalyst calendar — next 30 days, dated.
What would change our mind — explicit conditions for higher / lower conviction.
Correlation notes — how the name moves with its basket.

Archetype taxonomy

Every name is tagged with an archetype. Archetypes are not labels; they're behaviour profiles that drive the dossier's stop logic and how its setup is read. Full definitions are in the glossary.

Compounder. Quality balance sheet, secular tailwind, multi-year hold candidate.
Cyclical recovery. Mean-reverting earnings, regime-sensitive.
Theme leader. Highest-conviction name within an active narrative.
Special situation. M&A, spin-off, restructuring, regulatory event.
Earnings inflection. Pre/post-print setup with explicit binary.
Retail squeeze. High-beta, short-interest-driven, hard sizing cap.
Defensive. Cash-flow durability, low-beta, regime hedge.
Macro hedge. Cross-asset proxy for thematic risk (XLE / GLD / TLT / …).

Regime classification

Each journal entry records the system's regime call. Regimes are not picks — they're a filter that gates how aggressively the system reads candidate setups. The macro view is the long-form version of the same read.

Published regime labels: RISK-ON, CHOPPY, RISK-OFF, a binary-event stagflation scare, an escalating stagflation scare, a healthy-but-unconfirmed recovery, and variants. Each is a defined rule that maps to buy-threshold, size-multiplier, max-exposure, and cash-floor settings.
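
One way to picture "a defined rule that maps to settings" is a lookup table. The sketch below is purely illustrative: the `RegimeSettings` type, the reading of each field, and every number are placeholders, since the published values are not given in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegimeSettings:
    buy_threshold: float    # assumed: minimum score before a setup is read as a buy
    size_multiplier: float  # assumed: scales the base position size
    max_exposure: float     # assumed: cap on exposure, as a fraction of the book
    cash_floor: float       # assumed: minimum cash, as a fraction of the book


# Placeholder numbers for illustration only, not orbyd's published settings.
REGIMES = {
    "RISK-ON": RegimeSettings(0.60, 1.00, 0.90, 0.10),
    "CHOPPY": RegimeSettings(0.70, 0.50, 0.60, 0.40),
    "RISK-OFF": RegimeSettings(0.85, 0.25, 0.30, 0.70),
}


def settings_for(label: str) -> RegimeSettings:
    """Look up the gate settings for a regime label; unknown labels fail loudly."""
    try:
        return REGIMES[label]
    except KeyError:
        raise ValueError(f"no rule defined for regime {label!r}") from None
```

Failing on an unknown label, rather than falling back to a default, keeps a new regime variant from silently inheriting another regime's risk settings.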

Conviction levels

Conviction is the model's calibrated confidence that the setup will play out, not a price target or return forecast. Four levels: SUPREME, HIGH, MEDIUM, LOW. Each is a probabilistic claim, published in advance, that the thesis plays out (SUPREME = 0.90, HIGH = 0.75, MEDIUM = 0.60, LOW = 0.50) and is scored directly on the track record. Each also carries an explicit invalidation trigger that strips the conviction if breached.

How outcomes are scored

Open methodology applies to the scoring too — here is the exact, reproducible method behind the track record. Every thesis ships a falsifiable invalidation trigger: the specific level or event, published in advance, that would prove the claim wrong. That is what makes a call gradeable at all — a thesis with no stated kill condition can never be scored, only quietly forgotten. When a thesis resolves, the pipeline marks it "played out" or "invalidated", dated, with a flag for whether the published trigger fired first. Strictly non-monetary — outcomes are binary: the claim held, or it was falsified.

Each conviction tier is treated as a probabilistic claim, published in advance, that the thesis plays out: SUPREME = 0.90, HIGH = 0.75, MEDIUM = 0.60, LOW = 0.50. Names the model held no conviction on are listed in the resolved ledger for transparency but are not scored — you can only be graded on a call you actually made. Against the binary outcome (played-out = 1, invalidated = 0) we compute:

Brier score — the mean squared error between the stated probability and the outcome. 0 is perfect, 0.25 is a coin flip, 1 is maximally wrong. (Reference: expert Superforecasters ≈ 0.08, the best LLMs ≈ 0.10.)
Murphy decomposition — Brier = calibration − resolution + uncertainty. Calibration (reliability) is how far each tier's observed play-out rate sits from its stated probability; resolution is how much the tiers separate from the base rate; uncertainty is the irreducible base-rate variance.
Brier skill score — skill versus a naive always-the-base-rate forecast (1 − Brier ∕ uncertainty).
Reliability by tier, archetype, and regime — the observed play-out rate broken out by conviction tier, by archetype, and by the macro regime in force when each thesis resolved. This is the falsifiable test of whether SUPREME actually beats LOW.
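
The three scores above can be reproduced with a short Python sketch. The tier probabilities come from the text; the function name `score`, the input shape, and the sample calls are assumptions made for illustration, not the pipeline's actual code or data.

```python
from collections import defaultdict

# Tier probabilities as stated in the text above.
TIER_PROB = {"SUPREME": 0.90, "HIGH": 0.75, "MEDIUM": 0.60, "LOW": 0.50}


def score(calls):
    """calls: list of (tier, outcome) pairs; outcome 1 = played out, 0 = invalidated."""
    n = len(calls)
    base_rate = sum(outcome for _, outcome in calls) / n
    brier = sum((TIER_PROB[tier] - outcome) ** 2 for tier, outcome in calls) / n

    outcomes_by_tier = defaultdict(list)
    for tier, outcome in calls:
        outcomes_by_tier[tier].append(outcome)

    calibration = resolution = 0.0
    for tier, outcomes in outcomes_by_tier.items():
        observed = sum(outcomes) / len(outcomes)  # observed play-out rate for the tier
        weight = len(outcomes) / n
        calibration += weight * (TIER_PROB[tier] - observed) ** 2
        resolution += weight * (observed - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    # Skill is undefined when every call resolved the same way (uncertainty == 0).
    skill = 1 - brier / uncertainty if uncertainty > 0 else None
    return {
        "brier": brier,
        "calibration": calibration,
        "resolution": resolution,
        "uncertainty": uncertainty,
        "skill": skill,
    }


# Hypothetical resolved calls, for illustration only.
sample = [("SUPREME", 1), ("SUPREME", 1), ("HIGH", 1),
          ("HIGH", 0), ("MEDIUM", 0), ("LOW", 1)]
report = score(sample)
# Grouping by stated probability makes the Murphy identity hold exactly.
identity = report["calibration"] - report["resolution"] + report["uncertainty"]
assert abs(report["brier"] - identity) < 1e-9
```

Because each tier maps to a single stated probability, the tiers are the natural bins for the decomposition and no extra bucketing is needed. With only a handful of resolved calls, the per-tier rates are noisy, so the components are best read alongside the call counts.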

The board is hidden until at least one thesis has resolved (no faked scorecard), and the full record is machine-readable at /track-record.json for agents and independent verification.
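
An independent check against the machine-readable record might look like the sketch below. The schema of /track-record.json is not documented in this section, so the field names (`tier`, `archetype`, `regime`, `outcome`) and the inline sample payload are hypothetical stand-ins. The helper `play_out_rate` computes the observed play-out rate for any grouping key.

```python
import json
from collections import defaultdict

# Hypothetical payload: field names and records are invented for illustration.
SAMPLE = """
[
  {"ticker": "AAA", "tier": "HIGH", "archetype": "Theme leader",
   "regime": "RISK-ON", "outcome": "played_out"},
  {"ticker": "BBB", "tier": "HIGH", "archetype": "Compounder",
   "regime": "CHOPPY", "outcome": "invalidated"},
  {"ticker": "CCC", "tier": "LOW", "archetype": "Retail squeeze",
   "regime": "RISK-ON", "outcome": "invalidated"},
  {"ticker": "DDD", "tier": "SUPREME", "archetype": "Compounder",
   "regime": "RISK-ON", "outcome": "played_out"}
]
"""


def play_out_rate(records, key):
    """Observed play-out rate grouped by `key` (for example tier, archetype, regime)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [played out, total]
    for record in records:
        counts[record[key]][0] += record["outcome"] == "played_out"
        counts[record[key]][1] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}


records = json.loads(SAMPLE)
print(play_out_rate(records, "tier"))
print(play_out_rate(records, "regime"))
```

Against the live file, the same grouping would be run on the parsed response body after adjusting the field names to whatever the published schema actually uses.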

Why score it this way at all? A headline accuracy number — "right 70% of the time" — can be claimed by any research outlet that publishes its winners and lets its losers age out of the archive. A Brier score over pre-committed, falsifiable theses cannot: every call carries a stated probability and a stated kill condition before the outcome is known, so the score is computed against the full population of resolved calls, not a survivor set. The live record is the proof that the method is applied — including the invalidated theses, which stay on the board.

Where the model is wrong

Three classes of failure are recurring and worth naming:

Stale facts. A model snapshot of a balance sheet can lag the latest 10-Q. Flagged in dossier notes when caught — not always caught.
Confident-but-wrong setup reads. A "clean higher-low" can become a failed reclaim within hours. Dossiers age fast; recency-of-write is on every page.
Theme misclassification. A name gets bucketed in a basket whose actual price driver is different; the correlation logic then over-fits.

This is why nothing here is a recommendation. The dossier is the reasoning behind every call.

Open by default

Every thesis, the names we hold and the ones we're watching, and every regime, macro and sector call is published the day it's made, dated, and scored against the trigger that would prove it wrong. The methodology is open, and so is the reasoning behind every call on the book.

And it is honest about what it is: orbyd has not proven an edge. Every mechanical harvest of theme momentum lost to the benchmark before the forward book started; the one untested claim is whether the model's curation adds anything, and that is what the public scoreboard measures. So the forward theme-bet book is an experiment run at full transparency, with the shutdown conditions written down in advance. Three surfaces make it auditable in real time: the weekly radar (the mechanical candidate scan), the scored forecasts (open bets and refusals, with reasoning), and the kill gates (the pre-registered, tighten-only shutdown criteria). See also why trust it.

Common questions

What is orbyd?

orbyd is a continuous market-intelligence layer built on frontier language models. A multi-stage pipeline reads US equity news, earnings transcripts, and price structure every trading day, then publishes per-ticker dossiers, regime-tagged journal entries, a weekly macro view, sector rotation, and a curated watchlist.

Which language models power the pipeline?

Anthropic's Claude Opus and Claude Sonnet. The theme-bet curation stage uses Opus's 1M-token context window to weigh the week's triggered momentum clusters and their candidate dossiers side-by-side in a single pass.

What is an archetype?

Archetypes are behaviour profiles assigned to each name (Compounder, Cyclical recovery, Theme leader, Special situation, Earnings inflection, Retail squeeze, Defensive, Macro hedge). They drive the dossier's stop logic and how its setup is read. See the glossary for full definitions.

What is an invalidation trigger?

The explicit kill criterion published with every dossier. If the trigger fires, the conviction is stripped and the thesis is treated as broken. It's a published commitment, not a soft warning.

How often is the site updated?

Daily for the journal and dossiers; weekly for the macro view and sector rotation. Each surface carries its own dateModified meta and an updated-at line. Subscribe via JSON Feed or RSS.

Is this investment advice?

No. We publish educational content under the BaFin and EU regulatory framework. No personalised advice is given and no orders are accepted.