<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Miniblocks.io</title>
    <description>The latest articles on Forem by Miniblocks.io (@miniblocskio).</description>
    <link>https://forem.com/miniblocskio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3810750%2F37b79534-520c-41c2-b5e8-f2ab0308b314.png</url>
      <title>Forem: Miniblocks.io</title>
      <link>https://forem.com/miniblocskio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/miniblocskio"/>
    <language>en</language>
    <item>
      <title>How We Generate AI Network Digests for MegaETH at MiniBlocks.io</title>
      <dc:creator>Miniblocks.io</dc:creator>
      <pubDate>Sat, 07 Mar 2026 00:25:19 +0000</pubDate>
      <link>https://forem.com/miniblocskio/how-we-generate-ai-network-digests-for-megaeth-at-miniblocksio-ol4</link>
      <guid>https://forem.com/miniblocskio/how-we-generate-ai-network-digests-for-megaeth-at-miniblocksio-ol4</guid>
      <description>&lt;p&gt;Every morning, a new &lt;a href="https://miniblocks.io/digest" rel="noopener noreferrer"&gt;daily digest&lt;/a&gt; appears on MiniBlocks. Every Monday, a weekly report follows. These are written by an AI model — but the interesting part isn't the AI. It's everything that happens before the AI writes a single word.&lt;/p&gt;

&lt;p&gt;We believe in transparency about how our content is produced. If you're reading our digests, you should know exactly what's behind them: what data we collect, how we detect anomalies, where external context comes from, and what constraints we give the model. This post covers the full pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Automate Digests at All?
&lt;/h2&gt;

&lt;p&gt;MegaETH produces 100 blocks per second. That's 8.6 million blocks per day. We process every single one of them — every transaction, every contract interaction, every gas unit consumed across the entire network. Our &lt;a href="https://miniblocks.io/analytics.html" rel="noopener noreferrer"&gt;analytics dashboard&lt;/a&gt; shows this in real time, but raw data and charts don't tell stories. They don't connect a spike in gas usage to a new contract deploying, or notice that weekend activity has been steadily climbing for three weeks.&lt;/p&gt;

&lt;p&gt;We wanted reports that connect data points into narrative — the kind of analysis a person would write after staring at dashboards all day. But doing that manually every morning doesn't scale when the chain never sleeps and the scope is the entire network, not a curated subset.&lt;/p&gt;

&lt;p&gt;The solution: collect everything a human analyst would look at — across every contract and every wallet on MegaETH — do the statistical analysis ourselves, then hand the AI a pre-digested briefing and strict instructions on how to write about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pipeline
&lt;/h2&gt;

&lt;p&gt;Each digest goes through six stages. The AI model is involved in exactly one of them.&lt;/p&gt;

&lt;p&gt;The scope is the full network: every contract, every wallet, every transaction on MegaETH. The pipeline's job is to compress that into something a language model can work with — without losing what matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Raw Data Collection
&lt;/h3&gt;

&lt;p&gt;We continuously process every block and every transaction on MegaETH — 100 blocks per second, millions of events per day. Each transaction is parsed in real time: which contract was called, how much gas it consumed, whether it succeeded or failed, who sent it. This raw stream covers the entire network — every contract, every wallet, no filtering.&lt;/p&gt;

&lt;p&gt;This data is stored at full granularity. When it's time to generate a digest, the raw record of everything that happened on-chain is available for analysis.&lt;/p&gt;
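&lt;p&gt;As a rough sketch, each parsed transaction boils down to a compact record like the one below. The field names and parsing logic here are illustrative, not our actual schema:&lt;/p&gt;

```python
# Hypothetical sketch of the per-transaction record produced in Stage 1.
# Field names and the raw-dict layout are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class TxRecord:
    block_number: int
    timestamp: int    # unix seconds
    sender: str       # address that sent the transaction
    contract: str     # contract called; empty string for plain transfers
    gas_used: int
    success: bool

def parse_tx(raw: dict) -> TxRecord:
    """Flatten a raw node response into the compact record we store."""
    return TxRecord(
        block_number=raw["blockNumber"],
        timestamp=raw["timestamp"],
        sender=raw["from"],
        contract=raw.get("to") or "",
        gas_used=raw["gasUsed"],
        success=raw["status"] == 1,
    )
```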

&lt;h3&gt;
  
  
  Stage 2: Aggregation
&lt;/h3&gt;

&lt;p&gt;Millions of raw events don't fit in a prompt. Stage 2 compresses them into compact summary tables — small enough for a language model to consume, complete enough to preserve the signal.&lt;/p&gt;

&lt;p&gt;We compute:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hourly and daily rollups:&lt;/strong&gt; Average TPS, peak gas, total transactions — bucketed across 14-day and 28-day windows, with weekend/weekday markers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-contract summaries:&lt;/strong&gt; Transaction count, gas consumption, unique callers, and failure rates for every contract on the network — not just named DApps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DApp leaderboards:&lt;/strong&gt; Tracked applications ranked by volume over the last 24 hours (daily) or the past week with week-over-week deltas (weekly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network overview:&lt;/strong&gt; Daily totals for transactions, unique wallets, and failed transactions across the entire chain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weekly reports add cross-week comparisons and surface top unmapped contracts — addresses with significant activity that don't have a name yet. The output of this stage is a set of structured tables — the raw data compressed by orders of magnitude, but not yet interpreted.&lt;/p&gt;
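&lt;p&gt;Conceptually, the hourly rollup looks like the sketch below (the field names are illustrative; the production aggregation runs over the full-granularity store, not in-memory dicts):&lt;/p&gt;

```python
# Illustrative hourly rollup: bucket transactions by hour and compute
# the summary metrics later stages consume. Field names are examples.
from collections import defaultdict

def hourly_rollup(txs):
    """txs: iterable of dicts with 'timestamp', 'gas_used', 'success'."""
    buckets = defaultdict(lambda: {"count": 0, "gas": 0, "failed": 0})
    for tx in txs:
        hour = tx["timestamp"] // 3600 * 3600   # truncate to the hour
        b = buckets[hour]
        b["count"] += 1
        b["gas"] += tx["gas_used"]
        b["failed"] += 0 if tx["success"] else 1
    # average TPS for an hour = transactions / 3600 seconds
    return {
        hour: {**b, "avg_tps": b["count"] / 3600}
        for hour, b in sorted(buckets.items())
    }
```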

&lt;h3&gt;
  
  
  Stage 3: Trends &amp;amp; Anomaly Detection
&lt;/h3&gt;

&lt;p&gt;This is the analytical layer. We compute two kinds of output from the aggregated tables:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trend observations&lt;/strong&gt; — directional statements about network-level behavior, pre-labeled so the AI doesn't have to interpret tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Day-over-day and week-over-week throughput changes, classified as "up", "down", or "flat" with thresholds tuned to filter out noise&lt;/li&gt;
&lt;li&gt;Weekend vs weekday pattern shifts — flagged only when large enough to be notable&lt;/li&gt;
&lt;li&gt;Multi-day trend direction — is the past week rising, declining, or stable?&lt;/li&gt;
&lt;li&gt;Busiest and quietest days across the 14-day window&lt;/li&gt;
&lt;/ul&gt;
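&lt;p&gt;The directional labeling reduces to a small classifier like this one. The 3% threshold is a placeholder for illustration; the real cutoffs are tuned per metric:&lt;/p&gt;

```python
# Hypothetical "up"/"down"/"flat" classifier with a noise threshold.
# The 3% default is an example, not our production tuning.
def classify_change(current, previous, threshold=0.03):
    """Label a day-over-day change as 'up', 'down', or 'flat'."""
    if previous == 0:
        return "flat" if current == 0 else "up"
    delta = (current - previous) / previous
    if delta > threshold:
        return "up"
    if -delta > threshold:
        return "down"
    return "flat"
```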

&lt;p&gt;&lt;strong&gt;Anomaly detectors&lt;/strong&gt; — five independent modules, each looking for a different type of event. They run on a schedule — every 30 minutes for 24-hour windows, every 3 hours for 7-day windows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Peak &amp;amp; Trough Detection:&lt;/strong&gt; Finds TPS and gas peaks and troughs using percentile-based thresholds. Only fires when values significantly exceed historical norms. Drills down to the exact second and the contract responsible.&lt;/p&gt;
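&lt;p&gt;The core test is simple: does the value clear a high percentile of its own history? A minimal sketch, with the 99th percentile standing in for our tuned baseline:&lt;/p&gt;

```python
# Example percentile-threshold peak test. The 0.99 quantile is a
# stand-in for the tuned historical baseline described above.
def is_peak(value, history, pct=0.99):
    """True when value clears the pct-quantile of its own history."""
    if not history:
        return False
    ranked = sorted(history)
    idx = min(len(ranked) - 1, int(pct * len(ranked)))
    return value > ranked[idx]
```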

&lt;p&gt;&lt;strong&gt;Trend Analysis:&lt;/strong&gt; Fits a linear regression per contract over the past week of daily volumes. Only qualifies if the trend explains a meaningful share of the variance and the slope is steep enough to matter.&lt;/p&gt;
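&lt;p&gt;A minimal version of that fit: ordinary least squares over the daily volumes, qualifying only when the fit explains enough variance and the relative slope is large enough. The cutoff values here are examples, not our production thresholds:&lt;/p&gt;

```python
# Sketch of the per-contract trend fit: simple OLS over daily volumes.
# min_r2 and min_rel_slope are illustrative thresholds.
def fit_trend(volumes, min_r2=0.6, min_rel_slope=0.05):
    n = len(volumes)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(volumes) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, volumes))
    slope = sxy / sxx
    # R^2 for simple OLS equals the squared correlation coefficient
    syy = sum((y - mean_y) ** 2 for y in volumes)
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    rel_slope = abs(slope) / mean_y if mean_y > 0 else 0.0
    qualifies = r2 > min_r2 and rel_slope > min_rel_slope
    return {"slope": slope, "r2": r2, "qualifies": qualifies}
```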

&lt;p&gt;&lt;strong&gt;Growth Detection:&lt;/strong&gt; Compares the current 24-hour period against the previous one for every contract on the network. Rate-normalized to handle downtime. Flags major surges and significant declines.&lt;/p&gt;
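&lt;p&gt;Rate normalization just means comparing transactions per covered hour rather than raw totals, so a partial collection outage doesn't masquerade as a decline. A sketch with placeholder thresholds:&lt;/p&gt;

```python
# Illustrative rate-normalized growth check. The surge/decline
# ratios are placeholders, not our production values.
def growth_flag(cur_txs, cur_hours, prev_txs, prev_hours,
                surge=2.0, decline=0.5):
    if prev_hours == 0 or cur_hours == 0 or prev_txs == 0:
        return None                    # not enough data to compare
    cur_rate = cur_txs / cur_hours     # txs per hour actually observed
    prev_rate = prev_txs / prev_hours
    ratio = cur_rate / prev_rate
    if ratio > surge:
        return "surge"
    if decline > ratio:
        return "decline"
    return None
```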

&lt;p&gt;&lt;strong&gt;Failure Rate Anomalies:&lt;/strong&gt; Computes per-contract baseline failure rates, then uses z-score analysis to detect statistically significant deviations. Only fires when the spike is unlikely to be random noise.&lt;/p&gt;
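&lt;p&gt;The z-score test in miniature (the 3-sigma cutoff is an example value; ours is tuned per window):&lt;/p&gt;

```python
# Sketch of the failure-rate anomaly test: a z-score against the
# contract's own baseline. The 3-sigma cutoff is an example.
import statistics

def failure_anomaly(history, current_rate, z_cutoff=3.0):
    """history: list of past daily failure rates for one contract."""
    if len(history) > 1:
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            return current_rate > mean   # any rise off a flat baseline
        z = (current_rate - mean) / stdev
        return z > z_cutoff
    return False
```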

&lt;p&gt;&lt;strong&gt;New Activity Discovery:&lt;/strong&gt; Scans the entire network for contracts with no prior history. Classifies them as "emerging" (sustained activity), "spike" (concentrated burst), or "flash" (activity already stopped).&lt;/p&gt;
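&lt;p&gt;The three labels fall out of how activity is distributed over the hours since the contract first appeared. A hypothetical classifier (the exact rules we use differ, but the idea is the same):&lt;/p&gt;

```python
# Hypothetical classifier for newly discovered contracts, based on
# how activity spreads across the hours since first sighting.
def classify_new_contract(hourly_counts):
    """hourly_counts: tx counts per hour since first activity."""
    active_hours = sum(1 for c in hourly_counts if c > 0)
    if active_hours == 0:
        return None
    tail = hourly_counts[-3:]
    if max(tail) == 0:
        return "flash"       # burst already over
    if active_hours > len(hourly_counts) // 2:
        return "emerging"    # sustained across most observed hours
    return "spike"           # concentrated burst, still recent
```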

&lt;p&gt;Every detector is downtime-aware. If our data collection had gaps (incomplete hours with less data than expected), those periods are excluded from analysis. This prevents false positives — a quiet hour because of a collection gap isn't a real "trough".&lt;/p&gt;

&lt;p&gt;Each detected anomaly carries a severity level (medium or high), a category, and structured metadata: which contract, what time, exact values, and how far the metric deviated from its baseline. Together with the trend observations, the output of Stage 3 is a complete analytical briefing — numbered findings and directional labels, not raw data to sift through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why we do the math ourselves:&lt;/strong&gt; Language models are unreliable at statistical analysis. They'll confidently compute wrong percentages or spot trends that don't exist. By pre-computing every observation with explicit thresholds and formulas, the AI's job is strictly editorial: turn verified facts into readable prose.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 4: External Context
&lt;/h3&gt;

&lt;p&gt;On-chain data alone misses context. A 30% TPS drop on a random Tuesday means one thing. A 30% TPS drop during a market-wide selloff means something else. We pull three external sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Market sentiment:&lt;/strong&gt; 14-day history of a well-known crypto sentiment index (values 0–100 with classifications like "fear" or "greed"). The model uses this to subtly calibrate tone without ever naming the index directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeFi ecosystem data:&lt;/strong&gt; Total value locked on MegaETH, daily and weekly change percentages, 14-day history, and stablecoin supply breakdown (minted vs bridged). Only mentioned in digests if movement exceeds 5% daily or 15% weekly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem milestones:&lt;/strong&gt; Progress on MegaETH's network goals — live apps, qualified applications, and other public metrics scraped from official sources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each source is fault-tolerant. If any API fails, the digest proceeds without it. External data supplements the narrative; it never drives it.&lt;/p&gt;
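&lt;p&gt;The fault-tolerance pattern is deliberately boring: every external source is optional, and a failed fetch just leaves its slot empty. Sketched, with stand-in source names rather than our real collectors:&lt;/p&gt;

```python
# Sketch of the fault-tolerant fetch pattern: each source is a
# zero-arg callable, and any failure leaves its slot as None.
def gather_external_context(sources):
    """sources: dict mapping a source name to a fetch callable."""
    context = {}
    for name, fetch in sources.items():
        try:
            context[name] = fetch()
        except Exception:
            context[name] = None   # digest proceeds without this source
    return context
```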

&lt;h3&gt;
  
  
  Stage 5: Prompt Construction &amp;amp; Generation
&lt;/h3&gt;

&lt;p&gt;Now we assemble two prompts:&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;system prompt&lt;/strong&gt; defines the voice, constraints, and output structure. Key rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice:&lt;/strong&gt; "Senior analyst writing for informed peers" — not a newsletter trying to be clever&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict neutrality:&lt;/strong&gt; never promote, advertise, or endorse any project&lt;/li&gt;
&lt;li&gt;Include 5–8 links (daily) or 8–12 (weekly), weaving them naturally into the text&lt;/li&gt;
&lt;li&gt;Place up to 5 chart markers where they support the narrative&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prescribed structure:&lt;/strong&gt; daily digests get 4 sections (Week So Far / Today's Story / Network Health / Takeaway); weekly reports get 6&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;data prompt&lt;/strong&gt; contains all the pre-computed data from stages 1–4, organized into clearly labeled sections. Trend tables, computed observations, DApp leaderboards, detected anomalies, network overview, and external data — each tagged so the model knows exactly what it's reading.&lt;/p&gt;
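&lt;p&gt;The assembly itself is mechanical: each pre-computed artifact goes under an explicit header so the model never has to guess what a table is. A simplified sketch with illustrative section names:&lt;/p&gt;

```python
# Hypothetical data-prompt assembly: labeled sections, fixed order.
# Section names and the header style are illustrative.
def build_data_prompt(sections):
    """sections: ordered dict mapping a section label to rendered text."""
    parts = []
    for label, body in sections.items():
        parts.append("=== " + label.upper() + " ===")
        parts.append(body.strip())
        parts.append("")   # blank line between sections
    return "\n".join(parts)
```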

&lt;p&gt;The model's job is editorial: weave verified facts into a coherent narrative. It decides emphasis, ordering, what's interesting enough to highlight, and where charts should go. It doesn't do math.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 6: Post-Processing &amp;amp; Charts
&lt;/h3&gt;

&lt;p&gt;The AI returns markdown. We then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract metadata:&lt;/strong&gt; Title, summary (first paragraph stripped of formatting), word count, link count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Render to HTML:&lt;/strong&gt; Markdown is converted to semantic HTML.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inject charts:&lt;/strong&gt; The model places chart markers in its text (e.g., "place a 14-day TPS chart here"). We replace each marker with a server-rendered SVG chart — line charts for trends, bar charts for DApp rankings. These are real inline SVGs, not images or JavaScript. They render in any context: email, RSS reader, search engine crawler. Dark-themed, matching our dashboard style.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add overview charts:&lt;/strong&gt; Every daily digest automatically gets two charts prepended after the title — daily transaction count and unique wallets over 28 days. These provide instant visual context before the reader gets into the text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resolve addresses:&lt;/strong&gt; Weekly reports auto-linkify any bare contract addresses in the text, turning them into clickable links to our &lt;a href="https://miniblocks.io/contracts.html" rel="noopener noreferrer"&gt;contract explorer&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
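&lt;p&gt;The chart-injection step (item 3 above) is a straightforward marker substitution. The marker syntax and the &lt;code&gt;render_svg&lt;/code&gt; helper below are hypothetical stand-ins for our internal ones:&lt;/p&gt;

```python
# Illustrative marker replacement: swap each chart marker the model
# emitted for server-rendered SVG markup. Marker syntax is made up.
import re

CHART_MARKER = re.compile(r"\[\[CHART:([a-z0-9_-]+)\]\]")

def inject_charts(html, render_svg):
    """Replace every [[CHART:id]] marker via render_svg(id)."""
    def repl(match):
        return render_svg(match.group(1))   # returns an SVG string
    return CHART_MARKER.sub(repl, html)
```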

&lt;h2&gt;
  
  
  What the AI Sees vs. What It Decides
&lt;/h2&gt;

&lt;p&gt;This distinction matters. The AI receives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-computed trend observations with explicit directional labels&lt;/li&gt;
&lt;li&gt;Anomalies with severity ratings, exact deviation values, and structured metadata&lt;/li&gt;
&lt;li&gt;Ranked DApp leaderboards with pre-calculated metrics&lt;/li&gt;
&lt;li&gt;External context with usage rules ("mention TVL only if &amp;gt;5% change")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI decides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's the most interesting story in today's data?&lt;/li&gt;
&lt;li&gt;Which anomalies deserve prominent coverage vs. a brief mention?&lt;/li&gt;
&lt;li&gt;How to connect on-chain trends with external context&lt;/li&gt;
&lt;li&gt;Where charts add value to the narrative&lt;/li&gt;
&lt;li&gt;The human-readable phrasing of quantified observations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This split is deliberate. Statistical facts shouldn't depend on a language model's arithmetic. Editorial judgment — what's interesting, what connects, what matters — is where the model adds value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cadence
&lt;/h2&gt;

&lt;p&gt;Daily digests are generated every morning at 00:30 UTC. Weekly reports follow at 01:00 UTC on Mondays. Each daily digest runs 500–700 words with up to 7 charts; weekly reports go deeper at 1,200–1,500 words. The AI generation itself costs a few cents per digest — the real investment is in the data pipeline and statistical analysis that happens before the model ever sees a prompt.&lt;/p&gt;

&lt;p&gt;We store the full prompts, token counts, generation time, and cost for every digest — partly for accounting, partly so we can audit the model's inputs if a digest ever says something questionable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Got Wrong (and Fixed)
&lt;/h2&gt;

&lt;p&gt;The first version of this system was simpler — and worse. Lessons learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Letting the AI do math.&lt;/strong&gt; Early digests had the model computing percentage changes from raw data tables. It got them wrong regularly. Not by a lot, but enough to erode trust. Now every number the model quotes has been pre-computed on our servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing downtime awareness.&lt;/strong&gt; Our anomaly detectors initially didn't account for data collection gaps. A 2-hour outage in our pipeline would show up as a "dramatic network trough" in the next digest. Now every detector checks for data completeness before flagging events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over-relying on external data.&lt;/strong&gt; Early prompts weighted market sentiment too heavily. Digests would lead with "amid broader market fear" when the on-chain data was actually interesting on its own. We now have explicit thresholds: external data supplements the narrative only when changes are significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Charts as images.&lt;/strong&gt; The first iteration rendered charts as PNG images. They looked blurry on high-DPI screens, didn't render in RSS readers, and added load time. Moving to inline SVG solved all three problems and made charts indexable by search engines.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Readers
&lt;/h2&gt;

&lt;p&gt;When you read our &lt;a href="https://miniblocks.io/digest" rel="noopener noreferrer"&gt;daily or weekly digests&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every percentage change and trend direction was computed with explicit thresholds, not hallucinated&lt;/li&gt;
&lt;li&gt;Every anomaly passed a statistical significance test before being included in the prompt&lt;/li&gt;
&lt;li&gt;The charts are real data rendered server-side, not AI-generated images&lt;/li&gt;
&lt;li&gt;External context (market sentiment, TVL) is used to calibrate tone, not to speculate&lt;/li&gt;
&lt;li&gt;The model is instructed to be neutral — no endorsements, no promotion, no unverified claims&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI is a writer. The data, the analysis, and the statistical rigor come from our pipeline. We think that's the right way to use language models for reporting: trust them with prose, not with math.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read the Digests
&lt;/h2&gt;

&lt;p&gt;Browse the full archive at &lt;a href="https://miniblocks.io/digest" rel="noopener noreferrer"&gt;miniblocks.io/digest&lt;/a&gt;. Daily digests cover the last 24 hours with a two-week trend backdrop. Weekly reports go deeper on DApp-level rankings and week-over-week changes across every active contract on MegaETH. Both are produced automatically by the pipeline described above: dailies every morning at 00:30 UTC, weeklies on Mondays at 01:00 UTC.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>blockchain</category>
      <category>dataengineering</category>
      <category>megaeth</category>
    </item>
  </channel>
</rss>
