Benchmark programs

Benchmark AdMultaDB against real ndjson workloads.

Start with the live ndjson benchmark, point it at a real provider folder, inspect query-ready assets, apply Native acceleration, and measure the five required queries through the AdMultaDB runtime.


Benchmark Catalog

One benchmark center, multiple proof paths.

NDJSON provider benchmark

Use a real Bluesky ndjson provider, prepare query-ready assets, and run Q1 through Q5 on the live AdMultaDB runtime with visible timings and result tables.

Live today

Expanding benchmark library

This benchmark center is designed to grow alongside additional transform, query, and stream-processing proofs without changing the navigation model.

More benchmark tracks can sit beside this one.

Provider Setup

Start with the ndjson provider benchmark.

Point the benchmark at the Bluesky ndjson corpus, scan the selected 1, 10, 100, or 1000 file tier, prepare the query paths, and run the full Q1-Q5 workloads. The interactive preview stays bounded for a quick look, while benchmark execution always runs the full selected tier.

Provider path

Folder containing the benchmark ndjson/json corpus. If you need a single file instead, paste its path directly.

Index root

Choose the AdMultaDB folder that should hold or reuse the index_1, index_10, index_100, and index_1000 workspaces.

Dataset size

Matches the benchmark size tiers: 1, 10, 100, 1000.
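As a rough illustration of how a size tier might map to its workspace, the sketch below joins an index root with the index_1, index_10, index_100, and index_1000 folder names mentioned above. The helper name resolve_index_dir and the example root path are hypothetical; only the tier values and the index_N naming come from this page.

```python
from pathlib import Path

# Size tiers named on this page.
TIERS = (1, 10, 100, 1000)


def resolve_index_dir(index_root: str, tier: int) -> Path:
    """Return the index_<tier> workspace path under index_root (hypothetical helper)."""
    if tier not in TIERS:
        raise ValueError(f"unsupported tier {tier}; expected one of {TIERS}")
    return Path(index_root) / f"index_{tier}"


# Example root path is an assumption for illustration only.
print(resolve_index_dir("/data/admulta", 100))
```

Rejecting unknown tiers early keeps a mistyped size from silently creating or reusing the wrong workspace.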

Preview row cap

Upper bound for the interactive quick-look preview only. Full benchmark runs ignore this cap.
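The difference between a capped preview and an uncapped run can be sketched as follows. This is a minimal illustration, not the AdMultaDB implementation; the function names preview_lines and count_all_lines are hypothetical.

```python
import itertools
from typing import Iterable, List


def preview_lines(lines: Iterable[str], row_cap: int) -> List[str]:
    """Return at most row_cap non-empty lines for a quick look."""
    non_empty = (line for line in lines if line.strip())
    return list(itertools.islice(non_empty, row_cap))


def count_all_lines(lines: Iterable[str]) -> int:
    """Count every non-empty line; the preview cap does not apply here."""
    return sum(1 for line in lines if line.strip())


sample = ['{"n": 1}', "", '{"n": 2}', '{"n": 3}']
print(preview_lines(sample, row_cap=2))
print(count_all_lines(sample))
```

Because islice stops pulling from the source once the cap is hit, the preview never reads more of the input than it needs.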

Benchmark ready

Choose a benchmark path, scan the dataset, then prepare query paths before running Q1-Q5.

Resolved index directory: not configured
Files directory: not configured
Detected benchmark artifacts: none

Provider Scan

What the benchmark contract sees.

0 Files selected

-- Total bytes

-- Scan time

0 Row files discovered

0 Indexed preview rows

0/5 Queries run

No benchmark files selected yet. Scan the dataset path to populate the working set.

ndjson Preview

Sample lines from the selected dataset

See the real benchmark payload shape: did, kind, nested commit.operation, commit.collection, and time_us.
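To make the payload shape concrete, the sketch below parses one ndjson line and pulls out the five fields named above. The sample line is invented for illustration (the did value is a placeholder), and extract_fields is a hypothetical helper, not part of AdMultaDB.

```python
import json

# Illustrative line only; real corpus lines carry more fields.
SAMPLE_LINE = (
    '{"did": "did:plc:example", "time_us": 1732206349000167, '
    '"kind": "commit", "commit": {"operation": "create", '
    '"collection": "app.bsky.feed.post"}}'
)


def extract_fields(line: str) -> dict:
    """Parse one ndjson line and flatten the fields the benchmark queries use."""
    record = json.loads(line)
    commit = record.get("commit") or {}  # non-commit events have no commit object
    return {
        "did": record.get("did"),
        "kind": record.get("kind"),
        "operation": commit.get("operation"),
        "collection": commit.get("collection"),
        "time_us": record.get("time_us"),
    }


print(extract_fields(SAMPLE_LINE))
```

Events without a commit object come back with operation and collection set to None rather than raising.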

Scan a dataset to load ndjson preview lines.

Query Readiness

Query acceleration state

0 Rows indexed

0 Collections tracked

0 Q3 hour buckets

0 Post users indexed

0 Skipped lines

No Preview cap reached

Q1 aggregate index: 0 collection keys
Q2 unique-user index: 0 user memberships
Q4 earliest post index: 0 users
Q5 activity span index: 0 users

Required Queries

Run the Q1-Q5 benchmark suite.

Q1

Collection counts

Count all records by collection.

Requirement: collection_count
Strategy: index_fast

SELECT data.commit.collection AS event, count() AS count FROM bluesky GROUP BY event ORDER BY count DESC

kindIndex.am16

Run this query to see timing and output for the current dataset.
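For cross-checking a small sample outside the engine, Q1's grouping can be approximated in plain Python. This is an illustrative sketch over already-parsed records, not the index_fast path; the function name is hypothetical, and records without a commit object are grouped under None here.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def q1_collection_counts(records: Iterable[dict]) -> List[Tuple[Optional[str], int]]:
    """Count all records by commit.collection, highest count first."""
    counts = Counter((r.get("commit") or {}).get("collection") for r in records)
    return counts.most_common()


records = [
    {"kind": "commit", "commit": {"collection": "app.bsky.feed.post"}},
    {"kind": "commit", "commit": {"collection": "app.bsky.feed.post"}},
    {"kind": "commit", "commit": {"collection": "app.bsky.feed.like"}},
    {"kind": "identity"},
]
print(q1_collection_counts(records))
```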

Q2

Collection counts with unique users

Count commit/create events by collection and compute unique users.

Requirement: collection_count, collection_unique_users
Strategy: index_fast

SELECT data.commit.collection AS event, count() AS count, uniqExact(data.did) AS users FROM bluesky WHERE data.kind = 'commit' AND data.commit.operation = 'create' GROUP BY event ORDER BY count DESC

kindIndex.am16, kindIndex.am14, row_files.am13

Run this query to see timing and output for the current dataset.

Q3

Hourly collection counts

Count post, repost, and like events by hour of day.

Requirement: hourly_collection_count
Strategy: index_fast

SELECT data.commit.collection AS event, toHour(fromUnixTimestamp64Micro(data.time_us)) AS hour_of_day, count() AS count FROM bluesky WHERE data.kind = 'commit' AND data.commit.operation = 'create' AND data.commit.collection in ['app.bsky.feed.post', 'app.bsky.feed.repost', 'app.bsky.feed.like'] GROUP BY event, hour_of_day ORDER BY hour_of_day, event

row_files.am13, emit_q3_hour_collection.tsv, sideload_step2_q3.am22

Run this query to see timing and output for the current dataset.
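The hour-of-day bucketing in Q3 can be sketched in Python as shown below. The sketch assumes hours are taken in UTC, which may not match the engine's session time zone, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Q3_COLLECTIONS = {"app.bsky.feed.post", "app.bsky.feed.repost", "app.bsky.feed.like"}


def q3_hourly_counts(records: Iterable[dict]) -> List[Tuple[str, int, int]]:
    """Return (collection, hour_of_day, count) sorted by hour, then collection."""
    counts = Counter()
    for r in records:
        commit = r.get("commit") or {}
        if (r.get("kind") == "commit"
                and commit.get("operation") == "create"
                and commit.get("collection") in Q3_COLLECTIONS):
            # time_us is microseconds since the Unix epoch; UTC is assumed here.
            hour = (r["time_us"] // 1_000_000 // 3600) % 24
            counts[(commit["collection"], hour)] += 1
    rows = [(c, h, n) for (c, h), n in counts.items()]
    rows.sort(key=lambda row: (row[1], row[0]))  # ORDER BY hour_of_day, event
    return rows


HOUR_US = 3600 * 1_000_000
demo = [
    {"kind": "commit", "time_us": 3 * HOUR_US,
     "commit": {"operation": "create", "collection": "app.bsky.feed.post"}},
    {"kind": "commit", "time_us": 3 * HOUR_US + 5,
     "commit": {"operation": "create", "collection": "app.bsky.feed.like"}},
    {"kind": "commit", "time_us": 27 * HOUR_US,
     "commit": {"operation": "create", "collection": "app.bsky.feed.post"}},
]
print(q3_hourly_counts(demo))
```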

Q4

First post top 3

Find the earliest three users by first post timestamp.

Requirement: user_first_post_topk
Strategy: index_fast

SELECT data.did::String AS user_id, min(fromUnixTimestamp64Micro(data.time_us)) AS first_post_ts FROM bluesky WHERE data.kind = 'commit' AND data.commit.operation = 'create' AND data.commit.collection = 'app.bsky.feed.post' GROUP BY user_id ORDER BY first_post_ts ASC LIMIT 3

row_files.am13, emit_q4_first_post_top3.tsv, sideload_step2_post.am23, sideload_step2_q4_rank.am24

Run this query to see timing and output for the current dataset.
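Q4 reduces to a per-user minimum followed by a top-3 selection, which the sketch below reproduces for small samples. It returns raw time_us values rather than formatted timestamps, and the function name is hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def q4_first_post_top3(records: Iterable[dict]) -> List[Tuple[str, int]]:
    """Return the three users with the earliest first post as (did, time_us)."""
    first_post: Dict[str, int] = {}
    for r in records:
        commit = r.get("commit") or {}
        if (r.get("kind") == "commit"
                and commit.get("operation") == "create"
                and commit.get("collection") == "app.bsky.feed.post"):
            did, ts = r["did"], r["time_us"]
            if did not in first_post or ts < first_post[did]:
                first_post[did] = ts
    # ORDER BY first_post_ts ASC LIMIT 3
    return heapq.nsmallest(3, first_post.items(), key=lambda kv: kv[1])


def _post(did: str, time_us: int) -> dict:
    """Build a minimal post-create event for the demo below."""
    return {"did": did, "kind": "commit", "time_us": time_us,
            "commit": {"operation": "create", "collection": "app.bsky.feed.post"}}


demo = [_post("a", 50), _post("b", 10), _post("a", 5), _post("c", 30), _post("d", 40)]
print(q4_first_post_top3(demo))
```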

Q5

Activity span top 3

Find the three users with the largest post activity span.

Requirement: user_activity_span_topk
Strategy: index_fast

SELECT data.did::String AS user_id, date_diff('milliseconds', min(fromUnixTimestamp64Micro(data.time_us)), max(fromUnixTimestamp64Micro(data.time_us))) AS activity_span FROM bluesky WHERE data.kind = 'commit' AND data.commit.operation = 'create' AND data.commit.collection = 'app.bsky.feed.post' GROUP BY user_id ORDER BY activity_span DESC LIMIT 3

row_files.am13, emit_q5_activity_span_top3.tsv, sideload_step2_post.am23, sideload_step2_q5_rank.am25

Run this query to see timing and output for the current dataset.
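Q5 tracks each user's earliest and latest post and ranks users by the gap between them. The sketch below approximates the span as whole milliseconds between the two time_us values, which may differ slightly from the engine's date_diff rounding; the function name is hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def q5_activity_span_top3(records: Iterable[dict]) -> List[Tuple[str, int]]:
    """Return the three users with the largest post activity span in milliseconds."""
    bounds: Dict[str, Tuple[int, int]] = {}  # did -> (min time_us, max time_us)
    for r in records:
        commit = r.get("commit") or {}
        if (r.get("kind") == "commit"
                and commit.get("operation") == "create"
                and commit.get("collection") == "app.bsky.feed.post"):
            did, ts = r["did"], r["time_us"]
            lo, hi = bounds.get(did, (ts, ts))
            bounds[did] = (min(lo, ts), max(hi, ts))
    spans = [(did, (hi - lo) // 1000) for did, (lo, hi) in bounds.items()]
    # ORDER BY activity_span DESC LIMIT 3
    return heapq.nlargest(3, spans, key=lambda row: row[1])


def _post(did: str, time_us: int) -> dict:
    """Build a minimal post-create event for the demo below."""
    return {"did": did, "kind": "commit", "time_us": time_us,
            "commit": {"operation": "create", "collection": "app.bsky.feed.post"}}


demo = [_post("a", 1_000_000), _post("a", 5_000_000),
        _post("b", 0), _post("b", 10_000_000),
        _post("c", 7),
        _post("d", 2_000_000), _post("d", 3_000_000)]
print(q5_activity_span_top3(demo))
```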

Market Comparison

Compare AdMultaDB against published ndjson engine results.

Use the published comparison table for this ndjson workload to benchmark AdMultaDB alongside commercial and analytical databases. The live AdMultaDB timings below come from this machine; the cross-engine table is loaded from the published results snapshot.

Source: Embedded published comparison snapshot
Dataset: Bluesky ndjson
Live run: 1 file tier through the actual AdMultaDB query path.

Run Q1-Q5 to place AdMultaDB into the official leaderboard.

Q1 --

Q2 --

Q3 --

Q4 --

Q5 --

Suite total --

Metric	ClickHouse	StarRocks	Apache Doris	Elasticsearch	SingleStore	MongoDB	DuckDB	PostgreSQL
Data size	92.72 GiB (x1.00)	179.73 GiB (x1.94)	199.88 GiB (x2.16)	359.58 GiB (x3.88)	218.75 GiB (x2.36)	164.60 GiB (x1.78)	440.14 GiB (x4.75)	615.01 GiB (x6.63)
Data quality	999999258	997999662	999999994	999999101	811999990	893632990	974400000	804000000
Q1	0.225s (x1.00)	0.510s (x2.21)	1.630s (x6.98)	3.884s (x16.57)	43.980s (x187.19)	1054.370s (x4486.72)	3717.611s (x15819.66)	3884.170s (x16528.43)
Q2	3.236s (x1.00)	7.180s (x2.22)	6.610s (x2.04)	28.548s (x8.80)	196.390s (x60.51)	20461.800s (x6303.70)	3721.045s (x1146.35)	4277.800s (x1317.87)
Q3	2.136s (x1.47)	1.450s (x1.00)	1.780s (x1.23)	23.570s (x16.15)	111.105s (x76.11)	1216.240s (x833.05)	3717.631s (x2546.33)	4253.340s (x2913.25)
Q4	0.479s (x1.00)	1.010s (x2.09)	0.930s (x1.92)	8.080s (x16.54)	32.407s (x66.29)	168.797s (x345.21)	3719.273s (x7605.90)	4907.090s (x10034.97)
Q5	0.514s (x1.00)	1.100s (x2.12)	1.030s (x1.98)	8.994s (x17.18)	40.644s (x77.58)	173.268s (x330.68)	3722.804s (x7104.61)	4913.500s (x9376.93)
Suite total	6.59s (x1.00)	11.25s (x1.71)	11.98s (x1.82)	73.076s (x11.09)	424.526s (x64.42)	23074.475s (x3501.44)	18598.364s (x2822.21)	22235.9s (x3374.19)
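The per-query multipliers in the table are consistent with a ratio that adds a 10 ms constant to both timings before dividing. This is inferred from the published numbers, not stated on this page, and the Data size and Suite total rows look like plain ratios instead. The sketch below shows that inferred calculation, which could be used to place a local AdMultaDB timing beside the published ones; relative_factor is a hypothetical helper.

```python
def relative_factor(seconds: float, best_seconds: float, offset: float = 0.01) -> float:
    """Return how many times slower a timing is than the best, with a small offset.

    The 0.01 s offset is an assumption inferred from the published table values.
    """
    return round((seconds + offset) / (best_seconds + offset), 2)


# Q1 timings from the table: ClickHouse 0.225 s is the best result.
print(relative_factor(0.510, 0.225))
print(relative_factor(1.630, 0.225))
```

The offset keeps very small timings from producing exaggerated ratios when the best result is close to zero.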

Latest Output

Query output

Prepare query paths and run a query to inspect benchmark output rows.

Why This Matters

1. Use the real dataset. Validate performance against the same ndjson directory structure the benchmark harness expects.

2. Show readiness before rollout. Confirm the matching index_1, index_10, index_100, or index_1000 assets so teams know the dataset is ready for repeatable testing.

3. Preview quickly, run fully. Keep the quick-look preview bounded for responsiveness, then switch to the full selected dataset tier when it is time to produce benchmark numbers.

4. Turn timing into proof. Run Q1 through Q5 with named steps, result tables, and elapsed time so performance becomes something you can show, not just claim.
