One number for how well a compressor does on real data.
Squishy is a fixed set of real, freely-shareable files — prose, code, logs, genomes, tables, images, binaries — picked to cover the range of things people actually compress, from a few megabytes to several gigabytes. Run your tool over it and you get a single Squishy Score you can cite and compare. It's the 2026 successor to Silesia.
One command. Hand it your compressor as a plain stdin → stdout command —
it streams the corpus, runs your tool over every file, and prints your score:
uv run squishy-calculate --cmd "zstd -19 -c"
Works with any codec the same way: --cmd "xz -9 -c", --cmd "brotli -q 11 -c",
or your own --cmd "./mytool -c". Add --verify --decompress "zstd -dc" to
prove it's lossless; use --cmd "mytool -o {out} {in}" for tools that read/write files
instead of pipes. It caches as it goes, so re-runs are instant.
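As a rough illustration of the stdin → stdout contract described above, the following sketch pipes a byte string through an arbitrary compressor command and optionally round-trips it through a decompressor. It is not the implementation of squishy-calculate: the helper names `run_filter` and `measure` are hypothetical, and file-based `{in}`/`{out}` templates and caching are left out.

```python
import shlex
import subprocess
from typing import Optional


def run_filter(cmd: str, data: bytes) -> bytes:
    """Pipe `data` into `cmd` on stdin and return what it writes to stdout."""
    done = subprocess.run(
        shlex.split(cmd), input=data, stdout=subprocess.PIPE, check=True
    )
    return done.stdout


def measure(compress_cmd: str, data: bytes,
            decompress_cmd: Optional[str] = None) -> dict:
    """Compress `data` through a stdin -> stdout command and report sizes.

    Hypothetical helper for illustration; not part of squishy-calculate.
    """
    packed = run_filter(compress_cmd, data)
    if not packed:
        raise ValueError("compressor produced no output")
    report = {
        "original_bytes": len(data),
        "compressed_bytes": len(packed),
        "ratio": len(data) / len(packed),
    }
    if decompress_cmd is not None:
        # Round-trip check: decompressing must reproduce the input exactly.
        report["lossless"] = run_filter(decompress_cmd, packed) == data
    return report
```

With zstd installed, a call such as `measure("zstd -19 -c", data, "zstd -dc")` would mirror the `--cmd` and `--decompress` pair shown above for a single in-memory buffer.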
Each dot is one artifact, placed by properties of its bytes that are measured directly, never inferred from how a compressor performs: how random they are (entropy), how repetitive, and how far back the repeats sit (local vs long-range). Dot size is file size. The files are sparse rather than a dense grid, but representative of the whole. These are the dimensions along which compressors are known to behave differently, so spanning them gives a principled reason for each file to be here; they describe coverage and do not predict a ratio.
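The page does not say exactly how these three properties are computed, so the following is only a sketch of one plausible reading: order-0 byte entropy for randomness, the fraction of k-byte windows that have appeared before for repetitiveness, and the median distance back to the previous occurrence for repeat range. The function names and the window size `k = 8` are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Tuple


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte (0 = constant, 8 = uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def repeat_profile(data: bytes, k: int = 8) -> Tuple[float, float]:
    """Return (fraction of k-byte windows seen earlier, median repeat distance).

    The distance is measured back to the most recent earlier occurrence,
    so small values mean local repeats and large values long-range ones.
    """
    last_seen = {}
    distances = []
    windows = max(len(data) - k + 1, 0)
    for i in range(windows):
        gram = data[i:i + k]
        prev = last_seen.get(gram)
        if prev is not None:
            distances.append(i - prev)
        last_seen[gram] = i
    if not distances:
        return 0.0, 0.0
    distances.sort()
    return len(distances) / windows, float(distances[len(distances) // 2])
```

Both functions look only at the bytes themselves, which matches the stated constraint that placement never depends on a compressor's output. The dictionary of windows grows with the input, so a real measurement over multi-gigabyte files would need sampling or hashing.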
| tool | Squishy Score (×) | corpus bpb | Prose | Code & Web | Structured | Tabular / DB | Binary & Media |
|---|---|---|---|---|---|---|---|
| zpaq v7.15 | 7.85× | 2.620 | 5.32× | 7.34× | 19.26× | 8.37× | 4.74× |
| xz -9 v5.8.3 | 5.76× | 2.977 | 4.15× | 5.37× | 11.91× | 5.74× | 4.15× |
| brotli -11 v1.2.0 | 5.69× | 3.021 | 4.11× | 5.40× | 12.60× | 5.32× | 4.01× |
| zstd -22 v1.5.7 | 5.46× | 3.092 | 4.10× | 5.27× | 11.96× | 4.99× | 3.78× |
| zstd -19 v1.5.7 | 5.40× | 3.106 | 4.07× | 5.22× | 11.56× | 4.95× | 3.76× |
| bzip2 -9 v1.0.8 | 5.08× | 3.278 | 4.02× | 5.18× | 12.21× | 4.11× | 3.24× |
| gzip -9 | 3.99× | 3.495 | 2.84× | 4.00× | 8.38× | 3.53× | 3.00× |
Draft, partial: these results cover only the small members of the corpus. The large rungs are pending, so this is not yet a Squishy Score.
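The table reports both a ratio (×) and bits per byte (bpb). For a single file the two are related by bpb = 8 / ratio, but an aggregate over several files depends on how it is pooled. In the zpaq row, for instance, 8 / 2.620 ≈ 3.05, not 7.85, which suggests the headline score is aggregated differently from the corpus bpb. The aggregation rule is not stated in this section, so the sketch below uses made-up sizes and a geometric mean purely as an assumed example of how such figures can diverge.

```python
import math


def ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Compression ratio: how many times smaller the output is."""
    return original_bytes / compressed_bytes


def bits_per_byte(original_bytes: int, compressed_bytes: int) -> float:
    """Compressed bits spent per original byte (8 means no compression)."""
    return 8 * compressed_bytes / original_bytes


def pooled_bpb(sizes) -> float:
    """bpb over all files treated as one stream: total bits / total bytes."""
    total_in = sum(orig for orig, _ in sizes)
    total_out = sum(comp for _, comp in sizes)
    return bits_per_byte(total_in, total_out)


def geometric_mean_ratio(sizes) -> float:
    """Geometric mean of per-file ratios (an assumed aggregation rule)."""
    logs = [math.log(ratio(orig, comp)) for orig, comp in sizes]
    return math.exp(sum(logs) / len(logs))


# Hypothetical (original, compressed) sizes in bytes, not real results.
example_sizes = [(1000, 100), (1000, 500)]
```

For these two invented files the pooled figure is 2.4 bpb, which corresponds to a ratio of about 3.33×, while the geometric mean of the per-file ratios is about 4.47×. A per-file mean weights every file equally; a pooled figure is dominated by the files that contribute the most compressed bytes.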
dickens: Nine novels by Charles Dickens (English prose).
A TALE OF TWO CITIES
A STORY OF THE FRENCH REVOLUTION
By Charles Dickens
CONTENTS
Book the First--Recalled to Life
CHAPTER I The Period
CHAPTER II The Mail
CHAPTER III The Night Shadows
CHAPTER IV The Preparation
CHAPTER V The Wine-shop
CHAPTER VI The Shoemaker
Book the Second--the Golden Thread
CHAPTER I Five Years Later
CHAPTER II A Sight
aozora: Collected works of Natsume Sōseki (Japanese literary prose).
夏目漱石 カーライル博物館 カーライル博物館 夏目漱石 公園の片隅に通りがかりの人を相手に演説をしている者がある。向うから来た釜形の尖った帽子を被ずいて古ぼけた外套を猫背に着た爺さんがそこへ歩みを佇めて演説者を見る。演説者はぴたりと演説をやめてつかつかとこの村夫子のたたずめる前に出て来る。二人の視線がひたと行き当る。演説者は濁りたる田舎調子にて御前はカ 余は晩餐前に公園を散歩するたびに川縁の椅子に腰を卸して向側を眺める。倫敦に固有なる濃霧はことに岸辺に多い。余が桜の杖に頤を支えて真正面を見ていると、遥かに対岸の往来を這い廻る霧の影は次第に濃くなって五階立の町続きの下からぜんぜんこの揺曳くものの裏に薄れ去って来る。しまいには遠き未来の世を眼前に引き カーライルはおらぬ。演説者も死んだであろう。しかしチェルシーは以前のごとく存在している。否彼の多年住み古した家屋敷さえ今なお儼然と保存せられてある。千七百八年チェイン・ロウが出来てより以来幾多の主人を迎え幾多の主人を送ったかは知らぬがとにかく今日まで昔のままで残っている。カーライルの歿後は有志家の 文学者でチェルシーに縁故のあるものを挙げると昔しはトマス・モア、下ってスモレット、なお下ってカーライルと同時代にはリ・ハントなどがもっとも著名である。ハントの家はカーライルの直近傍で、現にカーライルがこの家に引き移った晩尋ねて来たという事がカーライルの記録に書いてある。またハントがカーライルの細君 チェイン・ローは河岸端の往来を南に折れる小路でカーライルの家はその右側の中頃に在る。番地は二十四番地だ。 毎日のように川を隔てて霧の中にチェルシーを眺めた余はある朝ついに橋を渡ってその有名なる庵りを叩いた。 庵りというと物寂びた感じがある。少なくとも瀟洒とか風流とかいう念と伴う。しかしカーライルの庵はそんな脂っこい華奢なものではない。往来から直ちに戸が敲けるほどの道傍に建てられた四階造の真四角な家である。 出張った所も引き込んだ所もないのべつに真直に立っている。まるで大製造場の煙突の根本を切ってきてこれに天井を張って窓をつけたように見える。 これが彼が北の田舎から始めて倫敦へ出て来て探しに探し抜いて漸々の事で探し宛てた家である。彼は西を探し南を探しハンプステッドの北まで探してついに恰好の家を探し出す事が出来ず、最後にチェイン・ローへ来てこの家を見てもまだすぐに取きめるほどの勇気はなかったのである。四千万の愚物と天下を罵った彼も住家には 余は今この四角な家の石階の上に立って鬼の面のノッカーをコツコツと敲く。しばらくすると内から五十恰好の肥った婆さんが出て来て御這入りと云う。最初から見物人と思っているらしい。婆さんはやがて名簿のようなものを出して御名前をと云う。余は倫敦滞留中四たびこの家に入り四たびこの名簿に余が名を記録した覚えがあ 案内者はいずれの国でも同じものと見える。先っきから婆さんは室内の絵画器具について一々説明を与える。五十年間案内者を専門に修業したものでもあるまいが非常に熟練したものである。何年何月何日にどうしたこうしたとあたかも口から出任せに喋舌っているようである。しかもその流暢な弁舌に抑揚があり節奏がある。調子
monorepo: The lib/ source tree of the LLVM Clang C++ compiler.
lib/CMakeLists.txt lib/CIR/CMakeLists.txt lib/CIR/Dialect/CMakeLists.txt lib/CIR/Dialect/IR/CMakeLists.txt lib/CIR/Dialect/IR/CIRDialect.cpp lib/CrossTU/CrossTranslationUnit.cpp lib/CrossTU/CMakeLists.txt lib/Index/IndexBody.cpp lib/Index/CMakeLists.txt lib/Index/IndexingContext.cpp lib/Index/IndexingAction.cpp lib/Index/CommentToXML.cpp … 1433 files total
//===- CIRDialect.cpp - MLIR CIR ops implementation -----------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements the CIR dialect and its operations.
//
//===----------------------------------------------------------------------===//
#include <clang/CIR/Dialect/IR/CIRDialect.h>
minjs: The minified Plotly.js charting library, one big line of JavaScript.
/**
* plotly.js v2.27.0
* Copyright 2012-2023, Plotly, Inc.
* All rights reserved.
* Licensed under the MIT license
*/
/*! For license information please see plotly.min.js.LICENSE.txt */
!function(t,e){"object"==typeof exports&&"object"==typeof module?module.exports=e():"function"==typeof define&&define.amd?define([],e):"object"==typeo
markup: Shakespeare's plays, marked up in XML.
a_and_c.xml all_well.xml as_you.xml catalog com_err.xml coriolan.xml cymbelin.xml dream.xml dsssl.dtd fot.dtd hamlet.xml hen_iv_1.xml … 48 files total
<?xml version="1.0"?>
<!DOCTYPE PLAY SYSTEM "play.dtd">
<PLAY>
<TITLE>The Tragedy of Antony and Cleopatra</TITLE>
<FM>
<P>ASCII text placed in the public domain by Moby Lexical Tools, 1992.</P>
<P>SGML markup by Jon Bosak, 1992-1994.</P>
<P>XML version by Jon Bosak, 1996-1999.</P>
<P>The XML markup in this version is Copyright © 1999 Jon Bosak. This work may freely be distributed on condition that it not be modified or altered in any way.</P>
</FM>
<PERSONAE>
json: 20,000 magnitude-4.5+ earthquakes, 2010–2024 (USGS GeoJSON).
{"type":"FeatureCollection","metadata":{"generated":1780043074000,"url":"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2010-01-01&endtime=2024-01-01&minmagnitude=4.5&orderb
{"type":"Feature","properties":{"mag":4.6,"place":"south of the Fiji Islands","time":1704042288597,"updated":1709415575040,"tz":null,"url":"https://earthquake.usgs.gov/earthquakes/eventpage/us6000m0ur
log: A NASA web server's access log from July 1995.
199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085
burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0
199.120.110.21 - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0" 200 4179
burger.letters.com - - [01/Jul/1995:00:00:12 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 304 0
burger.letters.com - - [01/Jul/1995:00:00:12 -0400] "GET /shuttle/countdown/video/livevideo.gif HTTP/1.0" 200 0
205.212.115.106 - - [01/Jul/1995:00:00:12 -0400] "GET /shuttle/countdown/countdown.html HTTP/1.0" 200 3985
d104.aa.net - - [01/Jul/1995:00:00:13 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
129.94.144.152 - - [01/Jul/1995:00:00:13 -0400] "GET / HTTP/1.0" 200 7074
unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /shuttle/countdown/count.gif HTTP/1.0" 200 40310
unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786
unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204
d104.aa.net - - [01/Jul/1995:00:00:15 -0400] "GET /shuttle/countdown/count.gif HTTP/1.0" 200 40310
genome: Sequencing reads from an E. coli genome (FASTQ).
@DRR002013.1 HWUSI-EAS679_0026:1:1:5823:1110/1
CATCGCGATCCACGCTCGCTGGCGTTGTCCGCCAGAAAGGGTATCCACGCTTTGANCTGCCAGATGAGTTATTCCCGCGGNCTGCNTTGCTTTCGTTACC
+
????????????????????????????????????????????????????????????????????????????????????????????????????
@DRR002013.2 HWUSI-EAS679_0026:1:1:6733:1111/1
ATTGCGCAACTGCCATCACCACCGTGCATGTCAGCGATCGTGGTCACGCTGGATTNGTCACCCTCGCCGCAGAAGATTACNAACTNGGCGCCCGGGGGCG
+
????????????????????????????????????????????????????????????????????????????????????????????????????
@DRR002013.3 HWUSI-EAS679_0026:1:1:7437:1115/1
TTTGTAACAGAATACCATAATGTTGGTGTGTGTGTTCTTATCTGGTTAAGAGAAAGTGAAAAAAACACAGCGAAAAGAAANCGAANATGTGACAAATATC
+
????????????????????????????????????????????????????????????????????????????????????????????????????
@DRR002013.4 HWUSI-EAS679_0026:1:1:8755:1109/1
GTGAAGATTCAGTTTCAGTCCTTCATCCTGCTCTGCACACCAGGCTTCCAGATCCNTCGCTGGACGGATTTCCGGCACCCNGTTANGACCACACTGCTCA
csv: Daily weather observations from NOAA's global climate network, 2024 (CSV).
| STATION | DATE | ELEMENT | VALUE | M_FLAG | Q_FLAG | S_FLAG | OBS_TIME |
|---|---|---|---|---|---|---|---|
| ASN00009647 | 20240101 | PRCP | 30 | | | a | |
| ASN00009678 | 20240101 | PRCP | 0 | | | a | |
| ASN00009692 | 20240101 | PRCP | 0 | | | a | |
| ASN00009710 | 20240101 | PRCP | 0 | | | a | |
| ASN00009714 | 20240101 | PRCP | 0 | | | a | |
| ASN00009738 | 20240101 | PRCP | 0 | | | a | |
| ASN00009741 | 20240101 | TMAX | 209 | | | S | |
| ASN00009741 | 20240101 | PRCP | 0 | | | S | |
parquet: U.S. airline on-time flight records (Bureau of Transportation Statistics), stored column-wise as Apache Parquet.
| Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline |
|---|---|---|---|---|---|---|
| 2024 | 1 | 1 | 8 | 1 | 2024-01-08 | 9E |
| 2024 | 1 | 1 | 9 | 2 | 2024-01-09 | 9E |
| 2024 | 1 | 1 | 10 | 3 | 2024-01-10 | 9E |
| 2024 | 1 | 1 | 11 | 4 | 2024-01-11 | 9E |
| 2024 | 1 | 1 | 12 | 5 | 2024-01-12 | 9E |
| 2024 | 1 | 1 | 15 | 1 | 2024-01-15 | 9E |
sqlite: USDA's nutrition database (SR Legacy), with foods, nutrients, and portions across 17 related tables.

food (5 columns)

| fdc_id | data_type | description | food_category_id | publication_date |
|---|---|---|---|---|
| 167512 | sr_legacy_food | Pillsbury Golden L | 18 | 2019-04-01 |
| 167513 | sr_legacy_food | Pillsbury, Cinnamo | 18 | 2019-04-01 |
| 167514 | sr_legacy_food | Kraft Foods, Shake | 18 | 2019-04-01 |
| 167515 | sr_legacy_food | George Weston Bake | 18 | 2019-04-01 |
| 167516 | sr_legacy_food | Waffles, buttermil | 18 | 2019-04-01 |
| 167517 | sr_legacy_food | Waffle, buttermilk | 18 | 2019-04-01 |
exe: A compiled Linux executable, the Hugo static-site generator.
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0200 3e00 0100 0000 405b 4400 0000 0000 ..>.....@[D.....
00000020: 4000 0000 0000 0000 e055 b903 0000 0000 @........U......
00000030: 0000 0000 4000 3800 0e00 4000 2800 2700 ....@.8...@.(.'.
00000040: 0600 0000 0400 0000 4000 0000 0000 0000 ........@.......
00000050: 4000 4000 0000 0000 4000 4000 0000 0000 @.@.....@.@.....
00000060: 1003 0000 0000 0000 1003 0000 0000 0000 ................
00000070: 0800 0000 0000 0000 0300 0000 0400 0000 ................
00000080: 5003 0000 0000 0000 5003 4000 0000 0000 P.......P.@.....
00000090: 5003 4000 0000 0000 1c00 0000 0000 0000 P.@.............
photo: NASA's “Blue Marble”, Earth photographed from Apollo 17.

movie: A clip from the open film Big Buck Bunny (H.264 video).

weights: The trained weights of a small neural network (safetensors).
| tensor | dtype | shape |
|---|---|---|
| embeddings.position_ids | I64 | 1×512 |
| embeddings.LayerNorm.bias | F32 | 384 |
| embeddings.LayerNorm.weight | F32 | 384 |
| embeddings.position_embeddings.weight | F32 | 512×384 |
| embeddings.token_type_embeddings.weight | F32 | 2×384 |
| embeddings.word_embeddings.weight | F32 | 30522×384 |
| encoder.layer.0.attention.output.LayerNorm.bias | F32 | 384 |
| encoder.layer.0.attention.output.LayerNorm.weight | F32 | 384 |
00000000: 902c 0000 0000 0000 7b22 5f5f 6d65 7461 .,......{"__meta
00000010: 6461 7461 5f5f 223a 7b22 666f 726d 6174 data__":{"format
00000020: 223a 2270 7422 7d2c 2265 6d62 6564 6469 ":"pt"},"embeddi
00000030: 6e67 732e 706f 7369 7469 6f6e 5f69 6473 ngs.position_ids
00000040: 223a 7b22 6474 7970 6522 3a22 4936 3422 ":{"dtype":"I64"
00000050: 2c22 7368 6170 6522 3a5b 312c 3531 325d ,"shape":[1,512]
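The hexdump above shows the safetensors layout: the first 8 bytes are a little-endian unsigned 64-bit header length (`902c 0000 0000 0000` is 0x2c90, or 11408 bytes), followed by a JSON header that maps tensor names to dtype, shape, and data offsets. A minimal sketch of reading that header with the standard library follows; the function name is hypothetical and the tiny blob in the usage note is constructed for illustration, not taken from the corpus file.

```python
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header at the start of a safetensors byte string."""
    if len(blob) < 8:
        raise ValueError("too short to contain a header length")
    # First 8 bytes: header size as a little-endian unsigned 64-bit integer.
    (header_len,) = struct.unpack("<Q", blob[:8])
    header_bytes = blob[8:8 + header_len]
    if len(header_bytes) != header_len:
        raise ValueError("truncated header")
    return json.loads(header_bytes.decode("utf-8"))


def tensor_table(header: dict) -> list:
    """List (name, dtype, shape) rows, skipping the __metadata__ entry."""
    return [
        (name, info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    ]
```

Feeding it a blob built as `struct.pack("<Q", len(h)) + h + payload`, where `h` is a UTF-8 JSON header, yields rows of the same form as the tensor table shown earlier in this entry.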
Large files spanning the kinds and the size axis (~0.3–3 GB). The GB rungs of compressible kinds (csv, columnar, genome, text) are scored members of the corpus; the model-weights ladder (135M → 0.5B → 1.5B params) and large media are near-incompressible throughput / behavior diagnostics, not scored. This tier is still being assembled — see the readiness plan.
weights-smollm2-135m.safetensors: SmolLM2-135M, a small (135M-parameter) language model's weights (Apache-2.0). The first rung of the weights size-ladder.
5af571cbf074e6d2…
nasa-http-jul-aug-1995.log: Scale-tier file for throughput / large-window testing (not scored).
35c38d9465a8ed27…
big-buck-bunny-1080p.mov: Scale-tier file for throughput / large-window testing (not scored).
dc2146a2b1172def…
bts-ontime-2022-2024.parquet: Scale-tier file for throughput / large-window testing (not scored).
acb6eeb73e9c4449…
weights-qwen2.5-0.5b.safetensors: Qwen2.5-0.5B, a 0.5B-parameter language model's weights (Apache-2.0). The second rung of the weights size-ladder.
fdf756fa7fcbe740…
enwik9.txt: Scale-tier file for throughput / large-window testing (not scored).
159b85351e5f76e6…
ecoli-DRR002013-full.fastq: Scale-tier file for throughput / large-window testing (not scored).
ff3de7024de4f45e…
noaa-ghcn-daily-2024-full.csv: Scale-tier file for throughput / large-window testing (not scored).
70baf8b1fe829889…
clang-releases-16-17-18-19.tar: Scale-tier file for throughput / large-window testing (not scored).
e8518848a41185c7…
llvm-project-19.1.0.src.tar: Scale-tier file for throughput / large-window testing (not scored).
bb4ae7add97894e6…
weights-qwen2.5-1.5b.safetensors: Qwen2.5-1.5B, a larger (1.5B-parameter) language model's weights (Apache-2.0). The top rung of the ladder; multi-GB, for large-window and throughput work.
dd924a11b4c220f3…
noaa-ghcn-daily-2021-2023.csv: Scale-tier file for throughput / large-window testing (not scored).
9111537b27d9ed83…