NEW: Δ-IRIS · ICML 2024

WMWM Arena: World Model Predictions vs Real Atari Gameplay Benchmark

What do World Models dream?

Crafter rollout surface for world model evaluation. The environment shell is live now; richer Crafter metrics and visualizations are landing next.

22 World Models · Crafter · 64x64 @ 10fps
Crafter Pilot

Pilot metrics, scripted expert, and rollout scaffolding

Crafter is in pilot mode. Δ-IRIS is the first planned evaluated model, while achievement extraction, spatial persistence, and seed-level rollout summaries are being staged behind this environment shell.

Δ-IRIS: first planned model
10 × 500: seed/frame rollout target
1: published Crafter evaluations

Achievement timeline

Current seed summary renders from `time_alignment` and `metric_status`.

1. Start: day
2. End: day
3. Daylight observed
4. Night observed
5. Status: ranking-adopted

Inventory HUD

Current HUD summary renders from published Crafter pilot metrics.

achievement_f1: 1.00
spatial_persistence: 1.00
metric_status: ranking-adopted
ssim_t500: 0.37

Seed distribution

Published seed rows now render from `delta_iris.games[].crafter_metrics`.

Seeds: S01, S02, S03, S04, S05, S06, S07, S08, S09, S10
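A minimal sketch of how seed rows could be pulled from the `delta_iris.games[].crafter_metrics` path and averaged. The manifest layout beyond that path is an assumption: the `seed` field, the helper names `seed_rows` and `mean_metric`, and the values in the sample manifest are hypothetical placeholders, not benchmark results.

```python
from statistics import mean
from typing import Any, Dict, List, Optional


def seed_rows(manifest: Dict[str, Any], model_key: str = "delta_iris") -> List[Dict[str, Any]]:
    """Collect the crafter_metrics dict of every game entry that has one."""
    games = manifest.get(model_key, {}).get("games", [])
    return [game["crafter_metrics"] for game in games if "crafter_metrics" in game]


def mean_metric(rows: List[Dict[str, Any]], name: str) -> Optional[float]:
    """Average a numeric metric across seed rows; None if no row reports it."""
    values = [row[name] for row in rows if isinstance(row.get(name), (int, float))]
    return mean(values) if values else None


# Hypothetical manifest with placeholder values (not published results).
example_manifest = {
    "delta_iris": {
        "games": [
            {"seed": 1, "crafter_metrics": {"ssim_t500": 0.40, "metric_status": "ranking-adopted"}},
            {"seed": 2, "crafter_metrics": {"ssim_t500": 0.30, "metric_status": "ranking-adopted"}},
            {"seed": 3},  # a seed without published metrics is skipped
        ]
    }
}

rows = seed_rows(example_manifest)
print(len(rows), mean_metric(rows, "ssim_t500"))
```

Skipping entries without `crafter_metrics` keeps unpublished seeds from being counted as zeros in the average.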

achievement_f1

Pilot gate pending

Unlock event precision/recall across the 22 Crafter achievements.
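As a rough illustration of the precision/recall idea, the sketch below scores one rollout by comparing the set of achievements unlocked in the predicted rollout against the set unlocked in real gameplay. This is an assumption about the scoring: it ignores when each unlock happens, and the benchmark's actual event-matching rule is not described here. The function name is hypothetical; the achievement names are standard Crafter ones.

```python
from typing import Dict, Set


def achievement_f1(predicted: Set[str], actual: Set[str]) -> Dict[str, float]:
    """Precision, recall, and F1 over sets of unlocked achievement names."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    # By convention here, F1 is 0.0 when there are no true positives.
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted_unlocks = {"collect_wood", "place_table", "collect_drink"}
actual_unlocks = {"collect_wood", "place_table", "eat_cow"}
print(achievement_f1(predicted_unlocks, actual_unlocks))
```

In this example two of the three predicted unlocks also occur in real gameplay, so precision and recall are both 2/3.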

spatial_persistence

Pilot gate pending

Tracks whether revisited semantic tiles remain consistent over time.
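One way to picture this metric is as the fraction of tile revisits whose semantic label matches the label seen on the previous visit. The sketch below assumes that each observation can be reduced to a world position plus a semantic tile label, which is an assumption rather than the benchmark's documented method. It also treats every label change as an inconsistency, even though tiles can legitimately change in Crafter (for example when a block is mined).

```python
from typing import Dict, Iterable, Tuple

Position = Tuple[int, int]


def spatial_persistence(observations: Iterable[Tuple[Position, str]]) -> float:
    """Fraction of revisits where a tile keeps the label it had last time."""
    last_seen: Dict[Position, str] = {}
    revisits = 0
    consistent = 0
    for position, label in observations:
        if position in last_seen:
            revisits += 1
            if last_seen[position] == label:
                consistent += 1
        last_seen[position] = label  # always remember the most recent label
    # With no revisits there is nothing to contradict, so report 1.0.
    return consistent / revisits if revisits else 1.0


trace = [((0, 0), "grass"), ((1, 0), "tree"), ((0, 0), "grass"), ((1, 0), "water")]
print(spatial_persistence(trace))
```

Here the tile at (0, 0) stays consistent on revisit and the tile at (1, 0) does not, so one of two revisits is counted as consistent.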

ssim_t500

Exploratory

Long-horizon descriptive similarity at 500 frames.
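For intuition, the sketch below computes a simplified, single-window SSIM between the real and predicted frame at step 500. It is an illustration under stated assumptions: frames are flattened grayscale pixel sequences, the statistics are global rather than computed over sliding windows as in standard SSIM implementations, and "500 frames" is read as the 500th frame (index 499). The function names are hypothetical.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window SSIM using global means, variances, and covariance."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_at_frame(real_frames, predicted_frames, t: int = 500) -> float:
    """SSIM between the t-th real and predicted frames (t is 1-based)."""
    return global_ssim(real_frames[t - 1], predicted_frames[t - 1])
```

Identical frames score 1.0, and the score falls as the predicted frame drifts from the real one.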

The current manifest ships 10 published Crafter rollout seed rows. This section can evolve without changing routes again. Focus model: Δ-IRIS.

Visual Comparison
Seed 01
SSIM 0.3748 · Δ-IRIS · NEW
[Figure: Seed 01 side-by-side panels. Ground Truth: real gameplay. Δ-IRIS Prediction: Δ-IRIS prediction.]
0.3748
148.2
0.3266
500 Frames
Model Leaderboard
| # | Model | Architecture | HNS | JEDi | Venue | Status |
|---|-------|--------------|-----|------|-------|--------|
| - | TWISTER (Burchi, Timofte) | Transformer + CPC | - | - | ICLR 2025 | Pending |
| - | DIAMOND (Alonso et al.) | Diffusion (EDM) | - | - | NeurIPS 2024 Spotlight | Pending |
| - | EDELINE (Lee, Lin, Sun, Lee) | Mamba SSM + Diffusion | - | - | NeurIPS 2025 | Pending |
| - | EfficientZero V2 (Ye et al.) | MuZero + MCTS | - | - | ICML 2024 Spotlight | Pending |
| - | EfficientZero (Ye et al.) | MuZero + Self-Supervised | - | - | NeurIPS 2021 | Pending |
| - | Cohen et al. | Modular Transformer | - | - | 2025 | Pending |
| - | HarmonyDream (Ma et al.) | RSSM + Harmony Loss | - | - | ICML 2024 | Pending |
| - | OC-STORM (Zhang et al.) | Object-Centric Transformer | - | - | 2024 | Pending |
| - | REM (Cohen et al.) | RetNet + Parallel Observation | - | - | ICML 2024 | Pending |
| - | EMERALD (Burchi, Timofte) | MaskGIT + Spatial Latent | - | - | ICML 2025 | Pending |
| - | GIT-STORM (Micheli et al.) | Transformer + MaskGIT Prior | - | - | ICLR 2025 | Pending |
| - | DreamerV3 (Hafner et al.) | RSSM + Symlog | - | - | JMLR 2024 | Pending |
| - | STORM (Zhang et al.) | Stochastic Transformer | - | - | NeurIPS 2023 | Pending |
| - | Δ-IRIS (Micheli, Alonso, Fleuret) | Transformer + Delta Tokens | - | - | ICML 2024 | Pending |
| - | DART (Agarwal, Andrews, Kahou) | Fully Discrete Tokens | - | - | ICML 2024 | Pending |
| - | IRIS (Micheli, Alonso, Fleuret) | Transformer + dVAE | - | - | ICLR 2023 | Pending |
| - | TWM (Chen et al.) | Transformer + VQ-VAE | - | - | ICLR 2023 | Pending |
| - | SimPLe (Kaiser et al.) | VAE + Video Prediction | - | - | ICLR 2020 | Pending |
| - | Drama (Wang et al.) | Mamba-2 SSM | - | - | ICLR 2025 | Pending |
| - | DreamerV2 (Hafner et al.) | RSSM + Discrete Latent | - | - | ICLR 2021 | Pending |
| - | BBF (Schwarzer et al.) | Scaled ResNet DQN | - | - | ICML 2023 | Pending |
| - | SPR (Schwarzer et al.) | Rainbow DQN + Self-Prediction | - | - | ICLR 2021 | Pending |