What Samsung’s TRM does and why it matters

In a recent development out of Samsung’s AI lab in Montreal, researchers have introduced a new “Tiny Recursive Model” (TRM) that challenges the prevailing notion that more parameters means more intelligence.

Key ideas & claims

  • Tiny footprint: TRM has only ~7 million parameters—orders of magnitude smaller than typical large language or reasoning models.
  • Recursive refinement: Instead of doing a single large forward pass, TRM repeatedly “asks itself” whether its current answer can be improved. It refines latent states and outputs over multiple steps (see the sketch after this list).
  • Deep supervision + adaptive halting: The training regime provides feedback at intermediate steps (not just the final one), and the model learns to decide when to stop refining.
  • Stronger generalization on structured reasoning tasks: On benchmarks involving logic puzzles, mazes, and abstract reasoning (such as ARC‑AGI, Sudoku-Extreme, Maze-Hard), TRM matches or surpasses much larger models. For example, it achieves ~87% accuracy on Sudoku-Extreme and ~85% on Maze-Hard.
  • Simplicity over complexity: TRM is a simplification of a prior model called HRM (Hierarchical Reasoning Model), removing the extra networks, hierarchical structure, and more complex mathematics, while still improving performance.
  • Practical significance: Because of its small size, TRM can run on modest hardware and with lower energy costs. This opens doors for small labs, edge devices, or privacy-sensitive environments where massive models are impractical.
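
To make the recursive-refinement and deep-supervision ideas concrete, here is a minimal, hypothetical sketch in PyTorch. It is not the authors’ architecture: names such as TinyRefiner, dim, and max_steps are illustrative placeholders, and the real TRM operates on tokenized puzzle grids with a carefully tuned training loop.

import torch
import torch.nn as nn

class TinyRefiner(nn.Module):
    # Illustrative stand-in for the tiny core network that is reused at every step.
    def __init__(self, dim=128):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Linear(dim * 2, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )
        self.decode = nn.Linear(dim, dim)   # latent state -> candidate answer
        self.halt = nn.Linear(dim, 1)       # adaptive-halting score per step

    def forward(self, x, max_steps=6):
        z = torch.zeros_like(x)             # latent "scratchpad" that gets refined
        outputs, halt_logits = [], []
        for _ in range(max_steps):
            z = self.refine(torch.cat([x, z], dim=-1))   # "can the answer improve?"
            outputs.append(self.decode(z))               # intermediate answer
            halt_logits.append(self.halt(z))             # learned stop signal
        return outputs, halt_logits

# Deep supervision means the loss is applied to every intermediate answer,
# not just the final one, e.g. loss = sum(criterion(o, target) for o in outputs)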

However, some caveats to keep in mind:

  • TRM is currently tested on narrow, structured reasoning tasks (puzzles, mazes, symbolic grids) rather than general language or multimodal tasks.
  • Its performance on open-ended text, common-sense reasoning, or large-scale generative problems is not established.
  • The approach is relatively new and will need replication and extension to validate robustness across domains.

Overall, TRM offers intriguing evidence that architectural cleverness (recursion + refinement) might substitute for brute-force scale—at least in specialized reasoning domains.


How to run TRM yourself: step‑by‑step

If you’d like to experiment with TRM, the authors have made the code open source (GitHub) and provided instructions. Below is a distilled, annotated version of how to get it running and train it.

Prerequisites / environment

  • Python 3.10 (or similar)
  • GPU support (CUDA ~12.6)
  • pip, torch, torchvision, torchaudio
  • Optional: Weights & Biases for experiment tracking

1. Clone the repository

git clone https://github.com/SamsungSAILMontreal/TinyRecursiveModels.git
cd TinyRecursiveModels

2. Install dependencies

pip install --upgrade pip wheel setuptools
pip install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
pip install -r requirements.txt
pip install --no-cache-dir --no-build-isolation adam-atan2
wandb login YOUR_API_KEY  # optional
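
Before moving on, it can help to confirm that the freshly installed PyTorch build actually sees your GPU. This is a generic sanity check, not part of the TRM repo:

import sys
import torch

print("Python:", sys.version.split()[0])            # expect 3.10.x or similar
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA build:", torch.version.cuda)        # should line up with the cu126 wheels above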

3. Prepare datasets

ARC‑AGI-1 / ARC‑AGI-2:

python -m dataset.build_arc_dataset \
  --input-file-prefix kaggle/combined/arc-agi \
  --output-dir data/arc1concept-aug-1000 \
  --subsets training evaluation concept \
  --test-set-name evaluation

Sudoku‑Extreme:

python dataset/build_sudoku_dataset.py \
  --output-dir data/sudoku-extreme-1k-aug-1000 \
  --subsample-size 1000 --num-aug 1000

Maze‑Hard:

python dataset/build_maze_dataset.py

4. Run experiments

ARC‑AGI-1 (4 GPUs):

torchrun --nproc-per-node 4 pretrain.py arch=trm \
  data_paths="[data/arc1concept-aug-1000]" \
  arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=4 \
  +run_name=pretrain_att_arc1concept_4 ema=True

Sudoku‑Extreme (single GPU):

python pretrain.py arch=trm \
  data_paths="[data/sudoku-extreme-1k-aug-1000]" \
  epochs=50000 eval_interval=5000 lr=1e-4 \
  arch.mlp_t=True arch.pos_encodings=none \
  arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=6 \
  +run_name=pretrain_mlp_t_sudoku ema=True

Maze‑Hard (4 GPUs):

torchrun --nproc-per-node 4 pretrain.py arch=trm \
  data_paths="[data/maze-30x30-hard-1k]" \
  epochs=50000 eval_interval=5000 lr=1e-4 \
  arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=4 \
  +run_name=pretrain_att_maze30x30 ema=True

Running TRM on a Desktop

You don’t need a server rack or enterprise GPUs to explore Samsung’s Tiny Recursive Model.
The model has only 7 million parameters, which makes it ideal for experimentation even on a single-GPU desktop.

What counts as “average-good hardware”

You will need at least:

  • At least 16 GB system RAM
  • An SSD for dataset caching
  • A GPU at least as powerful as an NVIDIA RTX 3070

Even without multiple GPUs, TRM’s recursive approach means you can train smaller configurations effectively.

Scaling TRM down for your PC

Setting          Original (Lab)          Home-friendly
GPUs             4× H100                 1× RTX 3060–4090
Batch size       64–128                  8–16
arch.H_cycles    3                       2
arch.L_cycles    4–6                     2–3
Epochs           50,000                  5,000–10,000 (for tests)
Dataset size     1,000–10,000 samples    200–500 samples
Precision        FP32                    torch.autocast() (mixed FP16/FP32)

Example command for single GPU:

python pretrain.py arch=trm \
  data_paths="[data/sudoku-extreme-small]" \
  epochs=10000 eval_interval=1000 \
  batch_size=8 lr=2e-4 \
  arch.L_layers=2 arch.H_cycles=2 arch.L_cycles=3 \
  +run_name=small_sudoku_test ema=False

This smaller configuration trains in 2–4 hours on an RTX 4080 and 6–10 hours on an RTX 3060—perfect for exploration.

How to build smaller datasets

You can create mini versions of the reasoning puzzles:

python dataset/build_sudoku_dataset.py \
  --output-dir data/sudoku-small \
  --subsample-size 200 --num-aug 200

or for mazes:

python dataset/build_maze_dataset.py --subsample-size 200

This keeps disk usage and training time low while preserving the recursive structure.

Tips for smooth runs

  • Use mixed precision (torch.autocast) to roughly halve VRAM use (a sketch follows this list).
  • Disable Weights & Biases logging if you just want local tests (run wandb disabled).
  • Set eval_interval high (e.g., 2000) to reduce validation overhead.
  • Use SSDs — recursive models reread data often.
  • Try Sudoku or Maze datasets first before ARC-AGI.
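
For the mixed-precision tip above, here is a hedged sketch of what a single training step could look like with torch.autocast. It is generic PyTorch, not the repo’s pretrain.py; model, optimizer, criterion, inputs, and targets are placeholders for whatever your run uses:

import torch

scaler = torch.cuda.amp.GradScaler()  # scales the loss so FP16 gradients don't underflow

def train_step(model, optimizer, criterion, inputs, targets):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        outputs = model(inputs)              # forward pass runs in FP16 where it is safe
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()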

Caveats and tips

  • Hardware: Expect high GPU utilization; scale down cycles for smaller hardware.
  • Recursion depth: More cycles = higher reasoning quality but slower training.
  • Use wandb: Great for tracking progress and comparing runs.
  • Domain focus: Best for puzzles, reasoning, and symbolic logic—not general text.
  • Experimentation encouraged: The repo is meant for open research and tinkering.

Reference:
“Less is More: Recursive Reasoning with Tiny Networks” – Alexia Jolicoeur‑Martineau, Samsung SAIL Montreal (2025)