Summary: What Samsung’s TRM does and why it matters
In a recent development out of Samsung’s AI lab in Montreal, researchers have introduced a new “Tiny Recursive Model” (TRM) that challenges the prevailing notion that more parameters = more intelligence.
Key ideas & claims
- Tiny footprint: TRM has only ~7 million parameters—orders of magnitude smaller than typical large language or reasoning models.
- Recursive refinement: Instead of doing a single large forward pass, TRM repeatedly “asks itself” whether its current answer can be improved. It refines latent states and outputs over multiple steps.
- Deep supervision + adaptive halting: The training regime provides feedback at intermediate steps (not just the final one), and the model learns to decide when to stop refining (a conceptual sketch follows this list).
- Stronger generalization on structured reasoning tasks: On benchmarks involving logic puzzles, mazes, and abstract reasoning (such as ARC‑AGI, Sudoku-Extreme, Maze-Hard), TRM matches or surpasses much larger models. For example, it achieves ~87% accuracy on Sudoku-Extreme and ~85% on Maze-Hard.
- Simplicity over complexity: TRM is a simplification of a prior model called HRM (Hierarchical Reasoning Model), removing the extra networks, hierarchical structure, and more complex mathematics, while still improving performance.
- Practical significance: Because of its small size, TRM can run on modest hardware and with lower energy costs. This opens doors for small labs, edge devices, or privacy-sensitive environments where massive models are impractical.
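To make the recursion-plus-halting idea concrete, here is a minimal conceptual sketch in PyTorch. It is not the authors’ implementation: the names (`TinyRecursiveSketch`, `halt_head`), the GRU core, and the shapes are all hypothetical stand-ins, and the real TRM uses its own architecture and training recipe.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Conceptual sketch: recursive answer refinement with a learned halt signal."""
    def __init__(self, dim=64, max_steps=6):
        super().__init__()
        self.core = nn.GRUCell(dim, dim)       # stands in for TRM's tiny network
        self.answer_head = nn.Linear(dim, dim) # maps latent state to a candidate answer
        self.halt_head = nn.Linear(dim, 1)     # predicts "am I done refining?"
        self.max_steps = max_steps

    def forward(self, x):
        z = torch.zeros_like(x)                # latent reasoning state
        y = torch.zeros_like(x)                # current answer
        outputs, halts = [], []
        for _ in range(self.max_steps):
            z = self.core(x + y, z)            # re-read the input plus the current answer
            y = self.answer_head(z)            # propose a refined answer
            outputs.append(y)
            halts.append(torch.sigmoid(self.halt_head(z)))
        return outputs, halts                  # deep supervision: a loss on every step

model = TinyRecursiveSketch()
outputs, halts = model(torch.randn(4, 64))
print(len(outputs), halts[-1].shape)           # 6 refinement steps, one halt prob per example
```

Deep supervision here would mean applying the task loss to every entry of `outputs`, not only the last one, while the halt probabilities can be used at inference time to stop refining early. How TRM actually weights and schedules these signals is described in the paper.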
However, some caveats to keep in mind:
- TRM is currently tested on narrow, structured reasoning tasks (puzzles, mazes, symbolic grids) rather than general language or multimodal tasks.
- Its performance on open-ended text, common-sense reasoning, or large-scale generative problems is not established.
- The approach is relatively new and will need replication and extension to validate robustness across domains.
Overall, TRM offers intriguing evidence that architectural cleverness (recursion + refinement) might substitute for brute-force scale—at least in specialized reasoning domains.
How to run TRM yourself: step‑by‑step
If you’d like to experiment with TRM, the authors have open-sourced the code on GitHub and provided instructions. Below is a distilled, annotated version of how to get it running and train it.
Prerequisites / environment
- Python 3.10 (or similar)
- GPU support (CUDA ~12.6)
- pip, torch, torchvision, torchaudio
- Optional: Weights & Biases for experiment tracking
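Once those pieces are in place (step 2 below installs PyTorch), a quick sanity check that PyTorch actually sees your GPU looks like this. The snippet is generic and not part of the repo:

```python
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA build:", torch.version.cuda)
    print("device:", torch.cuda.get_device_name(0))
```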
1. Clone the repository
```bash
git clone https://github.com/SamsungSAILMontreal/TinyRecursiveModels.git
cd TinyRecursiveModels
```
2. Install dependencies
```bash
pip install --upgrade pip wheel setuptools
pip install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
pip install -r requirements.txt
pip install --no-cache-dir --no-build-isolation adam-atan2
wandb login YOUR_API_KEY  # optional
```
3. Prepare datasets
ARC‑AGI-1 / ARC‑AGI-2:
```bash
python -m dataset.build_arc_dataset \
  --input-file-prefix kaggle/combined/arc-agi \
  --output-dir data/arc1concept-aug-1000 \
  --subsets training evaluation concept \
  --test-set-name evaluation
```
Sudoku‑Extreme:
```bash
python dataset/build_sudoku_dataset.py \
  --output-dir data/sudoku-extreme-1k-aug-1000 \
  --subsample-size 1000 \
  --num-aug 1000
```
Maze‑Hard:
```bash
python dataset/build_maze_dataset.py
```
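Once the build scripts finish, a quick way to confirm that the output folders the training commands expect actually exist. The paths below are taken from the commands in this guide; the maze path comes from the training command in step 4, and the build script’s default output location may differ:

```python
from pathlib import Path

# Output directories referenced by the commands in this guide
for d in ["data/arc1concept-aug-1000",
          "data/sudoku-extreme-1k-aug-1000",
          "data/maze-30x30-hard-1k"]:
    p = Path(d)
    files = list(p.rglob("*")) if p.exists() else []
    print(f"{d}: {'found' if p.exists() else 'MISSING'} ({len(files)} files)")
```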
4. Run experiments
ARC‑AGI-1 (4 GPUs):
```bash
torchrun --nproc-per-node 4 pretrain.py \
  arch=trm \
  data_paths="[data/arc1concept-aug-1000]" \
  arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=4 \
  +run_name=pretrain_att_arc1concept_4 \
  ema=True
```
Sudoku‑Extreme (single GPU):
```bash
python pretrain.py \
  arch=trm \
  data_paths="[data/sudoku-extreme-1k-aug-1000]" \
  epochs=50000 eval_interval=5000 lr=1e-4 \
  arch.mlp_t=True arch.pos_encodings=none \
  arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=6 \
  +run_name=pretrain_mlp_t_sudoku \
  ema=True
```
Maze‑Hard (4 GPUs):
```bash
torchrun --nproc-per-node 4 pretrain.py \
  arch=trm \
  data_paths="[data/maze-30x30-hard-1k]" \
  epochs=50000 eval_interval=5000 lr=1e-4 \
  arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=4 \
  +run_name=pretrain_att_maze30x30 \
  ema=True
```
Running TRM on a Desktop
You don’t need a server rack or enterprise GPUs to explore Samsung’s Tiny Recursive Model.
The model has only 7 million parameters, which makes it ideal for experimentation even on a single-GPU desktop.
What counts as “average-good hardware”
You will need at least:
- 16 GB of system RAM
- An SSD for dataset caching
- A GPU roughly as powerful as an NVIDIA RTX 3070
Even without multiple GPUs, TRM’s recursion approach means you can train smaller versions effectively.
Scaling TRM down for your PC
| Setting | Original (Lab) | Home-friendly |
|---|---|---|
| GPUs | 4× H100 | 1× RTX 3060–4090 |
| Batch size | 64–128 | 8–16 |
| arch.H_cycles | 3 | 2 |
| arch.L_cycles | 4–6 | 2–3 |
| Epochs | 50,000 | 5,000–10,000 (for tests) |
| Dataset size | 1,000–10,000 samples | 200–500 samples |
| Precision | FP32 | `torch.autocast()` (mixed FP16/FP32) |
Example command for single GPU:
```bash
python pretrain.py \
  arch=trm \
  data_paths="[data/sudoku-extreme-small]" \
  epochs=10000 eval_interval=1000 \
  batch_size=8 lr=2e-4 \
  arch.L_layers=2 arch.H_cycles=2 arch.L_cycles=3 \
  +run_name=small_sudoku_test \
  ema=False
```
This smaller configuration trains in 2–4 hours on an RTX 4080 and 6–10 hours on an RTX 3060—perfect for exploration.
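The mixed-precision setting from the table applies to any PyTorch training loop. A generic sketch follows; it requires a CUDA GPU, and the model, batch, and loss are placeholders rather than TRM’s actual training code:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()              # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scaler = torch.cuda.amp.GradScaler()                   # keeps FP16 gradients numerically stable

for step in range(100):
    x = torch.randn(8, 512, device="cuda")             # placeholder batch (batch_size=8, as in the table)
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).pow(2).mean()                   # placeholder loss computed in mixed precision
    scaler.scale(loss).backward()                       # backward pass on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```

If the repo’s config does not expose a precision flag, wrapping the forward and loss computation in `torch.autocast` as above is the standard pattern to apply by hand.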
How to build smaller datasets
You can create mini versions of the reasoning puzzles:
```bash
python dataset/build_sudoku_dataset.py \
  --output-dir data/sudoku-small \
  --subsample-size 200 \
  --num-aug 200
```
or for mazes:
```bash
python dataset/build_maze_dataset.py --subsample-size 200
```
This keeps disk usage and training time low while preserving the recursive structure.
Tips for smooth runs
- Use mixed precision (`torch.autocast`) to roughly halve VRAM use; see the mixed-precision sketch above.
- Disable Weights & Biases logging (`wandb disabled`) if you only want local tests; a programmatic alternative is sketched after this list.
- Set `eval_interval` high (e.g., 2000) to reduce validation overhead.
- Use SSDs — recursive models reread data often.
- Try Sudoku or Maze datasets first before ARC-AGI.
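If you prefer to switch off logging from inside a script rather than via the `wandb disabled` CLI command, the standard `WANDB_MODE` environment variable works too. The launcher below is a hypothetical convenience wrapper around the example command above, not part of the repo:

```python
import os
import subprocess

# "disabled" is the standard Weights & Biases mode for local runs with no logging.
os.environ["WANDB_MODE"] = "disabled"

# Hypothetical wrapper: launch the small Sudoku run without any W&B traffic
# (the child process inherits the environment set above).
subprocess.run(
    ["python", "pretrain.py", "arch=trm",
     "data_paths=[data/sudoku-extreme-small]",
     "epochs=10000", "eval_interval=2000", "+run_name=local_test"],
    check=True,
)
```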
⚙️ Tips & Caveats
- Hardware: Expect high GPU utilization; scale down cycles for smaller hardware.
- Recursion depth: More cycles = higher reasoning quality but slower training.
- Use wandb: Great for tracking progress and comparing runs.
- Domain focus: Best for puzzles, reasoning, and symbolic logic—not general text.
- Experimentation encouraged: The repo is meant for open research and tinkering.

