
Small, Smart, and Open Source: Training your own TRM Model

Summary: What Samsung’s TRM does and why it matters

In a recent development out of Samsung’s AI lab in Montreal, researchers have introduced a new “Tiny Recursive Model” (TRM) that challenges the prevailing notion that more parameters means more intelligence.

Key ideas & claims

  • Tiny footprint: TRM has only ~7 million parameters—orders of magnitude smaller than typical large language or reasoning models.
  • Recursive refinement: Instead of doing a single large forward pass, TRM repeatedly “asks itself” whether its current answer can be improved, refining latent states and outputs over multiple steps.
  • Deep supervision + adaptive halting: The training regime provides feedback at intermediate steps (not just the final one), and the model learns to decide when to stop refining (a minimal sketch of this refine-and-halt loop follows this list).
  • Stronger generalization on structured reasoning tasks: On benchmarks involving logic puzzles, mazes, and abstract reasoning (such as ARC‑AGI, Sudoku-Extreme, Maze-Hard), TRM matches or surpasses much larger models. For example, it achieves ~87% accuracy on Sudoku-Extreme and ~85% on Maze-Hard.
  • Simplicity over complexity: TRM is a simplification of a prior model called HRM (Hierarchical Reasoning Model), removing the extra networks, hierarchical structure, and more complex mathematics, while still improving performance.
  • Practical significance: Because of its small size, TRM can run on modest hardware and with lower energy costs. This opens doors for small labs, edge devices, or privacy-sensitive environments where massive models are impractical.
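To make the recursion idea concrete, here is a minimal sketch of a refine-and-halt loop in PyTorch. It is illustrative only: the module names (refine, answer_head, halt_head), the GRU cell, and the dimensions are stand-ins, not the authors’ actual architecture.

python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Illustrative refine-and-halt loop; not the published TRM architecture."""
    def __init__(self, dim=64, max_steps=16, halt_threshold=0.5):
        super().__init__()
        self.refine = nn.GRUCell(2 * dim, dim)   # updates the latent state from (input, current answer)
        self.answer_head = nn.Linear(dim, dim)   # decodes an answer from the latent state
        self.halt_head = nn.Linear(dim, 1)       # scores "is this answer good enough to stop?"
        self.max_steps = max_steps
        self.halt_threshold = halt_threshold

    def forward(self, x):
        z = torch.zeros_like(x)                  # latent "scratchpad" state
        y = self.answer_head(z)                  # initial answer guess
        for _ in range(self.max_steps):
            z = self.refine(torch.cat([x, y], dim=-1), z)    # refine latent given input + current answer
            y = self.answer_head(z)                           # re-decode an (ideally improved) answer
            p_halt = torch.sigmoid(self.halt_head(z)).mean()  # learned halting score (simplified: one decision per batch)
            if p_halt > self.halt_threshold:
                break
        return y

# One forward pass over a batch of 8 toy inputs:
model = TinyRecursiveSketch(dim=64)
out = model(torch.randn(8, 64))

Roughly speaking, the arch.H_cycles and arch.L_cycles settings in the training commands below control how many refinement passes of this kind the model performs.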

However, some caveats to keep in mind:

  • TRM is currently tested on narrow, structured reasoning tasks (puzzles, mazes, symbolic grids) rather than general language or multimodal tasks.
  • Its performance on open-ended text, common-sense reasoning, or large-scale generative problems is not established.
  • The approach is relatively new and will need replication and extension to validate robustness across domains.

Overall, TRM offers intriguing evidence that architectural cleverness (recursion + refinement) might substitute for brute-force scale—at least in specialized reasoning domains.


How to run TRM yourself: step‑by‑step

If you’d like to experiment with TRM, the authors have made the code open source (GitHub) and provided instructions. Below is a distilled, annotated version of how to get it running and train it.

Prerequisites / environment

  • Python 3.10 (or similar)
  • GPU support (CUDA ~12.6)
  • pip, torch, torchvision, torchaudio
  • Optional: Weights & Biases for experiment tracking

1. Clone the repository

bash
git clone https://github.com/SamsungSAILMontreal/TinyRecursiveModels.git
cd TinyRecursiveModels

2. Install dependencies

bash
pip install --upgrade pip wheel setuptools
pip install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
pip install -r requirements.txt
pip install --no-cache-dir --no-build-isolation adam-atan2
wandb login YOUR_API_KEY # optional
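
Before building datasets, it is worth a quick sanity check that the nightly PyTorch build actually sees your GPU. This is a generic snippet, not part of the repository:

python
import torch

# Confirm the installed build and that CUDA is usable before training.
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))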

3. Prepare datasets

ARC‑AGI-1 / ARC‑AGI-2:

bash
python -m dataset.build_arc_dataset --input-file-prefix kaggle/combined/arc-agi --output-dir data/arc1concept-aug-1000 --subsets training evaluation concept --test-set-name evaluation

Sudoku‑Extreme:

bash
python dataset/build_sudoku_dataset.py --output-dir data/sudoku-extreme-1k-aug-1000 --subsample-size 1000 --num-aug 1000

Maze‑Hard:

bash
python dataset/build_maze_dataset.py

4. Run experiments

ARC‑AGI-1 (4 GPUs):

bash
torchrun --nproc-per-node 4 pretrain.py arch=trm data_paths="[data/arc1concept-aug-1000]" arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=4 +run_name=pretrain_att_arc1concept_4 ema=True

Sudoku‑Extreme (single GPU):

bash
python pretrain.py arch=trm data_paths="[data/sudoku-extreme-1k-aug-1000]" epochs=50000 eval_interval=5000 lr=1e-4 arch.mlp_t=True arch.pos_encodings=none arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=6 +run_name=pretrain_mlp_t_sudoku ema=True

Maze‑Hard (4 GPUs):

bash
torchrun --nproc-per-node 4 pretrain.py arch=trm data_paths="[data/maze-30x30-hard-1k]" epochs=50000 eval_interval=5000 lr=1e-4 arch.L_layers=2 arch.H_cycles=3 arch.L_cycles=4 +run_name=pretrain_att_maze30x30 ema=True

Running TRM on a Desktop

You don’t need a server rack or enterprise GPUs to explore Samsung’s Tiny Recursive Model.
The model has only 7 million parameters, which makes it ideal for experimentation even on a single-GPU desktop.

What counts as “average-good hardware”

You will need at least:

  • 16 GB of system RAM
  • An SSD for dataset caching
  • A GPU at least as powerful as an NVIDIA RTX 3070

Even without multiple GPUs, TRM’s recursion approach means you can train smaller versions effectively.

Scaling TRM down for your PC

Setting | Original (Lab) | Home-friendly
GPUs | 4× H100 | 1× RTX 3060–4090
Batch size | 64–128 | 8–16
arch.H_cycles | 3 | 2
arch.L_cycles | 4–6 | 2–3
Epochs | 50 000 | 5 000–10 000 (for tests)
Dataset size | 1 000–10 000 samples | 200–500 samples
Precision | FP32 | torch.autocast() (mixed FP16/FP32)

Example command for single GPU:

bash
python pretrain.py arch=trm data_paths="[data/sudoku-extreme-small]" epochs=10000 eval_interval=1000 batch_size=8 lr=2e-4 arch.L_layers=2 arch.H_cycles=2 arch.L_cycles=3 +run_name=small_sudoku_test ema=False

This smaller configuration trains in 2–4 hours on an RTX 4080 and 6–10 hours on an RTX 3060—perfect for exploration.

How to build smaller datasets

You can create mini versions of the reasoning puzzles:

bash
python dataset/build_sudoku_dataset.py --output-dir data/sudoku-small --subsample-size 200 --num-aug 200

or for mazes:

bash
python dataset/build_maze_dataset.py --subsample-size 200

This keeps disk usage and training time low while preserving the recursive structure.
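
To confirm the mini dataset was written, a generic directory listing is enough. The exact file names depend on the builder script, so this only assumes the --output-dir used above:

python
from pathlib import Path

# List whatever the builder wrote under the mini Sudoku output directory.
out_dir = Path("data/sudoku-small")  # matches --output-dir above
for f in sorted(out_dir.rglob("*")):
    if f.is_file():
        print(f.relative_to(out_dir), f"({f.stat().st_size} bytes)")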

Tips for smooth runs

  • Use mixed precision (torch.autocast) to halve VRAM use (see the snippet after these tips).
  • Disable Weights & Biases logging if you just want local tests (wandb disabled).
  • Set eval_interval high (e.g., 2000) to reduce validation overhead.
  • Use SSDs — recursive models reread data often.
  • Try Sudoku or Maze datasets first before ARC-AGI.
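
For the mixed-precision tip, the pattern below is the standard torch.autocast + GradScaler recipe in generic form. The model, optimizer, and data are placeholders (pretrain.py has its own training loop), and it assumes a CUDA GPU:

python
import torch

# Placeholder model and optimizer; the repo's pretrain.py has its own loop.
model = torch.nn.Linear(128, 128).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 128, device="cuda")
target = torch.randn(8, 128, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    # Forward pass runs in FP16 where it is numerically safe.
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()   # scale the loss to avoid FP16 gradient underflow
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad(set_to_none=True)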

⚙️ Tips & Caveats

  • Hardware: Expect high GPU utilization; scale down cycles for smaller hardware.
  • Recursion depth: More cycles = higher reasoning quality but slower training.
  • Use wandb: Great for tracking progress and comparing runs.
  • Domain focus: Best for puzzles, reasoning, and symbolic logic—not general text.
  • Experimentation encouraged: The repo is meant for open research and tinkering.

Reference:
“Less is More: Recursive Reasoning with Tiny Networks” – Alexia Jolicoeur‑Martineau, Samsung SAIL Montreal (2025)
