Pinned

  1. OLMo Public

    Modeling, training, eval, and inference code for OLMo

    Python · 6.3k stars · 695 forks

  2. dolma Public

    Data and tools for generating and inspecting OLMo pre-training data.

    Python · 1.4k stars · 163 forks

  3. ai2thor Public

    An open-source platform for Visual AI.

    C# · 1.6k stars · 265 forks

  4. olmocr Public

    Toolkit for linearizing PDFs for LLM datasets/training

    Python · 16.7k stars · 1.3k forks

  5. OLMoE Public

    OLMoE: Open Mixture-of-Experts Language Models

    Jupyter Notebook · 949 stars · 90 forks

Repositories

Showing 10 of 540 repositories
  • FlexOlmo Public

    Code and training scripts for FlexOlmo

    Python · 120 stars · Apache-2.0 · 16 forks · 5 issues · 11 pull requests · Updated Jan 11, 2026
  • open-instruct Public

    AllenAI's post-training codebase

    Python · 3,518 stars · Apache-2.0 · 485 forks · 10 issues (1 issue needs help) · 43 pull requests · Updated Jan 11, 2026
  • rslearn_projects
    Python · 17 stars · Apache-2.0 · 7 forks · 15 issues · 6 pull requests · Updated Jan 11, 2026
  • olmo-cookbook Public

    OLMost every training recipe you need to perform data interventions with the OLMo family of models.

    Python · 63 stars · Apache-2.0 · 11 forks · 1 issue · 31 pull requests · Updated Jan 11, 2026
  • OLMo-core Public

    PyTorch building blocks for the OLMo ecosystem

    Python · 681 stars · Apache-2.0 · 119 forks · 6 issues · 48 pull requests · Updated Jan 10, 2026
  • duplodocus Public

    Tooling for exact and MinHash deduplication of large-scale text datasets

    Rust · 51 stars · Apache-2.0 · 4 forks · 0 issues · 1 pull request · Updated Jan 9, 2026
  • olmoearth_pretrain Public

    Earth system foundation model data, training, and eval

    Python · 121 stars · 22 forks · 2 issues · 14 pull requests · Updated Jan 9, 2026
  • olmocr Public

    Toolkit for linearizing PDFs for LLM datasets/training

    Python · 16,702 stars · Apache-2.0 · 1,323 forks · 34 issues · 15 pull requests · Updated Jan 9, 2026
  • beaker-gantry Public

    Gantry is a CLI that streamlines running experiments in Beaker

    Python · 29 stars · Apache-2.0 · 7 forks · 2 issues · 2 pull requests · Updated Jan 9, 2026
  • rslearn Public

    A tool for developing remote sensing datasets and models.

    Python · 63 stars · Apache-2.0 · 11 forks · 21 issues · 3 pull requests · Updated Jan 9, 2026