High-Performance Computing Code Generation using LLMs

TL;DR: We achieved competitive performance with 67x less memory than frontier models.

The Challenge

Writing high-performance computing (HPC) code is notoriously difficult. From parallel algorithms using MPI to complex mathematical computations, developers need both deep domain knowledge and low-level optimization expertise.

Large Language Models (LLMs) have shown promise in code generation, but the best-performing models require massive computational resources - often 1TB+ of VRAM - making them inaccessible to most systems engineers and architects. These teams often have to trade performance against the privacy of their proprietary code by exposing it to LLM providers. We propose an alternative: H2LooP v0.1 SLMs (Small Language Models), designed to achieve state-of-the-art performance while running on consumer-grade hardware, on-premise.

H2LooP Models

We developed two specialized models designed specifically for HPC code generation:

  • H2LooP v0.1-coder (32B parameters, 19GB VRAM)
  • H2LooP v0.1-multimodal (27B parameters, 20GB VRAM)

These models can run on consumer hardware like RTX 4090 or A6000 GPUs, yet compete with models requiring 50-100x more resources.

Benchmark Results

We evaluated 11 state-of-the-art LLMs on 120 HPC programming challenges covering domains from dense linear algebra to parallel sorting algorithms. The problem set is based on the ParEval benchmark.
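Each solution is judged both on whether it builds and on whether it passes the problem's functional tests (the two metrics reported below). As a rough illustration only, here is a minimal sketch of what such a check involves for one generated solution; the file names (generated_solution.cpp, test_driver.cpp) and the mpicxx/mpirun invocations are assumptions for this post, not the actual evaluation harness.

```cpp
// Hypothetical sketch of a build-and-validate step for one generated solution.
// File names and commands are illustrative assumptions, not the real harness.
#include <cstdlib>
#include <iostream>

int main() {
    // Build success: does the model-generated source compile and link
    // against a fixed test driver?
    int build = std::system(
        "mpicxx -O2 generated_solution.cpp test_driver.cpp -o candidate");
    if (build != 0) {
        std::cout << "build failed\n";
        return 1;
    }
    // Validation success: do the functional tests pass when the candidate
    // runs on a few MPI ranks? A zero exit code counts as passing.
    int run = std::system("mpirun -np 4 ./candidate");
    std::cout << (run == 0 ? "tests passed\n" : "tests failed\n");
    return run == 0 ? 0 : 2;
}
```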

Build Success Rate (Code Compilation)

We evaluated model outputs on several factors, including build success and the ability to pass the functional tests, and the H2LooP v0.1 models comfortably beat other similarly sized models. For example, for build success:

Similar results were observed for validation success:


Performance comparison with similarly-sized Models:

  • H2LooP v0.1-multimodal: 71.67% ✅
  • H2LooP v0.1-coder: 61.67% ✅
  • Deepseek Coder v2-lite: 35.83%
  • Llama 3.3 8B: 6.67%
  • Mistral Nemo: 5.83%


Performance comparison against large SOTA Models:

  • Gemini Pro: 91.67%
  • Kimi K2 (1T parameters): 90.83%
  • H2LooP v0.1-multimodal: 71.67% ✅
  • H2LooP v0.1-coder: 61.67% ✅
  • Qwen 2.5 Coder (32B): 60.00%
  • Llama-4 Maverick (400B): 36.67%

The Efficiency Breakthrough

The main focus while building the H2LooP models is performance per unit of memory, measured as accuracy (the share of problems that pass all expert-curated functional tests) per GB of VRAM consumed; a worked example of the arithmetic follows the list below:

  • H2LooP v0.1-coder: 3.25% per GB (baseline)
  • Gemma-3 27B: 1.44% per GB (2.3x less efficient)
  • Qwen 2.5 Coder: 0.67% per GB (4.8x less efficient)
  • Kimi-K2: 0.089% per GB (36x less efficient)
  • Llama-4 Maverick: 0.029% per GB (112x less efficient)
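To make the metric concrete, here is a minimal arithmetic sketch. It assumes the H2LooP v0.1-coder figures reported above (61.67% accuracy and a 19GB VRAM footprint) purely to show how the percent-per-GB number is obtained.

```cpp
// Minimal sketch of the efficiency metric:
// accuracy (% of problems passing all functional tests) per GB of VRAM.
// Numbers are taken from this post for H2LooP v0.1-coder, for illustration only.
#include <iostream>

int main() {
    double accuracy_percent = 61.67; // functional-test pass rate reported above
    double vram_gb = 19.0;           // memory footprint of the model
    double efficiency = accuracy_percent / vram_gb;
    std::cout << "efficiency = " << efficiency << " % per GB\n"; // ~3.25
    return 0;
}
```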

What This Means

Our research demonstrates that specialized, efficiently designed models can achieve 60-80% of frontier model performance while using 50-100x fewer resources. This has profound implications:

For Researchers

  • Run cutting-edge code generation on university hardware
  • Rapid prototyping without cloud computing costs
  • Local development with no data privacy concerns

For Industry

  • Deploy AI coding assistants on edge devices
  • Reduce infrastructure costs by orders of magnitude
  • Enable real-time code generation in resource-constrained environments

Technical Insights

Why H2LooP Works:

  1. Domain Specialization: Purpose-built for computational code rather than general chat
  2. Efficient Architecture: Optimized transformer design with quantization-aware training
  3. Quality Data: Focused training on high-quality HPC code examples
  4. Balanced Trade-offs: Optimized for the sweet spot between performance and efficiency

Parallel Programming Excellence:

  • 56.7% success rate in MPI parallel code generation
  • Competitive speedups (1.45x-1.93x) when scaling to multiple processes (an illustrative kernel of this kind follows the list below)
  • Strong performance across computational domains from linear algebra to graph algorithms
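For context, the sketch below shows the kind of MPI kernel these problems target: a distributed dot product whose partial results are combined with a reduction across ranks. It is an illustrative example written for this post, not benchmark code or model output.

```cpp
// Illustrative MPI kernel: a distributed dot product with a global reduction.
// Written for this post; not taken from the benchmark or from model output.
#include <mpi.h>
#include <vector>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank owns a slice of the two vectors.
    const int local_n = 1 << 20;
    std::vector<double> x(local_n, 1.0), y(local_n, 2.0);

    // Local partial dot product.
    double local_dot = 0.0;
    for (int i = 0; i < local_n; ++i)
        local_dot += x[i] * y[i];

    // Combine partial results from all ranks on rank 0.
    double global_dot = 0.0;
    MPI_Reduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::cout << "dot product = " << global_dot << "\n";

    MPI_Finalize();
    return 0;
}
```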

The Road Ahead

This research challenges conventional wisdom about the relationship between model size and performance. By focusing on efficiency and specialization, we’ve shown that smaller, purpose-built models can deliver results comparable to SOTA models while consuming orders of magnitude fewer compute resources.

The future of AI coding assistance isn’t just about bigger models - it’s about smarter, more efficient ones that every systems team can use, tailored to their use-case.

Our AI research focus in the coming months will be on surpassing SOTA performance in the systems engineering domain with our small LLMs. If you are interested in joining our mission, please reach out to careers@h2loop.ai.

Technical Details: Full methodology and results available in our comprehensive research report. https://github.com/h2loop/evals-public/blob/main/REPORTS/r1/parallel_code_gen_report.md
