TL;DR: We achieved competitive performance with 67x less memory than frontier models.
Writing high-performance computing (HPC) code is notoriously difficult. From parallel algorithms using MPI to complex mathematical computations, developers need deep expertise in both domain knowledge and low-level optimization.
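To make that difficulty concrete, consider one of the simplest patterns an HPC developer must get right: a distributed global sum with MPI. The sketch below is our own illustration (not a benchmark problem or model output), but even this toy shows the rank bookkeeping and collective semantics that generated code must respect.

```cpp
// Illustrative MPI kernel: each rank sums its local chunk, then a
// collective reduction combines the partial sums on rank 0.
// Compile and run with an MPI toolchain, e.g.:
//   mpicxx sum.cpp -o sum && mpirun -n 4 ./sum
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank owns its own slice of the data and computes a partial sum.
    std::vector<double> chunk(1000000, 1.0);
    double local_sum = 0.0;
    for (double x : chunk) local_sum += x;

    // Every rank must call the collective with matching arguments,
    // or the program deadlocks - a classic source of bugs in generated code.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}
```

Real HPC code layers domain decomposition, communication/computation overlap, and numerical concerns on top of this, which is why correctness is hard for both humans and models.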
Large Language Models (LLMs) have shown promise in code generation, but the best-performing models require massive computational resources - often 1TB+ of VRAM - making them inaccessible to most systems engineers and architects. Teams are also forced to trade performance against the privacy of their proprietary code by exposing it to LLM providers. We propose an alternative: H2LooP v0.1 SLMs (Small Language Models), designed to achieve state-of-the-art performance while running on consumer-grade hardware, on-premise.
We developed two models specialized for HPC code generation:
These models can run on consumer hardware like RTX 4090 or A6000 GPUs, yet compete with models requiring 50-100x more resources.
We evaluated 11 state-of-the-art LLMs on 120 HPC programming challenges covering domains from dense linear algebra to parallel sorting algorithms. The problem set is based on the ParEval benchmark.
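To give a flavor of the problem style, a challenge typically pairs a short natural-language specification with a function signature and asks the model to produce a correct parallel implementation. The paraphrased OpenMP example below is our own illustration of that format, not an actual ParEval problem:

```cpp
// Paraphrased ParEval-style task: "Compute y = a*x + y in parallel
// using OpenMP." The model is asked to fill in the function body.
// Compile with OpenMP enabled, e.g. `g++ -fopenmp axpy.cpp`.
#include <cstddef>
#include <vector>

void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    // Safe to parallelize: every iteration writes a distinct y[i],
    // so there are no loop-carried dependencies or data races.
    #pragma omp parallel for
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += a * x[i];
    }
}
```

Recognizing when a pragma like this is safe - and when a reduction, atomic, or restructuring is needed instead - is exactly the kind of judgment the benchmark probes.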
We evaluated model outputs on several factors, including build success and the ability to pass validation tests, and the H2LooP v0.1 models comfortably beat other similarly sized models. For example, for build success:
Similar results were observed on validation success:
Performance comparison with similarly-sized models:
Performance comparison against large SOTA models:
Our research demonstrates that specialized, efficiently designed models can achieve 60-80% of frontier model performance while using 50-100x fewer resources. This has profound implications:
Why H2LooP Works:
Parallel Programming Excellence:
This research challenges conventional wisdom about the relationship between model size and performance. By focusing on efficiency and specialization, we’ve shown that smaller, purpose-built models can deliver results comparable to SOTA models, while consuming orders of magnitude less compute resources.
The future of AI coding assistance isn’t just about bigger models - it’s about smarter, more efficient ones that every systems team can use, tailored to their use-case.
Our AI research focus in the coming months will be on surpassing SOTA performance in the systems engineering domain with our small LLMs. If you are interested in joining our mission, please reach out to careers@h2loop.ai.
Technical Details: Full methodology and results are available in our comprehensive research report: https://github.com/h2loop/evals-public/blob/main/REPORTS/r1/parallel_code_gen_report.md