Independent · Documented · Repeatable
Measure What Your AI Workloads Actually Need
Substrate Co helps engineering teams build and run fair benchmarks for AI accelerator selection — with methods written down, caveats stated, and results that reflect your own workloads rather than vendor figures.
What We Offer
Three Ways to Work with Substrate Co
Each service is scoped to a specific stage of the benchmarking process. Pick the one that fits where your team is today.
Workload Benchmark Setup
A guided session to help your team define fair, repeatable benchmarks for your own AI workloads. Methods and assumptions are documented openly, so results reflect real use.
- One structured session with your engineers
- Benchmark plan with documented assumptions
- Reporting outline for future comparisons
Comparative Benchmark Study
We run your defined benchmarks across candidate configurations and present the results side by side. We avoid declaring a single winner: the data informs the decision, not us.
- Two-phase delivery with written report
- Results dataset with clear caveats
- Method note explaining each figure
Benchmarking Advisory Retainer
An ongoing arrangement to help your team maintain and refresh its benchmarking practice as workloads and hardware evolve. We act as an independent advisor, not a vendor.
- Three-month engagement with scheduled reviews
- Living method library maintained over time
- Review notes after each assessment cycle
Why It Matters
What Sets Our Approach Apart
No Commercial Stake
We do not sell hardware, cloud capacity, or software. Our findings are not shaped by vendor relationships — only by the method and the data.
Documented Methods
Every figure comes with a method note. Assumptions, warm-up procedures, and edge conditions are written down so you can reproduce or challenge the results.
Workload-First Design
Benchmarks are built around your actual inference or training tasks — not synthetic suites designed for press releases.
Side-by-Side Presentation
Results are shown in a comparison frame that lets your team read figures directly. No summary narrative that pre-decides the outcome for you.
Repeatable Process
Benchmark templates are yours to keep. Re-run them in six months when a new option appears and compare apples to apples.
Plain Communication
Reports are written in language an engineering lead can read directly. We do not pad deliverables with charts that obscure rather than inform.
Ready to Start
Have a Benchmarking Question?
Whether you are at the early stage of defining what to measure, or already have results that need a second set of eyes, we are glad to talk through the specifics with you.
Hardware We Benchmark
NVIDIA Accelerators & Our Benchmarking Approach
NVIDIA GPU Evaluation
NVIDIA's accelerator portfolio — from the H100 and H200 in data centre deployments to the L40S for inference-at-scale and the RTX series for on-premise workloads — covers a wide performance and cost range. Vendor specifications describe peak throughput under idealised conditions. What those figures mean for a specific team's inference pipeline or training loop is a separate question entirely.
Substrate Co designs benchmarks around the actual tasks: the model architecture, batch configuration, precision format, and memory footprint your workload uses in practice. We run against the hardware you are evaluating and document every condition that could affect how the number holds up in production.
H100 / H200
Data centre training & large-batch inference benchmarks
L40S / A100
Multi-model inference and throughput comparison studies
RTX Series
On-premise and edge deployment evaluation
Cross-vendor
NVIDIA vs AMD vs custom ASIC side-by-side studies
Our AI Benchmarking Tooling
Substrate Co has developed an internal measurement stack built specifically for AI workload evaluation. Rather than running the standard test suites that hardware vendors optimise for, our tooling measures the performance characteristics that matter for real deployment decisions: latency percentiles under load, throughput degradation at varying batch sizes, memory bandwidth saturation, and precision-mode trade-offs across FP32, FP16, BF16, and INT8.
Every measurement is paired with a method note that documents the run conditions. If the number is only valid at a particular batch size or driver version, that is stated explicitly. The goal is results your team can stand behind in a procurement review — not figures that look strong in a summary slide but fall apart under questions.
- Latency & throughput measurement across concurrency levels
- Precision format benchmarking: FP32, FP16, BF16, INT8
- Memory bandwidth and VRAM utilisation profiling
- Multi-GPU scaling and NVLink / PCIe topology tests
- Framework-level profiling: PyTorch, TensorRT, vLLM, TGI
- Results dataset provided in CSV alongside the written report
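To make the idea of throughput measurement at varying batch sizes concrete, here is a minimal Python sketch. It is an illustration only, not Substrate Co's internal tooling: `run_sweep`, `fake_workload`, and the row field names are hypothetical, and the workload is a CPU stand-in for a real inference call.

```python
import time
import statistics


def run_sweep(workload, batch_sizes, warmup=3, iters=20):
    """Time `workload(batch_size)` at each batch size and summarise the runs."""
    rows = []
    for bs in batch_sizes:
        # Warm-up iterations are executed but not recorded.
        for _ in range(warmup):
            workload(bs)
        latencies = []
        for _ in range(iters):
            start = time.perf_counter()
            workload(bs)
            # On a GPU, work is usually queued asynchronously, so a real
            # harness would synchronise with the device before reading the
            # clock (for example torch.cuda.synchronize() in PyTorch).
            latencies.append(time.perf_counter() - start)
        total = sum(latencies)
        rows.append({
            "batch_size": bs,
            "iters": iters,
            "mean_latency_s": statistics.mean(latencies),
            "max_latency_s": max(latencies),
            # Items processed per second over the timed iterations only.
            "throughput_items_per_s": bs * iters / total,
        })
    return rows


def fake_workload(batch_size):
    # Hypothetical stand-in for an inference call; cost grows with batch size.
    sum(i * i for i in range(2000 * batch_size))


if __name__ == "__main__":
    for row in run_sweep(fake_workload, [1, 2, 4]):
        print(row)
```

Each row carries its own batch size and iteration count, so a figure is never separated from the conditions it was measured under. The rows could be written out with the standard `csv` module if a CSV dataset is wanted.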
Why method notes matter for AI hardware decisions
An AI accelerator benchmark without a method note is a number without context. Factors like driver version, CUDA toolkit, inference server configuration, and batch size can shift throughput figures by 20–40% between runs on the same hardware. Every Substrate Co deliverable includes a method note covering exactly these conditions, so your team knows not just what the number is, but when it holds and when it does not.
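One way to picture a method note is as a small structured record stored next to each figure. The Python sketch below assumes a hypothetical `MethodNote` layout and a hypothetical helper, `differing_conditions`; the field names and placeholder values are illustrative and do not describe the format of an actual Substrate Co deliverable.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MethodNote:
    """Run conditions recorded next to a benchmark figure (hypothetical fields)."""
    accelerator: str
    driver_version: str
    cuda_toolkit: str
    inference_server: str
    precision: str
    batch_size: int
    warmup_iters: int


def differing_conditions(a, b):
    """Return the names of the fields whose values differ between two notes."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


# Placeholder values only; real notes would carry actual version strings.
note_a = MethodNote("gpu-x", "driver-1", "toolkit-1", "server-1", "FP16", 8, 10)
note_b = MethodNote("gpu-x", "driver-2", "toolkit-1", "server-1", "FP16", 16, 10)

print(json.dumps(asdict(note_a), indent=2))
print(differing_conditions(note_a, note_b))  # ['batch_size', 'driver_version']
```

An empty list from `differing_conditions` suggests two figures were measured under the same recorded conditions. Any field it does list is a reason to treat a gap between the two numbers with caution.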
Discuss your evaluation needs →
Common Questions
Frequently Asked
What does an AI workload benchmark actually measure?
A workload benchmark measures how a specific hardware configuration performs on tasks you actually run — such as a particular model size at a given batch size and precision. It records throughput, latency percentiles, memory utilisation, and power draw under conditions you define. The goal is a figure you can reproduce and compare, not a marketing headline.
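As a small worked example of latency percentiles, the Python sketch below uses the nearest-rank definition on a handful of latency samples. The `percentile` helper is hypothetical and the sample values are invented purely for illustration; they are not measurements from any hardware.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Invented latencies in milliseconds, with one slow outlier.
latencies_ms = [12.0, 11.5, 13.2, 12.1, 40.3, 11.9, 12.4, 12.2, 11.8, 12.6]

summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)  # {'p50': 12.1, 'p95': 40.3, 'p99': 40.3}
```

The median stays near 12 ms while the tail percentiles land on the single slow request, which is why reports tend to show percentiles rather than an average alone. With only ten samples p95 and p99 coincide, so a real run would need far more iterations for the tail figures to be meaningful.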
How long does the Comparative Benchmark Study take?
The study runs across two phases. Phase one covers setup and a dry run with one configuration; phase two covers the remaining candidates and report writing. Elapsed time depends on how many configurations you are evaluating and how quickly we can access them, but most engagements complete within two to three weeks of the kick-off call.
Do we need to provide hardware access?
For the Comparative Benchmark Study and the Advisory Retainer, yes — we run benchmarks against the specific configurations you are evaluating, which means we need either direct access or a method to submit jobs. We discuss access arrangements during the scoping call and can work within most secure environments. For the Benchmark Setup service, hardware access is not required during the session itself.
Will the report tell us which option to choose?
We present data with caveats and context — not a verdict. Procurement decisions involve factors beyond benchmark numbers: budget, support terms, roadmap, and team familiarity. Our role is to give you a reliable data layer, not to remove the judgment call from your team.
What is included in the Advisory Retainer?
The retainer runs for three months and includes scheduled review sessions, updates to your method library when workloads or hardware options change, and written notes after each review. The idea is that benchmarking stays current rather than going stale between procurement cycles. Pricing is RM 3,020 for the full three-month engagement.
How do we get started?
Fill in the enquiry form below or call us directly. We will arrange a short scoping call — typically 30 minutes — to understand your workloads and evaluation timeline, then send a clear proposal with deliverables and timings.
Our Location
Find Us in Kuala Lumpur
Unit B-7-3, Megan Avenue II, 12 Jalan Yap Kwan Seng, 50450 Kuala Lumpur
Get in Touch
Send an Enquiry
Contact Details
Phone
+60 3-2166 5392
Address
Unit B-7-3, Megan Avenue II
12 Jalan Yap Kwan Seng
50450 Kuala Lumpur, Malaysia
Working Hours
Monday – Friday: 9:00 AM – 6:00 PM MYT
Saturday: 10:00 AM – 2:00 PM MYT
Sunday: Closed
Note: For initial enquiries, a brief description of your AI workloads and the hardware options you are considering helps us prepare a more useful first response.