Green AI in Practice: How I Track GPU Hours, Energy, CO₂, and Cost for Every ML Experiment
Practice Green AI by tracking GPU-hours, energy, and cost for every ML run, so you pick models that are not just accurate, but also cheaper, leaner, and greener.
Most data teams track accuracy, latency, and maybe GPU utilization if someone is watching the dashboard. Almost no one tracks:
- How many GPU-hours a model run consumed
- How many kWh of electricity that implies
- How much CO₂ and cloud spend are associated with each experiment
Once I started paying attention to these metrics, it completely changed how I design and run experiments.
For me, this is what Green AI really means: not just “AI that helps the climate,” but AI that is affordable, accessible, and efficient. Recent Green AI research treats energy, carbon, and efficiency as first-class concerns in ML research and practice, not as an afterthought.
We live in a world where:
- US data centers consume 4%+ of national electricity,
- The carbon intensity of the electricity they draw is often higher than the grid average, and
- A100/H100 GPUs still cost $1–$4+ per GPU-hour in the cloud.
GPUs are cheaper per FLOP than ever, but at scale, they’re still expensive. Ignoring that at the experiment level is a choice.
This article is my playbook for treating GPU hours, energy, carbon, and dollars as core ML metrics — right alongside accuracy and F1.
What to Expect
This is a practical guide for data scientists and ML engineers who want to:
- Measure GPU time, energy use, and CO₂ per experiment
- Translate those metrics into actual dollars
- Use that information to choose better models, tune smarter, and ship greener systems
No philosophy — just concrete metrics, tools, and patterns I’ve actually used.
Why Green AI Is a Data Science Problem, Not Just DevOps
Much of “Green Software” and “Green DevOps” focuses on:
- Efficient CI/CD pipelines
- Cloud-based deployment architectures
- Renewable energy and data center power sources
All of that matters. But as a data scientist/ML engineer, you decide:
- How big your models are
- How long you train them
- How aggressive your hyperparameter search is
- Whether you use a huge LLM or a compact model
I’ve seen these choices create 10–100× differences in energy and cost for the same business value.
Green AI studies identify model selection, training strategy, and hyperparameter search as some of the most impactful levers for reducing energy consumption.
Once I started instrumenting my work, I was able to:
- Stop running wasteful experiments
- Justify model choices with cost + carbon + accuracy, not just leaderboard scores
- Make AI accessible to teams without infinite GPU budgets
It all starts with a few simple metrics.
Metrics I Now Log for Every ML Experiment
You don’t need to become an energy engineer. At the experiment level, these four metrics are enough:
- GPU/CPU time (resource usage)
- Energy (kWh)
- CO₂ emissions (kg)
- Cloud cost ($)
Here's how I define them in practice.
1. GPU time: The simplest “green” metric
For each experiment:
- gpu_hours = num_gpus × training_time_hours
- cpu_hours, if CPU-heavy (traditional ML, data preprocessing)
Most cloud providers expose runtime info; you can also derive it from training logs.
2. From time to energy (kWh)
Energy can be estimated as:
Energy (kWh) = Average Power (W) × Time (hours) / 1000
Example: GPU drawing ~300 W for 5 hours:
Energy ≈ 300 W × 5 h / 1000 = 1.5 kWh
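The two formulas so far fit in a couple of helper functions. This is a minimal sketch: the function names are my own, and the multi-GPU case assumes every GPU draws roughly the same average power, which is a simplification.

```python
def gpu_hours(num_gpus: int, training_time_hours: float) -> float:
    """GPU-hours = number of GPUs x wall-clock training time in hours."""
    return num_gpus * training_time_hours


def energy_kwh(avg_power_watts: float, hours: float) -> float:
    """Energy (kWh) = average power (W) x time (h) / 1000."""
    return avg_power_watts * hours / 1000.0


# The example from the text: one GPU drawing ~300 W for 5 hours.
single = energy_kwh(avg_power_watts=300, hours=5)

# Hypothetical multi-GPU run: 4 GPUs for 5 hours, each averaging ~300 W.
# Per-GPU average power times total GPU-hours gives total GPU energy.
hours_used = gpu_hours(num_gpus=4, training_time_hours=5)
multi = energy_kwh(avg_power_watts=300, hours=hours_used)

print(f"1 GPU: {single} kWh | 4 GPUs: {multi} kWh over {hours_used} GPU-hours")
```

This only covers GPU draw. CPU, RAM, and data center overhead would add to the total.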
You can measure or estimate power using:
- Data center telemetry/vendor tooling (e.g., NVIDIA DCGM)
- Open-source libraries like CodeCarbon, which convert usage to kWh
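If you have raw power readings rather than a single average, you can integrate them yourself. The sketch below assumes you already collected `(timestamp_seconds, power_watts)` pairs from some source, for example by polling `nvidia-smi --query-gpu=power.draw --format=csv` or from DCGM exports. The function name and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def energy_kwh_from_samples(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (timestamp_seconds, power_watts) samples into kWh.

    Uses the trapezoidal rule between consecutive samples, so irregular
    sampling intervals are handled. Samples must be sorted by time.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        joules += 0.5 * (p0 + p1) * (t1 - t0)  # watts x seconds = joules
    return joules / 3_600_000.0  # 1 kWh = 3.6e6 J


# Hypothetical readings: one sample per minute for an hour at a flat 300 W.
flat = [(60.0 * i, 300.0) for i in range(61)]
flat_kwh = energy_kwh_from_samples(flat)

# Hypothetical ramp: power rises linearly from 0 W to 1000 W over one hour.
ramp_kwh = energy_kwh_from_samples([(0.0, 0.0), (3600.0, 1000.0)])

print(f"flat: {flat_kwh:.3f} kWh, ramp: {ramp_kwh:.3f} kWh")
```

Sampling once a minute is coarse for bursty workloads. A shorter interval gives a better estimate at the cost of more logging.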
Validation studies on tools like CodeCarbon and other “ML impact calculators” show they’re not perfect for absolute values, but they’re good enough for comparative analysis:
"Experiment A vs. B - which one is cheaper and cleaner for similar performance?"
That's usually what I care about.
3. From energy to CO₂ (kgCO₂)
Multiply energy by your grid’s carbon intensity:
CO₂ (kg) ≈ Energy (kWh) × CI (kgCO₂/kWh)
Rough ranges:
- US average: 0.38–0.40 kgCO₂/kWh
- Hydropower-heavy grids (Norway): a few grams per kWh
- Coal-heavy grids: much higher
Tools like CodeCarbon embed region-specific intensities, so you often don’t need to calculate manually.
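When you do want to calculate it by hand, the conversion is a single multiplication. In this sketch the intensity table is a placeholder of my own: the US value is the midpoint of the range above, and the hydro value is an assumed stand-in for "a few grams per kWh". Swap in figures for your own region.

```python
# Illustrative carbon intensities in kgCO2 per kWh. Placeholders only,
# not authoritative figures for any specific grid.
CARBON_INTENSITY_KG_PER_KWH = {
    "us_average": 0.39,
    "hydro_heavy": 0.005,
}


def co2_kg(energy_kwh: float, region: str) -> float:
    """CO2 (kg) = energy (kWh) x carbon intensity (kgCO2/kWh)."""
    return energy_kwh * CARBON_INTENSITY_KG_PER_KWH[region]


# Same 1.5 kWh run, two different grids.
energy = 1.5
for region in CARBON_INTENSITY_KG_PER_KWH:
    print(f"{region}: {co2_kg(energy, region):.4f} kg CO2")
```

The same run differs by roughly two orders of magnitude between the two grids, which is why region matters as much as runtime.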
4. From time to dollars ($)
For cloud GPUs, the cost is straightforward:
Cost ($) = GPU_hours × price_per_GPU_hour
Typical on-demand costs:
- A100 80GB: ~$1–$3/hour
- H100: ~$3–$8/hour, depending on provider and discounts
Spot/discounted instances can be much cheaper, but also less reliable. Even rough cost tracking is enough to show that brute-force grid searches aren’t free.
Once you start tagging experiments with $, some "fun" ideas suddenly look much less fun.
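To make that concrete, here is a small sketch that prices a single run and then a brute-force grid. The $2.00 rate is an assumed figure inside the A100 range above, and the grid size and per-trial runtime are hypothetical.

```python
def run_cost_usd(num_gpus: int, hours: float, price_per_gpu_hour: float) -> float:
    """Cost ($) = GPU-hours x price per GPU-hour."""
    return num_gpus * hours * price_per_gpu_hour


def sweep_cost_usd(num_trials: int, num_gpus: int, hours_per_trial: float,
                   price_per_gpu_hour: float) -> float:
    """Total cost of a sweep where every trial runs to completion."""
    return num_trials * run_cost_usd(num_gpus, hours_per_trial, price_per_gpu_hour)


PRICE = 2.00  # assumed on-demand price in $/GPU-hour

one_run = run_cost_usd(num_gpus=1, hours=5, price_per_gpu_hour=PRICE)

# Hypothetical grid: 4 learning rates x 4 batch sizes x 4 dropout values.
grid_trials = 4 * 4 * 4
full_grid = sweep_cost_usd(grid_trials, num_gpus=1, hours_per_trial=5,
                           price_per_gpu_hour=PRICE)

print(f"one run: ${one_run:.2f}, {grid_trials}-trial grid: ${full_grid:.2f}")
```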
Tooling I Use: CodeCarbon and Friends
There's a small ecosystem of tools for estimating emissions from ML workloads:
- CodeCarbon: the one I tend to reach for. A Python library that estimates the power consumption of CPU/GPU/RAM, converts it into CO₂ emissions based on your location and grid intensity, and integrates with experiment trackers such as Comet.
- CarbonTracker, experiment-impact-tracker, Green Algorithms, and MLCO2: alternatives that have been reviewed in recent energy footprint studies of NLP and ML workloads.
Most of them follow the same pattern:
- Hook into your training script
- Sample hardware utilization and estimate power use
- Add up the total kWh and CO₂
- Write results to a log file, JSON, or experiment tracker
Here’s a minimal CodeCarbon integration similar to what I use.
Minimal Example: Logging Energy and CO₂ for a Training Run
Install CodeCarbon: pip install codecarbon
Wrap your training script:
```python
from codecarbon import EmissionsTracker
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset


def train_one_epoch(model, dataloader, optimizer, device):
    model.train()
    for batch in dataloader:
        inputs, targets = batch
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = torch.nn.functional.cross_entropy(outputs, targets)
        loss.backward()
        optimizer.step()


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = MyModel().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    train_loader = DataLoader(MyDataset("train"), batch_size=64, shuffle=True)

    # 1) Start tracker (set your region if you know it)
    # Note: country_iso_code is documented for OfflineEmissionsTracker;
    # check that your CodeCarbon version accepts it on EmissionsTracker.
    tracker = EmissionsTracker(
        project_name="resnet_baseline",
        country_iso_code="USA",  # or specific country/region
    )
    tracker.start()

    num_epochs = 10
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader, optimizer, device)
        print(f"Finished epoch {epoch+1}/{num_epochs}")

    # 2) Stop tracker and get emissions in kgCO2
    emissions_kg = tracker.stop()
    print(f"Estimated emissions: {emissions_kg:.4f} kg CO2")


if __name__ == "__main__":
    main()
```
What you get:
- Total energy (kWh) and CO₂ (kg) written to a CSV/JSON log
- Optional integration with Comet/MLflow dashboards if you configure it
In my experiment tracker, I typically store:
- experiment_id
- model_type, dataset, hyperparameters
- accuracy/metric
- gpu_hours, kWh, kgCO₂, cost_estimate
…in one place, so I can compare runs on both performance and efficiency.
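If you don't have a tracker set up yet, a flat CSV is enough to get started. The sketch below is one possible layout using only the standard library. The `ExperimentRecord` class, the `append_record` helper, and the sample values are all hypothetical, not part of any tracker's API.

```python
import csv
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ExperimentRecord:
    experiment_id: str
    model_type: str
    dataset: str
    hyperparameters: dict
    metric: float             # e.g. accuracy or F1
    gpu_hours: float
    energy_kwh: float
    emissions_kg: float
    cost_estimate_usd: float


def append_record(csv_path: str, record: ExperimentRecord) -> None:
    """Append one run to a CSV file, writing the header on first use."""
    row = asdict(record)
    # Store hyperparameters as a JSON string so they fit in one column.
    row["hyperparameters"] = json.dumps(row["hyperparameters"], sort_keys=True)
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Demo with made-up values, written to a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "experiments.csv")
append_record(log_path, ExperimentRecord(
    experiment_id="exp-001", model_type="cnn", dataset="demo-dataset",
    hyperparameters={"lr": 1e-3, "batch_size": 64},
    metric=0.91, gpu_hours=2.0, energy_kwh=0.60,
    emissions_kg=0.23, cost_estimate_usd=4.00,
))

with open(log_path, newline="") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["experiment_id"], rows[0]["cost_estimate_usd"])
```

Keeping efficiency fields in the same row as the metric is the point: any later comparison can sort or filter on both.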
Turning Metrics Into Decisions: Doing More With Less
Once you log energy and cost, you can stop arguing purely about accuracy and start comparing “accuracy per unit of cost/emissions”.
Say you run three models on the same task:
| Model | Metric (F1) | GPU-hours | kWh | CO₂ (kg) | Cost ($) |
|---|---|---|---|---|---|
| Baseline XGBoost | 0.89 | 0.2 | 0.05 | 0.02 | 0.20 |
| Medium CNN | 0.91 | 2.0 | 0.60 | 0.23 | 4.00 |
| Large Transformer | 0.92 | 12.0 | 3.90 | 1.50 | 24.00 |
(Illustrative numbers only.)
From an accuracy-only standpoint, the large transformer wins (0.92 vs 0.91). But with Green AI lenses on, I find myself asking:
- Is +0.01 F1 worth roughly 6× the energy and money of the medium model?
- If the medium CNN already meets the business requirements, is the large transformer actually justified?
This is the core mindset shift: go from "best metric at any cost" to "best trade-off of performance vs cost/emissions."
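One way to put numbers on that trade-off is to compare each model with the next-cheapest one and ask what the extra F1 costs. This is a sketch using the illustrative figures from the comparison above. The `marginal_tradeoffs` helper and the "dollars per F1 point" framing (one point = 0.01 F1) are my own assumptions, not a standard metric.

```python
runs = [
    {"model": "Baseline XGBoost", "f1": 0.89, "kwh": 0.05, "cost": 0.20},
    {"model": "Medium CNN", "f1": 0.91, "kwh": 0.60, "cost": 4.00},
    {"model": "Large Transformer", "f1": 0.92, "kwh": 3.90, "cost": 24.00},
]


def marginal_tradeoffs(runs):
    """Compare each run with the next-cheapest one, ordered by cost."""
    ordered = sorted(runs, key=lambda r: r["cost"])
    steps = []
    for prev, cur in zip(ordered, ordered[1:]):
        gain = cur["f1"] - prev["f1"]
        extra_cost = cur["cost"] - prev["cost"]
        steps.append({
            "from": prev["model"],
            "to": cur["model"],
            "f1_gain": round(gain, 4),
            "cost_ratio": round(cur["cost"] / prev["cost"], 2),
            "energy_ratio": round(cur["kwh"] / prev["kwh"], 2),
            # Extra dollars per 0.01 F1 gained; None if there is no gain.
            "usd_per_f1_point": round(extra_cost / (gain * 100), 2) if gain > 0 else None,
        })
    return steps


steps = marginal_tradeoffs(runs)
for step in steps:
    print(step)
```

With these numbers, each F1 point gained by moving from the medium CNN to the large transformer costs about ten times more than a point gained by moving from XGBoost to the CNN.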
Hyperparameter Search: The Silent Emitter
One of the biggest silent emitters I’ve seen in practice is hyperparameter search, especially:
- Massive grid searches over huge parameter ranges
- Blindly rerunning dozens or hundreds of trials for marginal gains
Energy-aware ML surveys consistently call this out as a primary source of wasted compute.
Patterns I've found useful:
- Search coarse-to-fine: spend a small initial budget to narrow down the promising region of the search space, then refine within it.
- Prefer random search or Bayesian optimization over huge grid searches.
- Use early stopping and learning curve extrapolation to kill bad trials quickly.
If you log energy per trial, you can literally answer:
- "What hyperparameters are driving most of our emissions?"
- "Did those extra 20 trials meaningfully improve the best score?"
Often, the answer is "no."
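The second question can be answered directly from a per-trial log. The sketch below assumes you have `(score, energy_kwh)` pairs in the order the trials ran. The function name, the `min_gain` threshold, and the trial data are hypothetical.

```python
def energy_after_last_gain(trials, min_gain=0.001):
    """Measure how much energy was spent after the last meaningful improvement.

    trials: list of (score, energy_kwh) tuples in the order they ran.
    A trial counts as an improvement only if it beats the current best
    by more than min_gain.
    Returns (best_meaningful_score, total_kwh, kwh_after_last_gain).
    """
    best = float("-inf")
    total = 0.0
    spent_at_last_gain = 0.0
    for score, kwh in trials:
        total += kwh
        if score > best + min_gain:
            best = score
            spent_at_last_gain = total
    return best, total, total - spent_at_last_gain


# Hypothetical sweep: six trials at 0.5 kWh each.
trials = [
    (0.85, 0.5), (0.88, 0.5), (0.90, 0.5),
    (0.9005, 0.5), (0.899, 0.5), (0.9008, 0.5),
]
best, total, wasted = energy_after_last_gain(trials)
print(f"best={best}, total={total} kWh, after last real gain={wasted} kWh")
```

In this made-up sweep, half of the energy went into trials that moved the best score by less than 0.001.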
A Practical Green AI Checklist
Here’s a checklist based on how I introduce Green AI into an existing workflow.
- Start measuring
  - Pick one tool (e.g., CodeCarbon) and wrap 1–2 training scripts.
  - Log energy_kwh, emissions_kg, and gpu_hours to your experiment tracker.
- Instrument cost
  - Hardcode or fetch your GPU hourly price.
  - Add a simple cost calculation: cost = gpu_hours × price_per_hour
- Baseline your current workloads
  - For your top 3 models, record accuracy + kWh + $ per run.
- Optimize low-hanging fruit
  - Replace brute-force grid search with smarter hyperparameter search.
  - Try a smaller architecture or distilled variant for at least one workflow.
- Set guardrails
  - Define a maximum budget per experiment (kWh or $).
  - Require a justification when proposing a new model that is ≥5× more expensive than the current baseline.
- Make it part of design reviews
  - When proposing a new model, always show: "metric, GPU-hours, kWh, cost."
  - Encourage teams to pick configurations on the Pareto frontier: best trade-offs of performance vs cost/emissions.
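The guardrail and Pareto items are easy to automate. Here is a minimal sketch of both, assuming each run is summarized by a single metric and a dollar cost. The function names, thresholds, and candidate runs are hypothetical.

```python
from typing import Dict, List, Optional


def check_guardrails(candidate_cost: float, baseline_cost: float, max_cost: float,
                     justification: Optional[str] = None,
                     ratio_threshold: float = 5.0) -> List[str]:
    """Return a list of guardrail violations (empty means OK)."""
    problems = []
    if candidate_cost > max_cost:
        problems.append("exceeds the per-experiment budget")
    too_expensive = baseline_cost > 0 and candidate_cost / baseline_cost >= ratio_threshold
    if too_expensive and not justification:
        problems.append(f">= {ratio_threshold:g}x baseline cost without a justification")
    return problems


def pareto_frontier(runs: List[Dict]) -> List[Dict]:
    """Keep runs that no other run beats on both metric and cost."""
    frontier = []
    for r in runs:
        dominated = any(
            o["metric"] >= r["metric"] and o["cost"] <= r["cost"]
            and (o["metric"] > r["metric"] or o["cost"] < r["cost"])
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return frontier


# Hypothetical candidates: C costs more than B but scores lower.
candidates = [
    {"name": "A", "metric": 0.89, "cost": 0.20},
    {"name": "B", "metric": 0.91, "cost": 4.00},
    {"name": "C", "metric": 0.90, "cost": 6.00},
    {"name": "D", "metric": 0.92, "cost": 24.00},
]
frontier_names = [r["name"] for r in pareto_frontier(candidates)]
violations = check_guardrails(candidate_cost=24.0, baseline_cost=4.0, max_cost=50.0)

print("Pareto frontier:", frontier_names)
print("Guardrail violations for D vs B:", violations)
```

A check like this can run in CI or in a review template, so the conversation starts from numbers rather than from a leaderboard score.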
Final Thoughts
For me, Green AI is not about banning large models or pretending GPUs are evil. It's about:
- Evaluating your model's actual consumption
- Comparing the performance of alternatives
- Designing workflows that provide good enough performance with far less energy and expense
As a data scientist, you already optimize for accuracy, generalization, and robustness. Adding GPU-hours, kWh, CO₂, and dollars to your metrics:
- Makes you a better engineer
- Gives you harder evidence when talking to stakeholders worried about budgets and sustainability
- And, yes, genuinely makes your systems greener
Once you see a run consume 10+ GPU-hours for a tiny metric bump, it becomes very natural to ask:
"Is this extra compute actually worth it?"
That question is where Green AI really starts :)
Opinions expressed by DZone contributors are their own.