Green AI in Practice: How I Track GPU Hours, Energy, CO₂, and Cost for Every ML Experiment

Practice Green AI by tracking GPU-hours, energy, and cost for every ML run, so you pick models that are not just accurate, but also cheaper, leaner, and greener.

Sai Teja Erukude

Feb. 13, 26 · Analysis


Most data teams track accuracy, latency, and maybe GPU utilization if someone is watching the dashboard. Almost no one tracks:

How many GPU-hours a model run consumed
How many kWh of electricity that implies
How much CO₂ and cloud spend are associated with each experiment

Once I started paying attention to these metrics, it completely changed how I design and run experiments.

For me, this is what Green AI really means: not just “AI that helps the climate,” but AI that is affordable, accessible, and efficient. Recent Green AI research treats energy, carbon, and efficiency as first-class concerns in ML research and practice rather than as an afterthought.

We live in a world where:

US data centers consume 4%+ of national electricity,
The electricity they draw is often more carbon-intensive than the grid average, and
A100/H100 GPUs still cost $1–$4+ per GPU-hour in the cloud.

GPUs are cheaper per FLOP than ever, but at scale, they’re still expensive. Ignoring that at the experiment level is a choice.

This article is my playbook for treating GPU hours, energy, carbon, and dollars as core ML metrics — right alongside accuracy and F1.

What to Expect

This is a practical guide for data scientists and ML engineers who want to:

Measure GPU time, energy use, and CO₂ per experiment
Translate those metrics into actual dollars
Use that information to choose better models, tune smarter, and ship greener systems

No philosophy — just concrete metrics, tools, and patterns I’ve actually used.

Why Green AI Is a Data Science Problem, Not Just DevOps

Much of “Green Software” and “Green DevOps” focuses on:

Efficient CI/CD pipelines
Cloud-based deployment architectures
Renewable energy and data center power sources

All of that matters. But as a data scientist/ML engineer, you decide:

How big your models are
How long you train them
How aggressive your hyperparameter search is
Whether you use a huge LLM or a compact model

I’ve seen these choices create 10–100× differences in energy and cost for the same business value.

Studies show that model selection, training strategy, and hyperparameter search are some of the most impactful ways to reduce energy consumption in Green AI.

Once I started instrumenting my work, I was able to:

Stop running wasteful experiments
Justify model choices with cost + carbon + accuracy, not just leaderboard scores
Make AI accessible to teams without infinite GPU budgets

It all starts with a few simple metrics.

Metrics I Now Log for Every ML Experiment

You don’t need to become an energy engineer. At the experiment level, these four metrics are enough:

GPU/CPU time (resource usage)
Energy (kWh)
CO₂ emissions (kg)
Cloud cost ($)

Here's how I define them in practice.

1. GPU time: The simplest “green” metric

For each experiment:

gpu_hours = num_gpus × training_time_hours
cpu_hours, tracked the same way, if the workload is CPU-heavy (traditional ML, data preprocessing)

Most cloud providers expose runtime info; you can also derive it from training logs.
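As a minimal sketch, assuming you can pull a start and end timestamp from your training logs, GPU-hours can be derived like this. The function name and the example timestamps are illustrative, not part of any library:

```python
from datetime import datetime


def gpu_hours(start: datetime, end: datetime, num_gpus: int) -> float:
    """GPU-hours = number of GPUs x wall-clock training time in hours."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    elapsed_hours = (end - start).total_seconds() / 3600.0
    return num_gpus * elapsed_hours


# Hypothetical run: 4 GPUs for 2.5 hours of wall-clock time.
start = datetime(2026, 1, 10, 9, 0, 0)
end = datetime(2026, 1, 10, 11, 30, 0)
run_gpu_hours = gpu_hours(start, end, num_gpus=4)
print(run_gpu_hours)  # 10.0
```

This counts allocated time, not busy time, which is usually what the cloud bill reflects.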

2. From time to energy (kWh)

Energy can be estimated as:

Energy (kWh) = Average Power (W) × Time (hours) / 1000

Example: GPU drawing ~300 W for 5 hours:

Energy ≈ 300 W × 5 h / 1000 = 1.5 kWh
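The same arithmetic as a small helper, in case you want it in a script. The `num_devices` parameter is an added assumption: it treats every device as drawing the same average power, which is only a rough approximation for multi-GPU runs.

```python
def energy_kwh(avg_power_watts: float, hours: float, num_devices: int = 1) -> float:
    """Energy (kWh) = average power (W) x time (h) x devices / 1000."""
    return avg_power_watts * hours * num_devices / 1000.0


# The worked example above: one GPU at ~300 W for 5 hours.
print(energy_kwh(300, 5))  # 1.5
```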

You can measure or estimate power using:

Data center telemetry/vendor tooling (e.g., NVIDIA DCGM)
Open-source libraries like CodeCarbon, which convert usage to kWh

Validation studies on tools like CodeCarbon and other “ML impact calculators” show they’re not perfect for absolute values, but they’re good enough for comparative analysis:

"Experiment A vs. B - which one is cheaper and cleaner for similar performance?"

That's usually what I care about.

3. From energy to CO₂ (kgCO₂)

Multiply energy by your grid’s carbon intensity:

CO₂ (kg) ≈ Energy (kWh) × CI (kgCO₂/kWh)

Rough ranges:

US average: 0.38–0.40 kgCO₂/kWh
Hydropower-heavy grids (e.g., Norway): on the order of 0.01–0.03 kgCO₂/kWh
Coal-heavy grids: much higher, often above 0.6 kgCO₂/kWh

Tools like CodeCarbon embed region-specific intensities, so you often don’t need to calculate manually.
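If you do want to calculate it by hand, a minimal sketch looks like the following. The intensity value is simply the midpoint of the US range quoted above and should be replaced with a figure for your own grid:

```python
# Illustrative carbon intensity in kgCO2/kWh; replace with your grid's value.
US_AVERAGE_KG_PER_KWH = 0.39  # midpoint of the 0.38-0.40 range quoted above


def co2_kg(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """CO2 (kg) = energy (kWh) x carbon intensity (kgCO2/kWh)."""
    return energy_kwh * intensity_kg_per_kwh


# The 1.5 kWh run from the energy example, on an average US grid.
print(round(co2_kg(1.5, US_AVERAGE_KG_PER_KWH), 3))  # 0.585
```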

4. From time to dollars ($)

For cloud GPUs, the cost is straightforward:

Cost ($) = GPU_hours × price_per_GPU_hour

Typical on-demand costs:

A100 80GB: ~$1–$3/hour
H100: ~$3–$8/hour, depending on provider and discounts

Spot/discounted instances can be much cheaper, but also less reliable. Even rough cost tracking is enough to show that brute-force grid searches aren’t free.

Once you start tagging experiments with $, some "fun" ideas suddenly look much less fun.
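To make the grid-search point concrete, here is a rough sketch that prices a full grid before you launch it. The grid, the 0.5 GPU-hours per trial, and the $2/hour price are all made-up inputs for illustration:

```python
from math import prod


def run_cost_usd(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Cost ($) = GPU-hours x price per GPU-hour."""
    return gpu_hours * price_per_gpu_hour


def grid_search_cost_usd(grid: dict, gpu_hours_per_trial: float,
                         price_per_gpu_hour: float) -> tuple:
    """Return (number of trials, total cost) for an exhaustive grid search."""
    num_trials = prod(len(values) for values in grid.values())
    total_gpu_hours = num_trials * gpu_hours_per_trial
    return num_trials, run_cost_usd(total_gpu_hours, price_per_gpu_hour)


# Hypothetical grid: 4 x 3 x 3 x 2 = 72 combinations.
grid = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [32, 64, 128],
    "dropout": [0.0, 0.1, 0.3],
    "weight_decay": [0.0, 0.01],
}
trials, cost = grid_search_cost_usd(grid, gpu_hours_per_trial=0.5,
                                    price_per_gpu_hour=2.0)
print(trials, cost)  # 72 72.0
```

Adding one more value to any axis multiplies the bill, which is easy to miss until the number is printed next to the grid.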

Tooling I Use: CodeCarbon and Friends

There's a small ecosystem of tools for estimating emissions from ML workloads:

CodeCarbon: the one I tend to reach for. It's a Python library that estimates the power consumption of CPU, GPU, and RAM, converts it into CO₂ emissions based on your location and grid intensity, and integrates with experiment trackers such as Comet.
CarbonTracker, experiment-impact-tracker, Green Algorithms, and MLCO2: alternatives that have been reviewed in recent energy footprint studies of NLP and ML workloads.

Most of them follow the same pattern:

Hook into your training script
Sample hardware utilization and estimate power draw
Add up the total kWh and CO₂
Write results to a log file, JSON, or experiment tracker
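The pattern above can be sketched in a few lines without any library. This is a toy illustration, not how CodeCarbon or any other tool is actually implemented: `SimplePowerTracker`, the `read_power_watts` callable, and the injected `clock` are all hypothetical names, and the power readings here are faked so the example runs anywhere.

```python
import json
import time


class SimplePowerTracker:
    """Toy tracker: collect (time, watts) samples, integrate to kWh, convert to CO2."""

    def __init__(self, read_power_watts, intensity_kg_per_kwh, clock=time.monotonic):
        self._read_power_watts = read_power_watts  # hypothetical power reader (W)
        self._intensity = intensity_kg_per_kwh
        self._clock = clock  # returns seconds; injectable for testing
        self._samples = []

    def sample(self):
        """Record the current time and power draw."""
        self._samples.append((self._clock(), self._read_power_watts()))

    def summary(self):
        """Trapezoidal integration of power over time, then unit conversion."""
        joules = 0.0
        for (t0, p0), (t1, p1) in zip(self._samples, self._samples[1:]):
            joules += 0.5 * (p0 + p1) * (t1 - t0)
        kwh = joules / 3_600_000.0  # 1 kWh = 3.6 million joules
        return {"energy_kwh": kwh, "emissions_kg": kwh * self._intensity}


# Fake readings: 200 W, 300 W, 400 W sampled at 0, 30, and 60 minutes.
fake_times = iter([0.0, 1800.0, 3600.0])
fake_watts = iter([200.0, 300.0, 400.0])
tracker = SimplePowerTracker(lambda: next(fake_watts), 0.39,
                             clock=lambda: next(fake_times))
for _ in range(3):
    tracker.sample()

result = tracker.summary()
print(json.dumps(result))  # last step of the pattern: write results somewhere
```

Real tools do the sampling on a background timer and read power from the hardware, but the bookkeeping is essentially this.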

Here’s a minimal CodeCarbon integration similar to what I use.

Minimal Example: Logging Energy and CO₂ for a Training Run

Install CodeCarbon: pip install codecarbon

Wrap your training script:

from codecarbon import EmissionsTracker
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset

def train_one_epoch(model, dataloader, optimizer, device):
    model.train()
    for batch in dataloader:
        inputs, targets = batch
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = torch.nn.functional.cross_entropy(outputs, targets)
        loss.backward()
        optimizer.step()

def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = MyModel().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    train_loader = DataLoader(MyDataset("train"), batch_size=64, shuffle=True)

    # 1) Start tracker. In online mode CodeCarbon infers your location;
    #    to pin a country, OfflineEmissionsTracker(country_iso_code="USA")
    #    is the offline alternative.
    tracker = EmissionsTracker(project_name="resnet_baseline")
    tracker.start()

    num_epochs = 10
    try:
        for epoch in range(num_epochs):
            train_one_epoch(model, train_loader, optimizer, device)
            print(f"Finished epoch {epoch+1}/{num_epochs}")
    finally:
        # 2) Stop tracker even if training fails; stop() returns kgCO2
        #    (it can return None if tracking did not work)
        emissions_kg = tracker.stop()

    if emissions_kg is not None:
        print(f"Estimated emissions: {emissions_kg:.4f} kg CO2")

if __name__ == "__main__":
    main()
  

What you get:

Total energy (kWh) and CO₂ (kg) written to a CSV log (emissions.csv by default)
Optional integration with Comet/MLflow dashboards if you configure it

In my experiment tracker, I typically store:

experiment_id
model_type, dataset, hyperparameters
accuracy/metric
gpu_hours, kWh, kgCO₂, cost_estimate

…in one place, so I can compare runs on both performance and efficiency.
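One possible shape for such a record, sketched with a dataclass and the standard `csv` module. The field names follow the list above; the class name, the JSON-encoded hyperparameters column, and all values are illustrative assumptions rather than a prescribed schema:

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class ExperimentRecord:
    experiment_id: str
    model_type: str
    dataset: str
    hyperparameters: str  # JSON-encoded so it fits in one CSV cell
    metric: float
    gpu_hours: float
    kwh: float
    kg_co2: float
    cost_estimate: float


def write_records(records, fileobj):
    """Write records as CSV with a header row."""
    writer = csv.DictWriter(
        fileobj, fieldnames=[f.name for f in fields(ExperimentRecord)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Hypothetical values throughout.
record = ExperimentRecord(
    experiment_id="exp-001",
    model_type="medium_cnn",
    dataset="example_dataset",
    hyperparameters=json.dumps({"lr": 1e-3, "batch_size": 64}),
    metric=0.91,
    gpu_hours=2.0,
    kwh=0.60,
    kg_co2=0.23,
    cost_estimate=4.00,
)

buffer = io.StringIO(newline="")  # swap for open("runs.csv", "w", newline="")
write_records([record], buffer)
print(buffer.getvalue())
```

A flat CSV is enough to start with; the same fields map directly onto the params and metrics of most experiment trackers.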

Turning Metrics Into Decisions: Doing More With Less

Once you log energy and cost, you can stop arguing purely about accuracy and start comparing “accuracy per unit of cost/emissions”.

Say you run three models on the same task:

Model	Metric (F1)	GPU-hours	kWh	CO₂ (kg)	Cost ($)
Baseline XGBoost	0.89	0.2	0.05	0.02	0.20
Medium CNN	0.91	2.0	0.60	0.23	4.00
Large Transformer	0.92	12.0	3.90	1.50	24.00

(Illustrative numbers only.)

From an accuracy-only standpoint, the large transformer wins (0.92 vs 0.91). But with Green AI lenses on, I find myself asking:

Is +0.01 F1 worth roughly 6× the energy and cost of the medium model?
If the medium CNN already meets the business requirements, is the large transformer actually justified?

This is the core mindset shift: go from "best metric at any cost" to "best trade-off of performance vs cost/emissions."
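Those two questions can be turned into a small script. This sketch reuses the illustrative numbers from the table above; the 0.90 F1 requirement and both helper functions are assumptions made up for the example:

```python
# Illustrative numbers copied from the table above.
models = [
    {"name": "Baseline XGBoost", "f1": 0.89, "kwh": 0.05, "cost": 0.20},
    {"name": "Medium CNN", "f1": 0.91, "kwh": 0.60, "cost": 4.00},
    {"name": "Large Transformer", "f1": 0.92, "kwh": 3.90, "cost": 24.00},
]


def cheapest_meeting_target(candidates, min_f1):
    """Cheapest model whose F1 meets the requirement, or None if none does."""
    eligible = [m for m in candidates if m["f1"] >= min_f1]
    return min(eligible, key=lambda m: m["cost"]) if eligible else None


def marginal_cost_per_f1_point(cheaper, pricier):
    """Extra dollars paid per +0.01 F1 when moving from cheaper to pricier."""
    f1_points_gained = (pricier["f1"] - cheaper["f1"]) * 100
    return (pricier["cost"] - cheaper["cost"]) / f1_points_gained


# Hypothetical business requirement: F1 of at least 0.90.
choice = cheapest_meeting_target(models, min_f1=0.90)
print(choice["name"])  # Medium CNN

# What each extra 0.01 F1 costs at each step up.
print(f"{marginal_cost_per_f1_point(models[0], models[1]):.2f}")  # 1.90
print(f"{marginal_cost_per_f1_point(models[1], models[2]):.2f}")  # 20.00
```

With these numbers, the last point of F1 costs about ten times as much as each of the previous two.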

Hyperparameter Search: The Silent Emitter

One of the biggest silent emitters I’ve seen in practice is hyperparameter search, especially:

Massive grid searches over huge parameter ranges
Blindly rerunning dozens or hundreds of trials for marginal gains

Energy-aware ML surveys consistently call this out as a primary source of wasted compute.

Patterns I've found useful:

Coarse-to-fine search: use a small initial budget to narrow down the promising region before spending more.
Random search or Bayesian optimization: prefer these over huge grid searches.
Early stopping and learning curve extrapolation: use them to kill bad trials quickly.

If you log energy per trial, you can literally answer:

"Which hyperparameters are driving most of our emissions?"
"Did those extra 20 trials meaningfully improve the best score?"

Often, the answer is "no."
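A rough way to check this from per-trial logs, assuming you have recorded a score and an energy figure for each trial in the order they ran. The `min_gain` threshold for what counts as a meaningful improvement is an arbitrary assumption, and the trial data is made up:

```python
def energy_after_last_gain(trials, min_gain=0.005):
    """trials: list of (score, kwh) tuples in execution order.

    Reports how much energy was spent after the last trial that improved
    on the previous meaningful best by at least min_gain.
    """
    reference_score = None
    kwh_at_last_gain = 0.0
    total_kwh = 0.0
    for score, kwh in trials:
        total_kwh += kwh
        if reference_score is None or score >= reference_score + min_gain:
            reference_score = score
            kwh_at_last_gain = total_kwh
    return {
        "score_at_last_gain": reference_score,
        "total_kwh": total_kwh,
        "kwh_after_last_gain": total_kwh - kwh_at_last_gain,
    }


# Hypothetical search: six trials at 0.5 kWh each.
trials = [(0.880, 0.5), (0.900, 0.5), (0.910, 0.5),
          (0.911, 0.5), (0.909, 0.5), (0.912, 0.5)]
report = energy_after_last_gain(trials)
print(report)
```

In this made-up search, half of the energy went into trials that never moved the score by a meaningful amount.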

A Practical Green AI Checklist

Here’s a checklist based on how I introduce Green AI into an existing workflow.

Start measuring
- Pick one tool (e.g., CodeCarbon) and wrap 1–2 training scripts.
- Log energy_kwh, emissions_kg, gpu_hours to your experiment tracker.
Instrument cost
- Hardcode or fetch your GPU hourly price.
- Add a simple cost calculation: cost = gpu_hours × price_per_hour
Baseline your current workloads
- For your top 3 models, record accuracy + kWh + $ per run
Optimize low-hanging fruit
- Replace brute-force grid search with smarter hyperparameter search
- Try a smaller architecture or distilled variant for at least one workflow
Set guardrails
- Define a maximum budget per experiment (kWh or $).
- Require a justification when proposing a new model that is ≥5× more expensive than the current baseline.
Make it part of design reviews
- When proposing a new model, always show: "metric, GPU-hours, kWh, cost."
- Encourage teams to pick configurations on the Pareto frontier: best trade-offs of performance vs cost/emissions.
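The last two checklist items can be sketched as two small helpers: one flags a candidate that is at least 5× more expensive than the baseline, the other filters runs down to the Pareto frontier. Function names and the run data are illustrative assumptions; cost is used as the efficiency axis here, but kWh or kgCO₂ would work the same way.

```python
def needs_justification(candidate_cost, baseline_cost, factor=5.0):
    """True if the candidate costs at least `factor` times the baseline."""
    return candidate_cost >= factor * baseline_cost


def pareto_frontier(runs):
    """Keep runs that no other run beats on both metric and cost.

    A run is dominated if another run has metric >= and cost <=,
    with at least one of the two strictly better.
    """
    frontier = []
    for a in runs:
        dominated = any(
            b["metric"] >= a["metric"]
            and b["cost"] <= a["cost"]
            and (b["metric"] > a["metric"] or b["cost"] < a["cost"])
            for b in runs
        )
        if not dominated:
            frontier.append(a)
    return frontier


# Hypothetical runs; "wide_cnn" is worse and pricier than "medium_cnn".
runs = [
    {"name": "xgboost", "metric": 0.89, "cost": 0.20},
    {"name": "medium_cnn", "metric": 0.91, "cost": 4.00},
    {"name": "wide_cnn", "metric": 0.90, "cost": 6.00},
    {"name": "large_transformer", "metric": 0.92, "cost": 24.00},
]
frontier_names = [run["name"] for run in pareto_frontier(runs)]
print(frontier_names)  # ['xgboost', 'medium_cnn', 'large_transformer']
print(needs_justification(24.00, 4.00))  # True
```

Being on the frontier does not make a run the right choice; it only means nothing else is both cheaper and better, so the remaining decision is a genuine trade-off.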

Final Thoughts

For me, Green AI is not about banning large models or pretending GPUs are evil. It's about:

Evaluating your model's actual consumption
Comparing the performance of alternatives
Designing workflows that provide good enough performance with far less energy and expense

As a data scientist, you already optimize for accuracy, generalization, and robustness. Adding GPU-hours, kWh, CO₂, and dollars to your metrics:

Makes you a better engineer
Gives you harder evidence when talking to stakeholders worried about budgets and sustainability
And, yes, genuinely makes your systems greener

Once you see a run consume 10+ GPU-hours for a tiny metric bump, it becomes very natural to ask:

"Is this extra compute actually worth it?"

That question is where Green AI really starts :)


Opinions expressed by DZone contributors are their own.
