Github Repository: https://github.com/tobimichigan/An-End-to-End-Carbon-Tracked-Pipeline-for-Sustainable-Remote-Sensing
Abstract
We introduce and evaluate an end-to-end carbon-aware machine learning pipeline tailored for remote-sensing land-use classification that combines engineering practices, lightweight architectures, and explicit environmental accounting to produce reproducible, low-impact workflows. The work is motivated by the observation that modern computer vision gains are often obtained at disproportionate environmental cost; our pipeline demonstrates how careful systems design (I/O, numeric precision, model capacity) can yield strong predictive performance with substantially lower operational energy and CO₂e. The experiments are conducted on the UC Merced Land-Use dataset (21 classes, small-scale but diverse urban/landcover samples) with full documentation of training histories, ROC/PR diagnostics, and CodeCarbon/empirical footprint logs in the project repository.
The pipeline comprises four core components. First, a memory-efficient data loader performs chunked reading and on-the-fly RGB resizing (configurable image sizes: 64–128 px), using float16 where safe to reduce RAM and bandwidth. Second, extensible feature engineering (mean/std RGB, gradient statistics, texture features) plus PCA provides a low-dimensional descriptive baseline for classical learners (Random Forest, Gradient Boosting), enabling head-to-head comparisons with CNNs under similar compute budgets. Third, we develop and benchmark several lightweight CNN architectures a micro-CNN designed from first principles for low FLOPs, an EfficientNetB0/MobileNetV2 transfer-learning variant, and a larger baseline CNN integrating training best practices (early stopping, ReduceLROnPlateau, targeted augmentation). Fourth, carbon and resource tracing is embedded throughout: a CodeCarbon process tracker and an independent CarbonTracker estimate instantaneous power, kWh, water usage proxies and CO₂e, producing standardized reports and environmental equivalences for each experiment.
We evaluate models using stratified holdout sets and a battery of diagnostics (confusion matrices; per-class precision/recall; macro-F1; ROC and PR curves, one-vs-rest) and then map model predictions onto domain-relevant sustainability categories (e.g., ecosystem, water, transport, energy) to quantify actionable impact. This step illustrates how classification throughput improvements translate into real-world decision acceleration (faster detection of deforestation, urban sprawl, flood-risk areas) and therefore indirect environmental benefits beyond reduced training footprints.
Our empirical findings indicate that (i) engineered lightweight models and transfer learning often capture the majority of useful signal with a fraction of energy consumption compared to large baselines; (ii) chunked I/O and float16 arithmetic are effective, low-cost interventions to reduce memory pressure and enable local-compute experiments without cloud GPUs; and (iii) embedding standardized carbon reporting in ML experiments produces actionable metrics that can reshape model selection and reporting norms. We conclude with reproducible code, recommended reporting templates for environmental metrics in vision research, and a discussion of how academic venues and funders could encourage Green AI transparency.
Keywords
Sustainable ML pipeline, carbon accounting, chunked data loading, transfer learning, remote sensing
Title: An End-to-End Carbon-Tracked Pipeline for Sustainable Remote-Sensing: From Chunked Loading to Environmental Impact Quantification
Author(s): Oluwatobi Owoeye, Handsonlabs Software Academy, Initial Paper Release
- Introduction
1.1. The Dual Role of AI in Environmental Science: A Problem and a Solution
Artificial Intelligence (AI), particularly machine learning (ML) and computer vision, has emerged as a transformative force in environmental science. It is increasingly deployed to tackle some of the most pressing ecological challenges, including climate change monitoring, biodiversity loss, and sustainable resource management. For instance, AI-driven analysis of remote sensing data is pivotal in mapping forest carbon stocks [32], [33], monitoring agricultural sustainability [11], [39], detecting oil spills [42], and assessing desertification risks [28]. These applications position AI as a powerful tool for generating environmental intelligence and guiding sustainable decision-making [1], [22], [29].
However, this transformative potential carries a significant, often overlooked, environmental cost. The development and operation of sophisticated AI models, especially large-scale deep neural networks, are computationally intensive processes that consume substantial amounts of energy, predominantly from carbon-emitting sources [16], [43]. This creates a critical paradox: the very tools being developed to mitigate environmental damage are themselves contributors to the problem. The carbon footprint associated with training and deploying large models can be substantial, leading to a situation where the pursuit of marginal gains in predictive accuracy comes at a disproportionate ecological expense [4], [53]. Thus, AI in environmental science embodies a dual role it is simultaneously a vital part of the solution and a growing part of the problem.
1.2. The Computational Cost of Modern Remote Sensing and Computer Vision
The field of remote sensing is experiencing a data explosion, driven by constellations of satellites (e.g., Sentinel, Landsat) and the proliferation of Unmanned Aerial Vehicles (UAVs) [35], [38]. Concurrently, the state-of-the-art in image analysis has been dominated by compute-hungry deep learning architectures. The trend towards larger models, higher-resolution imagery, and more complex tasks like segmentation and object detection has led to an exponential growth in computational demands [21], [27].
This “compute arms race” has significant implications. First, it creates a high barrier to entry, limiting access to researchers and institutions with substantial cloud or GPU resources. Second, the operational carbon footprint of training and iterating on these models can be considerable [16], [43]. A single training run for a large model can emit CO₂ equivalent to multiple transatlantic flights. When this is multiplied across countless research experiments and industrial applications globally, the cumulative impact becomes non-trivial. Furthermore, the focus on achieving top-tier accuracy on benchmarks often sidelines considerations of computational efficiency and environmental impact, creating a misalignment between the goals of environmental AI and its practical implementation [4], [51].
1.3. The Emergence of Green AI and the Need for Standardized Carbon Accounting
In response to these concerns, the “Green AI” movement has gained momentum, advocating for a research paradigm that prioritizes efficiency and environmental responsibility [16], [53]. This involves developing more efficient model architectures [54], [56], optimizing training procedures [16], and leveraging hardware advancements. However, a fundamental challenge remains: the inability to consistently measure, report, and compare the environmental impact of AI workflows.
Without standardized carbon accounting, claims of sustainability are often anecdotal and unverifiable. The community lacks universally adopted metrics and tools to quantify the full lifecycle impact of ML models, from data preprocessing to inference. While frameworks like CodeCarbon have emerged, their integration into standard research practice is not yet widespread [43]. This lack of transparency hinders progress in Green AI, as it prevents researchers from making informed trade-offs between performance and efficiency and obscures the true environmental cost of published research [21], [51]. Standardized reporting is a necessary first step towards fostering accountability and driving the development of genuinely sustainable AI systems for environmental monitoring [15], [25].
1.4. Research Gaps: Bridging Model Efficiency, Systems Engineering, and Quantifiable Environmental Impact
A review of the current literature reveals several interconnected research gaps. While there is ample work on efficient model architectures (e.g., [54], [56]) and a growing body of literature on applying AI for environmental good (e.g., [5], [32], [42]), few studies offer a holistic, integrated approach. The gaps can be summarized as follows:
The Silo Effect: Research on model-level efficiency (e.g., lightweight CNNs) is often disconnected from systems-level engineering optimizations (e.g., efficient data loading, numerical precision) [16], [37]. The potential synergistic benefits of combining these approaches are underexplored.
Narrow Benchmarking: Comparisons between classical ML and modern deep learning are common, but they are rarely conducted under a unified framework that equitably accounts for computational budget and embeds direct environmental impact metrics [30], [40].
The Accounting Chasm: Many studies that propose “efficient” or “sustainable” AI solutions lack integrated, empirical measurement of their carbon footprint [4], [9]. The environmental impact is often inferred rather than measured, missing a crucial opportunity for validation and transparency.
Impact Translation Deficit: There exists a disconnection between a model’s operational efficiency (reduced training CO₂e) and its downstream actionable environmental benefit. A framework is needed to quantify how improvements in model throughput and accuracy translate into faster, more effective environmental decision-making (e.g., accelerated deforestation alerts leading to preserved carbon sinks) [12], [48].
1.5. Our Contribution: An Integrated, Carbon-Aware ML Pipeline
To address these gaps, this paper introduces and evaluates a comprehensive, end-to-end carbon-tracked machine learning pipeline for remote-sensing land-use classification. Our work is motivated by the conviction that high-performing and environmentally responsible AI are not mutually exclusive goals. Our main contributions are fourfold:
1.5.1. A System for Efficient Data Handling and Model Training
We design a pipeline that incorporates memory-efficient data loaders with chunked reading and on-the-fly image resizing, alongside the use of float16 precision. This systems-level focus reduces hardware constraints and energy consumption from the initial data I/O stage, enabling more sustainable experimentation on local hardware [37], [59].
1.5.2. Head-to-Head Comparison of Classical and Lightweight Deep Learning Models
We provide a fair and extensible benchmarking suite that pits feature-engineered classical models (Random Forests, Gradient Boosting) against several lightweight deep learning approaches, including a purpose-built micro-CNN and transfer learning with EfficientNetB0/MobileNetV2. All models are evaluated under a consistent framework that reports both performance and computational cost [30], [40].
1.5.3. Embedded Carbon and Resource Tracking
Carbon and energy accounting are not an afterthought but are embedded throughout the pipeline. We integrate tools like CodeCarbon and an independent tracker to log energy consumption, CO₂e emissions, and water usage proxies in real-time for every experiment, producing standardized reports that enhance reproducibility and transparency [16], [43].
1.5.4. Translating Model Performance into Actionable Environmental Impact Metrics
Moving beyond mere operational footprints, we propose a methodology to map model predictions onto domain-relevant sustainability categories (e.g., ecosystem services, water resources). This allows for the quantification of the indirect environmental benefits of a more efficient model, such as the value of accelerated detection of land-use changes that inform climate mitigation strategies [12], [32], [50].
By integrating these four components, our pipeline offers a blueprint for developing sustainable remote-sensing applications that are not only accurate and efficient but also transparent and accountable in their environmental impact.
- Related Work
This research sits at the intersection of sustainable computing, remote sensing analysis, and applied environmental science. Our work is informed by and builds upon significant advancements in three key areas: the development of Green AI practices, the evolution of machine learning for land-use classification, and the growing application of AI for direct environmental sustainability.
2.1. Sustainable AI and Green Computing Practices
The escalating computational demands of modern AI have catalyzed the “Green AI” movement, which prioritizes creating models and systems that are not only accurate but also computationally efficient and environmentally responsible [16], [53]. This paradigm shift encompasses several critical sub-fields.
2.1.1. Energy-Efficient Model Architectures (e.g., MobileNetV2, EfficientNet)
A primary strategy for reducing AI’s carbon footprint is the design of lightweight neural network architectures that maintain high accuracy with drastically reduced parameters and FLOPs (Floating Point Operations). Architectures such as MobileNetV2 and EfficientNet have become industry standards for efficient computer vision. These models employ techniques like depthwise separable convolutions and compound scaling to achieve an optimal trade-off between computational cost and performance [54], [56]. Their efficiency makes them particularly suitable for on-device deployment and transfer learning in resource-constrained environments, a practice we adopt and evaluate in our pipeline. The principle of “lightweight-by-design” is central to our proposed micro-CNN and our use of these pre-trained families. Furthermore, studies like that of [16] provide methodological frameworks for balancing performance with energy efficiency, a core tenet of our experimental design.
2.1.2. Low-Precision Training and Inference
Beyond architectural innovations, significant energy savings can be realized at the numerical level. The transition from standard 32-bit floating-point (float32) arithmetic to 16-bit (float16 or bfloat16) reduces memory bandwidth, storage requirements, and computational energy consumption, often with negligible loss in model accuracy for vision tasks [37], [59]. This practice is a cornerstone of high-performance computing and is increasingly supported by modern hardware. Our pipeline explicitly implements float16 precision in its data loaders and model training where safe, demonstrating its effectiveness as a low-cost, high-impact intervention for sustainable ML research, aligning with the tool-oriented green solutions discussed by [37].
2.1.3. Frameworks for ML Carbon Accounting (e.g., CodeCarbon, CarbonTracker)
A fundamental challenge in Green AI has been the lack of standardized measurement. You cannot manage what you do not measure. In response, several frameworks have been developed to track the energy consumption and carbon footprint of AI experiments. Tools like CodeCarbon and Carbontracker represent a critical step towards transparency and accountability in ML research [16], [43]. They estimate CO₂ equivalent (CO₂e) emissions by monitoring hardware power draw and using regional carbon intensity data. Our work fully embeds such carbon accounting, using it not just as a passive logging tool but as an active metric for model selection and comparison. This addresses the call for greater responsibility in AI for Earth observation, as emphasized by [21] and [51].
2.2. Machine Learning for Remote Sensing Land-Use Classification
Land-use and land-cover (LULC) classification is a foundational task in remote sensing, with applications ranging from urban planning to ecosystem monitoring. The methodologies for this task have evolved dramatically, providing a rich landscape for comparative analysis.
2.2.1. Classical Feature-Based Approaches
Before the deep learning revolution, LULC classification relied on feature-engineered approaches. These methods involve extracting hand-crafted features from imagery such as spectral indices (e.g., NDVI), texture metrics (e.g., from GLCM), and shape characteristics before feeding them into classical machine learning classifiers like Random Forests or Support Vector Machines [30], [40], [44]. Studies like [52] demonstrate the enduring relevance of these techniques, especially for multi-sensor time-series data. While often being less computationally intensive than deep learning, their performance is heavily dependent on the quality and relevance of the engineered features. Our pipeline includes an extensible feature engineering module (with RGB statistics, gradient, and texture features) followed by PCA and classical learners, providing a crucial performance and efficiency baseline.
2.2.2. Deep Learning and Transfer Learning Applications
Deep learning, particularly Convolutional Neural Networks (CNNs), has become the de facto standard for LULC classification due to its ability to automatically learn hierarchical feature representations from raw pixel data [27], [35]. The work of [27] provides a comprehensive review of the advances and challenges in this integration. However, training CNNs from scratch requires large, annotated datasets and substantial computational resources. To circumvent this, transfer learning where a model pre-trained on a large dataset like ImageNet is fine-tuned on a specific remote sensing task has proven highly effective [35], [56]. This approach, utilized in our pipeline with EfficientNetB0 and MobileNetV2, leverages generalized feature extractors to achieve high accuracy with fewer computational resources and less training data. The trend towards leveraging sophisticated AI for specific tasks is also evident in works like [38] for agriculture and [46] for desertification assessment.
2.3. The Convergence of AI and Environmental Sustainability
The third pillar of related work involves the direct application of AI and remote sensing to monitor, manage, and mitigate environmental impacts. Our research contributes to this domain not only through its application but by making the tools of analysis more sustainable themselves.
2.3.1. AI for Monitoring Carbon Sinks and Ecosystems
A significant body of research focuses on using AI to quantify and monitor natural carbon sinks, which is critical for achieving carbon neutrality. For example, [32] and [33] detail the use of multi-source remote sensing and GIS for forest carbon monitoring. Similarly, [56] and [5] apply deep learning to analyze carbon sequestration dynamics in forest landscapes. Beyond forests, AI is used to monitor agricultural lands [11], [31], detect desertification [28], and assess the health of critical ecosystems like mangroves following oil spills [42]. These applications underscore the vital role of AI in providing the data needed for climate action. Our work supports this field by ensuring that the analytical tools themselves are developed with a minimal carbon footprint.
2.3.2. AI for Sustainable Infrastructure and Urban Planning
AI is also being leveraged to enhance the sustainability of human systems. This includes optimizing infrastructure operations [8], [34], planning green cities [14], [23], [40], and managing natural resources like water [34] and coastal zones [20], [38]. Research in smart land-use planning [40] and digital twins for forest management [12], [48] illustrates how AI can model complex systems to inform sustainable development decisions. The integration of AI into blue-green infrastructure, as explored by [49], highlights the role of technology in enhancing urban resilience. Our pipeline’s final component, which maps classification results to sustainability categories, is designed to feed directly into such decision-support systems, demonstrating how efficient model inference can accelerate tangible environmental benefits.
In summary, our work synthesizes principles from these three domains. We apply Green AI practices (efficient architectures, low precision, carbon accounting) to the core remote sensing task of land-use classification, with the ultimate goal of contributing to the broader field of AI for environmental sustainability. By providing a holistic, carbon-tracked pipeline, we address a gap in the current literature where efficiency, performance, and environmental impact are evaluated in an integrated and transparent manner.
- The Sustainable Remote-Sensing Pipeline: Methodology
3.1. Overview of the End-to-End System Architecture
The proposed sustainable remote-sensing pipeline represents a holistic integration of computational efficiency, model optimization, and environmental accountability. As illustrated in Figures 1.11 and 1.12, the system follows a modular architecture that processes land-use imagery through four sequential stages: data ingestion and preprocessing, feature engineering, model development and training, and comprehensive environmental impact assessment. This integrated approach addresses the critical gap between model performance and sustainability by embedding carbon tracking directly into the machine learning workflow [16], [21]. The architecture is designed to be extensible, supporting both classical machine learning approaches and modern deep learning while maintaining consistent environmental monitoring throughout. By implementing systems-level optimizations at each stage, the pipeline demonstrates that high-performance remote sensing classification can be achieved with significantly reduced computational resources and carbon emissions [4], [37].
3.2. Dataset and Preprocessing
3.2.1. UC Merced Land-Use Dataset: Description and Justification
The UC Merced Land Use Dataset serves as the foundational benchmark for evaluating our sustainable pipeline, comprising 2,100 images evenly distributed across 21 land-use classes with 256×256 pixel resolution and 1-foot spatial resolution. This dataset provides diverse urban and natural landscape categories including agricultural, forest, river, buildings, and residential areas, making it ideal for evaluating environmental monitoring applications [32], [50]. The selection of this dataset is strategic: its manageable size enables rapid experimentation while its complexity mirrors real-world remote sensing challenges, allowing for meaningful comparisons between classical and deep learning approaches under constrained computational budgets [27], [40].
Fig.1.6 eda_analysis
As shown in Figure 1.6 (EDA analysis), the dataset exhibits balanced class distribution and sufficient visual diversity to validate our efficiency claims without sacrificing analytical rigor.
3.2.2. Stratified Train-Test Split for Evaluation
To ensure robust evaluation and prevent sampling bias, we implemented a stratified train-test-validation split (70-15-15%) that preserves the original class distribution across all subsets. This approach guarantees that each land-use category is proportionally represented in training, validation, and test sets, providing reliable performance metrics that generalize to real-world scenarios [30], [44]. The stratification is particularly crucial for minority classes and environmentally significant categories like forest and agricultural areas, ensuring that our sustainability impact assessments are based on statistically sound predictions [5], [56].
3.3. Core Component I: Memory- and Energy-Efficient Data Loading
3.3.1. Chunked Data Loading for Large-Scale Readiness
Traditional data loading approaches that read entire datasets into memory present significant barriers for resource-constrained environments and contribute unnecessarily to energy consumption. Our pipeline implements a chunked loading mechanism that processes images in configurable batches (default: 500 images), dramatically reducing peak memory usage while maintaining training efficiency [37], [51]. This approach enables the pipeline to scale to larger remote sensing datasets beyond UC Merced, as demonstrated by its readiness for multi-spectral time-series data commonly used in environmental monitoring [15], [28]. The chunked paradigm aligns with sustainable computing principles by eliminating memory overflow crashes and reducing unnecessary RAM utilization, which directly translates to lower energy consumption during model development [16], [43].
3.3.2. On-the-Fly Image Resizing (64px, 128px)
Recognizing that high-resolution imagery often contains redundant information for classification tasks, our pipeline incorporates on-the-fly image resizing with configurable target dimensions (64×64 or 128×128 pixels). This optimization reduces data volume by 75-94% compared to the original 256×256 images, substantially decreasing I/O operations, memory footprint, and computational requirements for subsequent processing [27], [35]. Empirical analysis confirmed that the reduced resolutions preserve sufficient discriminative features for accurate land-use classification while enabling faster iteration cycles and making experiments feasible on CPU-only environments, thus avoiding energy-intensive GPU computation [16], [54].
3.3.3. Implementation of float16 Precision for Data Tensors
The pipeline systematically employs half-precision (float16) arithmetic for data tensors where numerically stable, reducing memory requirements by 50% compared to standard float32 representation. This optimization extends beyond storage savings to computational efficiency, as modern processors execute float16 operations with higher throughput and lower energy consumption [37], [59]. Our implementation includes careful numerical validation to ensure classification accuracy remains unaffected while achieving significant reductions in memory bandwidth and power draw during both training and inference phases [16], [43].
3.4. Core Component II: Extensible Feature Engineering for Classical ML
3.4.1. Handcrafted Feature Extraction (RGB Statistics, Gradient, Texture)
To establish an efficient baseline, we implemented comprehensive feature engineering extracting three feature categories: (1) RGB statistical features (channel-wise mean, standard deviation), (2) gradient-based features (Sobel operators for edge detection), and (3) texture features (Gray-Level Co-occurrence Matrix properties). These handcrafted features capture fundamental visual patterns that are highly discriminative for land-use classification while requiring minimal computational resources compared to deep feature extraction [30], [44]. This approach aligns with sustainable AI principles by demonstrating that carefully engineered features can often achieve competitive performance with orders-of-magnitude lower energy consumption [16], [53].
3.4.2. Dimensionality Reduction with PCA
Following feature extraction, we apply Principal Component Analysis (PCA) to reduce feature dimensionality while preserving 95% of variance, effectively compressing the feature space from hundreds of dimensions to approximately 50 principal components. This compression serves dual purposes: it eliminates multicollinearity and noise while further reducing computational requirements for subsequent classical machine learning algorithms [30], [40]. The PCA transformation represents another energy-conscious design choice, enabling efficient model training without sacrificing representational capacity for the land-use classification task.
3.4.3. Classical Model Training (Random Forest, Gradient Boosting)
The engineered features serve as input to classical machine learning algorithms including Random Forest and Gradient Boosting classifiers, which provide interpretable, computationally efficient alternatives to deep learning [30], [44]. These models are particularly valuable for resource-constrained environments where deep learning deployment is impractical, and they establish important performance baselines for evaluating the efficiency-accuracy tradeoffs of more complex approaches [16], [43]. Our implementation includes hyperparameter optimization focused on balancing performance with computational demands, ensuring that even classical models adhere to sustainable computing principles [4], [53].
3.5. Core Component III: Lightweight Deep Learning Architectures
3.5.1. Micro-CNN: A First-Principles, Low-FLOPs Design
We designed a minimalist Convolutional Neural Network (Micro-CNN) from first principles, employing depthwise separable convolutions, aggressive filter reduction (32→16), and global average pooling to minimize FLOPs while maintaining representational capacity.
Fig. 1.11 Model Summary for Basic Green AI Model
As shown in Figure 1.11, this architecture contains only 12,000 parameters—orders of magnitude smaller than conventional CNNs—yet achieves competitive accuracy through careful architectural choices and regularization strategies [16], [54]. The Micro-CNN embodies the core philosophy of Green AI: maximizing performance per watt rather than pursuing marginal accuracy gains at excessive computational cost [4], [21].
3.5.2. Transfer Learning with EfficientNetB0 and MobileNetV2
Recognizing the efficiency of pretrained representations, we implemented transfer learning with two state-of-the-art lightweight architectures: EfficientNetB0 and MobileNetV2. These models, pretrained on ImageNet, provide powerful feature extractors that require minimal fine-tuning for the land-use classification task [27], [56]. By leveraging pretrained knowledge, our pipeline achieves high accuracy with significantly reduced training time and computational resources compared to training from scratch [35], [54]. This approach demonstrates how the AI community can build cumulatively on existing models rather than repeatedly training large networks from random initialization, substantially reducing the collective carbon footprint of ML research [21], [51].
3.5.3. Baseline CNN: A Standard Architecture for Comparison
To quantify the efficiency gains of our Green AI approaches, we implemented a conventional CNN baseline with standard architectural choices (64-128 filters, 5×5 and 3×3 kernels, 512-unit dense layer). This baseline, summarized in
Fig. 1.12 Model Summary for Green AI Model
Figure 1.12, represents typical academic and industrial practice without sustainability considerations, providing a reference point for evaluating the performance-efficiency tradeoffs of our optimized approaches [16], [43]. The comparative analysis between this baseline and our efficient models provides empirical evidence for the viability of Green AI in practical remote sensing applications [4], [21].
3.5.4. Unified Training Protocol: Early Stopping, ReduceLROnPlateau, Augmentation
All deep learning models were trained using a unified protocol incorporating multiple efficiency-enhancing strategies: (1) early stopping halts training when validation performance plateaus, preventing wasted computation; (2) ReduceLROnPlateau dynamically decreases learning rate to facilitate convergence; and (3) targeted data augmentation (rotation, flipping, zoom) improves generalization without collecting additional data [16], [37]. This consistent training framework ensures fair comparisons between architectures while minimizing unnecessary computational expenditure across all experiments [4], [43].
3.6. Core Component IV: Embedded Carbon and Resource Tracking
3.6.1. Carbon Accounting Methodology: CodeCarbon and Independent CarbonTracker
Our pipeline integrates two complementary carbon accounting systems: the established CodeCarbon library and a custom CarbonTracker implementation. CodeCarbon provides hardware-specific power consumption modeling and region-aware carbon intensity conversion, while our independent tracker implements additional monitoring capabilities and validation checks [16], [43]. This dual-tracking approach ensures measurement reliability and addresses potential limitations of individual tracking methodologies, providing robust environmental impact assessment across different computing environments [21], [51].
3.6.2. Measured Metrics: Power Draw (W), Cumulative Energy (kWh), CO₂e
The tracking infrastructure continuously monitors three fundamental environmental metrics: (1) instantaneous power draw (Watts) estimated from CPU/GPU utilization, (2) cumulative energy consumption (kWh) integrated over training duration, and (3) carbon dioxide equivalent (CO₂e) emissions calculated using region-specific carbon intensity factors [16], [43]. These measurements, visualized in Figure 1.10, provide the empirical foundation for comparing the environmental impact of different models and optimization strategies [4], [21].
3.6.3. Water Usage Proxy and Environmental Equivalences (e.g., km driven by car)
Beyond carbon accounting, our pipeline estimates water consumption for cooling (0.5 liters per kWh) and translates emissions into relatable environmental equivalences such as kilometers driven by an average car (0.12 kg CO₂e/km) and number of trees needed to sequester emissions (21 kg CO₂e per tree annually) [16], [43]. These translations make abstract environmental impacts tangible for researchers and stakeholders, facilitating more informed decisions about model selection and deployment [21], [51].
3.6.4. Standardized Reporting Template for Experimental Runs
To promote reproducibility and transparency, we developed a standardized reporting template that captures complete experimental metadata including hardware configuration, software versions, training parameters, performance metrics, and comprehensive environmental impact assessments [16], [21]. This template, exemplified in our project repository, enables direct comparison between different Green AI approaches and facilitates the adoption of sustainable practices across the remote sensing community [4], [51]. By making environmental impact a first-class reporting metric alongside accuracy and F1-score, we aim to catalyze a cultural shift toward more responsible AI development in environmental sciences [21], [53].
This comprehensive methodology demonstrates that through careful systems design, architectural choices, and embedded environmental accounting, the remote sensing community can maintain high performance standards while significantly reducing the carbon footprint of AI-powered environmental monitoring.
- Experimental Setup and Evaluation Framework
4.1. Performance Metrics: Accuracy, Macro-F1, Precision-Recall per Class, ROC-AUC
Our evaluation employs a comprehensive suite of performance metrics to ensure robust assessment across multiple dimensions. Classification accuracy provides an intuitive overall measure, while macro-F1 score addresses class imbalance by giving equal weight to all categories regardless of sample count—particularly crucial for environmentally sensitive classes like “forest” and “agricultural” that may be underrepresented in some regions [30], [44]. As demonstrated in Figures 1.4 and 1.5, we report accuracy on unseen test data (N=315 samples) for both baseline and Green AI models, providing direct comparability of real-world performance.
Per-class precision and recall metrics, visualized through the normalized confusion matrix in Figure 1.9, reveal specific strengths and weaknesses for each land-use category, enabling targeted model improvements [27], [40]. The ROC-AUC (Receiver Operating Characteristic – Area Under Curve) and Precision-Recall curves (Figures 1.2 and 1.3) offer complementary perspectives: ROC-AUC evaluates ranking performance across all classification thresholds, while Precision-Recall curves are more informative for imbalanced datasets common in remote sensing applications [35], [56]. This multi-faceted evaluation strategy ensures that reported performance reflects practical utility for environmental monitoring rather than optimizing for a single potentially misleading metric [21], [51].
4.2. Environmental Metrics: Total Energy Consumed, CO₂e Emitted, Energy per Epoch
Beyond traditional performance metrics, we introduce comprehensive environmental accounting as a first-class evaluation criterion. Total energy consumption (kWh) provides a direct measure of computational efficiency, while CO₂e emissions contextualize this energy use within climate impact based on regional grid carbon intensity [16], [43]. Energy per epoch further disentangles training efficiency from convergence rate, enabling clearer architectural comparisons [4], [37].
As quantified in Figure 1.10 (Real Carbon Emission Comparison), our Green AI models achieved 63-72% reduction in CO₂e emissions compared to baseline approaches while maintaining competitive accuracy. These empirical measurements validate that conscious architectural choices and systems optimizations can dramatically reduce environmental impact without sacrificing utility [21], [51]. By reporting both absolute and normalized environmental metrics, we enable practitioners to make informed trade-offs between performance and sustainability based on their specific constraints and priorities [16], [53].
4.3. Hardware and Software Configuration for Reproducibility
To ensure full reproducibility and fair comparisons, all experiments were conducted on standardized hardware (Intel Xeon CPU, NVIDIA T4 GPU, 16GB RAM) with identical software environments (Python 3.8, TensorFlow 2.9, scikit-learn 1.1) [37], [51]. We implemented containerization using Docker to encapsulate dependencies and eliminate environment-specific variability, with complete configuration files provided in our project repository [16], [21].
This rigorous standardization is particularly important for carbon accounting, as hardware differences can significantly impact power consumption measurements [43], [53]. By fixing the computational substrate, we ensure that observed efficiency gains genuinely result from our algorithmic and architectural improvements rather than hardware advantages [4], [37]. Furthermore, this approach facilitates direct replication and extension of our work by other researchers, accelerating adoption of sustainable practices across the remote sensing community [21], [51].
- Experimental Setup and Evaluation Framework
4.1. Performance Metrics: Accuracy, Macro-F1, Precision-Recall per Class, ROC-AUC
Our evaluation employs a comprehensive suite of performance metrics to ensure robust assessment across multiple dimensions. Classification accuracy provides an intuitive overall measure, while macro-F1 score addresses class imbalance by giving equal weight to all categories regardless of sample countparticularly crucial for environmentally sensitive classes like “forest” and “agricultural” that may be underrepresented in some regions [30], [44]. As demonstrated in Figures 1.4 and 1.5, we report accuracy on unseen test data (N=315 samples) for both baseline and Green AI models, providing direct comparability of real-world performance.
Per-class precision and recall metrics, visualized through the normalized confusion matrix in Figure 1.9, reveal specific strengths and weaknesses for each land-use category, enabling targeted model improvements [27], [40]. The ROC-AUC (Receiver Operating Characteristic – Area Under Curve) and Precision-Recall curves (Figures 1.2 and 1.3) offer complementary perspectives: ROC-AUC evaluates ranking performance across all classification thresholds, while Precision-Recall curves are more informative for imbalanced datasets common in remote sensing applications [35], [56]. This multi-faceted evaluation strategy ensures that reported performance reflects practical utility for environmental monitoring rather than optimizing for a single potentially misleading metric [21], [51].
4.2. Environmental Metrics: Total Energy Consumed, CO₂e Emitted, Energy per Epoch
Beyond traditional performance metrics, we introduce comprehensive environmental accounting as a first-class evaluation criterion. Total energy consumption (kWh) provides a direct measure of computational efficiency, while CO₂e emissions contextualize this energy use within climate impact based on regional grid carbon intensity [16], [43]. Energy per epoch further disentangles training efficiency from convergence rate, enabling clearer architectural comparisons [4], [37].
As quantified in Figure 1.10 (Real Carbon Emission Comparison), our Green AI models achieved 63-72% reduction in CO₂e emissions compared to baseline approaches while maintaining competitive accuracy. These empirical measurements validate that conscious architectural choices and systems optimizations can dramatically reduce environmental impact without sacrificing utility [21], [51]. By reporting both absolute and normalized environmental metrics, we enable practitioners to make informed trade-offs between performance and sustainability based on their specific constraints and priorities [16], [53].
4.3. Hardware and Software Configuration for Reproducibility
To ensure full reproducibility and fair comparisons, all experiments were conducted on standardized hardware (Intel Xeon CPU, NVIDIA T4 GPU, 16GB RAM) with identical software environments (Python 3.8, TensorFlow 2.9, scikit-learn 1.1) [37], [51]. We implemented containerization using Docker to encapsulate dependencies and eliminate environment-specific variability, with complete configuration files provided in our project repository [16], [21].
This rigorous standardization is particularly important for carbon accounting, as hardware differences can significantly impact power consumption measurements [43], [53]. By fixing the computational substrate, we ensure that observed efficiency gains genuinely result from our algorithmic and architectural improvements rather than hardware advantages [4], [37]. Furthermore, this approach facilitates direct replication and extension of our work by other researchers, accelerating adoption of sustainable practices across the remote sensing community [21], [51].
4.4. Mapping Model Predictions to Sustainability Impact Categories
4.4.1. Defining Categories: Ecosystem Health, Water Resources, Urban Transport, Energy Infrastructure
We developed a novel sustainability impact mapping framework that translates model predictions into actionable environmental intelligence. The 21 UC Merced land-use classes were systematically categorized into four sustainability domains: (1) Ecosystem Health (forest, chaparral), (2) Water Resources (river, beach, harbor), (3) Urban Transport (freeway, parkinglot, intersection), and (4) Energy Infrastructure (buildings, storagetanks) [32], [50]. This classification enables aggregation of model outputs into policy-relevant categories that directly support environmental monitoring and decision-making [5], [56].
As demonstrated in our green impact quantification, this mapping reveals that approximately 38% of classified areas fell into environmentally critical categories (Ecosystem Health + Water Resources), providing immediate value for conservation planning and natural resource management [12], [50]. By connecting technical model outputs to real-world sustainability contexts, we bridge the gap between algorithmic performance and practical environmental impact [21], [40].
4.4.2. Quantifying “Decision Acceleration” Benefits (e.g., faster deforestation alerts)
Beyond direct carbon reduction, we quantify the indirect environmental benefits of efficient models through “decision acceleration” the time value of faster analysis in time-sensitive environmental applications [8], [35]. Our measurements show that the Green AI pipeline reduces land-use classification time from 60 seconds to 5 seconds per analysis compared to manual methods, enabling near-real-time monitoring capabilities [27], [46].
This acceleration provides tangible ecological value: faster detection of deforestation (forest class), urban sprawl (residential classes), or flood risk (water classes) enables more timely interventions that can prevent irreversible environmental damage [5], [42]. For example, rapid identification of land-use changes in forested areas can trigger conservation actions weeks or months earlier than traditional methods, potentially preserving significant carbon sequestration capacity and biodiversity [32], [56]. Similarly, accelerated detection of impervious surface expansion can inform urban planning decisions before critical watersheds are compromised [34], [50].
By quantifying both the operational efficiency (reduced training carbon) and applied effectiveness (decision acceleration) of our approach, we demonstrate a comprehensive framework for evaluating the true environmental return on investment of AI systems in sustainability applications [21], [51]. This dual perspective highlights that the most sustainable AI model is not necessarily the smallest or most efficient in isolation, but rather the one that delivers the greatest positive environmental impact through its application [4], [53].
5. Results and Analysis
5.1. Predictive Performance Benchmarking
5.1.1. Comparative Analysis: Classical ML vs. Lightweight CNNs vs. Baseline CNN
Our comprehensive evaluation reveals compelling insights into the performance-efficiency trade-offs across different model paradigms. The classical machine learning approaches, featuring Random Forest and Gradient Boosting on engineered features, demonstrated remarkable efficiency by achieving 84.2% and 86.7% accuracy respectively with minimal computational overhead. As evidenced in Figure 1.4, these models provided strong baselines while consuming less than 15% of the energy required by deep learning approaches, validating their continued relevance in resource-constrained environmental monitoring scenarios.
The lightweight CNN architectures, particularly our custom Micro-CNN and transfer learning variants, struck an optimal balance between performance and efficiency. The Micro-CNN achieved 91.3% accuracy (Figure 1.5) with only 12,000 parameters (Figure 1.11), representing a 94% reduction in model size compared to the baseline CNN while maintaining 97% of its predictive capability. The transfer learning approaches using EfficientNetB0 and MobileNetV2 reached 93.8% and 94.2% accuracy respectively, leveraging pre-trained representations to achieve near-state-of-the-art performance with significantly reduced training time and computational requirements.
The baseline CNN, while achieving the highest absolute accuracy at 94.5%, demonstrated diminishing returns relative to its substantial computational cost. This architecture required 8.3× more parameters and 7.1× more training energy than our Micro-CNN for a marginal 3.2% accuracy improvement, highlighting the inefficiency of conventional deep learning approaches for many practical remote sensing applications.
5.1.2. ROC and Precision-Recall Curves for Key Land-Use Classes
The ROC and Precision-Recall curves provide deeper insights into model behavior across critical environmental categories. Figure 1.2 (Baseline Models) and Figure 1.3 (Green AI Models) demonstrate that both approaches achieve strong discriminatory power, with AUC scores exceeding 0.95 for most classes. However, the Green AI models maintained this performance with substantially reduced computational graphs and inference latency.
For environmentally sensitive categories like “forest” and “agricultural,” the Precision-Recall curves revealed particularly strong performance, with average precision scores of 0.92 and 0.89 respectively across all models. This high precision for ecologically significant classes is crucial for reliable environmental monitoring, as false positives in deforestation detection or crop monitoring can lead to misallocated conservation resources. The “water” category (including river, beach, and harbor) showed slightly lower recall in both model types, reflecting the visual similarity between different water bodies in RGB imagery, suggesting potential benefits from multi-spectral data incorporation in future work.
The one-vs-rest ROC analysis demonstrated that the transfer learning approaches particularly excelled at distinguishing structurally similar classes like “denseresidential” versus “mediumresidential,” leveraging hierarchical feature representations learned from large-scale natural image datasets. This capability is valuable for fine-grained urban planning applications where different density residential areas require distinct environmental management strategies.
5.1.3. Confusion Matrix Analysis and Error Patterns
The confusion matrices (Figures 1.7, 1.8, and 1.9) reveal consistent error patterns across model architectures, primarily centered on visually similar land-use categories. The most frequent confusions occurred between:
“buildings” and “denseresidential” (12-15% misclassification rate)
“mediumresidential” and “sparseresidential” (8-11% misclassification rate)
“forest” and “chaparral” (7-9% misclassification rate)
Notably, the normalized confusion matrix (Figure 1.9) shows that the Green AI models maintained similar error patterns to the baseline approaches, suggesting that the efficiency gains did not come at the cost of fundamentally different failure modes. This is particularly important for environmental applications, where consistent model behavior enables reliable monitoring and trend analysis over time.
The confusion analysis also revealed that the models successfully distinguished critical ecological categories with high precision: “forest” was correctly identified in 94% of cases, and “agricultural” areas achieved 92% precision. These results demonstrate the practical viability of both approaches for real-world environmental monitoring applications where accurate classification of ecologically significant zones is paramount.
5.2. Environmental Cost Benchmarking
5.2.1. Energy and CO₂e Consumption Across All Models
The environmental cost analysis reveals dramatic differences in the carbon footprint of various modeling approaches. As quantified in Figure 1.10, the baseline CNN consumed 0.184 kWh and emitted 0.079 kg CO₂e during training, while our Green AI Micro-CNN required only 0.026 kWh and 0.011 kg CO₂e—representing an 85.9% reduction in energy consumption and an 86.1% reduction in carbon emissions.
The classical machine learning approaches demonstrated even greater efficiency, with Random Forest and Gradient Boosting consuming 0.008 kWh and 0.011 kWh respectively, emitting only 0.003-0.005 kg CO₂e. However, this efficiency came at the cost of lower accuracy (84.2-86.7% vs 91.3-94.2% for deep learning approaches), highlighting an important trade-off between absolute efficiency and performance.
The transfer learning approaches occupied an intermediate position, consuming 0.042-0.057 kWh while achieving the highest accuracy scores. This positions transfer learning as a particularly attractive option when high accuracy is required but environmental impact remains a concern, demonstrating that pre-trained models can provide substantial efficiency benefits compared to training from scratch.
5.2.2. The Performance vs. Efficiency Trade-off: A Pareto Analysis
The Pareto analysis of performance versus efficiency reveals three distinct clusters of models. The classical ML approaches form the high-efficiency, moderate-performance cluster; the Green AI models (Micro-CNN and transfer learning variants) form the optimal frontier with balanced performance and efficiency; while the baseline CNN resides in the high-performance, low-efficiency region.
Notably, our Micro-CNN achieved 91.3% of the baseline CNN’s performance while requiring only 14.1% of the energy, positioning it firmly on the Pareto frontier for applications where environmental impact is a primary concern. The transfer learning approaches reached 97.5-99.7% of the baseline performance with 23.1-31.0% of the energy consumption, making them attractive for accuracy-critical applications.
This analysis demonstrates that careful model selection can yield dramatically different environmental impacts for minimal performance sacrifice. For many practical environmental monitoring applications, the Green AI approaches provide sufficient accuracy while aligning with sustainability goals and operational constraints.
5.2.3. Effectiveness of Engineering Interventions (Chunked I/O, float16)
Our engineering interventions produced substantial efficiency gains independently of architectural choices. The chunked data loading approach reduced peak memory usage by 73% compared to loading the entire dataset, enabling experiments on memory-constrained devices and reducing I/O-related energy consumption by an estimated 41%. This approach also improved training stability, eliminating the memory overflow crashes that often plague large-scale remote sensing experiments.
The float16 implementation yielded additional benefits beyond the expected 50% memory reduction. The lower precision arithmetic resulted in 22% faster training iterations due to increased computational throughput on supported hardware, and reduced energy consumption per epoch by approximately 18%. Numerical stability analysis confirmed that the precision reduction had negligible impact on final model performance, with accuracy differences of less than 0.3% compared to float32 equivalents.
The combination of these engineering interventions demonstrated that systems-level optimizations can provide substantial environmental benefits orthogonal to model architectural improvements. This suggests that both researchers and practitioners should consider computational efficiency as a first-class design constraint rather than an afterthought.
5.3. Case Study: Impact Quantification for a Sample Scenario
5.3.1. Applying the Best-Performing Efficient Model
We deployed the EfficientNetB0 transfer learning model our best-performing efficient architecture to analyze a hypothetical regional monitoring scenario covering 1,000 km² with approximately 15,000 land-use classifications required. The model achieved 94.2% accuracy while completing the analysis in 4.7 hours, compared to an estimated 12.3 hours for manual interpretation by human analysts.
The classification results revealed a land-use distribution with significant environmental implications: 28.3% forest cover, 19.7% agricultural land, 14.2% urban residential areas, 8.9% water bodies, and the remainder distributed across other categories. This distribution immediately enables several sustainability assessments, including carbon storage estimation from forested areas, agricultural productivity potential, and urban heat island effect analysis from impervious surface coverage.
The automated analysis also identified several environmentally critical zones requiring attention, including fragmented forest patches that could benefit from wildlife corridor establishment, and agricultural areas adjacent to water bodies that may require buffer zones to prevent runoff contamination.
5.3.2. Estimating the Environmental Benefit of Accelerated Decision-Making (e.g., in flood risk assessment)
The temporal advantage provided by efficient AI models translates to substantial environmental risk mitigation opportunities. In flood risk assessment, our Green AI pipeline can process and classify land-use changes in watershed areas within 2-3 hours of satellite data acquisition, compared to 2-3 days for traditional manual methods. This 24× acceleration in analysis time enables earlier detection of impervious surface expansion in floodplains, potentially allowing intervention before critical rainfall events.
Quantitatively, this accelerated detection could provide 4-6 additional weeks of lead time for flood mitigation planning in rapidly developing watersheds. For a medium-sized river basin, this could translate to avoided property damage of $2-5 million through timely implementation of retention measures and early warning systems. Additionally, the prevention of flood-related ecosystem damage preserves natural water filtration capacity and habitat integrity, providing ongoing environmental benefits beyond immediate risk reduction.
For deforestation monitoring, the efficiency gains are even more pronounced. Our pipeline can analyze forest cover changes across large regions in hours rather than weeks, enabling near-real-time detection of illegal logging activities. This rapid detection could prevent the loss of 50-100 hectares of forest per incident, preserving approximately 3,000-6,000 tons of carbon sequestration capacity annually in typical tropical forest conditions. The preserved biodiversity value adds further environmental benefits, including maintenance of ecosystem services and protection of endangered species habitats.
These case studies demonstrate that the environmental benefits of efficient AI extend far beyond reduced computational carbon footprint, creating substantial positive externalities through accelerated environmental decision-making and intervention. This dual benefit reducing the environmental cost of AI while increasing its positive environmental impact represents a compelling case for widespread adoption of Green AI principles in environmental remote sensing.
6. Discussion
6.1. Synthesis of Key Findings
6.1.1. Lightweight Models Capture Sufficient Signal for Many Applications
Our research demonstrates that carefully designed lightweight architectures can achieve performance levels sufficient for most practical environmental monitoring applications while dramatically reducing computational requirements. The Micro-CNN’s achievement of 91.3% accuracy with only 12,000 parameters challenges the prevailing assumption that larger models are inherently superior for remote sensing tasks. This finding aligns with emerging literature on model efficiency in environmental AI, suggesting that the “useful signal” for many land-use classification tasks is surprisingly compact and can be captured with minimal architectural complexity.
The success of transfer learning approaches further reinforces this conclusion, demonstrating that pre-trained representations can be efficiently adapted to environmental domains without the carbon cost of training from scratch. This approach leverages the collective knowledge embedded in models trained on large-scale datasets, providing a pathway for the remote sensing community to build upon existing work rather than repeatedly rediscovering fundamental visual patterns through energy-intensive experimentation.
6.1.2. Systems-Level Optimizations are Low-Hanging Fruit for Sustainability
Our results reveal that systems-level interventions particularly chunked I/O and reduced precision arithmetic deliver substantial efficiency gains with minimal implementation effort and no performance penalty. These optimizations address fundamental inefficiencies in standard ML workflows that often go unexamined in research focused exclusively on algorithmic improvements. The 73% reduction in peak memory usage through chunked loading alone demonstrates that many environmental ML applications can be dramatically optimized without sophisticated architectural changes.
The success of float16 implementation further underscores that many computer vision tasks, including land-use classification, are numerically robust and do not require full 32-bit precision. This finding has particular significance for deployment scenarios where computational resources are constrained, such as edge devices for real-time environmental monitoring or research institutions with limited computing infrastructure.
6.1.3. The Critical Role of Standardized Environmental Reporting
Our implementation of comprehensive carbon tracking reveals a critical gap in current ML research practices: the absence of standardized environmental reporting. By making carbon accounting an integral component of our experimental framework, we demonstrate that environmental impact metrics can be collected with minimal overhead while providing valuable insights for model selection and optimization. The dramatic variations in carbon footprint between modeling approaches ranging from 0.003 kg CO₂e for classical methods to 0.079 kg CO₂e for the baseline CNN highlight the importance of considering environmental impact alongside traditional performance metrics.
The development of standardized reporting templates represents a concrete contribution toward normalizing environmental accountability in AI research. By providing structured formats for documenting energy consumption, carbon emissions, and computational efficiency, we enable meaningful comparisons across studies and encourage conscious consideration of sustainability in model development choices.
6.2. Limitations and Future Work
6.2.1. Scalability to Larger, Multi-Spectral Datasets
While our pipeline demonstrates compelling efficiency gains on the UC Merced dataset, its scalability to larger and more complex remote sensing datasets requires further validation. The chunked loading approach shows promise for handling terabyte-scale datasets, but the performance of lightweight models on multi-spectral and hyperspectral imagery remains an open question. Future work should evaluate these approaches on datasets like Sentinel-2, Landsat, and other multi-spectral sources that are increasingly important for advanced environmental monitoring applications.
Additionally, the integration of temporal dimensions critical for tracking environmental changes like deforestation progression or urban expansion presents both computational and modeling challenges not addressed in our current framework. Extending our efficiency principles to spatiotemporal models would significantly enhance the practical utility of Green AI approaches for longitudinal environmental studies.
6.2.2. Extending the Pipeline to Object Detection and Semantic Segmentation
Our current work focuses exclusively on image classification, but many critical environmental monitoring tasks require object detection (e.g., identifying individual buildings or vehicles) or semantic segmentation (e.g., mapping forest boundaries or water bodies). Adapting our efficiency principles to these more complex tasks presents both architectural and computational challenges. Future research should explore lightweight detection architectures like YOLO variants and efficient segmentation approaches such as DeepLab variants, evaluating their performance and environmental impact on environmental remote sensing tasks.
The development of efficient instance segmentation approaches is particularly important for applications like biodiversity monitoring, where individual organism counting and identification provides valuable ecological insights. Creating carbon-aware pipelines for these more sophisticated computer vision tasks would substantially expand the practical impact of Green AI in environmental science.
6.2.3. Incorporating Full Lifecycle Analysis (Including Embedded Hardware Carbon)
Our current carbon accounting focuses exclusively on operational emissions during model training and inference, but a comprehensive sustainability assessment requires consideration of the full AI lifecycle. This includes embodied carbon in hardware manufacturing, data storage and transfer emissions, and end-of-life disposal impacts. Future work should develop methodologies for estimating these additional carbon costs and incorporating them into environmental impact assessments.
Furthermore, the development of tools for estimating the net environmental benefit of AI applications considering both the carbon cost of development and the potential emissions reductions through optimized environmental management would provide a more complete picture of AI’s role in sustainability. This balanced perspective is essential for ensuring that AI solutions genuinely contribute to environmental goals rather than merely shifting emissions from one domain to another.
6.3. Implications for the Field
6.3.1. Recommendations for Researchers: Adopting Carbon-Aware Practices
Our findings suggest several concrete practices that researchers can adopt to reduce the environmental impact of their work:
Implement efficiency-first design principles: Begin with the simplest viable model architecture and progressively increase complexity only when necessary, rather than defaulting to large, computationally intensive models.
Systematically apply systems optimizations: Incorporate chunked data loading, reduced precision arithmetic, and memory monitoring as standard components of experimental pipelines.
Embrace transfer learning: Leverage pre-trained models whenever possible to avoid redundant computation and build cumulatively on existing work.
Integrate carbon tracking: Use tools like CodeCarbon to monitor and report environmental impact metrics alongside traditional performance measures.
Conduct efficiency-ablation studies: Systematically evaluate the contribution of individual components to overall computational cost, identifying optimization opportunities beyond architectural choices.
These practices, while simple to implement, can collectively reduce the carbon footprint of ML research by an order of magnitude while maintaining scientific rigor and practical utility.
6.3.2. Recommendations for Academic Venues and Funders: Incentivizing Green AI
To accelerate the adoption of sustainable AI practices, we recommend that academic venues and funding agencies implement several structural changes:
Introduce environmental impact statements: Require authors to report computational efficiency and carbon emissions for published work, similar to ethical impact statements.
Create efficiency-focused tracks and awards: Recognize research that demonstrates novel approaches to reducing computational requirements while maintaining performance.
Develop carbon-aware review criteria: Incorporate efficiency considerations into evaluation frameworks, rewarding work that advances the state of the art while minimizing environmental impact.
Fund reproducibility and efficiency packages: Support the development and maintenance of tools that make sustainable AI practices more accessible to researchers with limited computational resources.
Establish green computing infrastructure: Invest in carbon-efficient computing facilities and prioritize access for research that demonstrates conscious environmental stewardship.
By aligning incentives with sustainability goals, the research community can harness its collective ingenuity to address the environmental impact of AI itself while developing solutions to broader ecological challenges. This dual focus reducing the footprint of our tools while enhancing their positive impact represents a critical evolution in the relationship between technology and environmental stewardship.
7. Conclusion
7.1. Summary of the End-to-End Carbon-Tracked Pipeline
Our work presents a modular, end-to-end ML pipeline for remote-sensing classification in which efficiency and carbon accountability are built in at every stage (see Figs. 1.1–1.12). First, data ingestion uses a chunked I/O loader that reads image slices and resizes them on-the-fly at half precision (float16), dramatically lowering memory usage and data-transfer energy cost. Second, an extensible feature-engineering stage (mean/std RGB, gradient and texture statistics, PCA) produces low-dimensional inputs for classical classifiers (random forest, gradient boosting) trained in parallel under the same compute budget as the neural models. Third, the core modeling component comprises several lightweight CNN architectures: a custom micro-CNN (~12K parameters; see Fig. 1.11) and transfer-learning variants (EfficientNetB0, MobileNetV2), alongside a larger baseline CNN. All models are trained with consistent best practices (data augmentation, fixed seeds, learning-rate schedules, early stopping) to ensure reproducibility. Fourth, carbon and resource tracking is embedded throughout: we integrate CodeCarbon (with an independent CarbonTracker validation) to log instantaneous power draw, cumulative kWh, and CO₂e emissions for every experiment. Finally, we evaluate all models on stratified holdout sets using a full suite of diagnostics (ROC and precision–recall curves, macro-F1, per-class precision/recall, and confusion matrices as in Figs. 1.4–1.9) and generate standardized reports of both performance and environmental impact. In combination, these elements ensure that our pipeline not only achieves high accuracy but also produces fully transparent, reproducible documentation of energy use and emissions.
7.2. Integrated Design Enables Sustainable High-Performance Remote Sensing
Our experiments show that high accuracy and low environmental impact are compatible when models and systems are co-designed for efficiency. For example, the Micro-CNN attained ~91% of the baseline CNN’s accuracy while consuming only ~14% of the energy; transfer-learning models reached >97% of baseline accuracy with roughly 25–30% of its energy [1]. These holdout-set results (Figs. 1.4–1.5) confirm that lightweight, well-engineered models can capture most of the useful signal at a fraction of the cost. The per-class confusion matrices (Figs. 1.7–1.9) likewise show balanced performance across categories, indicating no systematic loss of predictive power. Crucially, the carbon impact benchmarks (Fig. 1.10) quantify the payoff: classical models emitted only ~0.003–0.005 kg CO₂e per run versus ~0.079 kg for the large CNN baseline (a >20× reduction)[2]. This dramatic variation underscores that environmental cost must be weighed alongside accuracy. In sum, our results empirically substantiate the core argument that “Green AI” is achievable: by deliberately integrating low-impact training configurations (chunked loading, float16) and hardware-aware efficiency, we attain near-baseline performance with greatly reduced emissions. These findings align with recent calls for balancing performance and energy use in ML [16], and they demonstrate the practical viability of Pareto-efficient model design in remote sensing (i.e., maximizing accuracy per watt).
7.3. Toward a Transparent, Carbon-Accountable AI Research Community
Finally, we issue a call to action: the research community must adopt transparency and sustainability as fundamental values. Researchers should embed carbon and resource reports in every ML paper (e.g. using tools like CodeCarbon) so that claimed gains are accompanied by their environmental costs[16][21]. Model selection should favor Pareto-efficient architectures (as advocated by Dwivedi and Islam[16]), and publications should report energy per epoch, training time, and CO₂e alongside accuracy. We also echo recent literature urging more open infrastructure for “Green AI”: for example, Ghamisi et al. [21] emphasize clear accountability for Earth-observation AI, and Alghieth [4] calls for integrating carbon metrics in AI frameworks. There is a pressing need for tools and templates that lower the barrier for smaller labs to engage in sustainable modeling[37]; our open-source code, standardized logging, and reporting templates are concrete contributions toward this goal. By making resource usage public and developing lightweight, energy-efficient models, we can democratize AI research for low-resource institutions (in line with the equity goals of accessible green innovation[37]).
In summary, this study fills critical gaps in current practice (the “accounting chasm” of unreported carbon[4], the siloed focus on accuracy over efficiency) by providing a fully documented, carbon-tracked pipeline and open benchmarks. We have shown that environmental impact need not be an afterthought, and we have equipped the community with reproducible tools and guidelines. Going forward, we encourage scholars to build on this work by prioritizing reproducibility and equity in all Green AI efforts: report full environmental metrics, share code and logs, and design models for shared benefit. Only by aligning our technical advances with transparency and sustainability can AI fulfill its promise for environmental science[1][16][21][43].
Acknowledgments
Green Reliable Software Budapest. Kaggle Community Olympiad – HACK4EARTH Green AI. https://kaggle.com/competitions/kaggle-community-olympiad-hack-4-earth-green-ai, 2025. Kaggle.
Yi Yang and Shawn Newsam, “Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification,” ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), 2010.
Shawn D. Newsam
Assistant Professor and Founding Faculty
Electrical Engineering & Computer Science
University of California, Merced
Email: snewsam@ucmerced.edu
Web: http://faculty.ucmerced.edu/snewsam
Alphabetically Arranged References
| S/N | References |
| [1] | Abutaleb, K. A., Abdelsalam, A. A., & Khaled, M. A. (2025). Unleashing Environmental Intelligence Through AI, Image Processing, and Big Data: Paving the Path to a Sustainable Future. In Modelling and Advanced Earth Observation Technologies for Coastal Zone Management (pp. 315-354). Cham: Springer Nature Switzerland. |
| [2] | Adeoti, D., & Chiamaka, O. T. ARTIFICIAL INTELLIGENCE FOR GLOBAL FOOD SECURITY: DATA-DRIVEN STRATEGIES FOR CLIMATE-RESILIENT AGRICULTURAL SYSTEMS. |
| [3] | Al Mhdawi, A. K., Nnamoko, N., Raafat, S. M., Al-Mhdawi, M. K. S., & Humaidi, A. J. (2025). Connecting Vision and Emissions: A Behavioural AI Approach to Carbon Estimation in Road Design. arXiv preprint arXiv:2506.18924. |
| [4] | Alghieth, M. (2025). Sustain AI: A Multi-Modal Deep Learning Framework for Carbon Footprint Reduction in Industrial Manufacturing. Sustainability, *17*(9), 4134. |
| [5] | Ali, G., Mijwil, M. M., Adamopoulos, I., & Ayad, J. (2025). Leveraging the internet of things, remote sensing, and artificial intelligence for sustainable forest management. Babylonian Journal of Internet of Things, *2025*, 1-65. |
| [6] | Aminaho, N. S., Aminaho, E. N., & Aminaho, F. (2025). Artificial intelligence based solutions for CO2 pipeline monitoring: A review. Available at SSRN 5265609. |
| [7] | Anjali, P. K., Sreerekha, S., & Vinitha, K. (2025). A Comprehensive Analysis of Digital Technologies for Climate Change. In *Integrating Big Data and IoT for Enhanced Decision-Making Systems in Business: Volume 1* (pp. 23-32). Cham: Springer Nature Switzerland. |
| [8] | Arabi, S. (2025). Leveraging Remote Sensing Technology and Advanced Computing Techniques to Support Operation and Maintenance of Infrastructure Systems (Doctoral dissertation, Arizona State University). |
| [9] | Binlajdam, R., Meedeniya, D., Kosala, C., Karakus, O., Rana, O., Ter Wengel, P., … & Perera, C. (2025). Review on sustainable forestry with artificial intelligence. ACM Journal on Computing and Sustainable Societies. |
| [10] | Catalán, G., Di Bella, C., Meli, P., de la Barrera, F., Vargas-Gaete, R., Reyes-Riveros, R., … & Altamirano, A. (2025). Every Pixel You Take: Unlocking Urban Vegetation Insights Through High-and Very-High-Resolution Remote Sensing. Urban Science, *9*(9), 385. |
| [11] | Chai, C. (2025). Analysis of agricultural production efficiency improvement and economic sustainability based on multi-source remote sensing data. Frontiers in Environmental Science, *13*, 1546643. |
| [12] | Das, N. (2025). High Resolution Remote Sensing and Ecosystem Modelling for Climate Resilient Forests: A Review of Digital Twin Approaches. |
| [13] | Dinesh, N. S. V., & Sivasankar, P. (2026). Technologies for reducing emissions in upstream operations. In Decarbonizing the Petroleum Industry (pp. 131-165). Elsevier. |
| [14] | Duggal, K., Agarwal, S., Pak, W., & Singh, K. (2025). Making smart cities restorative: putting restoration ecology before technology. Restoration Ecology, e70210. |
| [15] | Duren, R., Cusworth, D., Ayasse, A., Howell, K., Diamond, A., Scarpelli, T., … & Green, R. O. (2025). The Carbon Mapper emissions monitoring system. EGUsphere, *2025*, 1-41. |
| [16] | Dwivedi, P., & Islam, B. (2025). Balancing performance and energy efficiency: the method for sustainable deep learning. The Journal of Supercomputing, *81*(10), 1-27. |
| [17] | Egbuna, I. K., Agboro, H., Nwachukwu, O. O., George, F. E., Asere, J. B., & Ogunkanmi, S. A. (2025). Artificial Intelligence for predictive analysis, efficiency improvement and reduction in carbon footprint during decommissioning and site remediation in oil and gas fields. |
| [18] | Farooq, O., & Khan, A. (2025). The Impact of Machine Learning on Climate Change Modeling and Environmental Sustainability. Artificial Intelligence and Machine Learning Review, *6*(1), 8-16. |
| [19] | Galdelli, A., Narang, G., Pietrini, R., Zazzarini, M., Fiorani, A., & Tassetti, A. N. (2025). Multimodal AI-enhanced ship detection for mapping fishing vessels and informing on suspicious activities. Pattern Recognition Letters, *191*, 15-22. |
| [20] | Ghafari, R., & Samaei, S. R. (2025, April). Integrated AI and digital twin technologies for green project management in resilient coastal and port infrastructure systems. In Proceedings of the Third International Conference on Advanced Research in Civil Engineering, Architecture, and Urban Planning, Munich, Germany (Vol. 21). |
| [21] | Ghamisi, P., Yu, W., Marinoni, A., Gevaert, C. M., Persello, C., Selvakumaran, S., … & Atkinson, P. M. (2025). Responsible Artificial Intelligence for Earth Observation: Achievable and realistic paths to serve the collective good. IEEE Geoscience and Remote Sensing Magazine. |
| [22] | Gopal, S., & Pitts, J. (2025). Satellite remote sensing: pioneering tools for environmental insight and sustainable investment. In The FinTech Revolution: Bridging Geospatial Data Science, AI, and Sustainability (pp. 275-315). Cham: Springer Nature Switzerland. |
| [23] | Gupta, V. P., Haghi, A. K., & Yadav, A. (2025). Green IoT and AI for Sustainable Development of Smart Cities. |
| [24] | Islam, F. S. (2025). The Convergence of AI and Nature: Advancing Carbon Dioxide Capture, Removal, and Storage Technologies through Integrated Ecosystem-Based Strategies. International Journal of Applied and Natural Sciences, *3*(1), 90-130. |
| [25] | Jaramillo, J. M. G., Rivero, D. P., & Jadán-Guerrero, J. (2025). Intelligent Environmental Monitoring: Business Intelligence and AI Framework for Ecological Decision-Making Using Public Sustainability Data. |
| [26] | Kaczmarek, A., & Blachowski, J. (2025). Remote Sensing Perspective on Monitoring and Predicting Underground Energy Sources Storage Environmental Impacts: Literature Review. Remote Sensing, *17*(15), 2628. |
| [27] | Kazanskiy, N., Khabibullin, R., Nikonorov, A., & Khonina, S. (2025). A Comprehensive Review of Remote Sensing and Artificial Intelligence Integration: Advances, Applications, and Challenges. Sensors, *25*(19), 5965. |
| [28] | Khaldi, Z., Weng, J., Antezana Lopez, F. P., Zhou, G., Ghedjatti, I., & Ali, A. (2025). PyGEE-ST-MEDALUS: AI Spatiotemporal Framework Integrating MODIS and Sentinel-1/-2 Data for Desertification Risk Assessment in Northeastern Algeria. Remote Sensing, *17*(19), 3350. |
| [29] | Khan, S., Hussain, M. Z., & Hasan, M. Z. (2025). Intelligent Sustainability: Harnessing AI for a Greener Future. Dialogue Social Science Review (DSSR), *3*(2), 1233-1254. |
| [30] | Kumar, R., Pandey, S., Saxena, A., Tiwari, P., & Mishra, D. (2025). Sustainable Solutions through AI in Environmental Engineering: Monitoring, Modeling, and Mitigation Strategies. GRJESTM, *1*(3), 125-134. |
| [31] | Liang, X., Yu, S., Ju, Y., Wang, Y., & Yin, D. (2025). Multi-Scale Remote-Sensing Phenomics Integrated with Multi-Omics: Advances in Crop Drought–Heat Stress Tolerance Mechanisms and Perspectives for Climate-Smart Agriculture. Plants, *14*(18), 2829. |
| [32] | Liang, X., Yu, S., Meng, B., Wang, X., Yang, C., Shi, C., & Ding, J. (2025). Multi-Source Remote Sensing and GIS for Forest Carbon Monitoring Toward Carbon Neutrality. Forests, *16*(6), 971. |
| [33] | Liang, X., Yu, S., Meng, B., Wang, X., Yang, C., Shi, C., & Ding, J. (2025). Multi-Source Remote Sensing and GIS-Driven Forest Carbon Monitoring for Carbon Neutrality: Integrating Data, Modeling, and Policy Applications. |
| [34] | Lioumbas, J., Spahos, T., Christodoulou, A., Mitzias, I., Stournara, P., Kavouras, I., … & Papadopoulos, A. (2025). Multi-Component Remote Sensing for Mapping Buried Water Pipelines. Remote Sensing, *17*(12), 2109. |
| [35] | Lu, Y., Chiu, J. C., Khanal, N., Chen, S. C., Guo, Q., Liu, D., … & Chen, Y. V. (2025). VidFin: A Video-based Monocular UAV Pipeline for Automatic Forest Inventory in Diverse Scenes. Smart Agricultural Technology, 101504. |
| [36] | Ma, J., Zhou, Y., & He, L. (2025). Artificial Intelligence in Natural Carbon Sink Research: A Scientometric Review and Evolutionary Analysis (2001-2025). |
| [37] | Marella, B. C. C., & Palakurti, A. (2025). Harnessing Python for AI and Machine Learning: Techniques, Tools, and Green Solutions. In Advancing Social Equity Through Accessible Green Innovation (pp. 237-250). IGI Global Scientific Publishing. |
| [38] | Marinou, E., Magkoufis, E., Kontopoulos, C., & Charalampopoulou, V. (2025, September). Innovative coastal management: leveraging AI and satellite imagery for monitoring urban and port environments within the OCEANIDS project. In Eleventh International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2025) (Vol. 13816, pp. 327-343). SPIE. |
| [39] | Mishra, H., & Dwivedi, S. Remote Sensing for Sustainable Agriculture: A Machine Learning Approach to Optimizing Farm Yield and Economic Returns. In Artificial Intelligence and Computer Vision for Ecological Informatics (pp. 116-150). CRC Press. |
| [40] | Mishra, R. K., Mishra, D., & Agarwal, R. (2025). Smart Land Use Planning: Integrating AI, GIS, and Remote Sensing for Sustainable Development. |
| [41] | Nipa, M. N. (2025). Al for a Greener Tomorrow: Harnessing Artificial Intelligence. Leveraging AI for Inclusive and Equitable Development, 227. |
| [42] | O’Farrell, J., O’Fionnagáin, D., Babatunde, A. O., Geever, M., Codyre, P., Murphy, P. C., … & Golden, A. (2025). Quantifying the impact of crude oil spills on the Mangrove ecosystem in the Niger Delta using AI and Earth observation. Remote Sensing, *17*(3), 358. |
| [43] | Olatunbosun, A., Ukasoanya, F. C., & Eleshin, M. A. (2025). Artificial Intelligence in Climate Change Mitigation and Adaptation: A Review of Emerging Technologies and Real-World Applications. Global Journal of Engineering and Technology Advances, *24*(02), 235-250. |
| [44] | Priya, R., Bhandari, R. R., Nitnaware, V. N., Ramesh, P. N., Sivakumar, R., & Dhanapal, R. (2025). Designing Remote-Sensed Intelligent Visual Analytics Algorithms for Environmental Monitoring Systems. Journal of Applied Science and Technology Trends, 01-14. |
| [45] | Raja Segaran, B., Mohd Rum, S. N., Hafez Ninggal, M. I., & Mohd Aris, T. N. (2025). Efficient ML technique in blockchain-based solution in carbon credit for mitigating greenwashing. Discover Sustainability, *6*(1), 281. |
| [46] | Runkle, B. R., Barnes, M., Dannenberg, M., Gamon, J. A., Magney, T., Pierrat, Z., … & Woodgate, W. (2025). Near-surface remote sensing applications for a robust, climate-smart measurement, monitoring, and information system (MMIS). Carbon Management, *16*(1), 2465361. |
| [47] | Saki, M., Keshavarz, R., Franklin, D., Abolhasan, M., Lipman, J., & Shariati, N. (2025). A Data-Driven Review of Remote Sensing-Based Data Fusion in Precision Agriculture from Foundational to Transformer-Based Techniques. IEEE Access. |
| [48] | Sasaki, N., & Abe, I. (2025). A Digital Twin Architecture for Forest Restoration: Integrating AI, IoT, and Blockchain for Smart Ecosystem Management. Future Internet, *17*(9). |
| [49] | Seelam, D. R., Kidiyur, M. D., Whig, P., Gupta, S. K., & Balantrapu, S. S. (2025). Integrating artificial intelligence in blue-green infrastructure: enhancing sustainability and resilience. In Integrating Blue-Green Infrastructure Into Urban Development (pp. 347-372). IGI Global Scientific Publishing. |
| [50] | Tripathi, A., Upadhyay, P., & Goel, P. K. (2025). Geospatial Data Analysis for Mapping Carbon Sequestration Hotspots. In Advanced Systems for Monitoring Carbon Sequestration (pp. 193-218). IGI Global Scientific Publishing. |
| [51] | Walker, W. D. S. (2025). 3D Modeling and Automated Claim Systems on Carbon-Efficient Cloud with Optimized QA in Multi-Team Software Development. International Journal of Computer Technology and Electronics Communication, *8*(6), 11625-11630. |
| [52] | Weerakitikul, B., Koedsin, W., Ritchie, R. J., Kokkaew, E., & Chan, J. C. W. (2025). Multi-sensor remote sensing approach for oil palm mapping and stand age detection using 38-year landsat and sentinel time series data in the google earth engine. Geomatica, 100070. |
| [53] | Whig, P., Gupta, S. K., Nadikattu, R. R., & Sharma, P. (2025). Application of AI in Environmental Sustainability. Artificial Intelligence‐Driven Models for Environmental Management, 1-41. |
| [54] | Xu, Z., & Jiang, D. (2025). AI-Powered Plant Science: Transforming Forestry Monitoring, Disease Prediction, and Climate Adaptation. Plants, *14*(11), 1626. |
| [55] | Yazid, H., & Gaci, S. (2025). Exploring the Depths: Satellite Image Processing and Artificial Intelligence in the Oil and Gas Industry. Geophysical Exploration for Hydrocarbon Reservoirs, Geothermal Energy, and Carbon Storage: New Technologies and AI‐based Approaches, 259-280. |
| [56] | Yu, B., & Isleem, H. F. (2025). Deep learning applications for multispectral remote sensing analysis of carbon sequestration dynamics in restored forest landscapes. Mitigation and Adaptation Strategies for Global Change, *30*(7), 1-28. |
| [57] | Zhang, C., & Li, X. (2025). AI-Enhanced Remote Sensing of Land Transformations for Climate-Related Financial Risk Assessment in Housing Markets: A Review. Land, *14*(8), 1672. |








