Transforming air pollution management in India with AI and machine learning technologies

Consequences of air pollution in India

Air pollution in India, especially in metropolitan cities, has dire consequences for public health, stemming from elevated levels of particulate matter, nitrogen oxides, and various other pollutants. These elevated pollution levels are consistently linked to increased respiratory diseases, particularly asthma, chronic obstructive pulmonary disease (COPD), and bronchitis7,44. Children, with developing respiratory systems, are particularly vulnerable to irreversible health issues upon prolonged exposure, while the elderly, with compromised immune systems, face heightened risks, including deep lung penetration, inflammation, and enduring damage caused by PM2.5. Beyond respiratory implications, air pollution has severe cardiovascular consequences, with nitrogen oxides significantly contributing to an increased risk of heart attacks and strokes, leading to heightened cardiovascular mortality with prolonged exposure7. A significant study conducted by the CPCB in Delhi highlighted robust correlations between air quality levels and negative health effects. Comparative analysis against a rural control population in West Bengal indicated a 1.7-fold higher occurrence of respiratory symptoms in Delhi, emphasizing the direct impact of air quality on public health20,45,46,47. Odds ratios for upper and lower respiratory symptoms were 1.59 and 1.67, respectively. The study also highlighted a significantly higher prevalence of current and physician-diagnosed asthma in Delhi, with lung function notably reduced in 40.3% of Delhi’s participants compared to 20.1% in the control group20.

In addition to respiratory effects, non-respiratory impacts were observed in the cities compared to rural controls. The prevalence of hypertension was notably higher in cities (36% vs. 9.5% in controls), correlating positively with respirable suspended particulate matter (PM10) levels in ambient air48. Chronic headaches, eye irritation, and skin irritation were significantly more pronounced in most of the cities. Community-based studies consistently affirm the association between air pollution and respiratory morbidity. Studies focusing on indoor air pollution reveal similar correlations with respiratory morbidity, extending to conditions such as attention-deficit hyperactivity disorder in children, increased blood levels of lead, and decreased serum concentrations of vitamin D metabolites49. Beyond health impacts, the environmental consequences of air pollution are profound. Pollutants harm plants and animals, disrupt ecosystems, and lead to biodiversity loss50. The issue extends beyond health and the environment to economics and society, straining healthcare, productivity, and social equity. It demands holistic strategies spanning economic, social, and environmental facets, making it imperative to understand the existing and potential remediation techniques51.

The economic and social ramifications are substantial, with healthcare costs soaring as the incidence of pollution-related illnesses rises7. Treating respiratory and cardiovascular diseases places a significant burden on the healthcare system, affecting both public and private healthcare expenditures44. Air pollution in India incurred an estimated economic toll of $95 billion in 2019, amounting to 3% of the country’s GDP, attributable to decreased productivity, increased work absences, and premature fatalities52. The economic implications of air pollution extend beyond direct healthcare costs, affecting labor markets and overall productivity53. Social disparities are accentuated by air pollution, with vulnerable communities facing disproportionate exposure to pollutants. Factors such as socio-economic status, access to healthcare, and geographic location contribute to disparities in exposure and health outcomes54. Addressing these social dimensions is crucial for devising equitable solutions that prioritize environmental justice. As India grapples with the immediate consequences of air pollution, emerging challenges also require attention. Climate change exacerbates existing issues, influencing weather patterns and pollutant transport and contributing to the persistence of stagnant air masses that trap pollutants8. The increasing frequency of extreme weather events further complicates pollution dynamics55. Moreover, the interplay of indoor and outdoor air pollution adds another layer of complexity, with indoor air pollution often stemming from household activities such as cooking with solid fuels, compounding the overall burden on public health49. Government policies and initiatives take center stage in this exploration, with regulatory measures, such as emission standards and vehicle restrictions, scrutinized for their effectiveness and implementation challenges12.
Sustainable urban planning, including the creation of green spaces and transportation planning for pollution reduction, is examined as a proactive approach to mitigate pollution at its source56. Technological solutions, ranging from air purifiers to pollution monitoring devices, are also evaluated57. The challenges of scalability, accessibility, and integration into existing infrastructure are dissected to discern the practicality and potential impact of these technologies. Emerging technologies and global collaborations are explored as potential catalysts for change57,58.

Contributors to air pollution in India

Air pollution in India is a complex issue with multiple sources and contributors, as highlighted by studies conducted by Lalchandani et al.59, Tobler et al.60, Rai et al.61, Talukdar et al.62 and Wang et al.63. The sources and contributors to air pollution can be broadly categorized into particulate matter (PM2.5 and PM10), organic aerosols (OAs) including black carbon (BC), water-soluble brown carbon (WS-BrC), and volatile organic compounds (VOCs). Each of these components plays a significant role in the overall air quality of the region.

Particulate matter (PM)

Particulate matter is a key component of air pollution, and Lalchandani et al.59 conducted studies using the Positive Matrix Factorization (PMF) model to identify and apportion different sources of PM. The sources identified included traffic-related emissions, dust transportation, solid-fuel burning emissions, and secondary factors62,64. Traffic-related emissions in metropolitan cities were found to be a significant contributor to the total concentration of PM, for example, at the IIT Delhi site, emphasizing the impact of vehicular activities on air quality. Additionally, solid-fuel burning emissions, often associated with residential cooking and heating, were identified as a major contributor to PM, particularly at night62. Rai et al.61 conducted source apportionment of elements in PM10 and PM2.5, identifying nine source profiles/factors, including dust, non-exhaust sources, solid fuel combustion, and industrial/combustion aerosol plume events. The contribution of anthropogenic sources to elements associated with health risks, such as carcinogenic elements, was also examined. The geographical origins of these sources were determined as well, emphasizing the regional and local influences on element concentrations in the atmosphere65.

Organic aerosols (OAs)

Organic aerosols are another crucial component of air pollution, and studies by Tobler et al.60 and Lalchandani et al.62 revealed three main components of OAs: solid fuel combustion OAs (SFC OAs), hydrocarbon-like OA (HOAs) from vehicular emissions, and oxygenated OAs (OOAs). Lalchandani et al.65 further categorized these components into sub-factors, providing a detailed understanding of the OA composition. Emissions stemming from traffic emerged as the primary contributor to the overall OA mass, underscoring the profound influence of vehicular pollution59.

Black carbon (BC)

BC is a product of incomplete combustion. Using the Absorption Ångström Exponent (AAE) method, contributions from biomass burning and vehicular emissions were apportioned66. Vehicular emissions were found to be a dominant source of BC, contributing around 67.5%62,67. The distinction between BC and brown carbon (BrC), which absorbs light in the near-UV to visible region, was also discussed, highlighting the need to consider multiple light-absorbing aerosols in air quality assessments.
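As a rough illustration of the quantity behind the AAE method, the sketch below computes the exponent from absorption coefficients at two wavelengths, assuming the usual power-law dependence of absorption on wavelength. The function name, the wavelengths, and the readings are hypothetical and are not taken from the cited studies; values near 1 are commonly associated with fossil-fuel BC and higher values with biomass burning, but the thresholds used for apportionment vary between studies.

```python
import math


def absorption_angstrom_exponent(b_abs_1, b_abs_2, wavelength_1_nm, wavelength_2_nm):
    """Return the AAE from absorption coefficients at two wavelengths.

    Assumes the power-law form b_abs(wavelength) ~ wavelength ** (-AAE).
    """
    if min(b_abs_1, b_abs_2, wavelength_1_nm, wavelength_2_nm) <= 0:
        raise ValueError("absorption coefficients and wavelengths must be positive")
    if wavelength_1_nm == wavelength_2_nm:
        raise ValueError("the two wavelengths must differ")
    return -math.log(b_abs_1 / b_abs_2) / math.log(wavelength_1_nm / wavelength_2_nm)


# Hypothetical absorption readings at 470 nm and 950 nm, for illustration only.
aae = absorption_angstrom_exponent(60.0, 25.0, 470.0, 950.0)
print(f"AAE = {aae:.2f}")
```

A full source apportionment would go further and split the measured absorption into fossil-fuel and biomass-burning parts using an assumed AAE for each source; that step is omitted here.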

Water-soluble brown carbon (WS-BrC)

Rastogi et al.68 performed a PMF analysis of WS-BrC spectra, identifying six factors representing specific sources of BrC. The study revealed diurnal variability in BrC absorption, with factors associated with different emission sources. The presence of secondary BrC was indicated, suggesting the importance of atmospheric processes in the formation of brown carbon. This finding adds another layer of complexity to the sources of light-absorbing aerosols in the atmosphere69.

Volatile organic compounds (VOCs)

Wang et al.63 investigated the characteristics and sources of VOCs, identifying six factors related to traffic, solid fuel combustion, and secondary sources. Traffic-related emissions were found to be the dominant source of VOCs at the urban site, while at the suburban site (MRIIRS), contributions from secondary formation and solid fuel combustion were more significant. The study highlighted the major role of anthropogenic sources in VOC pollution70.

Current remediation techniques

India has faced escalating challenges in managing air pollution over the years, necessitating the implementation of diverse remediation techniques. Figure 2 illustrates the legislative evolution of air quality management in India across three eras: Pre-Internet (1905–89), Transition (1990–99), and Internet Era (2000 onwards). This timeline showcases key acts and regulations implemented over time to address air pollution. The bottom timeline highlights the progression of NAAQS in India, from monitoring just 3 pollutants in 1982 to 7 in 1994, and 12 in 2009. The latest phase (2019–24) involves a comprehensive review of air quality standards under the National Clean Air Programme (NCAP) in 2019, demonstrating India’s ongoing commitment to improving air quality management.

Fig. 2

Legislation and evolution of NAAQS in India12.

Legislation and regulatory measures

India’s legislative landscape has evolved significantly to address air pollution. The introduction of key acts such as the Air (Prevention and Control of Air Pollution) Act in 1981 and subsequent amendments empowered central and state pollution control boards to handle severe air pollution emergencies71. The Environment (Protection) Act of 1986 served as an umbrella act for environmental protection, while the Motor Vehicles Act has been periodically amended to regulate vehicular pollution72. Recent developments include the Motor Vehicles (Amendment) Bill of 2019, allowing the government to recall vehicles causing environmental harm73. The establishment of institutions like the National Green Tribunal (NGT) and the National Environment Tribunal reflects a commitment to environmental accountability74.

National ambient air quality standards (NAAQS) and air quality index (AQI)

The formulation and periodic revision of National Ambient Air Quality Standards (NAAQS) have been pivotal in regulating air quality18. Beginning in 1982, the Central Pollution Control Board (CPCB) introduced NAAQS, initially covering SO2, NO2, and SPM47. Subsequent amendments expanded the list to include RSPM, Pb, NH3, and CO75. The National Air Quality Index (NAQI) was introduced to enhance public awareness, categorizing air quality into six levels from ‘Good’ to ‘Severe’76. This index, based on the concentration of eight pollutants, guides interventions for improved air quality.
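To make the six-level scheme concrete, the following sketch maps an index value to its category and takes the overall index as the worst pollutant sub-index. The band limits are the ones commonly published for India's National AQI and should be checked against the CPCB documentation before use; the function names and the sample sub-indices are hypothetical.

```python
# Upper limits of the six AQI bands as commonly published for India's
# National AQI; treat them as an assumption and verify against CPCB.
AQI_BANDS = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderately polluted"),
    (300, "Poor"),
    (400, "Very poor"),
    (500, "Severe"),
]


def aqi_category(aqi):
    """Return the category label for an AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper_limit, label in AQI_BANDS:
        if aqi <= upper_limit:
            return label
    return "Severe"  # values above 500 are treated as Severe here


def overall_aqi(sub_indices):
    """Return the overall AQI as the highest pollutant sub-index."""
    if not sub_indices:
        raise ValueError("at least one sub-index is required")
    return max(sub_indices.values())


# Hypothetical sub-indices for one station and one day.
station = {"PM2.5": 310, "PM10": 240, "NO2": 80}
print(overall_aqi(station), aqi_category(overall_aqi(station)))
```

Converting a measured concentration into a sub-index requires the pollutant-specific breakpoint table, which is left out of this sketch.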

Air pollution monitoring network

India’s air quality monitoring network has witnessed substantial growth. The initiation of the National Ambient Air Quality Monitoring (NAAQM) Network in 1984, expanded to the National Air Quality Monitoring Programme (NAMP), marked a critical step77. The network, comprising both manual and Continuous Ambient Air Quality Monitoring System (CAAQMS) stations, now stands at 1082 locations78,79. Real-time monitoring, as exemplified by CAAQMS, provides valuable data for prompt decision-making. The introduction of the System of Air Quality and Weather Forecasting and Research (SAFAR) further enhances forecasting capabilities80.

Evolution of studies on emission load

Emission inventories, critical for formulating air pollution control policies, have evolved over time. Initiatives by CSIR-NEERI and CPCB in the late twentieth century laid the foundation12. Emission inventory data, collected through GIS, has become integral in mapping pollution sources and understanding spatial distribution81. The Air Pollution Knowledge Assessments (APnA) city program and organizations like TERI contribute to city-specific inventories82. The emphasis on utilizing secondary data streamlines the process, enabling the creation of comprehensive databases for national and urban pollution inventories. The secondary data refers to datasets that include emission loads from various sources such as vehicular emissions, industrial outputs, construction activities, residential heating, and biomass burning83.

Management strategies and control policies

India’s air pollution management strategies encompass a multifaceted approach, with a blend of judicial interventions and executive actions.

Judicial interventions

The judiciary, particularly through petitions filed by M.C. Mehta, has been instrumental in setting guidelines and policies84. For instance, interventions in the Taj Trapezium Zone and the oversight of air quality management plans for non-attainment cities by the National Green Tribunal (NGT) are notable74. The judiciary has played a significant role in shaping policies for better governance and legislation.

Executive actions

Several executive measures contribute to air pollution control. The Auto Fuel Policy, initiated in 2003 and updated in 2014, addresses vehicular emissions85. Emphasis on alternative fuels, as seen in the National Auto Fuel Policy and the Pradhan Mantri Ujjwala Yojana (PMUY) for subsidized LPG connections, aligns with cleaner fuel initiatives86. Stricter emission standards for thermal power plants and the push for hybrid and electric vehicles (EVs) under schemes like Faster Adoption and Manufacturing of Hybrid & Electric Vehicles (FAME) contribute to pollution reduction87.

AI&ML Techniques for addressing and forecasting air pollution

Overview of AI&ML models

Various AI&ML techniques, such as Artificial Neural Networks (ANN), Fuzzy Logic (FL), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Convolutional Autoencoder (CA), have commonly been used in previous studies to predict and forecast earth and atmospheric variables8,25,88,89,90,91 (Table 1). AI&ML models have become pivotal in processing and simulating non-linear information, with a notable focus on ANNs92. ANNs emulate the human nervous system, comprising interconnected neurons that collectively address a spectrum of challenges, from function approximation to clustering and optimization93. ANN modelling follows a three-stage process of design, training, and validation92. During the design phase, crucial parameters such as architecture, layers, neurons, and learning algorithms are carefully chosen94. Training involves iterative adjustments of synaptic weights to minimize errors, while validation gauges the network’s generalization performance on unseen data.
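The three stages can be sketched with a very small network written in NumPy. This is only an illustration under stated assumptions: the data are synthetic stand-ins for predictor and pollutant values, and the layer size, learning rate, and number of epochs are arbitrary choices rather than settings from any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (predictors -> pollutant level); not real measurements.
X = rng.uniform(-1.0, 1.0, size=(400, 3))
y = (np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]).reshape(-1, 1)

# Hold out the last 100 samples for validation.
X_train, X_val = X[:300], X[300:]
y_train, y_val = y[:300], y[300:]

# Design: one hidden layer of 8 tanh neurons and a single linear output neuron.
n_hidden = 8
W1 = rng.normal(0.0, 0.5, size=(3, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(inputs):
    """Return the hidden activations and the network output."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


initial_val_mse = mse(forward(X_val)[1], y_val)

# Training: full-batch gradient descent on the mean squared error,
# with gradients obtained by back-propagation.
learning_rate = 0.05
for epoch in range(3000):
    hidden, pred = forward(X_train)
    grad_pred = 2.0 * (pred - y_train) / len(X_train)
    grad_W2 = hidden.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_hidden = (grad_pred @ W2.T) * (1.0 - hidden ** 2)
    grad_W1 = X_train.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

# Validation: error on samples the network never saw during training.
final_val_mse = mse(forward(X_val)[1], y_val)
print(f"validation MSE before: {initial_val_mse:.3f}, after: {final_val_mse:.3f}")
```

In practice the design stage would also compare several architectures, and training would usually stop when the validation error stops improving.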

Table 1 Different AI&ML models with target pollutants.

Multilayer Perceptrons (MLPs), a prominent type of ANN, have proven effective in predicting atmospheric pollution events. Typically featuring input, hidden, and output layers, MLPs can adapt to complex patterns by incorporating multiple hidden layers92. Configuring the number of neurons in the hidden layers is of utmost importance, as an incorrect count can lead to over-fitting or under-fitting. Techniques such as rules of thumb, trial and error, and network reduction help to optimize neuron numbers. FL, another AI technique, operates on a different paradigm by assigning truth values in a range between 0 and 1. Developed from fuzzy set theory, it accommodates linguistic variables, making it adept at handling uncertainty in natural language statements. Fuzzy logic’s three main phases (fuzzification, inference, and defuzzification) form a robust modelling system capable of addressing nuanced problems. SVMs are popular for supervised learning, excelling in classification, prediction, density estimation, and pattern recognition. An SVM seeks an optimal hyperplane to segregate data into predefined classes, with kernel functions playing a pivotal role in introducing non-linearity.
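The three fuzzy-logic phases can be illustrated with a toy rule base. Everything in this sketch is hypothetical: the input variables (wind speed and a traffic index), the membership limits, the four rules, and the output levels are invented for illustration, and the defuzzification uses a weighted average of fixed output levels (a zero-order Sugeno-style simplification) rather than a centroid over output fuzzy sets.

```python
def falling(x, start, end):
    """Membership that is 1 below start, 0 above end, and linear in between."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def rising(x, start, end):
    """Complement of falling: 0 below start, 1 above end."""
    return 1.0 - falling(x, start, end)


def predict_pollution_index(wind_speed_ms, traffic_index):
    # Phase 1, fuzzification: crisp inputs become degrees of membership.
    calm = falling(wind_speed_ms, 1.0, 4.0)
    windy = rising(wind_speed_ms, 1.0, 4.0)
    light = falling(traffic_index, 0.2, 0.8)
    heavy = rising(traffic_index, 0.2, 0.8)

    # Phase 2, inference: each rule fires with the minimum of its antecedents
    # and points to a representative output level (hypothetical index values).
    rules = [
        (min(calm, heavy), 300.0),   # calm air and heavy traffic -> high
        (min(calm, light), 150.0),   # calm air and light traffic -> moderate
        (min(windy, heavy), 150.0),  # windy and heavy traffic -> moderate
        (min(windy, light), 50.0),   # windy and light traffic -> low
    ]

    # Phase 3, defuzzification: weighted average of the rule outputs.
    total_strength = sum(strength for strength, _ in rules)
    return sum(strength * level for strength, level in rules) / total_strength


print(predict_pollution_index(2.5, 0.6))
```

Because the two memberships of each input sum to one, at least one rule always fires, so the division in the last step is safe for this particular rule base.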

Deep Neural Networks (DNNs) represent an advanced version of ANNs, characterized by structural depth and scalability8. DNNs, with more than three layers, can automatically extract features from raw inputs, a capability known as feature learning. Notable architectures within DNNs, such as CA, LSTM, CNNs and RNNs, have demonstrated superior performance, especially in air pollution forecasting. The training of DNNs demands significant computational power, which has driven advancements in processing capabilities and the development of sophisticated algorithms. Overcoming challenges like vanishing gradients and overfitting has prompted the application of techniques such as SVM, RF, greedy layer-wise training, and dropout. The application of these models extends across various domains due to their versatility and robust performance. In the modelling of complex atmospheric variables, such as air pollution forecasting, LSTM, CA, and CNNs emerge as particularly effective and popular architectures.

Application of AI&ML in addressing and forecasting air pollution

The application of AI&ML models, particularly ANNs, FL, SVM and DL models, has emerged as a crucial tool in addressing and forecasting air pollution. ANNs have ushered in a transformative era in air pollution forecasting, with a diverse range of applications capturing the attention of researchers. Numerous studies attest to the success of ANNs in predicting both particulate and gaseous pollutants with the desired accuracy over various spatio-temporal resolutions. The early foray into air pollution forecasting by Mlakar et al.95 marked a significant milestone, employing a trained nonlinear three-layered back-propagation feed-forward network. This model successfully predicted the concentration of SO2 over a thermal power plant, showcasing the potential of ANNs. Subsequent research expanded the scope and sophistication of ANN applications. Arena et al.96 demonstrated the efficacy of a multi-layer perceptron in predicting the concentration of SO2 over an industrial area, emphasizing the model’s accuracy across diverse weather conditions. Sohn et al.97 extended the ANN approach to model multiple pollutants, including NO, SO2, NO2, CO, O3, CH4 and total hydrocarbons. The results indicated reasonable accuracy within a limited prediction range, highlighting the need for further optimization by incorporating additional weather-related input parameters. The application of ANNs in gaseous pollutant forecasting continued with studies by Slini et al.98 and Kandya99, both emphasizing the importance of optimizing input parameters for improved accuracy. Comparative assessments with other forecasting techniques consistently positioned ANNs as superior for gaseous pollutants. Chaloulakou et al.100 found that ANN outperformed Multiple Linear Regression (MLR) in predicting ozone concentrations.
Similar findings were reported by Mishra and Goyal101, who compared a Principal Component Analysis (PCA)-based ANN model with MLR for estimating the concentrations of NO2. In the realm of particulate matter forecasting, ANNs have proven equally effective. Fernando et al.102 successfully used a multi-layered MLP to predict PM10 concentrations, considering parameters such as hourly meteorological data and particulate matter with statistical indicators. Grivas and Chaloulakou103 employed an ANN model for hourly PM10 predictions, showcasing consistent accuracy even in the presence of noisy datasets. The versatility of ANNs extends to predicting roadside contributions to PM10 concentrations, as demonstrated by Suleiman et al.104. Comparative studies with other models have affirmed the efficacy of ANNs in particulate matter forecasting. Zhang et al.105 utilized BPANN to forecast the concentrations of PM10 and found that BPANN outperformed other models in predictive accuracy. Paschalidou et al.106 evaluated multi-layer perceptron-based ANN models, which provided superior results compared to Radial Basis Function models in terms of forecasting capability. Contrasting trends were observed in certain studies, such as those by Mishra et al.107 and Moisan et al.108, where alternative models outperformed ANN during extreme events. This highlights the nuanced nature of model performance, with specific conditions favouring different approaches. More recently, researchers have used ensemble methods to improve both the stability and accuracy of ANN models. Liu et al.109 combined Wavelet Packet Decomposition (WPD), Particle Swarm Optimization (PSO), and BPNN to create an ensemble model for PM2.5 forecasting, demonstrating superior precision compared to individual models.

FL, renowned for its capacity to manage uncertainty, its fault tolerance, and its adeptness in handling highly complex nonlinear functions, has been widely adopted in air pollution prediction. The advantages of FL are exemplified in various studies. For example, Chen et al.110 introduced a novel fuzzy time series model specifically for O3 prediction, showcasing its superior performance when compared to traditional fuzzy time series models. Jain and Khare111 applied a neuro-fuzzy model to predict the concentration of CO in Delhi, achieving accurate estimates at complex urban levels. Carbajal-Hernández et al.112 predicted air quality in Mexico City by utilising an FL model alongside an autoregression model and signal processing. The introduction of a novel algorithm, the “Sigma operator,” allowed for precise evaluation of air quality variables, showcasing the effectiveness of fuzzy-based models. Moreover, Al-Shammari et al.113 evaluated stochastic and FL-driven models to estimate the daily maximum concentrations of O3. The findings indicated that the FL-based model exhibited a marginal superiority over the statistical model, particularly in instances of severe pollution events. Innovative approaches like the Fuzzy Inference Ensemble (FIE), as proposed by Bougoudis et al.114, demonstrated high accuracy in air pollution forecasting for Athens. Another significant application was presented by Song et al.115, where different probability density functions were employed to enhance particulate matter (PM) forecasting. They developed an adaptive neuro-fuzzy model, emphasizing the importance of density functions in addressing uncertainty associated with future PM trends. Furthermore, Wang et al.116 presented a hybrid model for forecasting air pollution. This model merges uncertainty analysis with fuzzy time series, demonstrating precision in predicting PM and NO2 concentrations.
In a recent advancement, Behal and Singh117 leveraged FL within an intelligent IoT sensor framework to monitor and simulate benzene, demonstrating satisfactory statistical performance. The versatility of fuzzy logic extends to unconventional pollutants, as demonstrated by Arbabsiar et al.118, who modelled the leakage of CH4 and H2S using a fuzzy inference technique. The suggested model demonstrated satisfactory performance when evaluating these contaminants.

Support Vector Machines (SVM), when combined with other machine learning algorithms, have been helpful in forecasting diverse types of pollutants. Feng et al.119 compared SVM with other models for forecasting daily maximum concentrations of O3 in Beijing, highlighting its stable and accurate performance. Yeganeh et al.120 assessed the efficacy of a forecasting model utilizing SVM integrated with Partial Least Squares (PLS) for the prediction of CO concentrations, demonstrating positive outcomes. García Nieto et al.121 conducted a comparative analysis of various prediction models for PM10 concentrations, determining that the SVM method exhibited superior accuracy and robustness. Luna et al.122 utilized PCA in combination with SVM and ANN for the prediction of O3 levels in Rio de Janeiro. Their study specifically investigated the influence of meteorological parameters on the concentrations of O3. Wang et al.123 proposed hybrid adaptive forecasting models combining SVM and ANN for predicting PM10 and SO2, demonstrating superior performance compared to individual models. Both FL and SVM have proven highly effective in forecasting air pollution levels, addressing the complexities and uncertainties associated with predicting pollutant concentrations.

While the use of DNNs in this domain is still in its early stages, their potential is evident from a review of various applications, such as the forecasting of variables in earth and atmospheric sciences. Early on, Freeman et al. (2018) employed a combination of Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) to predict ozone concentrations in an urban area. While showing strong predictability in 8 h average ozone concentrations, various model runs revealed overfitting concerns, underscoring the necessity for further refinement. Wang and Song125 introduced an ensemble method using a deep LSTM network with fuzzy c-means clustering for air quality forecasting. This ensemble approach outperformed individual models, showcasing its efficacy in both short-term and long-term predictions. Zhou et al.126 explored the application of LSTM and deep learning algorithms for multi-step ahead forecasting of PM2.5, PM10, and NOx. Their deep learning architecture, integrating dropout neurons and L2 regularization, demonstrated exceptional capabilities in capturing variations in the processes of air pollutant generation. Recent research highlights the growing preference for employing deep neural networks to capture dynamic spatiotemporal features from historical air quality and climatological datasets. Fan et al.91 introduced stacked LSTM (LSTME), spatiotemporal deep learning (STDL), time delay neural network (TDNN), autoregressive moving average (ARMA), and support vector regression (SVR) for modelling air pollutants over different spatiotemporal resolutions. The inclusion of auxiliary inputs resulted in a model with exceptional performance, outshining other machine learning techniques. Soh et al.127 proposed an STDL model integrating ANN, CNN, and LSTM for PM2.5 prediction. The model exhibited stability over extended time periods, with noise reduction achieved through Airbox sensor source models, further enhancing prediction accuracy.
Qi et al.128 presented a novel forecasting approach employing a fusion of Graph Convolutional and LSTM (GC-LSTM) neural networks, aiming to investigate spatial interdependence within air quality data. The spatial correlation modelling highlighted the consistency of the GC-LSTM model for short-term forecasting, suggesting potential improvements for long-term predictions with enhanced spatiotemporal considerations. Fan et al.91 developed an LSTM-based deep RNN for predicting PM2.5 over different spatiotemporal frames, showcasing superior specificity measures compared to baseline models. In a novel approach, Li et al.129 and Zhang et al.130 incorporated large-scale datasets of graphical images for air pollution estimation, utilizing CNNs. The models, trained on images capturing various atmospheric conditions, demonstrated improved prediction accuracy, emphasizing the adaptability of deep learning to diverse data types. These models offer robust solutions, demonstrating superior performance in various studies and showcasing their potential to contribute significantly to the field of environmental monitoring and public health.

Performance analysis

The AI&ML models are evaluated by comparing their performance using statistical measures such as RMSE and R2, which are widely accepted metrics in air pollution forecasting studies. Previous research, utilizing a range of datasets, has yielded disparate results134. While certain studies advocate for ensemble methods, others find negligible disparities in the overall accuracy of the outcomes. The efficacy of AI&ML-driven methodologies relies heavily on the careful selection of influential parameters, especially when addressing various pollutants such as PM, O3, NO2, SO2, and CO29. For example, for PM forecasting, critical elements such as precipitation, pressure, humidity, land utilization, wind speed and direction, traffic flow on roads, and population density exert significant influence. Similarly, different influential parameters are identified for SO2, NO2, O3, and CO, emphasizing the importance of tailoring models to specific pollutants. The precision of the methods is notably affected by how directly these factors correlate with the forecasted pollutant levels. Additionally, the efficacy of AI&ML models hinges upon variables including network structure, complexity, learning algorithms, the correspondence between input and output information, and the presence of noise in the data. A comprehensive analysis shows the varying performances of DNN, SVM, ANN, and Fuzzy techniques across different pollutants. DNNs emerge as particularly effective in forecasting PM concentrations, outperforming other techniques with R2 and mean RMSE values of 0.96 and 7.27 μg/m3, respectively91,126,133. In O3 prediction, SVM, FL and DNN exhibit superior accuracy, with DNNs once again leading with R2 and mean RMSE values of 0.92 and 3.51 μg/m3, respectively119,120. SVM excels in forecasting NO2 concentrations, although Fuzzy and DNN techniques also demonstrate reasonable accuracy116,118,131.
Notably, the DNN approach consistently stands out, showcasing the best statistical performance for the O3 and CO categories. For CO, DNN achieves an exceptional RMSE of 0.69 × 10–5 ppm and an R2 of 0.95119,120,124,125. The overall analysis indicates the superiority of DNN across all pollutants, with the lowest overall RMSE score of 5.68. However, despite DNN’s dominance, it is crucial to note that ensemble methodologies based on DL models remain underdeveloped for the forecasting of air pollution131,135,136. These approaches, involving multiscale spatiotemporal predictions, have untapped potential to further advance the field, incorporating more explanatory variables to represent air pollution episodes with robust dynamical forcing. While the DNN emerges as the leading AI&ML system for the forecasting and prediction of air pollution based on statistical evidence, the exploration of ensemble approaches presents an avenue for future developments in enhancing predictive accuracy.
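For reference, the two statistical measures used in this comparison can be computed as in the sketch below. It assumes R2 means the coefficient of determination; some studies instead report the squared correlation coefficient, which can give a different value. The observed and predicted concentrations are made-up numbers.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    return 1.0 - ss_residual / ss_total


# Made-up PM concentrations (ug/m3), for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(f"RMSE = {rmse(observed, predicted):.2f}, R2 = {r_squared(observed, predicted):.3f}")
```

Because RMSE carries the units of the pollutant, values for different pollutants (for example μg/m3 for PM and ppm for CO) are not directly comparable, whereas R2 is unitless.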

Prediction of PM2.5 concentrations

The study used a convolutional autoencoder (CA) to analyse PM2.5 concentrations. The dataset was divided into training (70%), testing (20%), and validation (10%) sets, and the model was trained over 30 epochs (Fig. 3). This PM2.5-focused CA processes sequences of ten consecutive images, using the learned features to reconstruct the subsequent image. The visual representation of the model’s capabilities includes sequences of 10 input images, their corresponding 11th ground-truth image, and the model’s prediction (Fig. 4). The model demonstrates promising performance in predicting PM2.5 concentration patterns across India. Comparing the actual 11th image with the predicted one reveals that the model successfully captures the broad spatial distribution of PM2.5 concentrations. Key findings show that the model accurately predicts high-concentration areas in the northern regions, particularly in the IGP (Fig. 4). It also effectively represents lower concentrations in southern and eastern coastal areas, and it captures the general gradient from northwest to southeast quite effectively. However, the prediction tends to slightly overestimate PM2.5 levels in the northwestern region, and some localized high-concentration areas in central India are not fully captured. Furthermore, the model’s prediction shows a smoother distribution compared to the more granular actual data (Fig. 4). Performance evaluation employed established image quality metrics: Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR) and Mean Squared Error (MSE) (Fig. 5). SSIM, which assesses image similarity, predominantly ranged from 0.50 to 0.70 during training, dropping slightly to 0.45 to 0.55 during testing, and stabilizing at 0.50 to 0.60 in validation. PSNR ranged from 25 to 30 dB during training, 24 to 28 dB in testing, and 28 to 30 dB in validation.
Lower MSE values (10 to 15 µg/m3 in training, 10 to 20 µg/m3 in testing, and 8 to 11 µg/m3 in validation) signify improved accuracy at the pixel level.
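The ten-images-in, one-image-out setup and the 70/20/10 split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not say whether the split was chronological or random (a chronological split is assumed here), and the function names and integer placeholder frames are hypothetical.

```python
def make_windows(frames, window=10):
    """Pair each run of `window` consecutive frames with the frame that follows it."""
    samples = []
    for start in range(len(frames) - window):
        inputs = frames[start:start + window]
        target = frames[start + window]
        samples.append((inputs, target))
    return samples


def split_samples(samples, train_pct=70, test_pct=20):
    """Split samples in order; whatever remains after train and test is validation."""
    n_train = len(samples) * train_pct // 100
    n_test = len(samples) * test_pct // 100
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    validation = samples[n_train + n_test:]
    return train, test, validation


# Integers stand in for daily PM2.5 concentration maps (assumption for illustration).
frames = list(range(110))
samples = make_windows(frames)
train_set, test_set, val_set = split_samples(samples)
print(len(train_set), len(test_set), len(val_set))
```

Keeping the samples in time order avoids placing near-identical overlapping windows in both the training and evaluation sets, which is one reason a chronological split is often preferred for sequence data.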

Fig. 3

RMSE loss during the training, testing and validation phases.

Fig. 4

Example set for predicting the 11th PM2.5 image from a batch of 10 concentration images, compared with the actual 11th image. The maps were generated using Python in a Jupyter Notebook with the Matplotlib (v3.3.4) and Basemap (v1.2.2) libraries.

Fig. 5

Model evaluation parameters used for predicting the PM2.5 concentrations.
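The three evaluation metrics can be written out in a few lines. The sketch below is an illustration, not the study's evaluation code: images are represented as lists of rows, `data_range` (the span of possible pixel values) is an assumed parameter, and the SSIM shown is a simplified single-window variant computed once over the whole image, whereas standard implementations average SSIM over local sliding windows.

```python
import math


def flatten(image):
    """Flatten a 2-D image given as a list of rows into one list of pixels."""
    return [pixel for row in image for pixel in row]


def mse(actual, predicted):
    """Mean squared error over all pixels."""
    a, p = flatten(actual), flatten(predicted)
    return sum((x - y) ** 2 for x, y in zip(a, p)) / len(a)


def psnr(actual, predicted, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    error = mse(actual, predicted)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / error)


def global_ssim(actual, predicted, data_range):
    """Simplified SSIM using whole-image statistics instead of local windows."""
    a, p = flatten(actual), flatten(predicted)
    n = len(a)
    mu_a, mu_p = sum(a) / n, sum(p) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_p = sum((y - mu_p) ** 2 for y in p) / n
    cov = sum((x - mu_a) * (y - mu_p) for x, y in zip(a, p)) / n
    c1 = (0.01 * data_range) ** 2  # standard stabilising constants
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_a * mu_p + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_p ** 2 + c1) * (var_a + var_p + c2)
    return numerator / denominator


# Tiny made-up 2x2 "concentration maps", used only to exercise the functions.
actual = [[10.0, 20.0], [30.0, 40.0]]
predicted = [[12.0, 18.0], [33.0, 37.0]]
print(mse(actual, predicted), psnr(actual, predicted, 100.0),
      global_ssim(actual, predicted, 100.0))
```

Higher SSIM and PSNR and lower MSE all indicate a closer match. PSNR and MSE are two views of the same pixel-level error, while SSIM additionally compares image structure.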

These metrics offer insight into image quality and show some variation between training, testing, and validation, though all remain within acceptable ranges. Consistently higher SSIM and PSNR values and lower MSE values highlight the model’s precision compared to benchmarks. This performance traces back to the ability of autoencoder-based models to capture complex spatiotemporal features and to the strategic integration of Conv2d, Batch Normalization, and Upsampling layers. The model outperforms prior methodologies in predicting PM2.5 concentrations, achieving high-quality predictions across all phases. When the model was used to forecast PM2.5 levels for the next 4 days, the evaluation metrics (SSIM, PSNR, MSE) deteriorated as the forecast horizon lengthened, suggesting that additional input parameters are needed to improve the model (Fig. 6). Predicting PM2.5 concentrations remains challenging because of intricate spatiotemporal features, an area where DL models offer promise. Using deep learning architectures and transfer learning, this study fine-tuned models and achieved promising PM2.5 prediction results. Although precise location-level prediction remains difficult owing to the dynamic nature of PM2.5, the model can predict spatial distributions, as is evident in visual comparisons between predicted and actual PM2.5 concentration maps.
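One common way to produce a multi-day forecast from a one-step model is a recursive rollout, in which each prediction is fed back as input for the next step. The source does not state that the 4-day forecasts were generated this way, so the sketch below is an assumption offered only to show why errors can accumulate with the forecast horizon. `predict_next` is a hypothetical callable standing in for a trained model, and `mean_of_window` is a toy placeholder rather than a real predictor.

```python
from collections import deque


def rollout(history, predict_next, steps=4, window=10):
    """Forecast `steps` frames ahead by feeding each prediction back as input."""
    if len(history) < window:
        raise ValueError("need at least `window` frames of history")
    recent = deque(history[-window:], maxlen=window)
    forecasts = []
    for _ in range(steps):
        next_frame = predict_next(list(recent))
        forecasts.append(next_frame)
        recent.append(next_frame)  # the oldest frame drops out automatically
    return forecasts


def mean_of_window(frames):
    """Toy stand-in for a trained model: per-pixel mean of the input window."""
    n = len(frames)
    return [sum(frame[i] for frame in frames) / n for i in range(len(frames[0]))]


# Ten made-up frames of three pixels each (flattened maps), values 0.0 to 9.0.
history = [[float(day)] * 3 for day in range(10)]
forecasts = rollout(history, mean_of_window, steps=4)
print(forecasts)
```

From the second step onward the input window contains the model's own earlier outputs, so any bias or smoothing in one prediction carries into the next. That behaviour is consistent with metrics worsening at longer horizons.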

Fig. 6

Example set of PM2.5 predictions for the next 4 days compared with the corresponding actual images. The maps were generated using Python in a Jupyter Notebook with the Matplotlib (v3.3.4) and Basemap (v1.2.2) libraries.

Challenges and limitations

Technological barriers

One of the primary challenges lies in overcoming technological barriers. While advanced pollution control technologies exist, their widespread adoption is hindered by factors such as high costs and limited access to cutting-edge solutions. Many regions, particularly in rural areas, lack the infrastructure necessary to deploy and maintain sophisticated air quality monitoring and purification systems. Bridging this technological divide is essential for comprehensive pollution control.

Regulatory and enforcement challenges

India grapples with the challenge of implementing and enforcing air quality regulations consistently. While the country has established regulatory frameworks to curb emissions from industries, vehicles, and other pollution sources, enforcement remains uneven. This inconsistency is often compounded by resource constraints, bureaucratic hurdles, and the need for stronger mechanisms to penalize non-compliance. Strengthening regulatory frameworks and enhancing enforcement mechanisms are critical steps in addressing this challenge.

Public awareness and participation

Creating widespread awareness and fostering public participation are essential components of any successful pollution control strategy. However, there is a considerable gap in public awareness regarding the causes and consequences of air pollution. Engaging citizens in proactive measures, such as adopting sustainable practices and reducing individual carbon footprints, requires comprehensive educational campaigns and community involvement. Overcoming societal inertia and instigating behavioral change are significant challenges in this regard.

Agricultural practices and crop burning

Agricultural practices, particularly the widespread burning of crop residues, contribute significantly to air pollution. Burning residues releases substantial amounts of particulate matter and other pollutants into the air. Farmers resort to this practice because they lack viable alternatives and face tight time constraints between harvest seasons. Developing and promoting sustainable agricultural practices, coupled with providing farmers with effective alternatives to crop burning, is a complex challenge that requires a holistic approach.

Urbanization and infrastructure development

Rapid urbanization and infrastructure development, while essential for economic growth, often contribute to increased pollution levels. The construction industry, in particular, releases pollutants into the air. Balancing the need for development with sustainable and environmentally conscious practices poses a significant challenge. Implementing green building technologies, stringent emission norms for construction activities, and incorporating urban planning strategies that prioritize air quality are vital steps in addressing this challenge.

Cross-border pollution

Air pollution knows no boundaries, and India contends with the impact of cross-border pollution. Transboundary movement of pollutants, especially during crop burning seasons, contributes to elevated pollution levels in various regions. Collaborative efforts with neighbouring countries are necessary to address this challenge effectively. Developing joint strategies, sharing data, and fostering regional cooperation are imperative for tackling the transboundary dimension of air pollution.

Climate change interlinkages

The interlinkages between air pollution and climate change present a complex challenge. Mitigating air pollution often aligns with climate action goals, but there are trade-offs and synergies that need careful consideration. Striking a balance between addressing immediate air quality concerns and contributing to long-term climate resilience requires integrated policies and strategic planning.

Socio-economic disparities

Air pollution disproportionately affects vulnerable communities, exacerbating existing socio-economic disparities. The challenge lies in designing interventions that address environmental concerns and promote social equity. Ensuring that pollution control measures do not inadvertently burden marginalized communities and providing equitable access to clean technologies are critical to overcoming this challenge.
