May 22, 2019
Research Article Cite This: ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

pubs.acs.org/journal/ascecg

Waste-to-Resource Transformation: Gradient Boosting Modeling for Organic Fraction Municipal Solid Waste Projection
Eniola Adeogba,# Peter Barty,# Edward O’Dwyer, and Miao Guo*
Department of Chemical Engineering, Imperial College London, London, SW7 2AZ U.K.


Supporting Information

ABSTRACT: Food and garden waste are important components of organic fraction municipal solid waste (OFMSW), representing carbon and nutrient rich resources composed of carbohydrates, lipid, protein, cellulose, hemicellulose, and lignin. Despite progressive diversion from landfill, over 50% of landfilled MSW is biodegradable, causing greenhouse gas emissions. In conventional waste management value chains, OFMSW components have been regarded as byproducts as opposed to promising resources with energy and nutrient values. Full exploitation of waste resources calls for a value chain transformation toward proactive resource recovery and waste commoditization. This requires robust projection of OFMSW composition and supply variability. Gradient boosting models are developed here using historical socio-demographic, weather, and waste data from U.K. local authorities. These models are used to forecast garden and food OFMSW generation for each of the 327 U.K. local authorities. The developed methods perform particularly well in forecasting garden waste due to a greater link to measurable environmental variables. The research highlights the key influences in waste volume prediction and demonstrates the difficulty in transferring models to local authorities without training data. The predictive performance and spatial granularity of model projections offer a promising approach to inform decision-making on future waste recovery facilities and OFMSW commoditization.
KEYWORDS: Machine learning, Organic municipal solid waste, Gradient Boosting model, Waste recovery, Value chain



INTRODUCTION

The projected 50% increase in global population in the 21st century1 combined with non-OECD economic growth is expected to increase resource (e.g., food and energy) demands as well as lead to rising waste generation. Municipal solid waste (MSW) is defined as the municipal waste collected and treated by or for municipalities, including organic fraction MSW (OFMSW: food waste, garden and park waste, paper and cardboard, wood) and inorganic MSW (textiles, rubber and leather, plastics, metal, glass, and others).2 Global MSW growth is projected to exceed 11 million tons per day (59%−68% organic fraction) by 2100 under “business as usual”.3 Increasing waste trends are particularly intense in less developed countries.2 Such waste trends not only increase the resource stress but also contribute to greenhouse gases (GHGs). Annual global food waste (equivalent to one-third of food produced globally) is equivalent to the waste of 8.5% of annual water withdrawn and 28% of agricultural land;4 MSW has emerged as a major concern as postconsumer waste accounts for 5% of global GHGs. A transformation toward resource-circular systems and sustainable MSW management is necessary. Despite the ongoing shift away from a landfill-dominated system, MSW chemical composition variability and conventional waste value chains hinder the transformation of the

waste sector toward a resource-circular system. Growing environmental pressure has caused regional/national targets to divert waste from landfills and increase the recycling and recovery rate. As part of a circular economy strategy, the EU has set the targets to recover MSW (recycling 65% MSW by 2035)5 and restrict OFMSW sent to landfill (35% of the 1995 baseline by 2020).6 U.K. OFMSW to landfill represents 22% of the 1995 baseline value with over 7.7 million tonnes of biodegradable municipal waste ending up in landfill in 2016.6 This implies that over 50% of landfilled MSW is biodegradable.6,7 The decomposition of organic MSW in landfill is the predominant contributor (92%) to the GHGs associated with the MSW sector (4% of total GHG) in the U.K.8 In a conventional waste management value chain and market, OFMSW along with other waste streams have been regarded as byproducts (carrying zero or low value) rather than marketable commodities with well-defined grades (in contrast to oil products as energy carriers). In fact, OFMSW streams are not only carbon-rich resources as energy carriers but also contain high nutrient value (e.g., protein, lipid, and minerals). The waste sector presents promising opportunities for resources to

Received: February 10, 2019
Revised: May 13, 2019


DOI: 10.1021/acssuschemeng.9b00821 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX


ACS Sustainable Chemistry & Engineering

complex real-world problem such as MSW generation is limited due to strict requirements placed on the input variables. These requirements include independence of explanatory variables, constant variance, and normality of errors in order to conform with fundamental regression assumptions.22 In contrast with the aforementioned methods, time series analysis is independent of demographic and socio-economic factors and relies only on historical waste data. A nonlinear-dynamics-based prediction technique was used to forecast waste generation and compared to a traditional time-series approach known as seasonal AutoRegressive and Integrated Moving Average (sARIMA).16 More recently, machine learning and artificial intelligence techniques such as support vector machines (SVM) and artificial neural networks (ANN) have been used to predict MSW generation on long, medium, and short-term scales.23,24 Zade and Noori25 used ANNs to forecast weekly waste generation, and Abbasi et al.26 used partial least-squares (PLS) for feature selection with SVM for waste forecasting. Our research differs from previously published literature which modeled total MSW volumes. Our research focuses on waste composition projection by accounting for two categories of OFMSW: food and garden waste. In the specific context of waste-to-resource transformation, the organic waste fraction of MSW is unique in the complexity of its environmental impact and its potential for value-added product recovery. This research uses a gradient boosting model, which is based on decision tree regression models.
Gradient boosting has demonstrated the ability to model complex nonlinear relationships between variables and has proven higher prediction accuracy than traditional time series approaches such as AutoRegressive and Integrated Moving Average (ARIMA).27 The major drawback of the ARIMA model is its assumption of a linear relationship between independent and dependent variables which does not mirror the complex nature of real world relationships between variables.28 When compared to widely adopted ANN models, the gradient boosting method has many distinct advantages. There are typically fewer hyper-parameters to be tuned, and methods of interpretation are better developed than for ANNs. More importantly for our application, the gradient boosting implementation used in this paper has an in-built method for handling missing values in the data, whereas ANNs are less capable in this regard. Finally, different from approaches adopted in previous research which focused on MSW forecasting for a given region, this study investigates the transferability of the OFMSW prediction models to regions not included in the training set. Such generality would provide a promising means to quantify the future availability of food and garden waste and inform local authorities currently without collection schemes in place. Our proposed approach has the potential to benefit ODA countries where the increasing waste trends are particularly intense.2

be converted to value-added products via thermochemical and biochemical pathways such as anaerobic digestion. A transformative waste value chain and the commoditization of waste resources are required to exploit waste resource value, which in turn requires quantitative projection of waste composition and supply. However, the waste composition is highly complex and variable. Take food waste as an example. The U.K. nationwide analyses showed significantly varying carbohydrates (30−250 g/kg), lipid (10−128 g/kg), protein (5−140 g/kg), and soluble sodium and potassium (1.2−55 g/kg) contents.9 These are dependent on spatially explicit factors (e.g., local diet and behavior) and seasonal environmental variables (e.g., winter and summer). The analytical experiments to quantify such varying waste composition can be labor intensive and cost ineffective, but they are essential to inform the technology design to maximize waste recovery. Moreover, planning (e.g., sizing and logistics) and operation of waste recovery facilities require continuous and consistent waste feedstock supply, whereas it is difficult to precisely quantify the waste availability, in particular OFMSW volumes, due to its low traceability and mixed waste collection system origin. Notably, household waste (27.3 million tons in 2016) dominates the local authority collected MSW in the U.K. (∼90%)7 and shares over 70% of the food waste stream.10 These lead to inhomogeneous OFMSW streams driven by household behaviors and consumption trends which are not only affected by environmental factors but also socio-economic variables at the local level (e.g., income). Such challenges have been highlighted in a previous study,11 where the authors pointed out that the energy recovery barrier is the inability to quantify the garden waste. Thereby, technology implementation and waste value chain transformation call for robust projection of waste feedstock quantity and quality (composition) at spatial and temporal scales.
This study aims to project variability in organic solid waste generated in a given U.K. local authority, accounting for socioeconomic and other environmental factors. Our research focuses on food and garden waste streams as important components of OFMSW. The chemical compositions and potential conversion pathways of food and garden waste are presented in Figure S1 in Supporting Information (SI).



METHODS FOR FORECASTING OFMSW

There is an increasing research interest in forecasting MSW generation in the context of informing local governments to plan efficient waste management systems.12 A variety of advanced techniques have been used to forecast MSW generation, which can be broadly classified as follows: descriptive statistical methods,13 material flow models,14 regression analysis,15 time series analysis,16,17 and artificial intelligence models.18,19 A list of advanced MSW forecasting techniques and their features is presented in Table S1. Descriptive statistical methods typically use demographic information such as population growth and average waste generation per capita as the main predictor; however, this method is prone to inaccuracies due to the dynamics of the MSW generation process.13 Material flow models have been adapted to predict waste generation under various social and economic scenarios.14 This approach fully characterizes the dynamic nature of MSW generation; however, it is typically applied to total waste rather than collected waste.20 Hekkert et al.21 highlighted that comparisons of the results of material flow analysis with real observed waste data on the highest aggregation levels were questionable due to the presence of different aggregations or low consistency within the studies. In regression analysis, MSW generation is correlated with economic and demographic variables.19 The suitability of this method for a



MATERIALS AND METHODS

Data Collection. An extensive literature review of past MSW forecasting research highlighted population, socio-demographic variables, and climate-related variables as the most important contributors to MSW generation.23,27,29 The selected features (Table 1) were therefore chosen to reflect each of these factors. Local authority waste data was obtained from the U.K. municipal waste database Waste-Data-Flow (WDF).30 The data was provided for each quarter spanning the 7-year period of 2009−2016. Each entry in the data set provided information on the name of the local authority, period of interest, type of material collected, collection method, number of households served by a waste collection service, and collected waste tonnage. Each entry is provided for a combination of a given authority and a given quarter, e.g., City of London, January−March 2016. This combination of authority and quarter defines an “example”. The actual number of examples available for training the model was limited by the fact that only a subset of local authorities have separate collection schemes in place for food and garden waste.


to predictors.31 A different linear model is then fitted to each region. This binary partitioning is performed recursively, and at each stage, the splitting variable and split point are determined using a greedy recursive-partitioning algorithm. Figure 2 illustrates the binary partitioning process in the context of our research. Consider two predictor variables X1 and X2, representing population density and mean temperature of a specified local authority at a given point in time, and a response Y, the total waste generated in the area. Splitting occurs at the first node using population density as the splitting variable. The next step involves splitting at the subsequent node, using temperature as the splitting variable. The recursive splitting process is stopped once a defined minimum node size is attained and the resulting large tree is pruned using a process called “cost-complexity pruning”. This process involves removing weak links identified through cross-validation. The result is a single decision tree which best describes the underlying relationship between a set of variables.32 GBRT is an extension of traditional regression trees which incorporates a statistical technique known as “boosting”. This procedure improves prediction accuracy by combining the outputs of “weak” classifiers to form a single consensus model. Hastie et al.31 defined a weak classifier as “one whose error rate is only slightly better than random guessing”. In the context of regression trees, boosting is a form of functional gradient descent that optimizes performance by adding a new tree at each step that minimizes the gradient of the loss function.32 The first regression tree is the one which minimizes the loss rate to the maximum extent; subsequent decision trees are then fitted to the residuals of the preceding tree. It is also important to note that although the model is updated each time a new tree is calculated, the existing trees remain unchanged. 
Instead, the linear model fitted to each observation is recalculated to account for the effect of the new tree.31 In order to optimize predictive performance, several parameters including number of estimators, maximum tree depth, and learning rate were tuned. Tuning was performed using graphical methods of comparing training and cross-validation accuracies, with the aim being to maximize the R2 score for the cross-validation set. The number of estimators determines the maximum number of trees in the model, as illustrated in Figure 2, while tree depth determines the degree of interaction between features. Friedman et al.33 state that, “since each new tree builds on the residuals of the previous tree, shallow trees with a depth of 4−6 are often preferred”. Learning rate is another key parameter that determines the weighting of each tree in the final model. A low learning rate will increase the number of trees used and allow regularization of results. Ultimately, this results in better model

Table 1. External Features Used in Models for Food and Garden Waste

Features                                       Period    Source
Population of Authority Area                   Yearly    WDF
Population Density                             Yearly    WDF
Index of Deprivation                           Yearly    WDF
Temperature                                    Monthly   CEDA
Rainfall                                       Monthly   CEDA
Solar radiation                                Monthly   CEDA
Fraction of population living in rural areas   Yearly    ONS
Unemployment rate                              Yearly    ONS
Household income                               Yearly    ONS
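The boosting procedure described in the text (each new tree fitted to the residuals of the current ensemble, with its contribution scaled by the learning rate) can be illustrated with a minimal sketch. This is not the XGBoost implementation used in the paper: it assumes a squared-error loss, a single feature, and depth-1 trees ("stumps") instead of the depth 2 to 4 trees reported in Table 2. The function names and the toy data are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one feature by least squares.

    Returns (threshold, left_value, right_value). Assumes xs holds at
    least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # splits leaving both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_estimators=50, learning_rate=0.1):
    """Stage-wise boosting: earlier stumps are never modified."""
    base = mean(ys)  # initial prediction: mean of the training target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict_boosted(model, x):
    """Final model: base value plus the learning-rate-weighted sum of stumps."""
    return model["base"] + model["learning_rate"] * sum(
        predict_stump(s, x) for s in model["stumps"])


# Hypothetical toy data: one feature (e.g. mean temperature) and waste tonnage.
temperature = [2, 5, 8, 11, 14, 17, 20, 23]
tonnage = [10, 12, 11, 30, 32, 31, 50, 52]
model = fit_boosted(temperature, tonnage)
```

With a smaller learning rate, each stump corrects a smaller share of the residual, so more estimators are needed to reach the same training error. This is the trade-off between learning rate and number of estimators discussed in the text.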

The availability of garden and food waste data across different U.K. local authorities is illustrated in Figure 1. Additional data sets were incorporated into the feature set to provide contextual information about the authorities. This included meteorological data (e.g., mean temperature, rainfall, and solar radiation) obtained from the Centre for Environmental Data Analysis (CEDA) as well as socio-demographic data (e.g., household disposable income and rural-urban classification) from the Office for National Statistics (ONS).

Modeling Approach. A Gradient Boosted Regression Tree (GBRT) model was used to predict long-term waste generation. GBRT is used here to estimate waste volumes using only contextual features of a given local authority. This process is performed in three stages: training, cross-validation, and testing (see Figure S2 for a graphical outline of the process). First, the model is trained to identify patterns between features and the target variable: waste volume. In the cross-validation step, the model parameters are fine-tuned to improve model performance and ensure that the model can generalize to examples beyond those it was trained on. Finally, the model is tested on a number of unseen examples and its prediction accuracy is determined. Other model types were examined and tested, including support vector machines and random forests, but the GBRT model was found to be most effective for the two specific use cases in this paper (see Tables S2, S3 for further details). The GBRT method combines the strengths of two algorithms: regression trees and boosting. The resulting model is an additive regression model comprised of decision trees fitted in a stage-wise manner. Decision trees partition the feature space into regions and use a series of rules to identify regions with homogeneous responses

Figure 1. Availability of garden waste (A) and food waste (B) data across the U.K. (number of data points per authority).


Figure 2. (A) Visualization of a decision tree using two demonstrative predictor variables (population density and temperature) to generate a single response (waste volume). (B) Regression tree tuning parameters.

performance and a reduced risk of overfitting to the training data set despite a subsequent trade-off in computation time.34 The final model is a linear combination of all decision trees whose contribution to the overall model is weighted by the learning rate to minimize the root mean squared error loss function.34 XGBRegressor was implemented using Python’s XGBoost package version 0.81.

Feature Engineering. Features were mapped to each U.K. local authority on a quarterly basis from 2009 to 2016. U.K. weather data was obtained from CEDA at a grid resolution of 5 × 5 km2. These were mapped to the relevant regions by locating the nearest grid point to each local authority, with grid point locations determined by minimizing the Euclidean distance between the grid points and the longitude and latitude given for the local authority. A small number of these authorities were excluded due to discrepancies in nomenclature between waste authorities and local authorities. Once the feature mapping stage was completed, features were checked for collinearity and excluded if their linear correlation R-squared score exceeded 0.8. Features excluded by this method include local authority dwelling stock and a number of metrics related to the rural population in local authorities. Feature choices were kept consistent for all model types to allow comparison of feature importance and, more importantly, model accuracy.

Models. Two model types were developed: a forecasting model and a “peer” model. For each of these model types, individual submodels were created for food and garden waste. The forecasting model was developed to predict food and garden waste volumes for future time periods.
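The two feature-engineering steps described under Feature Engineering (nearest-grid-point mapping by Euclidean distance in longitude and latitude, and exclusion of features whose linear-correlation R-squared exceeds 0.8) can be sketched as follows. The paper does not state which feature of a collinear pair is dropped, so the greedy "keep the first, drop the later one" order used here is an assumption, as are all function names and the toy values.

```python
import math


def nearest_grid_point(authority_lonlat, grid_points):
    """Return the grid point closest to the authority in (lon, lat) space."""
    return min(grid_points, key=lambda g: math.dist(g, authority_lonlat))


def r_squared_linear(a, b):
    """Squared Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


def drop_collinear(features, threshold=0.8):
    """Keep a feature only if its R^2 with every already-kept feature
    is at most the threshold (greedy order is an assumption)."""
    kept = {}
    for name, values in features.items():
        if all(r_squared_linear(values, other) <= threshold
               for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical weather grid points and one authority location (lon, lat).
grid = [(-0.10, 51.50), (-0.05, 51.50), (-0.10, 51.55)]
mapped = nearest_grid_point((-0.09, 51.51), grid)

# Hypothetical feature columns; dwelling stock tracks population closely.
features = {
    "population": [100, 200, 300, 400],
    "dwelling_stock": [41, 82, 119, 160],
    "rainfall": [50, 20, 60, 30],
}
kept = drop_collinear(features)
```

In this toy case `dwelling_stock` is dropped because it is almost perfectly correlated with `population`, mirroring the exclusion of local authority dwelling stock reported in the text.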
The food and garden waste submodels were trained using data from 2009 to 2013 for all applicable local authorities and then tested for the period from 2014 to 2016. The mean waste volume prediction was compared to actual waste data for this test period. The purpose of the forecast model is to predict waste volumes in the future for local authorities which already have collection schemes in place to aid in planning. The peer model was developed in response to an observed gap in the literature regarding the generalizability of these waste prediction models to regions which the model had not previously been trained on. The purpose of this model was therefore to predict OFMSW volumes based only on intrinsic information about the local authority for a given period such as population and weather patterns. In our specific case, the purpose of creating a peer model is to be able to extrapolate forecasts for organic waste to those local authorities that do not yet have organic waste collection schemes in place to highlight the “lost potential” for organic waste collection. As such, it could be used to support the decision to implement such a scheme in a local authority. The peer model was trained on 60% of the 327 local authorities over the entire time period from 2009 to 2016 and tested on the other 40% of local authorities which had not been seen by the model. The predicted waste volumes for the unseen local authorities were compared to actual waste volumes to determine the accuracy of the peer model. Table 2 shows the tuning parameters used for each model type on the garden and food waste streams. Data Set Splitting. In order to obtain training, cross-validation and test sets for the peer model, the original data sets were split such

Table 2. Models for Food and Garden Waste

Forecasting and peer model parameters
Model Type    Waste Type   Tree Depth   Number of Estimators   Learning Rate
Forecasting   Garden       4            8000                   0.025
Forecasting   Food         4            1000                   0.025
Peer          Garden       3            900                    0.004
Peer          Food         2            1800                   0.006

Model prediction results: Accuracy (R2 score)
Model Type    Waste Type   Training   CV      Test
Forecasting   Garden       0.994      0.770   0.651
Forecasting   Food         0.951      0.691   0.506
Peer          Garden       0.719      0.549   0.291
Peer          Food         0.787      0.373   0.262
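The R2 scores in Table 2 follow eq 1, in which predictions are scored against a baseline that always predicts the mean waste volume of the training set. A minimal sketch of that calculation is given here; the function name and the toy numbers are hypothetical, and the only assumption beyond the text is that the same training-set mean is reused when scoring the cross-validation and test sets.

```python
def r2_against_training_mean(y_true, y_pred, y_train_mean):
    """Eq 1: one minus the model's squared error divided by the squared
    error of a baseline that always predicts the training-set mean."""
    ss_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_baseline = sum((y - y_train_mean) ** 2 for y in y_true)
    return 1 - ss_model / ss_baseline


# Hypothetical quarterly tonnages for three held-out examples.
y_test = [10, 20, 30]
train_mean = 20

perfect = r2_against_training_mean(y_test, y_test, train_mean)
baseline = r2_against_training_mean(y_test, [20, 20, 20], train_mean)
partial = r2_against_training_mean(y_test, [12, 18, 33], train_mean)
```

A score of 1 means perfect prediction, 0 means no better than the training-mean baseline, and negative values mean worse than the baseline. This makes the peer-model test scores of 0.291 and 0.262 interpretable as a modest improvement over predicting the mean.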

that similar waste volume distributions were observed across all three sets. Authorities were ranked by their mean waste volume and split into quintiles based on this metric. Finally, stratified random sampling was used to split the authorities into training, cross-validation, and test sets, stratifying on these quintiles. Model Evaluation. The model performance is measured by a coefficient of determination R2 (eq 1) measured against a baseline model (i.e., mean of the waste volume in the training set).

$$R^2 = 1 - \frac{\sum_i \left(y_i - y_{i,\mathrm{pred}}\right)^2}{\sum_i \left(y_i - \bar{y}_{\mathrm{train}}\right)^2} \qquad (1)$$

where yi and y(i,pred) denote the true and predicted waste volumes, respectively; y̅train represents the mean waste volume in the training data set. This coefficient of determination can be calculated for each of the training, cross-validation, and testing data sets for each model. The models’ parameters are tuned with the aim of maximizing the cross-validation R2.

Model Interpretation. A number of model interpretation techniques could be applied in waste forecasting models to provide useful insights into individual feature contribution to the model’s decision-making process. Lundberg et al.35 showed that widely adopted approaches such as split count and gain are inconsistent metrics of feature importance and proposed a new method known as Shapley Additive Explanations (SHAP); the advantages and limitations of SHAP are discussed thoroughly in that work. SHAP combines a number of Additive Contribution Explanation algorithms to provide a measure of feature contributions that meets the requirements of local accuracy, missingness, and consistency.35 SHAP values measure the contribution of each feature to the model’s output for a certain example. The mean absolute SHAP values across all examples for each feature are calculated to evaluate the importance of the features for the model. Higher mean absolute SHAP values indicate that those features generally have a more


Figure 3. Model predictive performance on mean quarterly collected garden (A) and food waste (B) per authority for training and test set from 2009−2016.

Figure 4. Relative feature importance for garden waste (A) and food waste (B) prediction models.

Figure 5. Correlation between magnitude of feature values and impacts on prediction model outputs for food waste (A) and garden waste (B).

both waste types with an R2 of just 0.29 for garden waste due to the inherently difficult nature of predicting the waste output of previously unseen local authorities. Figure 3 compares the forecasting model prediction results for mean quarterly garden and food waste to actual collection volumes. The quarterly prediction results were averaged across all local authorities to demonstrate mean waste volumes for the U.K. as a whole. On aggregate, model accuracy is higher at the national level compared to the local authority level with test R2 scores of 0.766 and 0.899 for garden and food waste models, respectively. The forecasting model for garden waste follows the training set very closely but is less accurate when applied to the test set. However, it captures the seasonality of garden waste generation indicated by peaks and troughs. The

significant overall additive contribution to the model’s output. In this study, SHAP values were calculated using the TreeSHAP algorithm,36 implemented in Python’s SHAP library, version 0.25.2.
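To make the idea of per-example feature contributions and their mean absolute value concrete, the sketch below computes exact Shapley values by brute force over all feature orderings. This is not the TreeSHAP algorithm used in the paper. It assumes a single reference ("baseline") input that stands in for absent features, and it is only practical for a handful of features. The toy model and all names are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features are switched from their baseline value to their actual value
    one at a time, and each feature's marginal effect is averaged over
    every possible ordering.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            output = model(current)
            phi[i] += output - previous_output
            previous_output = output
    return [p / len(orderings) for p in phi]


def mean_abs_shap(model, examples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    totals = [0.0] * len(baseline)
    for x in examples:
        for i, value in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(value)
    return [t / len(examples) for t in totals]


def toy_model(v):
    """Hypothetical model with an interaction between two features."""
    population, temperature = v
    return 2 * population + 3 * temperature + population * temperature


reference = (1, 1)
contributions = shapley_values(toy_model, (3, 2), reference)
importance = mean_abs_shap(toy_model, [(3, 2), (1, 1)], reference)
```

The contributions sum to the difference between the model output for the example and for the reference input, which is the local accuracy property mentioned in the text.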



RESULTS AND DISCUSSION

Model Performance. Table 2 shows the prediction accuracy for both the forecasting and peer models for each waste type, as measured by the R2 value across the entire training/cross-validation/test set. Overall, the forecasting model shows good prediction accuracy on both food and garden waste. For both model types, test accuracy is consistently higher for garden than food waste, with a 29% and 11% increase in R2 value for garden waste compared to food waste from the forecasting and peer model, respectively. The peer model shows significantly lower test accuracies for


the differences in waste collection methods between local authorities are unknown and hence not considered in the models. Reported accuracies of the peer models are highly dependent on the local authorities included in the separate training, cross-validation, and test data sets. As a consequence, variations in prediction accuracy were observed when randomized stratified sampling was repeated. The generalizability of the models to “unseen” local authorities therefore relies on sufficient similarities between local authorities in the training and test sets. As only a limited number of local authorities provide consistent waste data, ensuring a representative split across the training and test sets without introducing external bias is a nontrivial problem. The results imply that, with a data-driven approach such as this, the transfer of models to local authorities without data with which to retrain is not straightforward. The variability in R2 across different authorities can be seen in Figures S3−S6. In addition, forecasting model accuracy is impacted by variations in the number of food and garden waste collections across local authorities. Authorities with a higher number of waste collections are represented better by the forecasting model than those with only a few. This is likely to result in increased model accuracy for such authorities compared to those which are less well represented in the data. Overall, the strong performance of the forecasting models at the local authority and national level proves their suitability for projecting OFMSW variability in waste quantity and quality (composition). Such projection could inform the future OFMSW commoditization where the commodity grade definitions are dependent on waste composition and recovery values as energy carriers or nutrient substitutes.
Furthermore, the spatial granularity of model predictions offers a promising approach to inform the decision-making on technology choice, sizing, location, and logistics of OFMSW recovery facilities. These underpin the potential transformation of waste value chains from retrospective waste management to proactive resource recovery in a coordinated OFMSW value chain, where waste resource and facilities are interconnected via the Internet, supporting real-time decision-making (following an Industry 4.0 framework). In future research, model performance may be strengthened by improving data quality and incorporating more features which reflect the dynamics of waste collection, e.g., collection schedules, population demographics, and the number of households registered for OFMSW collection. Further benefits can be realized from the models investigated in this study by extending their prediction capabilities to account for variability in OFMSW chemical composition. Determination of model prediction uncertainty could be another important step toward industrial application of this research, e.g., prediction uncertainty quantification with lower/upper bounds for waste forecasts to optimize responsiveness of waste processing operations.

forecasting model for food waste captures the increasing trend in waste collection over time as well as short-term changes in the overall trend.

Feature Importance. Figure 4 shows the relative feature importance for the garden and food waste prediction models. The results indicate that population is the most important feature. However, the importance of socio-economic and environmental factors varies significantly between the models. Monthly solar radiation and mean temperature are significant features for the garden waste prediction models; however, they are relatively unimportant for the food waste models. Sociodemographic variables, such as population dynamics, gross disposable household income per head, and index of deprivation appear to be more important features for food waste prediction models. Figure 5 illustrates the degree to which model output is positively or negatively impacted by the magnitude of the value of each feature. Feature impact on model output was measured using SHAP. Both models demonstrate that more populated regions have a high positive impact on model output. Lower population values show a receding influence on the output, but the magnitude of this influence is noticeably smaller than that for high population values. This suggests that for higher populations, the population size itself becomes a more dominant feature for waste prediction, while for lower populations, other features are emphasized to a greater degree.



DISCUSSION

The forecasting models provide robust projection for two types of OFMSW, i.e., food and garden waste. They represent carbon and nutrient rich resources for recovery including carbohydrates, lipid, protein, cellulose, hemicellulose, and lignin. However, more robust results were obtained when predicting garden waste volumes compared to food waste. This is largely due to the availability of the features characterizing each waste type. We hypothesized garden waste collection to be influenced primarily by weather-related factors and these are represented well by the feature set used for model training. Conversely, food waste collection could be influenced by individual household behaviors in addition to socio-economic variables. These are more difficult to quantify on a local authority basis; therefore, the feature set is less representative for food waste. Feature importance results provide insight into the relative weighting of each feature in the model’s decision-making process. Population is the most important feature for both model types as expected (more inhabitants generate higher levels of OFMSW), while monthly sunshine is a more dominant feature for garden waste than mean temperature. More generally, weather reflects the seasonally varying features acting as the second most important feature for garden waste prediction models, indicated by high mean SHAP values for monthly solar radiation and temperature. The effect of using weather as a feature is reflected by the model’s capability to accurately capture the seasonality of garden waste generation (see Figure 3A). Additionally, monthly mean temperature and solar radiation are strongly correlated to their impacts on predicted waste volumes. Model predictive performance is subject to a number of data and methodology limitations. The presence of incomplete and noisy data and the inconsistencies in reporting standards across the U.K. resulted in discrepancies and missing data.
Moreover, variables such as community acceptance of waste-recovery and



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acssuschemeng.9b00821. Tabulated outline of advanced techniques for MSW forecasting including types of application areas and waste streams, superstructure showing compositions and conversion pathways for food and garden waste constituents of organic waste, graphic showing methodology for development of the gradient boosting model, comparison of performance of tested models on forecasting and peer prediction tasks for green garden waste, figures showing variations in R² scores at the local authority level for food and garden waste across the forecasting and peer models (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Miao Guo: 0000-0001-7733-5077

Author Contributions

#Equivalent contribution.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

MG would like to acknowledge the U.K. Engineering and Physical Sciences Research Council (EPSRC) for providing financial support for EPSRC Fellowship ‘Resilient and Sustainable Biorenewable Systems Engineering Model’ [EP/N034740/1].


