Application of an adaptive Bayesian-based model for probabilistic and deterministic PV forecasting

Accurate prediction of solar photovoltaic (PV) plant energy generation is essential for optimal planning and operation of modern power systems and for incorporating such plants into the energy sector. In this study, an adaptive Gaussian mixture (AGM) method and a developed variational Bayesian model (VBM) inference through multikernel regression (MkR) are combined to deliver precise predictions. In this model, the MkR processes the multiresolution solar energy signal, and the AGM then models the complex forecasting-error signal. Finally, the proposed model is optimized, and the solar energy output is obtained concurrently in both probabilistic and deterministic form through the VBM. The solar energy output of an actual plant comprising four measurement sites provided the data for the study. The results confirm that the proposed model delivers higher prediction accuracy for both probabilistic and deterministic forecasts when compared with well-known benchmark models.


INTRODUCTION
Renewable generation sources contribute an ever-higher share of the energy sector while consumption continues to rise dramatically, driven largely by the ongoing electrification of the transportation and heating sectors. However, the unpredictability of renewable generation plants, most notably solar power plants, creates challenges for both power system operation and planning. Accurate prediction models over different time frames, namely day-ahead, hour-ahead, minutes, and even seconds, are required for different system functions, such as planning, control, and market pre-dispatch [1][2][3][4][5][6][7][8][9][10].
In [3], the correlation of solar radiation variability at nearby sites was investigated as a function of their distance. In [4], the sub-hourly variability of photovoltaic (PV) energy was forecasted from hourly satellite-derived insolation data. In [5], a neural-network-based forecasting engine was presented to improve cloud tracking and predict 15 min ahead.
All prediction methods should ensure a suitable level of stability and accuracy and should operate over different time horizons. Researchers have proposed various deterministic methods for solar PV power. However, probabilistic methods can play a crucial role in the prediction process, since they model PV uncertainty directly and can reduce risks in market participation [5]. In [6], the authors proposed quantile regression models to address this problem. A Bayesian model was presented in [7], and a hybrid model based on k-nearest neighbours and the kernel density function was introduced in [8]. Stochastic differential equations were suggested in [9], and a probabilistic spatial method was presented in [10]. Applying an ensemble of multiple forecasters rather than a single forecaster is important for obtaining improved results; the individual forecasters can be either probabilistic or deterministic [2,4,6]. Deterministic models are typically based on machine learning techniques such as gradient boosting. A linear combination of various probabilistic forecasters can perform aggregate forecasting, as presented in [11]; this model can be combined with a Bayesian framework based on logarithmic pooling [11]. Table 1 reviews several prediction methods for the solar signal. Figure 1 shows that, among research works published between 2010 and 2019 in different countries, 55% are based on multi-layer models and 36% on hybrid methods, while 90% address the short-term domain. Hybrid solar forecasting models are well suited to this problem because they offer fast convergence and accurate forecasting outcomes; single or simple approaches can fail to find the desired solution, which underlines the importance of hybridization.
Moreover, in all test cases, the hybrid approaches outperform the stand-alone methods across various input/output configurations.
The main goals of a probabilistic forecasting model are fast prediction and proper reliability; speed concerns the generation of the probabilities, and reliability concerns the accuracy of the probabilistic predictions. A probabilistic prediction model must therefore satisfy sufficient standards to achieve quick and reliable forecasting. Since data, and hence information, may be lost during data transformation, high-resolution solar energy recordings are used as the model input to prevent such losses.
The benefits of the Gaussian mixture method (GMM) are its easy implementation and the small number of parameters it requires; furthermore, its log-likelihood function, used to estimate the variables, is simple. The Bayesian model is also used in this work. It corresponds to the minimum variational free energy and a lower bound on the Bayesian evidence, which is in turn a vital factor for model selection. The Bayesian model is also effective as an approximation method, since the variational stochastic complexity measures the distance from the variational posterior distribution to the true Bayesian posterior in terms of Kullback-Leibler divergence. In addition, because the VBM minimizes the variational free energy, the derived bounds specify how the hyperparameters affect the learning procedure. These advantages suit the forecasting problem at hand: complex signals can be incorporated, adaptive spatial or other regularizing priors can be used, and the model can be extended to hierarchical forms. In this study, we therefore propose a model based on an adaptive Gaussian mixture method (AGM) and a developed variational Bayesian model (VBM) inference through multikernel regression (MkR).
Moreover, a maximum mean discrepancy method is used to build the kernel function that extracts the appropriate features in a Hilbert space. The proposed improved maximum mean discrepancy (IMD) function maps the features into the Hilbert space, and the kernel function maps the processed data into that space within the required timeframe. The MkR is constructed on this technique, which employs multiresolution information: the MkR processes the multiresolution solar energy signal, and the AGM then models the complex forecasting-error signal. Finally, the proposed model is optimized, and the solar energy output is obtained concurrently in probabilistic and deterministic form through variational Bayesian methods, so that each forecast possesses a continuous probability density function (PDF).
The key considerations and contributions of this paper can be summarized as follows:
▪ Applying an improved version of a maximum mean discrepancy model based on allocating a weight to the samples' functions.
▪ Utilizing the AGM method to model the complex signals of the forecasting error.
▪ Integrating the VBM to optimize the proposed model.
▪ Deriving the deterministic and probabilistic forecasts concurrently from the forecast distributions.
▪ Applying the AGMs as posterior distributions of the solar energy forecasts.
The remainder of the paper is organized as follows. Section 2 introduces the proposed forecasting model. Section 3 presents the estimation criteria for the prediction process. The case study is presented in Section 4. Section 5 presents the numerical results and analysis, and finally Section 6 summarizes the conclusions of the paper.

Proposed IMD model
In this section, the proposed method is introduced. The maximum mean discrepancy mechanism starts by mapping samples x_i and x_j through the feature map ϕ(x) of a random variable x ∈ R^d. The kernel function K(x_i, x_j) = 〈ϕ(x_i), ϕ(x_j)〉 then assesses the similarity of the two samples, where 〈⋅, ⋅〉 denotes the inner product. The Gaussian kernel, the most common kernel function, is computed as

K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)).   (1)

Based on (1), the Euclidean distance between x_i and x_j is the criterion for computing their similarity. However, for random variables X and Y with PDFs p(X) and p(Y), respectively, the Euclidean distance is not appropriate; the distance between the distributions p(X) and p(Y) is instead measured by the maximum mean discrepancy. An illustration is provided in Figure 2, and the calculation is given in [26].
In (2), μ_X denotes the kernel embedding of the distribution p(X) in the RKHS. In practice, p(X) is not accessed directly; instead, a set of samples {X_1, X_2, …, X_N} drawn from p(X) is used, and the empirical kernel embedding is computed as μ̂_X = (1/N) Σ_{n=1}^{N} ϕ(X_n). Accordingly, the unbiased estimator of the maximum mean discrepancy can be calculated as shown in [26].
Finally, the similarity between the two random variables can be presented as:
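For illustration, the Gaussian kernel of (1) and the unbiased squared-MMD estimator described above can be sketched as follows (a minimal NumPy sketch; the bandwidth σ and the sample sizes are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel of Eq. (1): similarity from Euclidean distance."""
    d = np.asarray(xi, float) - np.asarray(xj, float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def mmd_unbiased(X, Y, sigma=1.0):
    """Unbiased squared-MMD estimate between samples of p(X) and p(Y)."""
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    n, m = len(X), len(Y)
    # Within-sample terms exclude the diagonal (that is what makes it unbiased)
    kxx = sum(gaussian_kernel(X[i], X[j], sigma)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kyy = sum(gaussian_kernel(Y[i], Y[j], sigma)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kxy = sum(gaussian_kernel(X[i], Y[j], sigma)
              for i in range(n) for j in range(m)) / (n * m)
    return kxx + kyy - 2.0 * kxy
```

Samples drawn from the same distribution give an MMD near zero, while a shifted distribution gives a clearly positive value.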

Enhanced MMD
Equation (5) assumes that there is no bias between the distributions; the estimator is calculated from unbiased samples of the solar signal distribution. If the samples of the target distribution, that is, the solar signal, are biased, this estimator measures the discrepancy against the biased empirical distribution, and the model would internalize the bias. To address this problem, an enhanced version of the estimator is presented: assuming a normalizing constant Z, an unbiased estimator between p(y) and p(x) is obtained by allocating a weight to each sample function based on the biased density λ(x)p(x) and the likelihood ratio p(x)/(λ(x)p(x)) = 1/λ(x), as in (7). However, because the exact value of λ(x) is unknown, (7) must be replaced by (8), which uses self-normalized weights.
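The self-normalized weighting of (8) can be sketched as follows, under the assumption that importance weights proportional to 1/λ(x) are available; the function names and Gram-matrix formulation are illustrative, not the paper's exact estimator:

```python
import numpy as np

def weighted_mmd(X, w_x, Y, sigma=1.0):
    """Squared MMD between a bias-corrected sample X and a plain sample Y.

    `w_x` are importance weights proportional to 1/lambda(x); they are
    self-normalized so the unknown constant in lambda(x) cancels, as in (8).
    """
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    w = np.asarray(w_x, float)
    w = w / w.sum()                        # self-normalized weights
    v = np.full(len(Y), 1.0 / len(Y))      # uniform weights for Y
    def gram(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return w @ gram(X, X) @ w + v @ gram(Y, Y) @ v - 2.0 * w @ gram(X, Y) @ v
```

Downweighting the over-represented part of a biased sample brings its weighted MMD to the target distribution below the unweighted value.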

The classic model of the Gaussian mixture
The classic Gaussian mixture (GM) consists of K Gaussian components [27]. The probability density (PD) is obtained by summing those Gaussian components, as in (9). Taking the logarithm is necessary because the multiplication of many probabilities yields very small values; the maximum likelihood function is expressed in (10). The model is fitted in three phases. Phase one: the responsibilities of each of the K Gaussian components (GCs) for the signal are computed from the initial GM variables, or from the variables of the previous iteration [27].
Phase two: the parameters of the Gaussian model are re-estimated from the results of phase one.
Phase three: the first two phases are repeated until the change in the likelihood function value falls below the assumed threshold, that is, until the likelihood function reaches a stable value.
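The three phases above can be sketched for the one-dimensional case as follows (a compact sketch; K, the iteration budget, and the quantile-based initialization are illustrative choices, and a fixed number of iterations stands in for the likelihood-change threshold of phase three):

```python
import numpy as np

def gmm_em(x, K=2, iters=60):
    """Fit a K-component 1-D Gaussian mixture by EM (phases one to three)."""
    x = np.asarray(x, float)
    pi = np.full(K, 1.0 / K)                       # mixing weights
    mu = np.quantile(x, np.linspace(0.1, 0.9, K))  # spread-out initial means
    var = np.full(K, x.var())
    for _ in range(iters):
        # Phase one (E-step): responsibilities from the current parameters
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
               / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # Phase two (M-step): re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        pi, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    # Log-likelihood of Eq. (10); phase three would monitor its change
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
           / np.sqrt(2 * np.pi * var)
    return pi, mu, var, np.log(dens.sum(axis=1)).sum()
```

On data drawn from two well-separated Gaussians, the recovered means and weights approach the generating values.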

2.2.1 Adaptive GM strategy
The classic GM method possesses a fixed mixing degree. The main idea of the developed algorithm comprises three aspects. First, for a Gaussian component (GC) to be useful, its weight should not be too light, and its distance to the other GCs should not be less than a specific value. Second, GCs must avoid weight values that are too large, and third, GCs must avoid distances that are too small, in which case they are merged. Additionally, if a GC's weight is large and its variance is very large, the GC covers multiple fragments of the signal distribution and should be split into two components. The process is as follows: 1) a GC is pruned when its weight ω_i is smaller than the threshold ε_1 while its distance to the other GCs is greater than D_1. After omitting component i, its weight is reallocated to the remaining components, and the mixing degree of the GM decreases by one. If multiple components qualify for removal, the GCs with the lowest weights are removed. Specifically, the distance between the i-th and j-th GCs is computed via (15), in which l = 1, 2, …, N indexes the dimensions of the GC mean.
2) When the distance between two GCs is smaller than the threshold D_2 and the weights of both GCs are smaller than ε_2, the information conveyed by the two GCs is essentially identical. The variables are then updated as follows: the i-th and j-th Gaussian components are replaced with a new merged Gaussian component, and the mixing degree of the GM decreases by one. If the merging conditions are met by multiple GC pairs, the pair with the smallest distance is merged.
3) A GC conveys too much information when its weight is larger than the threshold ε_3 and its variance is large. It must then be split and its parameters updated as follows: the split direction is taken along the diagonal element of Σ_i with the maximum value, and the matrix E = [1, …, N] has dimension N with all elements equal to 1. The original Gaussian component is replaced by the two new Gaussian components {ω_T, μ_T, Σ_T} and {ω_{T+1}, μ_{T+1}, Σ_{T+1}}, and the mixing degree increases by one. If the splitting condition is met by multiple GCs, the GC with the largest weight is selected.
First, the parameters are updated as above; then, the expectation maximization (EM) algorithm is used for training. These two steps are repeated until the number of training iterations reaches the preset value.
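One adaptation pass (prune, merge, split) can be sketched for one-dimensional components as follows; the thresholds `eps_prune`, `d_merge`, `w_split`, and `v_split` are illustrative stand-ins for the paper's ε_1, ε_2, ε_3, D_1, and D_2, and the merge uses simple moment matching:

```python
import numpy as np

def adapt_components(w, mu, var, eps_prune=0.02, d_merge=0.5,
                     w_split=0.5, v_split=4.0):
    """One adaptation pass over 1-D GM parameters: prune, merge, split."""
    w, mu, var = (np.asarray(a, float) for a in (w, mu, var))
    # 1) prune light components and reallocate their weight
    keep = w >= eps_prune
    w, mu, var = w[keep], mu[keep], var[keep]
    w = w / w.sum()
    # 2) merge the closest pair if their distance is below d_merge
    if len(mu) > 1:
        d = np.abs(mu[:, None] - mu[None, :])
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        if d[i, j] < d_merge:
            wm = w[i] + w[j]
            mm = (w[i] * mu[i] + w[j] * mu[j]) / wm
            vm = (w[i] * (var[i] + mu[i] ** 2)
                  + w[j] * (var[j] + mu[j] ** 2)) / wm - mm ** 2
            keep = np.ones(len(w), bool); keep[[i, j]] = False
            w = np.append(w[keep], wm)
            mu = np.append(mu[keep], mm)
            var = np.append(var[keep], vm)
    # 3) split a heavy, wide component into two offset halves
    k = np.argmax(w)
    if w[k] > w_split and var[k] > v_split:
        s = np.sqrt(var[k])
        w = np.append(np.delete(w, k), [w[k] / 2, w[k] / 2])
        mu = np.append(np.delete(mu, k), [mu[k] - s / 2, mu[k] + s / 2])
        var = np.append(np.delete(var, k), [var[k] / 2, var[k] / 2])
    return w, mu, var
```

A light component is removed, two nearly coincident components collapse into one, and a dominant wide component is divided, so the mixing degree adapts by ±1 as described above.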

Developed variational Bayesian model
Let x = (x_1, …, x_n) denote n independent observations. In our context, the prior distribution p(θ) of the Bayesian model is a Dirichlet distribution [28], θ ∼ D(α_1, …, α_K), with hyperparameters α_k > 0, k = 1, …, K.
Moreover, the marginal likelihood is introduced as in (17). Estimating this quantity is important because it is used for model selection, although in general settings Markov chain Monte Carlo approaches are computationally demanding.

Classic variational estimator model
The variational Bayesian (VB) methods are built on a lower bound L of log m(x): an approximate distribution q is obtained by free-form minimization of its Kullback-Leibler (KL) divergence from the true posterior p, written KL(q||p).
Since the KL divergence is nonnegative, minimizing it is equivalent to maximizing the lower bound. The optimization is performed within a restricted family of distributions, with variational parameters φ := (φ_ik : i = 1, …, n, k = 1, …, K). In (18), z and θ are independent because the mixture model follows the VB methodology; only the distribution of z is parameterized explicitly, while the approximate distribution of θ is obtained as a byproduct of the analytical evaluation of the bound.
Considering the distributions in (18), [29] derives a bound on the logarithm of (18) in the following manner. For all q(z) ∈ ℑ and θ ∈ Θ_K, the bound can be written and evaluated analytically, where C denotes a known constant (not dependent on φ) and Γ(·) is the gamma function. Moreover, by the analytical integration of (22), θ is approximately Dirichlet distributed with parameters γ_k = α_k + Σ_{i=1}^{n} φ_ik. The bound ℘_1(φ) can then be maximized using a revised EM scheme; the maximization of Equation (23) is performed with the proposed gradient-based algorithm. Here, p(θ_k|x) denotes an estimate of the marginal posterior density for k = 1, …, K obtained from long MCMC runs, which are taken as ground truth [30].
A comparison of p(θ_k|x) against the VB estimate in Equation (24) is illustrated by the red lines in Figure 3. The main downside of this technique is that it underestimates the variance, although it performs well in matching the posterior means (the black and red dashed vertical lines). Equation (24) minimizes the KL divergence between the joint posterior p(θ, z|x) and the distributions measured in Equation (21), and the resulting distribution is thereby improved. Nonetheless, it does not serve as the best Dirichlet estimate of the posterior p(θ|x) itself (grey curves). The next section presents methods for achieving an improved estimate.
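The closed-form update behind Equation (24) can be sketched as follows, assuming the responsibilities φ_ik are given (names are illustrative):

```python
import numpy as np

def vb_dirichlet_weights(resp, alpha):
    """Variational posterior over the mixture weights theta.

    `resp[i, k]` are the responsibilities (variational phi_ik); the VB
    update gives q(theta) = Dirichlet(gamma) with
    gamma_k = alpha_k + sum_i resp[i, k].
    """
    resp = np.asarray(resp, float)
    alpha = np.asarray(alpha, float)
    gamma = alpha + resp.sum(axis=0)
    mean = gamma / gamma.sum()          # posterior mean of theta
    return gamma, mean
```

With three observations, two of which are assigned to the first component, and a uniform prior α = (1, 1), the posterior mean weight of the first component is 0.6.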

Estimating an improved marginal posterior
From the variational inference point of view, the independence of the hidden variables z and the mixture weights θ is assumed for convenience; however, this is an oversimplification. It can be relaxed by marginalizing the hidden variables out of the approximation. First, when the standard variational bound (L_1) is compared with a novel bound (L_2), obtained by marginalizing the same approximate distribution over the hidden space, the former is found to be inferior. This is confirmed in the following scheme [31].

Scheme 2.3.2.1. For a designated q(θ), equality holds only under the condition that q(θ, z) = p(θ, z|x) for all θ, z. Showing that (25) is a lower bound (LB) of log m(x) is simple; indeed, the relation between the bounds is expressed as follows:

Proof. The first inequality of Equation (26) follows directly. The second inequality is proved by the log-sum inequality for all θ ∈ Θ. Integrating both sides of the previous inequality yields (28); substituting Equation (28) into Equation (27), it can be concluded that ℘_2 ≥ ℘_1. Note that the log-sum inequality holds with equality if and only if q(θ, z) = p(θ, z|x) for all θ, z, which completes the proof [31].
For the same approximation of θ|x, the lower bound ℘_2 can therefore be more accurate; it differs from the initial bound by marginalizing over z and dropping the factorization assumption. A more accurate bound alone does not solve the problem, however: to improve this lower bound, an approximate distribution must be sought within a distinct class of distributions, discussed in the following. Let f denote a subclass of distributions with q(θ) ∈ f. Equation (26) suggests that a fixed family f containing q(θ) must first be specified; optimization then finds the best approximating distribution and constructs an improved bound on the marginal likelihood. The parameter space of f is denoted by Δ_f, with δ ∈ Δ_f. As the approximation f(·; δ) varies over the family, the bound is converted to the formula in (30), which bounds log m(x) for all δ ∈ Δ_f. We emphasize that Equation (30) cannot be computed exactly for a static δ; nevertheless, it can be estimated stochastically because it represents the mean of the random variable g(θ) := log p(x|θ) + log p(θ) − log f(θ; δ), θ ∼ f(·; δ). Consequently, the objective function can be written as follows:

2.3.3 Structure of f
To obtain the best variational estimate, especially of the joint posterior, it is natural to remain within the Dirichlet family, and an obvious choice for f is a subclass of the Dirichlet family of distributions. It is also useful to consider a family with an even wider range, namely the generalized Dirichlet distributions [29]. A generalized Dirichlet distribution can be expressed via the VB solution as in (33), in which γ_l^+ := Σ_{j=l+1}^{K} γ_j, l = 1, …, K − 1. Subsequently, two different parameterizations of f are elaborated, denoted f_D and f_GD with f_D ⊂ f_GD. To perform this calculation, the parameters (γ, γ^+) → (γ̃, γ̃^+) must be transformed. To simplify the optimization, the same mean value as the original VB distribution is retained; this is sensible because, as the replication study shows, that distribution is fairly precise in estimating the posterior means.
The first parameterization is based on a single variable δ ∈ R, as in (34). This transformation implies that, for every δ, the resulting distribution belongs to the Dirichlet subfamily. The second parameterization loosens this restriction while remaining within the Dirichlet family, now with δ = (δ_1, …, δ_{K−1}).
This leads to the subfamily of generalized Dirichlet distributions in (36). Note that Equation (34) has one parameter whereas Equation (36) has K − 1. Furthermore, every f ∈ f_GD satisfies (37) for all k = 1, …, K; since f_D ⊂ f_GD, this also holds for f ∈ f_D. Therefore, the distribution q(θ) belongs to both families in Equations (34) and (36). Finally, note that the two parameterizations coincide when K = 2, in which case the generalized Dirichlet distribution reduces to an ordinary Dirichlet. To maximize Equation (31), the following stochastic approximation algorithm is applied.

Stochastic approximation model
The objective function in Equation (32) can be assessed for any given value of δ. Let θ_m, m = 1, …, M be a random sample from f(·, δ). Then: 1. Equation (38) is an unbiased estimator of L(δ). The stochastic approximation algorithms in [30] address the problem of optimizing a function that is observable only under noise. Both cases, with 1 and K − 1 parameters, are covered, with the number of parameters denoted by d := card(δ).

2. Set t = 0 and provide initial values δ(t).
3. For t = 1, 2, …: a. Determine a gradient estimate ĝ = ĝ(δ^(t−1)) of ∇L as follows. The Spall conditions [30-32] certify convergence of δ^(t) to the best value as t → ∞. The conditions are satisfied by the gain sequences {a_t}, {c_t}, t = 1, 2, …; the set provided by Spall [32] uses specific a, A, c, γ > 0 to calculate the gradient ∇L by a Monte Carlo-type estimation. Steps 2(a).i and 2(a).iv implement the "simultaneous perturbation" method of [32]: compared with a standard finite-difference approach (2 versus 2d evaluations), all elements of the parameter vector δ are perturbed concurrently, so the number of objective function evaluations per iteration decreases remarkably. "DU" denotes the discrete uniform distribution on {−1, 1}. The fraction in step 2(a).iv represents element-wise division of the numerator by each element of the perturbation vector.
Based on [33], finite-sample termination of stochastic approximation algorithms cannot rely on simple stopping criteria. In our experiments, good results were obtained with the following stopping procedure: the moving average of the parameter values is calculated every s iterations, as in (41), where τ = s, 2s, 3s, 4s, ….
Then, given a predetermined positive integer v, the condition in (42) is evaluated, and the objective function is approximated at the moving average δ̄(τ) as in (43). Finally, the set of the previous v assessments is formed as in (44). The algorithm terminates when the number of entries in the sign vector R(τ) equals v − 1 at iteration τ ∈ {vs, (v+1)s, (v+2)s, …}.
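The simultaneous-perturbation step with Spall's gain sequences can be sketched as follows (a minimal sketch with a fixed iteration budget in place of the moving-average stopping rule above; the gain constants are illustrative, not the paper's tuned values):

```python
import numpy as np

def spsa_maximize(L, delta0, iters=500, a=0.2, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Simultaneous-perturbation stochastic approximation (Spall).

    Maximizes a (possibly noisy) objective L with two evaluations per
    iteration, regardless of the dimension d = card(delta).
    """
    rng = np.random.default_rng(seed)
    delta = np.asarray(delta0, float).copy()
    for t in range(1, iters + 1):
        a_t = a / (A + t) ** alpha                 # gain sequence {a_t}
        c_t = c / t ** gamma                       # gain sequence {c_t}
        pert = rng.choice([-1.0, 1.0], size=delta.shape)  # "DU" on {-1, 1}
        # Two-sided estimate; element-wise division by the perturbation
        g = (L(delta + c_t * pert) - L(delta - c_t * pert)) / (2.0 * c_t * pert)
        delta = delta + a_t * g                    # ascent step (maximization)
    return delta
```

On a simple concave test objective, the iterate approaches the maximizer.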

The proposed adaptive robust MkR method
The risk of losing information during data transformation can be avoided by adding a further input to the solar power forecasts within a specified time frame: high-resolution (i.e. captured every 5 s) solar power data, which preserve fine-grained variation. Following Section 2.1, the high-resolution solar power information is processed via a maximum mean discrepancy (MMD) based kernel function, and the processed data within the specified time frame are handled by an irradiance-based kernel function. The residual information loss is mitigated by establishing a multikernel regression model. Nevertheless, the observed complex solar power prediction error cannot be modelled under the assumption of Gaussian error distributions (the counterpart of least-squares loss). To address this problem, the error term of the proposed MkR model is modelled by a GMM whose number of components is selected automatically using the stick-breaking construction.
Consequently, the recommended adaptive robust MkR model not only avoids information loss but also resolves the discrepancy between the error distribution assumed by standard prediction models and the error distribution of real solar power predictions. In addition, the variational Bayesian method both optimizes the adaptive robust multikernel regression and generates deterministic and probabilistic forecasts concurrently.
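The core MkR idea, combining several feature views through a weighted sum of Gaussian kernels, can be sketched as a multikernel ridge regression; the kernel widths, view weights, and ridge term below are illustrative assumptions, and the paper's robust GMM error model and variational optimization are not reproduced here:

```python
import numpy as np

def multikernel_ridge(X_list, y, sigmas, weights, lam=1e-3):
    """Kernel ridge regression over a weighted sum of Gaussian kernels."""
    X_list = [np.atleast_2d(X) for X in X_list]
    y = np.asarray(y, float)
    def gram(A, B, s):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
        return np.exp(-d2 / (2.0 * s ** 2))
    # Combined kernel: sum_m weights[m] * K_m over the feature views
    K = sum(w * gram(X, X, s) for X, s, w in zip(X_list, sigmas, weights))
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    def predict(Xq_list):
        Kq = sum(w * gram(np.atleast_2d(Xq), X, s)
                 for Xq, X, s, w in zip(Xq_list, X_list, sigmas, weights))
        return Kq @ alpha
    return predict
```

In the paper's setting, one view would carry the high-resolution power signal and another the irradiance-based features; here a smooth test signal illustrates the fit.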

ASSESSMENT BENCHMARKS FOR PREDICTION PERFORMANCE
This section introduces the assessment benchmarks for both deterministic and probabilistic predictions.

Assessment benchmarks for deterministic predictions
The deterministic prediction performance of the various models is assessed with three benchmarks: the normalized mean absolute error (NMAE), the normalized root mean square error (NRMSE), and a modified mean absolute percentage error (MMAPE) [34,35].

Assessment benchmarks for probabilistic predictions
The constructed prediction intervals (PIs) are assessed by applying the following five criteria to the probabilistic forecasts of the various models. The most prominent criterion is the PI coverage probability (PICP), defined as

PICP = (1/L) Σ_{l=1}^{L} 1{ L_l^p ≤ y_l ≤ U_l^p },

where L_l^p and U_l^p represent the lower and upper boundaries of the l-th PI. A higher PICP indicates that more of the observed solar power data lie inside the built PIs. However, for a given PI nominal confidence level (PINC), higher PICP values are not necessarily better [36]: smaller deviations between the PICP and the PINC indicate a better PI structure. This deviation is measured precisely by the mean coverage error (MCE), MCE = PICP − PINC. Additionally, PI quality can be assessed by a width criterion: for a given PICP, a narrower PI conveys more useful information, whereas overly broad PIs do not support advantageous decisions. The PI normalized average width (PINAW) is normally applied to measure PI breadth [35],

PINAW = (1/(L (Max_p − Min_p))) Σ_{l=1}^{L} (U_l^p − L_l^p),

where Max_p and Min_p denote the maximum and minimum of the prediction targets, respectively. Simultaneous consideration of both coverage probability and width is essential when weighing the practicality of different PIs. The coverage width-based criterion (CWC) is a combined metric recommended for assessing the quality of the built PIs systematically, CWC = PINAW (1 + I(PICP) exp(−ρ(PICP − ν))). Here, at time l, L is the length of the test set, C denotes a fixed constant, and R_l^p and F_l^p designate the real and the forecasted solar power, respectively.
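The scalar PI criteria above can be computed directly; a minimal sketch (the CWC indicator and penalty follow the standard form, with ρ and the nominal level as parameters):

```python
import numpy as np

def pi_metrics(y, lower, upper, pinc, rho=50.0):
    """PICP, MCE, PINAW, and CWC for a set of prediction intervals."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    covered = (y >= lower) & (y <= upper)
    picp = covered.mean()
    mce = picp - pinc                              # coverage error vs nominal
    pinaw = (upper - lower).mean() / (y.max() - y.min())
    indicator = 1.0 if picp < pinc else 0.0        # I(PICP)
    cwc = pinaw * (1.0 + indicator * np.exp(-rho * (picp - pinc)))
    return picp, mce, pinaw, cwc
```

An interval set that under-covers its 90% nominal level receives a negative MCE and a heavily penalized CWC.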
The nominal confidence level is denoted by ν. In this study, ρ penalizes unacceptable PIs and is set to 50. The indicator I(PICP) equals 1 when PICP < ν and 0 otherwise. The sharpness of the solar energy PIs is evaluated by the Winkler score. To obtain the best inputs for the prediction process, the partial autocorrelation function (PACF) is used to identify the candidate inputs of the proposed MkR model. Because the optimal order of the solar signal is three, the proposed model uses the previous half hour of recorded five-second solar power output. The optimal values of the proposed MkR parameters under the MMAPE error criterion are presented in Figure 4; they are tuned to the minimum forecasting error on the validation set. In other words, the kernel parameters are optimized through validation-set forecasting with the recommended model. After the inputs are determined, the proposed model is formed, attaching the high-resolution information to the solar energy forecast.
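The Winkler score used for sharpness can be sketched in its standard form, assuming α = 1 − PINC (e.g. α = 0.1 for a 90% interval); a lower score indicates sharper, better-calibrated intervals:

```python
import numpy as np

def winkler_score(y, lower, upper, alpha):
    """Average Winkler score of the intervals [lower, upper] for targets y."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    score = (upper - lower).astype(float)          # base: interval width
    below = y < lower
    above = y > upper
    # Penalize observations falling outside the interval
    score[below] += 2.0 / alpha * (lower - y)[below]
    score[above] += 2.0 / alpha * (y - upper)[above]
    return score.mean()
```

A covered observation contributes only the interval width, while a miss adds a penalty scaled by 2/α times the exceedance.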

Data collection
The four selected locations in China, labelled Place 1 to Place 4, are situated in the Tibet Autonomous Region [37]. Figure 5 indicates the details (latitude, longitude, and time zone). Additionally, it can be inferred from this figure that Tibet is among the Chinese regions richest in global horizontal radiation. Studying the four designated places at diverse altitudes (4326.99 m, 4737.80 m, 4603.35 m, and 4849.95 m) is therefore a substantial effort. Eight groups of mean hourly data files are devoted to each site, comprising global horizontal irradiance (W/m2), pressure (mbar), rain (cm), temperature (°C), relative humidity (%), solar zenith angle (degrees), wind speed (m/s), and wind direction (degrees). These data are obtained from the National Solar Radiation Database (NSRDB) [38]. The study lasted 365 days (from 1 January to 31 December 2014), with a 24-h period considered each day for solar elevation above 0. The global horizontal radiation hours ranged from approximately 10 to 14 per day because the duration of sunshine varies between seasons. Predicting global horizontal radiation from these seven relative meteorological factors is the central goal of this study. The last seven days of each month were used as the test data, and the remaining days as training data; specifically, 84 days were predicted (24 h ahead) by use of the other 281 days. Each season, one day was selected at random to be predicted using three randomly selected training days; in total, 12 days were used to predict one day. To show the effectiveness of the suggested approach (see Table 2), the dataset is (x, y) = {(x(t), y_t) | t = 1, 2, …, 365}, the training set is {(x(t), y_t) | t = 1, 2, …, 281}, and the forecasting test set is {(x(t), y_t) | t = 282, 283, …, 365}. For pre-processing, maximum likelihood principal component analysis (PCA) [39] based data imputation was used: high variance is assigned to the missing values before fitting, allowing the PCA model to be fitted while disregarding the missing points, that is, PCA in the presence of missing values.

FIGURE 5 Test cases in the Tibet Autonomous Region, China

Deterministic solar energy forecasting
Deterministic solar power prediction is achieved by the trained recommended model using the MkR. The results are presented in Table 3. In this table, the support vector machine (SVM), least squares SVM (LSSVM), persistence model, autoregressive integrated moving average (ARIMA), quantile regression (QR), wavelet transform plus SVM (WT+SVM), and particle swarm optimization plus ANFIS (PSO+ANFIS) models are considered as well-known prediction strategies to compare with the proposed model. Although the running time of the proposed algorithm may be higher than that of some models, its accuracy compensates for this. It can be inferred that, among the compared prediction models, the LSSVM performed best and the SVM worst. However, the proposed model provides better results than LSSVM, with a (4.18 − 2.13)/4.18 = 49% improvement in NMAE at Site 1. The improvements at this site for NRMSE and MMAPE are (8.07 − 4.47)/8.07 = 45% and (10.45 − 7.24)/10.45 = 31%, respectively. At the second site, the improvements over LSSVM (the best compared model) are 49%, 45%, and 33% for NMAE, NRMSE and MMAPE, respectively. The corresponding improvements are 61%, 50% and 34% for Site 3, and 47%, 44%, and 37% for Site 4. In the proposed model, the prediction error is modelled by the AGM rather than a single Gaussian distribution. The superiority of the prediction model over the others is due to the following reasons. First, the AGM has a much longer tail than a single Gaussian distribution, which makes the proposed model richer than the other standard models. Second, as AGMs can approximate any continuous distribution to high accuracy, the solar power prediction error was estimated more accurately by the AGM models.
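The improvement figures above follow directly from the error values in Table 3. The sketch below computes them, assuming NMAE and NRMSE are normalized by plant capacity (the paper's exact normalization, and its MMAPE variant, are not spelled out here, so the normalizer is an assumption):

```python
import numpy as np

def nmae(y, yhat, cap):
    """Normalized mean absolute error in percent (normalized by capacity `cap`)."""
    return 100.0 * np.mean(np.abs(np.asarray(y) - np.asarray(yhat))) / cap

def nrmse(y, yhat, cap):
    """Normalized root-mean-square error in percent."""
    return 100.0 * np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)) / cap

def improvement(err_ref, err_new):
    """Relative improvement of the proposed model over a reference, in percent."""
    return 100.0 * (err_ref - err_new) / err_ref

# Reproducing the Site 1 NMAE comparison quoted in the text:
print(round(improvement(4.18, 2.13)))  # → 49
```

The same `improvement` call with the NRMSE pair (8.07, 4.47) gives 45%, and with the MMAPE pair (10.45, 7.24) gives 31%.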
Figure 6 shows the real solar power prediction error and the corresponding PDF values from the MkR model. Furthermore, the deterministic predictions, compared with LSSVM and QR, are presented in Figure 7 for all sites on May 26. As shown in this figure, the proposed method fits the real solar signal better than the other models and achieves higher accuracy by tracking the signal's volatility.

5.1.1 Step-by-step implementation of the proposed model
In this section, the proposed forecasting approach is applied step by step to show the effects of the proposed improvements and novelties. For this purpose, the proposed model is applied to Site 1 and assessed through the NMAE, NRMSE and MMAPE criteria. As shown in Table 4, the suggested model is tested against the classic version of the MMD, the classic GM and the classic VBM. Additionally, the improvement of the proposed method in comparison with the FR Smirnov and improved IMD models is about 46%, 31% and 22% for NMAE, NRMSE and MMAPE, respectively, which proves the validity and superiority of the suggested approach.

Probabilistic solar energy forecasting
In this section, the proposed forecasting method is applied to the mentioned test case to obtain the probabilistic outcomes. According to the probability density plot (Figure 7), the proposed AGM approach outperforms the single Gaussian methods. Consequently, PIs built from the AGM can be more precise than PIs derived from a single Gaussian predictive distribution. The PIs built via the proposed MkR approach are evaluated against the PIs produced by other well-known models (i.e. persistence, QR, and ARIMA) to assess the proposed MkR's efficiency. The outcomes of the other three prediction approaches in comparison with the proposed model for different PINCs (90%, 96%, and 99%) are depicted in Figures 8 and 9. The PIs produced by the proposed MkR model under different PINCs are shown in Figure 10. As shown in Figures 8 and 9, the PIs with the highest coverage probability and the smallest deviation according to the ACE criterion are produced by the suggested MkR approach under all PINCs. Note that the practical applicability of the built PIs depends critically on their coverage being high. Regarding the values of PINAW, the PIs produced by the proposed MkR are wide, while the PIs produced by ARIMA are the narrowest among the models. As a result, relying on merely three indices (i.e. PICP, ACE, and PINAW) makes determining the best probabilistic prediction approach a challenging task. The CWC index is the best choice for evaluating the constructed PIs since it accounts for both the coverage probability and the PI width, which must be considered jointly. The CWC values in Figures 8 and 9 and the Winkler score criterion confirm that the recommended model outperformed the other models. In summary, if the issue of PI width is set aside, the recommended MkR model performs best among the compared models.
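The PI quality indices used above can be sketched as follows. The CWC penalty parameter η and the exact CWC/Winkler variants are assumptions on our part (several forms circulate in the literature), not taken from the paper:

```python
import numpy as np

def pi_scores(y, lo, hi, pinc=0.90, eta=50.0):
    """Common prediction-interval indices: PICP, ACE, PINAW, CWC, Winkler."""
    y, lo, hi = map(np.asarray, (y, lo, hi))
    covered = (y >= lo) & (y <= hi)
    picp = covered.mean()                       # coverage probability
    ace = picp - pinc                           # average coverage error
    pinaw = np.mean(hi - lo) / (y.max() - y.min())  # width, range-normalized
    gamma = 1.0 if picp < pinc else 0.0         # penalize only under-coverage
    cwc = pinaw * (1.0 + gamma * np.exp(-eta * (picp - pinc)))
    alpha = 1.0 - pinc
    winkler = np.mean((hi - lo)
                      + 2.0 / alpha * np.maximum(lo - y, 0.0)
                      + 2.0 / alpha * np.maximum(y - hi, 0.0))
    return {"PICP": picp, "ACE": ace, "PINAW": pinaw, "CWC": cwc, "Winkler": winkler}

# Toy intervals that always cover the target:
y = np.linspace(0.0, 10.0, 11)
scores = pi_scores(y, y - 1.0, y + 1.0, pinc=0.90)
print(scores["PICP"])  # → 1.0
```

When PICP meets the nominal PINC, the exponential penalty vanishes and CWC reduces to PINAW; only under-covering intervals are inflated.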
The PIs of the recommended model are wide because the AGM has a longer tail than the single Gaussian distribution; as a result, a large PINC leads to an increased width. If the PINC is low, the distribution has a sharp peak, which leads to PIs of narrower width.
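This tail effect can be checked numerically. The sketch below compares the 99% central interval of a two-component Gaussian mixture (weights and scales chosen purely for illustration, not taken from the fitted AGM) with the interval a single Gaussian of equal variance would produce:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
# Sharp main mode plus a small heavy-tail component, mimicking an AGM error model
comp = rng.random(n) < 0.9
samples = np.where(comp, rng.normal(0.0, 0.5, n), rng.normal(0.0, 3.0, n))

# 99% central interval taken from the mixture itself
mix_lo, mix_hi = np.percentile(samples, [0.5, 99.5])
# 99% interval from a single Gaussian with the same variance (z_0.995 ≈ 2.576)
gauss_half = 2.576 * samples.std()

print((mix_hi - mix_lo) > 2 * gauss_half)  # → True: the mixture interval is wider
```

The heavy-tail component dominates the extreme quantiles, so the mixture's 99% interval is markedly wider than the equal-variance Gaussian's, exactly the behaviour described for the AGM at high PINC.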

Diebold-Mariano test
This test was suggested by Diebold and Mariano [40] and is described as follows: ▪ Let {y_t} denote the real data series and {ŷ^h_{i,t}} the i-th competing h-step prediction. ▪ Let e^h_{i,t} (i = 1, 2, …, m) denote the prediction errors of the i-th competing approach, where m is the number of prediction methods. The h-step prediction error is e^h_{i,t} = y_t − ŷ^h_{i,t}, and the accuracy of a prediction is evaluated through a loss function g(e^h_{i,t}).
The value of h is taken as 1 in this work, so the superscript h is dropped in what follows. In this paper, the squared-error loss g(e_{i,t}) = (e_{i,t})^2 is considered, with loss differential d_t = g(e_{1,t}) − g(e_{2,t}) and mean d̄ = (1/T) Σ_{t=1}^{T} d_t. The DM test statistic is DM = d̄ / √(2πf_d(0)/T), in which 2πf_d(0) is a consistent estimator of the asymptotic variance of √T d̄. This variance appears in the statistic because the sample of loss differentials d_t is serially correlated for h > 1. The DM statistic converges to a standard normal distribution, so the null hypothesis can be rejected if |DM| > 1.96 at the 5% level; this situation corresponds to zones A and C in Figure 11. Otherwise, if |DM| ≤ 1.96, the null hypothesis H_0 cannot be rejected, as shown in zone B of Figure 11.
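For h = 1, the serial-correlation correction is unnecessary and the statistic takes a simple form. The following sketch implements that case with synthetic error series (not the paper's data):

```python
import math
import numpy as np

def dm_test(e1, e2):
    """Diebold-Mariano statistic for squared-error loss, h = 1.
    For h = 1 the variance of the mean loss differential reduces to the
    sample variance of d_t; HAC corrections only matter for h > 1."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2   # loss differential d_t
    T = d.size
    dbar = d.mean()
    var_dbar = d.var(ddof=1) / T                    # Var(d̄) under h = 1
    return dbar / math.sqrt(var_dbar)

# Synthetic 365-day error series: model B has visibly smaller errors
rng = np.random.default_rng(1)
e_a = rng.normal(0.0, 1.0, 365)   # competing model
e_b = rng.normal(0.0, 0.5, 365)   # "proposed" model
dm = dm_test(e_a, e_b)
print(abs(dm) > 1.96)  # → True: reject H0 at the 5% level
```

A positive DM here indicates that the first series incurs larger squared losses, i.e. the second model forecasts significantly better.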

DM test evaluation
In this section, the proposed forecasting approach is compared with the two closest competing models in both the deterministic and probabilistic settings. The results of this comparison are presented in Table 5. Here, the null hypothesis means that the performance difference between the two compared models is insignificant, and the alternative hypothesis indicates a significant difference. As shown in the table, the test result in every comparison is H_1; that is, the DM statistics between the proposed method and the other approaches exceed 1.96, indicating significant differences. Hence, we can claim that the forecasting accuracy of the suggested model is better than that of the other methods.

CONCLUSION
In this paper, a new model for the prediction of solar energy in both deterministic and probabilistic settings is presented. The recommended approach is highly capable of fitting a wide range of continuous compound distributions. The effectiveness of the proposed model was tested in different solar