Using Machine Learning Algorithms to Predict Failure on the PCB Surface under Corrosive Conditions

A printed circuit board (PCB) surface can fail by corrosion caused by various environmental factors. This paper applies machine learning (ML) techniques to build predictive models that forecast PCB surface failure due to electrochemical migration (ECM) and estimate leakage current (LC) levels under corrosive conditions defined by combinations of six critical factors. The modeling methodology uses common supervised ML algorithms and reports key evaluation metrics to quantify the performance of each algorithm. The study concludes that ML algorithms can build predictive models that forecast PCB failures and estimate LC values effectively and quickly.


Introduction
Today, electronic systems worldwide experience numerous corrosion failures due to their extensive use under diverse environmental exposure conditions [1,2]. The growing use of power electronics in renewable energy systems and vehicle electrification aggravates corrosion problems through increased exposure to harsh environments and expanded performance requirements, which demand higher power and bias levels on parts such as PCBs. The main cause of corrosion failure is the formation of a water film on the PCB surface under humid conditions, which acts as a connecting electrolyte layer between biased components and triggers electrochemical failure modes. Multiple factors are involved in such corrosion failures, which comprise different failure modes caused by electrochemical processes resulting from water film formation on the PCB surface due to atmospheric humidity and condensation [3,4]. Electrochemical failure modes lead to malfunction of electronic devices, as the stray currents caused by the electrochemical process interfere with PCB functionality [5,6]. Two types of corrosion failure are important in this respect: surface leakage current (LC) produced in the connecting electrolyte by electrochemical reactions, and subsequent dendrite formation by electrochemical migration (ECM) between positively and negatively biased points, causing electrical short circuits [7,8]. Among the several factors involved in these failure modes, the effect of an individual factor can be determined in isolation, but the dependency on combinations of factors is more complex, and also more realistic for understanding failures in actual practice. The combination of critical factors comprises humidity (H) and temperature (T), related to the climatic conditions (assuming the electronic device is well protected from ingress of aerosols); the process-related contamination (residue) type present on the PCB surface (CT) and its level (C); the pitch distance (P), i.e., the distance between positively and negatively biased points on the PCB surface; and the bias level (V) at the area of the PCB concerned. Corrosion failure modes on PCBs under the various combinations of the critical factors H, T, CT, C, P, and V include the failures described above: LC, which influences the functionality of a particular component or the whole PCB, and electrical short circuit due to ECM [9,10].
Remedial and preventive action against PCB failure at an early stage, based on design optimization, process optimization to reduce contamination, and related protection strategies, requires predictive analytics approaches to understand the multi-parameter influences [11][12][13]. Machine learning (ML), as an important predictive analytics technique, plays a major role owing to its high performance in failure prediction [14,15]. Failure prediction using ML techniques has attracted considerable attention due to its ability to discover patterns and build accurate models that predict future behavior [16]. ML deals with the question of how to build and design computer programs that improve their efficiency and accuracy for a specific task based on past events or observations [17,18]. ML has two main purposes: (i) to predict the dependent variable from independent variables, and (ii) to estimate the effect of the independent variables on the dependent variable [18,19]. Generally, ML falls into three categories: supervised, unsupervised, and reinforcement learning [20,21]. Supervised learning is commonly used to build prediction models that link input data to outputs based on many input-output pairs [22]. In other words, supervised learning has prior knowledge of the output value for each input and tries to learn the pattern of the relationship between predictors (inputs) and responses (outputs) [23,24]. Learning, for each algorithm, is the process of building a model from the training dataset [25]. Classification and regression are the two categories of supervised ML [26]. Regression differs from classification in that its output value is continuous; regression can therefore be mapped to classification by discretizing the output value [27]. There is extensive literature on applying supervised classification ML algorithms in applications such as wastewater plants, software, aircraft engine reliability, and COVID-19 prediction [28][29][30][31], as well as on supervised regression ML algorithms in diverse domains such as healthcare [32], finance [33], robotics [34], travel [35], the automotive industry [36], and atmospheric corrosion [37].
Furthermore, some PCB studies using ML algorithms have been performed, such as fault recognition of PCB glue [38], detection of various defect types in PCB inspection [39], reliability prediction of solder joint failures [40], prediction of solder joint health [41], and evaluation of ECM life prediction on PCBs using three regression methods [42]. These works used only some of the ML algorithms, or only one of classification or regression analysis,

Table 1
Chosen critical factors and levels along with symbols, values, and units.

Table 2
Overview of different evaluation metrics and formulas for classification and regression supervised ML algorithms. (TP: true positive, FP: false positive, FN: false negative, TN: true negative, n: total number of samples in the dataset, y_i: experimental value, ŷ_i: predicted value).

Metric: F1 score. Formula: 2TP / (2TP + FP + FN).

Bahrebar et al., for example, considered only a few critical factors and less diverse corrosive conditions. However, none of these works considered LC predictions and the connected ECM failures in detail. In general, the ML-based approach to predicting PCB failures has not been explored sufficiently in the past, since ML typically needs enough data samples to build efficient prediction models, and data collection on humidity effects on electronics is time-consuming, making it difficult to gather enough samples for machine learning tasks. Therefore, this study enabled ML by performing a large number of experiments and investigated how ML can help predict PCB failure status as well as leakage current.
In this work, both classification and regression analyses using the most applicable supervised ML algorithms [43], namely k-nearest neighbors (k-NN), decision tree (DT), random forest (RF), support vector machine (SVM), and deep neural network (DNN), were utilized to predict the short-circuit failure mode due to ECM and the LC on the PCB surface due to the corrosion process. The 729 different conditions were constructed as a 3^6 complete crossed design of three levels of six critical factors (H, T, CT, C, P, and V) over 4374 h of experimental runs inside a climatic chamber. Surface insulation resistance (SIR) PCBs with interdigitated electrodes were used as test boards, pre-contaminated with three weak organic acids (WOAs), namely adipic, glutaric, and succinic acid, which are common activators in the solder flux used in the PCB manufacturing process. The presence of these residues leads to water film formation under humid conditions and a corrosion process that assists final failure development, namely LC and ECM. The LC measurements for all 729 conditions over time showed a pattern of an initially stable level followed by a sudden jump, depending on whether the corrosion process led to ECM dendrite formation and short circuit. The failure conditions (classification) and the LC (regression) were investigated to determine the best prediction model using the F1 score and the mean squared error (MSE), respectively, which are influential evaluation metrics for classification and regression analysis [44,45]. K-fold cross-validation was used together with a grid search approach to select the best-tuned hyperparameter values for each ML algorithm. The grid search sets a grid of discrete hyperparameter values and scores each configuration with the chosen metric to find the best algorithm performance [46]. Finally, the most applicable predictive model was chosen by estimating performance on 115 new and unseen data conditions in the test dataset. The results showed that the SVM and RF models for classification, with the highest F1 score, accuracy, sensitivity, precision, and area under the receiver operating characteristic (ROC) curve (AUC), as well as the DT and RF models for regression, with the lowest error values, scored best on the test dataset. The study concludes that the RF model, in training validation, offered the best results for predicting both PCB failure conditions and LC values.

Experimental data conditions
The experimental data conditions were obtained from systematic experiments using interdigitated SIR PCB surfaces mimicking a PCB layout and exposed to various conditions. Each condition was a combination of one categorical and five numerical factors. Each contamination type, as the categorical factor, accounts for 243 different data conditions. Each condition is a unique combination of P, C, T, H, V, and CT measured over 21,600 s. The experimental conditions were designed to cover all combinations of three levels of the six critical factors by performing a 3^6 complete crossed design. In total, 729 different conditions of all factor/level combinations, over 4374 h of experimental time, were employed to train and validate all supervised ML algorithms. Table 1 displays the five numerical factors (H, T, C, P, V) plus CT as a categorical factor, each at three levels, along with symbols, values, and units. Aggressiveness increases from level 1 to 3 for all factors except pitch distance (inverse) and contamination type (glutaric acid (G) is more aggressive than succinic acid (S) and adipic acid (A), and S is more aggressive than A). The following sections present a brief overview of the selection of the critical factors and levels in this study.
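The 3^6 complete crossed design described above can be generated as a Cartesian product of the factor levels. A minimal sketch, using hypothetical level labels (the actual factor values are those listed in Table 1):

```python
from itertools import product

# Hypothetical level labels; the actual values are given in Table 1.
factors = {
    "H": [1, 2, 3],          # relative humidity level
    "T": [1, 2, 3],          # temperature level
    "CT": ["A", "S", "G"],   # contamination type: adipic, succinic, glutaric
    "C": [1, 2, 3],          # contamination level
    "P": [1, 2, 3],          # pitch distance level
    "V": [1, 2, 3],          # bias voltage level
}

# Cartesian product of all levels gives the 3^6 = 729 test conditions.
conditions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(conditions))  # 729
```

Enumerating the design this way guarantees that every factor/level combination appears exactly once.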

Pitch distance
The various demands and the remarkable trend toward miniaturizing electronic devices reduce the space on PCBs and increase component density [4]. Hence, the pitch distance is a significant factor in PCB failure due to its effect on increasing the electric field strength, making dendrite formation during ECM easier [47]. The three pitch distances of the SIR PCB used are similar to actual component sizes with Electronic Industries Association (EIA) codes 0201, 0402, and 0603.

Voltage
The voltage is another critical factor for the electrochemical and corrosion processes on PCBs: it creates an electric field between water-film-connected biased points, which not only causes electrochemical reactions at the biased points but also migration of ions through the water film, resulting in ECM [48,49]. The three voltage levels considered in this work follow the realistic range of bias on low-power electronic devices and the stress direct current (DC) voltage standard [48].

Climatic conditions
Climatic conditions play a major role in PCB failure, as they lead to water film formation. They refer to the levels of humidity and temperature and their variations. These two factors determine the range of condensation of water vapor on the PCB surface [50]. For condensation, the dew point range and the size of dew droplets forming on the PCB surface are affected by climatic conditions [51]. Climatic conditions can also increase the absolute humidity (AH) interacting with the PCB surface, reduce the deliquescent relative humidity (RH) level and increase the solubility of contamination, and extend the deposition process of ECM through an increased electrochemical process [52,53]. The levels of both temperature and RH are proportional to the actual environment and the deliquescence of solder flux activators [54].

Contamination
Cleanliness is an essential issue from the PCB failure point of view; in other words, the potential for corrosion is determined by PCB cleanliness [55]. The detrimental effects of contamination depend on factors such as the type of contamination (CT), its quantity or level (C), and its location [56,57]. The importance of contamination effects in combination with climatic conditions lies in inducing water film formation through deliquescence, contributing to the conductivity of the water layer, and, depending on the type of contamination, acting as an aggressive species for corrosion [54]. The main types of contamination originate from the component assembly processes of PCB manufacturing, such as wave and reflow soldering, which involve solder flux chemistry containing WOAs [58]. Of the two, wave soldering contributes more contamination due to the use of liquid flux applied by spraying. Adipic, glutaric, and succinic acid, as important activator WOA compounds, are considered in this work, each at three levels matching the typical levels of flux residues usually seen after wave soldering [59,60].

SIR PCB specimen
According to IPC standard testing, SIR PCBs with interdigitated electrodes are frequently used for investigating climatic effects on reliability, material qualification, service life estimation, and evaluating different factors on the PCB surface [61,62]. Individual SIR patterns include three different surface areas, and the three pitch distances of 0.3, 0.6, and 1 mm are shown in Fig. 1. They were made on FR-4 laminate with a thickness of 1.6 mm, complying with the IPC-4101/21 standard.

SIR PCB preparation for testing
Before starting the experiments, the two electrodes of each SIR pattern were hand-soldered to two small external wires to connect to the SIR measurement instrument. The PCBs were cleaned three times with isopropanol and then dried with compressed air. Each SIR pattern was then contaminated with a solution of one of the critical WOAs, made up of 2.5 g WOA dissolved in 100 mL isopropanol, i.e., a concentration of 25 g/L. Three amounts of the solution were dispensed on the surfaces of the SIR patterns using a micropipette to obtain the three contamination levels.

Climatic chamber for testing and electrochemical system for LC measurement
All experiments were performed using an Espec climatic chamber with tolerances of ±0.3 °C and ±2.5% RH. A stabilization time of 1.5 h inside the climatic chamber was allowed before each test. A BioLogic VSP multichannel workstation (5 channels) was used to apply bias to the board and measure the leakage current. For the electrode connections, two wires were used: one from the test board was connected to the working electrode, while the other electrode served as both reference and counter electrode.

ML algorithms and designation of their hyperparameters
Recently, ML techniques have been adopted for a variety of predictive analytics applications [63,64]. Predictive analytics can include ML algorithms to analyze data quickly and efficiently [65]. An ML algorithm predicts the outcome based on training input data obtained for analytical purposes [66]. Five commonly used supervised ML algorithms (k-NN, DT, RF, SVM, and DNN) have been used for both regression (to obtain numerical values, e.g., LC values) and classification (to sort items into categories, e.g., failed or not-failed prediction under various conditions) [43].
To improve the precision and performance of each ML algorithm, several hyperparameters need to be tuned [67]. Hyperparameters are algorithm-specific parameters that control the model's behavior, such as its complexity or learning rate, and are commonly chosen based on insight and trials on the training datasets [68]. This study considered the two main hyperparameters of each algorithm that are understood to have the greatest influence on performance [69]. For the other hyperparameters, the default values of the scikit-learn ML library for Python 3 were used [70].
k-NN is a common supervised ML algorithm that identifies the most similar, nearest neighbors of a data sample [71]. The k refers to the number of nearest neighbors, where a nearest neighbor is one of the k closest data points to the point under consideration [72]. The weight (W) in k-NN can be uniform, meaning equal weight is given to each neighbor, or distance-based, meaning nearer neighbors carry more weight than farther ones.
The decision tree (DT) algorithm belongs to the tree-based family of supervised ML algorithms; it builds decision rules from training data in the form of decision nodes and leaf nodes branching from a root node [73]. The primary advantage of DT is that it is intuitive and easily explainable to users. DT has two important hyperparameters: the maximum depth of the tree (mD) and the minimum samples to split (mSS), i.e., the minimum number of samples required to split an internal node. Random forest (RF) is a common ensemble learning method for classification and regression that constructs a multitude of decision trees at training time, each working on a sampled subset of the data. RF selects samples randomly and votes over (classification) or averages (regression) the predictions of all the decision trees [72]. The number of trees in the forest (nE) and the maximum fraction of data features considered for splitting (mF) are the two important hyperparameters affecting the performance of trained RF models. The support vector machine (SVM) is a supervised ML algorithm that separates the target classes using extreme points, or support vectors, to create a hyperplane in multidimensional space [74]. The kernel, gamma (γ), and regularization (C) are among the most important hyperparameters that directly affect SVM performance [75]. The radial basis function (RBF) kernel is the preferred choice for proper separation when there is no prior knowledge of the data [69]. C is greater than zero and tells the SVM optimizer how strongly to avoid misclassification [75]; a smaller C allows the optimizer to ignore points close to the boundary and increases the margin. γ lies between 0 and 1 and controls the influence of a plausible line of separation, i.e., the spread of the RBF function; a lower γ means less curvature, with far-away points still considered.
An artificial neural network (ANN) is a supervised learning algorithm consisting of multilayer perceptrons that learn complex relationships between inputs and outputs. Generally, an ANN comprises three kinds of layers: the input, hidden, and output layers, each containing a number of neurons (nodes). The term deep learning comes from having multiple hidden layers [76]. The input layer typically contains the independent variables used to predict the output, while the output layer may have a different number of neurons depending on the target labels/values [77]. The hyperparameters of the deep neural network (DNN) algorithm include the number of hidden layers (hL) and the number of neurons in each hidden layer (nN).
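The five algorithms above can all be instantiated from scikit-learn, the library the study uses for defaults. A minimal sketch, where every hyperparameter value shown is illustrative only (the tuned values actually used are those reported in Table 5), and `MLPClassifier` stands in for the DNN:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Illustrative hyperparameter choices; not the study's tuned values.
models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5, weights="uniform"),
    "DT":   DecisionTreeClassifier(max_depth=10, min_samples_split=2),
    "RF":   RandomForestClassifier(n_estimators=100, max_features=0.5),
    "SVM":  SVC(kernel="rbf", C=1.0, gamma=0.1),
    "DNN":  MLPClassifier(hidden_layer_sizes=(20, 20), max_iter=2000),
}
```

Each estimator exposes the same `fit`/`predict` interface, which is what makes a uniform grid-search comparison across all five algorithms straightforward.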

Evaluation metrics for ML algorithms
Evaluation metrics are used to assess how well the ML algorithms perform on the training and test datasets. By comparing model predictions with the actual experimental data, the evaluation metrics measure the accuracy of each model [43]. The commonly used evaluation metrics for classification include the F1 score, precision, sensitivity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC). Accuracy, for instance, as an intuitive measure of performance, is computed from the confusion matrix and is defined as the ratio of correct predictions to all observations. Similarly, four applicable metrics for predicting continuous variables were utilized for the regression analysis: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). For a straightforward selection of the best ML algorithm, the F1 score and MSE were employed as the scoring metrics in the grid search for classification and regression, respectively. The F1 score can be interpreted as the harmonic mean of precision and sensitivity; the best F1 score is 1 and the worst is 0. In regression, the best values, and thus the best ML algorithms, were obtained by comparing MSE values, where a lower value indicates smaller errors between predicted and real values and therefore a better algorithm [78,79]. The criterion hyperparameter supported for regression is also, by default, mean squared error (MSE), which is equivalent to variance reduction. Table 2 lists the metrics for both classification and regression together with their formulas [80,81].
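The two selection metrics can be written directly from the formulas in Table 2. A minimal sketch with made-up counts and values for illustration:

```python
import numpy as np

def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN): harmonic mean of precision and sensitivity
    return 2 * tp / (2 * tp + fp + fn)

def mse(y_true, y_pred):
    # MSE = (1/n) * sum((y_i - yhat_i)^2)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

print(f1_score(tp=90, fp=5, fn=10))            # 0.923...
print(mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))   # 0.02
```

An F1 of 1 means every positive was found with no false alarms; an MSE of 0 means the predictions match the experimental values exactly.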

Results of humidity testing: leak current data and electrochemical migration failure
Using the interdigitated PCB test structure, two responses resulting from exposure to the various corrosive conditions were investigated: the LC resulting from the corrosion action and the failure due to ECM. The LC results from SIR reduction due to the electrochemical process occurring between two water-film-connected electrodes under influencing factors such as the electric field (set by bias level and pitch distance), flux residues (from the soldering process), and climatic factors such as temperature. The failure due to ECM develops by forming a conductive electrolyte between the two electrodes, followed by dissolution of anodic metal ions, their migration, and deposition at the cathode under the bias condition, finally leading to a short circuit. Fig. 2 illustrates the typical electrochemical current behavior as an important indicator defining LC levels, from the initial current leaking through the water layer to the dendrite formation current due to ECM. The curves represent failure conditions for the three contamination types under the same corrosive conditions (P2, C3, V1, T3, and H3) with three different magnitudes, showing the two responses. First, the LC before dendrite formation (yellow upward arrows) is used for the regression supervised ML modeling; second, the failure state due to dendrite formation (green downward arrows after the sudden jump) is used for the classification supervised ML modeling. The LC is assigned to the stable part, and the failure part refers to the initiation of failure on the SIR PCB surface with a considerable jump in LC due to dendrite formation bridging the conductors (examples in Fig. 3). A threshold of around 100 μA is defined for the failure state in the present analysis, detecting ECM and dendrite formation; this current value is commonly accepted as the failure threshold due to ECM. The figure also clearly shows that, overall, glutaric, succinic, and adipic acid, owing to the relative differences in contamination aggressivity, give varying LCs and failure times due to ECM [82,83]. Among these contaminations, glutaric acid is the most aggressive due to its deliquescent relative humidity (DRH) value and high solubility in the water film (owing to its higher dissociation constant), followed by succinic and adipic acid [84,85]. Glutaric acid has a DRH in the range of 80-87% RH for a temperature range of 25-60 °C, while the values for adipic and succinic acid over the same temperature range are around 83-99% RH and 81-95% RH, respectively [54,82,83,86]. Table 3 displays a snippet of the experimental results, with the average LC and the failure state for each condition, gathered from graphs of the kind shown in Fig. 2 and used for the ML modeling in the next section. Fig. A2 and Fig. A3 in the appendix present all experimental results for the LC values and the failure state of each condition, respectively. Fig. 3 shows a typical surface view of a test PCB with a permanent dendrite filament between two electrodes on the SIR PCB surface under the corrosive conditions P2, C3, V1, T3, and H3 for the three contamination types (their current measurement curves are shown in Fig. 2), captured ex situ with a digital microscope for optical inspection of dendrite formation (failure state). The pictures in Fig. 3 show ECM dendrite formation for all contamination types at the higher humidity, temperature, and contamination levels, while succinic and adipic acid show less susceptibility to dendrite formation at lower humidity, temperature, and contamination levels. In general, the tendency for ECM increases with increasing aggressivity of the conditions (shown in Table 1), namely increasing bias, contamination level, temperature, and humidity and decreasing pitch distance. This is also clear in the snippet of data in Table 3, which shows lower LC values and not-failed conditions for the less aggressive contaminations (succinic and adipic acid) and, in general, for low bias, temperature, and humidity and large pitch distances.
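The labeling rule described above, stable LC before any jump and failure once the current crosses the ~100 μA threshold, can be sketched as follows. This is an illustrative sketch on a synthetic trace; the paper's exact averaging window for the stable region is not specified here, so averaging over all samples before the first threshold crossing is an assumption:

```python
import numpy as np

FAILURE_THRESHOLD = 100e-6  # ~100 uA, the accepted ECM failure threshold

def label_condition(current, threshold=FAILURE_THRESHOLD):
    """Label a current trace (in A) as failed/not-failed and return the
    mean LC over the stable region before any jump (assumed window)."""
    current = np.asarray(current)
    above = np.nonzero(current >= threshold)[0]
    if above.size == 0:
        # No threshold crossing: not failed; LC is the trace average.
        return {"failed": False, "lc": float(current.mean())}
    first = above[0]
    stable = current[:first] if first > 0 else current[:1]
    return {"failed": True, "lc": float(stable.mean())}

# Synthetic example: ~1 uA stable leakage followed by a dendrite jump
trace = np.concatenate([np.full(100, 1e-6), np.full(20, 5e-4)])
print(label_condition(trace))  # failed=True, lc ~ 1e-06
```

Applied to every measured condition, this produces exactly the two targets used later: a binary failure label for classification and a stable-region LC value for regression.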

Validation and hyperparameters tuning of ML algorithms
The training set is automatically divided into two parts: training and validation datasets. The choice of validation method usually depends on the dataset size and falls into three categories: leave-one-out cross-validation (small sample size), K-fold cross-validation (moderate sample size), and validation with a test set (large sample size) [87]. This study used K-fold cross-validation with K equal to 5 for all algorithms. In 5-fold cross-validation, the training dataset is separated into five subsets; four subsets train the ML model, and the fifth evaluates the quality of the trained model. Each fold thus uses 20% of the data for validation, which typically yields accurate performance estimates. This procedure is repeated five times, choosing a different subset for validation each time [88].
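The 5-fold procedure above can be sketched with scikit-learn's `cross_val_score`, here on a synthetic stand-in for the 729-condition training set (six factors as features; data and estimator settings are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 729-condition dataset (6 factors -> 6 features)
X, y = make_classification(n_samples=729, n_features=6, random_state=0)

# cv=5: each fold holds out 20% of the training data for validation
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=5, scoring="f1")
print(scores.mean())  # mean F1 over the five validation folds
```

The mean (and spread) of the five fold scores is what the grid search compares across hyperparameter settings.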
The grid search approach is a common technique for hyperparameter tuning [89]. It searches exhaustively through a grid of hyperparameter sets and scores each configuration based on an evaluation metric (F1 score, MSE, etc.). This approach methodically selects the best combination of hyperparameter values within a specified grid [46].
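Combining the grid with the 5-fold validation gives scikit-learn's `GridSearchCV`. A minimal sketch with an illustrative SVM grid (the ranges actually searched are those in Table 4) and synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Illustrative grid; the study's ranges are given in Table 4.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="f1")
search.fit(X, y)  # exhaustively scores all 9 (C, gamma) combinations
print(search.best_params_, search.best_score_)
```

For regression, the same construction applies with `scoring="neg_mean_squared_error"` (scikit-learn maximizes scores, so MSE is negated).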

ML algorithms setup
All experimental results obtained from the 729 different conditions were used as the training set for the five ML algorithms in the classification analysis to predict failed or not-failed conditions. For the regression analysis, predicting LC values, we used all not-failed data samples, around 71.5% of the data, i.e., 521 conditions. We used only not-failed data in the regression task because the LC values were obtained from the initial, stable part of the current measurement before failure [90].
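The two task-specific datasets can be derived from one table of labeled conditions. A minimal sketch with a hypothetical frame mirroring Table 3 (the `failed` labels and LC values here are placeholders, so the resulting counts do not match the study's 521):

```python
import pandas as pd

# Hypothetical stand-in for Table 3: one row per condition
df = pd.DataFrame({
    "condition": range(729),
    "failed": [i % 10 < 3 for i in range(729)],  # placeholder labels
    "lc": [1e-6] * 729,                           # placeholder LC values
})

# Classification uses all 729 conditions; regression uses only the
# not-failed ones, whose LC comes from the stable pre-failure region.
clf_data = df
reg_data = df[~df["failed"]]
print(len(clf_data), len(reg_data))
```

With the study's real labels, the same filter yields the 521 not-failed conditions used for regression.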
Table 4 shows the two hyperparameters of each algorithm and the levels assigned to them. Combining the values of the two hyperparameters in Table 4 gives 100 separate combinations each for the SVM, DT, and RF algorithms, 20 combinations for k-NN, and 84 combinations for the DNN algorithm. We used grid search to select the value of each hyperparameter giving the best model trained by each algorithm.
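The combination counts follow directly from the grid sizes: the number of configurations is the product of the number of values per hyperparameter. A sketch assuming, for illustration, a 10 × 10 grid for SVM and a 10 × 2 grid for k-NN (the actual value lists are those of Table 4):

```python
# Hypothetical grids sized to reproduce the combination counts in the text.
svm_grid = {"C": list(range(1, 11)), "gamma": [i / 10 for i in range(1, 11)]}
knn_grid = {"n_neighbors": list(range(1, 11)), "weights": ["uniform", "distance"]}

def n_combinations(grid):
    # Product of the number of candidate values for each hyperparameter
    n = 1
    for values in grid.values():
        n *= len(values)
    return n

print(n_combinations(svm_grid))  # 100
print(n_combinations(knn_grid))  # 20
```

This is why the search cost differs per algorithm: the grid search fits and cross-validates a model for every one of these combinations.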
Once the best set of hyperparameters for each ML algorithm was selected by grid search with 5-fold cross-validation, the best trained model of each algorithm was used to predict the test dataset. Our test dataset consists of 115 unseen data samples with new combinations of the six critical factors that were not in the training dataset used by the grid search. Fig. 4 shows our overall pipeline for finding the best machine learning models for both the classification and regression tasks. The next section presents a visual comparison of the different values of all ML algorithm hyperparameters used to select the best ones. To simplify the presentation of the 84 different combinations of neurons across up to four hidden layers in Fig. 5(e), when comparing F1 scores, we assigned a number N to each combination. Table A1 in the appendix shows the arrangement of neurons and hidden layers corresponding to each number. Comparing the number of hidden layers and the arrangement of neurons in each layer gives different F1 score values. For instance, N equal to 83 denotes a DNN structure with 3 hidden layers (hL): 20 neurons in the first, 20 in the second, and 15 in the third. N merely enumerates the available DNN architectures; it is not a hyperparameter itself. The figure demonstrates that the F1 score achieved by the DNN classification algorithm generally increases with the complexity of the neural network (number of neurons and hidden layers), although the F1 score converges to between 0.91 and 0.93 even for more complex networks.

Evaluation of trained ML algorithms for estimating LC values
Fig. 6 presents the results of the 5-fold cross-validation of each ML algorithm with different hyperparameters for the regression analysis. We used MSE as the comparison metric in the grid search for regression. Table A1 again provides a simple way to compare the MSE obtained by the DNN regression algorithm across the 84 different combinations of neurons and hidden layers. The figure shows that 2 and 3 hidden layers give approximately the same MSE; however, increasing the number of neurons/nodes decreases the MSE. For instance, according to Fig. 6(e) and Table A1, the best (lowest) MSE belongs to number 20, which represents a combination of two layers, each with the highest number of neurons/nodes ([20; 20]).

Selection of the best trained ML models
Generally, the best k-NN model was obtained with K equal to 5, with uniform weighting for classification and distance weighting for regression. The DT and RF models showed very similar F1 score and MSE values across the ranges of their hyperparameters; averaging the metric values and comparing with the other models yielded the best settings for both classification and regression. High values of γ and low values of C gave the highest F1 score and lowest MSE for the SVM model. The best DNN model likewise illustrated the significant effect of the number of neurons in all hidden layers. Table 5 lists the optimal set of hyperparameters for each studied ML algorithm, together with the performance achieved. We selected the best models by the maximum F1 score for classification and the minimum MSE for regression.
Fig. A1 in the appendix visually summarizes Table 5, showing the best model of each studied algorithm for both the classification and regression tasks. RF was found to be the best ML algorithm in this study for predicting PCB failure, with an F1 score of around 0.96 ± 0.01 (Table 5 and Fig. A1). For predicting LC values, DT was the best algorithm for the regression analysis, with an MSE of 0.0002 ± 0.0002, only about 0.0001 below that of RF (Table 5 and Fig. A1). Nevertheless, there is no significant difference among the ML algorithms on the training dataset for either classification or regression. In other words, the results show that the ML algorithms are able to learn the distribution of our data and effectively predict PCB failures as well as estimate the LC values. Therefore, the best model of each algorithm was used to predict the test dataset, in order to assess the accuracy and performance of all the predictive models in the next section.

Evaluating the best-trained models
We used the test dataset to evaluate the performance of the best trained model selected for each ML algorithm. The test dataset comprised 115 new conditions mixing different levels of the six critical factors. Table A2 in the appendix lists the factor levels used to construct these new test conditions, none of which were used for training the ML algorithms; this gives a realistic estimate of the accuracy and proficiency of the predictive ML models. Fig. 7 presents the confusion matrix achieved by each algorithm, summarizing the classification results. All of our classification performance metrics (i.e., F1 score, AUC, accuracy, sensitivity, and precision) can be calculated from the confusion matrix.
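How all five metrics follow from a single confusion matrix can be shown on a small hypothetical prediction set (labels are illustrative, not the paper's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predictions on a small test set: 1 = Failed, 0 = Not-failed
y_true = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 1, 1, 1, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
sensitivity = tp / (tp + fn)                  # also called recall
f1 = 2 * precision * sensitivity / (precision + sensitivity)
```

AUC is the exception in that it needs the continuous scores rather than the hard labels, as shown in the next section.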
According to Fig. 7, the SVM model gave the highest F1 score, accuracy, and precision, while the RF model gave the best AUC and sensitivity. We prefer the F1 score over the other metrics because it is effective for imbalanced datasets [91], and our test data consist of 76% Failed and 24% Not-failed conditions, which is an imbalanced dataset. After the F1 score, the highest AUC indicates the best predictive model [92]. Fig. 8 compares the ROC curves and AUC of the best trained model of each algorithm.
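Computing F1 and AUC on an imbalanced example (7 Failed vs 3 Not-failed, mirroring the roughly 76/24 split above) might look like this; the scores are hypothetical:

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical classifier scores on an imbalanced set (7 Failed, 3 Not-failed)
y_true  = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.9, 0.8, 0.3, 0.7, 0.65, 0.95, 0.6, 0.2, 0.85, 0.75])

auc = roc_auc_score(y_true, y_score)               # threshold-free ranking quality
f1  = f1_score(y_true, (y_score >= 0.5).astype(int))  # F1 at a fixed 0.5 threshold
```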
Table 6 compares the MSE, RMSE, MAD, and MAPE of the studied regression algorithms to find the best ML model for LC prediction. The k-NN and DNN models have the lowest MSE and RMSE but high MAD and MAPE values compared to the others, except for SVM, whereas RF and DT have the lowest MAD and MAPE values. There is no significant difference in the MSE and RMSE values across the models; they are all of the same magnitude. The kernel density estimate (KDE) plots in Fig. 9 compare the LC values predicted by the different ML algorithms with the mean of the experimentally measured LC values; the KDE plot visualizes the density distributions of the real mean LC and the predicted values [93]. Based on Table 6 and Fig. 9, the RF and DT models demonstrated the best predictions on the test dataset in this study: the density of the predicted mean LC closely follows the density of the real LC values for RF (Fig. 9e) and DT (Fig. 9d) compared with the other studied regression algorithms. Fig. 10 presents an overview of the best ML algorithm on the test dataset according to the F1 score and AUC metrics for classification, and the MSE and MAD metrics for regression.
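The four regression metrics of Table 6 are all simple functions of the prediction error; a sketch with hypothetical mean LC values:

```python
import numpy as np

# Hypothetical mean LC values and model predictions (not the paper's measurements)
y_true = np.array([0.010, 0.020, 0.015, 0.030])
y_pred = np.array([0.012, 0.018, 0.016, 0.027])

err  = y_pred - y_true
mse  = np.mean(err ** 2)                    # mean squared error
rmse = np.sqrt(mse)                         # root mean squared error
mad  = np.mean(np.abs(err))                 # mean absolute deviation/error
mape = np.mean(np.abs(err / y_true)) * 100  # mean absolute percentage error
```

MSE/RMSE penalize large errors quadratically, while MAD/MAPE weight all errors linearly, which is why the two pairs can rank the models differently, as observed in Table 6.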

Overall discussion
ML has gained a lot of attention because of its ability to solve complex problems effectively across various industries. This study aims to create a generalized ML model for predicting PCB failure and LC values from input conditions combining six critical factors (i.e., pitch distance, contamination level, temperature, humidity, voltage, and contamination type). The results of our study on common ML algorithms show that a well-trained ML model can efficiently forecast failures of a specific PCB, provided that the PCB is exposed to a condition for which the relevant descriptors are known. Predicting failures under known conditions is very useful for a pro-active PCB design strategy that increases robustness against failure under critical exposure conditions. As another anchor of this paper, LC prediction can be used to estimate the probability of failure, or even the time to failure, if the correlation between LC and failure can be established for different applications.
When compared with the number of failure conditions on the SIR PCB surface under identical conditions except for the contamination type (CT), the ML-based models showed satisfactory prediction performance even with the limited training dataset. For instance, the experiments indicated that glutaric acid, compared with the other CTs, produces a high number of corrosion failure conditions under the same corrosive conditions on PCB surfaces; the ML models reflected this and could precisely predict new conditions.
We followed two main steps for training and testing our predictive models: 1. Training and validation, where we built predictive models using common ML algorithms (k-NN, DT, RF, SVM, DNN) and used validation data to find the best trained models; and 2. Testing, our final evaluation of the best trained models on new input conditions never seen by the algorithms during training. In both the regression and classification tasks we sought a trade-off between the model's inference and the model's complexity to avoid underfitting and overfitting. Underfitting occurs when a trained model is too simple to capture the relationship between the input and output. Overfitting occurs when the model has been overtrained and predicts the training data well while showing high errors on new test data.
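The overfitting symptom described above, high training accuracy paired with a drop on held-out data, can be made visible with a deliberately unrestricted model on synthetic noisy data (a sketch, not the paper's experiment):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.random((300, 6))
y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0.5).astype(int)  # noisy synthetic labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

deep  = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)       # unrestricted depth
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_tr, y_tr)

# A large train-test gap signals overfitting; low scores on both signal underfitting
gap = deep.score(X_tr, y_tr) - deep.score(X_te, y_te)
```

The unrestricted tree memorizes the training labels (training accuracy 1.0) and pays for it on the test split, which is exactly what the hyperparameter search over model complexity guards against.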
In the evaluation of the trained ML algorithms, the RF model performed best at classifying PCB failures, with an F1 score of around 0.96, and estimated LC values with an MSE of about 0.0003. Furthermore, in validating the best trained models on the test dataset, RF, with an F1 score of 0.87 and the highest AUC for failure prediction and an MSE of 0.0021 for LC prediction, was one of the best models and can be used for both classification and regression analysis.
Fig. 11 illustrates the relative factor importance in the classification and regression analyses according to the best ML model selected. The humidity factor has a major effect in both analyses (classification for ECM and regression for LC). Humidity is the key factor causing water-film formation on the board, which is the medium required for electrochemical failure [94]. Humidity is also important in connection with contamination effects, as the hygroscopic nature of ionic residues (e.g., adipic, succinic, or glutaric acid), represented by the DRH value, reduces the humidity level required for water-film formation on a PCB surface under transient climatic conditions. This observation agrees with the combined effect of humidity and contamination types resulting from flux residues reported in the literature [94][95][96][97][98]. However, it is interesting to note that the contamination level shows a significant effect for LC (Fig. 11b) but less for classification (ECM) in Fig. 11(a). This is attributed to the fact that the contamination level provides the conductivity of the water film that produces LC, thereby triggering the electrochemical process, whereas ECM depends on the amount of metal ions and their migration to the cathode, which is independent of contamination level. Contamination type has a significant effect on ECM failure, meaning that a more aggressive contamination (e.g., glutaric acid) caused more ECM failures [54,82,84]. This is due to the increased acidity and aggressiveness of the solution resulting from the high solubility of glutaric acid, which increases tin-ion dissolution for migration. It has been reported that, among the acids tested, glutaric acid causes high levels of ECM above an ~85% humidity level [54,82,99]. Voltage shows less effect on both ECM failure and LC, while pitch distance showed a more significant effect because the electric field increases with bias level at smaller distances. This is especially significant in connection with miniaturization, for which reduced component sizes and distances cause higher electric fields [100,101]. The effect of temperature is also significant for LC (regression), due to two factors: (i) a reduction in the DRH level of the contaminations, and (ii) the increased solubility of the acids, giving higher conductivity. Kamila et al. have extensively investigated the combined effects of flux-residue contamination, temperature, and humidity, showing a significant reduction in DRH values and increases in the water-absorption capability of contaminations, LC, and ECM susceptibility with increasing temperature [83,85,102,103]. Overall, the relative factor importance from the ML analysis fits the experimental observations, not only from the theoretical point of view of humidity-caused electrochemical failures in electronics, but also with respect to the experimental results reported in the literature.
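For a tree-based model such as RF, relative factor importance like that in Fig. 11 can be read directly from the fitted model. A sketch on synthetic data where, by construction, the humidity column dominates the labels (the feature names and data are illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

factors = ["pitch", "cont_level", "temperature", "humidity", "voltage", "cont_type"]
rng = np.random.default_rng(2)
X = rng.random((300, 6))
# Synthetic labels dominated by the humidity column, mimicking the trend in Fig. 11
y = (X[:, 3] + 0.3 * X[:, 1] > 0.8).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = sorted(zip(factors, rf.feature_importances_), key=lambda t: -t[1])
```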

Conclusion
This study presents how machine learning can forecast corrosion failure on PCBs based on an experimental dataset. By labeling the failure status as Failed or Not-failed, PCB failure is cast as a binary classification problem and solved by the classification task. The relative factor importance in both classification and regression shows the importance of the various influencing factors, in agreement with the experimental observations. Further, given that many corrosive conditions combine various factors and levels that affect PCB failure, well-trained ML algorithms for both classification and regression can be used as a tool to predict failure status and LC values, reducing the number of experiments needed for new conditions.
All ML algorithms showed suitable learning on the training dataset for both classification and regression. According to the results, the F1 scores of the ML algorithms were between 0.71 and 0.96, and the AUC, as another significant metric, was above 0.9 for all of them. In addition, for regression, the MSE ranged from 0.01 down to a best value of 0.0002, an acceptable error for all ML algorithms.
The high performance in both classification and regression shows that ML is attractive for predicting PCB failures as well as estimating LC values. We believe there is room for more accurate ML models with a higher number of experiments (conditions). Based on our evaluation, the SVM and RF algorithms achieved the highest scores (F1 score, AUC, accuracy, sensitivity, and precision) for predicting PCB failures under different combinations of the critical factors.
Our study shows that the DT and RF regression algorithms achieved the lowest errors on the test dataset. Overall, the RF algorithm trained the most effective models, with high performance in predicting PCB failure and LC values as the best classifier and regressor, respectively.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table A1
Arrangement of the hidden layers (hL) and the number of neurons/nodes (nN), illustrated by numbers (N) from 1 to 84 in Fig. 5(e) and 6(e) for better visualization.

Fig. 2 .
Fig. 2. Typical leakage-current behavior under the same corrosive condition, used to define the two responses (LC and failure state) for the supervised ML algorithms for three contamination types: adipic acid (black), glutaric acid (red), and succinic acid (blue).

Fig. 4 .
Fig. 4. General pipeline for building the applicable predictive model and performance estimation in ML.

Fig. 5 .
Fig. 5. Performance diagrams of the classification (a) k-NN, (b) DT, (c) RF, (d) SVM, and (e) DNN algorithms on the training dataset of PCB failure classification for different hyperparameter values. (K: number of neighbors; W: weight of each neighbor; mD: maximum depth of the tree; mSS: minimum samples needed to split an internal node; nE: number of trees; mF: maximum fraction of data features to be split for trees; C: the misclassification or error term; γ: the influence of plausible separation; N: the number of each combination of neurons and layers based on Table A1 (N is not a hyperparameter); hL: number of hidden layers.)

Fig. 6 .
Fig. 6. Performance diagrams of the regression (a) k-NN, (b) DT, (c) RF, (d) SVM, and (e) DNN algorithms on the training dataset of LC regression for different hyperparameter values. (Refer to the Fig. 5 caption.)

Fig. 7 .
Fig. 7. Overview of the confusion matrices for assessing the classification accuracy of the (a) SVM, (b) k-NN, (c) DT, (d) RF, and (e) DNN algorithms on the test dataset for predicting failure on the PCB surface.

Fig. 9 .
Fig. 9. KDE plots for each algorithm comparing the density of real and predicted LC values; (a) SVM algorithm, (b) k-NN algorithm, (c) DNN algorithm, (d) DT algorithm, and (e) RF algorithm.

Fig. 10 .
Fig. 10. Overview of the best ML algorithm on the testing dataset according to (a) the F1 score and AUC metrics for classification and (b) the MSE and MAD metrics for regression, as the most significant metrics.

Fig. 5
Fig. 5 describes the results of the 5-fold cross-validation of each ML algorithm with different hyperparameters for the classification task. The 5-fold cross-validation randomly divides the dataset into 5 sets of approximately equal size; one fold is kept for validation while the model is trained on the other 4 folds, reducing data bias and the variance of the estimated results. We used the F1 score as the comparison metric in the grid search for classification. Fig. 5(a) compares the F1 score attained by the k-NN classification algorithm for different values of K and W, where each curve represents a W value. The curve resembles a downward parabola: the F1 score first increases and then decreases with increasing K, peaking at K = 5. Fig. 5(b) compares the F1 score of the DT classification algorithm for diverse values of mD and mSS, where each curve represents an mSS value. The F1 score generally increases with mD; it is the same for all curves at mD ≤ 5, beyond which mSS = 7 gives the highest F1 score for all mD values. Fig. 5(c) compares the F1 score of the RF classification algorithm for different ranges of nE and mF, where each curve represents an mF value. At a glance, mF and nE have a similar effect on the F1 score, since the F1 scores for all 100 combinations of mF and nE values lie very close together, between 0.92 and 0.96. Fig. 5(d) compares the F1 score of the SVM classification algorithm for different values of γ and C, where each curve represents a γ value. The F1 score increases with C for all γ values, especially for γ > 0.05, and converges for C > 300.
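The 5-fold splitting described above can be sketched with scikit-learn's `KFold` on a toy array (illustrative only):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # ten toy samples
kf = KFold(n_splits=5, shuffle=True, random_state=0)
folds = [(train.tolist(), test.tolist()) for train, test in kf.split(X)]
# Each sample lands in exactly one validation fold of size 2;
# the model is fit on the remaining 8 samples of each split
```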

Fig. 6 .
Fig. 6(a) compares the MSE achieved by k-NN regression for different values of K and W; in contrast to Fig. 5(a), the MSE decreases toward and then increases from K = 5. The distance weight performs slightly better than the uniform weight in both classification and regression. Fig. 6(b) compares the MSE of the DT regression algorithm for various values of mD and mSS, where each curve represents an mSS value. The MSE decreases with increasing mD until

Fig. A1 .
Fig. A1. Overview of the best ML algorithm on the training dataset according to the F1 score and MSE metrics.

Fig. A3 .
Fig. A3. All PCB failure classification data results under various conditions combining the P, C, V, T, H, and CT factors, each at three levels. The red and blue colors present the Not-failed and Failed conditions, respectively. Each cell (box or rectangle) presents a condition merging the six factors at different levels. For instance, the cell marked with a light-blue dotted line shows a failure condition P1C1V1T2H1,G, i.e., a low pitch distance (P1 = 300 µm), low contamination level (C1 = 25 µg/cm²), low voltage (V1 = 2 V), medium temperature (T2 = 40 °C), low humidity (H1 = 78%), and glutaric acid (G) as the contamination type.

Table 3
Snippet of the experimental data results with the failure state and the average LC for each condition, constructed from a combination of the CT, P, C, V, T, and H factors at different levels.

Table 4
Hyperparameters and their ranges of the studied ML algorithms.

Table 5
Best hyperparameter values for each ML algorithm, obtained from 5-fold cross-validation on the training data.

Table 6
Comparison of the regression metrics on the test dataset.

Table A2
Various levels of the six critical factors used to make new conditions for the test dataset.