Fault diagnosis of gearboxes using wavelet support vector machine, least square support vector machine and wavelet packet transform

This work presents an experimental method for recognizing gearbox faults using the wavelet packet transform and two support vector machine models. Two wavelet selection criteria are used. Statistical features of the wavelet packet coefficients of vibration signals are selected. The optimal wavelet decomposition level is selected based on the Maximum Energy to Shannon Entropy ratio criterion. In addition, the Energy and Shannon Entropy of the wavelet coefficients are used as two new features, along with other statistical parameters, as input to the classifier. The gearbox faults are then classified using these statistical features as input to a least square support vector machine (LSSVM) and a wavelet support vector machine (WSVM). Several kernel functions, and a multi-kernel function as a new method, are used with three strategies for multi-class classification of gearbox faults. The fault classification results demonstrate that the WSVM identifies the gearbox fault categories more accurately and has better diagnosis performance than the LSSVM.


Introduction
Fault diagnosis of gearboxes is one of the most common and intricate challenges in plants. Analysis of the vibration signal is a principal method for gearbox fault diagnosis. The procedure for fault diagnosis of a gearbox can be stated in several steps: data acquisition, signal processing, feature selection and diagnostics [1, 2]. To analyze vibration signals, methods in the time [3, 4], frequency [5] and time-frequency [6] domains have been investigated. Among these, the wavelet transform [7-10] has progressed in the last two decades and outperforms the other time-frequency methods, although it is lacking in a few aspects as well. The discrete wavelet transform is primarily considered an efficient tool for vibration-based signal processing for fault detection. Wavelet analysis provides local features in both the time and frequency domains and is multi-scale, which enables it to distinguish the abrupt components of the vibration signal [11].
The foundations of Support Vector Machines (SVM) were developed by Vapnik [12, 13], and the method has been applied to both pattern recognition [14-18] and regression forecasting [19-24]. The effectiveness of wavelet-based features for fault diagnosis of gears using SVM and proximal support vector machines was shown by Saravanan et al. [25]. Qu and Zuo [26] utilized an SVM to identify the wear degree of a slurry pump. Sun et al. [27] predicted the remaining life of a bearing by establishing an SVR-based model. Hou and Li [28] optimised the parameters of SVR through an evolution strategy and formulated an SVR-based short-term fault prediction strategy. Shen et al. [29] presented a novel intelligent gear fault diagnosis model based on empirical mode decomposition and a multi-class transductive support vector machine. Xian and Zeng [30] developed an intelligent fault diagnosis procedure based on the wavelet packet transform (WPT) and a hybrid SVM. Zamanian and Ohadi [31] presented a method for feature extraction based on exact wavelet analysis to improve the fault diagnosis of gears. In their study, feature extraction was based on maximization of the local Gaussian correlation function of the wavelet coefficients. They used a linear support vector machine to classify the feature sets extracted with the presented method.
The rest of this paper is outlined as follows. Section 2 briefly describes the fundamental theory of wavelet packet decomposition and two wavelet selection criteria. The proposed new machine health status identification method is presented in Section 3, followed by the experimental verification tests using both bearing and gearbox datasets in Section 4. In Section 5, the effect of different wavelet basis functions on the performance of the proposed scheme is discussed. Conclusions are drawn in Section 6.

The review of wavelet packet transform
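Wavelet packet transform is an extension of the discrete wavelet transform. The signals are decomposed into a hierarchical structure of details and approximations at a limited number of levels as follows:

$$f(t) = \sum_{i=1}^{j} D_{i}(t) + A_{j}(t),$$

where $D_{i}(t)$ denotes the wavelet detail and $A_{j}(t)$ stands for the wavelet approximation at the $j$th level [1]. A wavelet packet is a function with three integer indices $i$, $j$ and $k$, which are the modulation, scale and translation parameters, respectively:

$$\psi_{j,k}^{i}(t) = 2^{j/2}\,\psi^{i}(2^{j}t - k), \quad i = 1, 2, 3, \ldots$$

The wavelet functions $\psi^{i}$ are determined as follows:

$$\psi^{2i}(t) = \sqrt{2}\sum_{k=-\infty}^{\infty} h(k)\,\psi^{i}(2t - k), \qquad \psi^{2i+1}(t) = \sqrt{2}\sum_{k=-\infty}^{\infty} g(k)\,\psi^{i}(2t - k),$$

where $h(k)$ and $g(k)$ are the low-pass and high-pass filters associated with the wavelet. The original signal $f(t)$ is defined after $j$ levels of decomposition as follows:

$$f(t) = \sum_{i=1}^{2^{j}} f_{j}^{i}(t),$$

while the wavelet packet component signals $f_{j}^{i}(t)$ are stated as a linear combination of wavelet packet functions $\psi_{j,k}^{i}(t)$ as follows:

$$f_{j}^{i}(t) = \sum_{k=-\infty}^{\infty} c_{j,k}^{i}\,\psi_{j,k}^{i}(t),$$

where the wavelet packet coefficients $c_{j,k}^{i}$ are calculated by:

$$c_{j,k}^{i} = \int_{-\infty}^{\infty} f(t)\,\psi_{j,k}^{i}(t)\,dt,$$

provided that the wavelet packet functions satisfy the orthogonality condition:

$$\psi_{j,k}^{m}(t)\,\psi_{j,k}^{n}(t) = 0 \quad \text{if } m \neq n.$$

Two wavelet selection criteria are used and compared to select a suitable wavelet for feature extraction.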
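The recursive splitting described above can be sketched in a few lines. This is a minimal illustration only: it assumes the Haar wavelet (not the db44 or Meyer wavelets used in this study) so that it needs nothing beyond the standard library, and the names `haar_step` and `wavelet_packet` are hypothetical. Nodes are returned in natural (filter-bank) order, not sorted by frequency.

```python
import math

def haar_step(x):
    """Split a sequence into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet(signal, level):
    """Return the 2**level coefficient nodes at the given level.

    Unlike the plain discrete wavelet transform, both the approximation
    and the detail branch are split again at every level.
    """
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

# Synthetic two-tone test signal (an assumption, not measured data).
signal = [math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 20 * t / 64)
          for t in range(64)]
nodes = wavelet_packet(signal, 3)

# An orthonormal transform preserves the signal energy across the nodes.
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for node in nodes for v in node)
```

With eight levels the same routine would produce the 256 nodes mentioned later in the paper, provided the signal length is a multiple of 256.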

Maximum relative wavelet energy criterion
Relative wavelet energy gives information about the relative energy associated with different frequency bands and can detect the degree of similarity between segments of a signal [32, 33]. The energy content of the signal at each resolution level $n$ is estimated by:

$$E(n) = \sum_{i=1}^{m} |C_{n,i}|^{2},$$

where $m$ is the number of wavelet coefficients and $C_{n,i}$ is the $i$th wavelet coefficient of the $n$th scale. The total energy can be calculated as follows:

$$E_{total} = \sum_{n}\sum_{i=1}^{m} |C_{n,i}|^{2} = \sum_{n} E(n).$$

The energy probability distribution is defined as follows [33]:

$$p_{n} = \frac{E(n)}{E_{total}},$$

where $\sum_{n} p_{n} = 1$, and the distribution $p_{n}$ is considered as a time-scale density. The total energy is calculated for each scale for vibration signals at different rotor speeds and loading conditions, using healthy and faulty gearbox conditions.
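As a small worked illustration of the relative wavelet energy, the sketch below computes the per-scale energy, the total energy and the normalized distribution for a toy set of coefficients. The function names and the coefficient values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def node_energy(coeffs):
    """Energy of one scale: sum of squared wavelet coefficients."""
    return sum(c * c for c in coeffs)

def relative_energy(nodes):
    """Return per-scale energies, total energy and the distribution p_n."""
    energies = [node_energy(node) for node in nodes]
    total = sum(energies)
    if total == 0:
        raise ValueError("signal has zero energy")
    return energies, total, [e / total for e in energies]

# Toy coefficients for three scales (illustrative values only).
toy_nodes = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
energies, total, p = relative_energy(toy_nodes)

# Under the maximum relative energy criterion, the scale (or candidate
# wavelet) with the largest share of the energy is preferred.
best_scale = p.index(max(p))
```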

Maximum energy to Shannon entropy ratio criterion
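A suitable base wavelet is one that extracts the maximum amount of energy while minimizing the Shannon entropy of the corresponding wavelet coefficients. The combination of the energy and Shannon entropy of a signal's wavelet coefficients is expressed by the Energy to Shannon Entropy ratio [34], given as:

$$\zeta(n) = \frac{E(n)}{S_{entropy}(n)}, \qquad (12)$$

where $E(n)$ is the energy of the wavelet coefficients $C_{n,i}$ at scale $n$. In Eq. (12), the entropy of the signal wavelet coefficients is given as follows:

$$S_{entropy}(n) = -\sum_{i=1}^{m} p_{i}\log_{2} p_{i}.$$

The energy probability distribution of the wavelet coefficients, $p_{i}$, is given by:

$$p_{i} = \frac{|C_{n,i}|^{2}}{E(n)},$$

with $\sum_{i=1}^{m} p_{i} = 1$, and $p_{i}\log_{2} p_{i} = 0$ if $p_{i} = 0$.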
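The Energy to Shannon Entropy ratio can be sketched as follows. The base-2 logarithm is an assumption (the base is not recoverable from the text), the function names are hypothetical, and the ratio is reported as infinite when the entropy is exactly zero, which is a choice made here for the sketch.

```python
import math

def shannon_entropy(coeffs):
    """Shannon entropy of the normalized squared coefficients (base 2)."""
    energy = sum(c * c for c in coeffs)
    if energy == 0:
        return 0.0
    entropy = 0.0
    for c in coeffs:
        p = c * c / energy
        if p > 0:  # convention: p * log(p) = 0 when p = 0
            entropy -= p * math.log2(p)
    return entropy

def energy_entropy_ratio(coeffs):
    """Energy divided by Shannon entropy for one set of coefficients."""
    energy = sum(c * c for c in coeffs)
    entropy = shannon_entropy(coeffs)
    return math.inf if entropy == 0 else energy / entropy

# Energy spread evenly over the coefficients gives high entropy ...
spread = [2.0, 2.0, 2.0, 2.0]
# ... while energy concentrated in one coefficient gives low entropy.
peaked = [3.5, 1.0, 1.0, 1.0]

ratio_spread = energy_entropy_ratio(spread)
ratio_peaked = energy_entropy_ratio(peaked)
```

A wavelet or scale that concentrates the signal energy in few coefficients therefore scores higher, which is the behaviour the criterion rewards.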

Multi class support vector machine
The SVM is a supervised learning method based on statistical learning theory formulated by Vapnik [12]. The SVM maps low-dimensional data to a high-dimensional feature space and solves a binary problem by searching for an optimal hyperplane that separates two datasets with the largest margin in that space. The optimal hyperplane is established through a set of support vectors from the original datasets, and these subsets form the boundary between the two classes. The classification function can be described as follows:

$$f(x) = \operatorname{sign}\left(w^{T}\Phi(x) + b\right),$$

where the nonlinear mapping function $\Phi(x)$ maps the input feature vector $x$ into a higher-dimensional feature space, $b$ is the bias and $w$ is the weight vector. $w$ and $b$ determine the position of the separating hyperplane. Multi-class classification problems have also been studied [20, 21]. As seen above, the SVM is inherently a binary classifier. However, rotating machinery usually suffers from more than two kinds of faults. To tackle this problem, three strategies are used in this paper: one-against-one (OAO), one-against-all (OAA) and one-against-others (OAOT) [35].
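To make the difference between the one-against-all and one-against-one strategies concrete, the sketch below combines binary classifiers in both ways. It is an illustration under stated assumptions: the binary classifier is a simple nearest-centroid scorer standing in for a binary SVM, the class names and data points are invented, and the OAOT strategy is not sketched because its construction is not detailed in this text.

```python
from itertools import combinations

def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def train_binary(pos, neg):
    """Stand-in binary classifier: a positive score means the 'pos' side."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)

def train_oaa(data):
    """One-against-all: one classifier per class (k in total)."""
    models = {}
    for label, pts in data.items():
        rest = [p for other, q in data.items() if other != label for p in q]
        models[label] = train_binary(pts, rest)
    return models

def predict_oaa(models, x):
    # The class whose classifier gives the largest score wins.
    return max(models, key=lambda label: models[label](x))

def train_oao(data):
    """One-against-one: one classifier per pair, k(k-1)/2 in total."""
    return {(a, b): train_binary(data[a], data[b])
            for a, b in combinations(sorted(data), 2)}

def predict_oao(models, x):
    # Each pairwise classifier casts one vote; the most voted class wins.
    votes = {}
    for (a, b), f in models.items():
        winner = a if f(x) >= 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Invented 2-D feature vectors for three gear conditions.
data = {
    "good": [(0.0, 0.0), (0.2, 0.1)],
    "broken": [(5.0, 5.0), (5.1, 4.8)],
    "chipped": [(0.0, 5.0), (0.3, 5.2)],
}
oaa = train_oaa(data)
oao = train_oao(data)
x_new = (4.9, 5.1)
```

For three classes both strategies happen to need three binary classifiers; with more classes OAO grows quadratically while OAA grows linearly.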

Least square support vector machine
LSSVM is a reformulation of the standard SVM proposed by Suykens and Vandewalle [36]. In contrast to the SVM, the LSSVM uses a least squares cost function and involves equality constraints instead of inequalities in the problem formulation. Consider the training set $\{(x_{i}, y_{i})\}_{i=1}^{N}$ with $x_{i} \in R^{n}$ and $y_{i} \in \{-1, 1\}$. To classify the training set, the LSSVM has to find the optimal (maximum-margin) separating hyperplane so that it generalizes well. All separating hyperplanes have the following representation in the feature space: $y(x) = w^{T}\Phi(x) + b$, where $w$ is the normal vector of the separating hyperplane. Margin maximization is obtained by minimizing the squared norm of $w$ while also minimizing the fitting error of the training set. The resulting optimization problem of the LSSVM can be formulated in the following form:

$$\min_{w,b,e} J(w, e) = \frac{1}{2}w^{T}w + \frac{C}{2}\sum_{i=1}^{N} e_{i}^{2}, \quad \text{subject to } y_{i} = w^{T}\Phi(x_{i}) + b + e_{i}, \quad i = 1, \ldots, N,$$

where $C$ is the regularization parameter and $e_{i}$ is the fitting error. The Lagrangian takes the form:

$$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{N}\alpha_{i}\left[w^{T}\Phi(x_{i}) + b + e_{i} - y_{i}\right],$$

where $\alpha_{i}$ are the Lagrange multipliers. The conditions for optimality require that $\partial L/\partial w = 0$; $\partial L/\partial b = 0$; $\partial L/\partial e_{i} = 0$; and $\partial L/\partial \alpha_{i} = 0$. A linear system for classification and regression can then be obtained from the Karush-Kuhn-Tucker conditions [37]. Its solution is found by solving the system of linear equations expressed in matrix form as follows:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + C^{-1}I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix},$$

where $y = [y_{1}, \ldots, y_{N}]^{T}$, $\mathbf{1} = [1, \ldots, 1]^{T}$, $\alpha = [\alpha_{1}, \ldots, \alpha_{N}]^{T}$ and $\Omega_{ij} = K(x_{i}, x_{j})$. Then the regression function of the LSSVM is obtained:

$$y(x) = \sum_{i=1}^{N}\alpha_{i}K(x, x_{i}) + b,$$

where the kernel function is given by $K(x, x_{i}) = \Phi^{T}(x)\Phi(x_{i})$ and meets Mercer's condition.
In fault diagnosis, it is very important to choose a reasonable kernel function for the support vector machine. Different kernel functions yield different decision functions and thus determine the performance of the support vector machine. Generally, two kinds of kernels, i.e. local kernels and global kernels, are utilized to construct the decision functions [38]. A typical local kernel is the radial basis function (RBF) kernel, which is defined as follows:

$$K_{RBF}(x, x_{i}) = \exp\left(-\frac{\|x - x_{i}\|^{2}}{2\sigma^{2}}\right),$$

where $\sigma$ is the width of the RBF kernel. A typical global kernel is the polynomial kernel, which is defined as follows:

$$K_{poly}(x, x_{i}) = \left(x^{T}x_{i} + 1\right)^{d},$$

where $d$ denotes the kernel parameter. In order to improve the classification performance and generalization ability of the LSSVM, a multi-kernel support vector machine (MSVM) with kernel $K_{m}$ is constructed in this study by a controlled parameter $\rho$ based on the local kernel function $K_{RBF}$ and the global kernel function $K_{poly}$:

$$K_{m} = \rho K_{RBF} + (1 - \rho)K_{poly},$$

where $0 < \rho < 1$ is the controlled parameter. To be an admissible kernel in SVM, a kernel must satisfy Mercer's theorem. Since $K_{RBF}$ and $K_{poly}$ both satisfy Mercer's theorem, a convex combination of them also satisfies it. In the MSVM model, there are four parameters: the weight parameter $\rho$, the penalty constant $C$, and the kernel parameters $\sigma$ and $d$. The weight parameter $\rho$ assigns the weights of the different kernel functions. The penalty constant $C$ applies to the samples misclassified by the optimal separating plane, and its role is to strike a proper balance between the calculation complexity and the separating error. The kernel function parameters $\sigma$ and $d$ reflect the characteristics of the training data. All these parameters affect the generalization of the MSVM and exert a considerable influence on its performance. However, it is not known beforehand which parameters are best for a given problem. In this work, the parameters of the multi-kernel SVM are randomly selected.
The LSSVM was initially proposed to deal with binary classification problems. Multi-class problems can also be solved by combining a number of binary LSSVMs using any of a number of strategies, such as one-versus-one, one-versus-all and one-against-others. In this study, the OAO, OAA and OAOT methods are used.
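The LSSVM training step and the multi-kernel idea can be sketched together. The code below assumes the function-estimation form of the LSSVM linear system, with block rows [0, 1ᵀ] and [1, Ω + I/C], applied to ±1 labels, and a convex combination of an RBF and a polynomial kernel. All names, the toy data and the parameter values (`rho`, `sigma`, `d`, `C`) are assumptions for illustration, not the settings used in this study.

```python
import math

def rbf(x, z, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))

def poly(x, z, d):
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** d

def multi_kernel(x, z, rho=0.5, sigma=1.0, d=2):
    """Convex combination of a local (RBF) and a global (polynomial) kernel."""
    return rho * rbf(x, z, sigma) + (1 - rho) * poly(x, z, d)

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_train(X, y, C, kernel):
    """Build and solve the (N+1) x (N+1) LSSVM system for b and alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [kernel(X[i], X[j]) + (1.0 / C if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(x, X, alpha, b, kernel):
    score = sum(a * kernel(x, xi) for a, xi in zip(alpha, X)) + b
    return 1 if score >= 0 else -1

# Two well-separated toy clusters (invented data).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0), (3.0, 4.0), (4.0, 3.0)]
y = [-1, -1, -1, 1, 1, 1]
b, alpha = lssvm_train(X, y, C=10.0, kernel=multi_kernel)
train_pred = [lssvm_predict(x, X, alpha, b, multi_kernel) for x in X]
new_pred = [lssvm_predict(x, X, alpha, b, multi_kernel) for x in [(0.5, 0.5), (3.5, 3.5)]]
```

Because training reduces to one linear solve, every training sample receives a non-zero multiplier in general, which is the usual trade-off of the LSSVM against the sparse solution of a standard SVM.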

Wavelet support vector machine
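The wavelet function group can be defined as:

$$\psi_{a,c}(x) = |a|^{-1/2}\,\psi\left(\frac{x - c}{a}\right),$$

where $x, a, c \in R$, $a$ is a dilation factor and $c$ is a translation factor. Assuming that $\psi(x)$ is a one-dimensional wavelet function, the multi-dimensional wavelet function can be defined using tensor theory as:

$$\psi_{N}(x) = \prod_{i=1}^{N}\psi(x_{i}),$$

where $x = (x_{1}, x_{2}, \ldots, x_{N}) \in R^{N}$ and $N$ is the dimension number. Let $\psi(x)$ denote a mother kernel function. Then the dot-product wavelet kernels are:

$$K(x, x') = \prod_{i=1}^{N}\psi\left(\frac{x_{i} - c_{i}}{a}\right)\psi\left(\frac{x'_{i} - c'_{i}}{a}\right).$$

The decision function for classification is [39]:

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{l}\alpha_{i}y_{i}\prod_{k=1}^{N}\psi\left(\frac{x^{k} - x_{i}^{k}}{a_{i}}\right) + b\right),$$

where $x_{i}^{k}$ denotes the $k$th component of the $i$th training example. The Mexican hat mother wavelet is $\psi(x) = (1 - x^{2})\exp(-x^{2}/2)$, and the corresponding wavelet kernel function is:

$$K(x, x') = \prod_{i=1}^{N}\left[1 - \frac{(x_{i} - x'_{i})^{2}}{a^{2}}\right]\exp\left(-\frac{(x_{i} - x'_{i})^{2}}{2a^{2}}\right).$$

Like the Mexican hat wavelet kernel function, the Morlet wavelet kernel is also an admissible SV kernel function. The Morlet function is defined as follows:

$$\psi(x) = \cos(\omega_{0}x)\exp\left(-\frac{x^{2}}{2}\right),$$

where $\omega_{0}$ is the center frequency of the wavelet, and the corresponding wavelet kernel function is:

$$K(x, x') = \prod_{i=1}^{N}\cos\left(\omega_{0}\frac{x_{i} - x'_{i}}{a}\right)\exp\left(-\frac{(x_{i} - x'_{i})^{2}}{2a^{2}}\right).$$

In this paper, four kernel functions are used: the Morlet, Mexican hat, Gaussian and Shannon wavelet kernels. The multi-class classification strategies OAA, OAO and OAOT are used with the different wavelet kernel functions for classification in this paper.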
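The Mexican hat and Morlet wavelet kernels described above can be sketched as a product of the mother wavelet evaluated on the scaled coordinate differences. The function names are hypothetical, and the Morlet frequency constant `omega0=1.75` is an assumption (a value often seen in the wavelet kernel literature); the value used in this study is not recoverable from the text.

```python
import math

def mexican_hat(u):
    """Mexican hat mother wavelet."""
    return (1.0 - u * u) * math.exp(-u * u / 2.0)

def morlet(u, omega0=1.75):
    """Morlet mother wavelet; omega0 is an assumed frequency constant."""
    return math.cos(omega0 * u) * math.exp(-u * u / 2.0)

def wavelet_kernel(x, z, a, mother):
    """Translation-invariant wavelet kernel.

    Product over dimensions of mother((x_i - z_i) / a), where a is the
    dilation factor shared by all dimensions.
    """
    k = 1.0
    for xi, zi in zip(x, z):
        k *= mother((xi - zi) / a)
    return k

x = (0.2, 1.5, -0.7)
z = (0.1, 1.0, 0.3)
k_mex = wavelet_kernel(x, z, a=0.83, mother=mexican_hat)
k_mor = wavelet_kernel(x, z, a=0.74, mother=morlet)
```

Such a kernel can be passed to any kernel-based classifier in place of the RBF or polynomial kernel; the dilation factor `a` then plays the role of the kernel width.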

Experimental validation of the proposed intelligent machine fault diagnosis scheme
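Rolling element bearings and gears are the most common and important components used in rotating machinery such as gearboxes. Faults occurring on the surface of these components could cause unexpected machine breakdown. Therefore, it is necessary to develop an effective intelligent gearbox fault diagnosis method. To verify the effectiveness of the proposed method, new gearbox datasets provided by Ottawa University in collaboration with the Prognostics and Health Management Society and the test rig datasets collected at Shahrekord University are analyzed.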

Case 1. Ottawa gearbox vibration datasets
The data in this section come from the Ottawa University gearbox, provided through the Prognostics and Health Management Society [40]. Data were sampled synchronously from accelerometers mounted on both the input and output shaft retaining plates of the gearbox. An attached tachometer generates one pulse per revolution, providing very accurate zero-crossing information. Data were collected at different shaft speeds under high and low loading. The test runs include seven different combinations of faults and one fault-free reference run. The signals were sampled at a frequency of 66.666 kHz, and each record was 4 s long.

Case 2. Shahrekord experimental setup
The experimental setup used at Shahrekord University to collect the dataset consists of a one-stage gearbox with spur gears, a flywheel and an electrical motor. The test rig is shown in Fig. 1. Vibration signals are obtained in the radial direction by mounting the accelerometer on the top of the gearbox. The "Easy Viber" data collector and its software, "SpectraPro", are used for data acquisition. The sensitivity and dynamic range of the accelerometer probe are 100 mV/g and ±50 g. The signals are sampled at 16000 Hz for 4 s. In the present study, four pinion wheels are used. The vibration signal from the accelerometer is captured for the following conditions: good gear, gear with tooth breakage, chipped tooth gear and eccentric gear. For bearing vibration signal acquisition, five self-aligning ball bearings (1209 K) are used. One new bearing is considered as the good bearing. In the other four bearings, defects are created; the bearings are then installed in turn and the raw vibration signals are acquired on the bearing housing. The vibration signals are thus captured for the following conditions: good bearing, bearing with spall on the inner race, bearing with spall on the outer race, bearing with spall on the ball and bearing with combined defects.

Result and discussion
Based on Table 1, the Daubechies wavelet (db44) and the Meyer wavelet are selected as the best base wavelets among those considered, according to the Maximum Relative Energy and Maximum Energy to Shannon Entropy criteria, respectively. The wavelet packet coefficients of all signals are calculated with db44 and Meyer at the eighth level of decomposition. After WPT, 2304 statistical features are extracted from the 256 nodes at the eighth decomposition level. When a wavelet transform is applied to a signal, a minimum Shannon entropy at a particular scale indicates that a major defect frequency component exists in that scale. In the present study, however, out of the 256 scales considered, the scale with the maximum Energy to Shannon Entropy ratio for the healthy condition is selected, and the statistical features of the wavelet packet coefficients corresponding to the selected level are calculated. Statistical moments like kurtosis, skewness and standard deviation are descriptors of the shape of the amplitude distribution of vibration data and have some advantages over traditional time and frequency analysis, such as their lower sensitivity to variations of load and speed. In the present paper, the authors use statistical moments such as standard deviation, crest factor, absolute mean amplitude value, variance, kurtosis, skewness and fourth central moment as features to effectively indicate early faults occurring in rolling element bearings and gears. In addition, the energy and Shannon entropy of the wavelet coefficients are used as two new features, along with the other statistical parameters, as input to the classifier. These statistical features are fed as input to soft computing techniques like SVM for fault classification.
Two cases of input data and feature sets are considered for classification. In case A, the statistical parameters of the wavelet packet transform are considered (for each type of gearbox fault). In case B, the statistical features at the optimal level, extracted based on the Maximum Energy to Shannon Entropy ratio criterion, are considered (for each type of gearbox fault). In addition, the energy and Shannon entropy are used as two new features in the feature set in this case. Table 2 shows the results of gearbox classification with the Maximum Energy to Shannon Entropy criterion. In case B, with the Maximum Energy to Shannon Entropy ratio criterion (Table 2), the correctly classified instances on the test set are 91.11 % and 95 % for LSSVM and WSVM, respectively, while with 10-fold cross validation the average classification accuracies are 90.55 % and 93.88 % for LSSVM and WSVM, respectively. Table 3 shows the accuracy of each technique for fault classification with the Maximum Relative Wavelet Energy criterion. The correctly classified instances on the test set are 87.77 % and 92.22 % for LSSVM and WSVM, respectively, with the two new features. For 10-fold cross validation, the average classification accuracies for LSSVM and WSVM are 86.11 % and 90.55 %, respectively, which is slightly lower than in the previous case.
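The statistical features named in this section (standard deviation, crest factor, absolute mean amplitude, variance, kurtosis, skewness and fourth central moment) can be computed per wavelet packet node as sketched below. The exact definitions are assumptions: population (1/n) normalization, crest factor as peak over RMS, and non-excess kurtosis; the function name is hypothetical and the input must not be constant.

```python
import math

def statistical_features(x):
    """Seven amplitude-distribution features of one coefficient sequence."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # variance
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    rms = math.sqrt(sum(v * v for v in x) / n)
    return {
        "std": math.sqrt(m2),
        "crest_factor": max(abs(v) for v in x) / rms,
        "abs_mean": sum(abs(v) for v in x) / n,
        "variance": m2,
        "kurtosis": m4 / m2 ** 2,
        "skewness": m3 / m2 ** 1.5,
        "fourth_moment": m4,
    }

# A square-wave-like sequence makes every feature easy to check by hand.
features = statistical_features([1.0, -1.0, 1.0, -1.0])

# Seven moments plus energy and Shannon entropy give nine features per
# node, which is consistent with 2304 features over 256 nodes.
features_per_node = len(features) + 2
total_features = 256 * features_per_node
```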
From Tables 2 and 3, we find that the Maximum Energy to Shannon Entropy criterion with the two new features is better for gearbox fault classification than the Maximum Relative Wavelet Energy criterion. Furthermore, the accuracy of WSVM with OAOT, OAA and OAO under the Maximum Energy to Shannon Entropy criterion is compared in Table 4. From Table 4, it is clear that the proposed method based on the wavelet support vector machine using the Morlet wavelet kernel improves the classification accuracy by 9.97 % with respect to the Haar wavelet kernel. In this case, the overall average classification accuracy is 99.67 %. Table 4 also shows that the classification accuracy with the OAOT strategy is better than with OAA and OAO. The classification accuracy with LSSVM and the Maximum Energy to Shannon Entropy criterion is shown in Table 5. From Table 5, we find that the classification accuracy of the multi-kernel with OAOT is better than that of the RBF and polynomial kernels. Figs. 2 and 3 show the testing and training times of WSVM and LSSVM with the three strategies. The training time with OAA is longer than with OAO and OAOT under all kernel functions. As shown in Fig. 2, the performance of the Morlet kernel for machinery fault diagnosis is acceptable: the Morlet kernel has the lowest testing and training times among the kernel functions. Fig. 3 shows that the multi-kernel has the lowest training and testing times with the OAOT algorithm. Therefore, the OAOT strategy is better than OAO and OAA for this problem.
In the case of the polynomial kernel, $d$ is the important parameter, and it is not known beforehand which value of $d$ is best for the classification problem. A 10-fold cross-validation is used to find the best value of $d$, and the one with the lowest cross-validation error is picked. We study values of $d$ in the range $d = \{1, 2, \ldots, 8\}$; the accuracy of the three strategies for multi-class classification is compared in Fig. 4. From Fig. 4, with the OAOT algorithm the classification accuracy reaches its highest point (88.72 %) when $d = 3$ and its lowest point at $d = 1$. As $d$ grows further, over-fitting or under-fitting occurs and the recognition rate degrades. Generally, the OAOT algorithm is better than the OAO and OAA algorithms for the same value of $d$; their best classification rates are 85.23 % and 86.80 %, respectively. Therefore, the optimal result of the polynomial kernel is obtained with the OAOT algorithm at $d = 3$. Fig. 5 shows that the accuracy of LSSVM using the OAOT algorithm with the RBF kernel reaches its highest point (90.07 %) with $C = 30$ and $\sigma = 2$. Similarly, when the RBF kernel is applied with the OAO and OAA algorithms, the best classification rates are 86.72 % and 88.38 %, respectively.
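The cross-validated choice of the polynomial degree can be sketched as follows. This is not the LSSVM used in the study: to keep the example short and dependency-free, the classifier is a stand-in that assigns a sample to the class whose mean is nearest in the feature space induced by the polynomial kernel. The function names and the toy data are assumptions.

```python
def poly_kernel(x, z, d):
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** d

def k_fold_indices(n, k):
    """Partition range(n) into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def fit(X, y):
    """Stand-in training step: group the training points by class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return groups

def kernel_dist2(x, pts, d):
    """Squared feature-space distance from x to the mean of pts."""
    n = len(pts)
    cross = sum(poly_kernel(x, p, d) for p in pts) / n
    within = sum(poly_kernel(p, q, d) for p in pts for q in pts) / (n * n)
    return poly_kernel(x, x, d) - 2.0 * cross + within

def predict(groups, x, d):
    return min(groups, key=lambda label: kernel_dist2(x, groups[label], d))

def cv_accuracy(X, y, d, k=10):
    """Average accuracy over k held-out folds for a given degree d."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held]
        train_y = [t for i, t in enumerate(y) if i not in held]
        groups = fit(train_X, train_y)
        correct += sum(predict(groups, X[i], d) == y[i] for i in fold)
    return correct / len(X)

# Two invented, well-separated classes with ten samples each.
X = [(0.1 * i, 0.05 * i) for i in range(10)] + [(3 + 0.1 * i, 3 - 0.05 * i) for i in range(10)]
y = [-1] * 10 + [1] * 10

# Evaluate d = 1..8 and keep the degree with the highest accuracy.
scores = {d: cv_accuracy(X, y, d) for d in range(1, 9)}
best_d = max(scores, key=scores.get)
```

The same loop applies to any parameter grid, for example pairs of penalty constant and kernel width, by replacing the dictionary keys with tuples.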
From Table 5, in the case of the multi-kernel LSSVM, the highest accuracy is 91.11 % with OAOT. Fig. 6 shows that the accuracy of WSVM using the OAOT algorithm with the Mexican hat kernel reaches its highest point (94.22 %) with $C = 38.7$ and $a = 0.83$. Similarly, when the Mexican hat kernel is applied with the OAO and OAA algorithms, the best classification rates are 88.96 % and 92.81 %, respectively. Fig. 7 shows that the accuracy of WSVM using the OAOT algorithm with the Morlet kernel function reaches its highest point (95 %) with $C = 29.7$ and $a = 0.74$. Similarly, when the Morlet kernel is applied with the OAO and OAA algorithms, the best classification rates with the same $C$ and $a$ are 90.69 % and 94.41 %, respectively. Fig. 8 shows that the accuracy of WSVM using the OAOT algorithm with the Shannon kernel reaches its highest point (86.91 %) with $C = 50$ and number of vanishing moments 0.4. Similarly, when the Shannon kernel is applied with the OAO and OAA algorithms, the best classification rates are 82.99 % and 85.09 %, respectively. Fig. 9 shows that the accuracy of WSVM using the OAOT algorithm with the Gaussian kernel reaches its highest point (92.63 %) with $C = 100$ and $a = 0.5$. Also, when the Gaussian kernel is applied with the OAO and OAA algorithms, the best classification rates are 88.48 % and 90.99 %, respectively.
The authors declare that they do not have any conflict of interest in their submitted paper.

Conclusions
This study presents a methodology for detecting gearbox faults by classifying them using two SVM models, WSVM and LSSVM. First, the wavelet packet transform is applied to the signal, employing six mother wavelets. Two wavelet selection criteria, the Maximum Energy to Shannon Entropy ratio and the Maximum Relative Wavelet Energy, are used and compared to select an appropriate wavelet for feature extraction. The results obtained from the two criteria show that the wavelet selected using the Maximum Energy to Shannon Entropy ratio criterion gives better classification efficiency. Both soft computing methods performed well, but the fault classification results with WSVM are better than those with LSSVM. To find efficient features for classification, the Maximum Energy to Shannon Entropy ratio was employed to search for the optimal decomposition level of the wavelet packet, and consequently the number of features was reduced. In addition, the Morlet, Mexican hat, Gaussian and Shannon wavelet kernel functions are used to construct the WSVM algorithms. The results show that the Morlet kernel is more accurate and faster than the other wavelet kernel functions for gearbox fault classification. As a new idea, energy and Shannon entropy have been applied as two new features, along with the statistical parameters, as input to the SVM. The results indicate that the accuracy of the classifier increases by 1 to 4 percentage points when these two features are considered, but the training time of the SVM increases with the optimal decomposition level and the two new features.


Fig. 4. Comparison of accuracy of three algorithms based on WPT feature extraction with different $d$ for polynomial kernel

Fig. 7. Comparison of accuracy using OAOT algorithm based on WPT feature extraction with Morlet kernel in different ($C$, $a$)
1935. FAULT DIAGNOSIS OF GEARBOXES USING WAVELET SUPPORT VECTOR MACHINE, LEAST SQUARE SUPPORT VECTOR MACHINE AND WAVELET PACKET TRANSFORM. MOHAMMAD HEIDARI, HADI HOMAEI, HOSSEIN GOLESTANIAN, ALI HEIDARI

Table 1. Comparison of parameters for wavelet selection

Table 2. Classification performance (maximum energy to Shannon entropy criterion)

Table 3. Classification performance (maximum relative wavelet energy criterion)

Table 4. The classified result of experiment data using WSVM with three methods

Table 5. The classified result of experiment data using LSSVM with three methods