Abstract
A novel approach to kernel matrix construction for the support vector machine (SVM) is proposed to detect rolling element bearing faults efficiently. First, a multiscale coefficient matrix is obtained by processing each vibration sample signal with the continuous wavelet transform (CWT). Next, singular value decomposition (SVD) is applied to the wavelet coefficient matrix, and the resulting singular values serve as the feature vector of the sample signal. Two kernel matrices, i.e. a training kernel and a prediction kernel, are then constructed in a novel way that reveals the intrinsic similarity among samples and makes it feasible to solve nonlinear classification problems in a high-dimensional feature space. To validate its diagnostic performance, the kernel matrix construction based SVM (KMCSVM) classifier is compared with three SVM classifiers, i.e. the classification tree kernel based SVM (CTKSVM), the linear kernel based SVM (LSVM) and the radial basis function based SVM (RBFSVM), in identifying different locations and severities of bearing faults. The experimental results indicate that KMCSVM has better classification capability than the other methods.
1. Introduction
Rolling element bearings (REBs) are critical units in rotating machinery, and their health condition is often monitored to identify incipient faults. When a defect such as a bump, dent or crack on a REB's outer race, inner race, roller or cage repeatedly contacts another part of the bearing during operation, a sequence of impulsive responses can be acquired in the form of vibration [1-3], acoustic emission [4], temperature, motor current, ultrasound [5], etc. However, the measured signals contain both the fault-induced component and noise from structural vibration, environmental interference, etc. Furthermore, the fault-induced signal is often masked by noise because of its relatively low energy. Many signal processing techniques, including time-domain, frequency-domain and time-frequency analysis, have therefore been explored to extract fault signatures effectively. For example, time-domain statistical parameters such as RMS, variance, skewness and kurtosis are used as defect features [6, 7], and features are also derived from time series models such as the autoregressive (AR) model [8, 9]. Frequency analysis aims to determine whether the characteristic defect frequency (CDF) is present in the spectrum [10-14]. Because bearing fault signals are nonstationary, they are extensively processed with time-frequency analysis to obtain local characteristic information in both the time and frequency domains [15-17]. Two or more signal processing techniques are also combined for feature extraction [18-20]. Some signal analysis methods have been optimized before feature extraction [21, 22], such as the flexible analytic wavelet transform [23], which employs fractional and arbitrary scaling and translation factors to match the fault component. High-dimensional features can be compressed into low-dimensional features by dimensionality reduction methods [24-26] such as manifold learning [27-30] for efficient diagnosis.
Owing to the complexity of bearings, it is almost impossible even for domain experts to judge the bearing condition just by inspecting characteristic indices. To automate diagnosis and decision-making on REB health state, a variety of automatic diagnosis methods have been put forward, such as the artificial neural network (ANN), support vector machine (SVM), fuzzy logic, hidden Markov model (HMM) and other approaches [31]. In [32], the anomaly detection (AD) learning technique achieved higher accuracy than an SVM classifier for bearing fault diagnosis. The trifold hybrid classification (THC) approach can isolate health states that are absent from the training examples from those that are present, and discriminate them accurately [33]. A simplified fuzzy adaptive resonance theory map (SFAM) neural network has been investigated and shown to predict REB remaining life [34]. A polycoherent composite spectrum (PCCS), which retains amplitude and phase information, has been observed to give better diagnosis than methods without phase information [35]. The HOS-SVM model, which integrates higher order spectra (HOS) features with an SVM classifier, has demonstrated the capability of diagnosing REB failures [36].
As mentioned above, great progress has been made in detecting bearing conditions. Meanwhile, these methods still face some challenges. For instance, owing to fluctuations in speed or load, a measured CDF may be inconsistent with the theoretical value. The selection of the base wavelet and scale levels mostly relies on researchers' experience and preference rather than objective criteria. The discrete wavelet transform (DWT) still suffers from a fixed scale resolution regardless of signal characteristics. The structure of an ANN, particularly its initial weights, is determined randomly or by trial and experience, which may weaken generalization capability and slow down training. For an SVM classifier, a kernel function is required to map samples from the input space to a higher-dimensional feature space in which they can be linearly separated. However, the kernel function is usually confined to typical formulas such as the linear, polynomial, radial basis, multilayer perceptron and sigmoid functions, which do not necessarily capture the intrinsic correlation among the samples. Consequently, classification may be poor.
Therefore, a novel kernel matrix construction method for SVM (KMCSVM) is proposed to identify REB faults more precisely. Two kernel matrices, i.e. the training kernel matrix $\mathbf{K}$ and the prediction kernel matrix ${\mathbf{K}}_{t}$, are constructed. The matrix $\mathbf{K}$ expresses the similarity of intrinsic characteristics among training samples, while ${\mathbf{K}}_{t}$ specifies the similarity between training samples and test samples. The results show that KMCSVM diagnoses REB faults better than the other SVM classifiers considered. To the best of our knowledge, KMCSVM has not been reported in the field of rotating machinery fault diagnosis.
The rest of this paper is organized as follows. Section 2 reviews the background of CWT and singular value decomposition (SVD) for feature extraction. The KMCSVM procedure is presented in Section 3. The proposed method is validated by identifying bearing fault locations and severities in Section 4. Finally, conclusions are drawn in Section 5.
2. Methods review
Because signals from defective bearings are nonstationary, nonlinear, local and transient, CWT is chosen to process the signals, and SVD is used to extract the singular values of the coefficient matrix as the signal signature.
2.1. Continuous wavelet transform
CWT measures the local similarity between a wavelet $\psi \left(t\right)$ at scale $s$ and position $\tau $ and the signal $f\left(t\right)$. The wavelet coefficient $c\left(s,\tau \right)$ is defined by Eq. (1):
By shifting $\psi \left(t\right)$ in time and dilating it across scales, a wavelet coefficient matrix $\mathbf{C}$ is obtained as in Eq. (2); it can be viewed as a time-frequency representation of the dynamic characteristics of the signal $f\left(t\right)$:
where ${c}_{m,n}$ is the coefficient at the $m$th scale and at the $n$th data point of a sample signal.
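The construction of $\mathbf{C}$ can be made concrete with the NumPy sketch below. It is a minimal sketch, not the authors' implementation, and it rests on several assumptions: the textbook CWT form $c(s,\tau)=\frac{1}{\sqrt{s}}\sum_t f(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right)$ with a unit sampling step; a complex Shannon wavelet $\psi(t)=\sqrt{f_b}\,\mathrm{sinc}(f_b t)\,e^{j2\pi f_c t}$ whose parameters `fb=1.0` and `fc=1.5` are assumed values; and a wavelet support truncated at `half_width` wavelet units. The function names are hypothetical.

```python
import numpy as np


def shannon_wavelet(t, fb=1.0, fc=1.5):
    """Complex Shannon wavelet sqrt(fb) * sinc(fb*t) * exp(2j*pi*fc*t).

    fb (bandwidth) and fc (centre frequency) are assumed values.
    np.sinc is the normalised sinc, sin(pi x) / (pi x).
    """
    return np.sqrt(fb) * np.sinc(fb * t) * np.exp(2j * np.pi * fc * t)


def cwt_matrix(signal, scales, wavelet=shannon_wavelet, half_width=8.0):
    """Return C (len(scales) x N) with C[m, n] = c(s_m, tau = n).

    c(s, tau) = (1/sqrt(s)) * sum_t f(t) * conj(psi((t - tau) / s)),
    evaluated on integer sample positions (unit sampling step).
    """
    f = np.asarray(signal, dtype=float)
    n_points = f.size
    C = np.empty((len(scales), n_points), dtype=complex)
    for m, s in enumerate(scales):
        half = int(np.ceil(half_width * s))        # truncated support, in samples
        k = np.arange(-half, half + 1)             # offsets t - tau
        g = np.conj(wavelet(k / s)) / np.sqrt(s)   # g[k] = conj(psi(k/s)) / sqrt(s)
        # Convolving f with the reversed kernel gives the correlation
        # sum_t f(t) g(t - tau); the slice aligns the output with tau = 0..N-1.
        full = np.convolve(f, g[::-1])
        C[m] = full[half:half + n_points]
    return C
```

For a pure tone, the row of $\mathbf{C}$ whose scale maps the wavelet's pass band onto the tone frequency dominates, which is the local-similarity interpretation given above. The direct convolution costs $O(Ns)$ per scale and is only meant to make Eq. (2) concrete.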
2.2. Singular value decomposition
SVD is used to decompose the wavelet coefficient matrix $\mathbf{C}$. Assuming $\mathbf{C}$ is of size $m\times n$, its SVD can be expressed by Eq. (3):
where $\mathbf{A}$ and $\mathbf{B}$ are orthogonal matrices of size $m\times m$ and $n\times n$, respectively, and $\mathbf{\Lambda}$ is an $m\times n$ nonnegative diagonal matrix. The diagonal elements of $\mathbf{\Lambda}$ are called the singular values (SVs) of $\mathbf{C}$; they are determined only by $\mathbf{C}$ itself and describe its properties, namely the characteristics of a sample signal. Given $m<n$, Eq. (3) can be written out in detail as Eq. (4):
The $m$ SVs constitute the vector $\mathbf{x}$ in Eq. (5), which serves as the feature vector extracted from a sample signal:
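A minimal NumPy sketch of this step follows, under the assumption that the feature vector is simply the descending list of singular values of $\mathbf{C}$. For a complex CWT matrix, $\mathbf{A}$ and $\mathbf{B}$ become unitary rather than orthogonal, but the singular values are still real and nonnegative. The helper names are hypothetical.

```python
import numpy as np


def svd_features(C):
    """Feature vector x = (sigma_1, ..., sigma_m) of an m x n matrix C (m < n).

    NumPy returns the singular values as a real, non-negative array
    sorted in descending order.
    """
    return np.linalg.svd(C, compute_uv=False)


def reconstructs(C):
    """Check C = A Lambda B^H with A (m x m), B (n x n) and Lambda m x n diagonal."""
    m, n = C.shape
    A, sv, Bh = np.linalg.svd(C, full_matrices=True)   # A: m x m, Bh: n x n
    Lam = np.zeros((m, n))
    r = min(m, n)
    Lam[:r, :r] = np.diag(sv)
    return np.allclose(A @ Lam @ Bh, C)


rng = np.random.default_rng(0)
C_demo = rng.standard_normal((4, 10))   # stand-in for a 4-scale coefficient matrix
x_demo = svd_features(C_demo)           # 4 singular values
```

The squared singular values equal the eigenvalues of $\mathbf{C}\mathbf{C}^{T}$, so $\mathbf{x}$ summarizes how the signal energy is spread over orthogonal components of the time-scale plane.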
3. Proposed method
SVM is well suited to linear pattern recognition. However, the original feature vectors extracted from REB signals are not linearly separable. Suppose there exists a high-dimensional space in which the mapped images of the original feature vectors can be linearly separated by an SVM; linear SVM pattern recognition then reduces to finding kernel matrices whose entries are the inner products between these images. Fig. 1 shows the stages of kernel pattern analysis. The sample feature vectors are used to create the training and prediction kernel matrices, and the pattern function then uses these matrices to recognize unseen samples. For kernel pattern analysis, the key is therefore how to construct the kernel matrices.
Fig. 1. Stages in the implementation of kernel pattern analysis
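The claim that only inner products between images are needed can be checked on a toy case. The sketch below uses a degree-2 polynomial kernel chosen purely for illustration (it is not the kernel proposed in this paper): for 2-D inputs, the explicit map $\mathbf{\phi}(\mathbf{x})=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$ gives the same kernel matrix as $\kappa(\mathbf{x},\mathbf{z})=(\mathbf{x}^{T}\mathbf{z})^2$, and XOR-type points, which are not linearly separable in the input space, become separable along the second image coordinate.

```python
import numpy as np


def phi(X):
    """Explicit degree-2 feature map for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X[:, 0] ** 2, np.sqrt(2) * X[:, 0] * X[:, 1], X[:, 1] ** 2])


# XOR-type samples: the label is +1 when x1 and x2 have the same sign.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

K_explicit = phi(X) @ phi(X).T   # inner products of the images phi(x_i)
K_kernel = (X @ X.T) ** 2        # same matrix from kappa(x, z) = (x^T z)^2

# In feature space the hyperplane w = (0, 1, 0), b = 0 separates the two classes.
scores = phi(X) @ np.array([0.0, 1.0, 0.0])
```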
3.1. Kernel matrix pattern based SVM
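A training set ${\mathbf{S}}_{1}$ and a test set ${\mathbf{S}}_{2}$ are given as below:
$${\mathbf{S}}_{1}=\left\{\left({\mathbf{x}}_{1},{y}_{1}\right),\dots ,\left({\mathbf{x}}_{l},{y}_{l}\right)\right\}$$
$${\mathbf{S}}_{2}=\left\{\left({\mathbf{x}}_{l+1},{y}_{l+1}\right),\dots ,\left({\mathbf{x}}_{l+p},{y}_{l+p}\right)\right\}$$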
Assume $\mathbf{\phi}\left(\mathbf{x}\right)$ is the image of a point $\mathbf{x}$ mapped into a high-dimensional feature space $\mathbf{F}$, and that all sample images can be separated by the hyperplane of Eq. (8):
The hyperplane is determined by solving the following optimization problem:
It is equivalent to solving a constrained convex quadratic programming optimization problem:
$\mathbf{K}$, called the training kernel matrix, is an $l\times l$ symmetric matrix with ${k}_{ij}=\kappa \left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)$, the inner product between the images of two training samples in space $\mathbf{F}$; $l$ is the number of training samples, $\mathbf{e}$ is a column vector with ${e}_{i}=1$, $\mathbf{\alpha}={\left({\alpha}_{1},{\alpha}_{2},\dots ,{\alpha}_{l}\right)}^{T}$ is the Lagrange multiplier vector, $\mathbf{\Lambda}$ is an $l\times l$ diagonal matrix with ${\mathrm{\Lambda}}_{ii}={y}_{i}$, $c$ is the error penalty constant, and ${y}_{i}$ is the class label of the $i$th sample.
By maximizing $W\left(\mathbf{\alpha}\right)$, the optimal ${\mathbf{\alpha}}^{*}$ is obtained. The optimal ${b}^{*}$ can then be computed using the following equation:
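For reference, the standard soft-margin SVM dual written in the notation just defined reads as follows. This is the textbook form, given here as an assumption about the content of the optimization problem above, whose equations are not reproduced in this text:

$$\max_{\mathbf{\alpha}}\; W\left(\mathbf{\alpha}\right)={\mathbf{e}}^{T}\mathbf{\alpha}-\frac{1}{2}{\mathbf{\alpha}}^{T}\mathbf{\Lambda}\mathbf{K}\mathbf{\Lambda}\mathbf{\alpha}\quad \text{s.t.}\quad 0\le {\alpha}_{i}\le c,\;\;\sum_{i=1}^{l}{y}_{i}{\alpha}_{i}=0$$

Written out, ${\mathbf{\alpha}}^{T}\mathbf{\Lambda}\mathbf{K}\mathbf{\Lambda}\mathbf{\alpha}=\sum_{i=1}^{l}\sum_{j=1}^{l}{\alpha}_{i}{\alpha}_{j}{y}_{i}{y}_{j}{k}_{ij}$, so the training samples enter the problem only through $\mathbf{K}$.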
where ${\mathbf{K}}^{i}$ is the $i$th column vector of $\mathbf{K}$.
Hence, the pattern function of SVM to predict the class of unseen sample ${\mathbf{x}}_{v}$ can be written as:
and:
${\mathbf{K}}_{t}$, called the prediction kernel matrix, is an $l\times p$ matrix with ${k}_{iv}={\kappa}_{t}\left({\mathbf{x}}_{i},{\mathbf{x}}_{v}\right)$, the inner product between the images of a training sample and a test sample in space $\mathbf{F}$; $p$ is the number of test samples, and ${\mathbf{K}}_{t}^{v}$ is the $v$th column vector of ${\mathbf{K}}_{t}$.
According to Eq. (15), the result of pattern analysis depends only on the kernel matrices, so SVM can solve nonlinear classification problems once appropriate kernel matrices have been constructed.
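The pattern function can be sketched in a few lines, assuming an optimal ${\mathbf{\alpha}}^{*}$ is already available from a QP solver (not shown). The bias uses the standard rule ${b}^{*}={y}_{i}-{\left(\mathbf{\Lambda}{\mathbf{\alpha}}^{*}\right)}^{T}{\mathbf{K}}^{i}$ for a margin support vector with $0<{\alpha}_{i}^{*}<c$, averaged over all such vectors; the averaging, the tolerance and the function names are assumptions. The toy check uses two 1-D points with a linear kernel, whose dual solution ${\alpha}_{1}^{*}={\alpha}_{2}^{*}=0.5$, ${b}^{*}=0$ is known in closed form.

```python
import numpy as np


def svm_bias(K, y, alpha, c, tol=1e-8):
    """b* = y_i - (Lambda alpha)^T K^i, averaged over margin support vectors."""
    y = np.asarray(y, dtype=float)
    ya = y * alpha                                  # Lambda @ alpha with Lambda = diag(y)
    free = np.flatnonzero((alpha > tol) & (alpha < c - tol))
    if free.size == 0:
        raise ValueError("no margin support vector with 0 < alpha_i < c")
    return float(np.mean(y[free] - ya @ K[:, free]))


def svm_predict(Kt, y, alpha, b):
    """Labels sign((Lambda alpha)^T Kt^v + b) for every column v of the l x p matrix Kt."""
    y = np.asarray(y, dtype=float)
    return np.sign((y * alpha) @ Kt + b)


# Toy problem: x_1 = -1 (class -1), x_2 = +1 (class +1), linear kernel.
x_train = np.array([-1.0, 1.0])
y_train = np.array([-1.0, 1.0])
alpha_star = np.array([0.5, 0.5])       # closed-form dual solution
K = np.outer(x_train, x_train)          # training kernel matrix, l x l
x_test = np.array([-2.0, 0.5])
Kt = np.outer(x_train, x_test)          # prediction kernel matrix, l x p

b_star = svm_bias(K, y_train, alpha_star, c=10.0)
labels = svm_predict(Kt, y_train, alpha_star, b_star)
```

Neither function ever sees the raw feature vectors, only $\mathbf{K}$ and ${\mathbf{K}}_{t}$, which is why a custom construction of these two matrices is enough to change the classifier.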
3.2. Kernel matrix construction
A novel kernel matrix construction (KMC) method is presented to solve nonlinear classification problems with kernel pattern analysis based on SVM.
To the best of our knowledge, this KMC-based method has not been studied in the field of machinery fault diagnosis. The specific procedure of KMC is stated below and illustrated in Fig. 2.
Fig. 2. Flow chart of the construction of the training kernel matrix K
Step 1: Provide the training set ${\mathbf{S}}_{1}=\left\{\left({\mathbf{x}}_{1},{y}_{1}\right),\dots ,\left({\mathbf{x}}_{l},{y}_{l}\right)\right\}$ and the test set ${\mathbf{S}}_{2}=\left\{\left({\mathbf{x}}_{l+1},{y}_{l+1}\right),\dots ,\left({\mathbf{x}}_{l+p},{y}_{l+p}\right)\right\}$. Suppose there are $r$ classes of samples in ${\mathbf{S}}_{1}$. ${\mathbf{S}}_{1}$ is used to construct the $l\times l$ training kernel matrix $\mathbf{K}$, and ${\mathbf{S}}_{1}$ and ${\mathbf{S}}_{2}$ together are used to construct the $l\times p$ prediction matrix ${\mathbf{K}}_{t}$. Let ${\mathbf{T}}_{1}$ be an $l\times l$ matrix and ${\mathbf{T}}_{2}$ an $l\times p$ matrix. Initialize ${\mathbf{T}}_{1}$ (${\mathbf{T}}_{2}$) and $\mathbf{K}$ (${\mathbf{K}}_{t}$) to 0.
Step 2: Produce the distance matrix ${\mathbf{D}}_{1}$ (${\mathbf{D}}_{2}$) by computing the pairwise distances of samples using Eq. (17). ${\mathbf{D}}_{1}$ contains the pairwise distances among training samples and ${\mathbf{D}}_{2}$ the pairwise distances between training and test samples, as shown in Eq. (18):
${d}_{ij}$ denotes the distance between the $i$th training sample and the $j$th training sample in ${\mathbf{D}}_{1}$, and between the $i$th training sample and the $j$th test sample in ${\mathbf{D}}_{2}$.
Step 3: Find the distribution of the $k$ closest neighbors of each sample. The $k$ closest neighbors of a sample correspond to the $k$ smallest entries in its column of ${\mathbf{D}}_{1}$ (${\mathbf{D}}_{2}$). Set to 1 the elements of ${\mathbf{T}}_{1}$ (${\mathbf{T}}_{2}$) located at the positions of these $k$ smallest entries. The rows of ${\mathbf{T}}_{1}$ (${\mathbf{T}}_{2}$) are divided into $r$ blocks; the blocks correspond to classes and the columns to samples. Eq. (19) shows the distribution of the $k$ closest neighbors over the classes, marked by 1:
Step 4: Classify each sample by majority vote among its $k$ neighbors. If most of a sample's $k$ neighbors fall within one block, the sample is assigned to the class of that block: its column is set to 1 within that block and to 0 elsewhere. For example, if ${\mathbf{x}}_{i}$ (${\mathbf{x}}_{j}$) belongs to the 1st class, ${\mathbf{T}}_{1}$ (${\mathbf{T}}_{2}$) is revised as in Eq. (20):
Step 5: Compress the multiple classes of ${\mathbf{T}}_{1}$ (${\mathbf{T}}_{2}$) into two classes. The 1st class remains unchanged and the other classes are merged into a 2nd class, so a sample that is 0 (1) in the 1st class is 1 (0) in the 2nd class. The updated ${\mathbf{T}}_{1}$ and ${\mathbf{T}}_{2}$ are shown in Eq. (21):
Step 6: Select the 1st row of ${\mathbf{T}}_{1}$ (${\mathbf{T}}_{2}$) as the row matrix ${\mathbf{R}}_{1}$ (${\mathbf{R}}_{2}$). ${\mathbf{R}}_{1}$ indicates the class assigned to each training sample, and ${\mathbf{R}}_{2}$ the class assigned to each test sample:
The training kernel matrix $\mathbf{K}$ is constructed from ${\mathbf{R}}_{1}$ as an $l\times l$ symmetric matrix with unit diagonal elements, as in Eq. (23); $\mathbf{K}$ reflects the similarity among training samples. The $l\times p$ prediction matrix ${\mathbf{K}}_{t}$ is established likewise from ${\mathbf{R}}_{1}$ and ${\mathbf{R}}_{2}$ and reflects the similarity between training and test samples. In $\mathbf{K}$ (${\mathbf{K}}_{t}$), “1” means maximum similarity between the corresponding samples and “0” means no similarity:
Step 7: Increase $k$ by one ($k=k+1$) and repeat Steps 3 to 6 until $k$ exceeds a preset upper bound. The upper bound should be set to a moderate value to save computing time.
Step 8: Average the matrices $\mathbf{K}$ (${\mathbf{K}}_{t}$). As $k$ varies from the lower bound to the upper bound, a number of matrices $\mathbf{K}$ (${\mathbf{K}}_{t}$) are produced; averaging them better captures the intrinsic relations among samples. The averaged $\mathbf{K}$ (${\mathbf{K}}_{t}$) is then used in the pattern function for classification.
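Steps 1 to 8 can be condensed into the NumPy sketch below. It is one interpretation of the procedure, not the authors' code, and several details are assumptions: Euclidean distance for Eq. (17); each training sample counts itself among its own neighbors in ${\mathbf{D}}_{1}$ (the text does not say otherwise); vote ties go to the class that sorts first; and Eq. (23) is taken to be ${k}_{ij}=1$ when samples $i$ and $j$ receive the same label in ${\mathbf{R}}_{1}$ (${\mathbf{R}}_{2}$) and 0 otherwise. The function name `kmc_kernels` and its parameters are hypothetical.

```python
import numpy as np


def kmc_kernels(X_train, y_train, X_test, first_class, k_low=1, k_up=5):
    """Return the averaged training kernel K (l x l) and prediction kernel Kt (l x p)."""
    Xtr = np.asarray(X_train, dtype=float)
    Xte = np.asarray(X_test, dtype=float)
    labels = np.asarray(y_train)
    classes = np.unique(labels)                       # the r classes (sorted)
    l, p = len(Xtr), len(Xte)
    if not 1 <= k_low <= k_up <= l:
        raise ValueError("need 1 <= k_low <= k_up <= number of training samples")

    def distances(A, B):                              # Step 2: d_ij, Euclidean (assumed)
        return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))

    def class1_row(D, k):
        """Steps 3-6 for one k: row R (1 = voted into first_class, 0 = merged class)."""
        nearest = np.argsort(D, axis=0, kind="stable")[:k]              # k x q indices
        votes = labels[nearest]                                         # neighbour classes
        counts = np.stack([(votes == c).sum(axis=0) for c in classes])  # r x q
        winner = classes[counts.argmax(axis=0)]                         # majority vote
        return (winner == first_class).astype(float)

    D1, D2 = distances(Xtr, Xtr), distances(Xtr, Xte)
    K, Kt = np.zeros((l, l)), np.zeros((l, p))        # Step 1: initialise to 0
    ks = range(k_low, k_up + 1)
    for k in ks:                                      # Step 7: sweep k
        R1, R2 = class1_row(D1, k), class1_row(D2, k)
        K += (R1[:, None] == R1[None, :]).astype(float)    # assumed Eq. (23)
        Kt += (R1[:, None] == R2[None, :]).astype(float)
    return K / len(ks), Kt / len(ks)                  # Step 8: average over k
```

Under the assumed Eq. (23), each per-$k$ entry equals ${R}_{i}{R}_{j}+(1-{R}_{i})(1-{R}_{j})$, so every per-$k$ $\mathbf{K}$ is a sum of two rank-one positive semidefinite matrices and the averaged $\mathbf{K}$ is positive semidefinite as well. The averaged matrices can be passed to any SVM solver that accepts precomputed kernels; scikit-learn's `SVC(kernel="precomputed")`, for example, is fitted on `K` and predicts from `Kt.T`, because it expects test-by-train kernel rows.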
4. Case studies
REB fault diagnosis is investigated to validate the effectiveness of KMCSVM. Fig. 3 shows the scheme of REB fault diagnosis.
Fig. 3. Flow chart of REB fault diagnosis
4.1. Experimental setup and vibration data
The experimental data on faulty bearings are taken from the Case Western Reserve University Bearing Data Center, which has been widely used as a standard dataset for REB diagnosis. As shown in Fig. 4, the test stand consists of a 2 hp motor (left), a torque transducer/encoder (center), a dynamometer (right) and control electronics. The test bearings support the motor shaft. Faults were seeded into the motor bearings by electro-discharge machining. Faults ranging from 0.007 inches to 0.021 inches in diameter were introduced separately at the inner raceway, the rolling element and the outer raceway. The faulted bearings were reinstalled in the test motor, and vibration data were recorded for motor loads of 0 to 3 horsepower (motor speeds of 1797 to 1720 RPM). Bearing information is given in Table 1 and Table 2. Vibration signals were collected with accelerometers attached by magnetic bases to the drive end of the motor housing, and were digitized with a 16-channel DAT recorder. Digital data were collected at 48,000 samples per second for drive end bearing faults and post-processed in a MATLAB environment. Speed and horsepower data were collected with the torque transducer/encoder and recorded.
Table 1. Bearing information: 6205-2RS JEM SKF (size: inches)
Inside diameter  Outside diameter  Thickness  Ball diameter  Pitch diameter 
0.9843  2.0472  0.5906  0.3126  1.537 
In this experiment, the vibration data of the drive end bearing are used to identify the location and severity of bearing faults. The sampling frequency is 48 kHz and each sample contains 2048 data points. Four bearing conditions, i.e. healthy state, outer race fault, inner race fault and ball fault, are used for fault location recognition with KMCSVM. In addition, four fault severities (healthy, 0.007 inch, 0.014 inch and 0.021 inch) are considered to assess the classification performance of KMCSVM.
Table 2. Fault specifications (size: inches)
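The sample sizes above translate into simple arithmetic: at 48 kHz a 2048-point sample spans about 42.7 ms, i.e. roughly 1.28 shaft revolutions at 1797 RPM, and 48 samples per condition require 98,304 points. The sketch below assumes the samples are consecutive, non-overlapping windows cut from one record; the paper does not state how the windows were taken, and `segment` is a hypothetical helper.

```python
import numpy as np

FS = 48_000          # sampling frequency in Hz (from the text)
N_POINTS = 2048      # data points per sample (from the text)
N_SAMPLES = 48       # samples per bearing condition (Table 4)


def segment(record, n_points=N_POINTS, n_samples=N_SAMPLES):
    """Cut one vibration record into n_samples consecutive, non-overlapping windows."""
    x = np.asarray(record, dtype=float)
    needed = n_points * n_samples
    if x.size < needed:
        raise ValueError(f"record has {x.size} points, {needed} are needed")
    return x[:needed].reshape(n_samples, n_points)


duration_s = N_POINTS / FS                 # time covered by one sample
revs_per_sample = 1797 / 60 * duration_s   # shaft revolutions at 1797 RPM
```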
Bearing  Fault location  Diameter  Depth  Bearing manufacturer 
Drive end  Inner raceway  0.007  0.011  SKF 
Drive end  Inner raceway  0.014  0.011  SKF 
Drive end  Inner raceway  0.021  0.011  SKF 
Drive end  Outer raceway  0.007  0.011  SKF 
Drive end  Outer raceway  0.014  0.011  SKF 
Drive end  Outer raceway  0.021  0.011  SKF 
Drive end  Ball  0.007  0.011  SKF 
Drive end  Ball  0.014  0.011  SKF 
Drive end  Ball  0.021  0.011  SKF 
4.2. Feature extraction
Following the wavelet selection criterion in the subsection “Wavelet selection” of [37], the energy-to-entropy ratios of six wavelets, namely the Shannon, Gaussian, Complex Morlet, Daubechies, Meyer and Morlet wavelets, are plotted in Fig. 5. Because it yields the maximum energy-to-entropy ratio, the Shannon wavelet is selected as the mother wavelet for the continuous wavelet transform. The feature vectors are then calculated from the coefficient matrices using SVD.
Fig. 4. Rolling element bearing test rig
Fig. 5. Energy-to-entropy ratios of datasets using different wavelets
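The selection criterion can be sketched as follows. Because the definition in [37] is not reproduced here, the code assumes the common per-scale form $\zeta(m)=E(m)/H(m)$, where $E(m)=\sum_n |{c}_{m,n}|^2$ is the energy of scale $m$ and $H(m)=-\sum_n {p}_{m,n}\ln {p}_{m,n}$, with ${p}_{m,n}=|{c}_{m,n}|^2/E(m)$, is its Shannon entropy. The natural logarithm and the function name are assumptions. A larger ratio means more energy concentrated in fewer coefficients.

```python
import numpy as np


def energy_to_entropy_ratio(C):
    """Per-scale ratio E(m) / H(m) for a coefficient matrix C (scales x points)."""
    P = np.abs(np.atleast_2d(C)) ** 2                 # energy of each coefficient
    E = P.sum(axis=1)                                 # energy per scale
    p = P / E[:, None]                                # energy distribution per scale
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)   # 0 * log 0 taken as 0
    H = -plogp.sum(axis=1)                            # Shannon entropy per scale
    return E / H


# Two rows with equal energy (4): the more concentrated one has the larger ratio.
C_demo = np.array([[np.sqrt(2.0), np.sqrt(2.0), 0.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])
ratios = energy_to_entropy_ratio(C_demo)
```

To compare candidate wavelets, the ratio would be computed from the coefficient matrix of each candidate and aggregated over scales; how [37] aggregates is not stated in this text. A scale whose energy sits in a single coefficient has zero entropy and an infinite ratio.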
4.3. Classification of bearing conditions
The performance of KMCSVM is evaluated by identifying bearing fault location and fault severity, and is compared with that of other kernel pattern recognition methods, namely CTKSVM, LSVM and RBFSVM, which were studied in previous work [37]. CTKSVM is an SVM based on a classification tree kernel constructed with a fuzzy pruning strategy and a tree ensemble learning algorithm to improve the diagnosis of REB faults. LSVM uses the classical linear kernel and RBFSVM the radial basis function kernel to diagnose REB faults. Both fivefold cross validation and an independent test are conducted to obtain the classification accuracy of these SVM classifiers. To identify the true fault among several possible faults, the SVM classifiers are trained in a one-against-others tournament by labeling one class as +1 and the others as -1, and unknown samples are classified in the same manner.
4.3.1. Identification of fault location
Fault location recognition aims to distinguish four bearing conditions, i.e. healthy state, outer race fault, inner race fault and ball fault. Table 3 lists 12 datasets with different loads, fault sizes and shaft speeds. Each dataset contains 48 samples per state, i.e. 192 samples in total, as shown in Table 4.
The sample sets are grouped to fit the one-against-others tournament for training and testing under both fivefold cross validation and the independent test, as described in Table 5.
Table 3. Description of 12 datasets on fault locations
Dataset  1  2  3  4  5  6  7  8  9  10  11  12 
Fault (inch)  0.007  0.007  0.007  0.007  0.014  0.014  0.014  0.014  0.021  0.021  0.021  0.021 
Load (HP)  0  1  2  3  0  1  2  3  0  1  2  3 
Speed (RPM)  1796  1772  1750  1725  1796  1772  1750  1725  1796  1772  1750  1725 
Table 4. Composition of each dataset on fault locations
Fault type  Sample size 
H  48 
O  48 
I  48 
B  48 
H – healthy, O – outer race defect, I – inner race defect, B – ball defect 
Table 5. Sample sets with different fault locations for training and test
Sample label  Sample set  5-fold cross validation  Independent test: training  Independent test: test
1  H vs. (O+I+B)  48: (48 + 48 + 48)  24: (24 + 24 + 24)  24: (24 + 24 + 24) 
2  O vs. (I+B)  48: (48 + 48)  24: (24 + 24)  24: (24 + 24) 
3  B vs. (O+I)  48: (48 + 48)  24: (24 + 24)  24: (24 + 24) 
4  I vs. (O+B)  48: (48 + 48)  24: (24 + 24)  24: (24 + 24) 
5  I vs. B  48:48  24: 24  24: 24 
6  O vs. I  48:48  24: 24  24: 24 
7  O vs. B  48:48  24: 24  24: 24 
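The one-against-others tournament of Table 5 amounts to building seven binary problems from the labeled samples. A minimal sketch of that relabeling (+1 for the target class, -1 for the others) follows; the names and the placeholder features are hypothetical.

```python
import numpy as np

# The seven sample sets of Table 5 as (class labeled +1, classes labeled -1).
TASKS = [("H", ("O", "I", "B")), ("O", ("I", "B")), ("B", ("O", "I")),
         ("I", ("O", "B")), ("I", ("B",)), ("O", ("I",)), ("O", ("B",))]


def binary_task(X, labels, positive, negatives):
    """Keep samples of the listed classes; relabel positive -> +1, negatives -> -1."""
    labels = np.asarray(labels)
    keep = np.isin(labels, [positive, *negatives])
    y = np.where(labels[keep] == positive, 1, -1)
    return np.asarray(X)[keep], y


# Placeholder dataset: 48 samples per condition, as in Table 4.
labels = np.repeat(["H", "O", "I", "B"], 48)
X = np.zeros((labels.size, 8))                  # stand-in feature vectors
sizes = [binary_task(X, labels, pos, neg)[1].size for pos, neg in TASKS]
```

The resulting problem sizes (192, 144 or 96 samples) match the 5-fold cross validation column of Table 5.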
Fig. 6 illustrates the accuracy of the four classifiers on the 12 datasets in Table 3 under fivefold cross validation. The classification accuracy of RBFSVM is clearly the lowest of all methods. In eight cases (Fig. 6(b)-(h) and Fig. 6(k)), KMCSVM achieves higher classification accuracy. In three cases (Fig. 6(a), Fig. 6(i) and Fig. 6(l)), the classification rates of KMCSVM, CTKSVM and LSVM are nearly the same. Only in one case (Fig. 6(j)) is the classification accuracy of KMCSVM slightly lower than those of CTKSVM and LSVM. Overall, classification ability increases in the order RBFSVM, LSVM, CTKSVM, KMCSVM. In addition, the classification accuracy of KMCSVM fluctuates the least, which indicates that KMCSVM is insensitive to changes in the sample sets.
The classification accuracy of KMCSVM is also examined as the fault size changes under each load (0 HP, 1 HP, 2 HP and 3 HP). Fig. 7 shows that the accuracy of KMCSVM is highest for the 0.007 inch fault, followed by 0.021 inch and then 0.014 inch, except under 1 HP load (Fig. 7(b)), where the accuracies for 0.014 inch and 0.021 inch alternate. In the early stage of bearing fault (0.007 inch), the accuracy reaches 100 %. The accuracy then falls as the fault grows (0.014 inch), and rises again when the fault size increases further (0.021 inch).
Fig. 6. Accuracy of the four classifiers on the 12 datasets
Fig. 8 shows the classification accuracy of KMCSVM as the load varies for a fixed fault size. In Fig. 8(a), the accuracy for the 0.007 inch fault remains 100 % under every load, indicating that KMCSVM is robust to load interference and classifies this fault size very well. Fig. 8(b) and Fig. 8(c) show that load changes cause irregular fluctuations in accuracy.
Table 6 also shows that the average accuracy of KMCSVM is the highest under both fivefold cross validation and the independent test (at least 95.60 % in all cases). The corresponding training and test times are summarized in Table 7. For fivefold cross validation, training KMCSVM costs more than training the other three methods, because constructing the training kernel matrix requires more computation. Once KMCSVM is trained, diagnosis takes no more than 8.30 s. For the independent test, KMCSVM takes less time to train (at most 9.97 s) and test (at most 3.02 s), and its training time is of the same order as that of the other methods. KMCSVM therefore shows outstanding fault diagnosis performance.
Fig. 7. Accuracy of KMCSVM as the fault size varies
Fig. 8. Accuracy of KMCSVM as the load varies
4.3.2. Identification of fault severity
Fault severity recognition seeks to evaluate the REB fault size, which affects machinery health and lifetime. As listed in Table 8, four fault severity conditions are considered to assess the classification performance of KMCSVM on the datasets in Table 9.
The sample sets are grouped in the same tournament manner to identify different fault sizes, as described in Table 10.
Table 6. Average accuracy (%) of 4 classifiers using 12 datasets on fault locations (CV: 5-fold cross validation; IV: independent validation)
Sample set  KMCSVM (CV)  CTKSVM (CV)  LSVM (CV)  RBFSVM (CV)  KMCSVM (IV)  CTKSVM (IV)  LSVM (IV)  RBFSVM (IV)
H vs. (O+I+B)  99.87  99.57  97.79  49.01  99.65  99.39  93.75  44.44 
O vs. (I+B)  97.57  95.95  92.59  66.38  97.68  95.60  92.59  68.06 
B vs. (O+I)  96.99  93.87  89.12  48.15  96.41  93.52  87.50  62.38 
I vs. (O+B)  95.72  94.04  90.58  72.34  95.60  94.10  84.49  74.19 
I vs. B  96.09  93.84  92.45  66.84  95.66  92.36  92.19  58.16 
O vs. I  97.22  96.34  96.09  64.18  97.40  96.35  95.83  56.42 
O vs. B  98.61  96.53  92.53  63.37  98.26  96.18  93.39  69.44 
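As an arithmetic cross-check of the overall ordering stated for fault location recognition, the column means of the fivefold cross-validation part of Table 6 can be computed. These are plain averages over the seven sample sets, not a result reported in the text.

```python
import numpy as np

# Fivefold cross-validation accuracies (%) from Table 6, one row per sample set.
cv = np.array([
    # KMCSVM  CTKSVM  LSVM   RBFSVM
    [99.87, 99.57, 97.79, 49.01],   # H vs. (O+I+B)
    [97.57, 95.95, 92.59, 66.38],   # O vs. (I+B)
    [96.99, 93.87, 89.12, 48.15],   # B vs. (O+I)
    [95.72, 94.04, 90.58, 72.34],   # I vs. (O+B)
    [96.09, 93.84, 92.45, 66.84],   # I vs. B
    [97.22, 96.34, 96.09, 64.18],   # O vs. I
    [98.61, 96.53, 92.53, 63.37],   # O vs. B
])
names = ["KMCSVM", "CTKSVM", "LSVM", "RBFSVM"]
means = cv.mean(axis=0)                            # about 97.44, 95.73, 93.02, 61.47
ranking = [names[i] for i in np.argsort(means)]    # lowest to highest mean accuracy
```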
Table 7. Average training and test time (s) of 4 classifiers using 12 datasets on fault locations (CV: 5-fold cross validation; IV: independent validation)
Sample set  Phase  KMCSVM (CV)  CTKSVM (CV)  LSVM (CV)  RBFSVM (CV)  KMCSVM (IV)  CTKSVM (IV)  LSVM (IV)  RBFSVM (IV)
H vs. (O+I+B)  Train  262.73  12.86  10.75  15.82  9.97  1.90  1.57  4.97 
Test  8.30  0.01  0.01  0  3.02  0  0  0  
O vs. (I+B)  Train  108.29  11.94  10.76  12.87  3.96  1.98  1.60  3.11 
Test  5.90  0  0  0  1.25  0  0  0  
B vs. (O+I)  Train  109.28  14.42  11.95  14.27  3.93  2.10  1.76  3.16 
Test  5.54  0  0  0  1.25  0  0  0  
I vs. (O+B)  Train  107.91  14.31  11.93  13.80  4.14  2.12  1.93  3.35 
Test  5.23  0  0  0  1.28  0  0  0  
I vs. B  Train  33.66  9.76  8.45  9.29  1.22  1.55  1.35  1.76 
Test  1.51  0  0  0  0.42  0  0  0  
O vs. I  Train  29.03  8.51  7.40  8.30  1.42  1.44  1.28  1.71 
Test  1.39  0  0  0  0.40  0  0  0  
O vs. B  Train  28.54  9.53  8.09  8.97  1.31  1.55  1.41  1.83 
Test  1.38  0  0  0  0.40  0  0  0 
Table 8. Composition of each dataset on fault severity
Fault severity  Sample size  Defect size (inch)
H  48  0 
S1  48  0.007 
S2  48  0.014 
S3  48  0.021 
H – healthy, S1 – fault with 0.007 inch, S2 – fault with 0.014 inch, S3 – fault with 0.021 inch 
Table 9. Description of 12 datasets on fault severity
Dataset  1  2  3  4  5  6  7  8  9  10  11  12 
Location  O  O  O  O  I  I  I  I  B  B  B  B 
Load (HP)  0  1  2  3  0  1  2  3  0  1  2  3 
Speed (RPM)  1796  1772  1750  1725  1796  1772  1750  1725  1796  1772  1750  1725 
Fig. 9, Fig. 10 and Fig. 11 show the accuracy of KMCSVM on the 12 datasets in Table 9 under fivefold cross validation, compared with CTKSVM, LSVM and RBFSVM. RBFSVM clearly yields the lowest accuracy. In seven cases (Fig. 9(a)-(d) and Fig. 10(a)-(c)), KMCSVM reaches the highest accuracy of 100 %. In four cases (Fig. 10(d), Fig. 11(a) and Fig. 11(c)-(d)), the accuracy of KMCSVM is second only to that of LSVM. In Fig. 11(b), the accuracy of KMCSVM is slightly lower than those of CTKSVM and LSVM. Consequently, KMCSVM is highly suitable for recognizing fault severity in the bearing outer race and inner race. Moreover, the accuracy curves of KMCSVM fluctuate little, which shows the good stability of KMCSVM against changes in sample sets and load interference.
Fig. 9. Accuracy of the four classifiers for fault severity in the bearing outer race
Fig. 10. Accuracy of the four classifiers for fault severity in the bearing inner race
Table 11 gives the average accuracy of the 4 classifiers for REB fault severity recognition. Under fivefold cross validation, KMCSVM is slightly less accurate than LSVM on five of the seven sample sets, because KMCSVM does not perform as well as LSVM in severity recognition for ball faults. However, KMCSVM achieves the highest accuracy of the 4 classifiers on every sample set in the independent test. The corresponding training and test times are shown in Table 12. The computational cost of training and testing KMCSVM is similar to that for fault location diagnosis described above.
Fig. 11. Accuracy of the four classifiers for fault severity in the bearing ball
Table 10. Sample sets with different severities for training and test
Sample label  Sample set  5-fold cross validation  Independent test: training  Independent test: test
1  H vs. (S1+S2+S3)  48: (48 + 48 + 48)  24: (24 + 24 + 24)  24: (24 + 24 + 24) 
2  S1 vs. (S2+S3)  48: (48 + 48)  24: (24 + 24)  24: (24 + 24) 
3  S3 vs. (S1+S2)  48: (48 + 48)  24: (24 + 24)  24: (24 + 24) 
4  S2 vs. (S1+S3)  48: (48 + 48)  24: (24 + 24)  24: (24 + 24) 
5  S2 vs. S3  48:48  24: 24  24: 24 
6  S1 vs. S2  48:48  24: 24  24: 24 
7  S1 vs. S3  48:48  24: 24  24: 24 
Table 11. Average accuracy (%) of 4 classifiers using 12 datasets on fault severity (CV: 5-fold cross validation; IV: independent validation)
Sample set  KMCSVM (CV)  CTKSVM (CV)  LSVM (CV)  RBFSVM (CV)  KMCSVM (IV)  CTKSVM (IV)  LSVM (IV)  RBFSVM (IV)
H vs. (S1+S2+S3)  100  99.48  92.54  61.24  99.74  99.13  80.73  83.42 
S1 vs. (S2+S3)  98.84  97.28  95.14  58.62  98.73  92.94  83.91  80.44 
S3 vs. (S1+S2)  98.15  97.79  99.42  74.17  98.03  96.76  92.13  85.07 
S2 vs. (S1+S3)  98.21  96.59  98.67  67.85  97.80  93.12  86.81  82.87 
S2 vs. S3  97.83  98.10  98.00  60.68  97.57  96.89  92.19  90.11 
S1 vs. S2  99.16  96.70  99.48  57.12  98.79  93.75  88.02  84.03 
S1 vs. S3  98.96  98.09  99.48  65.02  99.13  98.96  94.27  80.21 
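A similar check on the fivefold cross-validation part of Table 11 shows how the per-set comparison with LSVM and the overall average differ: LSVM is more accurate on five of the seven sample sets, while KMCSVM has the higher mean over all seven. The figures are copied from the table; the averaging is plain arithmetic, not a result reported in the text.

```python
import numpy as np

# Fivefold cross-validation accuracies (%) from Table 11: columns KMCSVM, LSVM.
cv = np.array([
    [100.00, 92.54],   # H vs. (S1+S2+S3)
    [98.84, 95.14],    # S1 vs. (S2+S3)
    [98.15, 99.42],    # S3 vs. (S1+S2)
    [98.21, 98.67],    # S2 vs. (S1+S3)
    [97.83, 98.00],    # S2 vs. S3
    [99.16, 99.48],    # S1 vs. S2
    [98.96, 99.48],    # S1 vs. S3
])
lsvm_wins = int((cv[:, 1] > cv[:, 0]).sum())   # sample sets where LSVM is higher
means = cv.mean(axis=0)                        # about 98.74 (KMCSVM), 97.53 (LSVM)
```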
The above experiments show that, overall, KMCSVM achieves higher accuracy in diagnosing fault locations and severities than the other three methods. This success is attributed to the strategy used to construct the kernel matrices $\mathbf{K}$ and ${\mathbf{K}}_{t}$, which effectively suppresses irrelevant features and mines the degree of similarity between samples. As a result, $\mathbf{K}$ and ${\mathbf{K}}_{t}$ express intraclass compactness and interclass separation more objectively than the kernel of CTKSVM. RBFSVM and LSVM employ fixed kernels that are independent of the analyzed samples and therefore fall behind KMCSVM and CTKSVM. Hence, KMCSVM is a competitive method for REB fault diagnosis.
Table 12. Average training and test time (s) of 4 classifiers using 12 datasets on fault severity (CV: 5-fold cross validation; IV: independent validation)
Sample set  Phase  KMCSVM (CV)  CTKSVM (CV)  LSVM (CV)  RBFSVM (CV)  KMCSVM (IV)  CTKSVM (IV)  LSVM (IV)  RBFSVM (IV)
H vs. (S1+S2+S3)  Train  164.32  11.75  10.68  15.84  15.58  1.87  1.53  3.76 
Test  5.86  0.01  0  0.01  5.08  0  0  0  
S1 vs. (S2+S3)  Train  66.69  11.82  10.56  12.46  6.62  1.91  1.55  2.53 
Test  5.42  0  0.01  0  2.07  0  0  0  
S3 vs. (S1+S2)  Train  63.68  11.51  9.60  11.60  6.44  1.74  1.43  2.38 
Test  5.54  0  0  0  2.15  0  0  0  
S2 vs. (S1+S3)  Train  61.69  12.16  10.27  12.42  6.49  1.89  1.56  2.47 
Test  5.22  0  0  0  2.16  0  0  0  
S2 vs. S3  Train  17.85  8.74  7.38  8.52  2.14  1.52  1.26  1.53 
Test  0.98  0  0  0  0.65  0  0  0  
S1 vs. S2  Train  17.30  9.24  7.89  8.97  2.10  1.53  1.30  1.60 
Test  1.03  0  0  0  0.64  0  0  0  
S1 vs. S3  Train  17.02  7.78  6.72  7.73  2.30  1.35  1.16  1.45 
Test  0.95  0  0  0  0.67  0  0  0 
5. Conclusions
In this study, KMCSVM, an SVM based on kernel matrix construction, is proposed for the nonlinear classification of REB defects. The identification results for fault locations and severities verify that KMCSVM achieves higher accuracy in bearing fault diagnosis than the other SVM classifiers considered. KMCSVM also remains robust to load variations and detects defects at an early stage, which is significant for REB condition monitoring. In addition, KMCSVM may help to predict the degree of deterioration and the remaining lifetime of bearings. In summary, KMCSVM shows clear advantages and potential for rotating machinery fault diagnosis.
References

[1] Lou X. S., Loparo K. A. Bearing fault diagnosis based on wavelet transform and fuzzy inference. Mechanical Systems and Signal Processing, Vol. 18, Issue 5, 2004, p. 1077-1095.

[2] Safizadeh M. S., Latifi S. K. Using multi-sensor data fusion for vibration fault diagnosis of rolling element bearings by accelerometer and load cell. Information Fusion, Vol. 18, 2014, p. 1-8.

[3] Fan Z. Q., Li H. Z. A hybrid approach for fault diagnosis of planetary bearings using an internal vibration sensor. Measurement, Vol. 64, 2015, p. 71-80.

[4] Saucedo-Espinosa M. A., Escalante H. J., Berrones A. Detection of defective embedded bearings by sound analysis: a machine learning approach. Journal of Intelligent Manufacturing, Vol. 28, Issue 2, 2017, p. 489-500.

[5] El-Thalji I., Jantunen E. A summary of fault modelling and predictive health monitoring of rolling element bearings. Mechanical Systems and Signal Processing, Vols. 60-61, 2015, p. 252-272.

[6] Wu C. X., Chen T. F., Jiang R., Ning L. W., Jiang Z. ANN based multi-classification using various signal processing techniques for bearing fault diagnosis. International Journal of Control and Automation, Vol. 8, Issue 7, 2015, p. 113-124.

[7] Samanta B., Al-Balushi K. R. Artificial neural network based fault diagnostics of rolling element bearings using time-domain features. Mechanical Systems and Signal Processing, Vol. 17, Issue 2, 2003, p. 317-328.

[8] Cheng J. S., Yu D. J., Yang Y. A fault diagnosis approach for roller bearings based on EMD method and AR model. Mechanical Systems and Signal Processing, Vol. 20, Issue 2, 2006, p. 350-362.

[9] Wang C. C., Kang Y., Shen P. C., Chang Y. P., Chung Y. L. Applications of fault diagnosis in rotating machinery by using time series analysis with neural network. Expert Systems with Applications, Vol. 37, Issue 2, 2010, p. 1696-1702.

[10] Li H. K., Lian X. T., Guo C., Zhao P. S. Investigation on early fault classification for rolling element bearing based on the optimal frequency band determination. Journal of Intelligent Manufacturing, Vol. 26, Issue 1, 2015, p. 189-198.

[11] Rai V. K., Mohanty A. R. Bearing fault diagnosis using FFT of intrinsic mode functions in Hilbert-Huang transform. Mechanical Systems and Signal Processing, Vol. 21, Issue 6, 2007, p. 2607-2615.

[12] Tsao W. C., Li Y. F., Le D. D., Pan M. C. An insight concept to select appropriate IMFs for envelope analysis of bearing fault diagnosis. Measurement, Vol. 45, Issue 6, 2012, p. 1489-1498.

[13] Dong G. M., Chen J., Zhao F. G. A frequency-shifted bispectrum for rolling element bearing diagnosis. Journal of Sound and Vibration, Vol. 339, 2015, p. 396-418.

[14] Xue X. M., Zhou J. Z., Xu Y. H., Zhu W. L., Li C. S. An adaptively fast ensemble empirical mode decomposition method and its applications to rolling element bearing fault diagnosis. Mechanical Systems and Signal Processing, Vols. 62-63, 2015, p. 444-459.

[15] Yan R. Q., Gao R. X., Chen X. F. Wavelets for fault diagnosis of rotary machines: a review with applications. Signal Processing, Vol. 96, 2014, p. 1-15.

[16] Li C., Liang M. A generalized synchrosqueezing transform for enhancing signal time-frequency representation. Signal Processing, Vol. 92, Issue 9, 2012, p. 2264-2274.

[17] Ali J. B., Fnaiech N., Saidi L., Chebel-Morello B., Fnaiech F. Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals. Applied Acoustics, Vol. 89, 2015, p. 16-27.

[18] Patel V. N., Tandon N., Pandey R. K. Defect detection in deep groove ball bearing in presence of external vibration using envelope analysis and Duffing oscillator. Measurement, Vol. 45, Issue 5, 2012, p. 960-970.

[19] Pandya D. H., Upadhyay S. H., Harsha S. P. Fault diagnosis of rolling element bearing with intrinsic mode function of acoustic emission data using APF-KNN. Expert Systems with Applications, Vol. 40, Issue 10, 2013, p. 4137-4145.

[20] Wang Z. J., Han Z. N., Gu F. S., Gu J. X., Ning S. H. A novel procedure for diagnosing multiple faults in rotating machinery. ISA Transactions, Vol. 55, 2015, p. 208-218.

[21] Kankar P. K., Sharma S. C., Harsha S. P. Fault diagnosis of ball bearings using continuous wavelet transform. Applied Soft Computing, Vol. 11, Issue 2, 2011, p. 2300-2312.

[22] Rafiee J., Tse P. W., Harifi A., Sadeghi M. H. A novel technique for selecting mother wavelet function using an intelligent fault diagnosis system. Expert Systems with Applications, Vol. 36, Issue 3, 2009, p. 4862-4875.

[23] Zhang C. L., Li B., Chen B. Q., Cao H. R., Zi Y. Y., He Z. J. Weak fault signature extraction of rotating machinery using flexible analytic wavelet transform. Mechanical Systems and Signal Processing, Vols. 64-65, 2015, p. 162-187.

[24] Saravanan N., Siddabattuni V. N. S. K., Ramachandran K. I. Fault diagnosis of spur bevel gear box using artificial neural network (ANN) and proximal support vector machine (PSVM). Applied Soft Computing, Vol. 10, Issue 1, 2010, p. 344-360.

[25] Konar P., Chattopadhyay P. Bearing fault detection of induction motor using wavelet and support vector machines (SVMs). Applied Soft Computing, Vol. 11, Issue 6, 2011, p. 4203-4211.

[26] Gan M., Wang C., Zhu C. A. Multiple-domain manifold for feature extraction in machinery fault diagnosis. Measurement, Vol. 75, 2015, p. 76-91.

[27] Tang B. P., Song T., Li F., Deng L. Fault diagnosis for a wind turbine transmission system based on manifold learning and Shannon wavelet support vector machine. Renewable Energy, Vol. 62, 2014, p. 1-9.

[28] Gharavian M. H., Ganj F. A., Ohadi A. R., Bafroui H. H. Comparison of FDA-based and PCA-based features in fault diagnosis of automobile gearboxes. Neurocomputing, Vol. 121, 2013, p. 150-159.

[29] Li F., Tang B., Yang R. S. Rotating machine fault diagnosis using dimension reduction with linear local tangent space alignment. Measurement, Vol. 46, Issue 8, 2013, p. 2525-2539.

[30] Zhao M. B., Jin X. H., Zhang Z., Li B. Fault diagnosis of rolling element bearings via discriminative subspace learning: visualization and classification. Expert Systems with Applications, Vol. 41, Issue 7, 2014, p. 3391-3401.

[31] Kan M. S., Tan A. C. C., Mathew J. A review on prognostic techniques for nonstationary and nonlinear rotating systems. Mechanical Systems and Signal Processing, Vols. 62-63, 2015, p. 1-20.

[32] Purarjomandlangrudi A., Ghapanchi A. H., Esmalifalak M. A data mining approach for fault diagnosis: An application of anomaly detection algorithm. Measurement, Vol. 55, 2014, p. 343-352.

[33] Tamilselvan P., Wang P. F. A tri-fold hybrid classification approach for diagnostics with unexampled faulty states. Mechanical Systems and Signal Processing, Vols. 50-51, 2015, p. 437-455.

[34] Ali J. B., Chebel-Morello B., Saidi L., Malinowski S., Fnaiech F. Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network. Mechanical Systems and Signal Processing, Vols. 56-57, 2015, p. 150-172.

[35] Yunusa-Kaltungo A., Sinha J. K., Elbhbah K. An improved data fusion technique for faults diagnosis in rotating machines. Measurement, Vol. 58, 2014, p. 27-32.

[36] Saidi L., Ali J. B., Fnaiech F. Application of higher order spectral features and support vector machines for bearing faults classification. ISA Transactions, Vol. 54, 2015, p. 193-206.

[37] Wu C. X., Chen T. F., Jiang R., Ning L. W., Jiang Z. A novel approach to wavelet selection and tree kernel construction for diagnosis of rolling element bearing fault. Journal of Intelligent Manufacturing, 2015, https://doi.org/10.1007/s10845-015-1070-4.
Acknowledgements
The work was supported by the National High Technology Research and Development Program of China (2009AA11Z217). The authors would like to thank all the reviewers for their valuable comments and constructive suggestions on this paper. The authors also thank Case Western Reserve University for making the bearing data freely available for download.