Abstract
To address the problem that electroerosion fault signals are rare and weak during motor operation, so that the data set is seriously imbalanced, this paper proposes an ASMOTECFR model that combines adaptive synthetic minority oversampling (ASMOTE) with collaborative filtering recommendation (CFR). Vibration acceleration signals of bearings in four different states were collected experimentally, and 32 energy features were extracted from each signal by wavelet packet decomposition. ASMOTE is then used to balance the energy features of the electroerosion fault signals, and a joint matrix is constructed from the energy features and the bearing state features. Finally, a collaborative filtering model based on matrix decomposition is trained and used for identification. The results show that the recognition rate of the proposed ASMOTECFR model is 98.46 %, about 7 % higher than that of traditional CFR, which verifies the effectiveness of the method.
1. Introduction
The formation mechanism of bearing electroerosion faults is complex. During motor operation, the coupling capacitances between the stator, rotor, windings and other components form a circuit driven by the common-mode voltage inside the motor, which gives rise to shaft voltage and shaft current. The shaft voltage drives a shaft current through the bearing; this current breaks through the lubricating oil film, and sparks are generated between the inner and outer rings and the working surfaces of the rolling elements, causing local melting and unevenness of the surface. This is the early electroerosion fault [1]. It is a special type of fault that affects bearing life, and failure to suppress it in time seriously shortens the service life of the bearing. In practical engineering, how to accurately predict the electroerosion fault state is therefore an urgent problem. As comprehensive monitoring data are acquired, the explosion of data makes traditional identification methods less and less accurate. With the development of machine learning, collaborative filtering and recommendation technology has shown unique advantages in solving the problem of “information overload”. Condition monitoring of motor bearings faces the same problem in the form of an overload of collected signals, for which collaborative filtering and recommendation technology is an effective solution. Reference [2] applies the theory of collaborative filtering to fault diagnosis of civil aircraft and calculates the similarity between interrupt layers using Pearson’s method and the vector cosine method.
Reference [3] uses a collaborative filtering algorithm based on real-time status data to recommend faults for online electric multiple units, and further provides fault accessibility schemes through a knowledge base. However, the above methods are memory-based collaborative filtering methods that rely on the calculation of fault similarity, and the sparsity of fault data often leads to inaccurate similarity calculations, so they cannot identify more general conditions and failures. In bearing condition monitoring practice, a large amount of normal-condition data is usually obtained, whereas failure-state data are much scarcer [4]. For supervised learning this creates a data imbalance problem: in simple terms, one class has significantly more samples than another [5]. From a learning point of view, minority classes usually contain more important classification information, and misclassifying minority samples is more costly [6]. The collaborative filtering algorithm for rolling bearing fault identification proposed in [7] only considers a balanced training set; faced with unbalanced data it may produce an unsatisfactory classification model, because information about the bearing fault state is often drowned in a large amount of normal bearing data.
For the data imbalance problem in practical applications, collaborative filtering based on matrix decomposition performs well on sparse fault data [8], and adaptive minority oversampling can alleviate the sparsity of fault data [9]. Because the electroerosion fault data monitored under complex working conditions are scarce, and the bearing fault type is not easy to diagnose quickly within a large amount of data, this paper proposes an ASMOTECFR system for motor bearing electroerosion fault identification. First, typical characteristics are extracted from the vibration data of each bearing state. The number of electroerosion fault samples to be synthesized is then determined by calculating the imbalance of the electroerosion fault data in the data set, and the typical characteristics of the bearings are balanced by ASMOTE. Next, the balanced typical feature matrix is used as the bearing fault characteristic scoring matrix of the collaborative filtering system, and a scoring matrix that accurately describes each bearing state is designed. Finally, these two matrices are combined into the joint scoring matrix for bearing state recognition, and the collaborative filtering algorithm based on matrix decomposition identifies the electroerosion fault. Compared with traditional CFR, the new method can mine more fault information and achieve a higher recognition rate.
2. Adaptive SMOTE oversampling technology
The adaptive synthetic minority oversampling method sets the number of new minority samples to be generated according to the degree of balance of the sample data distribution [10]. A distribution ratio is then calculated for each minority sample, and a different number of new samples is synthesized for each minority sample accordingly [11]. This is Adaptive Synthetic Minority Oversampling, abbreviated as ASMOTE. The ASMOTE algorithm is described in detail below, based on the bearing data used later in this paper [14].
Step 1: Calculate the imbalance.
Suppose that, among the $u$ groups of bearing data selected, the electroerosion fault class is the smallest, with ${X}_{s}$ groups, and the normal class is the largest, with ${X}_{m}$ groups. The imbalance of the bearing sample data is then given by Eq. (1):
Step 2: Calculate the number of minority samples to be synthesized.
The imbalance from Step 1 is then used in Eq. (2) to calculate the number of minority samples to be synthesized:
where, when $d=$ 1, the minority class after synthesis is exactly balanced with the majority class.
Step 3: For each minority sample in the sample data $X=\left\{{X}_{1},{X}_{2}\cdots ,{X}_{n}\right\}$, use the Euclidean distance to find its $k$ nearest neighbors [15]. Let ${\Delta }_{i}$ be the number of majority-class samples among these $k$ neighbors. The proportion of majority-class samples among the $k$ neighbors, ${r}_{i}$, is given by Eq. (3):
Step 4: Normalize the ratio ${r}_{i}$ obtained in Step 3 for each minority sample, as shown in Eq. (4):
Step 5: For each minority sample in the bearing experimental data, the number of new samples to be synthesized is calculated according to Eq. (5):
Step 6: For each minority electroerosion fault sample ${x}_{i}$, select one electroerosion fault sample ${x}_{zi}$ among its $k$ neighbors and synthesize a new sample according to Eq. (6):
where $\lambda $ (0 <$\lambda $< 1) is the proportion factor of the new minority samples to be synthesized. The above steps are repeated for adaptive oversampling until the number required in Step 5 is met.
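The six steps above can be sketched in code. The sketch assumes that ASMOTE follows the widely used ADASYN formulation: imbalance $d={X}_{s}/{X}_{m}$, total number of new samples $G=\left({X}_{m}-{X}_{s}\right)\beta $, ratio ${r}_{i}={\Delta }_{i}/k$, normalized ratio ${\widehat{r}}_{i}={r}_{i}/\sum {r}_{i}$, per-sample count ${g}_{i}={\widehat{r}}_{i}G$, and new sample ${x}_{i}+\left({x}_{zi}-{x}_{i}\right)\lambda $. The function name `asmote`, the parameter `beta`, and the random choice of the neighbor and of $\lambda $ are assumptions made for illustration, not details taken from this paper.

```python
import math
import random


def asmote(minority, majority, k=5, beta=1.0, seed=0):
    """ADASYN-style adaptive oversampling of the minority class (illustrative).

    minority, majority: lists of equal-length feature vectors (lists of floats).
    k: number of nearest neighbours; beta: assumed balance level in (0, 1].
    Returns (imbalance d, list of synthesized minority samples).
    """
    rng = random.Random(seed)
    # Steps 1-2: imbalance degree and total number of samples to synthesize.
    d = len(minority) / len(majority)
    total_new = int(round((len(majority) - len(minority)) * beta))

    labelled = [(x, 0) for x in minority] + [(x, 1) for x in majority]

    def neighbours(x, pool):
        # k nearest neighbours of x in pool (Euclidean distance), excluding x itself.
        others = [p for p in pool if p[0] is not x]
        return sorted(others, key=lambda p: math.dist(x, p[0]))[:k]

    # Step 3: share of majority samples among the k neighbours of each minority sample.
    r = [sum(label for _, label in neighbours(x, labelled)) / k for x in minority]
    # Step 4: normalize the ratios so that they sum to 1.
    total_r = sum(r)
    if total_r == 0:
        r_hat = [1.0 / len(minority)] * len(minority)
    else:
        r_hat = [ri / total_r for ri in r]
    # Step 5: number of new samples per minority sample (rounded to integers).
    counts = [int(round(rh * total_new)) for rh in r_hat]

    # Step 6: interpolate between each minority sample and a minority neighbour.
    minority_pool = [(x, 0) for x in minority]
    synthetic = []
    for x, g in zip(minority, counts):
        nbrs = neighbours(x, minority_pool)
        for _ in range(g):
            x_z = rng.choice(nbrs)[0]
            lam = rng.random()  # proportion factor, drawn from [0, 1)
            synthetic.append([a + (b - a) * lam for a, b in zip(x, x_z)])
    return d, synthetic
```

Because each new sample is an interpolation between two minority samples, it always lies on the segment joining them. Minority samples surrounded by more majority neighbors receive more synthetic samples, which is what makes the oversampling adaptive. Note that rounding in Step 5 means the number of samples produced can differ slightly from the target.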
2.1. Collaborative filtering recommendation techniques
The core idea of collaborative filtering is as follows [11]: by analyzing users’ existing behavior, find neighbor users similar to the target user among a large number of users, and predict the target user’s preferences from the neighbor users’ evaluations of a given item [13].
Next, taking the movie system as an example, the collaborative filtering algorithm based on matrix decomposition is introduced in detail:
Table 1 shows the User-Movie rating table, from which the User-Movie rating matrix is obtained, as shown in Eq. (7):
where $m$ is the number of users, $n$ is the number of movies, and the element in row $i$ and column $j$ of $R$ is the score that user $i$ gives to movie $j$. The aim of the movie recommendation system is as follows [16]:
For the movies that user $i$ has not rated, the recommendation system predicts the score of user $i$, and the corresponding recommendations are made to user $i$ accordingly.
The idea of the collaborative filtering algorithm based on matrix decomposition is to decompose the scoring matrix with higher dimension into the product of two lower dimensional matrices that are the user latent factor matrix and the movie latent factor matrix, respectively.
where $k\ll \mathrm{min}\left(m,n\right)$ is the number of latent factor features. By iterating, the product of ${P}_{km}$ and ${Q}_{kn}^{T}$ is made to approach the original matrix ${R}_{mn}$, and the scores of the items that users have not rated are then predicted, as shown in Eq. (8):
Table 1. “User-Movie” score table
Movie 1  Movie 2  $\cdots $  Movie $n$  
User 1  ${r}_{1}^{\left(1\right)}$  ${r}_{2}^{\left(1\right)}$  $\cdots $  ${r}_{n}^{\left(1\right)}$ 
User 2  ${r}_{1}^{\left(2\right)}$  ${r}_{2}^{\left(2\right)}$  $\cdots $  ${r}_{n}^{\left(2\right)}$ 
$\vdots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $ 
User $m$  ${r}_{1}^{\left(m\right)}$  ${r}_{2}^{\left(m\right)}$  $\cdots $  ${r}_{n}^{\left(m\right)}$ 
The collaborative filtering algorithm based on matrix decomposition learns the best user latent factor matrix and movie latent factor matrix in order to predict the scores of the user items [17]:
For the existing $n$ score records, the loss for each score is calculated using the squared error, as shown in Eq. (10):
To prevent overfitting, a regularization term is added to the loss function, as shown in Eq. (11):
where $\lambda $ is the regularization coefficient. The core problem of the matrix factorization model is to minimize $L\left(P,Q,R\right)$, that is, to minimize the overall loss function above by finding suitable parameters $P$ and $Q$. The gradient descent method is used to solve this minimization problem.
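The training procedure described above can be illustrated with a minimal sketch, assuming the common regularized squared-error loss over the observed scores, $\sum {\left({r}_{ij}-{p}_{i}\cdot {q}_{j}\right)}^{2}+\lambda \left({‖P‖}^{2}+{‖Q‖}^{2}\right)$, minimized by stochastic gradient descent. The function name `factorize`, the learning rate `lr`, the initialization range and the use of `None` for missing scores are all assumptions for illustration. The constant factor 2 of the gradient is absorbed into the learning rate.

```python
import random


def factorize(R, k=2, lam=0.004, lr=0.05, epochs=300, seed=0):
    """Approximate a partially observed matrix R (None = missing score)
    by P Q^T, minimizing the regularized squared error with SGD."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    # Small random initial latent factors (assumed initialization).
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(m)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if R[i][j] is not None]

    def predict(i, j):
        # Predicted score: inner product of row i of P and row j of Q.
        return sum(P[i][f] * Q[j][f] for f in range(k))

    def loss():
        err = sum((R[i][j] - predict(i, j)) ** 2 for i, j in observed)
        reg = sum(v * v for row in P + Q for v in row)
        return err + lam * reg

    history = [loss()]
    for _ in range(epochs):
        for i, j in observed:
            e = R[i][j] - predict(i, j)
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                # Gradient step on the error term plus the regularization term.
                P[i][f] += lr * (e * q - lam * p)
                Q[j][f] += lr * (e * p - lam * q)
        history.append(loss())
    return P, Q, history
```

After training, a missing score for row `i` and column `j` is predicted as the inner product of `P[i]` and `Q[j]`; `history` records the loss after each pass so that convergence can be checked.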
2.2. Construction of the adaptive oversampling score matrix
To diagnose bearing electroerosion faults accurately from a large amount of unbalanced data, this paper applies a 5-level wavelet packet decomposition with the db3 wavelet basis to each set of collected vibration data and assembles the resulting 32 wavelet packet energy features into a typical characteristic matrix of the bearing, as shown in Table 2.
Table 2. Bearing feature score table
${S}^{1}$  ${S}^{2}$  $\cdots $  ${S}^{k}$  ${S}^{k+1}$  $\cdots $  ${S}^{u}$  
${f}_{a0}^{i}$  ${f}_{a0}^{1}$  ${f}_{a0}^{2}$  $\cdots $  ${f}_{a0}^{k}$  ${f}_{a0}^{k+1}$  $\cdots $  ${f}_{a0}^{u}$ 
${f}_{a1}^{i}$  ${f}_{a1}^{1}$  ${f}_{a1}^{2}$  $\cdots $  ${f}_{a1}^{k}$  ${f}_{a1}^{k+1}$  $\cdots $  ${f}_{a1}^{u}$ 
$\vdots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $ 
${f}_{ab}^{i}$  ${f}_{ab}^{1}$  ${f}_{ab}^{2}$  $\cdots $  ${f}_{ab}^{k}$  ${f}_{ab}^{k+1}$  $\cdots $  ${f}_{ab}^{u}$ 
${S}^{u}$ denotes the vibration signals collected from the $u$ groups of bearings in the various fault states and the normal state. ${f}_{ab}^{i}$ denotes the typical characteristic values extracted from the $i$-th group of bearing vibration data.
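A 5-level wavelet packet decomposition produces ${2}^{5}=32$ terminal nodes, which is where the 32 energy features come from. The sketch below illustrates this with the Haar wavelet, used here in place of db3 only so that the example needs no external library (db3 would require a wavelet library such as PyWavelets). Normalizing each node energy by the total energy is an assumption, chosen so that every feature lies between 0 and 1; the function names are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_energy(signal, levels=5):
    """Normalized energies of the 2**levels terminal wavelet packet nodes.

    The signal length should be a multiple of 2**levels. Nodes are returned
    in natural (decomposition) order, not sorted by frequency.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            # Unlike the plain wavelet transform, both the approximation and
            # the detail branch are split again at every level.
            next_nodes.extend(haar_step(node))
        nodes = next_nodes
    energies = [sum(v * v for v in node) for node in nodes]
    total = sum(energies)
    return [e / total for e in energies]


# Synthetic two-tone signal standing in for one group of vibration data.
signal = [math.sin(0.3 * n) + 0.5 * math.sin(2.1 * n) for n in range(1024)]
features = wavelet_packet_energy(signal)
print(len(features), round(sum(features), 6))
```

Each column of the feature score table then corresponds to one such 32-element vector.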
In the sample data actually collected, the vibration data of bearing electroerosion faults are very scarce. For this unbalanced data set, ASMOTE is therefore adopted to adaptively oversample the minority class of bearing electroerosion fault data in order to obtain a balanced sample set. Assuming that ASMOTE increases the sample data from the previous $u$ groups to $w$ groups, the typical characteristic score table of the $w$ groups of bearing samples after ASMOTE is given in Table 3, where ${S}^{w}$ represents the data of the various states of the $w$ groups of bearings after ASMOTE, including the fault states and the normal state, and ${f}_{ab}^{i}$ denotes the typical features extracted from the $i$-th group of bearing sample data.
Table 3. Typical feature score table of the bearings after ASMOTE
${S}^{1}$  ${S}^{2}$  $\cdots $  ${S}^{k}$  ${S}^{k+1}$  $\cdots $  ${S}^{w}$  
${f}_{a0}^{i}$  ${f}_{a0}^{1}$  ${f}_{a0}^{2}$  $\cdots $  ${f}_{a0}^{k}$  ${f}_{a0}^{k+1}$  $\cdots $  ${f}_{a0}^{w}$ 
${f}_{a1}^{i}$  ${f}_{a1}^{1}$  ${f}_{a1}^{2}$  $\cdots $  ${f}_{a1}^{k}$  ${f}_{a1}^{k+1}$  $\cdots $  ${f}_{a1}^{w}$ 
$\vdots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $ 
${f}_{ab}^{i}$  ${f}_{ab}^{1}$  ${f}_{ab}^{2}$  $\cdots $  ${f}_{ab}^{k}$  ${f}_{ab}^{k+1}$  $\cdots $  ${f}_{ab}^{w}$ 
2.3. Build a joint scoring matrix based on adaptive oversampling
The difficulty in introducing collaborative filtering into the identification of bearing electroerosion faults is how to construct a suitable scoring matrix, based on the characteristics of the bearing electroerosion data, with which the prediction can be made. To address this problem, a scoring matrix reflecting the typical characteristics of the bearing has been established in Table 3. The score level reflects the fault characteristics of the bearing at each moment, that is, the degree of “preference” in the recommendation system. In addition, a state scoring matrix reflecting each bearing state is designed for the different bearing states. In the bearing state scoring matrix, the maximum value 1 is given to the score of the known bearing state, while the minimum value $\epsilon \left(\le \frac{1}{10000}\right)$ is given to the scores of the absent states, as shown in Table 4.
Table 4. Bearing state score table
${S}^{\left(1\right)}$  ${S}^{\left(2\right)}$  $\cdots $  ${S}^{\left(h\right)}$  ${S}^{\left(h+1\right)}$  $\cdots $  ${S}^{\left(w\right)}$  
State ${Z}^{\left(1\right)}$  1  $\epsilon $  $\cdots $  $\epsilon $  ${p}_{1}^{\left(h+1\right)}$  $\cdots $  ${p}_{1}^{\left(w\right)}$ 
State ${Z}^{\left(2\right)}$  $\epsilon $  1  $\cdots $  $\epsilon $  ${p}_{2}^{\left(h+1\right)}$  $\cdots $  ${p}_{2}^{\left(w\right)}$ 
$\vdots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $ 
State ${Z}^{\left(v\right)}$  $\epsilon $  $\epsilon $  $\cdots $  1  ${p}_{v}^{\left(h+1\right)}$  $\cdots $  ${p}_{v}^{\left(w\right)}$ 
As shown in Table 4, the bearing has $v$ states ${Z}^{\left(1\right)},\cdots ,{Z}^{\left(v\right)}$. For a given segment of collected bearing vibration data, if the corresponding bearing state is ${Z}^{\left(1\right)}$, the maximum value 1 is assigned at this position, and the other states ${Z}^{\left(2\right)}\cdots {Z}^{\left(v\right)}$ are given the minimum value $\epsilon \left(\le \frac{1}{10000}\right)$.
The same method is used to assign bearing state scores to the samples after ASMOTE, which gives the bearing condition scoring matrix. Finally, the typical feature matrix reflecting the bearing characteristics and the bearing condition scoring matrix reflecting the bearing state are combined to obtain the joint scoring matrix for bearing condition recognition.
2.4. ASMOTECFR bearing fault identification method
This section describes in detail how ASMOTECFR is used to identify the various states of bearings.
Suppose that vibration signals of $u$ groups of rolling bearings are collected and that, after ASMOTE, the sample data comprise $w$ groups ${S}^{\left(1\right)},\cdots ,{S}^{\left(h\right)},{S}^{\left(h+1\right)},\cdots ,{S}^{\left(w\right)}$, covering $v$ different states ${Z}^{\left(1\right)},{Z}^{\left(2\right)},\cdots ,{Z}^{\left(v\right)}$. The states of the first $h$ groups of training data ${S}^{\left(1\right)},\cdots ,{S}^{\left(h\right)}$ are known, and the goal is to use ASMOTECFR to identify the states of the remaining $w-h$ groups of test data ${S}^{\left(h+1\right)},\cdots ,{S}^{\left(w\right)}$.
For the $i$-th group of signal data ${S}^{\left(i\right)}\left(i=1,\cdots ,h,h+1,\cdots ,u\right)$, 32 typical bearing eigenvalues are extracted by wavelet packet decomposition and normalized, giving the normalized eigenvector ${T}^{\left(i\right)}$ as follows:
where ${f}_{ab}^{\left(i\right)}\in \left(\mathrm{0,1}\right)$.
From the characteristic vectors ${T}^{\left(i\right)}$, the bearing characteristic scoring table is designed as shown in the previous section, which gives the corresponding scoring matrix $A\in {R}^{\left(b+1\right)\times w}$, as shown in Eq. (13):
From the various states of the bearing, the bearing state scoring table is then designed as shown in Table 4, which gives the corresponding scoring matrix $B\in {R}^{v\times w}$, as shown in Eq. (14):
In the training data ${S}^{\left(1\right)},\cdots ,{S}^{\left(h\right)}$, the score of the known state is given the maximum value 1, while the scores of the other states are given the minimum value $\epsilon \left(\le \frac{1}{10000}\right)$.
For the test data ${S}^{\left({i}^{\text{'}}\right)}\left({i}^{\text{'}}=h+1,\cdots ,w\right)$, the state ${Z}^{\left(t\right)}\left(t=\mathrm{1,2},\cdots ,v\right)$ is unknown, and the score is given a zero value. The bearing feature scoring matrix $A$ and the bearing state scoring matrix $B$ are combined to obtain the joint scoring matrix $C$ for bearing state recognition:
where $C\in {R}^{d\times w}$, and $d=b+1+v$.
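The construction of the joint scoring matrix can be sketched as follows, assuming that $C$ is obtained by stacking the rows of $A$ on top of the rows of $B$ (which is consistent with the row count $d=b+1+v$). The helper names `state_scores` and `joint_matrix`, the use of `None` to mark test samples, and the concrete value chosen for $\epsilon $ are assumptions for illustration.

```python
EPS = 1e-4  # minimum score for absent states (the text requires eps <= 1/10000)


def state_scores(labels, n_states):
    """Build the v x w state scoring matrix B.

    labels[j] is the known state index of sample j, or None for a test sample
    whose state is still to be identified.
    """
    B = [[0.0] * len(labels) for _ in range(n_states)]
    for j, label in enumerate(labels):
        if label is None:
            continue  # test sample: all state scores stay at zero
        for t in range(n_states):
            B[t][j] = 1.0 if t == label else EPS
    return B


def joint_matrix(A, B):
    """Stack the feature matrix A ((b+1) x w) on top of B (v x w)."""
    if len(A[0]) != len(B[0]):
        raise ValueError("A and B must have the same number of columns")
    return [row[:] for row in A] + [row[:] for row in B]


# Two features, three samples; the third sample is a test sample.
A = [[0.12, 0.13, 0.11], [0.03, 0.02, 0.03]]
B = state_scores([0, 1, None], n_states=2)
C = joint_matrix(A, B)
print(len(C), len(C[0]))  # number of rows d and number of columns w
```

Each column of `C` holds the features of one sample followed by its state scores, so the factorization can treat the zero state entries of test columns as the scores to be predicted.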
For motor bearing failures, the ASMOTECFR approach pursues the following objective:
The joint scoring matrix $C$ is decomposed into $P\in {R}^{w\times k}$ and $Q\in {R}^{d\times k}$ such that $C\approx Q{P}^{T}$, and the prediction score ${r}_{t}^{\left({i}^{\text{'}}\right)}$ of the test data ${S}^{\left({i}^{\text{'}}\right)}$ for the state ${Z}^{\left(t\right)}$ is given by the two feature matrices, specifically as follows:
By finding the optimal parameters $\theta $ and $X$, the overall loss function is minimized, as specified in Eq. (17):
where $a$ is the regularization coefficient.
Finally, the parameters $P$ and $Q$ are optimized by gradient descent and alternating least squares methods, thus obtaining the prediction score ${r}_{t}^{\left({i}^{\text{'}}\right)}$ of the state ${Z}^{\left(t\right)}$ from the test data ${S}^{\left({i}^{\text{'}}\right)}$:
Then the state ${Z}^{\left(t\right)}$ corresponding to the highest score ${r}_{t}^{\left({i}^{\text{'}}\right)}$ is the state of the identified test data ${S}^{\left({i}^{\text{'}}\right)}$.
3. Experimental verification and analysis
In this section, the proposed ASMOTECFR method is verified on experimental vibration data of motor bearings with electroerosion faults.
The electroerosion fault test bench is shown in Fig. 2, and the data acquisition system is the PULSE system from B&K (Denmark). Experimental data were collected for four states of deep groove ball bearings of type 6205EKA, shown in Fig. 1: outer ring pitting corrosion, outer ring crack, outer ring electroerosion fault and normal. (Photos were taken by Huanke Cheng in the fault diagnosis laboratory of Hunan University of Science and Technology.) During data acquisition, the motor speed was 600 rpm and 1200 rpm, the sampling frequency was 16384 Hz, and the sampling time was 10 s. In total, 1300 groups of data samples were selected: 402 groups of outer ring pitting corrosion, 390 groups of outer ring cracks, 85 groups of bearing electroerosion faults and 423 groups of normal data.
First, 32 energy features were extracted from each data group by wavelet packet decomposition. The 1300 groups of sample data were then divided into training and test sets at a ratio of 8:2. The training set contained 1040 groups: 322 groups of outer ring pitting, 312 groups of outer ring cracks, 68 groups of outer ring electroerosion and 338 groups of normal data. ASMOTE was then used to balance the electroerosion fault data in the training set.
In the ASMOTE process, the following parameters were set: the imbalance is calculated to be 0.5, and 3-fold oversampling is used, that is, the electroerosion fault data are increased from the original 68 groups to 204 groups after ASMOTE.
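The sample counts used in the rest of this section follow directly from the figures above. The short script below simply reproduces that bookkeeping; the variable names are illustrative.

```python
# Training-set sizes per state after the 8:2 split, as reported in the text.
train = {"pitting": 322, "crack": 312, "electroerosion": 68, "normal": 338}
total_samples = 1300

train_before = sum(train.values())        # training groups before ASMOTE
test_size = total_samples - train_before  # groups held out for testing

# Three-fold oversampling of the electroerosion class.
oversampling_factor = 3
train_after = dict(train)
train_after["electroerosion"] = train["electroerosion"] * oversampling_factor

train_total_after = sum(train_after.values())  # training groups after ASMOTE
joint_columns = train_total_after + test_size  # columns of the joint matrix

print(train_before, test_size, train_after["electroerosion"],
      train_total_after, joint_columns)
# prints: 1040 260 204 1176 1436
```

The 1176 training groups and 260 test groups together give the 1436 columns of the joint scoring table.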
Fig. 1. Four states of deep groove ball bearing type 6205EKA: a) outer ring pitting; b) outer ring crack; c) outer ring electroerosion fault; d) outer ring normal
Fig. 2. The electroerosion fault failure test bench: (1) electric motor; (2) insulated coupling; (3) principal axis; (4) supporting bearing pedestal; (5) current loading device; (6) test bearing’s bearing pedestal; (7) vibration acceleration sensor; (8) insulated bearing; (9) base; (10) current simulator
The electroerosion fault data after ASMOTE are listed in Table 5.
Table 5. Electroerosion fault data in the post-ASMOTE training set
${x}_{1}$  ${x}_{2}$  $\cdots $  ${x}_{130}$  ${x}_{131}$  $\cdots $  ${x}_{203}$  ${x}_{204}$  
${f}_{0}$  0.1221  0.1324  $\cdots $  0.1231  0.1230  $\cdots $  0.1133  0.1259 
${f}_{1}$  0.0281  0.0253  $\cdots $  0.0278  0.0213  $\cdots $  0.0281  0.0239 
$\vdots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $  $\vdots $ 
${f}_{30}$  0.0056  0.01632  $\cdots $  0.0034  0.0152  $\cdots $  0.0121  0.01172 
${f}_{31}$  0.011  0.0138  $\cdots $  0.0122  0.0138  $\cdots $  0.01383  0.01379 
Next, ASMOTECFR is used for state recognition with the 1176 groups of data in the training set and the 260 groups of data in the test set.
Table 6 shows the joint scores for bearing condition recognition of the test set and the training set, where ${x}_{1}$ to ${x}_{260}$ are the test set and ${x}_{261}$ to ${x}_{1436}$ are the training set.
Table 6. Joint scoring of bearing state recognition of test set and training set after ASMOTE
${x}_{1}$  ${x}_{2}$  $\cdots $  ${x}_{259}$  ${x}_{260}$  $\cdots $  ${x}_{1435}$  ${x}_{1436}$  
${f}_{0}$  0.03723  0.04367  $\cdots $  0.04491  0.11187  $\cdots $  0.08138  0.11093 
${f}_{1}$  0.06781  0.06941  $\cdots $  0.07192  0.07784  $\cdots $  0.12364  0.07972 
${f}_{2}$  0.04327  0.04329  $\cdots $  0.04204  0.06093  $\cdots $  0.07948  0.06539 
$\vdots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $  $\vdots $  $\ddots $  $\vdots $  $\vdots $ 
${f}_{30}$  0.00843  0.00834  $\cdots $  0.00757  0.01893  $\cdots $  0.00741  0.01328 
${f}_{31}$  0.02081  0.01921  $\cdots $  0.02024  0.01273  $\cdots $  0.01238  0.01478 
Table 7. Scoring results under the best model parameters
${x}_{1}$  ${x}_{2}$  $\cdots $  ${x}_{259}$  ${x}_{260}$  $\cdots $  ${x}_{1435}$  ${x}_{1436}$  
State 1  0.51327  0.32603  $\cdots $  0.48235  –0.0311  $\cdots $  0.9972  0.0534 
State 2  0.20326  0.18564  $\cdots $  0.24582  0.0632  $\cdots $  0.0963  –0.0767 
State 3  0.14359  0.26347  $\cdots $  0.19732  0.99914  $\cdots $  0.0693  0.0232 
State 4  0.16326  0.10587  $\cdots $  0.11904  0.054597  $\cdots $  –0.0934  0.9994 
Table 7 shows the scoring results under the best model parameters, where ${x}_{1}$ to ${x}_{260}$ are the test set and ${x}_{261}$ to ${x}_{1436}$ are the training set.
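The identification rule, which picks the state with the highest predicted score, can be applied directly to the columns printed in Table 7. The sketch below does so for the six columns shown; the dictionary keys and the function name `identify` are illustrative, and which physical bearing state each of State 1 to State 4 denotes is not assumed here.

```python
STATES = ["State 1", "State 2", "State 3", "State 4"]

# Predicted state scores for the columns shown in Table 7.
scores = {
    "x1": [0.51327, 0.20326, 0.14359, 0.16326],
    "x2": [0.32603, 0.18564, 0.26347, 0.10587],
    "x259": [0.48235, 0.24582, 0.19732, 0.11904],
    "x260": [-0.0311, 0.0632, 0.99914, 0.054597],
    "x1435": [0.9972, 0.0963, 0.0693, -0.0934],
    "x1436": [0.0534, -0.0767, 0.0232, 0.9994],
}


def identify(column):
    """Return the state whose predicted score is the highest."""
    best = max(range(len(column)), key=lambda t: column[t])
    return STATES[best]


identified = {name: identify(col) for name, col in scores.items()}
print(identified["x1"], identified["x260"], identified["x1436"])
# prints: State 1 State 3 State 4
```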
Table 8. ASMOTECFR test set recognition rate
$\lambda $ / $k$  8  9  10  11  12 
0.002  81.72 %  83.21 %  76.39 %  97.34 %  90.56 % 
0.0025  83.12 %  79.34 %  80.34 %  96.89 %  89.05 % 
0.003  83.20 %  79.19 %  79.38 %  97.45 %  83.24 % 
0.0035  77.19 %  80.36 %  79.03 %  98.03 %  91.06 % 
0.004  76.82 %  82.52 %  78.95 %  98.46 %  81.58 % 
From Table 8 it can be seen that when the regularization coefficient $\lambda $ = 0.004 and the number of features $k$ = 11, the recognition rate on the test set reaches 98.46 %. When $k$ = 11 and $\lambda $ takes 0.002, 0.0025, 0.003 and 0.0035, the recognition rate on the test set reaches 97.34 %, 96.89 %, 97.45 % and 98.03 % respectively. The performance of the model is thus evaluated on the test set, and the model shows good generalization ability under these parameters.
Table 9. CFR test set recognition rate
$\lambda $ / $k$  8  9  10  11  12 
0.002  88.46 %  87.31 %  88.08 %  89.23 %  91.92 % 
0.0025  85.00 %  85.00 %  82.69 %  88.46 %  91.92 % 
0.003  84.61 %  84.62 %  82.70 %  90.38 %  91.15 % 
0.0035  84.23 %  84.23 %  81.15 %  90.76 %  91.19 % 
0.004  83.08 %  84.23 %  88.85 %  90.76 %  91.15 % 
Table 9 shows the results of identification directly by CFR for different regularization coefficients $\lambda $ and feature numbers $k$, relying on the training set and cross-validation set. As can be seen from Table 9, when the regularization coefficient $\lambda $ = 0.002 and $k$ = 12, the recognition rate on the test set reaches 91.92 %. When $k$ = 12 and $\lambda $ takes 0.0025, 0.003 and 0.0035, the recognition rate on the test set reaches 91.92 %, 91.15 % and 91.19 % respectively. The highest recognition accuracy of CFR is therefore only 91.92 %.
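The best setting in each table can be read off with a simple search over the grid. The sketch below copies the values of Tables 8 and 9 and reports the highest rate for each method; the names are illustrative, and ties are resolved in favor of the first entry encountered.

```python
K_VALUES = [8, 9, 10, 11, 12]

# Test-set recognition rates (%) from Table 8 (ASMOTECFR) and Table 9 (CFR),
# keyed by the regularization coefficient; columns follow K_VALUES.
asmotecfr = {
    0.002: [81.72, 83.21, 76.39, 97.34, 90.56],
    0.0025: [83.12, 79.34, 80.34, 96.89, 89.05],
    0.003: [83.20, 79.19, 79.38, 97.45, 83.24],
    0.0035: [77.19, 80.36, 79.03, 98.03, 91.06],
    0.004: [76.82, 82.52, 78.95, 98.46, 81.58],
}
cfr = {
    0.002: [88.46, 87.31, 88.08, 89.23, 91.92],
    0.0025: [85.00, 85.00, 82.69, 88.46, 91.92],
    0.003: [84.61, 84.62, 82.70, 90.38, 91.15],
    0.0035: [84.23, 84.23, 81.15, 90.76, 91.19],
    0.004: [83.08, 84.23, 88.85, 90.76, 91.15],
}


def best_setting(table):
    """Return (rate, lambda, k) of the highest recognition rate in the grid."""
    return max(
        ((rate, lam, k) for lam, row in table.items()
         for k, rate in zip(K_VALUES, row)),
        key=lambda item: item[0],
    )


best_asmote = best_setting(asmotecfr)
best_cfr = best_setting(cfr)
gap_best = best_asmote[0] - best_cfr[0]              # best against best
gap_same = asmotecfr[0.004][3] - cfr[0.004][3]       # same lambda and k
print(best_asmote, best_cfr, round(gap_best, 2), round(gap_same, 2))
```

Comparing the best results of the two methods gives a gap of 6.54 percentage points, and comparing them at the same setting ($\lambda $ = 0.004, $k$ = 11) gives 7.70 percentage points.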
Fig. 3 compares the fault identification accuracy of the ASMOTECFR and CFR methods for different regularization coefficients $\lambda $ when $k=$ 11. ASMOTECFR reaches an accuracy of more than 98 % when $\lambda =$ 0.004 and $k=$ 11. Table 10 shows the identification results for each bearing state obtained by ASMOTECFR with $\lambda =$ 0.004 and $k=$ 11.
The results show that ASMOTECFR can effectively identify the various states of bearings. In particular, for the electroerosion fault of motor bearings the accuracy exceeds 98 %, and the optimized parameters generalize well.
Fig. 3. Comparison of the recognition accuracy of ASMOTECFR and CFR
Table 10. Test set identification results
State  Number of tests  Correct  Incorrect  Recognition rate 
Outer ring crack  80  78  2  97.5 % 
Outer ring pitting  78  77  1  98.7 % 
Outer ring electroerosion fault  17  17  0  100 % 
Outer ring normal state  85  84  1  98.82 % 
Total  260  256  4  98.46 % 
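The recognition rates can be recomputed from the counts in Table 10. The sketch below derives the per-state rates, the overall rate (all correct identifications divided by all tests), and, for comparison, the unweighted mean of the four per-state rates; the variable names are illustrative.

```python
# (number of test samples, correctly identified) per state, from Table 10.
results = {
    "Outer ring crack": (80, 78),
    "Outer ring pitting": (78, 77),
    "Outer ring electroerosion fault": (17, 17),
    "Outer ring normal state": (85, 84),
}

per_state = {state: 100.0 * c / n for state, (n, c) in results.items()}
total_n = sum(n for n, _ in results.values())
total_correct = sum(c for _, c in results.values())

overall = 100.0 * total_correct / total_n         # weighted by class size
macro = sum(per_state.values()) / len(per_state)  # unweighted mean of rates

for state, rate in per_state.items():
    print(f"{state}: {rate:.2f} %")
print(f"overall: {overall:.2f} %, mean of per-state rates: {macro:.2f} %")
```

The overall rate is 256/260, about 98.46 %, while the unweighted mean of the per-state rates is slightly higher, about 98.76 %, because the small electroerosion class with its 100 % rate counts as much as the larger classes.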
4. Conclusions
To address the problem that electroerosion fault signals are rare and weak during motor operation, so that the data set is seriously imbalanced, this paper proposes an ASMOTECFR method for motor bearing electroerosion fault identification. The results show that:
1) The overall identification rate of bearing faults by the ASMOTECFR method exceeds 97.5 %, and the identification rate of electroerosion faults reaches 100 %.
2) Compared with the test on data without ASMOTE, the recognition accuracy of the ASMOTECFR method is improved by about 7 % under the same parameters.
References

[1] E. H. E. Bouchikhi, V. Choqueuse, and M. E. H. Benbouzid, “Current frequency spectral subtraction and its contribution to induction machines’ bearings condition monitoring,” IEEE Transactions on Energy Conversion, Vol. 28, No. 1, pp. 135–144, Mar. 2013, https://doi.org/10.1109/tec.2012.2227746
[2] J. H. Zhu, J. H. Liu, and X. J. Yang, “Study on failure causes and prevention of aviation rolling bearings,” Equipment Management and Maintenance, Vol. 2020, No. 13, pp. 67–68, 2020, https://doi.org/10.16621/j.cnki.issn10010599.2020.07.34
[3] M. L. Guo, “Research and realization of EMU’s operation and maintenance decision-making recommended technology based on knowledge base,” M.E. thesis, Beijing Jiaotong University, Beijing, China, 2015.
[4] J. W. Zhang, L. M. Guo, and X. M. Yang, “Improved algorithm for oversampling and random forest for unbalanced data,” Computer Engineering and Applications, Vol. 56, No. 11, pp. 39–45, 2020, https://doi.org/10.3778/j.issn.10028331.19080338
[5] Y. T. Yan, Y. W. Zhu, Z. B. Wu, and Y. W. Zheng, “SMOTE oversampling method for constructive overlay algorithm,” Computer Science and Exploration, Vol. 14, No. 6, p. 975, Jun. 2020, https://doi.org/10.3778/j.issn.16739418.1905091
[6] Z. L. Zhang, Y. B. Feng, and Z. K. Zhao, “An oversampling method for unbalanced datasets based on SVM,” Computer Engineering and Applications, Vol. 56, No. 23, pp. 220–228, 2020, https://doi.org/10.3778/j.issn.10028331.20060449
[7] K. Dai, “An empirical study on data sampling problem of imbalance classification,” Central China Normal University, 2020.
[8] G. Wang, Y. He, Y. Peng, and H. Li, “Bearing fault identification method based on collaborative filtering recommendation technology,” Shock and Vibration, Vol. 2019, pp. 1–9, May 2019, https://doi.org/10.1155/2019/7378526
[9] S. H. Yang, C. H. Zhou, Y. M. Jiang, F. Q. Zhang, and T. Zhang, “An improved oversampling algorithm for unbalanced data BNSMOTE,” Computer and Digital Engineering, Vol. 48, No. 9, pp. 2108–2113, 2020, https://doi.org/10.3969/j.issn.16729722.2020.09.007
[10] X. H. Zhao, “Research on ensemble classification algorithm of unbalanced data based on oversampling,” Chongqing University, 2020.
[11] F. F. Zhang, L. M. Wang, and Y. M. Chai, “An improved oversampling unbalanced data ensemble classification algorithm,” Journal of Chinese Computer Systems, Vol. 39, No. 10, pp. 2162–2168, 2018.
[12] Y. H. He, “Identification method of motor bearing current damage based on collaborative filtration,” Hunan University of Science and Technology, 2020.
[13] S. Wang, X. M. Sun, and Y. B. Gao, “Personalized product recommendation method based on neural collaborative filtering,” Information Technology, Vol. 355, No. 6, pp. 143–147, 2021, https://doi.org/10.13274/j.cnki.hdzj.2021.06.026
[14] J. Xu, “Research on adaptive unbalanced data classification,” Beijing Jiaotong University, 2020.
[15] Z. Sun, Q. Song, X. Zhu, H. Sun, B. Xu, and Y. Zhou, “A novel ensemble method for classifying imbalanced data,” Pattern Recognition, Vol. 48, No. 5, pp. 1623–1637, May 2015, https://doi.org/10.1016/j.patcog.2014.11.014
[16] P. Y. Li, “Personalized recommendation algorithm based on collaborative filtering,” Hebei University of Architecture and Engineering, 2022.
[17] L. Z. Cheng, “Research on hybrid collaborative filtering algorithm for mature users based on matrix decomposition,” Anhui University of Finance and Economics, 2022.
About this article
Acknowledgements: Financial support from the National Natural Science Foundation of China (51575178) and from the Hunan Natural Science Foundation of China (2018JJ2120).
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
The authors declare that they have no conflict of interest.