Test case simplification based on coupling metrics in software bug location

Software test cases are one of the most critical aspects of software testing in the product development process. As software products are updated many times, the same test requirement may come to be covered by multiple test cases, so test suites are often redundant; reducing a test suite, however, affects its error detection rate. This study proposes retaining selected redundant test cases in software error location: it introduces a coupling metric based on program slicing and establishes a second coverage criterion in order to balance the size of the reduced test suite against its error detection rate. The results show that the reduced test suites produced by the ROR method used in this study are larger, and detect more errors, than those produced by other commonly used reduction algorithms. The ROR method has the lowest error detection loss rate, with an average of 17.96 % across the six test case sets. Individual test cases in its reduced test sets have the highest error detection capability, with a mean value of 90.63 %. The method also has the highest average reduction efficiency, 91.05 %. Compared with other reduction methods, the research method achieves a better balance between the size and the error detection rate of the reduced test suite.


Introduction
With the boom in computer technology, software testing has become an important means of ensuring software quality throughout the software development and maintenance cycle [1][2]. Software errors can occur in many places. Error location based on program slicing analyses the program's data flow and control flow to decompose the program under test and identify the statements related to a subset of the program's behaviour, thus narrowing the scope of statement checking [3]. In addition, in practical software testing, test suites become redundant as products are updated and therefore need to be reduced. Balancing test suite reduction against its impact on the error detection rate is an important research topic today [4]. It has been shown that software programs are prone to errors at locations with high coupling, and that applying coupling to test suite reduction can help improve the error detection rate of test case sets; however, existing coupling measures based on program slices do not distinguish between different types of nodes in the common part of the coupled slices [5][6]. To this end, an improved coupling metric based on program slicing is proposed for software error location, aiming to improve the efficiency of test case set reduction while better balancing the error detection rate against the size of the reduced test case set.

Related work
As software testing has received more and more attention in the development of software products, test case design has also received considerable attention. Because software products are updated and upgraded many times, the number of test cases keeps growing. How to improve the reduction rate of a test case set while balancing it against the size of the reduced set has therefore become a hot research topic. Tempero et al. proposed a coupling model that uses the concretization of concepts as its basic unit in order to solve the problems caused by incomplete metric definitions, and experiments showed the superiority of this model [7]. Dutta et al. proposed a mutation-based fault location technique that identifies incorrect statements by calculating their proximity to different mutants [8]. Yan et al. established an entropy-based framework called Efilter to address the problem of filtering unlabelled test cases; the efficiency of fault location improved significantly with the use of Efilter [9]. Pradhan et al. proposed determining the priority of test cases by defining a fault detection capability and a test case dependency score, and by considering the results of run-time test case execution, in order to obtain higher fault detection capability. The results showed that the method is effective and applicable [10]. To reduce the execution cost of mutation-based error location techniques, Yong et al. proposed a mutation execution method that prioritizes the execution of mutants and test cases. The experiments showed that this method can effectively reduce the execution cost of mutation testing while maintaining the error location rate [11].
Kumar et al. put forward a framework for software fault prediction models based on cost and efficiency evaluation of the testing phase, and experimental results showed that the model selected the best set of source code metrics for fault prediction [12]. Hellhake et al. analysed the relationship between coupling measures of different components and interfaces and the fault distribution in automotive system integration tests. The results showed a positive correlation between coupling and fault propensity, verifying that coupling is a valid indicator of fault propensity [13]. Yu et al. proposed a new process metric based on the evolutionary data of object-oriented programs, derived from the historical package defect rate and the degree of class variation; the results verified that the method can improve the performance of evolution-oriented defect prediction [14]. Chiang et al. proposed a four-weighted combination algorithm and a linear programming model to improve the effectiveness of fault detection. Experiments showed that the fault detection rate of the software was improved and the test set was significantly reduced [15]. Rathee et al. proposed an artificial intelligence metaheuristic-based algorithm to optimise the test sequence in order to reduce the time, effort, and redundancy involved in designing an optimised test sequence. The algorithm was shown to be effective [16].
Research on test case reduction by scholars at home and abroad shows that relatively few results concern its use in software error location. Test suites need to be reduced, but a change in the size of the test suite affects its error detection rate; coupling metrics can help preserve the error detection rate, because high-coupling positions help detect software errors. The study therefore proposes a coupling metric-based test case reduction method for software error location, aiming to improve the reduction rate of the test suite while balancing it against the size of the reduced test suite.

Coupling metrics and test case set simplification methods in software error location
Software error location is a dynamic software debugging process in which the tester usually records information generated for the various program entities while a particular test suite runs, and locates software errors based on this information [17][18]. The information collected during a run may be the number of times a program entity has been executed by test cases, or whether it has been covered by a test case. Then, according to a chosen program spectrum formula, this information is used to calculate the suspiciousness of each program entity; the entities are ranked, and the program statements are examined one at a time in rank order until an erroneous statement is found [19]. Firstly, the program to be tested is represented by $P$, and the entity representation of $P$ is given in Eq. (1):

$$P = \{s_1, s_2, \ldots, s_n\} \quad (1)$$

where $s_j$ can be interpreted in various ways, such as a statement, method, class or code fragment in the program. The program entity under study is the statement, and $n$ represents the number of lines of the program under test. The test suite of the program to be tested is shown in Eq. (2):

$$T = \{t_1, t_2, \ldots, t_m\} \quad (2)$$

Let $r_i$ denote the result of the $i$-th test case, where $1 \le i \le m$. If $r_i = 1$, the output of the test case differs from the expected output, which means that the test case fails. Conversely, $r_i = 0$ indicates that the test case passed. The program spectrum can be expressed as a coverage matrix, as in Eq. (4):

$$C = (c_{ij})_{m \times n} \quad (4)$$

where $C$ represents the matrix of coverage relationships between the set of test cases and the program entities, and is an $m \times n$ matrix. The $i$-th row of $C$ is the coverage vector of test case $t_i$: it records which statements in $P$ are executed when $t_i$ runs. The $j$-th column of $C$ records the execution of statement $s_j$ by the test cases. If $c_{ij} = 1$, test case $t_i$ executes the executable code line $s_j$. If $c_{ij} = 0$, test case $t_i$ does not reach that line during its run.
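As an illustration of how a coverage matrix and a result vector yield a suspiciousness ranking, the following minimal Python sketch uses the Ochiai formula. Ochiai is only one of the well-known program spectrum formulas and is an assumption here, since the study does not name the formula it uses; the function name `ochiai_ranking` and the small data set are likewise hypothetical.

```python
import math


def ochiai_ranking(coverage, results):
    """Rank statements by Ochiai suspiciousness.

    coverage[i][j] is 1 if test case i executes statement j, else 0.
    results[i] is 1 if test case i failed and 0 if it passed.
    Returns (statement_index, score) pairs, most suspicious first.
    """
    total_failed = sum(results)
    n_statements = len(coverage[0]) if coverage else 0
    scores = []
    for j in range(n_statements):
        # Number of failing / passing test cases that execute statement j
        failed_cov = sum(1 for row, r in zip(coverage, results) if row[j] and r == 1)
        passed_cov = sum(1 for row, r in zip(coverage, results) if row[j] and r == 0)
        denom = math.sqrt(total_failed * (failed_cov + passed_cov))
        scores.append((j, failed_cov / denom if denom else 0.0))
    # sorted() is stable, so tied statements keep their original order
    return sorted(scores, key=lambda item: -item[1])


# Illustrative data only: three test cases, three statements
coverage = [
    [1, 1, 0],  # passing test case
    [1, 0, 1],  # failing test case
    [1, 1, 1],  # failing test case
]
results = [0, 1, 1]
ranking = ochiai_ranking(coverage, results)
```

In this toy data the third statement is executed by both failing test cases and by no passing one, so it is examined first.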
Software metrics are studied from many angles. They can be divided into structure-oriented metrics and object-oriented metrics. Methods, classes and objects are the modules of object-oriented programming, and software design metrics cover both object-oriented and process-oriented programs. Coupling is a key point in the metrics. There are five types of coupling, which describe the interdependence between two classes: inheritance coupling, composition coupling, aggregation coupling, association coupling, and usage coupling [20]. The coupling information between classes includes the number of attributes accessed, the number of different methods called, the number of return types, and the number of parameters passed. Based on these four parameters, the dependency between the client class and the service class can be measured by combining them into one expression, which expresses the coupling between the two classes as in Eq. (5). The four terms of Eq. (5) denote, respectively, the number of common variables of the service class used by the client class, the number of methods of the service class called by the client class, the number of different return types that occur in those calls, and the number of different parameters that occur in them. Each complexity measure is normalized as in Eq. (6), which operates on the complexity information matrix of the test stubs; the rows and columns of this matrix index the classes, so that each entry describes the dependency of one class on another. The complexity of a test stub is estimated as shown in Eq. (7), in which four weights are applied to the attribute, method, return value and parameter coupling terms. The raw values, together with the maximum and minimum values of the attribute coupling matrix and the method coupling matrix formed from the statistical data, are obtained with the open-source tool SOOT. The return value and parameter terms are calculated according to Eq. (9), where the raw values and the maximum and minimum values of the return value coupling matrix and the parameter coupling matrix formed from the statistical data are likewise obtained with SOOT. For the weights, the study proposes the calculation method in Eq. (10). For a given test sequence in which dependency edges are broken, the overall test stub complexity of that test sequence is calculated as in Eq. (11).

There are two main types of test case set reduction methods: reducing the test suite directly, and first analysing the test requirements to find the relationships among them and then reducing the test cases based on those relationships [21]. The requirements-based reduction algorithm first requires delineating the relationships between the test requirements, as shown in Fig. 1. As in Fig. 1, if test requirement $r_1$ contains $r_2$, then any test case covering $r_1$ will also cover $r_2$, so only requirement $r_1$ needs to be tested. If the test requirements $r_1$ and $r_2$ are mutually inclusive, then they are said to be equivalent test requirements and only one of them needs to be tested. If there is an overlap between the test requirements $r_1$ and $r_2$, then their intersection is tested as one test requirement and $r_1$ and $r_2$ are removed. If they are independent of each other, then there is no intersection between their test cases and it is not possible to simplify the test requirements. After the test requirements have been reduced, the resulting test requirements and test cases are then reduced according to the hairpin algorithm.
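The four relationships between test requirements can be sketched in Python by comparing, for each requirement, the set of test cases that satisfy it. This representation is an assumption made for illustration, and `classify_requirements` and its return labels are hypothetical names rather than part of the study's algorithm.

```python
def classify_requirements(tests_a, tests_b):
    """Classify the relation between two test requirements A and B.

    tests_a and tests_b are the sets of test cases that satisfy
    requirement A and requirement B respectively.
    """
    a, b = set(tests_a), set(tests_b)
    if a == b:
        # Mutually inclusive: only one of the two needs to be tested
        return "equivalent"
    if a < b:
        # Every test case covering A also covers B: test only A
        return "a_implies_b"
    if b < a:
        # Every test case covering B also covers A: test only B
        return "b_implies_a"
    if a & b:
        # A test case from the intersection covers both requirements
        return "overlap"
    # No shared test case: the requirements cannot be merged
    return "independent"
```

For example, `classify_requirements({"t1"}, {"t1", "t2"})` falls into the inclusion case, because the only test case covering the first requirement also covers the second.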

Test case set simplification based on improved coupling metrics in software error location
After the test cases have been reduced, a key issue to consider is the error detection rate of the test suite. Numerous studies have shown that the error detection capability of a test suite is proportional to its size, increasing as the suite grows [22][23]. Therefore, it is essential to preserve the error detection rate of the test suite when it is reduced. Traditional methods of computing coupling are based on information flow metrics; slice-based metrics, in contrast, are more accurate [24]. Mark Harman proposed computing a slice for each principal variable of a function $f$ and defined the flow from $f$ into a function $g$ in terms of those slices, with the coupling calculated as in Eq. (12). In Eq. (12), the number of slice statements is counted over the reference program, and the cardinality operation takes the number of elements in a set. The defined flow represents the flow from function $f$ to function $g$ within the program that contains both.
The flow is calculated as in Eq. (13). The coupling between the functions $f$ and $g$ is then defined and calculated as in Eq. (14).
The coupling that exists between different modules in an object-oriented program is analysed by substituting the slices of the two modules into this definition, as shown in Eq. (15). In Eq. (15), a backward slice is the set of statements that influence the variable in the slicing criterion, and a forward slice is the set of statements that are influenced by that variable. If the intersection in Eq. (15) is not ∅, then there is a coupling between the two modules, which leads to the formula in Eq. (16), where the slicing criterion consists of the statement number of the slicing point in the program and the attribute variable there, and the resulting value lies between 0 and 1. For two methods, the coupling is calculated as shown in Eq. (17), where the slicing criterion consists of the method being sliced and the significant variable, or set of variables, of that method. The study analyses the backward slices in Eq. (17) and finds that the metric does not explore the different types of statement nodes in the common part of the slices in depth. For example, Fig. 2 shows the slice diagram corresponding to method 1 and method 2 in a program.

Fig. 2. Slice graphs of method 1 and method 2 (statement nodes S2 to S26)

The result of the slicing of the graph consists of the remaining nodes 21, 22, 23, 1, 2 after removing the nodes in that graph. The intersection of the two slicing criteria is a directed graph with statement 17 as the root. Node 18 is a judgment statement: if it evaluates to true, statement 19 is executed; otherwise, statement 20 is executed. If method 1 evaluates statement 18 as true during execution and method 2 evaluates it as false, the statements that the two methods are certain to affect jointly are only statements 9, 10, 17 and 18.
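The distinction drawn in this example can be sketched in Python as a partition of the common part of two slices. The statement sets below are assumptions chosen to mirror the example (the common part is taken to be statements 9, 10, 17, 18, 19 and 20, with 19 and 20 on opposite branches of statement 18); `partition_common_slice` and its parameters are hypothetical names, not the study's tool.

```python
def partition_common_slice(slice_a, slice_b, branch_targets):
    """Split the common part of two slices into certain and possible nodes.

    slice_a, slice_b: sets of statement numbers in each method's slice.
    branch_targets: maps a judgment statement to the statements that run
        on only one of its outcomes, e.g. {18: {19, 20}}.
    Returns (must_nodes, may_nodes).
    """
    common = set(slice_a) & set(slice_b)
    conditional = set()
    for predicate, targets in branch_targets.items():
        if predicate in common:
            conditional |= set(targets)
    # Nodes behind a branch are affected only on some executions
    may_nodes = common & conditional
    # Everything else in the common part is affected on every execution
    must_nodes = common - may_nodes
    return must_nodes, may_nodes


# Assumed slices for method 1 and method 2 (illustrative only)
slice_1 = {1, 9, 10, 17, 18, 19, 20, 21}
slice_2 = {2, 9, 10, 17, 18, 19, 20, 22}
must_nodes, may_nodes = partition_common_slice(slice_1, slice_2, {18: {19, 20}})
```

Under these assumptions the certain nodes are 9, 10, 17 and 18, while 19 and 20 are only possibly affected.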
The coupling calculation must consider all possible intersections of slices between two modules and distinguish their nodes by the probability of being affected. When coupling exists, the intersection of slices must be represented separately for nodes that are certain to be executed and for nodes that may be executed. Once the set of nodes that must be executed when method 1 is coupled with method 2 has been found, the remaining nodes of the intersection are those that have a certain probability of being affected, which transforms Eq. (17) into Eq. (18). Eq. (18) introduces the average probability that the remaining nodes in the slice graph intersection are affected. The probability of each of these remaining nodes being affected is 0.5, so, to simplify the calculation of the metric value, the study takes this average probability to be 0.5. The improved coupling calculation, given in Eq. (19), is more rigorous and better represents the degree of coupling between modules.

In order to balance the relationship between the size of the test suite and its error detection rate, it has been proposed to add redundant test cases to the test suite [15]. The GE reduction algorithm was chosen to reduce the test suite under the first coverage criterion, and redundant test cases that contribute to the second coverage criterion are then retained. This reduces the size of the test suite while preserving its error detection capability, as shown in Fig. 3. As can be seen from Fig. 3, the first step selects the basic (essential) test cases from the test suite, adds them to the reduced set, and removes the test requirements met by these test cases from the first coverage criterion. If a test case meets only test requirements that have already been removed, it is added to the set of redundant test cases that are candidates for retention. The second step selects redundant test cases using the second coverage criterion and adds them to the reduced set: the redundant test case that meets the highest number of remaining test requirements of the second coverage criterion is selected and added to the reduced set, and the test requirements it meets are then deleted. This is repeated until none of the remaining redundant test cases meets a new test requirement of the second coverage criterion, at which point the redundant set is emptied. The final step applies the greedy algorithm [25][26] to the remaining test cases. If the first coverage criterion is not fully covered, the test case that meets the highest number of its remaining test requirements is found and added to the reduced set, and the requirements of the first coverage criterion that it covers are removed. If all the requirements met by another test case have thereby been removed, that test case is added to the redundant set. If the second coverage criterion is not fully covered, the second step is entered again [27][28]. Otherwise, the last step continues until the test requirements in the first coverage criterion are completely covered.
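The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the study's implementation: each test case is mapped to the sets of first-criterion and second-criterion requirements it meets, an essential test case is taken to be the only one meeting some first-criterion requirement, ties are broken by name, and `reduce_with_redundancy` and the sample data are hypothetical.

```python
def reduce_with_redundancy(primary, secondary):
    """Greedy reduction that retains selected redundant test cases.

    primary[t]:   first-criterion requirements met by test case t.
    secondary[t]: second-criterion requirements met by test case t.
    Returns the selected test cases in selection order.
    """
    uncovered1 = set().union(*primary.values())
    uncovered2 = set().union(*secondary.values())
    remaining = set(primary)
    selected = []

    def take(test):
        selected.append(test)
        remaining.discard(test)
        uncovered1.difference_update(primary[test])
        uncovered2.difference_update(secondary.get(test, set()))

    def keep_useful_redundant():
        # Step 2: test cases adding nothing for criterion 1 are redundant;
        # keep those that still add second-criterion coverage.
        redundant = {t for t in remaining if not primary[t] & uncovered1}
        remaining.difference_update(redundant)
        while redundant:
            best = max(sorted(redundant),
                       key=lambda t: len(secondary.get(t, set()) & uncovered2))
            if not secondary.get(best, set()) & uncovered2:
                break  # nothing new for criterion 2: discard the rest
            redundant.discard(best)
            take(best)

    # Step 1: essential test cases are the sole cover of some requirement.
    for req in sorted(uncovered1):
        owners = [t for t in primary if req in primary[t]]
        if len(owners) == 1 and owners[0] in remaining:
            take(owners[0])
    keep_useful_redundant()

    # Step 3: greedy selection on the first criterion.
    while uncovered1:
        best = max(sorted(remaining), key=lambda t: len(primary[t] & uncovered1))
        take(best)
        keep_useful_redundant()
    return selected


# Illustrative data only: branches b1..b4, second-criterion items c1, c2
primary = {"t1": {"b1", "b2"}, "t2": {"b2", "b3"}, "t3": {"b3"},
           "t4": {"b4"}, "t5": {"b1"}}
secondary = {"t1": {"c1"}, "t2": {"c1"}, "t3": {"c2"},
             "t4": set(), "t5": {"c1"}}
reduced = reduce_with_redundancy(primary, secondary)
```

In this toy data, `t3` is redundant for the first criterion once `t2` is selected, but it is retained because it is the only test case meeting `c2`; `t5` is redundant for both criteria and is dropped.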

Analysis of experimental results of coupling metric-based test case simplification in software error location
The experiments start with the implementation of a program slicing and coupling metric tool; the program slicing part begins with lexical and syntactic analysis of the source program. The experiments use the open-source parser generator ANTLR, which automatically generates a parser from a given grammar. A grammar with embedded auxiliary code segments was supplied to ANTLR to construct the lexical and syntactic analyser for the source text. In this experiment, 700 test cases satisfying branch coverage were constructed, and the satisfaction relationship between each test case and the branch coverage requirements was recorded during construction. This group of test cases is called the test case library, and it has a high level of redundancy. Six test case sets are then constructed from the test case library: bms0, bms0 − 0.1, bms0 − 0.2, bms0 − 0.3, bms0 − 0.4 and bms0 − 0.5. During construction, test cases are first selected from the library and added to the test case set. If not all of the branch coverage requirements are met by the set, further test cases that meet the remaining requirements are selected from the library and added until all requirements are met. This experiment compared the coupling metric-based test case set reduction algorithm with four other reduction algorithms, as shown in Fig. 4.

Fig. 4. Experimental process of reduction algorithm comparison

As shown in Fig. 4, branch coverage was used as the first coverage criterion, and each of the five reduction algorithms was used to reduce each group of test cases. MINbr+cp reduces the test suite with the GE algorithm. RSR uses the algorithm for adding redundant test cases, with statement coverage as the second coverage criterion. RUR uses the algorithm for adding redundant test cases, with all-uses coverage as the second coverage criterion, measured with the ATAC tool. RTS uses the algorithm for adding redundant test cases, with the critical points obtained by analysing the program as the second coverage criterion. ROR, the coupling metric-based test case set reduction algorithm of this study, uses the algorithm for adding redundant test cases, with the higher-coupling coupling paths obtained by the analysis in this study as the second coverage criterion. Error points were defined in the program, each method was used to obtain a reduced test suite, and the number of test cases in the reduced suite and the number of errors it detects were recorded. The raw data for each test case set is shown in Fig. 5.

Fig. 6. Scale of test suite and number of error detection after reduction by five methods

As shown in Fig. 6, the reduced test suites produced by the ROR algorithm used in this study contained more test cases than those of the other four methods in all six test case sets, ranging from 15.02 to 23.27 with an average of 19.8. The average reduced suite sizes for the MINbr+cp, RSR, RUR and RTS algorithms were 18.98, 19.31, 19.405 and 19.43, respectively. Similarly, the studied algorithm detected more errors than the other four methods in all six test case sets, with the highest average number of detected errors, 13.64. The corresponding averages for MINbr+cp, RSR, RUR and RTS were 12.52, 12.69, 12.90 and 12.95, respectively. The reduction rates of the test suites after reduction by the five methods are shown in Fig. 7.

Fig. 7. Reduction rate of test case set after reduction by five methods

As shown in Fig. 7, among the five methods, MINbr+cp has the highest reduction rate in each test case set, up to 92.98 %. Its average reduction rate across the six test case sets is about 85.225 %. The average reduction rates for the RSR, RUR, RTS and ROR algorithms are 72.16 %, 71.98 %, 71.93 % and 71.185 %, respectively. As the redundancy in the initial test case set grows with the size of the test suite, the reduction rate of each algorithm increases. Since MINbr+cp does not add redundant test cases, it retains the least redundancy and its reduction rate is the highest. Table 1 shows the error detection loss rates of the test suites reduced by the five methods. [...] .008 %, respectively. This indicates that the ROR reduction studied here performed best in terms of error detection loss. Based on the ROR reduction method in software error location, the number of errors detected after reduction is statistically analysed for the individual test case set samples extracted in the experiment. The error range of the algorithm is further observed and studied over the whole data. The relationship between the error detection rate of the ROR reduction algorithm and the number of reduced test cases is shown in Fig. 8.
From Fig. 8, when the reduced test case set produced by the research method contains 9.98 test cases, the error detection rate of the research model in software error location reaches 92.3 %. As the test case set grows beyond this size, the detection rate decreases; it also decreases when the set contains fewer than 9.98 test cases. Thus, to achieve the best error detection effect, the number of test cases should be within the range 8.97 to 13.26. Fig. 9 shows the error detection capability of individual test cases in the test case sets reduced by the five methods. As can be seen in Fig. 9, the ROR reduction method proposed in the study yields a higher error detection capability per test case in the reduced test set than the other four methods, with a mean of 90.63 % across the six test sets, which is on average 3.94 % to 4.97 % higher than the other methods. Fig. 10 shows the overall efficiency comparison of the five test suite reduction methods.
As can be seen from Fig. 10, the combined reduction efficiency of the ROR method on the six test case sets ranged from 88.25 % to 94.46 %, with an average of 91.05 %. This is better than the other methods, whose efficiencies are all below 90 %. The results show that the reduced test suites of the ROR algorithm are larger than those of the other methods. Its error detection loss rate is smaller, and its reduction capability and efficiency are higher than those of the other four methods. It also provides a better balance between the size and the error detection rate of the reduced test suite. For any test case set, the study method adds test cases with error detection capability, so the number of errors detected after reduction is large and the error detection capability of individual test cases is high. When the test case set is small, the coupling metric introduced by ROR gives it a higher error detection capability per test case. This shows the validity and superiority of coupling metric-based test case set reduction.
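For reference, two of the quantities compared above can be computed as in the following Python sketch. The study does not state its formulas, so the definitions below (the share of test cases removed and the share of detected errors lost) are the conventional ones from the test suite reduction literature and are an assumption here; the function names and the numbers in the usage lines are illustrative and not taken from the experiments.

```python
def reduction_rate(original_size, reduced_size):
    """Percentage of test cases removed by the reduction."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - reduced_size) / original_size


def detection_loss_rate(errors_before, errors_after):
    """Percentage of detected errors lost through the reduction."""
    if errors_before <= 0:
        raise ValueError("errors_before must be positive")
    return 100.0 * (errors_before - errors_after) / errors_before


# Illustrative values only
rate = reduction_rate(100, 20)      # 100 test cases reduced to 20
loss = detection_loss_rate(20, 16)  # 20 errors detected before, 16 after
```

A method that retains redundant test cases trades a lower reduction rate for a lower loss rate, which is the balance the experiments evaluate.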

Conclusions
Software test cases are one of the most critical aspects of software testing. Test suites are often redundant, yet reducing a test suite affects its error detection rate. This study proposes retaining redundant test cases in software error location and introduces a coupling metric into test suite reduction to balance the size of the reduced test suite against its error detection rate. The experiments show that the reduced test suites obtained with the research method, ROR, are larger and detect more errors than those of the other four reduction methods. Since the method adds redundant test cases, it retains more redundancy and its reduction rate is the smallest among the five methods. However, its error detection loss rate is also the smallest, with an average of 17.96 % across the six test case sets. The study method also has the highest error detection capability for individual test cases in the reduced test set, with a mean value of 90.63 % across the six test sets. The combined reduction efficiency of the method on the six test sets ranged from 88.25 % to 94.46 %, with an average of 91.05 %, which is higher than the other methods. The proposed coupling metric-based test case set reduction in software error location is effective and efficient, and achieves a good balance between the size and the error detection rate of the reduced test suite. This study focuses on an improved method of software coupling measurement in software error location and verifies its effectiveness and superiority through experiments. Its main contribution in research and application is to introduce the coupling measure based on program slicing into test case set reduction. This method achieves a better balance between the size of the test case set and its error detection rate after reduction, and so improves the efficiency of software testing in product development. Although this study has achieved good results, there is still room for improvement. Owing to limits on the length of the paper, only the C++ programming language is considered in this study, and other languages are not analysed. Future studies will further discuss the differences between algorithms and frameworks applied in other programming languages, and will design a method to eliminate those differences.