The discriminant statistic based on MPE-MWPE relationship and non-uniform embedding

. The slope fitting line between MPE (multi-scale permutation entropy) and MWPE (multi-scale weighted permutation entropy) is recently proposed as a discriminant statistic for testing the nonlinearity of a time series. The main objective of this paper is to demonstrate that the selection of the optimal parameters of the non-uniform embedding is essential for the proposed discriminant statistic. In particular, the presented case studies indicate that the modified discriminant statistic based on non-uniform embedding can detect differences between such time series which remain indistinguishable if the original approach is used.


Introduction
Permutation entropy is a robust time series tool which provides a quantification measure of the complexity of a dynamic system by capturing the order relations between values of a time series and extracting a probability distribution of the ordinal patterns [1]. Weighted-permutation entropy (WPE) considers the amplitude information which is ignored in the symbolization approach used in PE [3]. PE and WPE can measure the complexity of a time series in a single scale. Multi-scale permutation entropy (MPE) and multi-scale weighted permutation entropy (MWPE) are introduced in [18] and do help to describe the complexity of a time series in multiple scales. It is shown in [2] that a linear correlation between MPE and MWPE exists in multiple scales. The slope of the linear regression between MPE and MWPE is used as a discriminant statistic to detect nonlinearity of financial time series through surrogate data analysis in [2].
The main objective of this paper is to employ case studies to demonstrate that the selection of optimal embedding parameters is an essential step which must be taken before this discriminant statistic (the slope of the linear regression between MPE and MWPE) can be used to characterize the complexity of a time series.
The paper is organized as follows. Short definitions of PE, WPE, MPE, WMPE, the linear regression between MPE and WMPE are given in Section 2. The selection of the optimal embedding parameters is discussed in Section 3. The comparison between the discriminant statistic and the modified discriminant statistic based on non-uniform embedding is presented in Section 4. Computational simulations with synthetic data are discussed in Section 5. Discussions on the novelty and the implications of the results are presented in Section 6. Concluding remarks are given in Section 7.

Preliminaries
This section gives a concise description of algorithms and techniques used in [2].

Permutation entropy and weighted permutation entropy
For a given discrete time series , non-uniform embedding yields the trajectory matrix comprised of following row vectors [15,11]: where is the dimension of the delay coordinate space; is the vector of delays; = , , … , ; = 1,2, … , ; = − + ⋯ + . Note that the trajectory matrix , is constructed using overlapping observation windows (index runs successively throughout the original time series).
Permutation entropy (PE) quantifies the statistics of ordinal permutations [1]. For example, the ordinal pattern of a sequence 8,3,5 is = 3,1,2 because ≤ ≤ . PE of the trajectory matrix Eq. (1) is computed by the following algorithm. First, each row vector , is mapped into an ordinal pattern , = , , ∈ 1,2, … , ! ; where denotes the mapping function. The relative frequency of each ordinal pattern (the probability to observe a particular ordinal pattern in the trajectory matrix) reads: where the membership function σ is equal to one when the logical statement is true, and equal to 0 otherwise; Π is the complete space of ordinal patterns. Finally, PE of the trajectory matrix reads: Analogously, the weighted frequency of each ordinal pattern reads: where is the variance of each row vector in the trajectory matrix: Finally, weighted permutation entropy (WPE) of the trajectory matrix reads:

Multi-scale permutation entropy
Multi-scale analysis is added to the calculation of PE and WPE in [2]. Time series are analyzed at different scales by constructing consecutive coarse-grained time series by averaging a successively increasing number of data points in non-overlapping windows: where represents the scale factor and 1 ≤ ≤ int( / ). For a fixed , PE and MPE are calculated for resulting into the multi-scale permutation entropy (MPE) and the multi-scale weighted permutation entropy (WMPE). Note that MPE and WMPE are closely related to the Shannon information index [19].

Linear regression between MPE and WMPE
At fixed and , the computation of MPE and WMPE is performed at = 1,2, … ,20. Then, the slope of the fitting line (produced by the linear regression algorithm) between MPE and WMPE is proposed as a new discriminant statistic in [2].

The selection of optimal embedding parameters
Chen and Shang [2] follow the recommendations from Bandt and Pompe [1] and use = 3,4, … ,7. However, all time delays are fixed to 1 "for the practical purposes" (as stated in [2]).
It has been reported in a number of studies that non-uniform embedding helps to better reveal the properties of nonlinear time series compared to uniform embedding when all time delays are equal [15,10,9]. Moreover, the dynamical properties of the reconstructed attractor are best revealed when the dimension of the delay coordinate space is optimal [14].
The role of the optimal time delay can be illustrated by the following two-dimensional example. It is well known that the selection of a particular time delay does not influence the geometric shape of the reconstructed attractor if the embedded time series is completely random [14,11]. However, the situation is completely different if the embedded time series does represent a deterministic process (even if the signal to noise ratio is low). Fig. 1 depicts three different time series (a noise-free harmonic function, a numerical solution to the chaotic Rossler system [12], and a random signal) embedded at = 1 and at the optimal time delay (determined by the algorithm presented in [17]). It can be clearly seen that the geometric shape of the reconstructed attractor in the delay coordinate space does depend on the time delay if only the embedded time series is not a random time series.
The purpose of this comment is to demonstrate that the selection of optimal embedding parameters is an essential step which must be performed before computing the slope of the fitting line between MPE and WMPE.

Optimal non-uniform embeddings of two real-world time series
Let us consider two time series nv515.dat and qbirths.dat (Fig. 2); both are standard time series available at The Time Series Data Library [5]. Time series qbirths.dat does contain the number of daily births in Quebec (Canada) in the period between January 01, 1977 and December 31, 1990. Time series nv515.dat contains normalized tree-ring widths in dimensionless units (each tree-ring corresponds to one year vegetation period of the tree). Data are collected at Indian Pilo Garden in Nevada (USA) during the year 1980.
First of all, we determine the optimal dimension of the delay coordinate space for each individual time series. The classical FNN algorithm [6] yields the optimal dimension = 5 for both time series. Next, optimal time delays are individually determined for both time series by using the combinatorial optimization algorithm presented in [17]. This optimization algorithm is based on the maximization of the average area of the embedded attractor in all possible planar projections [17]. It appears that optimal time delays are: = 14, 14, 33, 12 for time series nv515.dat and = 2, 1, 2, 2 for time series qbirths.dat.  reconstructed attractors in Fig. 3 and Fig. 4 reveals that the projections of the attractor are more compressed along the hypotenuse in Fig. 3 (especially in parts A to D where time delays are equal to 1). The dynamical features of the reconstructed attractor are best revealed when the set of optimal time delays is used for its reconstruction [17] (Fig. 4). The notations on the axes correspond to the notation used in Fig. 2

Considerations about the optimal embedding dimension
As mentioned previously, the FNN algorithm suggests that the optimal embedding dimension for time series nv515.dat and qbirths.dat is = 5. However, the embedding dimension should be determined taking into account also the length of the time series [1]. The length of the time series should be much larger than ! in order for all possible permutations to have a chance to appear [1]. As a rule of thumb, the condition 5 ! is widely adopted in the literature as a convention [13].
The lengths of time series nv515.dat and qbirths.dat are 4351 and 5113 correspondingly. Therefore, the condition 5 ! is conveniently satisfied for both time series. However, this becomes a serious issue when coarse-grained time series are considered ( ( / ) 5 !). Acceptable and unacceptable embeddings for both time series are depicted in Table 1 and Table  2. Note that optimal time delays are determined separately for each embedding dimension .

The comparison between the discriminant statistic and the modified discriminant statistic
The algorithm presented in [2] is executed for both time series (nv515.dat and qbirths.dat) at = 5 and = 1,1,1,1 . Relationships between MPE and MPWE are depicted in Fig. 5. The discriminant statistic (linear regression coefficient) between MPE and MPWE for time series nv515.dat at = 1,1,1,1 is 0.6665 (Fig. 5(a)); linear regression coefficient between MPE and MPWE for time series qbirths.dat at = 1, 1, 1, 1 is 0.6639 (Fig. 5(b)). As noted previously, such computations are unreliable because the condition ( / ) 5 ! does not hold true for all s (gray circles denote such MPE -MPWE points where the condition is broken).  6. The discriminant statistic presented in [2] cannot determine a large difference between time series nv515.dat and qbirths.dat at = 5. The modified discriminant statistic (based on optimal embedding parameters) shows a large difference between time series nv515.dat and qbirths.dat at = 4. Linear regressions between MPE and MPWE for nv515.dat and qbirths.dat at = 1,1,1,1 , = 5 are shown in parts A and B respectively. Linear regressions between MPE and MPWE for nv515.dat (at = 46,45,50 ) and for qbirths.dat (at = 1,2,2 ) at = 4 are shown in parts C and D respectively.

Computational simulations with synthetic data
Computational simulations are continued with the chaotic Rössler system which is a paradigmatic model of chaotic dynamics [12]: where constants , , are set to = 0.1, = 0.1, = 14. The Rössler time series is generated by integrating the system of equations in Eq. (9) and by selecting every tenth value of (the time step is set to 0.01). Different realizations of the discrete Gaussian random noise with zero mean are added to the synthetic Rössler time series. Optimal embedding dimensions and optimal time lags for the Rössler time series with 10 %, 50 % and 200 % noise levels are shown in Table 3 [16]. Linear regressions between MPE and MPWE for the Rössler time series with 10 %, 50 % and 200 % noise levels (all time lags are set to 1) are shown in Fig. 6(a, c, e). Linear regressions at non-uniform embeddings with optimal time lags are shown in Fig. 6(b, d, f). The discriminant statistic (reconstructed by using uniform embedding) and the modified discriminant statistic (reconstructed by using non-uniform embedding) are shown in Table 3.
Two important conclusions can be done from the results depicted in Table 6. Firstly, the modified discriminant statistic is able to determine larger differences between different time series. Secondly, the difference between and becomes smaller as the noise level becomes larger. This fact does correspond well to the fact that any time delays are equally good (in terms of the area occupied by the embedded attractor in the delay coordinate space) if the embedded time series is the white noise [17].
It is also important to evaluate the uncertainty of the modified discriminant statistic . As discussed previously, the real-world time series nv515.dat and qbirths.dat are relatively short (Fig. 5). However, the chaotic Rössler time series is generated using computational algorithms without any restrictions in its length. Therefore, the slope coefficients (the modified discriminant statistic ) is computed ten consecutive times in ten non-overlapping observation windows. The confidence interval for using the three-sigma rule ( = 0.05) is computed according to the results of repetitive computational experiments.
The confidence intervals of for the Rössler  to observe that the discriminant statistic stays outside the confidence intervals of ( Table 3). This is a rather natural and expected results because the chaotic Rössler time series is a stationary ergodic time series [11]. Of course, the situation would be different if the analyzed time series would be non-stationary time series (the value of would depend on the location of the observation window).  Fig. 6 parts A, C, and E. Linear regressions for the Rössler time series with 10 %, 50 % and 200 % noise levels (optimal non-uniform time lags are set according to Table 1) are shown in Fig. 6 parts B, D, and F

Discussions
The main objective of this paper is to demonstrate that the selection of optimal time delay(s) is an essential step in the application of discriminant statistic based on the relation between MPE and MWPE.
The objective of time delay selection methods is making components of the reconstructed vectors independent as far as possible -yet not too far, in order to keep the information about dynamic properties of the embedded time series. Moreover, optimal time lags do also result into the maximal area (in average) in all possible planar projections of the embedded attractor. Setting all time delays to 1 results into higher correlations between the components of the reconstructed vectors -what has a direct impact to the computation of MPE and MWPE. Convincing demonstrations of these effects are given in Figs. 3 and 4.
The optimization of time delays can be ignored if the time series is random, or the signal to noise ratio is very low. As mentioned previously, the selection of particular time delays is meaningless if the embedded time series does represent a random noise. Otherwise, the geometric shape of embedded attractors (and the patterns of components of the reconstructed vectors) do depend on time delays.
For example, the shapes of the planar attractors in Fig. 3 are more compressed along the diagonal if compared to Fig. 4 (this effect is especially clearly seen in parts A to D). In other words, the components of the reconstructed vectors in the trajectory matrix are more correlated in Fig. 3 than in Fig. 4.
This effect is clearly represented in the discriminant statistic based on the relation between MPE and MWPE (Fig. 5). The discriminant statistic does not show a big difference between time series nv515.dat and qbirths.dat when all time delays are set to 1. The percentage difference between the two discriminant statistic values is 0.39 % (0.6665 versus 0.6639). However, the discriminant statistic shows a much higher percentage difference when time delays are optimal: 14.64 % (0.5539 versus 0.4728).
Spearman's rank correlation coefficient is used to assess how well the relationship between MPE and MPWE can be described using a linear function. Note that a linear fit between the two variables is better for time series nv515.dat than qbirths.dat (Fig. 5). Short optimal time delays for time series qbirths.dat 1, 2, 2 show that independent components of the trajectory matrix cannot be separated far away without destroying the information about dynamic properties of the embedded time series. Of course, a thorough investigation of the search space of time delays would be required to identify how strong is the global maximum 1, 2, 2 in respect to other local maximums. That would help to determine how different is this time series from the random noise [17]. Nevertheless, the fact that Spearman's rank correlation coefficient for births.dat is lower than for nv515.dat is not astonishing. On the other hand, the comparison of Spearman's rank correlation coefficient for Fig. 5(b) and Fig. 5(d) is meaningless because computational experiments depicted in Fig. 5(b) are based on unacceptable embeddings.  Optimal non-uniform time delays can be also useful for estimating basic ordinal quantifiers (PE, WPE, MPE, WMPE) -not only the proposed discriminant statistic based on the relation between MPE and WMPE. However, optimal non-uniform time delays are also useful for the extraction of much more complex ordinal features (two-dimensional patterns of PE could be a typical application discussed in detail in [16,7]. In other words, feature extraction algorithms based on ordinal quantifiers can be conditionally classified into the following multiscale structure according to their complexity. The basic ordinal quantifiers (PE, WPE, MPE, WMPE) could be considered as the basic elements of such a multiscale feature extraction structure. The discriminant statistic (relating MPE and MWPE) can be placed in the middle of this multiscale structure. Finally, two-dimensional patterns of PE would be the most complex elements in this structure of ordinal quantifiers. Fig. 9. Two-dimensional patterns of PE for qbirths.dat time series ( = 5; = 2, 1, 2, 2). Pairwise relaxation of time delays [A] results into six plane images: 2,1,i,j (part A); 2,i,2,j (part B); 2,i,j,2 (part C); i,1,2,j (part D); i,1,j,2 (part E); i,j,2,2 (part F) ); , = 1...50. The average of all six parts is shown in part G Two-dimensional patterns of PE for nv515.dat and qbirths.dat time series are depicted in Fig. 7(g) and Fig. 8(g) accordingly (the resolution of those patterns is set to 50×50 pixels). As mentioned previously, the optimal embedding dimension for both time series is = 5 (the set of optimal time delays for nv515.dat is = 14, 14, 33, 12; the set of optimal time delays for qbirths.dat is = 2, 1, 2, 2). Pairwise relaxation of time delays [16] results into six plane images for each time series: 14,14,i,j ( Fig. 7(a)); 14,i,33,j ( Fig. 7(b)); 14,i,j,12 (Fig. 7(c)); i,14,33,j ( Fig. 7(d)); i,14,j,12 ( Fig. 7(e)); i,j,33,12 ( Fig. 7(f)); 2,1,i,j ( Fig. 8(a)); 2,i,2,j ( Fig. 8(b)); 2,i,j,2 ( Fig. 8(c)); i,1,2,j ( Fig. 8(d)); i,1,j,2 ( Fig. 8(e)); i,j,2,2 ( Fig. 8(f)); , = 1,...,50. It is shown in [7] that deep learning based convolutional neural networks can classify the averaged patterns of PE ( Fig. 7(g) and Fig. 8(g)) with an extremely high accuracy. Indeed, a naked eye can observe clear differences between Fig. 7(g) and Fig. 8(g).
The main objective of this article is to highlight the fact that non-uniform time delays play a central role not only at the top level (in terms of complexity) of ordinal quantifiers. A proper assessment of optimal time delays is crucial also in the middle level of this structure (the discriminant statistic). The latter observation is even more important because the original discriminant statistic presented in [2] does not pay any attention to non-uniform embeddings.

Conclusions
Some problems regarding the use of non-overlapping windows for coarse graining, for the original MSE algorithm, are addressed in [4]. For example, the embedding problems related to the length of the time series could be almost completely eliminated if the coarse-grained time series would be constructed using overlapping windows. However, such a construction of a coarse-grained time series would result into a simple moving average -what would induce another problems discussed in [8]. Therefore, the construction of coarse-grained time series in this paper is performed using non-overlapping windows only.
The discriminant statistic presented in [2] is an interesting statistical parameter which can be used to characterize time series. However, this discriminant statistic is a result of a statistical algorithm. Therefore, it must be used with care (that statement holds true for any statistical algorithm in general). In particular, one should determine the optimal embedding dimension and the optimal set of time delays before running this algorithm for any time series.
In any case, it is always recommended to couple feature extraction algorithms based on ordinal quantifiers with the optimal parameters of non-uniform embeddings (the optimal embedding dimension and the optimal set of non-uniform time delays).