Canonical Correlation Analysis of Global Climate Elements and Rainfall in the West Java Regions

Doi: 10.24042/djm.v3i2.5870

Indonesia has a diverse climate influenced by several global phenomena such as the El Nino Southern Oscillation (ENSO), the Indian Ocean Dipole (IOD), and the Asian-Australian Monsoon. Continuous climate change indirectly causes hydrometeorological disasters. The purpose of this study was to analyze the simultaneous relationship between global climate elements (ENSO, IOD, Asian-Australian Monsoon) and rainfall in the West Java regions (Bogor Regency, Bandung Regency, Sukabumi Regency, Garut Regency, and Kuningan Regency). The selection of the five regions was based on the natural disaster reports of Badan Nasional Penanggulangan Bencana (BNPB). The research method was quantitative, using a multivariate analysis technique called canonical correlation analysis. The results indicate that there is a simultaneous relationship between global climate elements and rainfall in the West Java regions, with a canonical correlation of 0.819. The global climate element and the regional rainfall that most influenced the relationship were the Asian-Australian Monsoon and Kuningan Regency rainfall.


INTRODUCTION
Indonesia's strategic geographical position gives it a diversity of weather and climate. Climate diversity in Indonesia is influenced by several global phenomena such as the El Nino Southern Oscillation (ENSO) in the Pacific Ocean, the Indian Ocean Dipole (IOD) in the Indian Ocean, and the Asian-Australian Monsoon (Ridwan, 2019).
Climate changes because of interactions between climate elements, including interactions between the global phenomena mentioned earlier. Continuous climate change indirectly causes hydrometeorological disasters. According to Aldrian et al. (2011), hydrometeorological disasters consist of landslides, floods, droughts, hurricanes, tidal waves, abrasion, and forest and land fires. Based on the natural disaster reports of Badan Nasional Penanggulangan Bencana (BNPB), Bogor Regency, Bandung Regency, Sukabumi Regency, Garut Regency, and Kuningan Regency are the five regions most prone to hydrometeorological disasters in West Java over the past 10 years.
Canonical correlation analysis is a multivariate analysis technique that aims to measure the relationship between a set of independent variables and a set of dependent variables simultaneously. It was first introduced by Hotelling (1936), who measured the correlation between arithmetic speed and arithmetic power on the one hand and reading speed and reading power on the other.
Canonical correlation analysis has been applied by several researchers in various cases. Chaghooshi et al. (2015) applied canonical correlation analysis to analyze the relationship between supply chain quality management and the competitive advantages of Sahami Alyaf (SA) Company, Iran. Irianingsih et al. (2016) analyzed the relationship between learning behavior and learning achievement in students of SMPN 1 Sukasari Purwakarta with canonical correlation analysis. Rustiana et al. (2017) conducted a study on the prediction of rainfall in the Cimanuk watershed with canonical correlation analysis. However, none of these studies tested the canonical correlation analysis assumptions completely, i.e., linearity, multivariate normality, homoscedasticity, and the absence of multicollinearity between variables within a set of variables. Therefore, the authors are interested in studying canonical correlation analysis with all of its prerequisite assumption tests included, in order to explain the relationship between global climate elements, i.e., ENSO, IOD, and the Asian-Australian Monsoon, and rainfall in the West Java regions, i.e., Bogor Regency, Bandung Regency, Sukabumi Regency, Garut Regency, and Kuningan Regency.

METHOD
This research uses a quantitative method based on a multivariate analysis technique called canonical correlation analysis. The analysis measures the linear relationship between a set of independent variables and a set of dependent variables; canonical variables are formed for each set. A canonical variable is a linear combination of a set of variables. Canonical correlation analysis constructs a canonical function that maximizes the canonical correlation coefficient between two canonical variables (Hair, et al., 2009).
The analysis begins with testing the prerequisite assumptions of canonical correlation analysis for each variable. If any assumption is not met, a variable transformation is needed. If all assumptions are met, the analysis proceeds to determining the canonical functions and coefficient estimators. After the canonical correlation coefficients are obtained, the next step is to test the significance of the canonical correlations. If there is a significant canonical correlation coefficient, it is followed by a redundancy analysis. The final step is the interpretation of the canonical variables.

The prerequisite assumptions of canonical correlation analysis
The following prerequisite assumptions must be fulfilled in canonical correlation analysis.
a. Linearity, i.e., a linear relationship between the independent variables and the dependent variables. Linearity affects two aspects of canonical correlation results. First, the canonical correlation coefficient of a pair of canonical variables is based on a linear relationship. Second, canonical correlation analysis maximizes linear relationships between two sets of variables (Hair, et al., 2009). Suppose there are $p$ independent variables $X_1, X_2, \ldots, X_p$ and $q$ dependent variables $Y_1, Y_2, \ldots, Y_q$. The linearity assumption is tested with ANOVA, or with the ANOVA lack-of-fit test when the independent variable contains repeated values.
b. Multivariate normality. The data in each set of variables should follow a multivariate normal distribution. In this study it is assessed from a plot of the Mahalanobis distances against the quantiles of the chi-square distribution.
c. Homoscedasticity. This assumption is important in canonical correlation analysis because its opposite, heteroscedasticity, can reduce the correlations between variables (Hair, et al., 2009). Homoscedasticity can be checked with the Glejser test.
d. No multicollinearity between variables within a set of variables. Gozhali (in Yudiaatmaja, 2013) states that multicollinearity occurs when two or more variables correlate very strongly. A very strong correlation means a correlation coefficient $r \geq 0.9$.
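Two of these checks are easy to sketch numerically. The Python sketch below is illustrative only: the function names `mahalanobis_qq` and `collinear_pairs`, the simulated data, and the use of the absolute correlation in the 0.9 rule are assumptions, not part of the study (which used SPSS 23).

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_qq(data):
    """Return sorted squared Mahalanobis distances and matching chi-square quantiles.

    Points lying close to a straight line suggest multivariate normality.
    """
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    centered = data - data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    # Squared distance of every observation from the sample mean.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    probs = (np.arange(1, n + 1) - 0.5) / n
    return np.sort(d2), chi2.ppf(probs, df=d)

def collinear_pairs(data, threshold=0.9):
    """List variable pairs (i, j, r) whose absolute correlation reaches the threshold."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    d = corr.shape[0]
    return [(i, j, corr[i, j])
            for i in range(d) for j in range(i + 1, d)
            if abs(corr[i, j]) >= threshold]

# Simulated (hypothetical) data: 108 observations, 3 independent columns,
# plus a fourth column that is almost a copy of the first.
rng = np.random.default_rng(0)
sample = rng.normal(size=(108, 3))
sample = np.column_stack([sample, sample[:, 0] + 0.01 * rng.normal(size=108)])

d2_sorted, chi2_quantiles = mahalanobis_qq(sample[:, :3])
pairs = collinear_pairs(sample)
```

Plotting `d2_sorted` against `chi2_quantiles` gives the kind of chi-square plot described in item b, and `pairs` flags the variable pairs that would be candidates for removal under item d.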

Canonical functions and coefficient estimators determination
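Suppose there are $p$ independent variables $X_1, X_2, \ldots, X_p$, denoted as the random vector $\mathbf{X}$, and $q$ dependent variables $Y_1, Y_2, \ldots, Y_q$, denoted as the random vector $\mathbf{Y}$.
The characteristics of the random vectors $\mathbf{X}$ and $\mathbf{Y}$ are as follows.
$$E(\mathbf{X}) = \boldsymbol{\mu}_X, \quad E(\mathbf{Y}) = \boldsymbol{\mu}_Y, \quad \operatorname{Cov}(\mathbf{X}) = \boldsymbol{\Sigma}_{XX}, \quad \operatorname{Cov}(\mathbf{Y}) = \boldsymbol{\Sigma}_{YY}, \quad \operatorname{Cov}(\mathbf{X}, \mathbf{Y}) = \boldsymbol{\Sigma}_{XY} = \boldsymbol{\Sigma}_{YX}'$$
Canonical correlation is obtained by measuring the linear relationship between a linear combination of $\mathbf{X}$ and a linear combination of $\mathbf{Y}$. To determine the linear combinations, the two variable sets can be arranged into
$$U = a_1 X_1 + a_2 X_2 + \cdots + a_p X_p = \mathbf{a}'\mathbf{X}$$
$$V = b_1 Y_1 + b_2 Y_2 + \cdots + b_q Y_q = \mathbf{b}'\mathbf{Y}$$
so
$$\operatorname{Var}(U) = \mathbf{a}' \operatorname{Cov}(\mathbf{X})\, \mathbf{a} = \mathbf{a}' \boldsymbol{\Sigma}_{XX}\, \mathbf{a}$$
$$\operatorname{Var}(V) = \mathbf{b}' \operatorname{Cov}(\mathbf{Y})\, \mathbf{b} = \mathbf{b}' \boldsymbol{\Sigma}_{YY}\, \mathbf{b}$$
$$\operatorname{Cov}(U, V) = \mathbf{a}' \operatorname{Cov}(\mathbf{X}, \mathbf{Y})\, \mathbf{b} = \mathbf{a}' \boldsymbol{\Sigma}_{XY}\, \mathbf{b}$$
The number of linear combination pairs formed by $\mathbf{X}$ and $\mathbf{Y}$ is defined as $k = \min(p, q)$. Canonical correlation is obtained by
$$\operatorname{Corr}(U, V) = \frac{\mathbf{a}' \boldsymbol{\Sigma}_{XY}\, \mathbf{b}}{\sqrt{\mathbf{a}' \boldsymbol{\Sigma}_{XX}\, \mathbf{a}}\, \sqrt{\mathbf{b}' \boldsymbol{\Sigma}_{YY}\, \mathbf{b}}}$$
Johnson and Wichern (1998) state that the $k$th pair of canonical variables is the pair of linear combinations $U_k$ and $V_k$ having unit variances, which maximize the correlation among all choices uncorrelated with the previous $k - 1$ canonical variable pairs. Therefore, using the Lagrange multiplier method, the coefficient vectors $\mathbf{a}$ and $\mathbf{b}$ that maximize the correlation between $U$ and $V$ can be obtained by determining the eigenvectors of the matrices $\boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY} \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{YX}$ and $\boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY}$. The square root of each eigenvalue shared by the two matrices is the correlation coefficient between $U$ and $V$ (Qiu, et al., 2016).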
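The characteristics of the canonical variables, which follow from the definition above, are as follows.
a. Each canonical variable has unit variance: $\operatorname{Var}(U_k) = \operatorname{Var}(V_k) = 1$.
b. Canonical variables from different pairs are uncorrelated: $\operatorname{Cov}(U_k, U_l) = \operatorname{Cov}(V_k, V_l) = \operatorname{Cov}(U_k, V_l) = 0$ for $k \neq l$.
c. The correlation within a pair is the canonical correlation: $\operatorname{Corr}(U_k, V_k) = \rho_k$.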
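The eigenvector route described in this section can be sketched with NumPy as follows. This is a minimal illustration using sample covariance matrices in place of the population matrices; the function name `canonical_correlations` and the simulated data are assumptions and do not reproduce the SPSS 23 computation used in the study.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations and unit-variance coefficient vectors.

    Uses the eigenvectors of Sxx^-1 Sxy Syy^-1 Syx, with sample covariances.
    Columns of A and B hold the coefficient vectors a_k and b_k.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p, q = X.shape[1], Y.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    Sxx, Sxy = S[:p, :p], S[:p, p:]
    Syx, Syy = S[p:, :p], S[p:, p:]

    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Syx)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    k = min(p, q)
    eigvals = eigvals.real[order][:k]
    A = eigvecs.real[:, order][:, :k]

    # Canonical correlations are the square roots of the eigenvalues.
    rho = np.sqrt(np.clip(eigvals, 0.0, 1.0))
    # b_k is proportional to Syy^-1 Syx a_k, which keeps Corr(U_k, V_k) positive.
    B = np.linalg.solve(Syy, Syx) @ A
    # Rescale so that every canonical variable has unit sample variance.
    A = A / np.sqrt(np.einsum("ik,ij,jk->k", A, Sxx, A))
    B = B / np.sqrt(np.einsum("ik,ij,jk->k", B, Syy, B))
    return rho, A, B

# Simulated (hypothetical) data with one shared latent signal.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
X_demo = latent @ np.array([[1.0, 0.5, 0.0]]) + rng.normal(size=(200, 3))
Y_demo = latent @ np.array([[0.8, 0.0, 0.3]]) + rng.normal(size=(200, 3))

rho, A, B = canonical_correlations(X_demo, Y_demo)
U1 = X_demo @ A[:, 0]
V1 = Y_demo @ B[:, 0]
```

Here `U1` and `V1` play the role of the first canonical variable pair: their sample correlation equals `rho[0]`, and each has unit sample variance.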

The significance test of canonical correlation
There are two hypotheses tested in canonical correlation analysis, namely:
1. Overall Canonical Correlation Test
H0: $\rho_1 = \rho_2 = \cdots = \rho_k = 0$ (all canonical correlations are not significant)
H1: $\rho_i \neq 0$ for at least one $i$ ($i = 1, 2, \ldots, k$; there is at least one significant canonical correlation)
Test statistic: a chi-square statistic computed from the canonical correlations, where $n$ is the number of observations.
Decision criterion: reject H0 if the test statistic exceeds $\chi^2_{\alpha}$ with the corresponding degrees of freedom at significance level $\alpha$.
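One commonly used statistic for this overall hypothesis is Bartlett's chi-square approximation to Wilks' lambda, $B = -\left[n - 1 - \tfrac{1}{2}(p + q + 1)\right] \ln \prod_{i=1}^{k} (1 - \rho_i^2)$ with $pq$ degrees of freedom. The sketch below assumes that form; whether it matches the statistic used in this study is an assumption, and the function name `bartlett_overall_test` and the second and third correlations in the example are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_overall_test(rho, n, p, q, alpha=0.05):
    """Bartlett's chi-square approximation for H0: all canonical correlations are zero."""
    rho = np.asarray(rho, dtype=float)
    wilks_lambda = np.prod(1.0 - rho ** 2)
    statistic = -(n - 1 - 0.5 * (p + q + 1)) * np.log(wilks_lambda)
    df = p * q
    critical = chi2.ppf(1.0 - alpha, df)
    p_value = chi2.sf(statistic, df)
    return statistic, df, critical, p_value

# 0.819 is the reported first canonical correlation; 0.2 and 0.05 are
# made-up values standing in for the remaining two functions.
statistic, df, critical, p_value = bartlett_overall_test(
    [0.819, 0.2, 0.05], n=108, p=3, q=3
)
reject_h0 = statistic > critical
```

With three variables in each set the reference distribution has 9 degrees of freedom, and H0 is rejected whenever `statistic` exceeds `critical`.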

Redundancy Analysis
Redundancy measures the proportion of the total variance in one set of variables that can be explained by the canonical variables of the dependent variables and the independent variables.

• The redundancy explained by each of the two canonical variables in a pair is defined following Rencher (1998).
• The redundancy index explained by each of the two canonical variables in a pair is defined following Hair et al. (2009).
The redundancy between $U_k$ and $V_k$ is measured by squaring the canonical correlation coefficient, $\rho_k^2$, and is called the redundancy coefficient (Rencher, 1998).

Interpretation of canonical variables
Hair, et al. (2009) explained that there are three methods for interpreting canonical variables, namely:
1. Canonical Weights
Canonical weights are the canonical coefficients $\mathbf{a}$ and $\mathbf{b}$ multiplied by the standard deviations of the corresponding variables so that they become standardized. Canonical weights are interpreted as the contribution of the original variables to the canonical variables (Rencher, 1998).

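2. Canonical Loadings
Canonical loadings, also referred to as canonical structure correlations, are the simple linear correlations between the original variables and their own canonical variable. The canonical loadings of the variables $\mathbf{X}$ are obtained by the following formula.
$$\mathbf{r}_{XU} = \mathbf{R}_{XX}\, \mathbf{a}_Z \qquad (9)$$
where $\mathbf{R}_{XX}$ is the correlation matrix of the variable vector $\mathbf{X}$ and $\mathbf{a}_Z$ is the standardized canonical coefficient vector of $U$. The canonical loadings of the variables $\mathbf{Y}$ are obtained by the following formula.
$$\mathbf{r}_{YV} = \mathbf{R}_{YY}\, \mathbf{b}_Z \qquad (10)$$
where $\mathbf{R}_{YY}$ is the correlation matrix of the variable vector $\mathbf{Y}$ and $\mathbf{b}_Z$ is the standardized canonical coefficient vector of $V$ (Rencher, 1998).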

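3. Canonical Cross-Loadings
The canonical cross-loadings are the correlations between the original variables and the canonical variable of the opposite set. The canonical cross-loadings of the variables $\mathbf{X}$ are obtained by the following formula.
$$\mathbf{r}_{XV} = \mathbf{R}_{XY}\, \mathbf{b}_Z \qquad (11)$$
where $\mathbf{R}_{XY}$ is the cross-correlation matrix between $\mathbf{X}$ and $\mathbf{Y}$ and $\mathbf{b}_Z$ is the standardized canonical coefficient vector of $V$. The canonical cross-loadings of the variables $\mathbf{Y}$ are obtained by the following formula.
$$\mathbf{r}_{YU} = \mathbf{R}_{YX}\, \mathbf{a}_Z$$
where $\mathbf{R}_{YX} = \mathbf{R}_{XY}'$ and $\mathbf{a}_Z$ is the standardized canonical coefficient vector of $U$.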
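The three interpretation quantities, together with a redundancy index, can be sketched in a few lines. The sketch assumes the commonly used definitions: loadings as the within-set correlation matrix times the standardized coefficients, cross-loadings as the loadings scaled by the canonical correlation (an identity that holds for canonical variables with unit variance), and a redundancy index equal to the mean squared loading times the squared canonical correlation. The function names and all numbers below are hypothetical.

```python
import numpy as np

def canonical_loadings(corr_within, coef_std):
    """Loadings R a_Z, after rescaling a_Z so the canonical variable has unit variance."""
    corr_within = np.asarray(corr_within, dtype=float)
    coef_std = np.asarray(coef_std, dtype=float)
    scale = np.sqrt(coef_std @ corr_within @ coef_std)
    return corr_within @ (coef_std / scale)

def cross_loadings(loadings, rho):
    """Correlations with the opposite canonical variable: rho times the loadings."""
    return rho * np.asarray(loadings, dtype=float)

def redundancy_index(loadings, rho):
    """Mean squared loading (shared variance) times the squared canonical correlation."""
    return np.mean(np.asarray(loadings, dtype=float) ** 2) * rho ** 2

# Hypothetical two-variable set, not data from this study.
R_xx = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
a_z = np.array([0.6, 0.5])
rho_demo = 0.8

x_loadings = canonical_loadings(R_xx, a_z)
x_cross = cross_loadings(x_loadings, rho_demo)
x_redundancy = redundancy_index(x_loadings, rho_demo)
```

Because the cross-loadings are just the loadings shrunk by the canonical correlation, the ranking of variables is the same under both measures; only the magnitudes differ.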

Data
The data used in this study are the rainfall indices of five West Java regions (Bogor Regency, Bandung Regency, Sukabumi Regency, Garut Regency, and Kuningan Regency), the Nino3.4 index, the DMI (Dipole Mode Index), and the AUSMI (Australian Monsoon Index) from January 2010 to December 2018, a total of 108 monthly observations.
The rainfall index data is obtained from calculations on the Global Satellite Mapping of Precipitation (GSMap) rainfall data using the equation defined by Mulyana in Ningsih and Putranto (2019) as follows.

$$\text{Rainfall index} = \frac{x_i - \bar{x}}{\sigma}$$
where $x_i$ is the $i$th monthly rainfall data; $\bar{x}$ and $\sigma$ are respectively the average and standard deviation of the rainfall over a certain time period. Nino3.4 index data were obtained from the National Oceanic and Atmospheric Administration (NOAA) website. DMI data were obtained from the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) website. AUSMI data were obtained from Lembaga Penerbangan dan Antariksa Nasional (LAPAN) Bandung.
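The rainfall index is a standardization of the monthly series, which a short sketch makes concrete. The function name `rainfall_index`, the sample values, and the choice of the sample standard deviation (`ddof=1`) are assumptions; the source does not say which standard deviation was used.

```python
import numpy as np

def rainfall_index(monthly_rainfall):
    """Standardize monthly rainfall: subtract the mean, divide by the standard deviation."""
    x = np.asarray(monthly_rainfall, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption, not stated in the source.
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical monthly rainfall totals in millimetres.
rain_mm = np.array([310.0, 280.0, 250.0, 190.0, 120.0, 60.0,
                    40.0, 35.0, 70.0, 150.0, 240.0, 300.0])
index = rainfall_index(rain_mm)
```

The resulting index is unitless, with positive values for wetter-than-average months and negative values for drier-than-average months, which puts the five regions on a common scale.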

RESULTS AND DISCUSSION
The prerequisite assumption test results of correlation analysis for data
a. Linearity Test Results
Linearity testing of variables $X_1$ and $X_2$ on each variable $Y_j$ ($j = 1, 2, 3, 4, 5$) is done by the ANOVA lack-of-fit test because there are repeated data on variables $X_1$ and $X_2$. Based on the linearity test procedure carried out with SPSS 23, the results of the linearity test can be seen in Table 1, Table 2, and Table 3. The ANOVA test in Table 3 tests the linear effect and gives $F > F_{0.01(1,106)} = 6.88$, so H0 is rejected, and it can be concluded that there is a linear effect between variable $X_3$ and each variable $Y_j$ ($j = 1, 2, 3, 4, 5$).

b. Multivariate Normality Test Results
Based on Figure 1 and Figure 2, the plots of the Mahalanobis distance against the quantiles of the chi-square distribution for the set of dependent variables and the set of independent variables resemble a straight line. It means that the data in the set of dependent variables and the data in the set of independent variables follow a multivariate normal distribution.

c. Heteroscedasticity Test Results
Homoscedasticity is tested with the Glejser test, which regresses the absolute value of the residuals of each dependent variable on the independent variables. Out of five dependent variables and three independent variables, the Glejser test results indicate heteroscedasticity at $Y_2$ and $Y_5$, because the test statistic exceeds its critical value. These results can be seen in Table 4 and Table 5. To overcome the heteroscedasticity, the authors apply the neglog transformation, defined as
$$x_i^{*} = \operatorname{sign}(x_i)\, \ln(|x_i| + 1)$$
where $x_i$ is the $i$th observation of variable $X$ for $i = 1, 2, \ldots, n$ (Whittaker, et al., 2005), to all variables. After the transformation, the linearity, multivariate normality, and homoscedasticity tests are repeated on the new variables. The results of the three tests indicate that all three assumptions are satisfied.

d. Multicollinearity Test Results
Based on the results of calculations with SPSS 23, the correlations between variables $Y_1$ and $Y_3$, $Y_2$ and $Y_4$, $Y_2$ and $Y_5$, and $Y_3$ and $Y_4$ are more than 0.9. It means there is multicollinearity. To overcome the multicollinearity in the dependent variable group, the authors chose to eliminate the variables $Y_2$ and $Y_3$. The assumptions of linearity, multivariate normality, homoscedasticity, and no multicollinearity are then tested again, and the results indicate that all four assumptions are satisfied.
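The neglog transformation used against heteroscedasticity can be sketched as below, assuming its usual form $\operatorname{sign}(x)\ln(|x| + 1)$. The function name `neglog` and the sample values are illustrative. Unlike a plain logarithm, it is defined for zero and negative values, which matters here because standardized indices take both signs.

```python
import numpy as np

def neglog(x):
    """Neglog transform: sign(x) * ln(|x| + 1); odd, monotone, and defined for all real x."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))

# Hypothetical standardized index values, including negatives and zero.
values = np.array([-2.5, -0.4, 0.0, 0.4, 2.5])
transformed = neglog(values)
```

The transform compresses large magnitudes on both sides of zero while preserving sign and order, so relationships between variables keep their direction after transformation.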

The results of canonical functions and coefficient estimators determination
Following are the results of the calculation of canonical correlation with SPSS 23. Among the canonical functions obtained, only the canonical correlation of function (15) is significant; the canonical correlation of function (17) is not significant. Thus, further analysis is done only on function (15). Based on Table 7, through calculations with equations (6) and (7), the proportion of variance of the set of dependent variables (rainfall in West Java regions) that can be explained by the set of independent variables (global climate elements) is 0.3857 or 38.57%, and the proportion of variance of the set of independent variables (global climate elements) that can be explained by the set of dependent variables (rainfall in West Java regions) is 0.19 or 19%. Based on Table 8, the redundancy coefficient obtained is 0.670761, which means that the canonical correlation of function (15) explains 67.08% of the relationship between $U_1$ and $V_1$.

The results of canonical variables interpretation of data
Based on the previous canonical correlation significance test, the only significant canonical correlation is that of function (15). Therefore, interpretation of the canonical variables is done only on function (15).

1. Canonical Weights
In the canonical variable $U_1$, the order of relative contributions of the original variables from largest to smallest is $X_3$ (AUSMI), $X_2$ (DMI), and $X_1$ (Nino3.4 index). In the canonical variable $V_1$, the order of relative contributions of the original variables from largest to smallest is $Y_5$ (Kuningan Regency rainfall index), $Y_1$ (Bogor Regency rainfall index), and $Y_4$ (Garut Regency rainfall index).

2. Canonical Loadings
Based on Table 11 and Table 12, the canonical variables in function (15) become
$$U_1 = -0.125 X_1 - 0.183 X_2 + 0.994 X_3$$
$$V_1 = 0.662 Y_1 + 0.833 Y_4 + 0.987 Y_5$$
Canonical loadings state the correlation of each original variable with its own canonical variable. In the canonical variable $U_1$, the original variable that has the strongest relationship is $X_3$ (AUSMI). In the canonical variable $V_1$, the original variable that has the strongest relationship is $Y_5$ (Kuningan Regency rainfall index).

3. Canonical Cross-Loadings
Based on Table 13 and Table 14, the canonical variables in function (15) become
$$U_1 = -0.102 X_1 - 0.15 X_2 + 0.814 X_3$$
$$V_1 = 0.542 Y_1 + 0.682 Y_4 + 0.808 Y_5$$
Canonical cross-loadings state the correlation of each original variable with the canonical variable of the opposite set. In the canonical variable $U_1$, the original variable that has the strongest relationship with the canonical variable $V_1$ is $X_3$ (AUSMI), and in the canonical variable $V_1$, the original variable that has the strongest relationship with the canonical variable $U_1$ is $Y_5$ (Kuningan Regency rainfall index). This is because the variables $X_3$ and $Y_5$ have the greatest coefficient values.
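The reported figures are internally consistent, which a short arithmetic check shows. The sketch takes the canonical correlation of 0.819 from the abstract and the loadings from the canonical loadings paragraph, then assumes the standard identity that each cross-loading equals the corresponding loading times the canonical correlation; the variable names are illustrative.

```python
import numpy as np

rho_1 = 0.819  # canonical correlation of function (15), as reported

# Reported canonical loadings for X1, X2, X3 and for Y1, Y4, Y5.
x_loadings = np.array([-0.125, -0.183, 0.994])
y_loadings = np.array([0.662, 0.833, 0.987])

# Cross-loadings implied by the loadings, rounded to three decimals.
x_cross = np.round(rho_1 * x_loadings, 3)
y_cross = np.round(rho_1 * y_loadings, 3)

# Redundancy coefficient: the squared canonical correlation.
redundancy_coefficient = round(rho_1 ** 2, 6)
```

The implied cross-loadings reproduce the reported values (−0.102, −0.15, 0.814 and 0.542, 0.682, 0.808), and the squared canonical correlation reproduces the reported redundancy coefficient of 0.670761.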

CONCLUSIONS AND SUGGESTIONS
Based on the results and discussion in this study, it can be concluded from the canonical correlation analysis that there is a simultaneous relationship between the global climate elements, i.e., ENSO, IOD, and the Monsoon