Chapter 4 Statistical Hypothesis Testing

In this chapter, we first examine the strength of the relationship between pairs of numerical variables using the Pearson correlation coefficient. This gives an idea of which variables are strongly or weakly, and positively or negatively, correlated with each other. In the second part, we test whether the mean number of fatalities differs across the top ten groups, and identify which groups are similar and which differ. We use only the data related to the ten most active and violent groups.

4.1 Data preparation

dfh <- df %>%   
  filter(group_name %in% top10_groups) %>%  # filter data by top 10 groups
  replace_na(list(nkill = 0, nwound = 0))   # replace NAs

# Shorten lengthy group names
dfh$group_name[dfh$group_name == "Kurdistan Workers' Party (PKK)"] <- "PKK"
dfh$group_name[dfh$group_name == "Al-Qaida in the Arabian Peninsula (AQAP)"] <- "AQAP"
dfh$group_name[dfh$group_name == "Houthi extremists (Ansar Allah)"] <- "Houthi_Extrm"
dfh$group_name[dfh$group_name == "Tehrik-i-Taliban Pakistan (TTP)"] <- "TTP"
dfh$group_name[dfh$group_name == "Al-Nusrah Front"] <- "Al-Nusrah"
dfh$group_name[dfh$group_name == "Islamic State of Iraq and the Levant (ISIL)"] <- "ISIL"
dfh$group_name[dfh$group_name == "Donetsk People's Republic"] <- "Donetsk_PR"

4.2 Correlation test

We use the pairwise complete observations method to compute the correlation coefficient for each pair of numerical variables, so each coefficient is based on all rows in which both variables are observed.

# Extract numeric variables
tmp <- dfh %>%
  select(intl_ideological_attack, intl_logistical_attack, 
         part_of_multiple_attacks, n_peace_keepers, net_migration, 
         refugee_asylum, refugee_origin, gdp_per_capita, arms_import, 
         arms_export, conflict_index, population, extended, 
         nwound, nkill, suicide_attack, attack_success) 

# Correlation matrix from pairwise complete observations
m <- cor(tmp, use = "pairwise.complete.obs")
# p-value of cor.test() for each pair of variables
# (PairApply() and PlotWeb() are from the DescTools package)
ctest <- PairApply(tmp, symmetric = TRUE,
                   function(x, y) cor.test(x, y)$p.value)
m[ctest > 0.05] <- NA   # Replace correlations with p-value > 0.05 by NA
PlotWeb(m, lwd = abs(m[lower.tri(m)] * 10), 
        main = "Correlation Web Plot", 
        cex.lab = 0.85, pt.bg = "#f2f2f2",
        args.legend = list(x = "bottomright", cex = 0.75, bty = "o", 
                           title = "Correlation"))

Figure 4.1: Correlation web plot

In the plot above, the line width between two nodes is proportional to the correlation between the two variables. To focus only on significant correlations, we replaced the correlation coefficients whose p-value exceeds 0.05 with NA. The legend at the bottom right maps line width to the magnitude of the correlation coefficient and color to the direction (positive or negative) of the linear relationship. The variables on the left-hand side of the plot are extracted from World Bank data (development indicators), and the variables on the right-hand side are from the GTD.
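The masking step described above can be illustrated on a small synthetic data set using only base R. This is a minimal sketch, not the code behind Figure 4.1: the data frame toy and its columns a, b and c are made up for illustration, and a nested sapply() stands in for PairApply().

```r
# Toy data (hypothetical): three numeric columns with a few missing values
set.seed(1)
toy <- data.frame(a = rnorm(50))
toy$b <- toy$a + rnorm(50, sd = 0.3)   # strongly related to a
toy$c <- rnorm(50)                     # generated independently of a and b
toy$b[c(3, 7)] <- NA
toy$c[10] <- NA

# Each coefficient uses all rows where both variables are observed
m_toy <- cor(toy, use = "pairwise.complete.obs")

# p-value of cor.test() for every pair of columns
vars <- names(toy)
p_toy <- sapply(vars, function(i)
  sapply(vars, function(j) cor.test(toy[[i]], toy[[j]])$p.value))

# Blank out correlations that are not significant at the 5% level
m_toy[p_toy > 0.05] <- NA
round(m_toy, 2)
```

Entries that survive the mask are the only ones that would be drawn as edges in a web plot of m_toy.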

Specifically, we are most interested in relationships involving the variables on the right-hand side, which will be used as target variables in time-series forecasting and classification modeling. For example, the number of people wounded (nwound) has a positive linear relationship with suicide attacks. The conflict index shows a strong positive relationship with international ideological attacks and a minor positive relationship with attacks that are part of multiple attacks. Overall, the majority of the numerical variables show a relationship with one another.

4.3 Hypothesis test: fatalities vs groups

The objective of this hypothesis test is to determine whether the mean number of fatalities is the same across the top 10 groups. If at least one group mean differs from the others, we then determine which pairs of groups differ.

\[ \large \begin{aligned} H_0 : {} & \text{ The means of the different groups are the same} \\ & \mu_{\text{ISIL}} = \mu_{\text{Taliban}} = \mu_{\text{AQAP}} = \mu_{\text{PKK}} = \\ & \mu_{\text{Al-Shabaab}} = \mu_{\text{TTP}} = \mu_{\text{Boko Haram}} = \\ & \mu_{\text{Al-Nusrah}} = \mu_{\text{Donetsk}\_\text{PR}} = \mu_{\text{Houthi}\_\text{Extrm}} \\ \\ H_a : {} & \text{ At least one group mean is not equal to the others} \end{aligned} \]

First, we use a box plot to examine distribution by quartiles for each group.

Figure 4.2: Boxplot: group vs fatalities

The data contain some extreme outliers (e.g. nkill ~ 1500 in the ISIL group), so the X axis is log-transformed for visualization purposes.
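The effect of log-transforming a heavily skewed count before drawing a box plot can be sketched with base graphics. This is an illustration under stated assumptions, not the code that produced Figure 4.2: the data frame toy is invented, and log1p() (that is, log(1 + x)) is one possible choice of transform that keeps zero counts on the plot.

```r
# Toy data (hypothetical): skewed counts with zeros and one extreme value
toy <- data.frame(
  group = rep(c("G1", "G2"), each = 6),
  nkill = c(0, 1, 2, 3, 5, 1500,   0, 0, 1, 2, 4, 30)
)

# log1p(x) = log(1 + x) is defined at zero, unlike log(x)
toy$log_nkill <- log1p(toy$nkill)

# Horizontal box plot, so the transformed counts run along the x-axis
bp <- boxplot(log_nkill ~ group, data = toy, horizontal = TRUE)

# Box plot statistics per group (one column each), on the log scale
bp$stats
```

On the raw scale the single value of 1500 would compress both boxes into a thin line near zero; on the transformed scale the quartiles of each group remain readable.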

4.3.1 ANOVA test

The ANOVA model computes the residual (within-group) variance and the variance between the sample means in order to calculate the F-statistic. This is the first step in determining whether the means of any pair of groups differ.

\[ \large \begin{aligned} F\text{-statistic} = & \ (S^2_{\text{between}}\ / S^2_{\text{within}}) \end{aligned} \]
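To make the ratio concrete, the sketch below computes both variances by hand on a small invented data set and checks the result against aov(). The data frame toy and its columns g and y are hypothetical and serve only to illustrate the formula.

```r
# Toy data (hypothetical): three groups of five observations each
toy <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 5)),
  y = c(1, 2, 3, 4, 5,   2, 3, 4, 5, 6,   6, 7, 8, 9, 10)
)

k <- nlevels(toy$g)                       # number of groups
n <- nrow(toy)                            # total number of observations
grand_mean  <- mean(toy$y)
group_means <- tapply(toy$y, toy$g, mean)
group_sizes <- tapply(toy$y, toy$g, length)

# Between-group variance: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
s2_between <- ss_between / (k - 1)

# Within-group (residual) variance: spread of observations around their group mean
ss_within <- sum((toy$y - group_means[as.character(toy$g)])^2)
s2_within <- ss_within / (n - k)

f_manual <- s2_between / s2_within        # 35 / 2.5 = 14

# The same quantity as reported in the aov() summary table
f_aov <- summary(aov(y ~ g, data = toy))[[1]][["F value"]][1]
all.equal(f_manual, f_aov)
```

In an aov() summary, the two variances appear in the Mean Sq column, and the F value is their ratio.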

#------------------------------------------
# Compute the analysis of variance (ANOVA)
#------------------------------------------
r.aov <- aov(nkill ~ group_name , data = dfh)

# display result
summary(r.aov)

               Df  Sum Sq Mean Sq F value              Pr(>F)    
group_name      9  111070   12341    40.7 <0.0000000000000002 ***
Residuals   21770 6597154     303                                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model summary reports the F value and Pr(>F), the p-value of the test. Here the F value is the ratio of the two Mean Sq entries (12341 / 303 ≈ 40.7). Because the p-value is below 0.05, there are significant differences between the groups; in other words, we reject the null hypothesis. This test tells us that some of the group means differ, but not which pairs of groups have different means.

4.3.2 PostHoc test

A PostHoc test is useful for determining where the differences between groups occur. For comparison purposes, we use several different methods. These methods can be classified as either conservative or liberal. Conservative methods are considered robust against committing a Type I error because they use a more stringent criterion for statistical significance. First, we run the PostHoc test and compare the p-values from the Fisher LSD (Least Significant Difference), Scheffe and Dunn’s (Bonferroni) tests.

#------------------------------------------
# compare p-values for 3 methods
#------------------------------------------
posthoc1 <- as.data.frame(
  cbind(
    lsd= PostHocTest(
      r.aov, method="lsd")$group_name[,"pval"],     # The Fisher LSD
    scheffe= PostHocTest(
      r.aov, method="scheffe")$group_name[,"pval"], # Scheffe
    bonf=PostHocTest(
      r.aov, method="bonf")$group_name[,"pval"])    # Bonferroni
  ) 
posthoc1 <- rownames_to_column(posthoc1, var = "Pair of groups") %>% 
  arrange(desc(scheffe))

Table 4.1: Posthoc test (lsd, scheffe, bonf)

Pair of groups	lsd	scheffe	bonf
Donetsk_PR-Al-Shabaab	0.9191	1.0000	1.0000
Houthi_Extrm-Al-Shabaab	0.7934	1.0000	1.0000
Houthi_Extrm-Donetsk_PR	0.7797	1.0000	1.0000
Taliban-AQAP	0.6811	1.0000	1.0000
PKK-Donetsk_PR	0.5800	1.0000	1.0000
Houthi_Extrm-AQAP	0.4850	1.0000	1.0000
Donetsk_PR-AQAP	0.3615	0.9997	1.0000
PKK-Houthi_Extrm	0.3152	0.9994	1.0000
PKK-Al-Shabaab	0.3021	0.9993	1.0000
AQAP-Al-Shabaab	0.2561	0.9984	1.0000
Taliban-Houthi_Extrm	0.1928	0.9954	1.0000
TTP-AQAP	0.1508	0.9904	1.0000
Taliban-Donetsk_PR	0.1476	0.9898	1.0000
TTP-Taliban	0.1253	0.9846	1.0000
Boko Haram-Al-Nusrah	0.0851	0.9656	1.0000
PKK-AQAP	0.0610	0.9406	1.0000
TTP-Houthi_Extrm	0.0324	0.8694	1.0000
TTP-Donetsk_PR	0.0278	0.8481	1.0000
Taliban-Al-Shabaab	0.0135	0.7301	0.6094
TTP-Al-Shabaab	0.0024	0.4187	0.1088
ISIL-Al-Nusrah	0.0008	0.2574	0.0354
Taliban-PKK	0.0005	0.2071	0.0226
ISIL-Boko Haram	0.0002	0.1338	0.0097
TTP-PKK	0.0002	0.1172	0.0076
TTP-ISIL	0.0000	0.0072	0.0001
TTP-Al-Nusrah	0.0000	0.0006	0.0000
ISIL-AQAP	0.0000	0.0000	0.0000
ISIL-Donetsk_PR	0.0000	0.0000	0.0000
AQAP-Al-Nusrah	0.0000	0.0000	0.0000
Donetsk_PR-Al-Nusrah	0.0000	0.0000	0.0000
Houthi_Extrm-Al-Nusrah	0.0000	0.0000	0.0000
Taliban-Al-Nusrah	0.0000	0.0000	0.0000
ISIL-Houthi_Extrm	0.0000	0.0000	0.0000
TTP-Boko Haram	0.0000	0.0000	0.0000
Al-Shabaab-Al-Nusrah	0.0000	0.0000	0.0000
PKK-Al-Nusrah	0.0000	0.0000	0.0000
Donetsk_PR-Boko Haram	0.0000	0.0000	0.0000
Boko Haram-AQAP	0.0000	0.0000	0.0000
Houthi_Extrm-Boko Haram	0.0000	0.0000	0.0000
Taliban-ISIL	0.0000	0.0000	0.0000
ISIL-Al-Shabaab	0.0000	0.0000	0.0000
PKK-ISIL	0.0000	0.0000	0.0000
Taliban-Boko Haram	0.0000	0.0000	0.0000
Boko Haram-Al-Shabaab	0.0000	0.0000	0.0000
PKK-Boko Haram	0.0000	0.0000	0.0000

The Fisher LSD (Least Significant Difference) test is the most liberal of the PostHoc tests, whereas the Scheffe test is the most conservative and protects against Type I error. Dunn’s (Bonferroni) test is also described as extremely conservative (Andri Signorell et mult. al., 2018). Of the 45 possible pairs of groups, 16 have an adjusted p-value > 0.9 based on the Scheffe test. In statistical terms, these 16 pairs, listed first in the table above, show no significant difference in the mean number of fatalities.
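The gap between a liberal and a conservative method can be sketched with base R alone. In Table 4.1 the bonf column is approximately the lsd column multiplied by the 45 comparisons and capped at 1 (for Taliban-Al-Shabaab, 0.0135 × 45 ≈ 0.61). The example below reproduces that relationship on invented data; the data frame toy is hypothetical, and pairwise.t.test() with a pooled standard deviation is used here as a stand-in for the LSD and Bonferroni options of PostHocTest().

```r
# Toy data (hypothetical): four groups, so choose(4, 2) = 6 pairwise comparisons
set.seed(42)
toy <- data.frame(
  g = factor(rep(c("A", "B", "C", "D"), each = 20)),
  y = c(rnorm(20, 0), rnorm(20, 0.3), rnorm(20, 1), rnorm(20, 2))
)

# Unadjusted pairwise t-tests with a pooled SD (the idea behind Fisher's LSD)
p_raw <- pairwise.t.test(toy$y, toy$g, p.adjust.method = "none")$p.value

# Bonferroni-adjusted p-values for the same comparisons
p_bonf <- pairwise.t.test(toy$y, toy$g, p.adjust.method = "bonferroni")$p.value

# Bonferroni multiplies each p-value by the number of comparisons, capped at 1
n_pairs <- choose(nlevels(toy$g), 2)
all.equal(p_bonf, pmin(p_raw * n_pairs, 1))
```

Because the multiplier grows with the number of pairs, a comparison that looks significant without adjustment can lose significance once all pairs are accounted for.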

Next, we use the Tukey HSD (Honestly Significant Difference) method, which is the most common and preferred method.

#---------------------------------------
# PostHoc Test with Tukey HSD method
#---------------------------------------
# extract only p-values by setting conf.level to NA
hsd <- PostHocTest(r.aov, method = "hsd", conf.level=NA)
# convert to data frame and round off to 3 digits
hsd <- as.data.frame(do.call(rbind, hsd)) %>% round(3)

Table 4.2: PostHoc test with Tukey HSD for pair of groups
	Al-Nusrah	Al-Shabaab	AQAP	Boko Haram	Donetsk_PR	Houthi_Extrm	ISIL	PKK	Taliban
Al-Shabaab	0.000	NA	NA	NA	NA	NA	NA	NA	NA
AQAP	0.000	0.981	NA	NA	NA	NA	NA	NA	NA
Boko Haram	0.783	0.000	0.000	NA	NA	NA	NA	NA	NA
Donetsk_PR	0.000	1.000	0.996	0.000	NA	NA	NA	NA	NA
Houthi_Extrm	0.000	1.000	1.000	0.000	1.000	NA	NA	NA	NA
ISIL	0.027	0.000	0.000	0.008	0.000	0.000	NA	NA	NA
PKK	0.000	0.990	0.687	0.000	1.000	0.992	0	NA	NA
Taliban	0.000	0.285	1.000	0.000	0.912	0.953	0	0.018	NA
TTP	0.000	0.073	0.916	0.000	0.457	0.499	0	0.007	0.879
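For readers without DescTools, a comparable set of adjusted p-values can be obtained with TukeyHSD() from base R. The following is a minimal sketch on invented data; the data frame toy and its columns are hypothetical, and the output is one adjusted p-value per pair rather than the triangular layout of Table 4.2.

```r
# Toy data (hypothetical): groups A and B are close, group C is shifted upward
set.seed(7)
toy <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 30)),
  y = c(rnorm(30, 0), rnorm(30, 0.1), rnorm(30, 2))
)

fit <- aov(y ~ g, data = toy)
tuk <- TukeyHSD(fit)            # Tukey HSD from the stats package

# Adjusted p-values, one per pair, named like "B-A"
p_adj <- tuk$g[, "p adj"]
round(p_adj, 3)

# Pairs whose means differ significantly at the 5% level
names(p_adj)[p_adj < 0.05]
```

Pairs are labelled in the same "second-first" style used for the pair names in Table 4.1.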

4.3.3 Interpretation

Pairs of groups with an adjusted p-value near or equal to 1 show no significant difference in the mean number of fatalities, for example Boko Haram - Al-Nusrah, Al-Qaida in the Arabian Peninsula (AQAP) - Al-Shabaab, Houthi Extremists - PKK and Taliban - Tehrik-i-Taliban Pakistan (TTP).

Similarly, pairs of groups with an adjusted p-value near zero have significantly different mean numbers of fatalities, for example ISIL paired with each of the remaining groups, Taliban - Al-Nusrah, PKK - Boko Haram and Donetsk_PR - Al-Nusrah.