Chapter 5 Pattern discovery

This part of the analysis uses unsupervised machine learning, specifically association rule mining, to discover patterns in terrorist incidents attributed to the Islamic State (ISIL), the Taliban and Boko Haram, three of the groups identified among the ten most active and violent groups.

Association rule mining is widely used in retail and e-commerce, where it is commonly known as Market Basket Analysis and is typically carried out with the Apriori algorithm. The logic behind this approach is that a customer who buys a certain group of products is more or less likely to buy another group of products (Karthiyayini & Balasubramanian, 2016).

Pseudocode of the Apriori algorithm (minimal version¹¹), where T is the set of transactions and ε is the minimum support count:

\[ \begin{aligned} & \mathrm{Apriori}(T,\epsilon)\\ &\qquad L_1 \gets \{ \mathrm{large~1-item sets} \} \\ &\qquad k \gets 2\\ &\qquad \mathrm{\textbf{while}}~ L_{k-1} \neq \ \emptyset \\ &\qquad \qquad C_k \gets \{ a \cup \{b\} \mid a \in L_{k-1} \land b \not \in a \} - \{ c \mid \{ s \mid s \subseteq c \land |s| = k-1 \} \nsubseteq L_{k-1} \}\\ &\qquad \qquad \mathrm{\textbf{for}~transactions}~t \in T\\ &\qquad \qquad\qquad D_t \gets \{ c \mid c \in C_k \land c \subseteq t \} \\ &\qquad \qquad\qquad \mathrm{\textbf{for}~candidates}~c \in D_t\\ &\qquad \qquad\qquad\qquad \mathit{count}[c] \gets \mathit{count}[c]+1\\ &\qquad \qquad L_k \gets \{ c \mid c \in C_k \land ~ \mathit{count}[c] \geq \epsilon \}\\ &\qquad \qquad k \gets k+1\\ &\qquad \mathrm{\textbf{return}}~\bigcup_k L_k \end{aligned} \]

Because the goal of this algorithm is to determine which candidate itemsets are frequent, the same methodology can be applied to discover patterns in the terrorism context. The idea is to understand the attack habits of terrorist groups by finding associations between the characteristics of attacks carried out in the past. It is important to note that the output of this algorithm is a list of association rules (frequent patterns) and provides descriptive analysis only. The real value of such unsupervised learning lies in the insights we can take away from the algorithm's findings.
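The pseudocode above can be sketched in a few lines of base R. This is a minimal illustration only, not the implementation used by the arules package: the function name `apriori_sketch`, the toy transactions and the `|` separator used to build itemset keys are all assumptions made for the example, and the sketch favours readability over speed.

```r
# Minimal base-R sketch of the Apriori pseudocode (illustrative only).
# transactions: list of character vectors; min_count: absolute support threshold.
apriori_sketch <- function(transactions, min_count) {
  transactions <- lapply(transactions, unique)
  key <- function(s) paste(s, collapse = "|")   # assumes items contain no "|"
  count_of <- function(s) {
    sum(vapply(transactions, function(t) all(s %in% t), logical(1)))
  }
  items <- sort(unique(unlist(transactions)))
  # L_1: frequent 1-itemsets
  L_prev <- Filter(function(s) count_of(s) >= min_count, as.list(items))
  frequent <- L_prev
  while (length(L_prev) > 0) {
    prev_keys <- vapply(L_prev, key, character(1))
    # Candidate generation: extend each frequent itemset by one new item
    candidates <- list()
    for (a in L_prev) {
      for (b in setdiff(items, a)) {
        cand <- sort(c(a, b))
        candidates[[key(cand)]] <- cand
      }
    }
    # Pruning: keep a candidate only if all its (k-1)-subsets are frequent
    keep <- vapply(candidates, function(cand) {
      subs <- lapply(seq_along(cand), function(i) cand[-i])
      all(vapply(subs, key, character(1)) %in% prev_keys)
    }, logical(1))
    candidates <- unname(candidates[keep])
    # Support counting: L_k holds the candidates that reach the threshold
    L_prev <- Filter(function(s) count_of(s) >= min_count, candidates)
    frequent <- c(frequent, L_prev)
  }
  frequent
}

# Hypothetical toy incidents, each described by a few attributes
tx <- list(
  c("Police", "Firearms"),
  c("Police", "Firearms", "Armed Assault"),
  c("Police", "Explosives"),
  c("Military", "Explosives"),
  c("Police", "Firearms", "Armed Assault")
)
res <- apriori_sketch(tx, min_count = 2)
vapply(res, paste, character(1), collapse = " + ")
```

With a threshold of two incidents, the sketch keeps four single items, three pairs and the triple Armed Assault + Firearms + Police, while anything involving Military is dropped because it appears only once.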

5.1 Data preparation

For this analysis, I have chosen specific variables that are not highly correlated with the chosen groups, i.e. target type, weapon type, attack type, suicide attack and the number of fatalities, while excluding observations where the target, weapon or attack type is “Unknown” (or, for target type, “Other”).

tmp <- dfh %>%
  select(group_name, target_type, weapon_type, attack_type, suicide_attack, nkill) %>%
  filter(target_type != "Unknown" & target_type != "Other" & 
         weapon_type != "Unknown" & attack_type != "Unknown") %>%
  mutate(nkill = if_else(nkill == 0, "0",
                 if_else(nkill >= 1 & nkill <= 5, "1 to 5",
                 if_else(nkill > 5 & nkill <= 10, "6 to 10",
                 if_else(nkill > 10 & nkill <= 50, "11 to 50",  "more than 50")))))
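The same binning can also be expressed with base R's `cut()`, which may be easier to read than nested `if_else()` calls. The sketch below uses a hypothetical vector of fatality counts and assumes the counts are non-negative whole numbers; it is an alternative formulation, not the code used for the analysis.

```r
# Hypothetical fatality counts (assumed to be non-negative whole numbers)
nkill <- c(0, 1, 5, 6, 10, 11, 50, 51)

# Right-closed intervals: (-Inf,0], (0,5], (5,10], (10,50], (50,Inf]
nkill_bin <- cut(
  nkill,
  breaks = c(-Inf, 0, 5, 10, 50, Inf),
  labels = c("0", "1 to 5", "6 to 10", "11 to 50", "more than 50")
)

print(data.frame(nkill, nkill_bin))
```

One design difference: `cut()` returns a factor whose levels follow the order of `labels`, whereas converting character bins with `factor()` orders the levels alphabetically.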

#shorten lengthy names for visualization purpose
tmp$weapon_type[
  tmp$weapon_type == "Explosives/Bombs/Dynamite"] <- "Explosives"
tmp$attack_type[
  tmp$attack_type == "Facility/Infrastructure Attack"] <- "Facility/Infra."
tmp$target_type[
  tmp$target_type == "Private Citizens & Property"] <- "Civilians"
tmp$target_type[
  tmp$target_type == "Terrorists/Non-State Militia"] <- "Non-State Militia"
tmp$target_type[
  tmp$target_type == "Religious Figures/Institutions"] <- "Religious Figures"

#convert everything to factor
tmp[] <- lapply(tmp, factor)
str(tmp)

'data.frame':   18006 obs. of  6 variables:
 $ group_name    : Factor w/ 10 levels "Al-Nusrah","Al-Shabaab",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ target_type   : Factor w/ 19 levels "Airports & Aircraft",..: 10 10 2 3 3 3 3 6 3 3 ...
 $ weapon_type   : Factor w/ 8 levels "Chemical","Explosives",..: 2 3 3 3 3 3 3 3 3 3 ...
 $ attack_type   : Factor w/ 8 levels "Armed Assault",..: 3 3 4 3 3 1 1 2 1 1 ...
 $ suicide_attack: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ nkill         : Factor w/ 5 levels "0","1 to 5","11 to 50",..: 2 2 1 3 4 3 2 2 1 4 ...
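When a data frame of factors is mined for association rules, each row is treated as a transaction and each `column=level` combination as an item, which is the form in which items appear in the rules later in this chapter. The following base-R sketch illustrates that encoding on a hypothetical three-row data frame; the helper `to_items` is an assumption made for illustration and is not part of arules.

```r
# Toy data frame (hypothetical rows) with the same kind of factor columns as tmp
toy <- data.frame(
  group_name     = factor(c("ISIL", "Taliban", "Taliban")),
  target_type    = factor(c("Military", "Police", "Police")),
  suicide_attack = factor(c("1", "0", "0"))
)

# Turn each row into a set of "column=level" items
to_items <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    values <- vapply(df, function(col) as.character(col[i]), character(1))
    paste0(names(df), "=", values)
  })
}

items <- to_items(toy)
print(items[[1]])

# The number of possible items is the sum of the factor levels
n_items_toy <- sum(vapply(toy, nlevels, integer(1)))

# Applied to the level counts shown in the str() output above
n_items_tmp <- sum(c(10, 19, 8, 8, 2, 5))
print(c(toy = n_items_toy, tmp = n_items_tmp))
```

For the level counts reported by `str(tmp)`, the sum is 52, which agrees with the `52 item(s)` line in the model summaries shown later in this chapter.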

5.2 Explanation of key terms

Rules produced by the Apriori algorithm are evaluated with three main measures: support, confidence and lift. These measures are used to judge the relative strength of the rules. In the model parameters, we fix the right-hand side (RHS) of each rule to the chosen group, so the left-hand side (LHS) is the frequent pattern observed for that group.

Support is the proportion of incidents in which a pattern occurs, that is, how frequently it appears in the data. In the algorithm configuration (params), I have set the threshold to 0.001, which means a pattern must appear in at least 0.001 * nrow(tmp) ≈ 18 incidents.

Confidence is the proportion of incidents matching the LHS that also match the RHS. A threshold of 0.5 (set in the model params) means that, to be included in the results, a rule has to be correct at least 50 percent of the time. This is particularly helpful for eliminating unreliable rules.

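Lift is the probability (support) of the whole pattern divided by the product of the probabilities of its LHS and RHS (Hahsler et al., 2018). A lift above 1 means that the LHS and RHS occur together more often than would be expected if they were independent.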

In general, high confidence and good lift are the standard measures used to evaluate the importance of a particular rule or association; however, not all rules are useful. Rules normally fall into three categories: actionable, trivial (useless) and inexplicable (Klimberg & McCullough, 2017). An example of a useless rule is an association that is obvious and thus not worth mentioning.
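As a worked example of how the three measures relate, the sketch below takes the values reported for the first ISIL rule in Section 5.3.2 (count 19, confidence 0.9048, lift 4.869) together with the 18006 rows of `tmp`. The LHS count and the group's overall share are not reported in the output; they are back-calculated here from the rounded figures, so they should be read as approximations.

```r
# Values reported for the first ISIL rule in Section 5.3.2
n_total    <- 18006   # rows in tmp
count_rule <- 19      # incidents matching both LHS and RHS
confidence <- 0.9048  # as reported (rounded)
lift       <- 4.869   # as reported (rounded)

# support = count(LHS and RHS) / n
support <- count_rule / n_total

# confidence = count(LHS and RHS) / count(LHS), so count(LHS) can be back-calculated
lhs_count <- count_rule / confidence

# lift = confidence / support(RHS), so support(RHS) can be back-calculated
rhs_support <- confidence / lift

round(support, 6)      # 0.001055, matching the reported support
round(lhs_count)       # about 21 incidents match the LHS
round(rhs_support, 3)  # about 0.186 of all incidents are attributed to the group
```

Read this way, a lift of about 4.9 says that incidents matching this LHS are attributed to ISIL roughly five times as often as incidents in general.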

5.3 Islamic State (ISIL)

5.3.1 Apriori model summary

# set params
params <- list(support = 0.001, confidence = 0.5, minlen = 2)
group_ISIL <- list(rhs='group_name=ISIL', default="lhs")

# apriori model
rules <- apriori(data = tmp, parameter= params, appearance = group_ISIL)

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5   0.001      2
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 18 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[52 item(s), 18006 transaction(s)] done [0.00s].
sorting and recoding items ... [48 item(s)] done [0.00s].
creating transaction tree ... done [0.01s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [51 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

In the model summary, we can see that the absolute minimum support count is 18, which means a pattern needs to appear at least 18 times in order to be included. This threshold follows from the support value explained previously. With these settings, the model finds 51 association rules for the ISIL group. We then remove any redundant rules before starting our analysis.
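A rule is commonly considered redundant when a more general rule (one whose LHS is a proper subset of it, with the same RHS) has the same or a higher confidence. The base-R sketch below illustrates that idea; the function `is_redundant_sketch` and the third rule are hypothetical, and the sketch is a simplification rather than the arules implementation. The first two rules use the LHS and confidence reported for ISIL rules [5] and [3] in Section 5.3.2.

```r
# Each rule: list(lhs = character vector, confidence = number); all share one RHS.
is_redundant_sketch <- function(rules) {
  vapply(seq_along(rules), function(i) {
    any(vapply(seq_along(rules), function(j) {
      j != i &&
        length(rules[[j]]$lhs) < length(rules[[i]]$lhs) &&  # strictly more general
        all(rules[[j]]$lhs %in% rules[[i]]$lhs) &&          # LHS is a subset
        rules[[j]]$confidence >= rules[[i]]$confidence      # and at least as confident
    }, logical(1)))
  }, logical(1))
}

toy_rules <- list(
  # Reported ISIL rule [5]
  list(lhs = c("target_type=Non-State Militia", "suicide_attack=1"),
       confidence = 0.6238),
  # Reported ISIL rule [3]: more specific but more confident, so it is kept
  list(lhs = c("target_type=Non-State Militia", "attack_type=Bombing/Explosion",
               "suicide_attack=1"),
       confidence = 0.6526),
  # Hypothetical rule: more specific than [5] with lower confidence
  list(lhs = c("target_type=Non-State Militia", "suicide_attack=1", "nkill=0"),
       confidence = 0.60)
)

redundant <- is_redundant_sketch(toy_rules)
print(redundant)  # FALSE FALSE TRUE
```

Only the hypothetical third rule is flagged, because adding `nkill=0` to the LHS of rule [5] does not improve on that rule's confidence.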

5.3.2 Top 5 patterns (ISIL)

rules <- rules[!is.redundant(rules)] # Remove redundant rules if any 
# Extract top 5 patterns based on confidence
subrules <- head(sort(rules, by="confidence"), 5)

    lhs                               rhs                support confidence  lift count
[1] {weapon_type=Chemical,                                                             
     attack_type=Bombing/Explosion} : {group_name=ISIL} 0.001055     0.9048 4.869    19
[2] {target_type=Non-State Militia,                                                    
     attack_type=Bombing/Explosion,                                                    
     nkill=6 to 10}                 : {group_name=ISIL} 0.001055     0.7308 3.933    19
[3] {target_type=Non-State Militia,                                                    
     attack_type=Bombing/Explosion,                                                    
     suicide_attack=1}              : {group_name=ISIL} 0.003443     0.6526 3.512    62
[4] {target_type=Military,                                                             
     suicide_attack=1,                                                                 
     nkill=11 to 50}                : {group_name=ISIL} 0.007997     0.6457 3.475   144
[5] {target_type=Non-State Militia,                                                    
     suicide_attack=1}              : {group_name=ISIL} 0.003499     0.6238 3.357    63

From the top five patterns based on confidence, we can see that the use of chemical weapons in bombings/explosions is the pattern with the highest confidence (0.90) and the highest lift (4.87), although it is rare, with only 19 incidents. It is also interesting to see that attacks on other terrorists (non-state militia) are observed in 3 out of the top 5 patterns.
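Confidence and frequency are different rankings of the same rules. As a small sketch, the values printed in the table above can be placed in a data frame and re-sorted by count; the data frame `top5_isil` and its shortened `lhs` labels are introduced here for illustration only.

```r
# Values copied from the top 5 ISIL rules printed above
top5_isil <- data.frame(
  rule = 1:5,
  lhs = c(
    "Chemical + Bombing/Explosion",
    "Non-State Militia + Bombing/Explosion + nkill 6 to 10",
    "Non-State Militia + Bombing/Explosion + suicide",
    "Military + suicide + nkill 11 to 50",
    "Non-State Militia + suicide"
  ),
  support    = c(0.001055, 0.001055, 0.003443, 0.007997, 0.003499),
  confidence = c(0.9048, 0.7308, 0.6526, 0.6457, 0.6238),
  lift       = c(4.869, 3.933, 3.512, 3.475, 3.357),
  count      = c(19, 19, 62, 144, 63)
)

# Re-rank the same rules by how often they occur
by_count <- top5_isil[order(-top5_isil$count), c("rule", "lhs", "confidence", "count")]
print(by_count, row.names = FALSE)
```

Sorted this way, rule [4] (suicide attacks on the military with 11 to 50 fatalities, 144 incidents) comes first, while the two rules with the highest confidence are the least frequent of the five.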

Figure 5.1: Association rules in ISIL group

The plot shown above represents all the discovered patterns (after removing redundant rules). We can see that the majority of discovered rules have a confidence between 0.5 and 0.7, while two rules stand out with high support, both indicating a suicide attack on the military.

5.3.3 Network graph (ISIL)

The network graph shown below summarizes how the attack characteristics are interconnected and describes the habits of the ISIL group.

Figure 5.2: Network graph of discovered patterns- ISIL group

5.4 Taliban

5.4.1 Apriori model summary

#---------------------------------------
#Apriori model on Taliban group
#---------------------------------------
params <- list(support = 0.001, confidence = 0.5, minlen = 2)
group_Taliban <- list(rhs='group_name=Taliban', default="lhs")
rules <- apriori(data = tmp, 
                 parameter= params, 
                 appearance = group_Taliban)

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5   0.001      2
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 18 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[52 item(s), 18006 transaction(s)] done [0.02s].
sorting and recoding items ... [48 item(s)] done [0.00s].
creating transaction tree ... done [0.02s].
checking subsets of size 1 2 3 4 5 6 done [0.01s].
writing ... [139 rule(s)] done [0.00s].
creating S4 object  ... done [0.02s].

From the model summary, we can see that the algorithm identifies 139 rules that meet the thresholds defined in the model parameters. However, many of these rules may be redundant, so we eliminate them.

5.4.2 Top 5 patterns (Taliban)

#---------------------------------------
#Remove redundant rules if any
#---------------------------------------
rules <- rules[!is.redundant(rules)]

# Extract top 5 patterns based on confidence
subrules <- head(sort(rules, by="confidence"), 5)

    lhs                             rhs                   support confidence  lift count
[1] {weapon_type=Chemical,                                                              
     attack_type=Unarmed Assault} : {group_name=Taliban} 0.001222     0.8800 2.945    22
[2] {target_type=Police,                                                                
     weapon_type=Firearms,                                                              
     attack_type=Armed Assault,                                                         
     nkill=11 to 50}              : {group_name=Taliban} 0.004998     0.8257 2.763    90
[3] {target_type=Police,                                                                
     weapon_type=Firearms,                                                              
     nkill=6 to 10}               : {group_name=Taliban} 0.010163     0.8243 2.759   183
[4] {target_type=Police,                                                                
     weapon_type=Incendiary,                                                            
     attack_type=Facility/Infra.,                                                       
     nkill=0}                     : {group_name=Taliban} 0.001999     0.8000 2.677    36
[5] {target_type=Police,                                                                
     weapon_type=Firearms,                                                              
     nkill=11 to 50}              : {group_name=Taliban} 0.005665     0.7969 2.667   102

From the top five patterns above, we can see that the rule involving chemical weapons has the highest confidence and lift value, as was also the case for the ISIL group. Police are the target in four of the five patterns, three of which involve firearms with resulting fatalities of 6 to 10 or 11 to 50.

Figure 5.3: Association Rules in Taliban group

From the plot above, we can identify many interesting patterns with confidence above 0.55 and high support, such as attacks on NGOs and government officials; however, most patterns indicate an attack on police. Let us have a detailed look at all the patterns with a network graph.

5.4.3 Network graph (Taliban)

Figure 5.4: Network graph of discovered patterns- Taliban group

5.5 Boko Haram

5.5.1 Apriori model summary

params <- list(support = 0.001, confidence = 0.5, minlen = 2)
group_Boko_Haram <- list(rhs='group_name=Boko Haram', default="lhs")
rules <- apriori(data = tmp, parameter= params, appearance = group_Boko_Haram)

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5   0.001      2
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 18 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[52 item(s), 18006 transaction(s)] done [0.01s].
sorting and recoding items ... [48 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.02s].
writing ... [63 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

5.5.2 Top 5 patterns (Boko Haram)

rules <- rules[!is.redundant(rules)] # Remove redundant rules if any 
# Extract top 5 patterns based on confidence
subrules <- head(sort(rules, by="confidence"), 5)

    lhs                           rhs                      support confidence  lift count
[1] {target_type=Civilians,                                                              
     weapon_type=Explosives,                                                             
     suicide_attack=0,                                                                   
     nkill=more than 50}        : {group_name=Boko Haram} 0.001111     0.8000 7.728    20
[2] {target_type=Civilians,                                                              
     weapon_type=Explosives,                                                             
     attack_type=Armed Assault,                                                          
     nkill=11 to 50}            : {group_name=Boko Haram} 0.001111     0.7692 7.431    20
[3] {target_type=Civilians,                                                              
     attack_type=Armed Assault,                                                          
     nkill=more than 50}        : {group_name=Boko Haram} 0.001555     0.7568 7.310    28
[4] {target_type=Civilians,                                                              
     weapon_type=Explosives,                                                             
     attack_type=Armed Assault,                                                          
     nkill=6 to 10}             : {group_name=Boko Haram} 0.001388     0.7353 7.103    25
[5] {target_type=Civilians,                                                              
     weapon_type=Incendiary,                                                             
     attack_type=Armed Assault} : {group_name=Boko Haram} 0.001055     0.6786 6.555    19

In the case of Boko Haram, we can see quite different patterns in comparison to the ISIL and Taliban groups. All of the top five patterns shown above indicate attacks on civilians. Specifically, incidents involving armed assault or the use of explosives with more than 50 resulting fatalities are significant patterns. This may also reflect differences in ideology between the groups.

Figure 5.5: Association Rules in Boko Haram group

From the plot above, we can see many patterns with high support and lift value and with confidence between 0.55 and 0.65. Four patterns with high support (on the right-hand side of the plot) correspond to attacks on civilians using firearms as the weapon type and armed assault as the attack type, resulting in 6 to 10 or 11 to 50 fatalities. Religious figures and telecommunication as targets are also visible, with confidence between 0.55 and 0.65 and a lift value of about 6.

In total, 27 rules remain after removing redundant rules. Let's have a closer look at all 27 rules with a network graph to visualize the characteristics and habits of the Boko Haram group.

5.5.3 Network graph (Boko Haram)

Figure 5.6: Network graph of discovered patterns- Boko Haram group

To summarize this chapter, we identified the strongest patterns for the ISIL, Taliban and Boko Haram groups, which indicate distinct habits among these groups. While the use of chemical weapons yields the highest-confidence pattern for both ISIL and the Taliban, we also discovered other interesting and significant patterns: ISIL is more likely to attack other terrorists (non-state militia) with bombings/explosions resulting in 6 to 10 fatalities; Boko Haram tends to target civilians with explosives, without suicide attacks, resulting in more than 50 fatalities; and the Taliban frequently target police with firearms, with resulting fatalities concentrated between 11 and 50.

¹¹ https://en.wikipedia.org/wiki/Apriori_algorithm