Chapter 8 Discussion and Conclusion
This thesis set out to gain a better understanding of the threat level and characteristics of the most lethal terrorist groups that are currently active and responsible for the sudden increase in violent terrorism around the world in recent years. The outcome of this study provides descriptive and predictive analysis and serves as actionable intelligence in support of counterterrorism.
We analyzed a real-world dataset of global terrorist incidents. During exploratory data analysis, we found a nearly identical upward trend in the number of attacks from 2010 onward in the Middle East & North Africa, South Asia, Sub-Saharan Africa and Southeast Asia regions. To keep the findings relevant to the present context, we determined the top 10 most active and violent terrorist groups and examined their characteristics based on past incidents. We found that most of these groups (6 out of 10) were formed after 2006. Upon analyzing their attack tactics, we found that bombings/explosions are the most common attack type and that the use of explosives has increased significantly. For example, more than 2300 incidents each year (from 2014 to 2016) involved the use of explosives. To a certain extent, this indicates easy access to sophisticated weapons, explosive devices, and DIY material online. The double-edged sword of the information age further fuels the upward trend in the use of explosives. For example, ISIL, which is one of the deadliest groups in our top ten list, makes use of social media such as Twitter and YouTube to spread its ideology through propaganda videos and materials. Although algorithms can detect and remove such materials from the web, the burning question is how fast they can do so. Easy access to such materials on the web, specifically tutorials on making bombs and DIY kits, makes terrorism the most preferred means of waging war in recent years. The increase in radicalized attacks around the world indicates that terrorism is transitioning from a place to an idea.
Within the Impact Analysis chapter, we examined the threat level from these groups and identified major and minor epicenters geographically, based on the number of attacks and the corresponding cumulative fatalities and injuries. Based on these findings, we conclude that 7 out of 10 groups (e.g. ISIL and Taliban) operate in a decentralized manner, with their activity spread across many cities at varying threat levels. This strategy makes them difficult to pursue in combat. The remaining three groups (Al-Nusrah Front, Houthi Extremists and Donetsk People’s Republic), however, have major epicenters, based on threat level, in just a few cities. To understand the political intentions behind attacks from all 10 groups, we analyzed the pattern by targets and found that 46.7% of attacks targeted the military and police, followed by 27.3% that targeted civilians. To investigate similarities and differences between these groups with respect to fatalities, we performed statistical analysis. Results from our experiment suggest that mean fatalities do not differ significantly for pairs such as Boko Haram - Al Nusrah, Al Qaida in Arabian Peninsula (AQAP) - Al Shabaab, and Houthi Extremists - PKK. In contrast, the pairs of ISIL with each of the remaining groups, as well as pairs such as Taliban - Al Nusrah and PKK - Boko Haram, show significantly different means with respect to fatalities.
One of the key findings of this research is pattern discovery within individual groups, which describes how the attributes of their attacks are related and interconnected. Using the Apriori algorithm, an unsupervised machine learning technique, this research discovers 20 frequent patterns (association rules) for ISIL, 61 patterns for the Taliban and 27 patterns for Boko Haram, each with a confidence value greater than 0.5. A confidence of 0.5 means that the rule holds in at least 50% of the incidents that match its antecedent. Results from our experiment suggest that the patterns involving the use of a chemical weapon (with unarmed assault or with bombing/explosion) by ISIL (0.9 confidence) and the Taliban (0.88 confidence) have the highest likelihood among all the discovered patterns. Among the other interesting patterns, ISIL is more likely to attack other terrorists/informants for terrorist groups (non-state militia) with bombing/explosion, with resulting fatalities between 6 and 10, whereas Boko Haram is more likely to attack civilians with explosives, without a suicide attack, and with more than 50 resulting fatalities in a single incident. In the case of the Taliban, we find that the police are the likely target, with incidents involving the use of firearms and resulting fatalities between 11 and 50.
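To make the confidence threshold concrete, the following minimal sketch computes support and confidence for a single association rule over a handful of made-up incidents. The item labels, the toy incidents and the `rule_metrics` helper are hypothetical illustrations, not data or code from this study, and the sketch shows only how a rule is scored, not the full Apriori candidate search.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_total = len(transactions)
    # Incidents that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    # Incidents that contain the antecedent and the consequent together.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical incidents, each described by a set of categorical items.
incidents = [
    frozenset({"target=police", "weapon=firearms", "fatalities=11-50"}),
    frozenset({"target=police", "weapon=firearms", "fatalities=1-5"}),
    frozenset({"target=police", "weapon=explosives", "fatalities=11-50"}),
    frozenset({"target=civilians", "weapon=firearms", "fatalities=11-50"}),
    frozenset({"target=police", "weapon=firearms", "fatalities=11-50"}),
]

support, confidence = rule_metrics(
    incidents,
    antecedent=frozenset({"target=police", "weapon=firearms"}),
    consequent=frozenset({"fatalities=11-50"}),
)

# Keep the rule only if it clears the 0.5 confidence threshold.
keep_rule = confidence > 0.5
print(f"support={support:.2f} confidence={confidence:.2f} keep={keep_rule}")
```

In this toy example the antecedent matches three incidents and the consequent holds in two of them, so the rule would pass a 0.5 confidence threshold.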
This research also contributes to the existing literature in terrorism research within the supervised machine learning context. Previous research in time-series forecasting is limited to country- and year-level resolution. In this research, we have extended the previous studies with seasonality components and have achieved resolution at a monthly frequency. Using Auto ARIMA, Neural Network, TBATS and ETS models, we have forecast the number of attacks in Afghanistan and the Sahel region, and the number of fatalities in Iraq. We have evaluated and compared the performance of each model on a hold-out set using several metrics before making an actual forecast. Our findings suggest that the model that works best on one time series may not be the best on another. We also illustrated the importance of using an ensemble method and evaluated predicted versus actual values using Theil’s U statistic. Our experiment on three different time series using an ensemble approach shows a significant improvement in forecasting accuracy when compared to the best single models.
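The idea of combining forecasts and scoring them against actual values can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a simple unweighted mean as the ensemble, it assumes the U2 form of Theil's statistic (which compares a forecast against the naive "no change" forecast, so values below 1 beat the naive forecast), and all series values are invented. The function names `theils_u2` and `mean_ensemble` are hypothetical.

```python
import math
from typing import List, Sequence


def theils_u2(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Theil's U2: relative forecast error versus the naive forecast.

    Assumes actual[t] is non-zero wherever it is used as a denominator;
    months with zero counts would need separate handling.
    """
    num = 0.0
    den = 0.0
    for t in range(len(actual) - 1):
        num += ((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
        den += ((actual[t + 1] - actual[t]) / actual[t]) ** 2
    return math.sqrt(num / den)


def mean_ensemble(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Average several forecasts point by point."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


# Invented hold-out values and two invented model forecasts.
actual = [100.0, 120.0, 90.0, 110.0, 130.0]
model_a = [110.0, 130.0, 100.0, 120.0, 140.0]  # tends to overestimate
model_b = [96.0, 114.0, 84.0, 106.0, 122.0]    # tends to underestimate

ensemble = mean_ensemble([model_a, model_b])

for name, series in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(name, round(theils_u2(actual, series), 3))
```

In this toy case the two models err in opposite directions, so averaging cancels part of the error. That is the intuition behind the ensemble approach, although a real improvement has to be verified on hold-out data.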
Similarly, in the classification task, previous research lacks the use of recently developed algorithms that, in practice, outperform traditional algorithms such as random forests, logistic regression or J48. We have extended the previous research in a binary classification context involving severe class imbalance and have made use of the cutting-edge LightGBM algorithm to predict the class probability of an attack involving a suicide attempt. We have also proposed an alternative strategy for model evaluation and have described the reasons why standard validation techniques such as cross-validation would be a bad choice for this data. Using the explainer object, we have also investigated the decision-making process behind each prediction from our trained model. Our model achieves an AUC of 96% and a specificity of 86.5%, correctly classifying 582 out of 673 instances of actual suicide attacks in Afghanistan.
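The two reported metrics can be illustrated with a short sketch. The first part only reproduces the arithmetic behind the 86.5% figure from the counts given above. The second part computes AUC and class-wise recall on invented labels and scores. The helper names `roc_auc` and `class_recall`, the 0.5 threshold and the toy data are assumptions made for illustration. Note that the share of actual suicide attacks that are correctly classified is a class-wise recall; whether it is reported as sensitivity or specificity depends on which class is treated as the positive class.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def class_recall(labels: Sequence[int], scores: Sequence[float],
                 cls: int, threshold: float = 0.5) -> float:
    """Share of instances of class `cls` that are predicted as `cls`."""
    preds = [1 if s >= threshold else 0 for s in scores]
    total = sum(1 for y in labels if y == cls)
    hits = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
    return hits / total


# Arithmetic behind the reported rate: 582 of 673 actual suicide attacks.
reported_rate = 582 / 673
print(f"reported rate: {reported_rate:.1%}")

# Invented, imbalanced toy data: 1 = suicide attack, 0 = other attack.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.15, 0.70, 0.40]

print("AUC:", round(roc_auc(labels, scores), 3))
print("recall on class 1:", class_recall(labels, scores, cls=1))
print("recall on class 0:", round(class_recall(labels, scores, cls=0), 3))
```

AUC is threshold-free, while the class-wise rates depend on the chosen probability cut-off, which is why the two are usually reported together for imbalanced data.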
8.1 Research limitations and future work
This research uses the most recently published data of the Global Terrorism Database (June 2017 release), which includes incidents only up to the year 2016. Future work in this direction can be carried out as new data become available. Within the pattern discovery part, this research focuses on the top ten groups. Possible future work could discover patterns by geographical location (e.g. city or state) or by year to add more context to pattern discovery. Within the time-series forecasting part, possible future work could add other, more diverse models and use different techniques within the ensemble approach, such as a weighted average, to evaluate the improvement in accuracy.
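One possible form of the weighted-average ensemble mentioned above is sketched here. It is an assumption rather than a method used in this thesis: each model's weight is made inversely proportional to its RMSE on a validation window, so more accurate models contribute more. The helper names and all numbers are hypothetical.

```python
import math
from typing import List, Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error between two equally long series."""
    squared = [(f - a) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def inverse_error_weights(errors: Sequence[float]) -> List[float]:
    """Weights proportional to 1/error, normalised to sum to 1.

    Assumes every error is strictly positive.
    """
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_ensemble(forecasts: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Combine forecasts point by point using the given weights."""
    return [sum(w * v for w, v in zip(weights, values))
            for values in zip(*forecasts)]


# Invented validation window and two invented model forecasts.
validation_actual = [100.0, 120.0, 90.0, 110.0]
model_a = [110.0, 130.0, 100.0, 120.0]  # RMSE of 10 on this window
model_b = [95.0, 125.0, 85.0, 115.0]    # RMSE of 5 on this window

weights = inverse_error_weights([
    rmse(validation_actual, model_a),
    rmse(validation_actual, model_b),
])
combined = weighted_ensemble([model_a, model_b], weights)
print("weights:", [round(w, 3) for w in weights])
print("combined:", [round(v, 2) for v in combined])
```

To avoid an optimistic bias, the weights would need to be estimated on a window that is separate from the one used for the final accuracy comparison.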
Although machine learning works best on structured data, recent developments in deep learning frameworks for tabular data are drawing a lot of attention. Possible future work to investigate the threat level and characteristics of the most violent terrorist groups, and the corresponding forecasting, could use deep learning with embeddings for the categorical variables.
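As a rough illustration of what embeddings for categorical variables involve, the sketch below maps each category to an integer index and looks up a dense vector for it. It is not tied to any particular deep learning framework. The vectors are randomly initialised here, whereas in a real model they would be learned during training. The column values, the embedding dimension and the helper names are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def build_vocab(values: Sequence[str]) -> Dict[str, int]:
    """Assign a stable integer index to each distinct category."""
    vocab: Dict[str, int] = {}
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab)
    return vocab


def init_embedding_table(n_categories: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Create one small random vector per category (a learnable table in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(n_categories)]


def embed(values: Sequence[str], vocab: Dict[str, int],
          table: List[List[float]]) -> List[List[float]]:
    """Replace each category with its dense vector."""
    return [table[vocab[v]] for v in values]


# Hypothetical categorical column, e.g. the weapon type of each incident.
weapon_type = ["explosives", "firearms", "explosives", "chemical", "firearms"]

vocab = build_vocab(weapon_type)
table = init_embedding_table(n_categories=len(vocab), dim=4)
vectors = embed(weapon_type, vocab, table)

print(vocab)
print(len(vectors), "rows of dimension", len(vectors[0]))
```

Compared with one-hot encoding, the dense vectors let a model place similar categories close to each other, which is the main motivation for this approach on high-cardinality columns such as city or group name.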