Forecasting epidemic diseases with Arabic Twitter data and WHO reports using machine learning techniques

Qanita Bani Baker, Farah Shatnawi, Saif Rawashdeh


Twitter is an essential social media platform on which many people share their views, their daily problems, and the health issues they suffer from. By analyzing people's tweets and collecting reports from health organizations, we can detect and track the spread of serious diseases such as influenza. In this paper, we collected Arabic-language tweets related to the spread of influenza using a set of Arabic keywords. We then applied several machine learning algorithms: random forest, multinomial naïve Bayes, decision tree, and a voting classifier. We also measured the correlation between the collected tweets and the reports collected from the World Health Organization (WHO) website in three experiments: i) between the tweets and the reports for the 13 countries, regardless of time; ii) between the tweets and the reports for Arab regions defined by these countries' dialects, regardless of time; and iii) between all tweets and all reports by week number. The results of these experiments show a strong correlation between the tweets and the reports, indicating that tweets and WHO reports can be used together to detect flu outbreaks in the Arab world.
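
The two computational steps named above, combining several classifiers by voting and correlating tweet volume with WHO figures by week number, can be illustrated with a minimal standard-library sketch. It assumes hard (majority) voting and the Pearson correlation coefficient; the abstract specifies neither the voting scheme nor the correlation measure, and every function name and input shape here is hypothetical rather than the authors' implementation.

```python
from collections import Counter, defaultdict
from datetime import date
from math import sqrt


def majority_vote(predictions):
    """Hard voting (assumed scheme): return the label most base classifiers chose.

    `predictions` holds one label per base classifier, e.g. the outputs of a
    random forest, a multinomial naive Bayes model, and a decision tree.
    Counter.most_common orders tied counts by first occurrence, so a tie
    goes to the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


def weekly_counts(tweet_dates):
    """Count flu-related tweets per ISO week number (hypothetical aggregation).

    Grouping by week number alone merges the same week across different
    years, which is one reading of "based on the week number".
    """
    counts = defaultdict(int)
    for d in tweet_dates:
        counts[d.isocalendar()[1]] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def weekly_correlation(tweet_dates, who_counts_by_week):
    """Correlate weekly tweet counts with WHO counts over the weeks both cover.

    `who_counts_by_week` is a hypothetical {week_number: count} mapping
    extracted from the WHO reports.
    """
    tweets = weekly_counts(tweet_dates)
    weeks = sorted(set(tweets) & set(who_counts_by_week))
    return pearson([tweets[w] for w in weeks],
                   [who_counts_by_week[w] for w in weeks])


# Toy, made-up data: one tweet in ISO week 1 of 2020, two in week 2, three in week 3.
toy_tweets = [date(2020, 1, 1),
              date(2020, 1, 6), date(2020, 1, 7),
              date(2020, 1, 13), date(2020, 1, 14), date(2020, 1, 15)]
toy_who = {1: 10, 2: 20, 3: 30}
print(majority_vote(["flu", "not_flu", "flu"]))  # prints "flu"
print(weekly_correlation(toy_tweets, toy_who))   # prints a value close to 1.0
```

The same `pearson` function would also cover experiments i) and ii) if the two input vectors were aligned by country or by dialect region instead of by week; that alignment step is likewise an assumption, not a detail given in the abstract.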


Arabic language; Epidemic; Machine learning; Twitter; WHO reports






This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of EEI