You need to analyze textual log data from an online chat forum related to the
Anonymous hacktivist group. You will learn how to apply regular expressions, summarize log data,
quantify text data, and summarize time trends.
IRC (Internet Relay Chat) is a protocol for instant messaging developed in the early years of the
Internet. Its openness and the ability to remain anonymous have made IRC a popular channel for
hacker networks to collaborate and share ideas.
The data comes from https://www.azsecure-data.org/internet-relay-chat.html. It contains two years
of chats between hackers associated with the hacktivist group Anonymous. In these logs they share
information about malware, setting up servers to deploy attacks, and other information related to
The collection and analysis of these chats is a form of cyber-threat intelligence. Analyzing these
chats and other dark web data sources enables proactive defense against attacks.
1. Many users log in and view the chat without commenting. Which users spent the most time
in the logs? (3 pts) Which users logged in the most? (2 pts)
2. Find the most common words. (3 pts)
3. Count the total number of written messages (only those with actual text content). (2 pts)
Summarize the users who posted the most messages. (2 pts)
4. Find and rank (by count) words not in an English dictionary. (3 pts) This is a simple method
that can identify some names of malware tools.
5. Which hours of the day had the most messages? (2 pts) Which days had the most traffic (or
6. Find and list the URLs posted in the chat. (2 pts)
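
As a starting point for the tasks above, here is a minimal Python sketch of the general approach:
parse each log line with a regular expression, then tally the results with counters. The log
format used here ("[YYYY-MM-DD HH:MM:SS] <nick> message") is an assumption for illustration only
and may not match the actual files, so the patterns will likely need adjusting. The names
MESSAGE_RE, URL_RE, WORD_RE, and summarize are hypothetical, and the sample lines are invented.

```python
import re
from collections import Counter
from datetime import datetime

# ASSUMED log format (hypothetical, check it against the real data):
#   [2016-03-01 14:05:09] <nick> message text
MESSAGE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] <([^>]+)> (.*)$")
# Deliberately simple URL pattern; it will miss links without a scheme.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
WORD_RE = re.compile(r"[a-z']+")


def summarize(lines):
    """Tally messages per user, word counts, messages per hour, and URLs."""
    per_user = Counter()
    words = Counter()
    per_hour = Counter()
    urls = []
    for line in lines:
        match = MESSAGE_RE.match(line.strip())
        if match is None:
            continue  # joins, quits, and other lines that are not messages
        stamp, nick, text = match.groups()
        if not text.strip():
            continue  # count only messages with actual text content
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        per_user[nick] += 1
        per_hour[when.hour] += 1
        urls.extend(URL_RE.findall(text))
        # Remove URLs before tokenizing so their fragments are not counted as words.
        words.update(WORD_RE.findall(URL_RE.sub(" ", text).lower()))
    return per_user, words, per_hour, urls


# Invented sample lines in the assumed format.
sample = [
    "[2016-03-01 14:05:09] <alice> check the new build at http://example.com/tool",
    "[2016-03-01 14:06:10] <bob> the server is up",
    "[2016-03-01 15:01:00] *** Joins: carol",
    "[2016-03-01 15:02:30] <alice> the build works",
]
per_user, words, per_hour, urls = summarize(sample)
print(per_user.most_common(3))
print(words.most_common(3))
print(per_hour.most_common(3))
print(urls)
```

The same pattern extends to the remaining tasks: a second regular expression for join and quit
lines would support the login questions, and filtering the word counter against an English word
list would surface the non-dictionary words.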