LogSentinel-ELK is a Big Data project that processes and analyzes large web server logs (about 1.3 GB) and detects anomalies in them. It combines the ELK Stack (Elasticsearch, Logstash, Kibana) for data engineering with unsupervised machine learning (Isolation Forest) for security analysis, flagging traffic that may indicate cyber threats such as DDoS attacks, brute-force attempts, and data exfiltration.
The pipeline consists of three main stages:
- Ingestion & ETL: Parsing raw TSV logs using Logstash (Grok/CSV filters) and normalizing timestamps.
- Storage & Visualization: Indexing 13M+ records in Elasticsearch and visualizing traffic patterns in Kibana.
- Advanced Analysis: Extracting features using Python and applying Isolation Forest to detect outliers.
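The Advanced Analysis stage can be illustrated with a minimal sketch. It is not the project's actual script: the column names (`host`, `status`, `bytes`), the per-host features, the `contamination` value, and the synthetic demo data are all assumptions made for illustration. It aggregates hits into one row per client host and lets scikit-learn's `IsolationForest` flag the hosts that are easiest to isolate.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def build_features(logs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw hits into one feature row per client host.

    Assumes hypothetical columns: host, status (HTTP code), bytes.
    """
    logs = logs.assign(is_error=logs["status"] >= 400)
    return logs.groupby("host").agg(
        requests=("status", "size"),      # number of hits from the host
        total_bytes=("bytes", "sum"),     # total response volume
        error_rate=("is_error", "mean"),  # share of 4xx/5xx responses
    )


def flag_outliers(features: pd.DataFrame,
                  contamination: float = 0.05,
                  seed: int = 42) -> pd.DataFrame:
    """Fit an Isolation Forest and mark the most isolated hosts."""
    # Log-scale the heavy-tailed count features before fitting.
    X = features.assign(
        requests=np.log1p(features["requests"]),
        total_bytes=np.log1p(features["total_bytes"]),
    )
    model = IsolationForest(
        n_estimators=100, contamination=contamination, random_state=seed
    )
    labels = model.fit_predict(X)  # -1 = outlier, 1 = inlier
    result = features.copy()
    result["anomaly"] = labels == -1
    result["score"] = model.decision_function(X)  # lower = more anomalous
    return result


def make_demo_logs(seed: int = 0) -> pd.DataFrame:
    """Synthetic stand-in for parsed logs: 40 ordinary hosts, 1 noisy one."""
    rng = np.random.default_rng(seed)
    rows = []
    for i in range(40):
        for _ in range(int(rng.integers(15, 30))):
            status = 200 if rng.random() > 0.05 else 404
            rows.append((f"host{i}", status, int(rng.integers(1000, 9000))))
    # A host hammering the server with failed logins (brute-force pattern).
    rows.extend(("attacker", 401, 300) for _ in range(3000))
    return pd.DataFrame(rows, columns=["host", "status", "bytes"])


if __name__ == "__main__":
    scored = flag_outliers(build_features(make_demo_logs()))
    print(scored.sort_values("score").head())
```

On real data the features would come from the indexed records rather than a synthetic frame, and `contamination` would need tuning, since it directly sets how many hosts are reported as anomalous.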
- Infrastructure: Docker & Docker Compose
- ETL Pipeline: Logstash 7.17
- Database: Elasticsearch 7.17 (Single Node Cluster)
- Visualization: Kibana 7.17
- Machine Learning: Python (Pandas, Scikit-Learn, Matplotlib)
- Dataset: NASA HTTP Server Log (Augmented to 1.3 GB / ~13 Million Hits)
- Docker Desktop (Engine running)
- Python 3.x
- Git
git clone https://github.com/khalifaalhasan/LogSentinel-ELK.git
cd LogSentinel-ELK