#datapreprocessing

Here are 87 public repositories matching this topic...

This project is an Intrusion Detection System (IDS) using machine learning (ML) and deep learning (DL) to detect network intrusions. It leverages the CICIDS2018 dataset to classify traffic as normal or malicious. Key features include data preprocessing, model training, hyperparameter tuning, and Docker containerization for scalable deployment.

  • Updated Nov 19, 2025
  • Python

ScrapySub is a Python library designed to recursively scrape website content, including subpages. It fetches the visible text from web pages and stores it in a structured format for easy access and analysis. This library is particularly useful for NLP and AI developers who need to gather large amounts of web content for their projects.

  • Updated Jul 14, 2024
  • Python

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

  • Updated Aug 17, 2024
  • Python

A Streamlit web app that uses Python, scikit-learn, and pandas to predict used car prices. It features data preprocessing (scaling, encoding), Random Forest model optimization with GridSearchCV, and interactive user input handling. The model reports an R² score of 0.9028, showcasing skills in machine learning, data engineering, and deployment.

  • Updated Nov 27, 2024
  • Python
