Simple tool to split COCO annotations into train/test datasets.
Updated Aug 15, 2023 - Python
Wind Power Forecasting Based on Hybrid CEEMDAN-EWT Deep Learning Method
Utilizes a Convolutional-based Transformer architecture for accurate and efficient PV power forecasting.
Cereja is a bundle of useful functions we don't want to rewrite and .. just pure fun!
Monotonic Optimal Binning algorithm is a statistical approach to transform continuous variables into optimal and monotonic categorical variables.
In this project we try to predict home credit default risk for clients, that is, whether a client will have payment difficulties or not.
Automating the process of Data Preprocessing for Data Science
This repository focuses on two machine learning projects in the healthcare domain.
This project is an Intrusion Detection System (IDS) using machine learning (ML) and deep learning (DL) to detect network intrusions. It leverages the CICIDS2018 dataset to classify traffic as normal or malicious. Key features include data preprocessing, model training, hyperparameter tuning, and Docker containerization for scalable deployment.
A small Python script for automatically batch-generating postal receipts for legal work. It extracts content from court judgments so the receipts do not have to be filled in by hand.
ScrapySub is a Python library designed to recursively scrape website content, including subpages. It fetches the visible text from web pages and stores it in a structured format for easy access and analysis. This library is particularly useful for NLP and AI developers who need to gather large amounts of web content for their projects.
Resume Screening using Machine Learning and Python
A project to generate multiple-choice questions (MCQs) from any type of text, along with answers and distractors for them.
LGBM and logistic regression for predicting whether customers of an online market app will make a second transaction.
Python version of my machine learning framework that provides data preprocessing, feature selection, classification, regression and even more complex deep learning models, model persistence, autoencoders and anomaly detection
This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.
Algorithm designed to match strings by similarity
UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.
A Streamlit web app utilizing Python, scikit-learn, and pandas for used car price prediction. Features data preprocessing (scaling, encoding), Random Forest model optimization with GridSearchCV, and interactive user input handling. Achieves high accuracy (R² score: 0.9028), showcasing skills in machine learning, data engineering, and deployment.
Teaching computers to understand sign language! This project uses image processing to recognize hand signs, making technology more inclusive and accessible.
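As a concrete illustration of the kind of preprocessing step listed above, the first entry (splitting COCO annotations into train/test datasets) can be sketched as follows. This is only a minimal sketch under stated assumptions, not the listed tool's actual interface: the function name `split_coco` and its `train_fraction` and `seed` parameters are hypothetical, and the input is assumed to be a COCO-style dictionary with `images`, `annotations`, and `categories` keys.

```python
import random


def split_coco(coco, train_fraction=0.8, seed=0):
    """Split a COCO-style dict into (train, test) dicts, image by image.

    Hypothetical helper for illustration; not taken from any listed repository.
    """
    images = list(coco["images"])
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_fraction)
    parts = {"train": images[:n_train], "test": images[n_train:]}

    result = {}
    for name, subset in parts.items():
        image_ids = {img["id"] for img in subset}
        result[name] = {
            "images": subset,
            # Keep only the annotations that belong to images in this subset.
            "annotations": [
                ann for ann in coco.get("annotations", [])
                if ann["image_id"] in image_ids
            ],
            # Both subsets share the full category list.
            "categories": coco.get("categories", []),
        }
    return result["train"], result["test"]


# Toy dataset (made up for this example): 10 images, 25 annotations.
toy = {
    "images": [{"id": i} for i in range(10)],
    "annotations": [{"id": i, "image_id": i % 10} for i in range(25)],
    "categories": [{"id": 1, "name": "object"}],
}
train, test = split_coco(toy, train_fraction=0.8, seed=0)
```

Splitting by image rather than by annotation keeps all annotations of one image in the same subset, which avoids leaking an image between train and test.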