LinkedIn Sentiment Analysis

Project Overview

This project performs sentiment analysis of public opinion about OpenAI shared on LinkedIn. Using natural language processing (NLP) techniques, the system analyses posts and comments and categorises their sentiment as positive, neutral, or negative. It was developed for educational purposes as part of an engineering thesis.

Features

Data collection from LinkedIn using Apify scraper
Robust text preprocessing to handle URLs, slang, and multilingual content
Manual labelling functionality for creating training datasets
Transfer learning with a RoBERTa model for sentiment classification
Cross-validation to evaluate model performance
Data visualisation with histograms and word clouds

Technical Architecture

The project follows a complete machine learning pipeline:

Data Collection: Scraping LinkedIn posts and comments containing "openai"
Preprocessing: Text cleaning, language detection, URL tokenisation
Labelling: Manual sentiment annotation interface
Model Training: Fine-tuning RoBERTa with Transfer Learning
Evaluation: Using cross-validation and accuracy metrics
Visualisation: Displaying sentiment distributions and word frequencies
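
The evaluation step can be illustrated with a small, dependency-free sketch of k-fold cross-validation. The README does not say how the folds are built, so the stratification by label, the function names `stratified_folds` and `cross_validate`, and the default of five folds are all assumptions made for illustration. In practice a library splitter such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar label proportions."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(labels, evaluate, k=5, seed=0):
    """Call evaluate(train_idx, test_idx) once per fold; return the mean score."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the current one forms the training set.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)
```

Here `evaluate` stands in for whatever trains the model on `train_idx` and returns its accuracy on `test_idx`; averaging over folds gives a steadier estimate than a single train/test split.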

Data Processing

English language detection using Lingua
URL standardisation and removal of duplicates
Sentiment categorisation (Positive, Neutral, Negative)
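
A minimal sketch of these preprocessing steps follows. The `<URL>` placeholder token, the regular expression, and all function names are assumptions for illustration and not the project's actual code. The language filter uses the Lingua Python package (`lingua-language-detector`) and imports it lazily, so the URL and duplicate handling can be run without it.

```python
import re

# Assumed URL pattern and placeholder token; the project may use different ones.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
URL_TOKEN = "<URL>"


def standardise_urls(text):
    """Replace every URL with a single placeholder token."""
    return URL_PATTERN.sub(URL_TOKEN, text)


def normalise(text):
    """Standardise URLs and collapse runs of whitespace."""
    return " ".join(standardise_urls(text).split())


def deduplicate(texts):
    """Drop texts that are identical after normalisation, keeping the first copy."""
    seen = set()
    unique = []
    for text in texts:
        cleaned = normalise(text)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


def keep_english(texts):
    """Keep only the texts that Lingua detects as English."""
    # Imported lazily because Lingua is an optional third-party dependency here.
    from lingua import Language, LanguageDetectorBuilder

    detector = LanguageDetectorBuilder.from_all_languages().build()
    return [t for t in texts if detector.detect_language_of(t) == Language.ENGLISH]
```

Replacing URLs before deduplication means two posts that differ only in the link they share are treated as the same text, which may or may not be the behaviour wanted for a given dataset.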

Models

The sentiment analysis uses:

Base model: RoBERTa
Transfer Learning: Fine-tuning on domain-specific data
Layer freezing: 10 bottom layers frozen to preserve general language understanding
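
Layer freezing of this kind can be sketched with the Hugging Face Transformers API as shown below. The `roberta-base` checkpoint (which has 12 encoder layers), the choice to freeze the embeddings as well, and the helper names are assumptions; the README only states that the 10 bottom layers are frozen.

```python
def freeze_bottom_layers(model, n_frozen=10, freeze_embeddings=True):
    """Freeze the first n_frozen encoder layers of a RoBERTa classifier.

    Returns the number of parameter tensors that remain trainable.
    """
    modules = list(model.roberta.encoder.layer[:n_frozen])
    if freeze_embeddings:
        # Assumption: the embeddings are frozen together with the bottom layers.
        modules.append(model.roberta.embeddings)
    for module in modules:
        for param in module.parameters():
            param.requires_grad = False
    return sum(1 for p in model.parameters() if p.requires_grad)


def build_model(checkpoint="roberta-base", num_labels=3):
    """Load a pre-trained RoBERTa classifier and freeze its bottom layers."""
    # Imported lazily: requires the transformers and torch packages.
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    freeze_bottom_layers(model, n_frozen=10)
    return model
```

With a 12-layer checkpoint, this leaves the top two encoder layers and the classification head trainable, so fine-tuning adapts the task-specific part of the network while the lower layers keep their pre-trained weights.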

Concepts Explored

The project explores several key data science and NLP concepts:

Web Scraping: Ethical data collection techniques from social media platforms
Natural Language Processing: Text preprocessing, tokenisation, and sentiment analysis
Transfer Learning: Adapting pre-trained language models to new domains
Feature Engineering: Extracting relevant features from text data
Model Evaluation: Cross-validation techniques and performance metrics
Class Imbalance: Handling uneven distribution of sentiment classes
Hyperparameter Tuning: Optimising model parameters for better performance
Data Visualisation: Techniques for presenting text data analysis results
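
As an illustration of the class imbalance point, one common remedy is to weight each class inversely to its frequency. The README does not say which technique the project uses, so the sketch below is an assumption; the function name `balanced_class_weights` is hypothetical, and the formula is the widely used "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }
```

Weights like these can be passed to a weighted loss, for example the `weight` argument of `torch.nn.CrossEntropyLoss`, so that mistakes on rare classes cost more than mistakes on the majority class.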

Requirements

Python 3.10
PyTorch
Transformers (Hugging Face)
Apify client
Lingua

Installation

pip install -r requirements.txt

Additionally, the project requires a PyTorch build that is compatible with your GPU's NVIDIA CUDA version, which can differ across devices. Check the CUDA version with nvidia-smi, then install the matching PyTorch package.
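
After installation, a quick check such as the following can confirm whether PyTorch sees the GPU. This is a small sketch and not part of the project; the function name `describe_torch_cuda` is hypothetical.

```python
import importlib.util


def describe_torch_cuda():
    """Return a one-line summary of the PyTorch and CUDA setup."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} installed, CUDA not available"
    return (
        f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"GPU: {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If the summary reports that CUDA is not available on a machine with an NVIDIA GPU, the installed PyTorch build is probably a CPU-only one or targets a CUDA version the driver does not support.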

Repository: NakerTheFirst/Sentiment-analysis
