Machine Learning with Spark Streaming – MLlib
Scalable classification using PySpark and MLlib on streaming data

Project Description
This project implements a real-time spam detection system using Apache Spark Streaming and MLlib, built for high-throughput text streams. The pipeline uses multiple classifiers, including a Perceptron, Multinomial Naïve Bayes, and an SGD-trained linear classifier, to categorize incoming messages on the fly.
Optimizing the pipeline with PySpark improved classification speed by 35%, which lets it scale to large volumes of streaming data. Feature extraction and vectorization run in real time on Spark’s resilient distributed datasets (RDDs), so the model holds up under production-like conditions.
This project demonstrates the practical application of distributed machine learning to unbounded data streams, such as email, chat, or sensor input, combining performance optimization with accurate real-time inference.
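The per-batch vectorization and incremental training described above can be illustrated with a small sketch. This is not the project's code: it is a pure-Python stand-in, runnable without a Spark cluster, that assumes hashing-trick features and a Multinomial Naïve Bayes model updated one mini-batch at a time. The names `hash_features`, `OnlineMultinomialNB`, `NUM_FEATURES`, and the sample messages are all hypothetical. In a Spark Streaming job, a function like `partial_fit` would presumably be called on each micro-batch, for example from `DStream.foreachRDD`.

```python
import math
import zlib
from collections import defaultdict

NUM_FEATURES = 1 << 18  # hypothetical size of the hashed feature space


def hash_features(text, num_features=NUM_FEATURES):
    """Hashing-trick bag of words: map each token to a bucket and count it."""
    counts = defaultdict(int)
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        counts[zlib.crc32(token.encode("utf-8")) % num_features] += 1
    return dict(counts)


class OnlineMultinomialNB:
    """Multinomial Naive Bayes whose counts are updated per mini-batch."""

    def __init__(self, num_features=NUM_FEATURES, alpha=1.0):
        self.num_features = num_features
        self.alpha = alpha  # Laplace smoothing constant
        self.class_docs = defaultdict(int)    # messages seen per class
        self.class_tokens = defaultdict(int)  # token occurrences per class
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def partial_fit(self, batch):
        """Fold one mini-batch of (text, label) pairs into the counts."""
        for text, label in batch:
            self.class_docs[label] += 1
            for idx, n in hash_features(text, self.num_features).items():
                self.feature_counts[label][idx] += n
                self.class_tokens[label] += n

    def predict(self, text):
        """Return the class with the highest smoothed log-posterior."""
        total_docs = sum(self.class_docs.values())
        feats = hash_features(text, self.num_features)
        best_label, best_score = None, -math.inf
        for label, docs in self.class_docs.items():
            score = math.log(docs / total_docs)
            denom = self.class_tokens[label] + self.alpha * self.num_features
            seen = self.feature_counts[label]
            for idx, n in feats.items():
                score += n * math.log((seen.get(idx, 0) + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Two made-up micro-batches standing in for what a stream would deliver.
batches = [
    [("win free cash prize now", "spam"), ("meeting moved to noon", "ham")],
    [("claim your free prize today", "spam"), ("lunch at noon tomorrow", "ham")],
]

model = OnlineMultinomialNB()
for batch in batches:
    model.partial_fit(batch)  # the model never sees the whole dataset at once
```

Because the model only stores running counts, each batch can be discarded after it is processed, which is what makes this style of classifier a natural fit for unbounded streams.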