Nexar Dashcam Crash Prediction project screenshot

Nexar Dashcam Crash Prediction

Role: Computer Vision Engineer

PyTorch VideoMAE-2 TimeSformer Transformers

The Challenge

  • Build a safety-critical model that flags collisions and near-misses from raw dashcam streams in real time.
  • Handle a heavily imbalanced dataset of 1,500 videos with nuanced temporal cues inside a 1.5-second window.
  • Differentiate normal traffic stops from sudden braking to minimize false alarms.

The Solution

  • Fine-tuned VideoMAE-2 Large as the backbone for binary event classification.
  • Preprocessed 16-frame, 224x224 clips with TimeSformer to capture spatiotemporal motion signals.
  • Applied strong augmentations (flip, color jitter, rotation) to combat overfitting on scarce data.
  • Trained with Hugging Face Trainer on dual T4s, optimizing Average Precision across 500ms, 1000ms, and 1500ms horizons.

Key Outcomes

  • Placed 11th on the private leaderboard with 0.80 mAP.
  • Delivered high-precision early accident detection suitable for edge deployment.
  • Optimized batch inference to keep real-time latency in check.
MAP - Charting Student Math Misunderstandings project

MAP - Charting Student Math Misunderstandings

Role: NLP Researcher

LLMs LoRA/QLoRA Ensembles Qwen/DeepSeek

The Challenge

  • Automate detection of nuanced math misconceptions from free-form student explanations.
  • Control false positives where correct answers hide incorrect reasoning.
  • Meet a strict 9-hour inference budget on limited T4/L4 compute.

The Solution

  • Built a six-model ensemble spanning Qwen 2.5/3, Hunyuan 7B, and DeepSeekMath for reasoning diversity.
  • Used QLoRA (4-bit) and LoRA adapters to fine-tune up to 14B parameters on modest hardware.
  • Created a family-based weighted voting system combining probability sums, agreement bonuses, and correctness filters.
  • Ran domain-adaptive pretraining on math misconception corpora to align outputs with pedagogy.

Key Outcomes

  • Silver Medal - 45th globally with 0.947 MAP@3.
  • Outperformed baseline BERT models while meeting the 9-hour runtime cap.
  • Delivered a scalable ensemble pipeline for constrained inference budgets.
Lightweight Plagiarism Detector project

Lightweight Plagiarism Detector

Role: ML Engineer

PyTorch Transformers Streamlit PyMuPDF

The Challenge

  • Deliver accurate plagiarism detection without expensive proprietary APIs or large models.
  • Handle PDF ingestion and sentence-level semantic comparison for subtle paraphrasing.
  • Prove that small language models can excel on focused downstream tasks.

The Solution

  • Fine-tuned SmolLM2 (135M) into a binary classifier using the MIT Plagiarism dataset.
  • Trained with AdamW and Cross-Entropy over three epochs with custom padding for variable lengths.
  • Built an end-to-end Streamlit app with PyMuPDF extraction feeding the inference engine.
  • Deployed the full stack on Hugging Face Spaces for instant, serverless access.

Key Outcomes

  • Reached 96.20% accuracy and 0.96 F1 on 73k+ samples.
  • Delivered a fast, low-cost alternative to billion-parameter solutions.
  • Open-sourced the model and app under MIT for community use.
Automated Bank Statement Reconciliation project screenshot

Automated Bank Statement Reconciliation

Role: Python Automation Engineer

Python Pandas pdfplumber OpenPyXL

The Challenge

  • Replace manual reconciliation of 1,000+ monthly transactions locked in PDFs and Excel.
  • Normalize inconsistent statement formats while avoiding human transcription errors.
  • Compress a multi-day close process into minutes.

The Solution

  • Extracted high-fidelity transaction data from PDFs via pdfplumber.
  • Built a Pandas reconciliation engine matching by date, amount, and reference codes.
  • Generated formatted Excel discrepancy reports automatically with OpenPyXL.

Key Outcomes

  • Cut manual effort by 80%, saving about 20 hours per month.
  • Achieved 5x faster processing with 100% data integrity.
  • Flagged unmatched items instantly for finance review.
Predictive Analytics for Magazine Sales and Inventory project screenshot

Predictive Analytics for Magazine Sales and Inventory

Role: Data Scientist

Python XGBoost Scikit-learn Matplotlib

The Challenge

  • Balance supply and demand for magazine distribution while reducing returns.
  • Move from intuition-driven allocations to data-backed forecasts.
  • Capture seasonality, regional demographics, and holiday effects in one model.

The Solution

  • Trained an XGBoost regressor that hit R^2 of 0.77 on validation.
  • Engineered trend, seasonality, demographic, and holiday features for better signal.
  • Built a dynamic dashboard recommending per-location stock levels.

Key Outcomes

  • Drove 14% year-over-year sales growth by protecting high-demand areas.
  • Reduced unsold returns by 14%, lowering logistics and print waste.
  • Improved sales ops efficiency by 15%, freeing time for strategy.