
RoleHunt

Private

AI-Powered Job Search Platform — Full-Stack + ML

23 ATS sources built · 901 tests passing · 36 DB tables
Code available on request

Solo developer — designed, built, and deployed the entire platform end-to-end. Next.js 16 frontend, Supabase backend (36 tables, 38 migrations), FastAPI ML microservice (6-pass pipeline, 901 tests), Chrome extension, Gmail integration, Cloud Run deployment with automated cron scheduling. I use it daily.

The Problem

Job searching is broken. You apply to hundreds of postings with no reliable way to know which companies actually sponsor H-1B visas, and most platforms just wrap the same stale listings. I was going through this myself and got tired of the manual grind, so I built an entire platform: scrapers for 23 ATS sources, a FastAPI ML microservice with a 6-pass processing pipeline, salary prediction, skill extraction, contact discovery, and AI-powered resume tailoring.

How I Built It

23 ATS source integrations

Built custom scrapers and API integrations for Greenhouse (4,516 companies), Lever (947), Ashby (798), plus Workday, SmartRecruiters, BambooHR, JazzHR, iCIMS, and 14 more — 23 total. Each source has its own parser, rate limiting, and error handling. Automated Cloud Scheduler crons scrape fresh listings daily. Global deduplication using fuzzy title+company matching prevents duplicates across sources.
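A minimal sketch of cross-source fuzzy deduplication on title and company. It assumes a similarity ratio from Python's `difflib.SequenceMatcher` and a hypothetical 0.9 threshold; the actual matching function and threshold RoleHunt uses are not described above, and `Posting`, `normalize`, `is_duplicate`, and `dedupe` are illustrative names.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Posting:
    source: str
    company: str
    title: str


def normalize(text: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse whitespace,
    # so "Acme, Inc." and "acme inc" compare equal.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(text.split())


def is_duplicate(a: Posting, b: Posting, threshold: float = 0.9) -> bool:
    # Hypothetical rule: both company and title must clear the threshold.
    company = SequenceMatcher(None, normalize(a.company), normalize(b.company)).ratio()
    title = SequenceMatcher(None, normalize(a.title), normalize(b.title)).ratio()
    return company >= threshold and title >= threshold


def dedupe(postings: list[Posting], threshold: float = 0.9) -> list[Posting]:
    # Keep the first posting seen; drop later ones that fuzzy-match a kept one.
    kept: list[Posting] = []
    for p in postings:
        if not any(is_duplicate(p, k, threshold) for k in kept):
            kept.append(p)
    return kept


postings = [
    Posting("greenhouse", "Acme, Inc.", "Senior Software Engineer"),
    Posting("lever", "acme inc", "Senior Software Engineer "),
    Posting("ashby", "Acme Inc", "Data Scientist"),
]
print([p.source for p in dedupe(postings)])  # ['greenhouse', 'ashby']
```

Pairwise comparison is quadratic in the number of postings, so at the scale of thousands of companies a real implementation would likely bucket by normalized company before comparing titles.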

FastAPI ML microservice (6-pass pipeline)

Each job goes through a 6-pass processing pipeline: (1) OpenAI text-embedding-3-small for vector embeddings, (2) Gemini batch title enrichment, (3) AI + regex role tag classification, (4) IDF-weighted score computation against user profiles, (5) ESCO skill extraction via ML service (13,890+ skills taxonomy), (6) XGBoost salary prediction (R²=0.58, 13,591 predictions). Separately, the ML service hosts ConFit v2 job-resume matching (fine-tuned JobBERT, 768-dim ONNX, MRR 0.94 vs 0.85 for OpenAI embeddings) and H-1B sponsorship detection (USCIS + DOL LCA + 110 regex patterns). 901 tests passing.
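Pass (4) scores jobs against a profile with IDF weights, so rare skills count for more than ubiquitous ones. The sketch below assumes scikit-learn's smoothed IDF form, `log((1 + n) / (1 + df)) + 1`, and a hypothetical "share of the job's IDF mass covered by the profile" score; RoleHunt's exact formula and its 74 role baselines are not modeled here.

```python
import math
from collections import Counter


def idf_weights(job_skill_sets: list[set[str]]) -> dict[str, float]:
    # Smoothed IDF: skills in few postings get large weights,
    # skills present in every posting get exactly 1.0.
    n = len(job_skill_sets)
    df = Counter(skill for skills in job_skill_sets for skill in skills)
    return {s: math.log((1 + n) / (1 + count)) + 1 for s, count in df.items()}


def match_score(job_skills: set[str], profile_skills: set[str], idf: dict[str, float]) -> float:
    # Fraction of the job's total IDF weight that the profile covers, in [0, 1].
    total = sum(idf.get(s, 0.0) for s in job_skills)
    if total == 0:
        return 0.0
    covered = sum(idf.get(s, 0.0) for s in job_skills & profile_skills)
    return covered / total


jobs = [{"python", "sql"}, {"python", "kubernetes"}, {"python", "sql", "rust"}]
idf = idf_weights(jobs)
print(round(match_score({"python", "rust"}, {"python"}, idf), 3))  # 0.371
print(round(match_score({"python", "rust"}, {"rust"}, idf), 3))    # 0.629
```

Knowing only the common skill (Python) covers about 37% of the job, while knowing only the rarer one (Rust) covers about 63%.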

H-1B sponsorship intelligence (multi-layer)

Goes beyond simple employer lookup. Cross-references USCIS H-1B employer data AND Department of Labor LCA filings. 110 regex patterns detect sponsorship language in job descriptions (both positive and negative signals). Companies that sponsored 50+ H-1B visas in the past year get flagged as likely sponsors. The multi-layer approach catches cases that a single-source check would miss.
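One way the layers could combine, as a sketch under stated assumptions: only the 50+ USCIS threshold comes from the description above. The regex patterns are illustrative stand-ins for the 110 real ones, and the ordering (explicit posting language overrides filing history, negatives win) plus the "possible" tier for any LCA filing are hypothetical choices.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; not RoleHunt's production set.
NEGATIVE = [re.compile(p, re.IGNORECASE) for p in (
    r"\b(?:not|unable to|cannot|can't|won't)\s+(?:\w+\s+){0,2}sponsor",
    r"\bno\s+(?:h-?1b\s+|visa\s+)?sponsorship\b",
    r"\bwithout\s+(?:the\s+need\s+for\s+)?(?:visa\s+)?sponsorship\b",
)]
POSITIVE = [re.compile(p, re.IGNORECASE) for p in (
    r"\b(?:will|can|able to)\s+sponsor\b",
    r"\bvisa\s+sponsorship\s+(?:is\s+)?available\b",
)]


@dataclass
class EmployerSignals:
    uscis_approvals_last_year: int  # from USCIS H-1B employer data
    lca_filings: int                # from DOL LCA disclosure data
    description: str                # job posting text


def sponsorship_verdict(sig: EmployerSignals, threshold: int = 50) -> str:
    text = sig.description
    # Layer 1: explicit language in the posting; negative signals win.
    if any(p.search(text) for p in NEGATIVE):
        return "no"
    if any(p.search(text) for p in POSITIVE):
        return "yes"
    # Layer 2: filing history when the posting is silent.
    if sig.uscis_approvals_last_year >= threshold:
        return "likely"
    if sig.lca_filings > 0:
        return "possible"
    return "unknown"


print(sponsorship_verdict(EmployerSignals(120, 300, "We are unable to sponsor visas.")))  # no
print(sponsorship_verdict(EmployerSignals(75, 10, "Join our platform team.")))  # likely
```

The first example shows why the layers matter: a heavy historical sponsor can still post a role that explicitly excludes sponsorship, which an employer-only lookup would miss.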

Contact discovery pipeline

LinkedIn X-ray search to find hiring managers and recruiters at target companies. AI-powered ranking determines who is most likely the decision-maker. Email pattern engine with 287+ company-specific patterns (first.last@, firstl@, etc.) generates candidate emails, then validates deliverability. The goal: skip the application black hole and reach the actual human.
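The candidate-generation step of a pattern engine can be sketched as template expansion. The four templates and the `candidate_emails` helper are hypothetical; the real engine stores 287+ company-specific patterns, and the deliverability check that follows is not shown.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical templates; placeholders are filled from the person's name.
PATTERNS = {
    "first.last": "{first}.{last}",
    "firstl": "{first}{l}",
    "flast": "{f}{last}",
    "first": "{first}",
}


def _clean(name: str) -> str:
    # Strip accents and non-letters: "José" -> "jose", "O'Neil" -> "oneil".
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z]", "", ascii_name.lower())


def candidate_emails(first: str, last: str, domain: str,
                     known_pattern: Optional[str] = None) -> list[str]:
    f, l = _clean(first), _clean(last)
    parts = {"first": f, "last": l, "f": f[:1], "l": l[:1]}
    # If the company's pattern is already known, emit only that one;
    # otherwise emit every template for later verification.
    keys = [known_pattern] if known_pattern in PATTERNS else list(PATTERNS)
    return [f"{PATTERNS[k].format(**parts)}@{domain}" for k in keys]


print(candidate_emails("Ada", "Lovelace", "example.com", "first.last"))
# ['ada.lovelace@example.com']
```

Storing a known pattern per company is what keeps the candidate list short: one guess to verify instead of several.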

ConFit v2: fine-tuned job matching

Fine-tuned JobBERT embeddings using contrastive learning (ConFit v2 approach) on job-resume pairs. The resulting 768-dim ONNX model achieves MRR 0.94 for job-resume matching, ahead of OpenAI embeddings (MRR 0.85), and because it is self-hosted in the ML service it adds no per-query API cost. Embeddings are stored in pgvector with IVFFlat indexing for fast similarity search.
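MRR (mean reciprocal rank) averages 1/rank of the first correct match per query, so 0.94 means the right match almost always lands in first place. A minimal sketch of ranking by cosine similarity and scoring with MRR, on toy 2-dimensional vectors standing in for the 768-dim embeddings; the function names are illustrative, and in production the ranking would come from pgvector rather than Python.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query: list[float], candidates: dict[str, list[float]]) -> list[str]:
    # Most similar candidate first.
    return sorted(candidates, key=lambda cid: cosine(query, candidates[cid]), reverse=True)


def mean_reciprocal_rank(rankings: list[list[str]], relevant: list[set[str]]) -> float:
    # Per query: 1/rank of the first relevant item (0 if none), then average.
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, gold in zip(rankings, relevant):
        for rank, item in enumerate(ranking, start=1):
            if item in gold:
                total += 1.0 / rank
                break
    return total / len(rankings)


print(rank_by_cosine([1.0, 0.0], {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.5, 0.5]}))
# ['a', 'c', 'b']
mrr = mean_reciprocal_rank([["a", "b", "c"], ["x", "y", "z"], ["p", "q"]],
                           [{"a"}, {"z"}, {"r"}])
print(mrr)  # (1 + 1/3 + 0) / 3, about 0.444
```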

Full-stack infrastructure

Next.js 16 frontend with App Router, Supabase backend (PostgreSQL + pgvector, 36 tables, 38 migrations, Row Level Security). Gmail integration with AES-256-GCM encryption for application tracking (30+ ATS sender patterns, 40+ parsing rules). Chrome extension (Manifest V3) for one-click job saving across 15 ATS platforms. GDPR Article 17 compliance across 19 tables. Deployed on Cloud Run with Cloud Scheduler crons for automated ingestion, embedding, and verification.
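The Gmail side pairs sender patterns (which ATS sent this?) with parsing rules (what does it say?). A sketch under assumptions: the sender domains and status phrases below are illustrative guesses, not verified ATS sending addresses or RoleHunt's 30+ patterns and 40+ rules, and token storage and encryption are out of scope here.

```python
import re
from email.utils import parseaddr
from typing import Optional

# Illustrative sender patterns; the domains are assumptions.
SENDER_PATTERNS = {
    "greenhouse": re.compile(r"@(?:[\w-]+\.)*greenhouse\.io$", re.IGNORECASE),
    "lever": re.compile(r"@(?:[\w-]+\.)*lever\.co$", re.IGNORECASE),
    "ashby": re.compile(r"@(?:[\w-]+\.)*ashbyhq\.com$", re.IGNORECASE),
}

# Ordered rules: rejection is checked first because rejection emails
# often also contain "thank you for applying".
STATUS_RULES = [
    ("rejected", re.compile(r"\b(?:unfortunately|not (?:be )?moving forward|other candidates)\b", re.IGNORECASE)),
    ("interview", re.compile(r"\b(?:interview|schedule a (?:call|time))\b", re.IGNORECASE)),
    ("applied", re.compile(r"\b(?:thank you for applying|application (?:received|submitted))\b", re.IGNORECASE)),
]


def classify_email(from_header: str, subject: str, body: str) -> Optional[tuple[str, str]]:
    _, address = parseaddr(from_header)  # "Name <addr>" -> "addr"
    ats = next((name for name, pat in SENDER_PATTERNS.items() if pat.search(address)), None)
    if ats is None:
        return None  # not from a recognized ATS sender
    text = f"{subject}\n{body}"
    status = next((label for label, pat in STATUS_RULES if pat.search(text)), "unknown")
    return ats, status


print(classify_email("Acme Recruiting <no-reply@greenhouse.io>",
                     "Thank you for applying to Acme", "We received your application."))
# ('greenhouse', 'applied')
```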

Architecture

Next.js 16 (App Router) — Frontend + API Routes
├── Supabase (Auth + PostgreSQL + pgvector)
│   ├── 36 tables, 38 migrations, Row Level Security
│   ├── pgvector IVFFlat indexing for embedding search
│   └── GDPR Article 17 compliance (19 tables)
├── FastAPI ML Microservice (6-pass pipeline, 901 tests)
│   ├── Pass 1: OpenAI text-embedding-3-small
│   ├── Pass 2: Gemini batch title enrichment
│   ├── Pass 3: AI + regex role tag classification
│   ├── Pass 4: IDF score computation (2,575 weights, 74 baselines)
│   ├── Pass 5: ESCO Skill Extraction (13,890+ taxonomy)
│   └── Pass 6: XGBoost Salary Prediction (R²=0.58, 13,591 predictions)
├── ML Service Models
│   ├── ConFit v2 Job Matching (JobBERT 768d ONNX, MRR 0.94)
│   └── H-1B Detection (99K USCIS + 43K LCA + 110 regex)
├── 23 ATS Sources
│   ├── Greenhouse (4,516 cos) · Lever (947) · Ashby (798)
│   └── Workday · SmartRecruiters · BambooHR · 14 more
├── Contact Discovery Pipeline
│   ├── LinkedIn X-ray (Serper) → AI ranking
│   ├── Email pattern engine (287+ patterns) → SMTP verify
│   └── Hunter.io enrichment · GitHub email discovery
├── Chrome Extension (Manifest V3, 15 ATS platforms)
├── Gmail Integration (AES-256-GCM, 30+ sender patterns)
└── Google Cloud Run + Cloud Scheduler (3 crons) + pg_cron

Results

  • 23 ATS source integrations with automated daily scraping via Cloud Scheduler crons
  • FastAPI ML microservice: 6-pass processing pipeline, 901 tests passing
  • ConFit v2 job matching: MRR 0.94 (vs OpenAI embeddings at 0.85)
  • XGBoost salary prediction: R²=0.58, 13,591 predictions with 3-tier fallback (ML → LCA → BLS)
  • IDF-weighted skill scoring: 2,575 weights, 74 role baselines, 84.7% precision / 94.9% recall
  • H-1B intelligence: 99,117 USCIS sponsor records + 43,371 LCA filings + 110 regex patterns
  • Contact discovery: 287+ email patterns, LinkedIn X-ray, SMTP verification, GitHub scanning
  • 36 Supabase tables, 38 migrations, pgvector IVFFlat indexing, Row Level Security
  • GDPR Article 17 data deletion across 19 tables
  • Deployed on Cloud Run with Cloud Scheduler crons — I use this every day for my own search
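The 3-tier salary fallback (ML → LCA → BLS) is an ordered chain where the first tier that returns an estimate wins. A minimal sketch: the confidence gate on the ML tier, the lookup keys, and all salary values are hypothetical placeholders, not real LCA or BLS figures.

```python
from typing import Callable, Optional

Estimator = Callable[[dict], Optional[float]]

# Placeholder tables with made-up values, keyed by (company, title) and SOC code.
LCA_WAGES = {("acme", "data scientist"): 140000.0}
BLS_MEDIANS = {"15-2051": 110000.0}


def ml_tier(job: dict) -> Optional[float]:
    # Hypothetical gate: trust the model's prediction only above 0.7 confidence.
    if job.get("ml_confidence", 0.0) >= 0.7:
        return job.get("ml_prediction")
    return None


def lca_tier(job: dict) -> Optional[float]:
    return LCA_WAGES.get((job.get("company"), job.get("title")))


def bls_tier(job: dict) -> Optional[float]:
    return BLS_MEDIANS.get(job.get("soc"))


TIERS: list[tuple[str, Estimator]] = [("ml", ml_tier), ("lca", lca_tier), ("bls", bls_tier)]


def predict_salary(job: dict, tiers=TIERS) -> tuple[Optional[float], Optional[str]]:
    # Return the first estimate produced, plus the tier it came from.
    for name, estimate in tiers:
        value = estimate(job)
        if value is not None:
            return value, name
    return None, None


print(predict_salary({"company": "acme", "title": "data scientist", "soc": "15-2051",
                      "ml_prediction": 150000.0, "ml_confidence": 0.4}))
# (140000.0, 'lca')
```

Returning the tier name alongside the value lets the UI show where a number came from, which matters when a model with R²=0.58 is only the first of three sources.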

If I were starting over, I would separate the ML microservice and the ingestion workers into independent services with a proper message queue between them — right now the FastAPI service handles both inference and batch processing.
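The split described above comes down to a producer/consumer boundary. A minimal in-process sketch, assuming Python's `queue.Queue` as a stand-in for whatever managed broker a real deployment would use (none is named here); `ingest`, `batch_worker`, and `run` are hypothetical names. The bounded queue applies back-pressure, so scrapers slow down instead of flooding the batch worker.

```python
import queue
import threading
from typing import Any, Callable


def ingest(postings: list[dict], q: "queue.Queue[Any]") -> None:
    for p in postings:
        q.put(p)  # blocks while the queue is full (back-pressure)
    q.put(None)   # sentinel: ingestion finished


def batch_worker(q: "queue.Queue[Any]", process: Callable[[dict], Any], results: list) -> None:
    while True:
        item = q.get()
        if item is None:
            break
        try:
            results.append(process(item))
        except Exception as exc:  # keep draining so the producer never blocks forever
            results.append(exc)


def run(postings: list[dict], process: Callable[[dict], Any]) -> list:
    q: "queue.Queue[Any]" = queue.Queue(maxsize=100)
    results: list = []
    worker = threading.Thread(target=batch_worker, args=(q, process, results))
    worker.start()
    ingest(postings, q)
    worker.join()
    return results


print(run([{"title": "a"}, {"title": "b"}], lambda p: p["title"].upper()))  # ['A', 'B']
```

With a real broker, the worker would run as its own service, leaving the FastAPI process free to serve inference requests only.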

Tech Stack

Next.js 16 · TypeScript · Supabase · PostgreSQL · pgvector · FastAPI · Python · XGBoost · ONNX · JobBERT · Gemini 2.5 Flash · ESCO · Chrome Extension (MV3) · Google Cloud Run · Cloud Scheduler · AES-256-GCM · Tailwind CSS