Hi, I'm Vivek 👋
AI Engineer building production LLM and computer-vision systems for government at scale — POCSO compliance AI, real-time face recognition, and live CCTV violation detection.

About

Lead AI Engineer at AI4AP / RTGS, building production AI for the Andhra Pradesh and Telangana state police. I shipped India's first AI-powered POCSO charge-sheet compliance system (RAG over 1,000+ legal documents), a real-time face-recognition pipeline running on consumer-grade hardware across a 2,000-camera CCTV deployment (93.3% true-accept at a 1-in-1,000 false-match rate), and a YOLO26 no-helmet violation detector. B.Tech in AI & ML (2025). Google Summer of Code 2026 contributor to DeepChem, and published researcher in pose-guided image generation. Won 1st prize in the nationwide AI Agent Hackathon conducted by the Andhra Pradesh Police.

Work Experience

AI4AP / RTGS

Aug 2025 - Present
Lead AI Engineer
Leading development of production LLM and computer-vision systems for the Andhra Pradesh and Telangana state police. Shipped India's first AI-powered POCSO charge-sheet compliance system, using RAG over 1,000+ legal documents with citation-backed retrieval; it generates compliance scorecards and evidence-gap reports in under 5 minutes per case (vs. 4+ hours manually) and cut prosecutor review time by 60%. Built CCTV360, a YOLO26 no-helmet violation detector trained on 50,000+ traffic-camera images (2,000+ violations flagged in a single 6-hour window). Engineered a real-time face-recognition system (SCRFD + AdaFace) reaching 93.3% true-accept at a 1-in-1,000 false-match rate on real operational data, optimized to run on consumer-grade hardware across a 2,000-camera deployment, with AES-encrypted face templates and a strict PII-exclusion policy.
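
For readers unfamiliar with the face-recognition metric quoted above, here is a minimal sketch of how a true-accept rate at a fixed false-match rate is usually computed from similarity scores. The function names and the toy scores are hypothetical, and the evaluation protocol actually used on the operational data is not described here.

```python
def threshold_at_fmr(impostor_scores, target_fmr):
    """Pick a threshold so that at most target_fmr of impostor pairs score above it."""
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we may wrongly accept (epsilon guards float rounding).
    allowed = int(target_fmr * len(ranked) + 1e-9)
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def true_accept_rate(genuine_scores, threshold):
    """Fraction of same-person pairs accepted (accept when score > threshold)."""
    return sum(s > threshold for s in genuine_scores) / len(genuine_scores)


# Toy data: 1,000 impostor pairs and 4 genuine pairs (illustrative values only).
impostors = [i / 1000 for i in range(1000)]
genuine = [0.2, 0.9985, 0.9990, 0.9999]

thr = threshold_at_fmr(impostors, target_fmr=0.001)  # 1-in-1,000 false matches
tar = true_accept_rate(genuine, thr)
```

The threshold is fixed from impostor pairs first, and only then is the genuine-pair acceptance rate measured, so the two numbers in "X% at 1-in-1,000" come from separate score sets.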

GGS Information Services

Dec 2024 - Feb 2025
Machine Learning Intern
Engineered production-ready 3D compression algorithms achieving 60% model size reduction while maintaining 95% geometric accuracy, deployed across 15+ enterprise clients with 40% rendering speed improvement. Built end-to-end AI pipeline for 2D-to-3D conversion using custom GANs, processing 500+ STEP files daily with 85% accuracy and reducing manual CAD modeling time by 70% for engineering teams. Implemented MLOps infrastructure with real-time monitoring dashboards tracking model drift, performance degradation, and inference latency, achieving 99.5% uptime in production.

Skills

LangChain
LangGraph
RAG Pipelines
Agent Orchestration
Finetuning (LoRA/QLoRA)
RLHF / GRPO / GSPO
vLLM
SGLang
Computer Vision
YOLO26
Object Detection
Face Recognition (SCRFD / AdaFace)
ByteTrack
OpenCV
ONNX Runtime
TensorRT
Edge Inference
Flash Attention 2
KV-Cache Quantization
DeepSpeed
FSDP
Ray
Python
FastAPI
Microservices
REST APIs
MongoDB
PostgreSQL
Redis
Vector Databases
Pinecone
FAISS
Qdrant
Azure
CI/CD
Weights & Biases
Docker
Kubernetes
PyTorch
TensorFlow
scikit-learn
XGBoost
Hugging Face
Diffusers

My Projects

Check out my latest work

I've worked on a variety of projects, from simple websites to complex web applications. Here are a few of my favorites.

TurboQuant: KV-Cache Quantization for Long-Context Inference

Implementing arXiv 2504.19874 (TurboQuant) on Qwen 32B to compress the KV cache via random rotation + scalar quantization, targeting 4x memory reduction for long-context inference on a single H100. Built custom Triton kernels for fused rotate-quantize-dequantize on the attention path; benchmarking against vanilla FP16 KV cache on perplexity and throughput.
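
A minimal pure-Python sketch of the rotate-then-quantize idea, under stated assumptions: a randomized Hadamard transform stands in for a dense random rotation, and a plain uniform quantizer stands in for the paper's quantizer. The 4-bit width is chosen here only because 16/4 matches the 4x target; the project's actual bit width, kernels, and function names are not shown, so everything below is illustrative.

```python
import math
import random


def hadamard(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def rotate(vec, signs):
    # Random sign flips followed by Hadamard: a cheap random orthogonal map.
    return hadamard([s * x for s, x in zip(signs, vec)])


def unrotate(vec, signs):
    # The normalized Hadamard matrix is its own inverse; then undo the sign flips.
    return [s * x for s, x in zip(signs, hadamard(vec))]


def quantize(vec, bits=4):
    """Uniform scalar quantization with one (offset, step) pair per vector."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels or 1.0  # avoid a zero step for constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]


rng = random.Random(0)
dim = 64
signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
key = [rng.gauss(0.0, 1.0) for _ in range(dim)]
key[3] = 25.0  # one outlier channel, which rotation spreads across all coordinates

codes, lo, step = quantize(rotate(key, signs), bits=4)
restored = unrotate(dequantize(codes, lo, step), signs)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(key, restored)))
```

Because the rotation is orthogonal, the reconstruction error in the original space equals the quantization error in the rotated space, which is why spreading outliers before quantizing helps.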

PyTorch
Triton
Qwen 32B
CUDA

Genesis: ALife Simulation with Evolving GRU Agents

Built a virtual world where blank GRU-based agents evolve from scratch via genetic algorithms with no reward shaping, reaching 86 generations of emergent foraging and survival behavior. Designed a phased curriculum (perception → action → memory → social) to study how cognition emerges in minimal artificial-life systems.
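
The evolutionary loop can be sketched roughly as follows. This is a generic truncation-selection genetic algorithm over flat weight vectors, not the project's code: `toy_fitness` is a hypothetical placeholder for survival in the simulated world, and the population size, elite count, and mutation scale are arbitrary.

```python
import random


def mutate(genome, rng, sigma=0.1):
    """Add Gaussian noise to every weight."""
    return [w + rng.gauss(0.0, sigma) for w in genome]


def evolve(fitness, genome_len, generations=30, pop_size=20, elite=5, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.gauss(0.0, 1.0) for _ in range(genome_len)] for _ in range(pop_size)
    ]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:elite]  # elites survive unchanged
        children = [
            mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)
        ]
        population = parents + children
    best = max(population, key=fitness)
    return best, history


def toy_fitness(genome):
    # Placeholder objective (higher is better); a real run would score
    # an agent's behavior in the environment instead.
    return -sum(w * w for w in genome)


best, history = evolve(toy_fitness, genome_len=8)
```

In a setup like the one described, each genome would be unpacked into an agent's GRU weights and scored by how long the agent survives, with no hand-designed reward terms.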

PyTorch
Neuroevolution
Reinforcement Learning

DeepChem — Google Summer of Code 2026

Building an OLMo language-model wrapper for DeepChem, bringing fully open-source LLMs (weights + training data + code) to the cheminformatics ecosystem for molecule property prediction and chemical reasoning. Contributing to a 2.5k+ star scientific ML library used by pharma research labs; PR live and under maintainer review.

PyTorch
Hugging Face
OLMo
Cheminformatics

wingman-AI

Developed a stealth desktop app for AI-powered coding/interview assistance with advanced process hiding. Integrated Google Gemini and OCR for real-time speech/screen analysis and instant responses. Designed a modular, cross-platform (Windows/macOS/Linux) architecture with intelligent caching.

TypeScript
Electron
Next.js
Gemini AI
OCR

GSPO-DeepSeek-R1-Distill-Qwen-1.5B

Implemented Group Sequence Policy Optimization (GSPO) algorithm from Qwen Team research, achieving superior stability with 50-75% clipping rates vs 0.01-0.02% for baseline PPO/GRPO methods on reasoning tasks. Engineered complete knowledge distillation pipeline from DeepSeek-R1 to Qwen-1.5B architecture, incorporating 8-bit optimization and gradient checkpointing for memory-efficient training on H100/RTX hardware.
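
As a rough illustration of the sequence-level objective, the sketch below computes a length-normalized importance ratio per response, clips it, and averages over the group. It follows the published GSPO formulation as I understand it; the function names and `eps` value are hypothetical, and the clipped fraction here simply counts sequences whose ratio leaves the clip range, which may differ from how the project logs it.

```python
import math


def group_advantages(rewards):
    """Normalize rewards within one group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]


def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence ratio: exp(mean per-token log-ratio)."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(new_logps, old_logps, rewards, eps=0.2):
    """Return (clipped surrogate objective, fraction of sequences clipped)."""
    advs = group_advantages(rewards)
    total, clipped = 0.0, 0
    for new, old, adv in zip(new_logps, old_logps, advs):
        s = sequence_ratio(new, old)
        s_clip = min(max(s, 1.0 - eps), 1.0 + eps)
        total += min(s * adv, s_clip * adv)
        clipped += s != s_clip
    n = len(rewards)
    return total / n, clipped / n


# Two sampled responses with per-token log-probs under the old policy.
old = [[-1.0, -2.0], [-0.5, -0.5, -0.5]]
# A new policy that raises every token log-prob by 1.0.
new = [[lp + 1.0 for lp in seq] for seq in old]
objective, clip_fraction = gspo_objective(new, old, rewards=[1.0, 0.0])
```

The key difference from token-level methods is that one ratio is clipped per response rather than one per token, so a whole sequence is either inside or outside the trust region.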

PyTorch
Transformers
Wandb
Flash Attention

Dial 112 AI: Emergency Call Intelligence

Developed production AI system for Andhra Pradesh Police processing 1000+ emergency calls daily, implementing speech-to-text, sentiment analysis, and priority classification. Built real-time geospatial analysis and emergency dispatch optimization, reducing average response time by 25% through intelligent resource allocation.

Speech Recognition
NLP
Real-time Processing
Python

CADify: AI-Powered 3D CAD Generation

Developed multimodal AI system transforming 2D engineering diagrams into 3D CAD models with 92% geometric accuracy using computer vision and LLM integration. Built robust feature extraction pipeline handling technical drawings, sketches, and annotations with advanced OCR and geometric reasoning capabilities.

PyTorch3D
OpenAI
OpenCV
Computer Vision

DeepRE: Deep Reinforcement Learning for Self-Verification

Reproduced DeepSeek-R1-Zero with 90% functional parity, implementing advanced RL techniques for LLM self-verification on complex reasoning tasks. Engineered a distributed training pipeline using a vLLM backend and Flash Attention 2, enabling cost-effective training of 3B-parameter models on consumer hardware with a 40% cost reduction.

PyTorch
vLLM
Ray
Flash Attention

PPAG: Pose-Guided Image Generation

Engineered production-ready pose transfer system integrating 5 ControlNet models, achieving 90% pose accuracy on COCO-Pose benchmark with real-time inference. Implemented advanced prompt engineering pipeline with dynamic negative prompting and attention guidance, improving generation quality by 40% and reducing artifacts by 65%.

PyTorch
ControlNet
Gradio
Diffusers

Owl CLI: OS-Level AI Agent

Built intelligent CLI agent leveraging Google Gemini LLM for natural language to system command translation, enabling conversational OS interaction with 95% command accuracy. Engineered autonomous security auditing system with real-time monitoring, policy violation detection, and automated threat response capabilities.

LLMs
System Integration
Windows APIs
Python

Hackathons

I like building things

During my time in university, I attended 1+ hackathons. People from around the country would come together and build incredible things in 2-3 days. It was eye-opening to see the endless possibilities brought to life by a group of motivated and passionate individuals.

  • AI Agent Hackathon by Andhra Pradesh Police

    Andhra Pradesh, India

    Won 1st prize in the nationwide AI Agent Hackathon conducted by the Andhra Pradesh Police for developing an emergency call intelligence system.

Contact

Get in Touch

Want to chat? Send me a DM on Twitter with a direct question and I'll respond whenever I can. I will ignore all solicitation.