Vivek Varikuti

Hi, I'm Vivek 👋

AI Engineer building production LLM and computer-vision systems for government at scale — POCSO compliance AI, real-time face recognition, and live CCTV violation detection.

About

Lead AI Engineer at AI4AP / RTGS, building production AI for the Andhra Pradesh and Telangana state police. I shipped India's first AI-powered POCSO charge-sheet compliance system (RAG over 1,000+ legal documents), a real-time face-recognition pipeline running on consumer-grade hardware across a 2,000-camera CCTV deployment (93.3% true-accept at a 1-in-1,000 false-match rate), and a YOLO26 no-helmet violation detector. B.Tech in AI & ML (2025). Google Summer of Code 2026 contributor to DeepChem, and published researcher in pose-guided image generation. Won 1st prize in the nationwide AI Agent Hackathon conducted by the Andhra Pradesh Police.

Work Experience

AI4AP / RTGS

Aug 2025 - Present

Lead AI Engineer

Lead AI engineer building production LLM and computer-vision systems for the Andhra Pradesh and Telangana state police. Shipped India's first AI-powered POCSO charge-sheet compliance system using RAG over 1,000+ legal documents with citation-backed retrieval — generating compliance scorecards and evidence-gap reports in under 5 minutes per case (vs. 4+ hours manual) and cutting prosecutor review time 60%. Built CCTV360, a YOLO26 no-helmet violation detector trained on 50,000+ traffic-camera images (2,000+ violations flagged in a single 6-hour window). Engineered a real-time face-recognition system (SCRFD + AdaFace) reaching 93.3% true-accept at a 1-in-1,000 false-match rate on real operational data, optimized to run on consumer-grade hardware across a 2,000-camera deployment, with AES-encrypted face templates and a strict PII-exclusion policy.
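For readers unfamiliar with the metric, here is a minimal sketch of how a "true-accept at a 1-in-1,000 false-match rate" figure is commonly computed from similarity scores. It is an illustration, not the production evaluation code: the function names and the strict "score greater than threshold" acceptance rule are assumptions.

```python
import math

def threshold_at_fmr(impostor_scores, target_fmr):
    """Pick a threshold so that at most target_fmr of impostor pairs score above it."""
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs allowed to pass; the small epsilon guards float rounding.
    allowed = min(math.floor(target_fmr * len(ranked) + 1e-9), len(ranked) - 1)
    # Accepting only scores strictly above ranked[allowed] lets at most `allowed` through.
    return ranked[allowed]

def accept_rate(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)

def tar_at_fmr(genuine_scores, impostor_scores, target_fmr=1e-3):
    """Return (true-accept rate, achieved false-match rate, threshold)."""
    thr = threshold_at_fmr(impostor_scores, target_fmr)
    return accept_rate(genuine_scores, thr), accept_rate(impostor_scores, thr), thr
```

The threshold is fixed on impostor (different-person) pairs first, and the true-accept rate is then read off genuine (same-person) pairs at that threshold, so the two numbers are always reported together.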

GGS Information Services

Dec 2024 - Feb 2025

Machine Learning Intern

Engineered production-ready 3D compression algorithms achieving 60% model size reduction while maintaining 95% geometric accuracy, deployed across 15+ enterprise clients with 40% rendering speed improvement. Built end-to-end AI pipeline for 2D-to-3D conversion using custom GANs, processing 500+ STEP files daily with 85% accuracy and reducing manual CAD modeling time by 70% for engineering teams. Implemented MLOps infrastructure with real-time monitoring dashboards tracking model drift, performance degradation, and inference latency, achieving 99.5% uptime in production.

Education

Usha Rama College of Engineering

Dec 2021 - May 2025

Bachelor of Technology in AI and ML

Skills

LangChain

LangGraph

RAG Pipelines

Agent Orchestration

Finetuning (LoRA/QLoRA)

RLHF / GRPO / GSPO

vLLM

SGLang

Computer Vision

YOLO26

Object Detection

Face Recognition (SCRFD / AdaFace)

ByteTrack

OpenCV

ONNX Runtime

TensorRT

Edge Inference

Flash Attention 2

KV-Cache Quantization

DeepSpeed

FSDP

Ray

Python

FastAPI

Microservices

REST APIs

MongoDB

PostgreSQL

Redis

Vector Databases

Pinecone

FAISS

Qdrant

Azure

CI/CD

Weights & Biases

Docker

Kubernetes

PyTorch

TensorFlow

scikit-learn

XGBoost

Hugging Face

Diffusers

My Projects

Check out my latest work

I've worked on a variety of projects, from LLM training and inference research to production AI systems for emergency response. Here are a few of my favorites.

TurboQuant: KV-Cache Quantization for Long-Context Inference

2025

Implementing arXiv 2504.19874 (TurboQuant) on Qwen 32B to compress the KV cache via random rotation + scalar quantization, targeting 4x memory reduction for long-context inference on a single H100. Built custom Triton kernels for fused rotate-quantize-dequantize on the attention path; benchmarking against vanilla FP16 KV cache on perplexity and throughput.

PyTorch

Triton

Qwen 32B

CUDA
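The core idea behind the TurboQuant project above (random rotation followed by scalar quantization) can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the project's Triton kernels: it uses a uniform per-vector quantizer and a Gram-Schmidt rotation, and every function name here is hypothetical.

```python
import math
import random

def random_rotation(dim, seed=0):
    """Random orthogonal matrix built by Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:
            rows.append([a / norm for a in v])
    return rows

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def quantize(x, bits=4):
    """Uniform scalar quantization to 2**bits levels (a simplifying assumption)."""
    levels = (1 << bits) - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / levels or 1.0  # avoid a zero scale for constant vectors
    return [round((v - lo) / scale) for v in x], lo, scale

def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]

def compress(x, rot, bits=4):
    """Rotate first so outliers are spread across coordinates, then quantize."""
    return quantize(matvec(rot, x), bits)

def decompress(packed, rot):
    codes, lo, scale = packed
    # The inverse of an orthogonal matrix is its transpose.
    return matvec(transpose(rot), dequantize(codes, lo, scale))
```

Storing 4-bit codes instead of 16-bit floats is where the 4x figure comes from, ignoring the small per-vector overhead of `lo` and `scale`. Because the rotation preserves vector norms, the reconstruction error after rotating back equals the quantization error in the rotated space.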

Genesis: ALife Simulation with Evolving GRU Agents

2025

Built a virtual world where blank GRU-based agents evolve from scratch via genetic algorithms with no reward shaping, reaching 86 generations of emergent foraging and survival behavior. Designed a phased curriculum (perception → action → memory → social) to study how cognition emerges in minimal artificial-life systems.

PyTorch

Neuroevolution

Reinforcement Learning

DeepChem — Google Summer of Code 2026

2026

Building an OLMo language-model wrapper for DeepChem, bringing fully open-source LLMs (weights + training data + code) to the cheminformatics ecosystem for molecule property prediction and chemical reasoning. Contributing to a 2.5k+ star scientific ML library used by pharma research labs; PR live and under maintainer review.

PyTorch

Hugging Face

OLMo

Cheminformatics

Source

wingman-AI

2025

Developed a stealth desktop app for AI-powered coding/interview assistance with advanced process hiding. Integrated Google Gemini and OCR for real-time speech/screen analysis and instant responses. Designed a modular, cross-platform (Windows/macOS/Linux) architecture with intelligent caching.

TypeScript

Electron

Next.js

Gemini AI

OCR

Source

GSPO-DeepSeek-R1-Distill-Qwen-1.5B

2025

Implemented Group Sequence Policy Optimization (GSPO) algorithm from Qwen Team research, achieving superior stability with 50-75% clipping rates vs 0.01-0.02% for baseline PPO/GRPO methods on reasoning tasks. Engineered complete knowledge distillation pipeline from DeepSeek-R1 to Qwen-1.5B architecture, incorporating 8-bit optimization and gradient checkpointing for memory-efficient training on H100/RTX hardware.

PyTorch

Transformers

Weights & Biases

Flash Attention

Source
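As a rough illustration of the GSPO project above: GSPO replaces per-token importance ratios with a single length-normalized ratio per sequence, and clips at the sequence level. The sketch below is a minimal plain-Python version, not the repository's training code. The function names, the `eps` value, and the definition of "clipping rate" as the fraction of sequences whose ratio falls outside the clip range are assumptions.

```python
import math

def group_advantages(rewards):
    """Normalize rewards within one group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + 1e-8) for r in rewards]

def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence ratio: exp(mean token log-prob difference)."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))

def gspo_objective(new_logps, old_logps, rewards, eps=0.2):
    """Return (mean clipped objective, fraction of sequences that were clipped).

    new_logps / old_logps: one list of per-token log-probs per response.
    eps is an illustrative clip range, not a recommended setting.
    """
    advantages = group_advantages(rewards)
    total, clipped = 0.0, 0
    for n, o, a in zip(new_logps, old_logps, advantages):
        s = sequence_ratio(n, o)
        s_clip = min(max(s, 1.0 - eps), 1.0 + eps)
        total += min(s * a, s_clip * a)  # PPO-style pessimistic bound, per sequence
        if s != s_clip:
            clipped += 1
    return total / len(advantages), clipped / len(advantages)
```

Because the ratio is computed once per sequence, a clip decision keeps or drops a whole response, which is why sequence-level clip fractions are not directly comparable to token-level ones.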

Dial 112 AI: Emergency Call Intelligence

2025

Developed production AI system for Andhra Pradesh Police processing 1000+ emergency calls daily, implementing speech-to-text, sentiment analysis, and priority classification. Built real-time geospatial analysis and emergency dispatch optimization, reducing average response time by 25% through intelligent resource allocation.

Speech Recognition

NLP

Real-time Processing

Python

Source

CADify: AI-Powered 3D CAD Generation

2024

Developed multimodal AI system transforming 2D engineering diagrams into 3D CAD models with 92% geometric accuracy using computer vision and LLM integration. Built robust feature extraction pipeline handling technical drawings, sketches, and annotations with advanced OCR and geometric reasoning capabilities.

PyTorch3D

OpenAI

OpenCV

Computer Vision

Source

DeepRE: Deep Reinforcement Learning for Self-Verification

2025

Reproduced DeepSeek-R1-Zero achieving 90% functional parity, implementing advanced RL techniques for LLM self-verification on complex reasoning tasks. Engineered distributed training pipeline using a vLLM backend and Flash Attention 2, enabling cost-effective training of 3B-parameter models on consumer hardware with 40% cost reduction.

PyTorch

vLLM

Ray

Flash Attention

Source

PPAG: Pose-Guided Image Generation

2025

Engineered production-ready pose transfer system integrating 5 ControlNet models, achieving 90% pose accuracy on COCO-Pose benchmark with real-time inference. Implemented advanced prompt engineering pipeline with dynamic negative prompting and attention guidance, improving generation quality by 40% and reducing artifacts by 65%.

PyTorch

ControlNet

Gradio

Diffusers

Owl CLI: OS-Level AI Agent

2025

Built intelligent CLI agent leveraging Google Gemini LLM for natural language to system command translation, enabling conversational OS interaction with 95% command accuracy. Engineered autonomous security auditing system with real-time monitoring, policy violation detection, and automated threat response capabilities.

LLMs

System Integration

Windows APIs

Python

Source

Hackathons

I like building things

During my time in university, I attended hackathons where people from around the country came together and built incredible things in 2-3 days. It was eye-opening to see the endless possibilities brought to life by a group of motivated and passionate individuals.

2025

AI Agent Hackathon by Andhra Pradesh Police

Andhra Pradesh, India

Won 1st prize in nationwide AI Agent Hackathon conducted by Andhra Pradesh Police for developing emergency call intelligence system.

Contact

Get in Touch

Want to chat? Just send me a DM on Twitter with a direct question and I'll respond whenever I can. I will ignore all soliciting.