AI Engineer

I build self-hosted LLM systems — fine-tuning, quantization, RAG, and multi-agent pipelines. Currently shipping compliance and tax engines for the Australian market at Supreme AI. 3× IEEE published. Based in Nairobi.

Samson Kinyanjui

About Me

AI Engineer based in Nairobi, Kenya. I build and deploy self-hosted LLM systems — fine-tuning, quantization, RAG, and multi-agent pipelines. Currently at Supreme AI building AI-powered products for the Australian market.

I also build AI tools independently — TradingAgents (a 13-agent NSE trading framework) and an LLM-powered financial auditor. MSc Computer Science from DeKUT with 3 IEEE publications. Full stack: Python, LangGraph, FastAPI, React, Next.js.

LLM Systems
Fine-Tuning (LoRA/QLoRA) · Quantization (GPTQ/AWQ/GGUF) · vLLM · RAG · Multi-agent (LangGraph) · Synthetic Data Generation · Model Deployment
Backend
Python · FastAPI · Flask · PostgreSQL · Docker
Frontend
Next.js · React · JavaScript
ML Foundations
TensorFlow · LangChain

Experience & Education

2025 — Present

AI Engineer

Supreme AI — Australia (Remote)

Building AI-powered products for the Australian market, including CGT Brain (a capital gains tax engine) and an AML/CTF compliance system. Fine-tuning, quantizing, and deploying self-hosted LLMs via vLLM.

2024 — Present

Independent AI/ML Researcher

Nairobi, Kenya

Implementing global AI research papers for African markets. Built TradingAgents (multi-agent NSE trading system) and an LLM-powered financial auditor. 3 IEEE publications.

Oct 2023 — Jun 2026 · Graduating 12 June 2026

Master's Degree, Computer Science

Dedan Kimathi University of Technology (DeKUT)

Research focus: medical imaging with deep learning, NLP topic modeling, and financial health assessment using ML. 3 IEEE publications.

May 2019 — Apr 2023

BSc Information Technology

Dedan Kimathi University of Technology (DeKUT)

Second Class Upper Honours.

Projects

CGT Brain

An AI system that does what lawyers and accountants do for Capital Gains Tax. Feed it any property timeline and it applies ATO rules, performs full CGT analysis, and generates detailed reports — work that traditionally takes professionals hours, done in seconds.

Next.js · FastAPI · Gen AI

Biashara Buddy

An AI-powered business consultant that helps Kenyan entrepreneurs with budgeting, licensing requirements, business ideas, and location selection: the guidance a qualified business advisor would otherwise provide.

React · Flask · Machine Learning

AI Research in a Kenyan Context

Implementing global AI/ML research papers to solve local problems.

TradingAgents

Multi-agent LLM trading system for the NSE, implementing the TradingAgents paper (UCLA/MIT). Four parallel AI analysts debate and synthesize market data, fundamentals, news, and sentiment to produce trading decisions — adapted for NSE constraints like T+3 settlement, KES commissions, and frontier market illiquidity.

LangGraph · DeepSeek · FastAPI · Next.js

Based on arXiv:2412.20138

LLM Financial Auditor

Automating Financial Statement Audits with LLMs

An LLM-powered auditing system that ingests Kenyan company financial statements (PDFs), chunks and embeds them with RAG, and produces structured audit reports — flagging IFRS violations, material misstatements, and going-concern risks. Built to handle the nuances of Kenyan regulatory filings (CMA, NSE listing rules, Companies Act 2015).

RAG · LangChain · FastAPI · Next.js

My Publications

Selected research contributions in data science and technology for development.

Automatic Detection and Classification of Gastrointestinal Pathological Findings using a Hybrid ResNet50-CNN from Endoscopic Images

Kinyanjui, Samson, Juliet Moso, and Patrick Gikunda.

2025 IST-Africa Conference (IST-Africa), 2025.

DOI: 10.23919/IST-Africa67297.2025.11060502

Topic Clustering of COVID-19 Medical Literature Using LDA and K-Means: A Case Study

Kinyanjui, Samson and Benson Kituku.

IEEE International Conference on ICT4DA, Bahir Dar, Ethiopia, 2025.

DOI: 10.1109/ICT4DA67218.2025.11282626

Financial Health Assessment for Households in Kenya

Kinyanjui, Samson, Felix Lopuran, Peter Kimanga, Edna Mugoh, Dennis Kiprotich, and Patrick Gikunda.

IEEE International Conference on ICT4DA, 2024.

DOI: 10.1109/ICT4DA62874.2024.10777274

Latest from the Blog

AI/ML paper breakdowns, implementations, and research notes.

10-part series

The AI Engineer's Path

Python fundamentals through to a self-hosted, fine-tuned, quantized RAG service. Each part builds on the last.

May 3, 2026 · Tutorial

Part 10: Scaling AI Systems

The series finale: concurrency patterns, caching at every layer, structured logging, monitoring, graceful degradation, and a shipping checklist.

AI Engineer Path · Scaling · Production

May 3, 2026 · Deep Dive

Running a 22B Audio-Video Diffusion Model on a Single GPU

Fitting a 22B audio-video diffusion model into a GPU budget shared with an LLM stack — FP8 quantisation, tiled VAE decode, subprocess VRAM isolation.

Diffusion · FP8 · Blackwell

May 3, 2026 · Implementation

Designing a Consumer-Tier AI Product That Degrades Gracefully

The default LLM product pattern propagates inference failures to the user. For consumer purchase flows, that's the wrong default. Here's the inversion.

AI Products · System Design · LLM

May 3, 2026 · Implementation

Building a Hybrid-RAG Assistant That Doesn't Hallucinate Statute

A chat assistant grounded in ~1,600 chunks of primary legislation — hybrid retrieval, careful chunking, and the parser bug that taught me to validate chunks first.

RAG · Qdrant · Hybrid Retrieval

May 3, 2026 · Implementation

Generating Long-Form Compliance Documents with a Single LLM Call Pipeline

Why big-prompt generation breaks at scale, and how I rebuilt my document generator as a pipeline of small typed LLM calls.

LLM · vLLM · LoRA

May 2, 2026 · Tutorial

Part 9: Knowledge Distillation

Distil a fine-tuned 4B teacher into a 1.5B student via synthetic data — with quality-control filters and an honest eval against the teacher.

AI Engineer Path · Distillation · Synthetic Data

May 1, 2026 · Tutorial

Part 8: Serving Quantized Models with vLLM

From AWQ checkpoint to a production endpoint. Multi-LoRA hot-swapping, prefix caching, and swapping the local endpoint into the Part 5 RAG.

AI Engineer Path · vLLM · Serving

April 30, 2026 · Tutorial

Part 7: Quantization for Deployment

14 GB FP16 down to 4 GB AWQ with sub-1% accuracy loss. Full pipeline from merged checkpoint to AWQ + GGUF artefacts, with benchmarks.

AI Engineer Path · Quantization · AWQ

April 29, 2026 · Tutorial

Part 6: Fine-Tuning with LoRA and QLoRA

QLoRA + Unsloth on a free Colab T4. Dataset prep, hyperparameters, training, evaluation, common failure modes — end to end.

AI Engineer Path · LoRA · Unsloth

April 28, 2026 · Tutorial

Part 5: Building a Production RAG System

Bolting search onto an LLM endpoint isn't RAG yet — it's the first 30%. Chunking, hybrid retrieval, prompt construction, citations, and eval.

AI Engineer Path · RAG · Hybrid Search

April 27, 2026 · Tutorial

Part 4: Embeddings and Vector Search

How embeddings work, which model to pick, and how to add semantic search to the FastAPI service from Part 3.

AI Engineer Path · Embeddings · Qdrant

April 26, 2026 · Tutorial

Part 3: Building APIs with FastAPI

Wrapping the Part 2 client in a production FastAPI service: validation, streaming, auth, rate limiting, and a Dockerfile.

AI Engineer Path · FastAPI · Authentication

April 25, 2026 · Tutorial

Part 2: Your First LLM API Call

From four lines of OpenAI SDK to a production-grade client wrapper with retries, streaming, structured output, and cost tracking.

AI Engineer Path · LLM API · Streaming

April 24, 2026 · Tutorial

Part 1: Python Foundations for AI Engineers

The Python skills that actually matter when you start building AI systems — type hints, async, context managers, and tests.

AI Engineer Path · Python · Fundamentals

Apr 16, 2026 · Implementation

How I Build AI Agents That Actually Work in Production

The patterns, tools, and pitfalls I've learned from building multi-agent systems that go beyond demos.

AI Agents · LangGraph · ReAct

Apr 16, 2026 · Deep Dive

How I Get the Best Out of My GPU Using vLLM for Local LLM Production

How PagedAttention achieves 14-24x throughput over HuggingFace Transformers, with deployment code.

vLLM · PagedAttention · GPU

Apr 14, 2026 · Paper Breakdown

Demystifying LLM Quantization: GPTQ, AWQ & GGUF

How to shrink a 14GB model to 4GB and still get usable results — a practical guide with Python code.

Quantization · LLMs · GPTQ

Feb 18, 2026 · Implementation

Building TradingAgents for the NSE

From UCLA/MIT research paper to a working multi-agent LLM system that debates, analyzes, and trades NSE equities — adapted for frontier market constraints.

LangGraph · Multi-Agent · NSE

Feb 15, 2026 · Implementation

Automating Financial Statement Audits with LLMs

From research paper to production app — building an LLM-powered auditor for Kenyan company financial statements with RAG, FastAPI, and Next.js.

LLMs · RAG · FastAPI

Feb 5, 2026 · Paper Breakdown

Understanding Attention Is All You Need

A deep dive into the Transformer architecture that revolutionized NLP and became the foundation for modern LLMs.

Transformers · NLP · Deep Learning

Contact Me

Let’s connect and build something great. I’d love to hear from you!