About Me

AI Engineer based in Nairobi, Kenya. I build and deploy self-hosted LLM systems: fine-tuning, quantization, RAG, and multi-agent pipelines. Currently at Supreme AI, building AI-powered products for the Australian market.

I also build AI tools independently: TradingAgents (a 13-agent trading framework for the Nairobi Securities Exchange, NSE) and an LLM-powered financial auditor. MSc Computer Science candidate at DeKUT (graduating June 2026) with 3 IEEE publications. Full stack: Python, LangGraph, FastAPI, React, Next.js.

LLM Systems

Fine-Tuning (LoRA/QLoRA) · Quantization (GPTQ/AWQ/GGUF) · vLLM · RAG · Multi-Agent (LangGraph) · Synthetic Data Generation · Model Deployment

Backend

Python · FastAPI · Flask · PostgreSQL · Docker

Frontend

Next.js · React · JavaScript

ML Foundations

TensorFlow · LangChain


Experience & Education

2025 — Present

AI Engineer

Supreme AI — Australia (Remote)

Building AI-powered products for the Australian market, including CGT Brain (a capital gains tax engine) and an AML/CTF compliance system. Fine-tuning, quantizing, and deploying self-hosted LLMs served with vLLM.

2024 — Present

Independent AI/ML Researcher

Nairobi, Kenya

Adapting published AI research to African markets. Built TradingAgents (a multi-agent NSE trading system) and an LLM-powered financial auditor. 3 IEEE publications.

Oct 2023 — Jun 2026 · Graduating 12 June 2026

Master's Degree, Computer Science

Dedan Kimathi University of Technology (DeKUT)

Research focus: medical imaging with deep learning, NLP topic modeling, and ML-based financial health assessment. 3 IEEE publications.

May 2019 — Apr 2023

BSc Information Technology

Dedan Kimathi University of Technology (DeKUT)

Second Class Upper Honours.

Projects

CGT Brain

An AI system that automates the Capital Gains Tax work usually done by lawyers and accountants. Feed it any property timeline and it applies Australian Taxation Office (ATO) rules, performs a full CGT analysis, and generates detailed reports: work that traditionally takes professionals hours, done in seconds.

Next.js · FastAPI · Gen AI

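To give a flavour of the ATO rules a system like this has to encode, here is a minimal sketch of one of them: the 50% CGT discount for individuals who held an asset for at least 12 months. It is not CGT Brain's code. `PropertyDisposal`, `discounted_capital_gain`, and the 29 February fallback are hypothetical, and everything else in a real CGT analysis (main residence exemption, cost base elements, carried-forward losses) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PropertyDisposal:
    """One acquisition/disposal pair from a (hypothetical) property timeline."""
    acquired: date
    disposed: date
    cost_base: float         # AUD: purchase price plus eligible costs
    capital_proceeds: float  # AUD: sale price


def first_anniversary(d: date) -> date:
    """Same calendar day one year later.

    Simplifying assumption: an acquisition on 29 Feb maps to 28 Feb.
    """
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def discounted_capital_gain(event: PropertyDisposal,
                            prior_capital_losses: float = 0.0) -> float:
    """Net capital gain for an individual after capital losses and the 50% discount.

    Sketch only: ignores every ATO rule except the basic discount.
    """
    gain = event.capital_proceeds - event.cost_base
    if gain <= 0:
        return 0.0  # a capital loss; not tracked in this sketch
    # Capital losses are applied before the discount.
    gain = max(gain - prior_capital_losses, 0.0)
    # The 12-month holding period excludes the acquisition and disposal days,
    # so the disposal must fall strictly after the first anniversary.
    if event.disposed > first_anniversary(event.acquired):
        gain *= 0.5
    return gain


sale = PropertyDisposal(date(2020, 1, 1), date(2021, 1, 2), 500_000.0, 700_000.0)
print(discounted_capital_gain(sale))  # 100000.0
```

Selling on the anniversary itself (1 January 2021 here) would not qualify in this sketch, because the excluded acquisition and disposal days leave the holding period one day short.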

Biashara Buddy

An AI-powered business consultant that helps Kenyan entrepreneurs with budgeting, licensing requirements, business ideas, and location selection: the core questions a qualified business advisor would cover.

React · Flask · Machine Learning


AI Research in a Kenyan Context

Implementing global AI/ML research papers to solve local problems.

TradingAgents

Multi-agent LLM trading system for the NSE, implementing the TradingAgents paper (UCLA/MIT). Four parallel AI analysts debate and synthesize market data, fundamentals, news, and sentiment to produce trading decisions, adapted for NSE constraints such as T+3 settlement, KES-denominated commissions, and frontier-market illiquidity.

LangGraph · DeepSeek · FastAPI · Next.js


Based on arXiv:2412.20138
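Two of the NSE constraints mentioned above can be made concrete in a few lines. The sketch below, a hypothetical helper rather than the project's code, computes a T+3 settlement date by counting business days and prices a buy order with a flat commission rate. `settlement_date`, `net_buy_cost`, and `commission_rate` are illustrative names; NSE holidays must be supplied by the caller, and real brokerage and levy schedules are tiered rather than flat.

```python
from datetime import date, timedelta


def settlement_date(trade_date: date, days: int = 3,
                    holidays: frozenset[date] = frozenset()) -> date:
    """Date a trade settles under a T+N cycle, counting business days only.

    Weekends are skipped; no exchange holiday calendar is built in, so any
    holidays must be passed explicitly.
    """
    current = trade_date
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:  # Mon-Fri only
            remaining -= 1
    return current


def net_buy_cost(shares: int, price_kes: float, commission_rate: float) -> float:
    """Total KES outlay for a buy order with commission as a flat fraction of value.

    `commission_rate` is a placeholder, not an actual NSE fee schedule.
    """
    gross = shares * price_kes
    return gross * (1 + commission_rate)


# A trade on Friday 8 May 2026 settles the following Wednesday under T+3.
print(settlement_date(date(2026, 5, 8)))  # 2026-05-13
```

An agent that respects settlement would treat sale proceeds as unavailable until `settlement_date` has passed, which is one way the frontier-market adaptation changes behaviour relative to a same-day-cash backtest.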

Automating Financial Statement Audits with LLMs

An LLM-powered auditing system that ingests Kenyan company financial statements (PDFs), chunks and embeds them for retrieval-augmented generation (RAG), and produces structured audit reports that flag IFRS violations, material misstatements, and going-concern risks. Built to handle the nuances of Kenyan regulatory filings (CMA regulations, NSE listing rules, and the Companies Act 2015).

RAG · LangChain · FastAPI · Next.js

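The ingestion step, splitting extracted statement text into overlapping chunks before embedding, might look roughly like the sketch below. It is a generic illustration, not the auditor's implementation: `chunk_text`, `chunk_pages`, `Chunk`, the character-based windows, and the default sizes are assumptions, and PDF extraction and embedding are out of scope. Keeping the page number on each chunk is one way to let audit findings cite where in the filing they came from.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # 1-based page number in the source PDF
    text: str


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows overlapping by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_pages(pages: list[str], chunk_size: int = 1000,
                overlap: int = 200) -> list[Chunk]:
    """Chunk each page separately so every chunk keeps its page number."""
    return [
        Chunk(page=i, text=piece)
        for i, page_text in enumerate(pages, start=1)
        for piece in chunk_text(page_text, chunk_size, overlap)
    ]


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```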

My Publications

Selected research contributions in data science and technology for development.

Automatic Detection and Classification of Gastrointestinal Pathological Findings using a Hybrid ResNet50-CNN from Endoscopic Images

Kinyanjui, Samson, Juliet Moso, and Patrick Gikunda.

2025 IST-Africa Conference (IST-Africa), 2025.

DOI: 10.23919/IST-Africa67297.2025.11060502

Topic Clustering of COVID-19 Medical Literature Using LDA and K-Means: A Case Study

Kinyanjui, Samson and Benson Kituku.

IEEE International Conference on ICT4DA, Bahir Dar, Ethiopia, 2025.

DOI: 10.1109/ICT4DA67218.2025.11282626

Financial Health Assessment for Households in Kenya

Kinyanjui, Samson, Felix Lopuran, Peter Kimanga, Edna Mugoh, Dennis Kiprotich, and Patrick Gikunda.

IEEE International Conference on ICT4DA, 2024.

DOI: 10.1109/ICT4DA62874.2024.10777274

Latest from the Blog

AI/ML paper breakdowns, implementations, and research notes.

May 3, 2026 · Tutorial

Part 10: Scaling AI Systems

The series finale: concurrency patterns, caching at every layer, structured logging, monitoring, graceful degradation, and a shipping checklist.

AI Engineer Path · Scaling · Production

May 3, 2026 · Deep Dive

Running a 22B Audio-Video Diffusion Model on a Single GPU

Fitting a 22B audio-video diffusion model into a GPU budget shared with an LLM stack, using FP8 quantization, tiled VAE decoding, and subprocess-level VRAM isolation.

Diffusion · FP8 · Blackwell

May 3, 2026 · Implementation

Designing a Consumer-Tier AI Product That Degrades Gracefully

The default LLM product pattern propagates inference failures to the user. For consumer purchase flows, that's the wrong default. Here's the inversion.

AI Products · System Design · LLM

May 3, 2026 · Implementation

Building a Hybrid-RAG Assistant That Doesn't Hallucinate Statute

A chat assistant grounded in ~1,600 chunks of primary legislation: hybrid retrieval, careful chunking, and the parser bug that taught me to validate chunks first.

RAG · Qdrant · Hybrid Retrieval
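Hybrid retrieval needs some way to merge a keyword ranking (for example BM25) with a dense-vector ranking. One widely used option is Reciprocal Rank Fusion (RRF), sketched below; the post does not say which fusion method the assistant uses, so treat this as an illustrative technique, not its implementation. The section IDs are made up, and `reciprocal_rank_fusion` is a hypothetical helper name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is the constant used in the original RRF paper.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["s12", "s7", "s3"]  # e.g. from a BM25 / sparse index
vector_hits = ["s7", "s3", "s40"]   # e.g. from a dense embedding index
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['s7', 's3', 's12', 's40']
```

RRF only looks at ranks, so it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales; sections found by both retrievers (`s7`, `s3`) rise to the top.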

May 3, 2026 · Implementation

Generating Long-Form Compliance Documents with a Single LLM Call Pipeline

Why big-prompt generation breaks at scale, and how I rebuilt my document generator as a pipeline of small typed LLM calls.

LLM · vLLM · LoRA

May 2, 2026 · Tutorial

Part 9: Knowledge Distillation

Distill a fine-tuned 4B teacher into a 1.5B student using synthetic data, with quality-control filters and an honest evaluation against the teacher.

AI Engineer Path · Distillation · Synthetic Data

May 1, 2026 · Tutorial

Part 8: Serving Quantized Models with vLLM

From an AWQ checkpoint to a production endpoint: multi-LoRA hot-swapping, prefix caching, and plugging the local endpoint into the Part 5 RAG system.

AI Engineer Path · vLLM · Serving

Apr 30, 2026 · Tutorial

Part 7: Quantization for Deployment

Shrinking a 14 GB FP16 model to 4 GB with AWQ at under 1% accuracy loss: the full pipeline from merged checkpoint to AWQ and GGUF artifacts, with benchmarks.

AI Engineer Path · Quantization · AWQ
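The 14 GB to 4 GB figure follows from simple arithmetic. FP16 stores each weight in 16 bits, so 14 GB implies roughly 7B parameters (an inference; the post only gives sizes). The sketch below estimates weight memory only, ignoring activations and the KV cache, and assumes an AWQ-style layout with a group size of 128, one FP16 scale and one 4-bit zero point per group. `weight_memory_gb` and `effective_bits` are hypothetical helpers.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def effective_bits(weight_bits: int, group_size: int,
                   scale_bits: int = 16, zero_bits: int = 0) -> float:
    """Bits per weight once per-group quantization metadata is included.

    Assumed layout: each group of `group_size` weights shares one scale
    and (optionally) one zero point.
    """
    return weight_bits + (scale_bits + zero_bits) / group_size


n_params = 7e9  # assumption: ~7B parameters, inferred from 14 GB at FP16
fp16_gb = weight_memory_gb(n_params, 16)
awq_bits = effective_bits(4, group_size=128, scale_bits=16, zero_bits=4)
awq_gb = weight_memory_gb(n_params, awq_bits)
print(fp16_gb, awq_bits, awq_gb)  # 14.0 4.15625 3.63671875
```

Under these assumptions the 4-bit weights come to about 3.6 GB; layers often kept in higher precision, such as the embeddings and output head, plausibly account for the rest of the gap to ~4 GB.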

Apr 29, 2026 · Tutorial

Part 6: Fine-Tuning with LoRA and QLoRA

QLoRA with Unsloth on a free Colab T4, end to end: dataset prep, hyperparameters, training, evaluation, and common failure modes.

AI Engineer Path · LoRA · Unsloth

Apr 28, 2026 · Tutorial

Part 5: Building a Production RAG System

Bolting search onto an LLM endpoint isn't RAG yet; it's the first 30%. The rest is chunking, hybrid retrieval, prompt construction, citations, and evaluation.

AI Engineer Path · RAG · Hybrid Search

Apr 27, 2026 · Tutorial

Part 4: Embeddings and Vector Search

How embeddings work, which model to pick, and how to add semantic search to the FastAPI service from Part 3.

AI Engineer Path · Embeddings · Qdrant

Apr 26, 2026 · Tutorial

Part 3: Building APIs with FastAPI

Wrapping the Part 2 client in a production FastAPI service: validation, streaming, auth, rate limiting, and a Dockerfile.

AI Engineer Path · FastAPI · Authentication

Apr 25, 2026 · Tutorial

Part 2: Your First LLM API Call

From four lines of OpenAI SDK code to a production-grade client wrapper with retries, streaming, structured output, and cost tracking.

AI Engineer Path · LLM API · Streaming

Apr 24, 2026 · Tutorial

Part 1: Python Foundations for AI Engineers

The Python skills that actually matter when you start building AI systems: type hints, async, context managers, and tests.

AI Engineer Path · Python · Fundamentals

Apr 16, 2026 · Implementation

How I Build AI Agents That Actually Work in Production

The patterns, tools, and pitfalls I've learned from building multi-agent systems that go beyond demos.

AI Agents · LangGraph · ReAct

Apr 16, 2026 · Deep Dive

How I Get the Best Out of My GPU Using vLLM for Local LLM Production

How PagedAttention lets vLLM reach 14-24x the throughput of Hugging Face Transformers, with deployment code.

vLLM · PagedAttention · GPU

Apr 14, 2026 · Paper Breakdown

Demystifying LLM Quantization: GPTQ, AWQ & GGUF

How to shrink a 14 GB model to 4 GB and still get usable results: a practical guide with Python code.

Quantization · LLMs · GPTQ

Feb 18, 2026 · Implementation

Building TradingAgents for the NSE

From a UCLA/MIT research paper to a working multi-agent LLM system that debates, analyzes, and trades NSE equities, adapted for frontier-market constraints.

LangGraph · Multi-Agent · NSE

Feb 15, 2026 · Implementation

Automating Financial Statement Audits with LLMs

From research paper to production app: building an LLM-powered auditor for Kenyan company financial statements with RAG, FastAPI, and Next.js.

LLMs · RAG · FastAPI

Feb 5, 2026 · Paper Breakdown

Understanding "Attention Is All You Need"

A deep dive into the Transformer architecture that revolutionized NLP and became the foundation for modern LLMs.

Transformers · NLP · Deep Learning


Contact Me

Let’s connect and build something great. I’d love to hear from you!

Get in Touch

sammainah98@gmail.com

+254 791 848007

LinkedIn · GitHub · X (Twitter)
