Michael Thomson
DPO LLM training pipeline

AI Writing Style Fine-tuning

Developed a custom Direct Preference Optimization (DPO) training pipeline that fine-tunes LLMs to match a target author's voice and formatting.
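For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the pipeline's actual code: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for this example. The inputs are summed log-probabilities of the chosen and rejected responses under the model being trained (the policy) and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy favors the chosen response over the rejected one, relative to
    the reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy prefers the chosen (on-voice) response more than the reference does,
# so the loss falls below log(2), the value at zero margin.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

A lower loss means the policy has moved toward the chosen, on-voice response relative to the reference; `beta` controls how strongly that preference is enforced.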

Overview

Created chosen/rejected dataset-curation tooling and a style-transfer evaluation harness.
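As a rough idea of what chosen/rejected curation can involve, the sketch below models one preference record and a basic cleaning pass. It is an illustrative assumption, not the project's tooling: the `PreferencePair` class, its field names, and the `curate` function are hypothetical, and real curation would likely add style-specific checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One DPO training record (hypothetical schema)."""
    prompt: str
    chosen: str    # response in the target author's voice
    rejected: str  # response that misses the target voice


def curate(pairs):
    """Drop pairs that are empty, degenerate (chosen == rejected), or duplicated."""
    seen = set()
    kept = []
    for p in pairs:
        prompt, chosen, rejected = (
            s.strip() for s in (p.prompt, p.chosen, p.rejected)
        )
        if not prompt or not chosen or not rejected:
            continue  # incomplete record
        if chosen == rejected:
            continue  # no preference signal to learn from
        key = (prompt, chosen, rejected)
        if key in seen:
            continue  # exact duplicate after whitespace trimming
        seen.add(key)
        kept.append(PreferencePair(prompt, chosen, rejected))
    return kept


raw = [
    PreferencePair("Write an intro.", "Short. Punchy.", "A long, generic intro."),
    PreferencePair("Write an intro.", "Short. Punchy. ", "A long, generic intro."),
    PreferencePair("Write an outro.", "Same text", "Same text"),
    PreferencePair("Write a title.", "", "Generic title"),
]
clean = curate(raw)  # only the first record survives
```

Filtering out identical chosen/rejected texts matters because such pairs contribute zero margin and therefore no useful gradient to DPO training.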

What it does
  • DPO training
  • Voice matching
  • Dataset curation
  • Style transfer
Built with
Python, LLM Fine-tuning, DPO Training, Dataset Curation