MLexperiment: UFC Fight Outcome Prediction with ML.NET
A machine learning system that predicts UFC fight outcomes (Red corner vs Blue corner) using binary classification on differential fighter statistics. Built with ML.NET and .NET 8.0.
Table of Contents
- Data Source
- Prerequisites
- Getting Started
- Project Structure
- Architecture
- ML Pipeline
- EloBuilder Tool
- Adding a New Trainer
- Example Output
- License
Data Source
UFC Complete Dataset (1996-2024) on Kaggle
Download the CSV and place it at src/Data/large_dataset.csv. This path is gitignored.
Prerequisites
- .NET 8.0 SDK
- The Kaggle dataset CSV (see above)
Getting Started
# Restore dependencies
dotnet restore src/MLexperiment.sln
# Build the full solution (main project + EloBuilder tool)
dotnet build src/MLexperiment.sln
# Run training and prediction (uses src/Data/large_dataset.csv by default)
dotnet run --project src/MLexperiment.csproj
# Or specify a custom dataset path
dotnet run --project src/MLexperiment.csproj -- /path/to/dataset.csv
Project Structure
UFC_Match_Prediction/
├── src/
│   ├── Common/
│   │   ├── ITrainerBase.cs  # Trainer interface (Fit, Evaluate, Save)
│   │   └── TrainerBase.cs  # Abstract base: pipeline, feature engineering, metrics
│   ├── Data/  # CSV datasets (gitignored)
│   ├── DataModels/
│   │   ├── UFCMatchData.cs  # Input schema: maps CSV columns via [LoadColumn]
│   │   └── UFCMatchPredictions.cs  # Output schema: PredictedLabel (bool)
│   ├── Predictors/
│   │   └── Predictor.cs  # Loads a saved .mdl file and runs inference
│   ├── Trainers/
│   │   └── RandomForestTrainer.cs  # FastForest with configurable leaves/trees
│   ├── MLexperiment.csproj  # Main project (Microsoft.ML 4.0.2)
│   ├── MLexperiment.sln
│   └── Program.cs  # Entry point: configure trainers, train, evaluate, predict
└── tools/
    └── EloBuilder/
        ├── EloBuilder.csproj  # CLI tool (CsvHelper 33.1.0)
        └── Program.cs  # Computes per-fighter Elo ratings from fight history
Architecture
The project combines the Strategy and Template Method patterns: each ML algorithm is a small, swappable trainer class, while the shared pipeline logic lives in a single base class.
ITrainerBase (interface)
│   Fit(), Evaluate(), Save(), PrintModelMetrics(), Name, ModelPath
│
└── TrainerBase<TParameters> (abstract)
    │   Owns: MLContext, data split, pipeline construction, model I/O
    │   Subclasses only set Name and _model in their constructor
    │
    └── RandomForestTrainer (concrete)
            Sets _model = FastForest(numberOfLeaves, numberOfTrees)
Predictor is a separate class that loads a saved .mdl file by path and exposes a Predict(UFCMatchData) method. It does not retrain.
Model files are saved with trainer-specific names derived from the trainer's Name property. For example, RandomForestTrainer(32, 200) saves to rf_32_leaves_200_trees.mdl. This means multiple trainer configurations can be compared side-by-side without overwriting each other.
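The load-and-predict role described above can be sketched roughly as follows. This is not the project's Predictor.cs: the class name ModelPredictor and its generic shape are assumptions made so the example does not depend on the exact fields of UFCMatchData. It uses only the standard ML.NET calls Model.Load and CreatePredictionEngine.

```csharp
using Microsoft.ML;

// Hypothetical stand-in for Predictor.cs: loads a saved model once and
// reuses a single prediction engine for inference. It never retrains.
public sealed class ModelPredictor<TInput, TOutput>
    where TInput : class
    where TOutput : class, new()
{
    private readonly PredictionEngine<TInput, TOutput> _engine;

    public ModelPredictor(string modelPath)
    {
        var mlContext = new MLContext(seed: 42);

        // Load returns the fitted transformer chain; the input schema is not needed here.
        ITransformer model = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);

        _engine = mlContext.Model.CreatePredictionEngine<TInput, TOutput>(model);
    }

    // Runs the full saved pipeline (preprocessing + trainer) on one row.
    public TOutput Predict(TInput input) => _engine.Predict(input);
}
```

With the project's own types, usage would look something like new ModelPredictor<UFCMatchData, UFCMatchPredictions>("rf_32_leaves_200_trees.mdl"). Note that an ML.NET PredictionEngine is not thread-safe, so a wrapper like this should not be shared across threads.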
ML Pipeline
The pipeline executes inside TrainerBase.Fit() and proceeds through these stages:
- Load -- Read CSV via LoadFromTextFile<UFCMatchData> with column mappings defined by [LoadColumn] attributes
- Split -- 70/30 train/test split
- Feature engineering:
  - One-hot encode Weight_Class and Gender
  - Map Winner ("Red"/"Blue") to boolean Label
  - Convert Is_Title_Bout (bool) to float
- Concatenate all features into a single Features vector
- Normalize with MinMax scaling
- Cache the processed data
- Train by appending the trainer estimator and calling Fit on the training set
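The stages above can be sketched with standard ML.NET calls as shown below. This is an illustration, not the contents of TrainerBase: the MatchRow class, its column indices, the intermediate column names (WeightClassEncoded, GenderEncoded, TitleBoutFloat), the use of MapValue for the label, and the choice of Red as the positive class are all assumptions. It requires the Microsoft.ML and Microsoft.ML.FastTree packages.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical, trimmed-down input row. The real UFCMatchData has more
// columns, and its [LoadColumn] indices depend on the CSV layout.
public class MatchRow
{
    [LoadColumn(0)] public string Winner { get; set; } = string.Empty;
    [LoadColumn(1)] public string Weight_Class { get; set; } = string.Empty;
    [LoadColumn(2)] public string Gender { get; set; } = string.Empty;
    [LoadColumn(3)] public bool Is_Title_Bout { get; set; }
    [LoadColumn(4)] public float Reach_Diff { get; set; }
    [LoadColumn(5)] public float Age_Diff { get; set; }
}

public static class PipelineSketch
{
    public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
    {
        // Assumption: Red is the positive class (Label = true).
        var winnerToLabel = new Dictionary<string, bool> { ["Red"] = true, ["Blue"] = false };

        return mlContext.Transforms.Categorical.OneHotEncoding("WeightClassEncoded", "Weight_Class")
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("GenderEncoded", "Gender"))
            .Append(mlContext.Transforms.Conversion.MapValue("Label", winnerToLabel, "Winner"))
            .Append(mlContext.Transforms.Conversion.ConvertType("TitleBoutFloat", "Is_Title_Bout", DataKind.Single))
            // Gather every numeric input into the single Features vector.
            .Append(mlContext.Transforms.Concatenate("Features",
                "WeightClassEncoded", "GenderEncoded", "TitleBoutFloat", "Reach_Diff", "Age_Diff"))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            // Cache the preprocessed rows so the trainer does not recompute them on every pass.
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.BinaryClassification.Trainers.FastForest(numberOfLeaves: 32, numberOfTrees: 200));
    }

    public static ITransformer Train(string csvPath)
    {
        var mlContext = new MLContext(seed: 42);
        IDataView data = mlContext.Data.LoadFromTextFile<MatchRow>(csvPath, separatorChar: ',', hasHeader: true);

        // 70/30 split; only the training portion is used for fitting.
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.3);
        return BuildPipeline(mlContext).Fit(split.TrainSet);
    }
}
```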
Features Used
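Most features are differential statistics (Red minus Blue), so the model learns from the gap between the two fighters rather than from their absolute values. Weight_Class, Gender, and Is_Title_Bout describe the bout itself and are used as-is: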
| Feature | Description |
|---|---|
| Weight_Class | One-hot encoded weight division |
| Gender | One-hot encoded (Male/Female) |
| Is_Title_Bout | Whether the fight is a title bout |
| Wins_Total_Diff | Win count difference |
| Losses_Total_Diff | Loss count difference |
| Age_Diff | Age difference |
| Height_Diff | Height difference |
| Weight_Diff | Weight difference |
| Reach_Diff | Reach difference |
| TD_Def_Diff | Takedown defense difference |
| Sub_Diff | Submission average difference |
| TD_Diff | Takedown average difference |
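To make the Red-minus-Blue convention concrete, here is a small sketch of how such differentials could be derived from per-fighter stats. The FighterStats and MatchDiffs types and their field names are hypothetical, and the dataset may already ship precomputed difference columns, in which case no code like this is needed.

```csharp
using System;

// Hypothetical per-fighter stats; names are illustrative, not the dataset's columns.
public readonly record struct FighterStats(
    int WinsTotal, int LossesTotal, double Age, double Height, double Reach);

// Hypothetical container for a subset of the differential features.
public readonly record struct MatchDiffs(
    float WinsTotalDiff, float LossesTotalDiff, float AgeDiff, float HeightDiff, float ReachDiff);

public static class DiffFeatures
{
    // Every differential is Red minus Blue, so a positive value means
    // Red has the larger raw number for that statistic.
    public static MatchDiffs Compute(FighterStats red, FighterStats blue) => new(
        WinsTotalDiff: red.WinsTotal - blue.WinsTotal,
        LossesTotalDiff: red.LossesTotal - blue.LossesTotal,
        AgeDiff: (float)(red.Age - blue.Age),
        HeightDiff: (float)(red.Height - blue.Height),
        ReachDiff: (float)(red.Reach - blue.Reach));
}
```

One consequence of this encoding is that swapping the corners flips the sign of every differential, which is why the label must be expressed relative to the same corner.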
Evaluation Metrics
After training, the system prints binary classification metrics:
- F1 Score -- Harmonic mean of precision and recall
- Accuracy -- Overall correct prediction rate
- Positive/Negative Precision -- Precision per class (Red/Blue)
- Positive/Negative Recall -- Recall per class
- AUPRC -- Area Under Precision-Recall Curve
All training uses MLContext(seed: 42) for reproducibility.
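For readers who want to see how the threshold-based metrics relate to each other, the sketch below computes them from raw confusion-matrix counts. It is a worked illustration, not the project's PrintModelMetrics: the type names are hypothetical, and it assumes Red is the positive class. AUPRC is left out because it depends on the ranking of scores, not on a single confusion matrix.

```csharp
using System;

// Hypothetical container for confusion-matrix counts (positive = Red, by assumption).
public readonly record struct ConfusionCounts(
    int TruePositive, int FalsePositive, int FalseNegative, int TrueNegative);

public static class BinaryMetrics
{
    // Returns NaN when the denominator is zero (the metric is undefined).
    private static double Ratio(int numerator, int denominator) =>
        denominator == 0 ? double.NaN : (double)numerator / denominator;

    public static double Accuracy(ConfusionCounts c) =>
        Ratio(c.TruePositive + c.TrueNegative,
              c.TruePositive + c.FalsePositive + c.FalseNegative + c.TrueNegative);

    public static double PositivePrecision(ConfusionCounts c) =>
        Ratio(c.TruePositive, c.TruePositive + c.FalsePositive);

    public static double PositiveRecall(ConfusionCounts c) =>
        Ratio(c.TruePositive, c.TruePositive + c.FalseNegative);

    public static double NegativePrecision(ConfusionCounts c) =>
        Ratio(c.TrueNegative, c.TrueNegative + c.FalseNegative);

    public static double NegativeRecall(ConfusionCounts c) =>
        Ratio(c.TrueNegative, c.TrueNegative + c.FalsePositive);

    // Harmonic mean of positive-class precision and recall.
    public static double F1(ConfusionCounts c)
    {
        double p = PositivePrecision(c), r = PositiveRecall(c);
        return 2 * p * r / (p + r);
    }
}
```

With made-up counts of 60 true positives, 20 false positives, 15 false negatives, and 25 true negatives, positive precision is 0.75, positive recall is 0.80, F1 is about 0.774, and accuracy is about 0.708.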
EloBuilder Tool
A standalone CLI tool that computes Elo ratings for every fighter in the dataset. It reads the CSV in chronological order, tracks per-fighter ratings, and outputs a new CSV with three appended columns.
# Basic usage (outputs large_dataset_with_elo.csv alongside the input)
dotnet run --project tools/EloBuilder/EloBuilder.csproj -- src/Data/large_dataset.csv
# Specify output path
dotnet run --project tools/EloBuilder/EloBuilder.csproj -- input.csv output.csv
Output columns:
| Column | Description |
|---|---|
| r_elo | Red corner's Elo rating before the fight |
| b_elo | Blue corner's Elo rating before the fight |
| elo_diff | r_elo - b_elo |
Algorithm details:
- Starting Elo: 1500
- K-factor: 32
- Formula: E = 1 / (1 + 10^((Rb - Ra) / 400))
- Wins score 1.0, losses 0.0, draws 0.5
Ratings are recorded before the fight to avoid data leakage, making the output safe to use as training features.
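The parameters above can be put together in a minimal sketch like the one below. It is not the EloBuilder source: the EloTracker class and its method names are hypothetical, and the update step R' = R + K * (S - E) is the standard Elo rule, which is assumed here because the README states only the expected-score formula. The important detail is that the pre-fight ratings are captured before either rating is updated.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker; fights must be fed in chronological order.
public sealed class EloTracker
{
    private const double StartingElo = 1500.0;
    private const double K = 32.0;
    private readonly Dictionary<string, double> _ratings = new();

    // Fighters with no recorded fights start at 1500.
    public double RatingOf(string fighter) =>
        _ratings.TryGetValue(fighter, out var rating) ? rating : StartingElo;

    // Expected score of a fighter rated ra against an opponent rated rb.
    public static double Expected(double ra, double rb) =>
        1.0 / (1.0 + Math.Pow(10.0, (rb - ra) / 400.0));

    // redScore: 1.0 if Red won, 0.0 if Red lost, 0.5 for a draw.
    // Returns the PRE-fight values, which are what get written to the CSV.
    public (double REloBefore, double BEloBefore, double EloDiff) RecordFight(
        string red, string blue, double redScore)
    {
        double r = RatingOf(red);
        double b = RatingOf(blue);

        // Standard Elo update (assumed): new = old + K * (actual - expected).
        _ratings[red] = r + K * (redScore - Expected(r, b));
        _ratings[blue] = b + K * ((1.0 - redScore) - Expected(b, r));

        return (r, b, r - b);
    }
}
```

For two debuting fighters, the recorded row is 1500, 1500, 0; if Red wins, the ratings afterwards are 1516 and 1484, and those values only show up in each fighter's next fight.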
Adding a New Trainer
The architecture is designed so that new ML algorithms require minimal code:
- Create a new file in src/Trainers/ (e.g., LightGbmTrainer.cs)
- Extend TrainerBase<TParameters> with the appropriate parameter type
- In the constructor, set Name and assign _model to an ML.NET trainer estimator
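using Microsoft.ML;                    // LightGbm() extension method (skip if already a global using)
using Microsoft.ML.Trainers.LightGbm;  // Requires the Microsoft.ML.LightGbm NuGet package
using MLexperiment.Common;

namespace MLexperiment.Trainers
{
    public class LightGbmTrainer : TrainerBase<LightGbmBinaryModelParameters>
    {
        public LightGbmTrainer(int numberOfLeaves, int numberOfIterations) : base()
        {
            // Name also determines the saved model's file name.
            Name = $"LightGBM ({numberOfLeaves} leaves, {numberOfIterations} iterations)";

            // The only algorithm-specific step: pick the ML.NET trainer estimator.
            _model = _mlContext.BinaryClassification.Trainers.LightGbm(
                numberOfLeaves: numberOfLeaves,
                numberOfIterations: numberOfIterations);
        }
    }
}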
That's it. Fit(), Evaluate(), Save(), and PrintModelMetrics() are all inherited. The model file will be saved with a unique name derived from Name.