For years, Python has held a near-monopoly on Machine Learning. If a PHP developer wanted to integrate a predictive model, a sentiment analyzer, or a classifier into their web application, they usually had to spin up an external Python microservice via a Flask or Fast API gateway.
But what if you could do it all natively?
Enter PHP-ML, a comprehensive machine learning library for PHP. It provides a clean, object-oriented framework for data preprocessing, feature extraction, classification, regression, and clustering—all running directly inside your native PHP runtime.
In this guide, we will break down how the machine learning workflow translates to PHP and build a fully functional K-Nearest Neighbors (KNN) classification script from scratch.
🏗️ The Machine Learning Workflow in PHP
Regardless of whether you are using Python or PHP, every standard machine learning pipeline follows the exact same lifecycle. PHP-ML exposes modular classes for every single node in this pipeline:
- Dataset Loading & Preprocessing: Importing raw arrays, CSVs, or database records and scaling the numerical variables.
- Splitting: Dividing your data into Training samples (to teach the algorithm) and Testing samples (to evaluate accuracy).
- Training (Fitting): Feeding data into the mathematical model so it can learn hidden patterns.
- Prediction: Passing new, unseen entries to the model to receive automated classification labels.
🛠️ Step-by-Step Implementation Guide
Let’s build a script that classifies items based on physical attributes. Suppose we run a logistics application and want to automatically classify packages into two tiers: Standard or Heavy/Oversized, based on two features: Weight (kg) and Volume (liters).
Step 1: Install PHP-ML via Composer
Make sure you have Composer installed on your local machine, open your terminal inside an empty project directory, and install the library:
Bash
composer require php-ai/php-ml
Step 2: Prepare and Split the Dataset
Create a file named predict.php. We will start by defining our historical dataset arrays and splitting them cleanly into training and testing pools using PHP-ML’s built-in RandomSplitter.
PHP
<?php
require_once __DIR__ . '/vendor/autoload.php';
use PhpMl\Dataset\ArrayDataset;
use PhpMl\CrossValidation\RandomSplitter;
use PhpMl\Classification\KNearestNeighbors;
use PhpMl\Metric\Accuracy;
// 1. Raw historical data: [Weight in kg, Volume in liters]
$samples = [
[2.5, 5], [1.2, 2], [3.0, 7], [0.8, 1], // Typical Small/Standard packages
[45.0, 90], [32.5, 75], [50.0, 110], [28.0, 60] // Typical Heavy/Oversized packages
];
// Target labels corresponding to the samples above
$labels = [
'Standard', 'Standard', 'Standard', 'Standard',
'Oversized', 'Oversized', 'Oversized', 'Oversized'
];
// 2. Wrap into a structured PHP-ML Dataset container
$dataset = new ArrayDataset($samples, $labels);
// 3. Split data: 70% for training, 30% for evaluating accuracy
$splitter = new RandomSplitter($dataset, 0.7);
$trainSamples = $splitter->getTrainSamples();
$trainLabels = $splitter->getTrainLabels();
$testSamples = $splitter->getTestSamples();
$testLabels = $splitter->getTestLabels();
Step 3: Instantiate and Train the Classifier
We will use the K-Nearest Neighbors (KNN) algorithm. KNN works by plotting features geometrically and classifying an unknown sample based on the labels of the $k$ data points closest to it.
PHP
// 4. Initialize the KNN Classifier (checking the 3 closest neighbors)
$classifier = new KNearestNeighbors(3);
// 5. Train the model by fitting the data patterns
$classifier->train($trainSamples, $trainLabels);
print "🏋️♂️ Model trained successfully using native PHP memory!\n";
Step 4: Run Evaluation Metrics
Before putting any model into deployment, we must verify its accuracy against our testing subset to make sure it isn’t hallucinating its predictions.
PHP
// 6. Predict the labels for our hidden test set
$predictedLabels = $classifier->predict($testSamples);
// 7. Calculate mathematical accuracy percentage
$accuracy = Accuracy::score($testLabels, $predictedLabels);
print "📊 Model Evaluation Accuracy: " . ($accuracy * 100) . "%\n";
Step 5: Inference on Unseen Custom Data
Now that our model is validated, we can pass completely new user input values directly through the system.
PHP
// 8. Unseen runtime data: A package weighing 38kg with a volume of 82 liters
$newPackage = [38.0, 82];
// Run the inference prediction
$result = $classifier->predict($newPackage);
print "🔮 The predicted shipping classification category is: " . $result . "\n";
Step 6: Persist (Save) Your Trained Model
You don’t want to re-train your machine learning models from scratch every single time a webpage reloads. PHP-ML lets you serialize your trained brain states directly into a local file cache.
PHP
use PhpMl\Persistence\ModelManager;
// Save the model to disk
$modelManager = new ModelManager();
$modelManager->saveToFile($classifier, __DIR__ . '/shipping_model.txt');
// --- In a completely separate script or endpoint ---
// $restoredClassifier = $modelManager->restoreFromFile(__DIR__ . '/shipping_model.txt');
📈 When Should You Use PHP-ML?
While PHP-ML is an incredibly engineered library, it’s vital to know its target production use cases:
- When to use it: Perfect for small to medium datasets, proof-of-concept projects, synchronous text classification (e.g., spam detection forms), anomaly behavior checking, or internal dashboard tracking apps where you want to keep the architecture simple without introducing Python.
- When to scale out: If you are dealing with millions of records, deep neural network training, or heavy computer vision models, the single-threaded CPU memory layout of PHP will eventually bottle-neck. For those enterprise scales, consider moving to a specialized engine or training offline and running inference pipelines elsewhere.
PHP-ML shows that the modern PHP ecosystem is more dynamic than ever, proving you can run predictive analytical pipelines natively without ever leaving your home framework.
