What is MLOps?
A step-by-step guide for setting up an MLOps pipeline end-to-end using Python.
The field of Machine Learning has many domains for a budding Machine Learning Engineer to dive into. You can program machine learning libraries from scratch, or you can deploy, monitor, and maintain ML models at scale. The latter is frequently called MLOps in job descriptions in 2026. The following guide demonstrates the role of a Machine Learning Engineer specializing in MLOps in a real-world scenario.
Defining the Problem
An emergency room is located in a linguistically diverse area. Hospital staff are having difficulty triaging patients because they are unable to properly identify each patient's needs. The ML team has developed a new model, a Multinomial Naive Bayes classifier, that reads input from patients in their native language and outputs the correct specialty (Cardiology, Neurology, etc.) the patient needs to see. The ML team has asked us to handle the end-to-end MLOps pipeline for the triage system.
1 - Project Initialization & Data Preprocessing
The first step is formally called data preprocessing, which is when we 'wrangle & clean' the data, or to put it simply: prepare the data for the ML model. This step might sound mundane, but in reality the final result depends heavily on what is done here. The ML model we are using (Multinomial Naive Bayes), like any ML model, is just a mathematical algorithm, a way to process the data computationally; ours calculates probabilities based on word frequencies. Without data, the model cannot produce any result, and it will choose its response based on the data it was trained on. For this guide, the model will be trained on the Symptom2Disease dataset, which contains 24 unique disease labels, each paired with a text description. We will use this data to train the model to predict disease labels for symptoms recorded from emergency room patients.
We also need to add two utilities:
- A way to take the disease label and assign it to the correct specialist (hypertension -> cardiologist)
- A translator to change all patient input to English to align with the dataset
1.1 - Project Setup
Before we can run any scripts, the project must be properly initialized. Create a new Python project and `.venv` in your IDE of choice.
- In the project root, create the following directories: `artifacts`, `data`, and `ETL`.
- Within the `data` directory, create 2 more folders: `processed` & `raw`.
- Place `Symptom2Disease.csv` (link: https://www.kaggle.com/datasets/niyarrbarman/symptom2disease?resource=download) into the `raw` folder.
- In the root of the project, create a new file called `requirements.txt` and place the following dependencies inside:
```
# Core ETL Pipeline Dependencies
pandas>=2.0.0
numpy>=1.24.0

# Machine Learning
scikit-learn>=1.3.0
joblib>=1.3.0

# Translation & NLP
deep-translator>=1.11.4
langdetect>=1.0.9

# Visualization
matplotlib>=3.7.0
seaborn>=0.12.0

# Web Dashboard
streamlit>=1.28.0

# Development & Testing
pytest>=7.4.0
```
- Activate your environment (if you haven't already):
  - Mac/Linux: `source .venv/bin/activate`
  - Windows: `.venv\Scripts\activate`
- In the terminal run: `pip install -r requirements.txt`
1.2 - Extract
What are we doing?
We are extracting the data from the CSV file and returning it as a Pandas DataFrame. We also perform integrity checks by verifying that the file exists and that no columns are missing, then use built-in Pandas methods to log data metrics.
Why are we doing this?
Pandas has many built-in tools and features that make data preprocessing much easier. By storing the data in a DataFrame, we can wrangle it quickly and easily without writing much code. We can verify that our data structure is valid by checking for the Symptom2Disease 'label' and 'text' columns. We do this during the first step of our ETL pipeline because we need to ensure the data is valid before we transform it. Detailed logging lets us see exactly what data is entering the pipeline.
The following Python script, `./ETL/extract.py`, is used to extract data from the `Symptom2Disease.csv` file:
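Since the annotated snippet doesn't carry over here, below is a minimal sketch of what `extract.py` could look like. Function and column names follow the guide ('label' and 'text'); the logging details are assumptions:

```python
import logging
from pathlib import Path

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

REQUIRED_COLUMNS = {"label", "text"}


def extract(csv_path: str = "data/raw/Symptom2Disease.csv") -> pd.DataFrame:
    """Read the raw CSV, verify its integrity, and log basic data metrics."""
    path = Path(csv_path)
    if not path.exists():
        raise FileNotFoundError(f"Expected dataset at {path}")

    df = pd.read_csv(path)

    # Integrity check: the Symptom2Disease columns must be present.
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Built-in Pandas methods give us quick metrics for logging.
    logger.info("Rows: %d, Columns: %d", *df.shape)
    logger.info("Null values per column:\n%s", df.isnull().sum())
    logger.info("Unique disease labels: %d", df["label"].nunique())
    return df
```
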
1.3 - Transform
What are we doing?
In the extract phase we logged how many null values were in our extracted DataFrame, but we did not do anything to clean the data. In the transform phase we remove null values and duplicate rows, then fit the cleaned data to our TF-IDF vectorizer.
Why are we doing this?
We need to prepare the raw symptom text data into a format that machine learning models can actually use. ML models can't work with text directly - they need numbers. Additionally, we need to ensure data quality before training begins.
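A minimal sketch of the transform step described above. The cleaning calls mirror the guide (drop nulls, drop duplicates, then TF-IDF); the vectorizer settings such as `stop_words` and `max_features` are assumptions:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def transform(df: pd.DataFrame):
    """Clean the raw DataFrame and vectorize the symptom text."""
    # Remove rows with missing values and exact duplicates.
    cleaned = df.dropna().drop_duplicates().reset_index(drop=True)

    # TF-IDF turns each symptom description into a numeric vector,
    # weighting words by how distinctive they are across the corpus.
    vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
    features = vectorizer.fit_transform(cleaned["text"])

    return cleaned, features, vectorizer
```
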
1.4 - Load
What are we doing?
This is the Load phase of the ETL pipeline: ensuring the persistence of the processed data and logging the pipeline execution.
Why are we doing this?
We are ensuring both the data itself and the pipeline metadata about processing are preserved for production-ready monitoring and reproducibility.
2 - Model Training & Evaluation
We have successfully completed our ETL pipeline and we are ready to begin the next step: Model Training. Our data is officially ready. The load phase saved cleaned_symptoms.csv which is the input for training the model. We will create a new folder in the root of our project directory called 'models'. Within this folder create 2 files: train.py and evaluate.py.
2.1 - Train
What are we doing?
We are training a machine learning model to predict medical conditions from symptom descriptions.
Why are we doing this?
Main Purpose: Create a reusable trained model that can predict diseases from new symptom text without retraining.
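A sketch of `train.py` under the guide's stated choices (Multinomial Naive Bayes, 80/20 split, saved test indices). Bundling the TF-IDF vectorizer and classifier into a scikit-learn `Pipeline`, and the artifact file names, are assumptions:

```python
import json
from pathlib import Path

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def train(csv_path="data/processed/cleaned_symptoms.csv", artifacts_dir="artifacts"):
    """Fit a Multinomial NB classifier and persist the model for reuse."""
    df = pd.read_csv(csv_path)
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
    )

    # Bundle vectorizer + classifier so the saved model accepts raw text.
    model = Pipeline(
        [
            ("tfidf", TfidfVectorizer(stop_words="english")),
            ("nb", MultinomialNB()),
        ]
    )
    model.fit(X_train, y_train)

    out = Path(artifacts_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out / "triage_model.joblib")
    # Save the test indices so evaluate.py scores the same held-out samples.
    (out / "test_indices.json").write_text(json.dumps(X_test.index.tolist()))
    return model
```
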
2.2 - Evaluate
What are we doing?
We are evaluating the trained model's performance on the held-out test data (20% from the split).
We will:
- Load the trained model from disk
- Load test data using saved indices (ensures same samples from training)
- Generate predictions on test set
- Calculate performance metrics (accuracy, precision, recall, F1-score)
- Test hypothesis (did we achieve ≥85% accuracy target?)
- Generate per-disease classification report
- Create visualizations (confusion matrix, class distribution, top keywords)
Why are we doing this?
We want to prove the model works on unseen data - training metrics alone can be misleading (overfitting). We are also testing a hypothesis that the model will work with at least 85% accuracy.
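The bulleted steps above can be sketched as follows. It assumes the artifact names from the training sketch (`triage_model.joblib`, `test_indices.json`) and omits the visualization steps for brevity:

```python
import json
from pathlib import Path

import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

ACCURACY_TARGET = 0.85  # the hypothesis we are testing


def evaluate(csv_path="data/processed/cleaned_symptoms.csv", artifacts_dir="artifacts"):
    """Score the trained model on the held-out 20% test split."""
    out = Path(artifacts_dir)
    model = joblib.load(out / "triage_model.joblib")

    # Reload the exact held-out rows saved during training.
    test_idx = json.loads((out / "test_indices.json").read_text())
    df = pd.read_csv(csv_path)
    test = df.loc[test_idx]

    preds = model.predict(test["text"])
    accuracy = accuracy_score(test["label"], preds)

    # Hypothesis check: did we reach the >= 85% accuracy target?
    print(f"Accuracy: {accuracy:.2%} (target met: {accuracy >= ACCURACY_TARGET})")
    # Per-disease precision/recall/F1 breakdown.
    print(classification_report(test["label"], preds, zero_division=0))
    return accuracy
```
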
3 - Visualizations and UI
We have successfully trained & evaluated our model. Now let's create some visualizations using Matplotlib and a simple user interface to interact with our trained model.
3.1 - Visualizations Using Matplotlib
What are we doing?
This file creates 3 key visualizations for model evaluation and documentation.
Why are we doing this?
We are demonstrating how Matplotlib, Pandas, and Scikit-Learn all work together to create highly detailed visualizations.
3.2 - Building a UI
What are we doing?
Building a simple user interface to interact with our model.
Why are we doing this?
Testing how our trained model works in a live environment.
4 - Translation & Triaging
This is arguably the most important part of our application: using the tools we built to help patients by:
- Mapping the diseases to a specialist.
- Translating patient input from any language into English so that our model can interpret it.
4.1 - Mapping Diagnosis to Specialty
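A sketch of the disease-to-specialist mapping. The hypertension-to-cardiologist pairing comes from the guide; the other entries are illustrative and the exact strings must match the dataset's label spellings (a real mapping should be reviewed by clinical staff):

```python
# Illustrative mapping only; keys must match the dataset's label strings.
SPECIALIST_MAP = {
    "Hypertension": "Cardiologist",
    "Migraine": "Neurologist",
    "Psoriasis": "Dermatologist",
    "Bronchial Asthma": "Pulmonologist",
    "Urinary Tract Infection": "Urologist",
}


def map_to_specialist(disease: str) -> str:
    """Return the specialist for a predicted disease, with a safe default."""
    return SPECIALIST_MAP.get(disease, "General Physician")
```
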
4.2 - Translation Utilities
5 - Running the Application
The following script goes into main.py.
Single Entry Point: One command (`python3 src/main.py`) runs everything.
Comprehensive Logging: Prints detailed progress, summaries, and final deliverables.
Final Project Structure:
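A sketch of the final layout, inferred from the steps above (`src/main.py` and `ui/app.py` come from the guide; the exact artifact file names are assumptions):

```
project-root/
├── artifacts/
│   ├── triage_model.joblib
│   └── *.png (evaluation visualizations)
├── data/
│   ├── raw/
│   │   └── Symptom2Disease.csv
│   └── processed/
│       └── cleaned_symptoms.csv
├── ETL/
│   ├── extract.py
│   ├── transform.py
│   └── load.py
├── models/
│   ├── train.py
│   └── evaluate.py
├── src/
│   └── main.py
├── ui/
│   └── app.py
└── requirements.txt
```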
If the app is working without errors you should see something like this in your terminal when you run main.py:
Now run `streamlit run ui/app.py` in the terminal to view the visualizations and a demo of our app.