2026, Software, Data, Modeling
Wastewater Treatment ML Model
A MATLAB study comparing five regression algorithms to predict water quality outputs from real wastewater treatment plant data.
MATLAB, Machine learning, Data analysis

Role
Team lead, data modeling and analysis (five-person team)
Goal
Apply and compare multiple machine-learning regression algorithms in MATLAB to predict key water quality outputs, including dissolved oxygen, nitrate, and ammonium concentrations, from real wastewater treatment plant operating data.
Challenge
The hardest part was determining which model best fit the data across all output variables, since no single algorithm performed consistently well on every parameter. Dissolved oxygen in tank 1 proved especially difficult: the models that predicted it well consistently underperformed on every other output.
Solution
Leading a five-person team, I trained and evaluated five regression models in MATLAB using an 80/20 train-test split, with RMSE as the primary performance metric. Support vector machine regression was eliminated early because its error was significantly higher and its training time significantly longer than those of the other models. Linear regression emerged as the top performer overall in both accuracy and efficiency, with decision tree regression a strong alternative where speed was prioritized.
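The evaluation approach described above can be illustrated with a minimal sketch. This is not the project's actual code: it assumes the Statistics and Machine Learning Toolbox is available, substitutes synthetic data for the plant measurements, and shows only the three model types named in this write-up. All variable names, the number of inputs, and the model settings are hypothetical.

```matlab
% Sketch only: synthetic data stands in for the real plant measurements.
rng(42);                                   % reproducible split and noise
n = 500;
X = randn(n, 4);                           % hypothetical operating inputs
y = X * [1.5; -0.8; 0.3; 2.0] + 0.2 * randn(n, 1);   % hypothetical single output

% 80/20 hold-out split (20% of the rows are reserved for testing).
cv = cvpartition(n, 'HoldOut', 0.2);
Xtrain = X(training(cv), :);   ytrain = y(training(cv));
Xtest  = X(test(cv), :);       ytest  = y(test(cv));

% Candidate models; default settings are an assumption, not the project's.
names   = {'Linear regression'; 'Decision tree'; 'SVM regression'};
fitters = {@(A, b) fitlm(A, b), ...
           @(A, b) fitrtree(A, b), ...
           @(A, b) fitrsvm(A, b, 'Standardize', true)};

rmseVals  = zeros(numel(fitters), 1);
trainSecs = zeros(numel(fitters), 1);
for k = 1:numel(fitters)
    tic;
    mdl = fitters{k}(Xtrain, ytrain);      % fit on the training rows only
    trainSecs(k) = toc;                    % wall-clock training time
    yhat = predict(mdl, Xtest);            % predict the held-out rows
    rmseVals(k) = sqrt(mean((ytest - yhat).^2));   % root-mean-square error
end

% One row per model, sorted so the lowest test RMSE comes first.
results = table(names, rmseVals, trainSecs, ...
    'VariableNames', {'Model', 'RMSE', 'TrainSeconds'});
disp(sortrows(results, 'RMSE'));
```

Running the same loop once per output variable (dissolved oxygen, nitrate, ammonium) would give one ranking per output, which is how a different best model for each variable can be identified.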
Results
- Showed that optimal modeling in complex environmental systems means selecting different algorithms for different output variables rather than one universal approach.
- Linear regression won on accuracy and efficiency; decision tree regression was the fastest viable alternative.
- Strengthened the team's experience with supervised machine learning, data preprocessing, and quantitative model evaluation.
Slide deck