HealthPredict
documentation

How HealthPredict works

From a handful of clinical values to an explained, model-backed risk estimate, with the machine learning kept cleanly separate from the web layer.

The models

Each condition has its own pair of scikit-learn models, a Logistic Regression and a Random Forest, trained on that condition's feature set. At prediction time both models run and their predicted probabilities are averaged into a single ensemble score, which is steadier than either model alone. The Logistic Regression's coefficients also power the per-factor explanation shown on every result.
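The averaging and the coefficient-based explanation can be sketched in plain Python so the arithmetic is visible. This is a minimal sketch, not the project's code: it assumes an unweighted mean of the two probabilities and assumes each factor's contribution is its coefficient times its standardized input value. The function names, feature names and numbers are all hypothetical.

```python
import math


def ensemble_probability(p_logreg: float, p_forest: float) -> float:
    """Unweighted mean of the two model probabilities (assumed weighting)."""
    return (p_logreg + p_forest) / 2.0


def logreg_probability(coefs: dict, intercept: float, scaled: dict) -> float:
    """Logistic Regression probability from coefficients and scaled inputs."""
    z = intercept + sum(coefs[name] * scaled[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def factor_breakdown(coefs: dict, scaled: dict) -> list:
    """Per-factor contribution to the log-odds, largest magnitude first."""
    contributions = [(name, coefs[name] * scaled[name]) for name in coefs]
    return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical coefficients and standardized inputs, for illustration only.
coefs = {"glucose": 1.2, "bmi": 0.5, "age": 0.25}
scaled = {"glucose": 1.0, "bmi": -0.5, "age": 2.0}

p_lr = logreg_probability(coefs, intercept=-0.5, scaled=scaled)
p_rf = 0.70  # stand-in for the Random Forest's predicted probability
score = ensemble_probability(p_lr, p_rf)
breakdown = factor_breakdown(coefs, scaled)
```

In this sketch a positive contribution pushes the estimate up and a negative one pulls it down, which is the kind of ordering a per-factor explanation needs.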

The data

For this demo the models are trained on clinically plausible synthetic datasets: each feature is drawn from a realistic population distribution, and risk is generated from documented weightings. In production these are swapped for the standard public datasets (Pima Indians, UCI Cleveland, UCI Chronic Kidney Disease, Indian Liver Patient, and so on); the training pipeline is identical and only the data source changes. Every dataset and preprocessing step is documented in the training scripts.
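One way to generate data of this kind is sketched below. It assumes each feature is Gaussian and that the risk weights act on standardized values through a logistic link; the actual distributions and weightings live in the training scripts and may differ. The feature names, means, standard deviations and weights are hypothetical placeholders.

```python
import math
import random

# Hypothetical feature spec: name -> (mean, standard deviation, risk weight).
FEATURES = {
    "glucose": (110.0, 25.0, 1.2),
    "bmi": (27.0, 5.0, 0.6),
    "age": (45.0, 15.0, 0.4),
}


def make_synthetic_rows(n_rows: int, seed: int = 0, intercept: float = -0.5) -> list:
    """Draw features from Gaussians, then sample a 0/1 label from the weighted risk."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    rows = []
    for _ in range(n_rows):
        values = {}
        logit = intercept
        for name, (mean, std, weight) in FEATURES.items():
            value = rng.gauss(mean, std)
            values[name] = value
            logit += weight * (value - mean) / std  # weight applied to the z-score
        probability = 1.0 / (1.0 + math.exp(-logit))
        values["label"] = 1 if rng.random() < probability else 0
        rows.append(values)
    return rows


rows = make_synthetic_rows(200, seed=42)
```

Sampling the label from the probability, rather than thresholding it, keeps some noise in the data so the models are not fitting a perfectly separable problem.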

Architecture

  • diseases.py — single source of truth for features, healthy ranges and risk weights.
  • train_models.py — builds the datasets, trains both models, reports accuracy + ROC AUC, persists each bundle with joblib.
  • ml.py — loads the bundles, ensembles the predictions, returns the explained factor breakdown.
  • app.py — Flask routing, authentication and persistence only — no ML logic.
  • SQLAlchemy ORM over SQLite for development; point DATABASE_URL at MySQL for production with no code changes.
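
As an illustration of the "single source of truth" idea behind diseases.py, the sketch below keeps features, healthy ranges and risk weights in one registry that other modules read from. The structure, names and numbers are assumptions made for this example and are not taken from the project; the ranges are placeholders, not clinical guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    healthy_min: float
    healthy_max: float
    weight: float


# Hypothetical registry; names, ranges and weights are illustrative only.
DISEASES = {
    "diabetes": [
        Feature("glucose", 70.0, 100.0, 1.2),
        Feature("bmi", 18.5, 24.9, 0.6),
    ],
}


def feature_names(disease: str) -> list:
    """Ordered feature names, as training and prediction code would consume them."""
    return [feature.name for feature in DISEASES[disease]]


def out_of_range(disease: str, values: dict) -> list:
    """Names of the inputs that fall outside their healthy range."""
    flagged = []
    for feature in DISEASES[disease]:
        value = values[feature.name]
        if not (feature.healthy_min <= value <= feature.healthy_max):
            flagged.append(feature.name)
    return flagged
```

With a layout like this, the training script, the prediction code and the input forms can all derive their feature order from the same definitions instead of repeating them.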

Run it locally

python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python train_models.py        # trains + saves the models
python app.py                 # http://localhost:5000

Deploy (Linux VPS)

gunicorn -w 3 -b 127.0.0.1:8000 app:app
# then proxy it behind Nginx with a Let's Encrypt certificate
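
For the database switch mentioned under Architecture, a minimal sketch of how the connection string might be resolved is shown here. The helper name and the SQLite filename are hypothetical; the only detail taken from this page is that DATABASE_URL selects the production database.

```python
import os

# Hypothetical development default; the project's real filename may differ.
DEFAULT_SQLITE_URL = "sqlite:///healthpredict.db"


def database_url(environ=None) -> str:
    """Return DATABASE_URL if it is set and non-empty, else the SQLite default."""
    env = os.environ if environ is None else environ
    return env.get("DATABASE_URL") or DEFAULT_SQLITE_URL
```

On the server, exporting DATABASE_URL with a MySQL connection string before starting gunicorn would then select MySQL, while a local run with the variable unset falls back to SQLite.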

Conditions covered

  • Diabetes: 7 inputs · LogReg + Random Forest
  • Heart Disease: 7 inputs · LogReg + Random Forest
  • Chronic Kidney Disease: 7 inputs · LogReg + Random Forest
  • Liver Disorder: 7 inputs · LogReg + Random Forest
  • Thyroid Disorder: 6 inputs · LogReg + Random Forest

HealthPredict provides statistical risk estimates for screening and education. It is not a medical device and does not provide a diagnosis.