Your Health Scores Are Wrong 40% of the Time
That "green" account that just churned? That "red" one that renewed and expanded? It's not bad luck. It's a broken model.
We've audited health scoring setups at dozens of B2B SaaS companies. The pattern is consistent: teams have health scores, but almost none of them have predictive health scores. They have color-coded dashboards that make everyone feel informed while nobody actually is.
The Problem With Red/Yellow/Green
Traditional health scoring was never designed to predict outcomes. It was designed to make dashboards look clean in board decks. There are three fundamental failures:
- Subjective inputs dominate. Most health scores lean heavily on CSM sentiment — how does the CSM feel about this account? That's not data. That's vibes. And vibes are biased toward recency and personal rapport.
- Lagging indicators only. By the time usage drops or support tickets spike, the decision to leave has already been made internally. You're reading the autopsy, not the vital signs.
- No backtesting. Almost nobody checks whether their health scores actually predicted anything. Did green accounts actually renew at a higher rate than yellow? If you haven't validated this, your scoring model is decoration.
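That validation takes an afternoon with a CRM export. A minimal sketch of the check, assuming a list of (health_color, renewed) records — the field layout here is hypothetical, not from any particular CRM:

```python
from collections import defaultdict

def renewal_rate_by_color(accounts):
    """Compute the renewal rate per health color from (color, renewed) records."""
    totals = defaultdict(lambda: [0, 0])  # color -> [renewed_count, total_count]
    for color, renewed in accounts:
        totals[color][0] += int(renewed)
        totals[color][1] += 1
    return {color: round(r / n, 2) for color, (r, n) in totals.items()}

# Toy data: one (color, renewed) pair per account at its last renewal.
accounts = [
    ("green", True), ("green", True), ("green", False),
    ("yellow", True), ("yellow", False),
    ("red", False),
]
```

If green's renewal rate isn't meaningfully above yellow's on your real data, the colors are decoration.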
What Predictive Scoring Actually Looks Like
A predictive health score weighs signals across multiple dimensions, and — critically — it's validated against real outcomes. Here's the framework we use with clients:
The Four Dimensions
- Business Value Realization. Is the customer actually achieving the outcome they bought the product for? Not "are they using the product" — are they achieving the result? This requires mapping their original business case to measurable indicators.
- Technical Health. API reliability, integration stability, adoption depth across user segments. This is where product telemetry matters — not vanity metrics like "DAU" but functional engagement patterns.
- Relationship Depth. How many stakeholders are engaged? Is the economic buyer involved? Have you lost access to the original champion? Multi-threading isn't just a sales concept — it's a retention signal.
- Trajectory. The most overlooked dimension. A score of 65 that was 80 last quarter tells a completely different story than a 65 that was 50 last quarter. Direction matters more than position.
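Trajectory is also the easiest dimension to operationalize: carry the previous period's score alongside the current one and report the delta. A minimal sketch (the function name and output shape are our own, not a standard):

```python
def trajectory_view(current, previous):
    """Pair a health score with its direction of travel; direction often
    matters more than the absolute level."""
    delta = current - previous
    trend = "improving" if delta > 0 else "declining" if delta < 0 else "flat"
    return {"score": current, "delta": delta, "trend": trend}
```

A 65 arriving from 80 flags intervention; a 65 arriving from 50 is momentum worth reinforcing. Same number, opposite playbooks.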
The Weighted Formula
Each dimension gets weighted based on your business model. A usage-heavy PLG product weights Technical Health higher. An enterprise deal with 18-month contracts weights Relationship Depth higher. There's no universal formula — but there is a universal process for finding yours:
- Pull 12 months of renewal and churn data
- Score each churned and renewed account retroactively across the four dimensions
- Run a correlation analysis to find which dimensions actually predicted the outcome
- Weight accordingly and set thresholds
- Backtest on a holdout set of accounts
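The correlation-and-weighting steps above can be sketched end to end. This is a simplified stand-in, assuming retro-scored accounts as dicts with one 0-100 value per dimension plus a renewed flag (all field names hypothetical); a real analysis might use logistic regression on a holdout set instead of raw correlations:

```python
import math

DIMENSIONS = ["value_realization", "technical_health",
              "relationship_depth", "trajectory"]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def fit_weights(accounts):
    """Correlate each retro-scored dimension with the renewal outcome,
    then normalize the positive correlations into weights."""
    outcomes = [a["renewed"] for a in accounts]
    corr = {d: max(0.0, pearson([a[d] for a in accounts], outcomes))
            for d in DIMENSIONS}
    total = sum(corr.values()) or 1.0
    return {d: c / total for d, c in corr.items()}

def score(account, weights):
    """Weighted health score for one account."""
    return sum(weights[d] * account[d] for d in DIMENSIONS)
```

Fit the weights on most of your historical accounts, then score the holdout set: if renewed accounts don't reliably outscore churned ones there, the weights don't generalize and the thresholds need rework.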
This takes work. But once you've done it, you have a model that tells you where to spend time before the renewal conversation goes sideways. That's the difference between a color-coded dashboard and an early warning system.
Implementation: Where to Start
You don't need a data science team to do this. You need:
- 12 months of renewal/churn data with at least 30 outcomes (ideally 100+)
- Access to product usage data, support ticket history, and engagement records
- A spreadsheet and the willingness to spend a week doing the analysis
The output is a scoring model that your team can actually use to prioritize their week. Not a traffic light that nobody trusts — a ranked list of accounts by predicted risk, with the specific signals driving each score.
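As a sketch of that output — assuming per-account dimension scores and a weights dict from the correlation step, with all names hypothetical:

```python
def risk_report(accounts, weights):
    """Rank accounts from highest to lowest predicted risk, attaching the
    weakest raw dimension as the signal driving each score."""
    rows = []
    for a in accounts:
        total = sum(weights[d] * a[d] for d in weights)
        weakest = min(weights, key=lambda d: a[d])  # lowest raw dimension score
        rows.append((a["name"], round(total, 1), weakest))
    return sorted(rows, key=lambda r: r[1])  # lowest score first = highest risk
```

The CSM's Monday-morning view is then the top of this list, not a wall of green tiles.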
The Bottom Line
If you can't tell me which of your "green" accounts are most likely to churn in the next 90 days, your health scoring isn't predictive. It's performative. And it's costing you renewals you could have saved if someone had flagged them three months earlier.
Predictive scoring isn't a luxury for enterprise CS teams. It's table stakes for anyone who wants to stop being surprised by churn.