Your Health Scores Are Color Coding, Not Prediction — Here's How to Fix That
I've audited health scoring setups at dozens of B2B SaaS companies. Most of them have health scores. Almost none of them have predictive health scores.
Here's the difference:
A reactive health score says: "This account's usage dropped below 50 logins this month. It's red."
A predictive health score says: "This account's usage declined 22% over 60 days, their champion hasn't attended a meeting in 45 days, support ticket sentiment turned negative last week, and their renewal is in 90 days. Based on historical patterns, accounts with this combination of signals churn 74% of the time."
The first one tells you what already happened. The second tells you what's about to happen. That distinction is the difference between reacting to churn and preventing it.
Why Most Health Scores Fail
Three reasons:
1. They're rule-based, not pattern-based. Someone set thresholds — usage above X is green, below Y is red — and those thresholds are static. They don't account for the fact that different customer segments have different baselines. A startup with 10 users logging in 3 times a week might be healthy. An enterprise with 500 users at the same login rate is in trouble.
2. They use lagging indicators. NPS, CSAT surveys, and renewal outcomes are all lagging. By the time NPS drops, the damage is done. Predictive scoring needs leading indicators: usage velocity (rate of change), feature adoption curves, stakeholder engagement cadence, and support sentiment trajectory.
3. They're one-dimensional. A score built on usage alone misses relationship risk. A score built on CSM sentiment alone misses product risk. Prediction requires multiple independent signals combined.
Building Predictive Scores in Planhat
Planhat's health scoring system supports multi-dimensional scoring with configurable weights. That's the starting point. Here's how to make it actually predictive.
Dimension 1: Usage Velocity (Not Usage Volume)
Don't measure how much they use the product. Measure the rate of change.
In Planhat, set up Calculated Metrics with trailing window comparisons:
- 30-day average DAU vs. 90-day average DAU
- Week-over-week feature adoption rate
- Enable "Deviations from Normal" — Planhat's built-in ML that detects when a metric moves significantly from a customer's own historical baseline
The Deviations from Normal template is underused. It doesn't require you to define thresholds manually. It learns each customer's normal behavior and flags when something shifts. A customer who normally logs in 200 times a month dropping to 150 is a different signal than a customer who normally logs in 30 dropping to 15, even though both are "usage declines."
Set this as 25-30% of total health score weight.
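The trailing-window comparison above can be sketched in a few lines. This is a minimal illustration, assuming daily active user counts have already been exported as a plain list (oldest first); the function name and data layout are mine, not Planhat's:

```python
from statistics import mean

def usage_velocity(daily_active_users: list[int]) -> float:
    """Percent change of the 30-day average DAU vs. the 90-day average.

    `daily_active_users` is ordered oldest-to-newest, one value per day;
    requires at least 90 days of history. Negative output = declining usage.
    """
    if len(daily_active_users) < 90:
        raise ValueError("need at least 90 days of history")
    avg_90 = mean(daily_active_users[-90:])
    avg_30 = mean(daily_active_users[-30:])
    return (avg_30 - avg_90) / avg_90 * 100

# A steady decline: 90 days ramping down from 200 DAU toward ~110.
history = [200 - day for day in range(90)]
print(round(usage_velocity(history), 1))  # roughly -19.3
```

Note that the output is a rate of change, not a raw count, which is exactly what makes it comparable across the startup-with-10-users and enterprise-with-500-users cases.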
Dimension 2: Stakeholder Engagement
Track this at the End User level, not the Company level:
- Meeting cadence with key contacts (are they attending QBRs?)
- Email response rates and sentiment
- Planhat's End User Relevance Score — the ML-powered 0-100 score based on activity level and interactions over time
Build a Formula Field that flags accounts where the highest-Relevance End User (your champion) has had declining activity over 30 days. That's your champion risk indicator.
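The champion-risk flag can be sketched as follows, assuming per-user Relevance Scores and activity counts have been pulled into plain dicts (the field names here are hypothetical, not Planhat's schema):

```python
def champion_at_risk(end_users: list[dict], drop_threshold: float = 0.5) -> bool:
    """Flag an account when its highest-Relevance end user's activity in
    the last 30 days fell below `drop_threshold` of the prior 30 days.

    `end_users`: dicts with illustrative keys `relevance` (0-100),
    `activity_last_30`, and `activity_prior_30`.
    """
    champion = max(end_users, key=lambda u: u["relevance"])
    prior = champion["activity_prior_30"]
    if prior == 0:
        return False  # no baseline to compare against
    return champion["activity_last_30"] / prior < drop_threshold

users = [
    {"relevance": 92, "activity_last_30": 3, "activity_prior_30": 18},
    {"relevance": 41, "activity_last_30": 10, "activity_prior_30": 9},
]
print(champion_at_risk(users))  # champion's activity dropped 18 -> 3: True
```

The point of keying off the highest-Relevance user is that a quiet champion is a different signal than a quiet bystander, even when company-level totals look unchanged.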
Set this as 20-25% of health score weight.
Dimension 3: Support Sentiment Trajectory
Planhat's Conversational AI analyzes sentiment in emails, chats, and calls, categorizing them as Positive, Neutral, or Negative. This sentiment rolls up to Company, End User, and User levels.
Don't use current sentiment as your signal. Use the trajectory. An account that had 80% positive sentiment 90 days ago and now has 50% positive sentiment is declining, even if 50% positive still looks "okay" in isolation.
Build a Calculated Metric that tracks 30-day rolling sentiment against 90-day rolling sentiment. When the short window is significantly lower than the long window, something changed.
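A sketch of that short-window-vs-long-window check, assuming conversations have already been tagged with Planhat's three sentiment buckets (the tuple layout is an assumption for illustration):

```python
def sentiment_trajectory(conversations: list[tuple[int, str]]) -> float:
    """Percentage-point gap between the 30-day and 90-day positive share.

    `conversations`: (days_ago, sentiment) pairs, sentiment one of
    "Positive", "Neutral", "Negative". A strongly negative return value
    means the recent window is worse than the longer baseline.
    """
    def positive_share(window_days: int) -> float:
        in_window = [s for days, s in conversations if days <= window_days]
        if not in_window:
            return 0.0
        return sum(s == "Positive" for s in in_window) / len(in_window)

    return (positive_share(30) - positive_share(90)) * 100

older = [(d, "Positive") for d in range(40, 90, 10)]   # healthy baseline
recent = [(5, "Negative"), (12, "Negative"), (20, "Positive"), (25, "Negative")]
print(round(sentiment_trajectory(older + recent), 1))  # around -41.7
```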
Set this as 15-20% of health score weight.
Dimension 4: Financial Signals
Use the License model to feed financial health into the score:
- Days until renewal (urgency factor)
- License value trend (expanding, flat, or contracting)
- Auto-renewal status
- Payment history (if available via billing integration)
A customer whose License value has contracted over two consecutive terms is a fundamentally different risk than one who just renewed flat. Use the License API to pull historical value changes and build a revenue trajectory metric.
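A sketch of that trajectory classification, operating on per-term license values of the kind you would pull from the License API (the value list and labels are illustrative):

```python
def license_trajectory(term_values: list[float]) -> str:
    """Classify a license's revenue trajectory from historical term
    values, ordered oldest to newest.

    Two consecutive contractions is the highest-risk pattern called out
    above; a flat renewal is a milder, distinct case.
    """
    if len(term_values) < 3:
        return "insufficient_history"
    a, b, c = term_values[-3:]
    if c < b < a:
        return "contracting"   # two consecutive down-terms
    if c > b:
        return "expanding"
    return "flat"

print(license_trajectory([60000, 48000, 36000]))  # contracting
print(license_trajectory([40000, 40000, 40000]))  # flat
```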
Set this as 15-20% of health score weight.
Dimension 5: Product Fit / Adoption Depth
This is the one most teams skip entirely. It's not enough to know they're logging in. Are they using the features that correlate with retention?
For most SaaS products, there are 2-3 features that retained customers use and churned customers don't. Identify those features from your historical data. Then build Metrics that track adoption of those specific features.
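Identifying those sticky features from historical data is a straightforward comparison of adoption rates. A minimal sketch, assuming each account record carries a retention flag and a set of adopted feature names (both field names are invented for illustration):

```python
def sticky_features(accounts: list[dict], top_n: int = 3) -> list[str]:
    """Rank features by the gap in adoption rate between retained and
    churned accounts; the biggest gaps are your stickiness candidates.

    `accounts`: dicts with illustrative keys `retained` (bool) and
    `features` (set of adopted feature names).
    """
    retained = [a for a in accounts if a["retained"]]
    churned = [a for a in accounts if not a["retained"]]
    all_features = set().union(*(a["features"] for a in accounts))

    def rate(group: list[dict], feature: str) -> float:
        return sum(feature in a["features"] for a in group) / len(group)

    gaps = {f: rate(retained, f) - rate(churned, f) for f in all_features}
    return sorted(gaps, key=gaps.get, reverse=True)[:top_n]

accounts = [
    {"retained": True,  "features": {"reports", "api", "alerts"}},
    {"retained": True,  "features": {"reports", "alerts"}},
    {"retained": True,  "features": {"reports", "api"}},
    {"retained": False, "features": {"reports"}},
    {"retained": False, "features": {"alerts"}},
]
print(sticky_features(accounts, top_n=2))  # ['api', 'reports']
```

On real data you would want a sample large enough that these gaps aren't noise, but the shape of the analysis is the same.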
In Planhat, you can use Assets to represent product modules or feature areas, and track usage at the Asset level with Calculated Metrics. When a customer isn't adopting the "sticky" features, the health score should reflect that.
Set this as 10-15% of health score weight.
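Putting the five dimensions together, the composite score is just a weighted average. The weights below are the midpoints of the ranges above, purely for illustration; the calibration section that follows is where they actually get set:

```python
def composite_health(dimension_scores: dict, weights: dict) -> float:
    """Weighted composite of dimension scores (each 0-100).
    Weights are normalized, so they needn't sum to exactly 1."""
    total_weight = sum(weights[d] for d in dimension_scores)
    return sum(dimension_scores[d] * weights[d] for d in dimension_scores) / total_weight

weights = {
    "usage_velocity": 0.275,
    "stakeholder_engagement": 0.225,
    "sentiment_trajectory": 0.175,
    "financial_signals": 0.175,
    "adoption_depth": 0.125,
}
scores = {
    "usage_velocity": 40,          # declining usage
    "stakeholder_engagement": 35,  # quiet champion
    "sentiment_trajectory": 55,
    "financial_signals": 70,
    "adoption_depth": 60,
}
print(round(composite_health(scores, weights), 1))  # 49.5
```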
Adding the AI Layer
With these five dimensions flowing, you have the data foundation for genuinely predictive scoring. Here's where LLMs add a layer that rules never could:
Pattern matching across churned accounts. Export your historical health score data for accounts that churned versus those that renewed. Feed this to Claude or Gemini with a structured prompt: "Analyze these 50 churned accounts and 200 renewed accounts. What combination of health dimensions preceded churn? What patterns distinguish churn risk from temporary dips?"
The LLM will identify multi-variate patterns that threshold-based rules can't capture. Maybe it's not just usage decline OR sentiment decline — it's usage decline AND sentiment decline AND renewal within 90 days AND no QBR in the last quarter. That specific combination might predict churn at 80%+ accuracy.
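A sketch of assembling that prompt from exported snapshots; the JSON layout is an assumption rather than a Planhat export format, and the actual LLM call is omitted since it depends on which provider you use:

```python
import json

def build_churn_pattern_prompt(churned: list[dict], renewed: list[dict]) -> str:
    """Assemble the pattern-analysis prompt described above from
    per-account health-dimension snapshots (illustrative structure)."""
    return (
        f"Analyze these {len(churned)} churned accounts and "
        f"{len(renewed)} renewed accounts. What combination of health "
        "dimensions preceded churn? What patterns distinguish churn "
        "risk from temporary dips?\n\n"
        "CHURNED:\n" + json.dumps(churned, indent=2) + "\n\n"
        "RENEWED:\n" + json.dumps(renewed, indent=2)
    )

sample = build_churn_pattern_prompt(
    [{"usage_velocity": -22, "champion_active": False}],
    [{"usage_velocity": 3, "champion_active": True}],
)
print(sample.splitlines()[0])
```

Keeping the prompt builder as a pure function also makes it easy to version and A/B the prompt wording as you learn what the model picks up on.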
Via Planhat's MCP Server: Planhat offers a Model Context Protocol server that connects LLMs to live Planhat data with enterprise-grade permissioning. This means Claude or Gemini can query your customer data directly, run analysis against your actual accounts, and surface risk in real time without you manually exporting CSVs.
Through AI Workflows: Planhat's AI Workflow feature lets you combine automation steps with AI processing. Build a workflow that triggers when a health score drops below a threshold, feeds the account's full data context to an LLM, and generates a risk assessment with recommended actions. The output can be logged as a note on the Company, sent to the CSM via Slack, or used to auto-create a task.
Calibration Is Everything
A predictive health score on day one is a hypothesis. You need to track whether it's right.
Every quarter, pull the list of accounts your health score flagged as high risk. How many actually churned? How many false positives? How many churned accounts did the score miss entirely (false negatives)?
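That quarterly review reduces to a precision/recall check. A minimal sketch with account IDs as plain sets (the account names are invented):

```python
def score_calibration(flagged: set, churned: set) -> tuple[float, float, set]:
    """How well did the risk flags predict actual churn?
    Returns (precision, recall, false negatives the score missed)."""
    true_pos = flagged & churned
    false_neg = churned - flagged            # churned but never flagged
    precision = len(true_pos) / len(flagged) if flagged else 0.0
    recall = len(true_pos) / len(churned) if churned else 0.0
    return precision, recall, false_neg

flagged = {"acme", "globex", "initech", "umbrella"}
churned = {"acme", "globex", "hooli"}
p, r, missed = score_calibration(flagged, churned)
print(round(p, 2), round(r, 2), sorted(missed))  # 0.5 0.67 ['hooli']
```

Low precision means too many false alarms (CSMs stop trusting the score); low recall means the score is missing real churn, which is where the false negatives deserve a post-mortem.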
Use this to recalibrate weights. If support sentiment trajectory turns out to be a stronger predictor than usage velocity for your specific customer base, increase its weight. If stakeholder engagement isn't predictive (maybe your customers just don't attend QBRs regardless of health), reduce its weight and add a different dimension.
The health score is a living model, not a configuration you set once and forget.
The Bottom Line
A health score that tells you the current state of an account is a dashboard. A health score that tells you what's about to happen is a competitive advantage. The difference isn't technology — it's data architecture, signal selection, and the discipline to calibrate over time.
Most CS platforms can do this. Very few CS teams actually build it. The ones that do catch risk months before it becomes churn.