Evolving the System Validation Plan (SVP) for AI-Enabled Pharmacovigilance Systems
For years the validation playbook in pharmacovigilance barely changed, and for good reason: it worked. You wrote a User Requirement Specification, the team built the code against it, you put together a validation plan, ran your installation, operational and performance qualification (IQ, OQ and PQ), signed off the Validation Summary Report, and issued the release notice. Clean, linear, defensible. An auditor could trace any output back through the system to a requirement, and you could prove, with evidence, that the system did exactly what it was specified to do. There is genuinely nothing wrong with that approach, and for deterministic systems it still holds.
The trouble is the one assumption it quietly rests on: the same input always produces the same output. Feed the system X and you get Y, every time. That predictability is the foundation the entire computer system validation (CSV) edifice sits on.
AI breaks it.
Run the same safety source document through an AI-assisted intake system today, then run it again tomorrow, and you may not get an identical result. The output is non-deterministic — for the same set of inputs, two runs can differ. And the obvious follow-up question, the one an auditor will ask you, is the hard one: why did the model make that choice? If you can’t answer that, you don’t have control over your system. You just have a system.
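One way to turn “two runs can differ” from an anecdote into evidence is to replay the same source document several times and record how often the outputs agree. The sketch below is a minimal illustration. It assumes the intake step can be wrapped as a Python function that returns a comparable result. `agreement_rate` and `hypothetical_intake` are made-up names, and the stand-in simulates variability with a seeded random number generator rather than calling any real model.

```python
import random
from collections import Counter
from typing import Callable, Hashable


def agreement_rate(run: Callable[[str], Hashable], document: str, n_runs: int = 20) -> tuple[Hashable, float]:
    """Feed the same document to `run` n_runs times.

    Returns the most frequent output and the fraction of runs that produced it.
    A fully deterministic step always returns a rate of 1.0.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    outputs = Counter(run(document) for _ in range(n_runs))
    modal_output, count = outputs.most_common(1)[0]
    return modal_output, count / n_runs


# Hypothetical stand-in for an AI intake step: it usually codes the event one way,
# but sometimes another. Seeded so the sketch is repeatable; not a real model.
_rng = random.Random(7)


def hypothetical_intake(document: str) -> str:
    return "Hepatotoxicity" if _rng.random() < 0.9 else "Liver function test abnormal"


modal, rate = agreement_rate(hypothetical_intake, "sample case narrative", n_runs=50)
print(f"most common output: {modal!r}, agreement: {rate:.0%}")
```

A rate below 1.0 doesn’t by itself make the system unfit. It does confirm that the variability is real, and it gives you a number to set alongside the controls that contain it.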
The EU AI Act moved the starting line
What’s changed most for those of us doing this work is when the thinking has to happen. Under the EU AI Act, you cannot treat AI risk as something you assess once the application is built and you’re getting ready to validate it. You have to perform an AI risk assessment before a single line of model integration code is written. The risk category you land in shapes everything downstream: the obligations, the documentation, the controls. CSV hasn’t been thrown out; it has been moved earlier in the lifecycle.
So the real question for a quality lead is this: how do you validate an AI system, and how do you sit across the table from an auditor and demonstrate that the controls and guardrails are genuinely in place to keep the AI making the right decision?
Let me be blunt about one thing, because I hear it constantly. “We have a human-in-the-loop who approves every action” is not an answer. It is not a governance strategy. It’s a single control, and on its own it’s a weak one: humans rubber-stamp, humans get tired, humans trust the machine more the longer it’s been right. Human-in-the-loop (HITL) review has a place, and an important one, but if it’s the whole of your story you will not survive the conversation. The auditor wants to know what you did before the human ever saw the output.
Here’s how I’d lay the lifecycle out.
Start with the AI Risk Assessment
Before development, document the risk category and the reasoning that got you there. Make sure you’ve covered the four risks that actually matter for these systems:
- Model risk — the model itself: drift, hallucination, the failure modes specific to the architecture you’ve chosen.
- Data risk — what goes in and what the model learned from. Provenance, quality, representativeness.
- Operational risk — what happens when the model is wrong in production, and what catches it.
- Compliance risk — where the system sits against the AI Act, GVP, data protection, and your own SOPs.
You don’t need to invent the structure from scratch. The templates at artificialintelligenceact.eu are a sensible reference point for the risk-assessment piece. Use them, then adapt to your context.
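The record needs to capture three things: the category, the reasoning behind it, and coverage of all four risk types. Below is a minimal sketch of how that could be held as structured data, so that an incomplete assessment is caught before sign-off rather than at audit. The class and field names are hypothetical. The tier labels follow the way the AI Act’s risk levels are commonly summarised. The example entries are placeholders, not a classification of any real system.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskTier(Enum):
    # Tiers as commonly summarised for the EU AI Act; confirm against the Act itself.
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# The four risk dimensions the assessment must cover.
REQUIRED_DIMENSIONS = ("model", "data", "operational", "compliance")


@dataclass
class RiskItem:
    description: str
    mitigation: str


@dataclass
class AIRiskAssessment:
    system_name: str
    tier: RiskTier
    tier_rationale: str
    risks: dict[str, list[RiskItem]] = field(default_factory=dict)

    def gaps(self) -> list[str]:
        """List the reasons this assessment is not yet ready for sign-off."""
        problems = []
        if not self.tier_rationale.strip():
            problems.append("risk tier has no documented rationale")
        for dimension in REQUIRED_DIMENSIONS:
            if not self.risks.get(dimension):
                problems.append(f"no {dimension} risks documented")
        for dimension, items in self.risks.items():
            for item in items:
                if not item.mitigation.strip():
                    problems.append(f"{dimension} risk without mitigation: {item.description}")
        return problems


# Illustrative only: tier and rationale are placeholders, not a real classification.
draft = AIRiskAssessment(
    system_name="hypothetical AI-assisted case intake",
    tier=RiskTier.HIGH,
    tier_rationale="placeholder: record the reasoning that led to this tier",
    risks={
        "model": [RiskItem("hallucinated adverse event terms", "vocabulary guardrail plus human review")],
        "data": [RiskItem("training data under-represents literature cases", "")],
    },
)
print(draft.gaps())
```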
Change the architecture specification — don’t bolt AI onto it
This is where a lot of teams go wrong. They keep their existing architecture document and add a section called “AI”. Don’t. The AI is the architecture now, and the specification has to reflect that throughout.
Document the model version explicitly, and treat it as significant, because a model version change is a revalidation trigger. That single line in your spec is what protects you when someone swaps a model six months from now and assumes it’s a like-for-like replacement. Document the rest of the AI stack too — the libraries, the standard configuration and, critically, the fallback values for when things go sideways. And document the harder questions honestly: the training dataset and how you obtained it, fairness, bias, transparency, explainability. If you can’t describe how the model reaches a decision, say so, and say what you’ve put around it to compensate.
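Treating the model version as a revalidation trigger is easier to enforce when the check is mechanical. The sketch below assumes you can compare the validated model identity and configuration, including the fallback settings, with what is actually deployed. Any difference is reported as a trigger. `ModelSpec`, `revalidation_triggers` and every value shown are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ModelSpec:
    """Model-related entries from the architecture specification (hypothetical field names)."""
    model_id: str
    model_version: str
    config: dict[str, Any] = field(default_factory=dict)  # standard settings, including fallbacks


def revalidation_triggers(specified: ModelSpec, deployed: ModelSpec) -> list[str]:
    """Compare what is deployed with what was validated; every difference is a trigger."""
    triggers = []
    for name in ("model_id", "model_version"):
        old, new = getattr(specified, name), getattr(deployed, name)
        if old != new:
            triggers.append(f"{name}: {old!r} -> {new!r}")
    # Check every config key present on either side, so added or removed settings are caught too.
    for key in sorted(specified.config.keys() | deployed.config.keys()):
        old, new = specified.config.get(key), deployed.config.get(key)
        if old != new:
            triggers.append(f"config[{key!r}]: {old!r} -> {new!r}")
    return triggers


validated = ModelSpec(
    "hypothetical-intake-model", "2.3.1",
    {"temperature": 0.0, "confidence_threshold": 0.85, "fallback_action": "route_to_manual_processing"},
)
# A "like-for-like" swap six months later: same settings, new version.
swapped = ModelSpec("hypothetical-intake-model", "2.4.0", dict(validated.config))
print(revalidation_triggers(validated, swapped))
```

The “like-for-like” swap still produces a trigger, and that is the point. Skipping revalidation should be a documented, risk-based decision, not an assumption.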
Build it — and test it the traditional way too
None of this means you abandon what already works. Develop the model, the code and the application, and do your unit testing and code review exactly as you always have. The deterministic parts of the system are still deterministic and still deserve the discipline. Keep the libraries you’ve used recorded in the architecture spec, and keep the standard configuration with its fallbacks documented alongside.
Put the guardrails in the design document
The design document is where the controls live, in full. Not summarised, not gestured at — the complete set. Guardrails, data retention, archiving, transparency, explainability, data privacy. If a control isn’t written down here, as far as an auditor is concerned it doesn’t exist. This is the document that proves the system was designed to behave safely, rather than coaxed into behaving safely after the fact.
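The rule that an unwritten control doesn’t exist can be made concrete. Give every guardrail in the design document an identifier, and have the runtime check report failures by that identifier. An auditor can then trace each control from document to code. The sketch below is hypothetical throughout: the control IDs, field names, permitted values and the 0.85 threshold are placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrail:
    control_id: str                 # hypothetical ID of the control in the design document
    description: str
    passes: Callable[[dict], bool]  # returns True when the AI output satisfies the control


GUARDRAILS = (
    Guardrail("GR-01", "Seriousness is one of the permitted values",
              lambda out: out.get("seriousness") in {"serious", "non-serious"}),
    Guardrail("GR-02", "Model confidence meets the documented threshold",
              lambda out: out.get("confidence", 0.0) >= 0.85),
    Guardrail("GR-03", "At least one suspect product is identified",
              lambda out: bool(out.get("suspect_products"))),
)


def apply_guardrails(output: dict, guardrails=GUARDRAILS) -> tuple[str, list[str]]:
    """Return the routing decision and the IDs of any controls that failed."""
    failed = [g.control_id for g in guardrails if not g.passes(output)]
    # Any failure sends the case to the documented fallback instead of the AI-assisted path.
    route = "fallback_manual_processing" if failed else "ai_assisted_review"
    return route, failed


print(apply_guardrails({"seriousness": "serious", "confidence": 0.92, "suspect_products": ["Drug A"]}))
print(apply_guardrails({"seriousness": "unknown", "confidence": 0.40, "suspect_products": []}))
```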
The validation plan is where AI validation earns its name
This is the part that has changed most, so it’s worth slowing down.
First, document the context of use. An AI system validated for one purpose is not automatically valid for another, and the context is what bounds the claim you’re making. Get it on paper.
Then your validation strategy — and this is the bit people underestimate. Manual testing is not sufficient. Neither is conventional code-based testing on its own, because the thing you’re trying to assure isn’t deterministic logic, it’s statistical behaviour. You need mathematical and statistical testing of the model’s outputs, and you need data-focused testing — running enough representative and edge-case data through the system to characterise how it actually performs, not how you hope it performs. Then a risk-based regression approach, so that when the model or its configuration changes, you re-test in proportion to the risk rather than re-running everything or, worse, nothing.
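The statistical piece can be as simple as refusing to accept a point estimate on its own. Here is a minimal sketch, assuming each test case is scored as correct or incorrect against a reviewed reference. It computes accuracy per slice of the test data and passes a slice only if the lower bound of a 95% Wilson score interval clears the acceptance target. The strata, counts and 0.90 target are hypothetical. Yours would come from the context of use and the risk assessment.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion (z=1.96 is ~95% two-sided)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def evaluate_strata(results: dict[str, tuple[int, int]], target: float) -> dict[str, bool]:
    """Pass a stratum only if the lower confidence bound on its accuracy clears the target."""
    return {
        stratum: wilson_lower_bound(correct, total) >= target
        for stratum, (correct, total) in results.items()
    }


# Hypothetical results: (correct, total) per slice of the test data set.
results = {
    "spontaneous reports": (470, 500),
    "literature articles": (188, 200),
    "edge cases": (27, 30),
}
print(evaluate_strata(results, target=0.90))
```

With these made-up numbers, the literature slice fails even though its observed accuracy (94%) matches the spontaneous-report slice. Its lower bound is about 0.898 against about 0.916, because 200 cases aren’t enough evidence to claim 90% at that confidence. A manual spot-check won’t surface that kind of conclusion.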
It also helps to be explicit about what each qualification stage is actually qualifying, because the familiar IQ/OQ/PQ labels take on new meaning here:
- IQ verifies the infrastructure and the model deployment — that the right model, at the right version, is installed and configured exactly as specified.
- OQ verifies the model’s behaviour and its explainability — that it does what it should, and that you can account for how it arrived there.
- PQ verifies real-world performance monitoring — that the system goes on performing in production, against live data, and not just on the day you signed it off.
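For PQ, the sketch below assumes one practical signal: the share of AI outputs that the reviewing human accepts without modification, tracked over a rolling window of recent cases. The class name, window size and threshold are hypothetical. A real monitoring plan would likely track several signals. A falling agreement rate is a prompt to investigate, not proof of drift.

```python
from collections import deque


class RollingAgreementMonitor:
    """PQ-style production check: share of recent AI outputs that reviewers accepted unchanged."""

    def __init__(self, window: int, threshold: float):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self._outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically

    def rate(self) -> float:
        """Agreement rate over the current window (0.0 before any case is recorded)."""
        return sum(self._outcomes) / len(self._outcomes) if self._outcomes else 0.0

    def record(self, accepted_unchanged: bool) -> bool:
        """Record one reviewed case; return True when an alert should be raised."""
        self._outcomes.append(accepted_unchanged)
        window_full = len(self._outcomes) == self._outcomes.maxlen
        # Only alert once there is a full window of evidence.
        return window_full and self.rate() < self.threshold


monitor = RollingAgreementMonitor(window=10, threshold=0.9)
alerts = [monitor.record(ok) for ok in [True] * 10 + [False, False]]
print(alerts[-3:])  # [False, False, True]
```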
And now you bring in the human-in-the-loop, in its proper place: a qualified person who reviews the AI’s output and holds the authority to accept, modify or reject it before it influences any decision. Defined like that — qualified, empowered, sitting on top of a validated system with documented guardrails — HITL is a real control. It’s the last line, not the only line.
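The conditions above can be enforced in code as well as in SOPs: a qualified reviewer, the authority to accept, modify or reject, and nothing downstream until a decision is made. Below is a minimal sketch with hypothetical names. It also adds one assumption of its own: a modification or rejection must carry a documented reason, so the record shows the reviewer’s reasoning as well as the outcome.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "accept"
    MODIFY = "modify"
    REJECT = "reject"


@dataclass(frozen=True)
class ReviewRecord:
    case_id: str
    reviewer: str
    decision: Decision
    ai_output: dict
    final_output: Optional[dict]  # None when rejected: nothing flows downstream
    reason: str
    reviewed_at: datetime


def review(case_id: str, ai_output: dict, reviewer: str, decision: Decision,
           qualified_reviewers: set[str], modified_output: Optional[dict] = None,
           reason: str = "") -> ReviewRecord:
    """Only a qualified reviewer may decide; modify/reject need a reason (an assumed rule)."""
    if reviewer not in qualified_reviewers:
        raise PermissionError(f"{reviewer} is not a qualified reviewer")
    if decision is Decision.MODIFY and modified_output is None:
        raise ValueError("a modification must include the corrected output")
    if decision in (Decision.MODIFY, Decision.REJECT) and not reason.strip():
        raise ValueError("modify and reject decisions need a documented reason")
    final = {Decision.ACCEPT: ai_output, Decision.MODIFY: modified_output, Decision.REJECT: None}[decision]
    return ReviewRecord(case_id, reviewer, decision, ai_output, final, reason,
                        datetime.now(timezone.utc))


qualified = {"pv.reviewer1"}
rec = review("CASE-001", {"seriousness": "serious"}, "pv.reviewer1", Decision.MODIFY,
             qualified, {"seriousness": "non-serious"}, "narrative states no hospitalisation")
print(rec.decision, rec.final_output)
```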
Don’t re-validate your vendors from scratch
Hardly any AI-enabled PV system is built entirely in-house. There’s a model provider, a cloud platform, third-party components sitting underneath the application you’re validating. Two things follow from that.
First, qualify your vendors and service providers before you engage them. Qualification is a gate you pass through, not a box you tick once the contract is signed. Second, once a vendor is qualified, don’t burn effort re-doing validation work they’ve already done and evidenced. Duplicating it helps nobody, and it won’t make the auditor any happier.
What the SVP should do instead is state plainly how you’ll rely on vendor evidence, and what evidence you’ll accept. In practice that means things like vendor validation summaries, third-party audit reports, SOC 2 reports, ISO 27001 certifications, model performance reports and release documentation.
Name these in the plan, say how each one supports your own validation argument, and you get the best of both: you avoid duplicating work, and you can still show the auditor a continuous chain of evidence running from the vendor’s controls through to yours.
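A reliance-on-vendor-evidence section is easier to defend when each piece of evidence is mapped to the validation claims it supports and checked for currency. Here is a minimal sketch, assuming a simple register in which each item lists hypothetical claim IDs and a date after which it must be refreshed. Vendor names, claim IDs and dates are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VendorEvidence:
    vendor: str
    kind: str                 # e.g. "SOC 2 Type II report", "ISO 27001 certificate"
    valid_until: date         # date after which the evidence must be refreshed
    supports: frozenset[str]  # hypothetical IDs of the validation claims this evidence backs


def uncovered_claims(claims: set[str], evidence: list[VendorEvidence], as_of: date) -> set[str]:
    """Claims with no current vendor evidence: cover them with your own testing or fresh evidence."""
    covered = set()
    for item in evidence:
        if item.valid_until >= as_of:  # expired evidence does not count
            covered |= item.supports
    return claims - covered


register = [
    VendorEvidence("hypothetical cloud provider", "ISO 27001 certificate",
                   date(2026, 3, 31), frozenset({"INFRA-SEC"})),
    VendorEvidence("hypothetical cloud provider", "SOC 2 Type II report",
                   date(2024, 12, 31), frozenset({"INFRA-AVAIL"})),
    VendorEvidence("hypothetical model provider", "model release documentation",
                   date(2026, 1, 1), frozenset({"MODEL-VERSION"})),
]
claims = {"INFRA-SEC", "INFRA-AVAIL", "MODEL-VERSION", "MODEL-PERF"}
print(sorted(uncovered_claims(claims, register, as_of=date(2025, 6, 1))))
```

The function returns the claims you must either cover with your own testing or chase the vendor for. Expired evidence drops out automatically instead of quietly propping up the chain.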
Summarise, then release
All of it rolls up into the Validation Summary Report, the same way it always did. The VSR tells the story end to end: the risk you assessed, the controls you designed, the testing you ran, the residual risk you accepted and why. Then you release.
The shape of the process — assess, specify, build, test, summarise, release — is reassuringly familiar. What’s changed is the substance underneath each step, and the order in which the risk thinking has to happen. We’re not throwing CSV away. We’re growing it up to handle a system that can surprise us.
That, more than any single template or tool, is the shift. Validation used to be about proving the system did what we told it to. With AI, it’s about proving we have control over a system that decides some things for itself — and being able to show, with evidence, exactly where that control sits.
Thanks for reading.
😃
If you have any comments, feedback, or requests, please feel free to connect with me on LinkedIn. And if you liked this post, don’t forget to share it with your network!