Getting Data Governance for Regulatory Submissions Right Before AI Gets it Wrong with Cary Smithson

How life science companies can better govern their data to meet structured submission requirements, minimize regulator questions, and unlock the real potential of AI across R&D and QA.

Subscribe to our podcast, The Life Science Rundown, if you haven’t already.

Regulatory submissions aren’t just “documents” anymore. They’re structured data: standardized attributes, controlled vocabularies, traceable lineage. If the data feeding those submissions is inconsistent, siloed, or ungoverned, everything downstream suffers:

  • Slower approvals.

  • More questions from regulators.

  • AI tools that produce unreliable outputs.

  • Audit findings that could have been avoided.

Nick Capman recently sat down with Cary Smithson, Managing Partner and Owner of LeapAhead Solutions, to talk about what it takes to build a sustainable data governance capability in life sciences right now.

Cary brings decades of experience helping life science companies automate and modernize their regulatory operations, from the days of paper trial master files and manual quality document management through the current shift toward structured data submissions and AI-assisted authoring.

She leads the DIA RIM Working Group, has led the DIA RIM Intelligent Automation Team, co-authored the DIA RIM eBook, and has served as a data governance, RIM, and IT strategy consultant to organizations ranging from top-30 global biopharma to emerging biotech.

Her client work has spanned companies including Regeneron, Bristol-Myers Squibb, Johnson & Johnson, Daiichi Sankyo, Bayer, and BeiGene, among others. We’re honored to have had the chance to sit down and get her insight into modern data governance.

Apple Podcasts | Spotify | YouTube | Web + Others

The FDA Group's Insider Newsletter is a reader-supported publication. Consider becoming a paid subscriber to receive new posts and support our work.

Cary's key insights and practical takeaways

If you’re short on time or would rather read than listen, here are the most important lessons from the discussion.

  • Data governance is urgent for three converging reasons. First, regulatory expectations are becoming more structured: IDMP for medicinal product data, UDI for devices, HL7 FHIR for health data exchange, and structured PQ CMC expectations in the US. These demand consistent, well-managed master data and metadata. Second, interoperability across the value chain requires consistent, high-quality master data, and that’s not possible without governance. And third, AI and analytics depend on trustworthy data. Without clear definitions, controls, and provenance, AI models produce unreliable or non-compliant outputs. Good data governance unlocks the explainability and validation that regulated environments require.

  • Complex modalities and new data sources are compounding the challenge here. Cell and gene therapies, combination products, connected medical devices, remote trials, and real-world evidence all increase the breadth and volume of data companies need to manage — device software versions, manufacturing parameters, post-market performance data, and more. Meanwhile, health authorities expect faster, more consistent submissions with transparent data lineage. They want to know where you got your data and how you can explain it.

  • Start with four foundational moves, but don’t boil the ocean. Cary recommends:

    • First, define scope and critical data elements. Identify the high-impact data across R&D, regulatory, and quality (product master, substance, strength, dosage form, manufacturing sites, batch data), but focus on your biggest pain point first, then scale.

    • Second, assign ownership and stewardship. Name business owners and data stewards per domain, and make sure these are people in the business, not IT. IT can facilitate, but the business should own its data. Stand up a governance board and document policies.

    • Third, map data lineage and interfaces within your scope. Identify how data flows across functions, where it’s captured, transformed, and consumed, and flag issues like hard-coded interfaces, spreadsheet-based management, or emailed files on someone’s shared drive.

    • And fourth, codify standards and controls. Implement master data standards, controlled terminologies, reference data harmonization, quality rules, and change control. Pair with a data catalog and pilot governance in one or two high-value use cases.
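To make the "quality rules" step concrete, here is a minimal Python sketch of validation checks against a product master record. The field names and the controlled vocabulary are illustrative assumptions, not tied to any specific RIM or IDMP implementation:

```python
# Hypothetical sketch: simple data quality rules for a product master record.
# Field names and the controlled vocabulary below are illustrative only.

DOSAGE_FORMS = {"tablet", "capsule", "solution for injection"}  # controlled terminology

def check_product_record(record: dict) -> list[str]:
    """Return a list of data quality issues found in one product master record."""
    issues = []
    # Completeness rule: every critical data element must be populated.
    for field in ("product_name", "substance", "strength", "dosage_form", "manufacturing_site"):
        if not record.get(field):
            issues.append(f"missing required field: {field}")
    # Conformance rule: dosage form must come from the controlled vocabulary.
    dosage_form = record.get("dosage_form", "").lower()
    if dosage_form and dosage_form not in DOSAGE_FORMS:
        issues.append(f"dosage_form '{record['dosage_form']}' not in controlled vocabulary")
    return issues

record = {"product_name": "Examplinib", "substance": "examplinib", "strength": "50 mg",
          "dosage_form": "Tablet", "manufacturing_site": ""}
print(check_product_record(record))  # flags the empty manufacturing_site
```

In practice these rules would live in a data quality tool or catalog and run automatically on change, but the principle is the same: codified, testable checks rather than tribal knowledge.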

  • The most common challenges here tend to be organizational, not technical. Siloed functions, fragmented processes, and disconnected systems are the top barriers. Cary advises a federated governance model: central standards with stewardship within domains, API-based system integration, and shared glossaries and taxonomies to reduce interpretation gaps. Quality and manufacturing may define the same term differently than R&D and regulatory, and understanding those differences while mapping to a common standard (like IDMP) is critical. Budget structures that don’t cross functional lines make funding harder, which is why tying governance to measurable business outcomes is essential.

  • Good incentives matter more than good intentions. Getting people to comply with a governance model requires connecting it to business outcomes they’re already measured on: reduced submission cycle time, audit readiness, right-first-time metrics. If it rolls up to someone’s performance review, you’ll get compliance. If it’s just a “good corporate citizen” thing, you probably won’t. Executive sponsorship, a governance council, and deliberate change management all reinforce this.

  • Don’t let tools substitute for process. Some organizations get “tool happy.” They assume a new platform will solve their governance problems without addressing the underlying people and process gaps. Cary’s advice: start with people and organization, define the initial governance structure and policy, communicate it, then layer in standards, procedures, and data remediation. Once the basics are in place, select tools that support the data catalog, quality rules, and data lineage. The tools support governance; they don’t replace it.

  • Poor data governance creates real, compounding risks. Inconsistent product attributes or site data can trigger health authority queries and resubmissions, directly delaying product approvals. Operationally, it can cause misaligned labeling, batch release delays, and inspection observations. And in the age of AI, non-standard or low-quality data can undermine model performance and compliance, producing erroneous or misleading results. Explainability and model validation fall apart without data provenance.

  • Treat governance as a program, not a project. Establish a data governance council and a domain steward network with recurring processes: regular council meetings, defined steward roles, policies with standards and controls underneath, monitoring, and a continuous improvement loop. Integrate data governance into change control, system validation, and training. It should be part of your SDLC and requirements process so you’re not implementing new systems and backtracking later. Fund it via business outcomes and report health via KPIs: data quality scores, right-first-time submissions, data timeliness, lineage completeness, and issue remediation SLAs.
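The KPI reporting Cary describes can be as simple as rolling up a few fields per submission. A hypothetical sketch (the record shape and field names are invented for illustration):

```python
# Hypothetical sketch: computing a "right-first-time" governance KPI from
# submission records. The record structure is illustrative, not a real schema.

submissions = [
    {"id": "SUB-001", "accepted_first_time": True,  "ha_queries": 0},
    {"id": "SUB-002", "accepted_first_time": False, "ha_queries": 3},
    {"id": "SUB-003", "accepted_first_time": True,  "ha_queries": 1},
]

def right_first_time_rate(subs: list[dict]) -> float:
    """Fraction of submissions accepted without rework."""
    return sum(s["accepted_first_time"] for s in subs) / len(subs)

rate = right_first_time_rate(submissions)  # 2 of 3 accepted first time
```

The point is less the arithmetic than the habit: report the same small set of metrics to the governance council every cycle so trends, not anecdotes, drive remediation.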

  • Strong governance is the foundation for AI, not the other way around. Standardized critical data elements, consistent terminology, and validated data sets make AI model training far more robust. Without that foundation, nuances create problems. Cary shared an example from a trial master file project: even with well-defined classification standards, AI can struggle with nuances like distinguishing whose CV it’s filing (a principal investigator versus a sub-investigator) because the document goes to a different place in the hierarchy. If a human has to review everything the AI produces, the time savings evaporate. Beyond training data, governance supports traceability and explainability. Lineage and metadata enable AI model validation, auditability, and risk assessment — critical in GxP contexts. And established data catalogs, ontologies, and semantic layers enable use cases like automated authoring, signal detection, benefit-risk analytics, and faster tech transfer.

  • Align enterprise master data with regulatory data models, and maintain the “crosswalk.” For IDMP and SPOR, harmonize product and substance master data with standardized attributes and codes, manage lifecycle and affiliate variations, and ensure reference data alignment across systems. For HL7 FHIR, use FHIR profiles for data exchange with external stakeholders and design APIs that respect privacy, consent, and regulatory constraints. For PQ CMC, build CMC data models that reflect control strategies, manufacturing process parameters, specs, and analytical methods, and ensure submissions can be fed from governed data sources. Cary’s practical tip: maintain a crosswalk between enterprise master data and regulatory data models (like RIM and eCTD) with traceability from source systems to submission artifacts.
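The "crosswalk" idea can be sketched as a simple mapping table with lineage attached to each mapped value. The attribute names below are IDMP/SPOR-flavored placeholders, not an authoritative model:

```python
# Hypothetical sketch of a crosswalk from enterprise master data fields to
# regulatory data model attributes. Target attribute names are illustrative.

CROSSWALK = {
    # enterprise field     -> (regulatory model, target attribute)
    "product_name":       ("IDMP", "MedicinalProduct.name"),
    "dosage_form":        ("IDMP", "PharmaceuticalProduct.administrableDoseForm"),
    "manufacturing_site": ("SPOR", "OMS.organisation.location"),
}

def to_regulatory(record: dict) -> dict:
    """Map an enterprise record to regulatory attributes, preserving lineage."""
    out = {}
    for src_field, (model, target) in CROSSWALK.items():
        if src_field in record:
            out[target] = {
                "value": record[src_field],
                "model": model,
                "source_field": src_field,  # traceability back to the source system
            }
    return out

mapped = to_regulatory({"product_name": "Examplinib", "dosage_form": "tablet"})
```

Keeping the mapping in one governed artifact, rather than buried in point-to-point interface code, is what makes the source-to-submission traceability Cary recommends auditable.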

  • When seeking external support, demand life science depth. The nuances in this industry (terminology, regulatory frameworks, GxP requirements, computer system validation, predicate rules) mean a consultant from another industry will need extensive ramp-up and will create more miscommunications. Look for experience across the lifecycle, from R&D through regulatory, quality, manufacturing, labeling, and safety. Look for data standards and reference data expertise (DIA RIM Reference Model, IDMP, SPOR, PQ CMC). And critically, look for change management capability: the ability to embed governance into roles, SOPs, daily workflows, and training, not just tool implementation.

  • A real-world example: measurable results within 12 months. Cary described a mid-sized biopharma with about 20 marketed products that faced inconsistent product master data across RIM, ERP, and labeling systems. They launched a data governance program focused on IDMP readiness and CMC structured data — defining critical data elements, establishing steward roles, implementing a data catalog, aligning reference data with IDMP and SPOR, creating validation rules, mapping lineage into RIM and eCTD, and integrating with their QMS for change control. Within about 12 months, affiliate submissions were 30–50% faster due to standardized attributes and automated data feeds, health authority queries dropped approximately 25% thanks to more consistent product data, and AI-assisted authoring reduced drafting cycles for CMC sections and labeling updates with enhanced traceability supporting audits.

One thing to bring back to your team

Look at how your organization is managing the data that feeds your regulatory submissions and AI initiatives. Ask:

  • Is anyone formally accountable for data quality across functions, or does ownership default to IT by accident?

  • Are you governing data as a persistent program with KPIs, or treating it as a project that ends when the tool goes live?

  • Can you trace the lineage of your submission data from source system to final artifact?

  • If you’re investing in AI for authoring or analytics, is the underlying data standardized and governed enough to make those tools reliable?

The companies getting ahead aren’t the ones with the most sophisticated tools. They’re the ones with clean, governed data that makes every downstream process (submissions, AI, inspections, labeling) faster and more reliable.

Cary Smithson is Managing Partner and Owner of LeapAhead Solutions, Inc., where she drives a consulting practice focused on IT strategy, business process, and systems consulting for life sciences. She leads the DIA RIM Working Group and the DIA RIM Intelligent Automation Team, and co-authored the DIA RIM eBook.

With decades of experience spanning large pharmaceutical companies, consulting firms (including Grant Thornton and PharmaLex), and enterprise technology organizations, Cary has helped clients across the life sciences industry modernize their regulatory, quality, and R&D processes.

Her expertise spans regulatory information management, data governance, AI and data strategy, enterprise architecture, and GxP compliance. She has served clients including Regeneron, Bristol-Myers Squibb, Johnson & Johnson, Daiichi Sankyo, Bayer, BeiGene, and many others. Cary is a recognized thought leader who regularly presents at industry conferences and has published extensively on RIM, intelligent automation, and AI in life sciences.

Connect with Cary on LinkedIn here.

Who is The FDA Group?

The FDA Group helps life science organizations rapidly access the industry's best consultants, contractors, and candidates. Our resources assist in every stage of the product lifecycle, from clinical development to commercialization, with a focus on staff augmentation, auditing, remediation, QMS, and other specialized project work in Quality Assurance, Regulatory Affairs, and Clinical Operations.

With over 3,750 resources worldwide, over 325 of whom are former FDA, we meet your precise resourcing needs through a fast, convenient talent selection process supported by a Total Quality Guarantee.

Here’s why 17 of the top 20 life science firms access their consulting and contractor talent through us:

  • Resources in 75 countries and 48 states.

  • 26 hours average time to present a consultant or candidate.

  • Exclusive life science focus and expertise.

  • Dedicated account management team.

  • Right resource, first time (95% success).

  • 97% client satisfaction rating.

Talk to us when you're ready for a better talent resourcing experience and the peace of mind that comes with a partner whose commitment to quality and integrity reflects your own.


Subscribe to The Life Science Rundown:

Apple | Spotify | YouTube | Web + Others

Check out our newly launched AI-powered QMS audit tool, AICA (the Audit Intelligence Compliance Assistant).
