Why Vetting AI Models Is Different From Buying Software
When a small business buys off-the-shelf software, the vendor is responsible for security. When you deploy an open source AI model — whether to reduce costs, gain more control, or avoid sending data to a third-party cloud — that responsibility shifts to you.
This is not a reason to avoid open source AI. There are real advantages: lower ongoing costs, greater data privacy, and the ability to customize. But before you put a model into production, you need to answer a basic question: can you trust what is inside it?
Unlike traditional software, AI models carry risks that are harder to see. A model can behave correctly 99% of the time and still contain subtle biases or backdoors that only surface under specific conditions. It might have been trained on data from sources you would not approve of. It might phone home in ways your team has not detected.
This guide walks through five practical methods for evaluating an open source AI model before it touches your business operations — along with what each method can and cannot catch.
Who this is for: Small and mid-size business owners who are considering self-hosted or open source AI tools, or working with a technology partner who is proposing them.
Step 1: Run a Software Composition Analysis (SCA) Scan
Software Composition Analysis tools automatically scan the code and dependencies of an AI model for known vulnerabilities, malicious packages, and flagged components. Think of it like a virus scanner, but for AI model code.
Platforms like Hugging Face — where most open source AI models are published — maintain databases of reported vulnerabilities. SCA tools check the model you are evaluating against those databases and flag anything that matches.
What SCA catches well
- Known security vulnerabilities in model dependencies
- Malicious code or packages that have already been reported
- Data poisoning issues that have been documented and disclosed
What SCA cannot catch
- Vulnerabilities that have not yet been discovered or reported
- Novel attack methods with no prior record
- Issues introduced after the last database update
The bottom line: SCA is a necessary first check, but it only catches problems that someone else already found and reported. It is not a complete clearance. Think of it as a background check — valuable, but not exhaustive.
Step 2: Review the Code — With Human Eyes or AI Assistance
SCA tools work from databases of known issues. A manual code review looks for problems that have never been catalogued before: backdoors, unexpected network callbacks, and accidentally introduced security flaws.
For most small businesses, reviewing thousands of lines of model code directly is not realistic. But there is a practical shortcut: use an AI-assisted code review tool to do the initial pass, then have a human review the flagged sections.
What to look for during code review
- Backdoors — hidden code paths that allow unauthorized access
- Unexpected network callbacks — code that contacts external servers without a clear business reason
- Accidental vulnerabilities — bugs that are not intentional but still exploitable
A risk you may not have considered
If you use an AI tool to review model code, be aware of prompt injection via code comments. An attacker could embed malicious instructions inside the comments of the source code, designed to mislead an AI reviewer into missing or approving dangerous sections. A human reviewer would likely spot this immediately — which is why human review of flagged sections remains important even when using AI assistance.
Step 3: Investigate Model and Data Provenance
Even if a model has no known vulnerabilities and passes code review, there is still a subtler question: where did the training data come from, and who built this?
This matters more than many business owners realize. Security researchers have documented cases where developers intentionally deployed models trained on data from geopolitically sensitive sources — not through a cyberattack, but simply because they wanted access to the 'latest and greatest' model without checking its origins.
A locally-deployed open source model from certain sources may still have been designed to:
- Report usage patterns back to the originating organization
- Exhibit biases aligned with the goals of its original developers
- Carry intellectual property encumbrances that create legal risk
Questions to ask about any open source AI model
- Who trained it? Is the organization transparent about their identity and funding?
- What data was it trained on? Is the training dataset documented and independently verified?
- Is the model card complete? Reputable models publish a 'model card' documenting training methodology, known limitations, and intended use cases. Missing or vague model cards are a warning sign.
- Are there geopolitical considerations? Your legal and compliance team may have views on models originating from certain jurisdictions.
This is not about excluding any particular country's technology categorically — it is about making an informed decision with your eyes open, rather than discovering the origin after deployment.
Step 4: Test the Model Before You Go Live
No amount of code inspection tells you with certainty how a model will behave once it is processing real business data under real conditions. Pre-deployment testing is the only way to validate that a model actually does what you need it to do.
The key is to define your success criteria before you run the tests — not after. That way you are measuring against a fixed standard, not adjusting the goalposts.
What to test for
- Adherence to business requirements — Does the model produce the outputs your workflow requires, consistently?
- Unintended bias — Does the model treat different types of inputs (customers, products, scenarios) fairly and consistently?
- Data poisoning effects — If the model was intentionally corrupted during training, does it behave erratically in ways your tests would catch?
Setting performance thresholds
Machine learning practitioners use metrics like F1 score to measure how accurately a classification model performs. You do not need to understand the math, but you do need to agree in advance with your technical partner: 'This model must achieve at least X score on these test cases before we deploy.' If the model cannot meet that threshold, you do not deploy it — regardless of how impressive it looked in demos.
Step 5: Put Change Control Procedures in Place
Most small businesses using open source AI are not deploying a model once and leaving it forever. AI moves fast, and your developers or technical partners will want to update to newer models as they become available.
Without a defined process for evaluating and approving model changes, each update is a new security event that bypasses all the steps above.
Change control means establishing clear rules for:
- How model updates are proposed and reviewed
- Who approves a new model before it goes into production
- What documentation is required for each model in use (a running asset inventory)
- What happens if a model fails in production — who decides to roll back, and how fast
Organizations following the ISO 42001 standard for AI management systems include Annex A controls that address exactly these procedures. You do not need to be certified to ISO 42001 to benefit from its structure — the framework maps well to small business AI governance.
If your technical team is swapping models without a formal review process, you have a governance gap. The cost of closing that gap is low; the cost of discovering a problem after deployment is not.
A Simple Vetting Checklist
Before deploying any open source AI model, work through this checklist with your technology partner:
- [ ] SCA scan completed — No known vulnerabilities in model code or dependencies
- [ ] Code review completed — No backdoors, unexpected callbacks, or flagged behaviors found
- [ ] Provenance documented — Training data source and model origin are known and acceptable
- [ ] Performance tested — Model meets predefined accuracy thresholds on your test cases
- [ ] Bias assessment done — Model behaves consistently and fairly across relevant input types
- [ ] Change control in place — A documented process exists for approving future model updates
If any item on this list cannot be completed, that is a decision point — not necessarily a blocker, but something your team should explicitly accept or resolve before going live.
Open Source AI Is Worth Doing Right
The businesses getting the most value from open source AI are not the ones moving fastest. They are the ones moving deliberately. A model that saves you money on API costs but introduces a data breach or a regulatory problem costs you far more in the end.
The five steps in this guide do not require a large security team. They require a technical partner who takes the evaluation seriously, and a business owner who asks the right questions before saying yes.
If you are evaluating whether open source AI is right for your business — or assessing a proposal from a technology partner — we are glad to help you think it through.