Is Your Data Safe When You Use AI Tools?
Artificial intelligence tools are changing how small businesses operate -- from drafting emails to answering customer calls to organizing internal knowledge. But every time you send a prompt to an AI service, upload a document, or build a custom AI model, your data travels somewhere. And not every AI vendor treats that data the same way.
This guide breaks down exactly how your confidential business information can be exposed when you use AI-as-a-service platforms -- and what you can do about it before a costly mistake happens.
Key insight: Amazon reportedly lost over $1.4 million after employees unknowingly fed confidential business data into a commercial AI tool that used customer inputs to train its models. This is not a hypothetical risk.
The Three Levels of AI Data Exposure
AI security researchers have developed a practical framework that organizes AI data risk into three levels, based on how deeply your data becomes embedded in a vendor's systems. Each level carries a different risk profile, and each requires a different response.
Think of these levels like layers of commitment. At Level 1, you are renting a conversation. At Level 3, you may be permanently depositing your most sensitive business knowledge with a third party.
Level 1: Sending Prompts via the API
This is the most common way businesses interact with AI tools today. Every time you send a message to ChatGPT, Claude, or Microsoft Copilot -- or when your software calls one of these services in the background -- you are submitting a prompt to a vendor's server.
What actually happens to your prompt?
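The good news: the major AI providers -- OpenAI, Microsoft, Anthropic, and Google -- do not use API-submitted prompts to retrain their base models by default. Your input does not automatically make its way into the next version of the AI. That default covers API and business plans; consumer chat apps can have different defaults, so check the settings for the plan your employees actually use.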
The more nuanced reality: your prompts and the AI's responses are still retained on the vendor's servers for a period of time. That retention window varies by provider and plan. During that window, your data exists on infrastructure you do not control.
What this means for your business
- If an employee pastes a customer contract, a financial report, or patient health information into a chat prompt, that content lives on a vendor's server -- even if it is never used for training.
- A data breach at the vendor, a misconfigured access control, or a legal subpoena could expose that content.
- For businesses in healthcare, financial services, or legal services, this is not a theoretical concern -- it is a compliance matter.
What you can do
- Establish a clear policy on what types of information employees are permitted to paste into AI tools. Customer PII, financial data, and trade secrets should be off-limits without a formal review.
- Ask your AI vendor about Zero Data Retention (ZDR) options. These plans -- available from several major providers for enterprise and regulated-industry customers -- prevent prompts and responses from being stored once the request has been processed. They typically cost more, but the cost of a data breach is higher.
- Audit your existing tools. If your business already uses AI-powered software (scheduling tools, CRM assistants, email drafting aids), find out which underlying AI provider powers those tools and what their data retention policy is.
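One way to back up a written policy is a simple screen that checks text before it is pasted or sent to an AI tool. The sketch below is a minimal illustration, not a complete PII detector: the pattern list, the names `screen_prompt` and `redact`, and the placeholder format are all assumptions made for this example, and simple patterns like these will miss many kinds of sensitive data.

```python
import re

# Hypothetical patterns for illustration only. Real screening needs broader
# coverage (names, addresses, account numbers) and human review.
PATTERNS = {
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "possible card number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def screen_prompt(text: str) -> list[str]:
    """Return the names of the sensitive-data patterns found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def redact(text: str) -> str:
    """Replace every match with a labeled placeholder."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {name}]", text)
    return text


if __name__ == "__main__":
    draft = "Summarize the contract for jane@example.com, SSN 123-45-6789."
    findings = screen_prompt(draft)
    if findings:
        print("Hold for review, found:", ", ".join(findings))
        print(redact(draft))
```

A screen like this works best as a prompt for a human decision: it flags a draft for review rather than silently deciding what is safe to send.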
Level 2: Using RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation, or RAG, sounds technical -- but the concept is straightforward. Instead of just answering from general knowledge, the AI retrieves specific documents or data from your own files before responding. This is what powers AI tools that can "answer questions about your business" or "search your knowledge base."
RAG is genuinely useful. It lets you build AI assistants that know your products, your policies, and your customers. But where your data lives during that process matters enormously.
Two very different implementation paths
Path 1 -- You control the data (lower risk)
With a development framework such as LangChain, your documents can stay in your own storage -- on your server, your cloud account, or your own database. Only the relevant excerpts are sent to the AI at query time. The vendor sees snippets, not your entire knowledge base. Those snippets are still prompts, so the Level 1 retention rules apply to them.
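To make the "you control the data" path concrete, here is a minimal sketch of the idea in plain Python. It does not use LangChain or any vendor API. The sample documents, the function names, and the simple word-overlap ranking are assumptions for illustration; a real system would usually rank with embeddings and a vector database that you host yourself. What it shows is that only the selected excerpt ends up in the prompt.

```python
import re

# Hypothetical in-house knowledge base. In practice this could be files on
# your own server or rows in your own database.
DOCUMENTS = {
    "refund-policy": "Refunds are issued within 14 days of purchase with a receipt.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
    "warranty": "Hardware carries a 12 month warranty covering manufacturing defects.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return ids of the top k documents that share words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(body)), doc_id) for doc_id, body in DOCUMENTS.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def build_prompt(question: str, k: int = 1) -> str:
    """Build the only text that would leave your environment."""
    excerpts = "\n".join(DOCUMENTS[doc_id] for doc_id in retrieve(question, k))
    return f"Answer using only these excerpts:\n{excerpts}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How long does shipping take?"))
```

The refund and warranty documents never leave your environment for this question. The excerpt that is sent still counts as a prompt, so it is worth screening it like any other.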
Path 2 -- The vendor controls the data (higher risk)
Services like OpenAI Assistants allow you to upload documents directly into OpenAI's infrastructure. The AI can then search and retrieve from those documents. This is convenient -- but it means all of your uploaded context data is stored with OpenAI. Crucially, OpenAI does not offer Zero Data Retention for Assistants queries. Once that data is there, you have limited control over how long it stays or how it is protected.
What you can do
- Before uploading documents to any AI assistant platform, read the vendor's data retention policy for that specific product. Retention policies for chat-based prompts (Level 1) and document-upload features (Level 2) are often different -- and the Level 2 policies are frequently less favorable.
- Prefer self-hosted or developer-controlled RAG implementations when handling sensitive documents. This requires technical setup, but keeps your data in your own environment.
- Never upload documents containing PII, protected health information (PHI), or financial records to a vendor-hosted AI assistant without first confirming your data governance obligations and the vendor's specific terms for that feature.
Level 3: Fine-Tuning a Custom AI Model
Fine-tuning is the deepest level of AI customization. It means taking a general-purpose AI model and training it further on your specific data -- your past customer conversations, your internal documentation, your proprietary workflows -- so it behaves in a way that is uniquely suited to your business.
This capability is increasingly accessible. Platforms like OpenAI, AWS Bedrock, and Azure OpenAI offer fine-tuning services for businesses without requiring AI engineering expertise. The results can be impressive. But the data risk is also at its most significant here.
What happens to your fine-tuning data?
When you fine-tune a model using a third-party platform, you are uploading your training dataset to that vendor. They use it to modify the model. The resulting custom model is also stored with the vendor. This creates two assets outside your direct control: the data you trained on and the model that learned from it.
Vendor policies differ significantly -- and the differences matter:
- OpenAI: Training data retained indefinitely, until manually deleted. You must take explicit action to remove it.
- AWS Bedrock: Does not train base models on your data. Your data is not used to improve Amazon's models.
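- Azure OpenAI: Automatically deletes a fine-tuned model deployment that has been inactive for more than 15 days. This removes the deployment only, not the underlying custom model, so confirm separately how the model and your training data are deleted.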
Important distinction: "Does not train base models on your data" (AWS Bedrock's policy) is not the same as "does not retain your data." Always ask both questions: (1) Will you use my data to improve your models? (2) How long will you retain my data and the fine-tuned model?
The compounding risk when fine-tuning and RAG are combined
Many sophisticated AI deployments use both RAG and fine-tuning at the same time -- a fine-tuned model that also retrieves documents at query time. This is powerful, but it stacks the risk from both Level 2 and Level 3. If vendor policies are not carefully reviewed for each layer, a business can end up with proprietary data exposed through multiple pathways simultaneously.
What you can do
- Treat fine-tuning data selection as a legal decision, not just a technical one. Before assembling your training dataset, review what data it contains and whether you have the right to share it with a third party under your existing contracts and privacy obligations.
- Document which fine-tuned models exist, where they are hosted, and what data they were trained on. This is basic AI governance -- and most small businesses have not done it yet.
- Explicitly confirm deletion. With OpenAI, your training data does not disappear when you stop using the model. You must actively delete it. Build a process to track and confirm deletion when you end a vendor relationship.
- Compare vendor policies before you start. Switching AI vendors mid-project is costly. Do the policy comparison at the vendor evaluation stage, not after your data is already uploaded.
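The inventory and deletion tracking described above can start as something very small. The sketch below is one possible shape for it, assuming you record each fine-tuned model by hand. The class name, the field names, and the sample entries (including the vendor labels and dates) are made up for illustration and do not correspond to any vendor's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FineTunedModel:
    name: str                    # hypothetical internal label
    vendor: str                  # where the model is hosted
    training_data: str           # plain description of the dataset used
    contains_customer_data: bool
    dataset_deleted_on: Optional[date] = None  # set once deletion is confirmed
    model_deleted_on: Optional[date] = None    # set once deletion is confirmed

    def fully_deleted(self) -> bool:
        """True only when both the dataset and the model are confirmed deleted."""
        return self.dataset_deleted_on is not None and self.model_deleted_on is not None


def pending_deletions(inventory: list[FineTunedModel], vendor: str) -> list[str]:
    """Names of models at a vendor whose dataset or model may still exist there."""
    return [m.name for m in inventory if m.vendor == vendor and not m.fully_deleted()]


if __name__ == "__main__":
    # Sample entries for illustration only.
    inventory = [
        FineTunedModel("support-bot-v1", "Vendor A", "past support tickets", True,
                       dataset_deleted_on=date(2024, 3, 1)),
        FineTunedModel("faq-writer", "Vendor A", "public FAQ pages", False,
                       dataset_deleted_on=date(2024, 3, 1),
                       model_deleted_on=date(2024, 3, 2)),
    ]
    print(pending_deletions(inventory, "Vendor A"))
```

Tracking the dataset and the model separately mirrors the point made earlier: fine-tuning leaves two assets with the vendor, and offboarding is only complete when both are confirmed gone.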
A Practical Due Diligence Checklist
Before you or your team begins using any AI tool with sensitive business data, work through these questions:
For any AI tool (Level 1 -- Prompts)
- Does the vendor train its base models on our prompts by default?
- How long are our prompts and responses retained on vendor servers?
- Is Zero Data Retention available for our plan or use case?
- If a breach exposes our data, how and how quickly will the vendor notify us?
For AI tools that access your documents (Level 2 -- RAG)
- Where are our uploaded documents stored -- on our infrastructure or the vendor's?
- Does the vendor offer ZDR for document retrieval queries?
- What is the retention policy for uploaded documents specifically?
- Can we delete uploaded documents on demand, and how?
For custom AI model development (Level 3 -- Fine-Tuning)
- Does the vendor use our training data to improve their base models?
- How long is our training dataset retained after fine-tuning is complete?
- How long is the resulting custom model retained?
- What does explicit deletion require, and does it fully remove both dataset and model?
- What are our obligations if the training data contains customer information?
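If you review more than a couple of vendors, it can help to keep the checklist as data so that open questions are easy to spot. The following is a minimal sketch under that assumption. The level keys, the shortened question wording, and the `open_questions` helper are illustrative choices, not a standard format.

```python
# Shortened versions of the checklist questions, grouped by level.
CHECKLIST: dict[str, list[str]] = {
    "prompts": [
        "Trains base models on our prompts by default?",
        "Retention period for prompts and responses?",
        "Zero Data Retention available for our plan?",
        "Policy in the event of a data breach?",
    ],
    "rag": [
        "Where are uploaded documents stored?",
        "ZDR offered for document retrieval queries?",
        "Retention policy for uploaded documents?",
        "Can we delete uploaded documents on demand?",
    ],
    "fine_tuning": [
        "Training data used to improve base models?",
        "Retention of training dataset after fine-tuning?",
        "Retention of the resulting custom model?",
        "Does deletion remove both dataset and model?",
        "Obligations if training data contains customer information?",
    ],
}


def open_questions(levels: list[str], answers: dict[str, str]) -> list[str]:
    """List checklist questions for the given levels that have no recorded answer."""
    return [
        question
        for level in levels
        for question in CHECKLIST[level]
        if not answers.get(question, "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical review in progress: one answer recorded so far.
    answers = {"Zero Data Retention available for our plan?": "Yes, on request"}
    for question in open_questions(["prompts"], answers):
        print("Still open:", question)
```

A tool is ready for sensitive data only when this list comes back empty for every level it touches.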
Industries That Need to Pay the Most Attention
Every business that uses AI has some exposure. But certain industries face regulatory consequences -- not just financial ones -- if they mishandle data through AI platforms.
Healthcare: Any AI tool processing patient information must comply with HIPAA. This includes AI-powered scheduling assistants, documentation tools, and patient communication platforms. If a vendor handles protected health information on your behalf, you generally need a signed Business Associate Agreement (BAA) with that vendor, and its data practices must match what the BAA requires. Not every AI vendor will sign a BAA.
Financial services: Clients trust you with sensitive financial information. Accidentally exposing that data through an AI tool's retention policy could violate client confidentiality agreements and trigger regulatory scrutiny.
Legal services: Attorney-client privilege applies to the information your clients share with you. If that information passes through a third-party AI platform and is retained there, the privilege picture becomes complicated. Your bar association may have guidance -- or may not yet have caught up with the pace of AI adoption.
Any business with contracts that restrict data sharing: Review your vendor contracts and client agreements. Many contain provisions about where data can be processed or stored. AI tools can inadvertently put you in breach of those provisions.
What Good AI Governance Looks Like
You do not need a compliance department or a legal team to implement basic AI governance. You need a short set of written rules and a habit of asking the right questions before deploying new tools.
A simple AI governance policy for a small business covers:
- Which employees can use AI tools for what types of tasks
- What data categories are off-limits for AI tools without explicit approval (PII, financial records, health information, legal documents, trade secrets)
- Which AI tools are approved and which require review before use
- A vendor review requirement before adopting any new AI service that will process customer or business data
- A deletion and offboarding process for when you stop using an AI vendor
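The first few rules in a policy like this can also be written down in a form that software or a shared script can check. The sketch below is one hedged way to do that. The `AIPolicy` class, the category labels, and the tool name `DraftHelper` are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AIPolicy:
    approved_tools: set[str] = field(default_factory=set)
    restricted_categories: set[str] = field(default_factory=set)

    def check(self, tool: str, data_categories: set[str]) -> tuple[bool, str]:
        """Return (allowed, reason) for using a tool with the given data categories."""
        if tool not in self.approved_tools:
            return False, f"{tool} is not approved; request a vendor review first"
        blocked = sorted(data_categories & self.restricted_categories)
        if blocked:
            return False, "needs explicit approval for: " + ", ".join(blocked)
        return True, "ok"


if __name__ == "__main__":
    # Hypothetical policy: one approved tool, five restricted data categories.
    policy = AIPolicy(
        approved_tools={"DraftHelper"},
        restricted_categories={"pii", "financial", "health", "legal", "trade_secret"},
    )
    print(policy.check("DraftHelper", {"marketing_copy"}))
    print(policy.check("DraftHelper", {"pii", "health"}))
    print(policy.check("NewTool", {"marketing_copy"}))
```

The check fails closed: an unknown tool is rejected until someone reviews it, which matches the vendor review requirement in the list above.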
ISO/IEC 42001 -- the international standard for AI management systems, first published in 2023 -- provides a more formal framework for organizations that want structured governance. For most small businesses, a one-page internal policy and a vendor checklist will get you most of the way there.
Next Steps
AI tools are not going away, and the right response to these risks is not to avoid AI. It is to adopt AI with clear eyes about where your data goes and what your vendors are doing with it.
Start by auditing the AI tools your business uses today. For each one, answer the questions in the checklist above. You may find that most of your current tools are fine -- or you may discover a policy gap worth closing before it becomes a problem.
If you are considering a significant AI deployment -- a custom voice agent, a document-intelligence system, or a fine-tuned workflow automation -- due diligence at the vendor selection stage is far less expensive than a data incident after the fact.
Our team works with small businesses to design AI systems that are effective and appropriately governed. If you are not sure where to start, a consultation is a good first step.