What you need to know about OpenAI's new Privacy Filter Open Model
Learnings from my experiments with the newly released model from OpenAI
A few days ago, OpenAI quietly released a small model on Hugging Face called privacy-filter. It is a 1.5B parameter token classifier with only ~50M of those parameters active at inference. The model is trained to do one narrow thing: spot personally identifiable information (PII) in text and tell you exactly where it sits, character offset by character offset.
It is the most interesting model I’ve used this month, and almost no one is talking about it.
The model that almost no one noticed
Most of the conversation in AI right now is about bigger generalist flagship models like Mythos and agents like OpenClaw. Trillions of parameters, longer context windows, deeper reasoning, more tool use. The release of openai/privacy-filter cuts in the opposite direction. It’s a specialist: a token-classification model with a per-token softmax over PII labels, eight native entity types (person, email, phone, address, URL, date, account_number, and secret), a confidence score per span, and character offsets for every detection. About 3 GB of BF16 weights on disk. Roughly 150–400 ms per inference on an Apple Silicon Mac Mini.
That last number matters more than you’d think. A specialist this small is fast enough to call on every keystroke, every chat message, every document upload, without batching, without GPU clusters, without leaving the machine. It collapses what used to be a server-side privacy infrastructure into a local detail.
But the real question with any specialist model is: what does it let you compose?
The thesis: privacy by construction
When organizations deploy LLMs in customer support, HR, or legal workflows, the failure mode that should worry them most isn’t hallucination. It’s leakage. A real customer’s SSN ending up in a prompt log, a model provider’s training set, or another user’s session.
The standard mitigations for this are bad. Regex misses more than it catches. Allowlists are brittle. System prompts that say “please don’t repeat PII back to the user” treat the LLM’s behavior as a security boundary, which is a strange thing to do given that the LLM is the thing you don’t trust.
The specialist-plus-generalist composition gives you a different kind of guarantee. Run the detector first. Replace every PII span with a stable placeholder — [PERSON_1], [EMAIL_1], [CARD_1]. Send only the sanitized version to the LLM. After the LLM responds, substitute any necessary placeholders back to their original values, server-side, before the user ever sees the reply.
The privacy guarantee isn’t that the LLM was instructed to behave. It’s that the LLM never received the data in the first place.
I built a small project called PrivacyGuard Enterprise to make that idea visible. Two models (openai/privacy-filter for detection and a 4-bit quantized Llama-3.2-3B-Instruct running on Apple’s MLX framework for response generation) are wrapped in a FastAPI backend and a no-build React frontend. Three demo use cases sit on top of the same pipeline.
TRUST BOUNDARY ARCHITECTURE
Three doors into the same idea
The first use case is a team-chat/forum moderation feed. Messages trickle in every few seconds; each one is sent through the detector and rendered inline with PII spans highlighted by category. A scanning shimmer animates while the model thinks, so the latency feels like a feature rather than a bug.
Chat moderation feed showing inline PII highlights and a recent-detections sidebar.
The second tab is a document redactor. Pick a preset (an HR memo, a support transcript, a legal MSA excerpt) or paste or upload your own file, and the whole document goes through the model in a single pass. Original and redacted views sit side by side, driven by the same entities array. The footer shows category counts and a scrollable list of entity-to-confidence chips.
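To make "driven by the same entities array" concrete, here is a minimal sketch of how one list of detections could feed the redacted view, the category counts, and the confidence chips. It is not the project's actual code: the function names, the entity dict shape (label, start, end, score), the sample memo, and the scores are all assumptions made for illustration.

```python
from collections import Counter

def redact_view(text, entities):
    """Build the redacted text, category counts, and chips from one entities list."""
    parts, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        parts.append(text[cursor:ent["start"]])      # untouched text before the span
        parts.append(f"[{ent['label'].upper()}]")    # category tag replaces the span
        cursor = ent["end"]
    parts.append(text[cursor:])
    counts = dict(Counter(ent["label"] for ent in entities))
    chips = [(text[e["start"]:e["end"]], e["label"], e["score"]) for e in entities]
    return "".join(parts), counts, chips

def span(text, value, label, score):
    """Helper for the example: locate `value` and build an entity dict."""
    start = text.index(value)
    return {"label": label, "start": start, "end": start + len(value), "score": score}

# Made-up sample document and made-up scores, standing in for real detector output.
original = "Memo: Priya Nair (priya.nair@example.com) reported the issue on 2024-03-01."
entities = [
    span(original, "Priya Nair", "person", 0.99),
    span(original, "priya.nair@example.com", "email", 0.98),
    span(original, "2024-03-01", "date", 0.95),
]
redacted, counts, chips = redact_view(original, entities)
print(redacted)
print(counts)
```

Because both views are derived from the same offsets, the original and redacted panes cannot drift apart: a span is either in the array and shown in both, or in neither.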
Document redactor with side-by-side Original ↔ Redacted view and category count chips
These two tabs are useful, but they only show half the architecture: detection and visual redaction. The third tab is where the structural privacy property actually shows up, because there’s a generative LLM in the loop. Let’s take a deeper look at that.
The chatbot: where the trust boundary becomes legible
When a user types into the chatbot, two things happen at once. A 350-millisecond debounced /detect call runs against the input as they type, so PII gets highlighted live in the textarea — they can see what’s about to be redacted before they hit Send. When they do hit Send, the request hits a different endpoint, /chat, which orchestrates the full pipeline.
Here’s what the sequence actually looks like:
A user types: “Hi, I’m Daniel Westbrook (d.westbrook@westbrookvance.com). Does CSV export come included with Pro?”
The detector finds two entities - a person and an email. The sanitizer replaces them in place: “Hi, I’m [PERSON_1] ([EMAIL_1]). Does CSV export come included with Pro?” That string, and only that string, is what the LLM sees. The mapping { "[PERSON_1]": "Daniel Westbrook", "[EMAIL_1]": "d.westbrook@westbrookvance.com" } lives in the FastAPI process and never leaves it.
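The sanitization step can be sketched in a few lines. This is an illustrative sketch rather than PrivacyGuard's actual code: the function name sanitize, the entity dict shape (label, start, end), and the choice to reuse one placeholder for repeated values are assumptions. The offsets are computed by hand here in place of real detector output.

```python
from collections import defaultdict

def sanitize(text, entities):
    """Replace each detected span with a stable placeholder such as [PERSON_1].

    `entities` holds dicts with "label", "start", "end" (end exclusive).
    Returns (sanitized_text, mapping of placeholder -> original value).
    """
    counters = defaultdict(int)   # running index per label
    seen = {}                     # (label, surface) -> placeholder, so repeats reuse it
    mapping = {}
    pieces, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:
            continue              # skip spans that overlap one already replaced
        surface = text[ent["start"]:ent["end"]]
        key = (ent["label"], surface)
        if key not in seen:
            counters[ent["label"]] += 1
            seen[key] = f"[{ent['label'].upper()}_{counters[ent['label']]}]"
            mapping[seen[key]] = surface
        pieces.append(text[cursor:ent["start"]])
        pieces.append(seen[key])
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces), mapping

text = ("Hi, I'm Daniel Westbrook (d.westbrook@westbrookvance.com). "
        "Does CSV export come included with Pro?")
name, email = "Daniel Westbrook", "d.westbrook@westbrookvance.com"
entities = [
    {"label": "person", "start": text.index(name), "end": text.index(name) + len(name)},
    {"label": "email", "start": text.index(email), "end": text.index(email) + len(email)},
]
sanitized, mapping = sanitize(text, entities)
print(sanitized)
print(mapping)
```

Only `sanitized` would be passed to the LLM; `mapping` stays in the server process for the rehydration step.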
The LLM is prompted to use placeholders naturally wherever it would normally use a real value. So it replies something like: “Hi [PERSON_1], CSV export is included on the Pro plan — you’ll find it under Settings → Data → Export. We’ve also confirmed receipt at [EMAIL_1].”
A two-pass regex scrubber on the way back out swaps the placeholders for their original values. The strict pass restores real PII from the mapping, and replaces any strict-shape placeholder the model invented ([EMAIL_2] when only [EMAIL_1] existed) with a bare-noun fallback like “email address” or “the customer”. The loose pass catches everything else bracket-shaped ([CSV_EXPORT], a stray [REFUND_PROCESS]) and demotes it to plain English. The user sees a clean, personal response: “Hi Daniel Westbrook, CSV export is included on the Pro plan…” The trust boundary held the entire time.
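A minimal sketch of such a two-pass rehydrator, under stated assumptions: the regex patterns, the FALLBACK table, and the rule of lowercasing unknown tokens are my illustrative choices here, not the project's actual implementation.

```python
import re

# Strict shape: a label plus an index, e.g. [EMAIL_1] or [ACCOUNT_NUMBER_2].
STRICT = re.compile(r"\[([A-Z][A-Z_]*?)_(\d+)\]")
# Loose shape: anything else that is bracketed and upper-case, e.g. [CSV_EXPORT].
LOOSE = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")

# Hypothetical bare-noun fallbacks for placeholders the model invented.
FALLBACK = {"PERSON": "the customer", "EMAIL": "email address"}

def plain(name):
    """Demote an ALL_CAPS token to plain words."""
    return FALLBACK.get(name, name.lower().replace("_", " "))

def rehydrate(reply, mapping):
    def strict_pass(match):
        token = match.group(0)
        if token in mapping:
            return mapping[token]        # known placeholder: restore the real value
        return plain(match.group(1))     # invented index: fall back to a bare noun
    restored = STRICT.sub(strict_pass, reply)
    return LOOSE.sub(lambda m: plain(m.group(1)), restored)

mapping = {"[PERSON_1]": "Daniel Westbrook"}
raw = "Hi [PERSON_1], see [CSV_EXPORT] in Settings. A copy went to [EMAIL_2]."
clean = rehydrate(raw, mapping)
print(clean)
```

One caveat with this ordering: the loose pass runs over text that already contains restored values, so an original value that itself looks like a bracketed upper-case token would be altered. Restoring values in a final step, after the loose pass, would avoid that.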
To demonstrate how this works, the right column of the UI animates the whole thing as a five-stage pipeline (Input, Detection, Sanitization, Raw LLM Output, Rehydrated Reply) so that you can see what’s really going on under the hood. In production, you would hook this into an audit or observability system.
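The five stages map naturally onto a function that returns a trace, which is also the shape an audit log would want. The sketch below is an assumption-heavy illustration: run_pipeline, stub_detect, and stub_llm are hypothetical names, the detector is a toy email regex standing in for the real model, and the LLM is a canned reply.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def stub_detect(text):
    """Stand-in detector: finds emails only. The real system would call the model."""
    return [{"label": "email", "start": m.start(), "end": m.end()}
            for m in EMAIL.finditer(text)]

def stub_llm(prompt):
    """Stand-in LLM that answers in placeholder form."""
    return "Thanks! We will reply to [EMAIL_1]."

def run_pipeline(user_text, detect, llm):
    entities = sorted(detect(user_text), key=lambda e: e["start"])
    mapping, counts, parts, cursor = {}, {}, [], 0
    for ent in entities:
        label = ent["label"].upper()
        counts[label] = counts.get(label, 0) + 1
        placeholder = f"[{label}_{counts[label]}]"
        mapping[placeholder] = user_text[ent["start"]:ent["end"]]
        parts += [user_text[cursor:ent["start"]], placeholder]
        cursor = ent["end"]
    sanitized = "".join(parts) + user_text[cursor:]
    raw = llm(sanitized)          # the only text that crosses the trust boundary
    reply = raw
    for placeholder, original in mapping.items():
        reply = reply.replace(placeholder, original)
    return {
        "input": user_text,
        "detection": entities,
        "sanitization": sanitized,
        "raw_llm_output": raw,
        "rehydrated_reply": reply,
    }

trace = run_pipeline("Reach me at d.westbrook@westbrookvance.com please.",
                     stub_detect, stub_llm)
for stage, value in trace.items():
    print(f"{stage}: {value}")
```

Passing the detector and the LLM in as callables keeps the boundary testable: a test can substitute a spy for `llm` and assert that no raw PII ever appears in the prompt it receives.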
What I learned building it
Four things stood out, and I think they generalize beyond this project.
The first is that a specialist plus a generalist is a real architectural primitive, not a workaround. I love this primitive: just a couple of weeks ago I built a demo combining a locally hosted Gemma 4 LLM with Meta’s SAM model, which specializes in object detection and segmentation. I will write an article about it soon. People usually reach for this composition when their generalist can’t do a narrow task well, and in my experience the combination is stronger than either model on its own. The more interesting reason to compose is when you want a property the generalist can’t structurally provide. A 50M-active-parameter classifier dedicated to PII is faster, cheaper, and more deterministic than asking a 3B model to redact and respond in one shot, and it gives you a clean trust boundary the bigger model can never offer.
The second is that the model gives you a coarse bucket; deterministic post-processing gives you the rest. The native model emits SSNs, credit cards, and arbitrary account IDs all under a single account_number label. I wanted those as three distinct categories with three different incident-response semantics. I didn’t fine-tune the model. I added a fifteen-line post-processor that runs after the model’s output and inspects the merged surface form: a regex for SSN shape, a Luhn check and IIN prefix list for cards, and a fallback to generic account. Specialized models are good at fuzzy contextual classification. Regex and checksums are good at crisp lexical distinctions. Let each do its job. The model is open, and you can fine-tune it for specific applications like this. I might do that next for my upcoming experiment.
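A refiner along those lines might look like the sketch below. The names, the exact SSN pattern, and the short IIN prefix list are assumptions for illustration; a real prefix list is longer (this one covers only a few Visa, Mastercard, and Amex ranges).

```python
import re

SSN_SHAPE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
# Illustrative and incomplete IIN prefixes: Visa 4, Mastercard 51-55, Amex 34/37.
CARD_PREFIXES = ("4", "51", "52", "53", "54", "55", "34", "37")

def luhn_ok(digits):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def refine_account(surface):
    """Split the coarse account_number bucket into ssn / card / account."""
    surface = surface.strip()
    if SSN_SHAPE.match(surface):
        return "ssn"
    digits = re.sub(r"[ -]", "", surface)
    if (digits.isdigit() and 13 <= len(digits) <= 19
            and digits.startswith(CARD_PREFIXES) and luhn_ok(digits)):
        return "card"
    return "account"

print(refine_account("529-44-1837"))          # SSN-shaped
print(refine_account("4111 1111 1111 1111"))  # widely used Visa test number
print(refine_account("ACCT-0099812"))         # neither shape
```

The checks are ordered from most to least specific, so anything that fails both the SSN shape and the card checksum lands in the generic bucket rather than being dropped.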
The third is that order of operations is where the subtle bugs hide. The model fragments SSNs across multiple BIOES spans: 529-44-1837 arrives as two adjacent account_number tokens, not one. If the SSN refiner runs before the adjacency merge, neither fragment matches the SSN shape and both fall through to “account.” The pipeline still works on most inputs. It just silently misclassifies SSNs. That class of bug is invisible to unit tests and only catchable by end-to-end smoke tests against the real model on representative text. Two-tier verification (pure unit tests on the deterministic refiner, plus a handful of live-model smoke runs) caught it. The bottom line is that you need a very solid eval set to test these scenarios.
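The ordering bug is easy to reproduce in isolation. In this sketch the fragment boundaries ("529-44" and "1837") are an assumption about how the split might look, and merge_adjacent and refine are hypothetical helpers, not the project's code.

```python
import re

SSN_SHAPE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def merge_adjacent(spans, max_gap=1):
    """Merge same-label spans separated by at most `max_gap` characters."""
    merged = []
    for span in sorted(spans, key=lambda s: s["start"]):
        prev = merged[-1] if merged else None
        if prev and prev["label"] == span["label"] and span["start"] - prev["end"] <= max_gap:
            prev["end"] = max(prev["end"], span["end"])   # extend the previous span
        else:
            merged.append(dict(span))                      # copy so inputs stay untouched
    return merged

def refine(text, spans):
    """Relabel account_number spans as ssn or account based on surface form."""
    out = []
    for s in spans:
        label = s["label"]
        if label == "account_number":
            surface = text[s["start"]:s["end"]]
            label = "ssn" if SSN_SHAPE.match(surface) else "account"
        out.append({**s, "label": label})
    return out

text = "SSN on file: 529-44-1837."
start = text.index("529")
fragments = [
    {"label": "account_number", "start": start, "end": start + 6},       # "529-44"
    {"label": "account_number", "start": start + 7, "end": start + 11},  # "1837"
]

wrong = refine(text, fragments)                  # refine first: fragments never match
right = refine(text, merge_adjacent(fragments))  # merge first: one SSN-shaped span
print([s["label"] for s in wrong])
print([s["label"] for s in right])
```

A unit test on `refine` alone passes in both orderings, which is why only a test that feeds fragmented spans through the whole chain exposes the problem.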
The fourth is that the system prompt is necessary but not sufficient. The most subtle thing I learned was about the cleanup layer, not the security layer. I started with a single regex pass that mapped [PERSON_1] back to “Daniel Westbrook” and assumed the strict prompt instructions would keep the model in the placeholder world. They didn’t. A 3B model with a careful system prompt, complete with Wrong/Right examples, still emits bracketed feature names like [CSV_EXPORT], bare categories like [EMAIL], and ghost indices like [PERSON_4] that were never in the mapping. So the rehydrator grew a second pass. The strict pass restores known placeholders and falls back to bare nouns for unknown ones. The loose pass demotes any remaining [ALL_CAPS_TOKEN] to plain English. Then a final cleanup fixes the “Hi the customer,” greeting artefact. The general principle: when you can’t trust the model’s output, you constrain it on the input side and clean it up on the output side. This may not be an issue with larger state-of-the-art models, but my objective was to push local models to the extent possible.
What I left out: domain and country fine-tuning
I deliberately scoped fine-tuning out of this build, but it’s the most interesting next step.
The base openai/privacy-filter model is trained on Western, mostly English-speaking enterprise text. Its recall on Indian names, Japanese addresses, German tax IDs, NHS numbers, Aadhaar numbers, IBANs, or anything else outside its training distribution is going to be weaker than its recall on samples from its primary training demographic. The model is also open weights, which means a domain-specific fine-tune isn’t a hypothetical. I have already seen a few on Hugging Face, fine-tuned for medical and Indian contexts. A few thousand labeled documents from your jurisdiction or industry, a single GPU afternoon, and you have a detector that knows what your data actually looks like. This is one way you can test out the auto researcher concept from Karpathy. A link to my writeup on auto researcher is below.
The same is true of taxonomy. If your domain needs distinctions the model wasn’t trained for, like patient identifiers in healthcare, classified markings in defense, or employee IDs in HR, you have two paths. The lightweight one is the deterministic refiner pattern I used for SSNs: let the model produce a coarse bucket, then disambiguate in fifteen lines of Python. The heavier one is fine-tuning, which is the right move when the disambiguating signal is contextual rather than lexical. Most builders should start with refiners and graduate to fine-tunes only when refiners stop being enough. PrivacyGuard demonstrates the first path. The second is open and waiting.
This is a great canvas, go build something with this
The thing that struck me building this is how low the barrier to entry is. Two model downloads. One FastAPI app. One HTML file. Total weights on disk, around 5 GB. Total cold-start time, a few seconds for the detector and a few more for the LLM on first chat. Total runtime cost, zero, as both models live entirely on a Mac Mini.
This combination of a small specialist plus a generalist with a clear trust boundary between them is the shape of a lot of useful systems that enterprises want to build. A pre-flight redactor that intercepts every LLM API call your team makes. A Slack bot that flags PII before it lands in channel history. A document review tool for HR or legal that runs offline. A chatbot for healthcare or finance that can be honest about exactly what data it holds and what it doesn’t.
The interesting frontier in AI right now is composition: small specialists that each do one thing well, integrated with generalists that plan and execute by calling those specialists as tools. The safety properties come from the integration architecture, from how the pieces fit together.
The model is open. The architecture patterns are described above. The hardware is on your desk.
Go build something and share below.

Sources:
https://openai.com/index/introducing-openai-privacy-filter/
https://huggingface.co/openai/privacy-filter