Don't use synthetic personas. Consider an AI research oracle.
Some people like decks. Some people like workshops. My client likes AI chatbots. So I created Margaret.
In my latest quest to automate the parts of UX research that AI is good at (the data processing) and spend more time on what humans should do (the interviews, the synthesis), I came to the question: how should we communicate findings?
The best way to communicate research findings depends on how your organization consumes information.
Some people like decks.
Some people like workshops.
My current client likes AI chatbots.
So I created Margaret.
However, Margaret is NOT a synthetic user. She is a synthetic researcher.
No Synthetic Personas
Synthetic personas can’t work. Here is why.
The surface-level appeal of a synthetic user is obvious: with an always-on-demand user, we can get instant feedback whenever we need it. I would absolutely love to have a synthetic user, if I believed such a thing was possible and would benefit a team.
But I don’t. I’ve been a researcher long enough and I’ve worked on enough teams to understand the shortcomings of this idea: they’re the same things that we’re always controlling for as researchers. It also misunderstands why we create personas as artifacts in the first place.
What researchers are for: rigor and interpretation
As professional researchers, our role is not to be neutral, it is to bias the organization toward truth. Product teams operate in specific situations biased by specific forces that often pull them out of alignment with the needs of users. The job of a researcher is not simply to conduct research—it is to create the information environment that keeps the team aligned with a more truthful understanding of user context. We interact in very specific ways with information:
- Create demand for information: by advocating for research initiatives.
- Find information: by running surveys, interviews, etc.
- Synthesize information: distinguish between the signal and the noise through a rigorous process. Distill truth from data.
- Communicate information: interpret findings to connect them with strategic design principles.
Our job isn’t to dump piles of unsynthesized information on our colleagues. It is to curate information and make it actionable. “Personas” are one of the tools for that end.
What personas are for: convergence
Personas are communication tools that help us share synthesized research insights. Good personas focus our colleagues’ attention on synthesized truths. They do not diverge, they converge.
Good synthesis artifacts (like personas) help us converge around truth (not diverge toward unknown possibilities). They should accurately express our research findings and, importantly, not represent things that our research data doesn’t show.
“What makes a good synthesis communication tool?” One that helps us converge around truth (not diverge toward unknown possibilities).
Synthetic Personas, therefore, create two problems for us.
- Divergence: chatbots impersonating users afford open-ended interactions. Rather than guiding the team toward insights, they encourage divergence toward open-ended uncertainty.
- Truthiness: synthetic personas will only ever be as robust as the data we’re able to put into them. To account for the infinite number of things you can throw at a chatbot, we’d need a LOT of data. Even then, we probably won’t have information for everything, so there is an immense risk that the bot will hallucinate answers to questions we don’t have real data for.
The problem isn’t the medium, it’s the metaphor.
Personas impersonate users, but (as any researcher will tell you) users do not think in concise insights. That is why we have all of these research practices. The people who articulate findings in useful ways are… researchers.
Synthetic Research Oracle
As researchers, our job is to interpret data for the people who are going to end up using it. We are the ones who spent time in the field. We are the ones who have spent hours thinking about this data. We’ve met the actual humans. We’ve gone down qualitative and quantitative paths of questioning and come out the other side through a rigorous process to determine things that we believe are true.
When someone asks us a question, we answer it—but we also proactively layer our answers in frameworks, qualifications, and context that our audience might not know to ask.
Note the distinction here. A persona is a representation of a participant, but participants don’t do this kind of synthesis. When professional researchers have conversations with participants, we are being deliberate about the questions we ask, the setting we create, and the interpretation we bring afterward.
We occupy a place in the middle: an interpretive layer. And that interpretive layer is important. It provides more than just the “answers.” It’s a deeper context for how those answers should be thought about.
Can we create a chatbot for this?
That was the task of creating Margaret.
What Makes Margaret Work
Margaret is an AI librarian bot that I made so our broader team (including humans and agents) can interact with our design research data in a structured way. She can pull insights from the research, direct quotes from participants, and discuss the frameworks that we’ve developed for interpreting the data. She can open up a handful of webpages for people that show things like journey maps or other frameworks including participant cards and presentations of synthesized data that we’ve given in the past. Most importantly, based on how I built her, she’s very unlikely to hallucinate (or blow out the token budget!).
At the end of this article, I’ll discuss in greater detail exactly how Margaret was constructed, bit by bit, going through her soul file. But here, I want to pull out a few important things that make Margaret a success.
1. Margaret’s interface fits the people in the organization.
Every organization is different, and so where your bot lives is going to depend on how the people in your organization interact. In my case, this meant making sure Margaret lived where the team was already working and could be reached through the tools they were already using.
2. Margaret pulls from a well-organized data repository and has clear priorities.
Margaret has access to our database of transcripts, quantitative breakdowns, frameworks, and everything else that we’ve done throughout our research process. That is a lot of data, so Margaret has clear routes to the right evidence: synthesized findings around common questions, paths to answers we’ve already thought through, and tags that make it easier to verify and audit claims. We’re not giving an AI system a lake of data and asking it to go fishing. We’re giving it a specific process for finding the right data.

Margaret can accurately answer questions from our research citing specific answers
3. Margaret acts the way we want her to.
AI can often be sycophantic, but good researchers push back when appropriate, and we needed Margaret to do the same. I was very specific about the instructions that Margaret should follow.
If a user asks a question about our research:
- Tell the user that you have to look through the data first.
- Identify what data would answer their question before moving on.
- Search for that data.
- Decide whether that data actually answers the user’s question and tell them if it doesn’t.
- If it does, provide the answer in a structured way that draws on our frameworks, uses direct quotes, and always cites the tags that lead back to those quotes.
This is how we keep ourselves grounded as humans, and it is the way we keep our agents grounded as well.

To reduce hallucinations, our prompt directs Margaret through a very specific process to first determine if a question is answerable (before trying to answer it)
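The steps above can be sketched in a few lines of Python. This is only an illustration of the "check answerability first" idea, not how Margaret is implemented (Margaret is a set of written instructions, not a program). The `Evidence` and `Answer` types, the topic codes, and the tag format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    tag: str          # hypothetical tag that leads back to the source quote
    quote: str        # direct participant quote
    topics: set[str]  # topics the quote was coded with during synthesis


@dataclass
class Answer:
    answerable: bool
    message: str
    citations: list[str] = field(default_factory=list)


def answer_question(needed_topics: set[str], repository: list[Evidence]) -> Answer:
    """Decide whether the data answers the question before answering it."""
    # Search only for evidence coded with the topics the question needs.
    found = [e for e in repository if e.topics & needed_topics]

    # Check coverage. Any needed topic without evidence means we say so
    # instead of filling the gap with a guess.
    covered = set().union(*(e.topics for e in found))
    missing = needed_topics - covered
    if missing:
        return Answer(False, f"Our research does not cover: {', '.join(sorted(missing))}")

    # Answer with direct quotes and the tags that lead back to them.
    lines = [f'"{e.quote}" [{e.tag}]' for e in found]
    return Answer(True, "\n".join(lines), [e.tag for e in found])
```

The design choice worth noticing is that the "no" path comes before the "yes" path: the function cannot produce an answer until it has shown that every part of the question is covered by evidence.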
What This Means for AI-Powered Research
In the sections below, I’ll talk in greater detail about the prompt engineering and the general design of my Margaret soul file. That section is for the nerds who really want to dive deep.
Our high-level take-away is this: we need to be precise about what should be machine work vs human work as we integrate this new technology.
Just because it is easy to build something doesn’t mean we should. As researchers, we’re trained to think hyper-critically about calibrating certainty: when to stand behind a finding, and when to dig for more evidence. We must maintain this mentality as we build the machines that streamline our processes.
Training models happens at the model vendor level (Anthropic, OpenAI, etc…), but most of us don’t control that. What we control is the data environment that our AI agents are running in and the roles we have them play. We are the ones who must design the frameworks that make them act in ways that produce reliable, truthful results.
Let’s Chat
If you’re trying to make research more accessible inside your organization without turning participants into fake AI users, I’d love to compare notes.
Kyle Becker
UX strategist, researcher, and designer.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Detailed Anatomy of Margaret
What follows is a very detailed anatomy of the soul file that I use for Margaret and the way she plugs in. I’m going to do my best to not be too technical and really stick to the overarching outline of what is in here and why.
Broadly How It Works
What Margaret is, specifically, is a Markdown file with a set of instructions for a bot to read. This file is orchestrated in a pretty simple way because the team I’m working with uses Claude Code and is sophisticated enough to interact directly with a GitHub repo. So all I needed to do was put a .cursor file and a ./CLAUDE.md file into the repo and point them to my Margaret.md file.
Our research repo is very well organized, and we have lots of how-to documentation, indexing, and tag systems throughout, which serve as signposts for LLMs looking for answers.
This setup keeps “Margaret” from trying to blast through the whole database every time someone asks a question, blowing through token budgets and increasing the likelihood that context windows get too full. Instead, Margaret is routed through the places where the research has already been prepared.
We use tags to preserve the path back to specific pieces of evidence. If we ever need to verify something’s authenticity, it’s easy to find it. Or, in our case, we can spin up an adversarial bot to verify claims that are being made.
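As a rough sketch of what such a verification pass could look like, the Python below pulls the tags out of a claim and checks each one against an index of known quotes. The tag format (`[P03-Q12]`) and the `quote_index` mapping are assumptions made for illustration; the real tag scheme is not shown in this article.

```python
import re

# Hypothetical tag format such as [P03-Q12]; the real tag scheme may differ.
TAG_PATTERN = re.compile(r"\[([A-Z]\d+-Q\d+)\]")


def audit_claim(claim: str, quote_index: dict[str, str]) -> dict[str, list[str]]:
    """Split the tags cited in a claim into ones that resolve and ones that do not."""
    cited = TAG_PATTERN.findall(claim)
    return {
        "verified": [t for t in cited if t in quote_index],
        "unresolved": [t for t in cited if t not in quote_index],
    }


def is_grounded(claim: str, quote_index: dict[str, str]) -> bool:
    """A claim is grounded only if it cites at least one tag and every tag resolves."""
    result = audit_claim(claim, quote_index)
    return bool(result["verified"]) and not result["unresolved"]
```

A claim with no tags at all fails the check, which matches the spirit of the process: an uncited answer is treated the same as an unverifiable one.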
This is the “library” that Margaret is working in. Let’s walk through the Margaret.md file itself.
Repo Overview
The top of the file gives an overview of what this repo is, who the company is, and what the project is; in this case, a design research project.

The top of our soul file orients the model with the context of the repo.
Who You Are
Next is a “who you are” section. This section first makes a split:
- if you are running an automated process, do one thing…, and
- if you are a human, do another thing…
This is because I know this repo is sometimes going to be used by humans looking for answers, and sometimes by other agents working within the company. I want the responses to have the same amount of rigor either way. However, I don’t necessarily need the character, Margaret, to show through if this agent is interacting with other agents.

Our organization has both humans and agents who access this repo for information, so we make the split near the top of the file.
Who Margaret Is
Next, I tell the bot “they” are “Margaret.” I stress that Margaret is named after an anthropologist, Margaret Mead, and that Margaret’s role is to be a faithful custodian of the qualitative and quantitative data. It should remind the user that this repo contains personally identifiable information.
We tell the bot to always play the role of Margaret, who is a careful researcher and steward of the data, and we explicitly tell the bot not to act like a persona even if the user asks for it.

Our character information for Margaret, including her standard introduction (we also repeatedly remind her not to make stuff up).
Routing Through the Research Database
From here, we work on routing.
Our research database is very well organized. We don’t want any AI bots, whether they are Margaret who faces the rest of the org or our own research agents that we use for synthesis and data processing, to ever roam freely through the whole database, blasting out our token budgets and most likely hallucinating as their context windows get too full.
Instead, we keep things in defined places, and we create that structure as we process data. This is another post in itself, but at a high level:
- Participants: We have a folder full of YAML files, one for each participant, that has the paths to other information for those participants.
- Frameworks have a folder.
- Synthesis Write Ups have a folder. These contain pre-synthesized information about certain product features or interaction behaviors that we know are important for understanding our product.

The file has several specific sub-prompts to help the AI find the data needed for the question that has been posed.
In our Margaret soul file, we give the bot the descriptions, routes, and purposes for these (and several other) types of information.
This is key. If a user asks a question about a particular participant, the bot will know to read the YAML for that particular participant. If a user asks a question about a feature, the bot will know to look in the synthesis folder to see if we’ve already done a write-up of that feature.
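The routing idea can be sketched in Python as a small lookup from question type to folder. In practice this routing lives in written instructions that the model reads, so the code is only an analogy. The folder names, file extensions for write-ups, and the `route` function are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical folder names; the real repo layout is not shown in this article.
ROUTES = {
    "participant": Path("participants"),
    "feature": Path("synthesis"),
    "framework": Path("frameworks"),
}


def route(kind: str, name: str, repo_root: Path) -> Optional[Path]:
    """Return the one file worth reading for this question, or None if there is none."""
    folder = ROUTES.get(kind)
    if folder is None:
        return None  # unknown question type: do not roam the whole repo

    # Participants are described as YAML files; ".md" for the rest is an assumption.
    suffix = ".yaml" if kind == "participant" else ".md"
    candidate = repo_root / folder / f"{name}{suffix}"
    return candidate if candidate.is_file() else None
```

Returning `None` rather than falling back to a broad search is the point: when no prepared file exists, the right response is to say so, not to go fishing.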

People in product organizations have a bias toward their own ideas. We explicitly tell Margaret that her role is not to validate ideas, but to substantiate them with data. A specific challenge is availability bias: the more data we have about something, the more it ‘seems’ like a good idea. We specifically point Margaret toward our folder of codified feature prioritization results to answer these types of questions.
In this way, we’re encoding the synthesis process that we’ve already done as researchers directly into the structure of the database that the bot is going to be looking through. This is one of the ways that we bias toward truth within our synthetic oracle experience.
Share-Out Artifacts
We also have a section for share-out artifacts. These are web pages hosted in our repo that Margaret can automatically spin up for users. They show journey maps, persona information, participant information, presentation decks, or other types of visual artifacts.

Margaret is able to launch pre-made synthesized website overviews for people.
This gives us as researchers an ability to create well-structured artifacts that can be served up on demand to anybody in the organization who wants to see them, simply by asking Margaret.
A High-Level Overview
This is a very high-level overview of how we’ve connected a synthetic oracle, or information-sharing bot, to a well-structured design research data repository.
If people are curious, I can also give an overview of how this repository was structured, as well as what the data input process is like: what humans are doing, what processes are doing to keep this data organized, and how we map quantitative and qualitative data together for the specific users that we interview.
I’m also very happy to hop on a call and talk in more detail about this process if you’re interested!