Generative AI for Genealogy – Part IX

Redaction

So how do we prevent leakage?

Let’s take:

“Where was Jesus born?” and turn it into: “Where was Zebedee born?”

Who’s Zebedee? A placeholder. A decoy. A sacrificial genealogical goat.

Here’s the flow:

  • User asks: “Where was Zebedee born?”
  • LLM replies: “get-data:Zebedee”
  • App returns the data, but with Zebedee substituted
  • LLM answers: “Zebedee was born in Jerusalem”
  • App swaps Zebedee back to Jesus
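
In code, that swap is just a little lookup table the app keeps to itself. Here's a minimal sketch in Python, assuming a simple one-to-one mapping — the names redact, restore and the pseudonym table are mine, not any real library:

# Hypothetical pseudonym table, held by the app and never sent to the cloud.
PSEUDONYMS = {"Jesus": "Zebedee"}                       # outbound: real -> decoy
REAL_NAMES = {v: k for k, v in PSEUDONYMS.items()}      # inbound: decoy -> real

def redact(text: str) -> str:
    """Outbound: replace real names with decoys before the text leaves the app."""
    for real, decoy in PSEUDONYMS.items():
        text = text.replace(real, decoy)
    return text

def restore(text: str) -> str:
    """Inbound: swap the decoys back before showing the user."""
    for decoy, real in REAL_NAMES.items():
        text = text.replace(decoy, real)
    return text

question = redact("Where was Jesus born?")          # -> "Where was Zebedee born?"
answer = restore("Zebedee was born in Jerusalem")   # -> "Jesus was born in Jerusalem"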

This is essentially a CASB — a Cloud Access Security Broker. I didn’t invent it.

Sadly.

Outbound: Jesus → Zebedee

Inbound: Zebedee → Jesus

If your real name is Zebedee, I apologise. At least you’re not one of the three children named “Abcde”. (See my basketball post.)

Do we even need Zebedee? Not sure yet. It depends on how the LLM behaves.

If you ask “How old is $?”, I doubt the LLM will come back with get-age:$.

But if it does, we’ll return { data } with the name $ left in place.

Before presenting the answer to the user, we swap the real name back in.

So how do we implement this?

Glad you asked.

Using Llama, bizarrely enough.

Prompt:

Extract entities from this sentence: "Dave Smith lives in Oxford"

Llama replies:

The entities mentioned in the sentence are:
1. Dave Smith (person)
2. Oxford (location)

Output:

  • Dave Smith (person)
  • Oxford (location)

Perfect. Now we know what to redact.
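
Here’s roughly what that call looks like if you run Llama locally behind Ollama’s HTTP API. A sketch only: the model name, the endpoint and the crude line-parsing are my assumptions, not gospel.

import requests

def extract_entities(sentence: str) -> list[str]:
    """Ask a local Llama (served by Ollama on its default port) to list entities."""
    prompt = f'Extract entities from this sentence: "{sentence}"'
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=60,
    )
    reply = resp.json()["response"]
    # Naive parse: keep numbered lines like "1. Dave Smith (person)"
    entities = []
    for line in reply.splitlines():
        line = line.strip()
        if line[:1].isdigit() and "." in line:
            entities.append(line.split(".", 1)[1].split("(")[0].strip())
    return entities

print(extract_entities("Dave Smith lives in Oxford"))   # hopefully ['Dave Smith', 'Oxford']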

Or we can go further:

Extract entities from this sentence: "Dave Smith lives in Oxford". Rewrite the sentence replacing the entities with "*"

Output:

* lives in *
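
If you want the model to do the rewriting as well as the spotting, the same local call works with the longer prompt. Again, just a sketch against an assumed Ollama setup:

import requests

def redact_with_llama(sentence: str) -> str:
    """One-shot: have the local model both find the entities and blank them out."""
    prompt = (
        f'Extract entities from this sentence: "{sentence}". '
        'Rewrite the sentence replacing the entities with "*"'
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=60,
    )
    return resp.json()["response"]    # e.g. "* lives in *"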

So the plan is:

  1. Extract entities
  2. Replace them with placeholders
  3. Send the redacted version to the cloud LLM
  4. Reverse the substitution on the way back
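
Wired together, under the same assumptions as the sketches above — extract_entities is the local Llama call from earlier, and cloud_llm stands in for whatever hosted model ends up answering the real question:

def redact_roundtrip(question: str, cloud_llm) -> str:
    """Send a question to the cloud with entities replaced by placeholders,
    then restore the real names in the answer."""
    # 1. Extract entities locally.
    entities = extract_entities(question)

    # 2. Replace them with stable placeholders and remember the mapping.
    mapping = {}
    redacted = question
    for i, entity in enumerate(entities):
        placeholder = f"ENTITY_{i}"
        mapping[placeholder] = entity
        redacted = redacted.replace(entity, placeholder)

    # 3. Only the redacted text ever leaves the machine.
    answer = cloud_llm(redacted)

    # 4. Reverse the substitution on the way back.
    for placeholder, entity in mapping.items():
        answer = answer.replace(placeholder, entity)
    return answer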

More on this when I build the AI CASB component. If you want to get ahead, read up on CASBs or find someone who’s already written one and “borrow” their code.

Next up: Code vs. Tools in Generative AI for Genealogy – Part X
