Generative AI for Genealogy – Part II

Harvesting Phrases (yes, literally)

The clever bit in all this isn’t the LLM at all – it’s what happens after the LLM makes a decision. Whenever the system encounters a new phrase that seems to mean “switch GEDCOM,” it doesn’t just shrug and move on. It harvests it.

And I mean that quite literally.

At startup, the app sends any newly discovered phrase back to the server, where it can be added to the global list. Over time, as more users ask for file switches in their own wonderfully unpredictable ways, the system becomes smarter. Eventually, it will recognise so many alternative phrasings that we won’t need the LLM for this task at all.
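
In client-side terms, that harvesting step boils down to something like the sketch below. The server URL, the /phrases endpoint and the payload shape are my own placeholders, not the app’s real API:

  # Hypothetical sketch of the start-up harvesting step.
  # Endpoint name and payload are assumptions, not the app's real API.
  import requests

  def report_new_phrases(server_url: str, new_phrases: list[str]) -> None:
      """Send any locally discovered switch phrases to the server's global list."""
      for phrase in new_phrases:
          requests.post(f"{server_url}/phrases", json={"phrase": phrase}, timeout=5)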

A self‑improving phrase collector. A tiny evolutionary ecosystem. Darwin would be proud.

Default Switch Phrases

We start with a seed list – the obvious, sensible, predictable ways a human might ask to load a different GEDCOM:

  • change file
  • change gedcom
  • different file
  • load new gedcom
  • load different file
  • load gedcom
  • load another file
  • new gedcom
  • open different file
  • switch to different file
  • switch file
  • use another file

These are the baseline. The “starter pack.” The linguistic equivalent of a tomato plant you buy from the garden centre before you start growing your own.
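
As code, the starter pack is nothing more exotic than a lookup set with an exact-match check in front of it (the names below are mine, not the app’s):

  # Seed list of switch phrases - the "starter pack".
  SWITCH_PHRASES = {
      "change file", "change gedcom", "different file",
      "load new gedcom", "load different file", "load gedcom",
      "load another file", "new gedcom", "open different file",
      "switch to different file", "switch file", "use another file",
  }

  def is_known_switch_phrase(text: str) -> bool:
      """Exact match against the (growing) global phrase list."""
      return text.strip().lower() in SWITCH_PHRASES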

Removing the Polite Fluff

Before matching anything, we strip out the polite filler words – the verbal bubble‑wrap humans add when talking to machines. I’m always polite to LLMs myself, because if you’re rude to them long enough, you’ll eventually forget how to behave around actual humans.

But for intent detection, politeness is just noise.

So we remove:

  • please
  • could you
  • would you
  • can you
  • i would like to
  • i want to
  • let’s
  • let us
  • i’d like to
  • i need to
  • “ the ” (special case – quoted to avoid accidental partial matches)

This means:

“please could you change the file” → “change file”

Much cleaner. Much easier to match. Much less likely to confuse a tiny 1B‑parameter model that’s doing its best.
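
If you wanted to reproduce that step yourself, a rough sketch looks like the following. The filler list is the one above; the function name is my own, not the app’s:

  # Rough sketch of the politeness-stripping step.
  FILLERS = [
      "please", "could you", "would you", "can you",
      "i would like to", "i want to", "let's", "let us",
      "i'd like to", "i need to", "the",
  ]

  def strip_fillers(text: str) -> str:
      """Remove polite filler phrases before intent matching."""
      cleaned = " " + text.lower() + " "       # pad so "the" only matches as a whole word
      for filler in FILLERS:
          cleaned = cleaned.replace(f" {filler} ", " ")
      return " ".join(cleaned.split())         # collapse leftover whitespace

  strip_fillers("please could you change the file")   # -> "change file"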

Collapsing Variations

Next, we reduce phrase variation even further by collapsing certain patterns:

  • “ a new ” → “ new ”
  • “ a different ” → “ different ”
  • “new file” → “different file”

This normalisation step dramatically reduces the number of unique phrases we need to handle. It’s like tidying a messy room: suddenly everything is easier to find.
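
The same trick works for the collapsing rules – a short list of substring replacements applied after the fillers have gone (again, the names are my own):

  # Sketch of the collapsing step, applied after strip_fillers().
  COLLAPSE_RULES = [
      (" a new ", " new "),
      (" a different ", " different "),
      ("new file", "different file"),
  ]

  def collapse_variations(text: str) -> str:
      """Fold common variants into a single canonical form."""
      padded = " " + text + " "
      for old, new in COLLAPSE_RULES:
          padded = padded.replace(old, new)
      return " ".join(padded.split())

  collapse_variations(strip_fillers("please load a new file"))   # -> "load different file"

Order matters here: “ a new ” has to become “ new ” before “new file” can become “different file”, at which point the result is already sitting in the seed list.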

The Llama Prompt (the tiny but mighty classifier)

I don’t plan to share all my prompts – some things must remain mysterious, like the Colonel’s secret herbs and spices – but this one is small enough to show without giving away the farm.

This is the prompt for Llama 3.2 1B Instruct, the tiny model that decides whether the user is asking to load a new file:

You must reply with whether the user is asking to load a new file (gedcom) or not.
If the user appears to want to load a new file, reply "[YES]", else reply "[NO]".
Do not attempt to answer the actual question.

Example - "new gedcom" reply "[YES]"
Example - "new file" reply "[YES]"
Example - "load" reply "[YES]"
Example - "different file" reply "[YES]"
Example - "who is the pope?" reply "[NO]"
Example - "where was bart born?" reply "[NO]"

And remarkably, it works. Not perfectly – nothing in LLM‑land is perfect – but reliably enough for a prototype.
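
For completeness, the glue around that prompt is only a few lines: prepend the instructions, send the user’s text, and look for the bracketed verdict. The call_llm function below is a stand-in for however you reach the model (llama.cpp, an HTTP endpoint, whatever you prefer), not a real library call; only the parsing is the point:

  # Sketch of wrapping the classifier prompt; few-shot examples omitted for brevity.
  CLASSIFIER_PROMPT = (
      "You must reply with whether the user is asking to load a new file (gedcom) or not.\n"
      'If the user appears to want to load a new file, reply "[YES]", else reply "[NO]".\n'
      "Do not attempt to answer the actual question.\n"
  )

  def wants_file_switch(user_text: str, call_llm) -> bool:
      """Ask the tiny classifier; call_llm is a placeholder for your model interface."""
      reply = call_llm(system=CLASSIFIER_PROMPT, user=user_text)
      return "[YES]" in reply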
