How WikiPath works
What actually runs when you press the button — the search, the grounding, and the number, written out so you can check our work.
The question we answer
WikiPath answers one stubborn question: how is anything connected to anything? Type two things — a pharaoh and a pop star, your hometown and a deep-sea fish — and we chart the shortest chain of Wikipedia links between them, tell you the grounded story of how they connect, and stamp it with a Surprise Score for how unlikely that closeness really is.
It is a surprise engine, not a search engine. A search engine hands you the thing you asked for. We hand you the route — the genuinely shortest path from one article to another across the whole of English Wikipedia, and an honest measure of whether that nearness should astonish you or bore you. This page is the field manual: every method below is what actually runs when you press the button.
How we find the route
Wikipedia is a graph: roughly 7.19 million articles, each a node, every blue link an edge. We hold that whole graph in memory as a compact adjacency structure, which, together with the search described next, is why a path that sounds impossible resolves in under ten milliseconds.
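The page doesn't say which compact layout holds the graph. One common choice for a large, read-only link graph is compressed sparse row (CSR): every article's out-links sit end to end in one flat array, and a second array of offsets marks where each article's slice begins. The sketch below assumes that layout and assumes articles have already been mapped to integer ids 0..n-1; `build_csr` and `neighbors` are hypothetical names, not WikiPath's code.

```python
from array import array


def build_csr(num_nodes: int, edges: list[tuple[int, int]]) -> tuple[array, array]:
    """Pack directed links (src, dst) into two flat arrays: offsets and targets.

    Assumes article ids are dense integers 0..num_nodes-1 (a hypothetical
    mapping from titles to ids built elsewhere).
    """
    # Count each article's out-links.
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1

    # offsets[i]..offsets[i + 1] bounds article i's slice of `targets`.
    offsets = array("q", [0] * (num_nodes + 1))
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + counts[i]

    # Drop each link into the next free slot of its source's slice.
    targets = array("q", [0] * len(edges))
    cursor = list(offsets[:num_nodes])
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets: array, targets: array, node: int) -> array:
    """Return the out-links of `node` as a slice of the flat targets array."""
    return targets[offsets[node]:offsets[node + 1]]
```

For example, `build_csr(4, [(0, 1), (0, 2), (1, 2), (2, 3)])` produces offsets 0, 2, 3, 4, 4, and the neighbors of article 0 are 1 and 2. Two flat integer arrays avoid a separate object per article, which is one way to keep millions of nodes and their links in memory.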
To find the shortest chain we run a bidirectional breadth-first search: one wave fans out from your first thing, another from your second, and we declare victory the instant the two waves touch. Each wave only has to cover about half the distance, and because the number of articles within reach multiplies with every click, two half-depth searches explore a tiny fraction of what a one-directional crawl would. That is the trick that keeps it fast.
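A minimal sketch of bidirectional breadth-first search, using plain Python dicts rather than the compact structure. Because Wikipedia links are one-way, it assumes the wave from the second article walks incoming links (backlinks) while the wave from the first walks outgoing ones. The function names and the rule of always growing the smaller wave are illustrative choices, and the prose-first and hub-avoiding tie-breaks are left out.

```python
def reverse_links(out_links):
    """Build the backlink map (article -> articles that link to it)."""
    in_links = {}
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links.setdefault(dst, []).append(src)
    return in_links


def _expand(frontier, links, parent, other_parent):
    """Advance one wave by a layer; return (next_frontier, meeting_node)."""
    next_frontier = []
    for node in frontier:
        for nbr in links.get(node, ()):
            if nbr in parent:
                continue  # this wave already reached nbr
            parent[nbr] = node
            if nbr in other_parent:
                return next_frontier, nbr  # the two waves touch
            next_frontier.append(nbr)
    return next_frontier, None


def _join(meet, fwd_parent, bwd_parent):
    """Stitch source->meet and meet->target into one chain of titles."""
    path, node = [], meet
    while node is not None:
        path.append(node)
        node = fwd_parent[node]
    path.reverse()
    node = bwd_parent[meet]
    while node is not None:
        path.append(node)
        node = bwd_parent[node]
    return path


def bidirectional_bfs(out_links, in_links, source, target):
    """Return one shortest chain of links from source to target, or None."""
    if source == target:
        return [source]
    # Parent maps double as visited sets; None marks each wave's origin.
    fwd_parent, bwd_parent = {source: None}, {target: None}
    fwd_frontier, bwd_frontier = [source], [target]
    while fwd_frontier and bwd_frontier:
        # Grow whichever wave is currently smaller by one layer.
        if len(fwd_frontier) <= len(bwd_frontier):
            fwd_frontier, meet = _expand(fwd_frontier, out_links, fwd_parent, bwd_parent)
        else:
            bwd_frontier, meet = _expand(bwd_frontier, in_links, bwd_parent, fwd_parent)
        if meet is not None:
            return _join(meet, fwd_parent, bwd_parent)
    return None  # the waves never met: no chain exists
```

Stopping at the first touch still yields a shortest chain here: each wave grows layer by layer and every newly reached article is checked against the other wave, so by the time they first meet, no shorter chain can exist.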
There is usually more than one shortest route of the same length, and here we make a deliberate choice. We route through prose first: we prefer chains where every hop is a real sentence in the source article — an actual described relationship — over chains that merely share a navigation box or a category. And we steer away from the giant hub articles. A lazy path can launder almost any two topics through United States or World War II; those hops are technically real and tell you nothing. So we penalise routes that lean on mega-hubs and favour the ones through the most telling, least obvious stops. The shortest path that means something, not the shortest path that cheats.
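The page doesn't give the exact tie-break, so this is a hypothetical sketch of one way to rank equal-length routes: count the hops backed by a real sentence first, then prefer the route whose intermediate stops carry the fewest links, using log-scaled link count as the hub cost. `prose_hops` (the set of hops with a supporting sentence) and `degree` (link counts per article) are assumed inputs.

```python
import math


def route_key(path, prose_hops, degree):
    """Sort key for one route: bigger is better (hypothetical weighting)."""
    # First priority: hops backed by a real sentence in the source article.
    prose = sum(1 for hop in zip(path, path[1:]) if hop in prose_hops)
    # Second priority: low hub cost, the log-scaled link count of every
    # intermediate stop (endpoints are the user's choice, so they are skipped).
    hub_cost = sum(math.log1p(degree.get(node, 0)) for node in path[1:-1])
    return (prose, -hub_cost)


def pick_route(paths, prose_hops, degree):
    """Among the shortest candidate paths, return the best-ranked one."""
    shortest = min(len(p) for p in paths)
    candidates = [p for p in paths if len(p) == shortest]
    return max(candidates, key=lambda p: route_key(p, prose_hops, degree))
```

Ranking prose before hub cost mirrors the order the paragraph gives; a weighted sum of the two would be an equally plausible reading.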
What “grounded” means
Every connection we show is anchored to evidence, and the rule we hold ourselves to is truth, not source. For most hops we pull the actual sentence from the source article that creates the link: the real prose, not a templated “X links to Y.” A storyteller then narrates the chain and is free to add genuinely known, true context to make it land, but it can never invent a connection that isn't in the evidence, and never state a date, number, or fact it isn't sure of.
When a hop is joined only by a shared list or category with no described relationship, we say so plainly rather than fake a bond. And if the narrator ever wobbles, a deterministic grounded fallback takes over and reports the chain straight from the evidence. You should never read a sentence on WikiPath that the underlying Wikipedia text doesn't support. That is the whole promise.
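A hypothetical sketch of the deterministic fallback: it reports each hop straight from the evidence, quoting the link sentence when there is one and saying plainly when a hop rests only on a shared list or category. The `evidence` mapping and the `narrate` callable are assumptions, and since the page doesn't define what counts as a wobble, this sketch only treats a raised error or an empty story as one.

```python
def grounded_fallback(path, evidence):
    """Report each hop straight from the evidence, one line per hop."""
    lines = []
    for src, dst in zip(path, path[1:]):
        sentence = evidence.get((src, dst))
        if sentence:
            # Quote the real sentence from the source article verbatim.
            lines.append(f'{src} -> {dst}: "{sentence}"')
        else:
            # No described relationship: say so instead of inventing one.
            lines.append(
                f"{src} -> {dst}: linked only through a shared list or category; "
                "no relationship is described."
            )
    return lines


def tell_story(path, evidence, narrate):
    """Use the narrator's story if it succeeds, else the grounded fallback."""
    try:
        story = narrate(path, evidence)
    except Exception:
        story = None  # treat a crash as a wobble
    if not story:
        story = "\n".join(grounded_fallback(path, evidence))
    return story
```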
How the Surprise Score is computed
Not all connections are surprising. Two footballers one click apart is obvious; a pharaoh and a pop star four clicks apart is not. The Surprise Score (0–100) is our attempt to put a number on that gut feeling, and it weighs three things:
- Semantic distance: how far apart the two things feel. We compare the meaning of the two articles using sentence embeddings (the mxbai-embed-large-v1 model); topics from distant worlds score high.
- Path length: how few clicks separate them. Distant things being far apart is expected, so a longer chain gently lowers the score; a short chain between unlike things sends it up.
- Obscurity: whether the route earned it. A path through rare, specific articles scores higher than one that coasts through mega-hubs, measured relative to the graph's own link structure, so “a hub” means a hub by Wikipedia's standards, not a hard-coded list.
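Two of these ingredients can be sketched under stated assumptions. Semantic distance is shown as cosine distance between the two articles' embedding vectors; computing the vectors with mxbai-embed-large-v1 is outside the sketch, and the choice of cosine is our assumption. Obscurity is a hypothetical reading of "relative to the graph's own link structure": each intermediate stop is ranked by where its link count falls among all articles' link counts, so a stop busier than nearly every other article scores close to zero.

```python
import bisect
import math


def cosine_distance(u, v):
    """1 - cosine similarity: 0 for vectors pointing the same way, up to 2.

    Assumes neither vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def obscurity(path, degree, sorted_degrees):
    """Hypothetical obscurity in [0, 1]: high when the stops are rarely linked.

    `sorted_degrees` is every article's link count, sorted ascending, so each
    stop is judged against the graph itself rather than a fixed hub list.
    """
    stops = path[1:-1]
    if not stops:
        return 1.0  # assumption: a direct link passes through no hub at all
    # Fraction of articles with fewer links than each stop (its percentile rank).
    ranks = [bisect.bisect_left(sorted_degrees, degree[s]) / len(sorted_degrees) for s in stops]
    return 1.0 - sum(ranks) / len(ranks)
```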
We fold those together — roughly semantic distance, divided by path length, times obscurity — and pass the result through a curve that lands it on a clean 0–100. Then we hand it a plain-English band so the number means something at a glance:
- Are you kidding me? 90–100
- Wildly unexpected 78–89
- Genuinely surprising 62–77
- Didn't see that coming 45–61
- A bit of a stretch 28–44
- Pretty much expected 0–27
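Putting it together: the sketch below reads the rough formula literally (semantic distance divided by path length, times obscurity) and uses the band cutoffs listed above. The curve isn't specified, so the saturating `1 - exp(-steepness * raw)` shape and `steepness=3.0` are placeholder assumptions, not WikiPath's tuning.

```python
import math

# Band floors, copied from the list above (highest first).
BANDS = [
    (90, "Are you kidding me?"),
    (78, "Wildly unexpected"),
    (62, "Genuinely surprising"),
    (45, "Didn't see that coming"),
    (28, "A bit of a stretch"),
    (0, "Pretty much expected"),
]


def surprise_score(semantic_distance, clicks, obscurity, steepness=3.0):
    """Fold the three signals into a whole-number score from 0 to 100."""
    if clicks < 1:
        raise ValueError("a path needs at least one click")
    raw = semantic_distance / clicks * obscurity
    # Placeholder curve (assumed): 0 maps to 0, large raw values approach 100.
    return round(100 * (1 - math.exp(-steepness * raw)))


def band(score):
    """Return the plain-English label for a 0-100 score."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    raise ValueError("score must be between 0 and 100")
```

With these placeholder settings, a semantic distance of 0.9 across one click with obscurity 1.0 scores 93 ("Are you kidding me?"), while the same distance across four clicks scores 49 ("Didn't see that coming").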
Where it all comes from
WikiPath is built on Wikipedia, and we owe it everything. Article text is available under the Creative Commons Attribution-ShareAlike 4.0 licence (CC BY-SA 4.0); the link graph, the prose we ground our stories in, and the meanings we measure all trace back to the encyclopedia and its editors.
We serve from a precomputed offline corpus — a frozen snapshot of the graph, the link sentences, and the embeddings. Nothing in your search hits live Wikipedia at the moment you press the button, which is what lets a cross-continental path resolve instantly. When Wikipedia changes, we rebuild the snapshot. The map is ours; the territory will always be theirs.
- 01 How does WikiPath find the shortest path between two Wikipedia articles?
- WikiPath runs a bidirectional breadth-first search over the full English Wikipedia link graph, about 7.19 million articles held in memory. Two search waves, one from each endpoint, fan out and meet in the middle, returning the genuinely shortest chain of links in under ten milliseconds.
- 02 Why does WikiPath avoid routing through hub articles?
- Giant hubs like a major country or a world war link to almost everything, so a path through them is technically valid but tells you nothing. WikiPath penalises routes that lean on these mega-hubs and favours chains through specific, telling articles — the shortest path that actually means something.
- 03 What does it mean for a WikiPath connection to be grounded?
- Grounded means every hop is anchored to the real sentence in the source article that creates the link, not a templated phrase. The story can add true context but never invents a connection. If the narrator wobbles, a deterministic fallback reports the chain straight from the evidence. The rule is truth, not source.
- 04 What is the Surprise Score and what does it measure?
- The Surprise Score is a 0 to 100 rating of how unlikely it is that two things sit so few clicks apart. It weighs semantic distance (how far apart they feel), path length (how few clicks separate them), and obscurity (whether the route avoided lazy hubs), then maps the result onto a clean 0 to 100 score with a plain-English band.
- 05 How is the Surprise Score calculated?
- Roughly: semantic distance divided by path length, multiplied by route obscurity, then passed through a curve onto 0 to 100. Semantic distance is the meaning-gap between the two articles measured with mxbai-embed-large-v1 sentence embeddings; obscurity is judged relative to Wikipedia's own link structure, not a fixed list.
- 06 Where does WikiPath data come from?
- WikiPath is built on Wikipedia. Article text is available under the Creative Commons Attribution-ShareAlike 4.0 licence (CC BY-SA 4.0). We serve from a precomputed offline snapshot of the link graph, link sentences, and embeddings, so searches never hit live Wikipedia and resolve instantly.