Impedance Matching LLMs and Linked Data

Published on 1970-01-01 by @danja

If you see this message, this post is only half-done. I know what's needed, so it should be done in a couple of hours. I've published prematurely to check image linking.

Unifying Large Language Models and Knowledge Graphs: A Roadmap

An Observation

Large Language Models are sizeable knowledge bases which, at least in part, encapsulate sentence-oriented data structures derived from human language. The Web is a massive knowledge base which, at a structural level, has embedded sentence-like data (clearly apparent when viewed from a Linked Data perspective). There isn't an obvious direct mapping between these systems, but they both feature shapes that look very similar from 1,000 ft. However you look at it, the future potential of a combined system is...TBD. We are in a position to take (long-legged) baby steps in that direction.

A Problem

For the purposes here, let's have a loose, back-of-the-envelope working definition of 'knowledge':

A collection of structured data that represents information, together with a means of navigating that information.
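To make that working definition a bit more concrete, here's a minimal sketch in Python. The "structured data" is a handful of subject-predicate-object triples and the "means of navigating" is a breadth-first walk along outgoing links, up to a hop limit. The `ex:` names and the `navigate()` and `neighbours()` helpers are hypothetical, for illustration only; real Linked Data would use full URIs and published vocabularies.

```python
from collections import deque

# A toy knowledge graph: each fact is a (subject, predicate, object) triple.
# The ex: resource names are hypothetical placeholders, not a real vocabulary.
TRIPLES = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:knows", "ex:Carol"),
    ("ex:Carol", "ex:worksAt", "ex:Acme"),
    ("ex:Acme", "ex:locatedIn", "ex:Bristol"),
}


def neighbours(node, triples):
    """Return sorted (predicate, object) pairs for triples whose subject is node."""
    return sorted((p, o) for s, p, o in triples if s == node)


def navigate(start, triples, max_hops):
    """Breadth-first walk along outgoing edges, returning {node: hop_count}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for _, obj in neighbours(node, triples):
            if obj not in seen:
                seen[obj] = seen[node] + 1
                queue.append(obj)
    return seen


if __name__ == "__main__":
    print(navigate("ex:Alice", TRIPLES, max_hops=2))
```

With a limit of two hops, the walk from `ex:Alice` reaches `ex:Bob` and `ex:Carol` but not `ex:Acme` or `ex:Bristol`: the data is all there, but what you can get at depends on how you navigate it.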

Navigation isn't usually something highlighted in these parts. But applied in a very broad sense, I reckon it is useful, as I hope to show here. Let's leave notions of agency to one side to avoid the bigger tarpit around intelligence, biological or artificial.

A Particular Characterization of LLMs

Deep Learning systems embody knowledge somehow derived from their training data. Forget their internals for now and consider them as black boxes with external interfaces and communication protocols.

A Particular Characterization of the Web

I believe the 'Semantic Web' hit the Peak (of Inflated Expectations) on the Gartner Hype Cycle around 2001, the time of a certain Scientific American article (PDF). But a lot has happened since then. Masses of work have been done by people working directly in the field. There's been significant deployment by people from every imaginable field using the associated technologies for practical applications. Most web developers will have seen something related in their peripheral vision, and quite possibly used such things in their day job without realising it. But for various historical reasons, the big picture isn't that widely known.

First, a wormhole-speed trip from the

LLM

The cat sat on the mat. A lot of mats are blue.
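Those two sentences can be rendered, roughly, as triples. This is a hypothetical encoding: the `ex:` names and the `match()` helper are made up for illustration, and only `rdf:type` is a real RDF term. The first sentence maps neatly onto subject-predicate-object; the second is a generalization about mats rather than a fact about one mat, so here it's attached to the class with an invented predicate.

```python
# Hypothetical encoding of "The cat sat on the mat. A lot of mats are blue."
# The ex: names are made up for illustration; a real system would use
# published vocabularies and proper URIs.
TRIPLES = [
    ("ex:cat1", "rdf:type", "ex:Cat"),
    ("ex:mat1", "rdf:type", "ex:Mat"),
    ("ex:cat1", "ex:satOn", "ex:mat1"),
    # "A lot of mats are blue" is a generalization, not a fact about one mat,
    # so here it is attached to the class with a made-up predicate.
    ("ex:Mat", "ex:commonColour", "ex:Blue"),
]


def match(pattern, triples):
    """Return triples matching a (s, p, o) pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


if __name__ == "__main__":
    # "What did the cat sit on?"
    for _, _, obj in match(("ex:cat1", "ex:satOn", None), TRIPLES):
        print(obj)
```

The wildcard pattern is a very cut-down version of the triple patterns a SPARQL query is built from.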

Q & A

That there's some common topology between these systems shouldn't be a surprise. Both are representations of knowledge with humans as the immediate source.

But there are low-level hacks that might offer approaches good enough for many practical applications.

A Potential Path to a Partial Solution

Relevance: similarity overlays on the Web
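One possible reading of a similarity overlay, sketched in Python under some loudly stated assumptions: each web resource gets an embedding vector (the URLs and numbers below are invented, standing in for the output of some embedding model), and pairs whose cosine similarity passes a threshold get an extra, machine-generated link. The `ex:similarTo` predicate, the threshold value and the helper names are all hypothetical.

```python
import math

# Hypothetical embedding vectors for a few web resources. In practice these
# would come from some embedding model; the URLs and numbers are made up.
EMBEDDINGS = {
    "https://example.org/cats": [0.9, 0.1, 0.0],
    "https://example.org/mats": [0.7, 0.3, 0.1],
    "https://example.org/tax-law": [0.0, 0.2, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_overlay(embeddings, threshold):
    """Return (url_a, 'ex:similarTo', url_b, score) links at or above threshold.

    The links sit alongside ordinary hyperlinks rather than replacing them.
    """
    urls = sorted(embeddings)
    links = []
    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            score = cosine(embeddings[a], embeddings[b])
            if score >= threshold:
                links.append((a, "ex:similarTo", b, round(score, 3)))
    return links


if __name__ == "__main__":
    for link in similarity_overlay(EMBEDDINGS, threshold=0.8):
        print(link)
```

With these toy vectors and a threshold of 0.8, only the cats and mats pages get linked. Keeping the overlay as separate triples, rather than editing the pages themselves, means it can be thrown away or recomputed when the model changes.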