llama_index SPARQL Notes 01

Published on 2023-08-28 by @danja

FOR TEMP CHANGES, UNMODIFIED FILES ARE IN ../original

Today's task : as yesterday.

It's occurred to me that it would be helpful to look at the data that nebulagraph.py actually looks at - pop a few logging calls in there.

(Maybe also probe - https://github.com/vesoft-inc/nebula-python )

Before that, need to have python looking at my llama_index tree. PYTHON_PATH is empty so -

python

>>>import sys
>>> print(sys.path)
>>> ['', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/home/danny/.local/lib/python3.11/site-packages', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.11/dist-packages']

Ok, found it at :

~/.local/lib/python3.11/site-packages/llama_index

Hmm, which of Wey Gu's demos is the most minimal that uses NebulaGraph?

Custom instructions

Act as a expert in Python and graph data structures, in particular the RDF model and SPARQL. Be prepared for questions relating to Large Language Models, Retrieval-Augmented Generation and the llama_index libraries. Follow best practices everywhere, eg. including short comments as necessary in code. Outside of code, keep all responses and descriptions very concise and matter-of-fact. Whenever you think it's possible I have further information or code files to upload that will help in the tasks, please ask.
My name is Danny.

~~Act as a expert in Python and graph data structures as implemented in NebulaGraph and RDF and SPARQL. ~~

Existing code in llama_index includes communications to a NebulaGraph database via the nebula-python library. The goal for us today is to develop code in a file sparql.py having the same functionality as nebulagraph.py, but instead of NebulaGraph will connect to a SPARQL store via the sparqlwrapper library from RDFLib. There are several subtasks to do first, which I will describe after uploading some relevant code.

Here a zip of llama_index/llama_index/graph_stores

Before writing Python, we will look at the data model. The following are the nGQL statements used to prepare the database for for nebulagraph.py. How would this an equilent schema be expressed in RDF, and what would an example of instance data look like?

CREATE SPACE guardians(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
:sleep 10;
USE guardians;
CREATE TAG entity(name string);
CREATE EDGE relationship(relationship string);
:sleep 10;
CREATE TAG INDEX entity_index ON entity(name(256));

ChatGPT returns :

In Nebula Graph, you're defining a space guardians with vertex IDs as fixed-length strings. You also define a tag entity with a property name of type string, and an edge type relationship with a property relationship of type string. Finally, you create an index on the name property of the entity tag.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .

# Define the Entity class

ex:Entity rdf:type rdfs:Class ;
rdfs:label "Entity" ;
rdfs:comment "An entity with a name" .

# Define the name property

ex:name rdf:type rdf:Property ;
rdfs:label "name" ;
rdfs:comment "The name of the entity" ;
rdfs:domain ex:Entity ;
rdfs:range xsd:string .

# Define the Relationship class

ex:Relationship rdf:type rdfs:Class ;
rdfs:label "Relationship" ;
rdfs:comment "A relationship between entities" .

# Define the relationship property

ex:relationship rdf:type rdf:Property ;
rdfs:label "relationship" ;
rdfs:comment "The type of relationship" ;
rdfs:domain ex:Relationship ;
rdfs:range xsd:string .

and

@prefix ex: <http://example.org/> .

# Entities

ex:Alice rdf:type ex:Entity ;
ex:name "Alice" .

ex:Bob rdf:type ex:Entity ;
ex:name "Bob" .

# Relationship

ex:Alice ex:relationship ex:Bob ;
rdf:type ex:Relationship ;
ex:relationship "friend" .

using existing files under llama_index/llama_index/graph_stores as a guide, in particular the interface defined

ex:Relationship rdf:type rdfs:Class ; rdfs:label "Relationship" ; rdfs:comment "A relationship between entities" .

Ok, do it by hand ...

a er:Triplet ; er:id "123" ; er:subject "one" ; er:property "two" ; er:object "three" .

But what/where are the IDs needed? ok, maybe better :

# Simple Entity-Relation
@base <http://purl.org/stuff/data> .
@prefix er: <http://purl.org/stuff/er> .

<#T123> a er:Triplet ;
er:id "#T123" ;
er:subject <#E123> ;
er:property <#R456> ;
er:object <#E567> .

<#E123> a er:Entity ;
er:value "one" .

<#R456> a er:Relationship ;
er:value "two" .

<#E567> a er:Entity ;
er:value "three" .

RDFS something like -

@prefix er: <http://purl.org/stuff/er> .

er:Entity a rdfs:Class ;
rdfs:label "Entity" ;
rdfs:comment "An entity..." .

er:Relationship a rdfs:Class ;
rdfs:label "Relationship" ;
rdfs:comment "A relationship between entities" .

er:Triplet a rdfs:Class ;
rdfs:label "Triplet" ;
rdfs:comment "A 3-tuple expressing a relationship between entities" .

er:subject a rdf:Property ;
rdfs:label "subject" .
er:subject rdfs:domain er:Entity .
er:subject rdfs:range er:Entity .

er:subject a rdf:Property ;
rdfs:label "subject" .

er:subject a rdf:Property ;
rdfs:label "subject" .

rdfs:comment "An entity..." .

Probably not needed.

Time to move onto another doc