llama_index SPARQL Notes 24
cd ~/AI/LIBS-under-dev/llama_index
pytest tests/graph_stores/test_sparql.py > ../testy.md
I've been fixing some of the things GitHub complained about when I did a pull request. Luckily things that ChatGPT can help with.
I've added type annotations to sparql.py
, started writing some tests.
added sparqlwrapper to setup.py
export PYTHONPATH=/home/danny/AI/LIBS-under-dev/llama_index
danny@danny-desktop:~/AI/LIBS-under-dev/llama_index$ pip install -e .
danny@danny-desktop:~/AI/LIBS-under-dev/llama_index$ pytest tests/graph_stores/test_simple.py
danny@danny-desktop:~/AI/LIBS-under-dev/llama_index$ pytest tests/graph_stores/test_simple.py > ../testy.md
ok, ChatGPT got the shape of the returned values wrong (so much for type annotations!) but was easy to fix.
Next is the more challenging one, test_sparql.py
.
Give ChatGPT a zip. Include paths.txt
so it gets the imports right.
"I wish to create a pytest test. I will now upload a zip of files. Look at the functions in test_simple.py and their relation to functions in simple.py, remember these. I will then give you further instructions"
"I would like a test file, test_sparql.py
that will test all the functions in sparql.py
. Much of this will be very similar to test_simple.py
as they both follow the same interface, types.py
. I would like you to go through every function in sparql.py
one at a time and create at test for it. Don't show me any results until this is done. Combine them into test_sparqly.py
and give me that. Keep explanation to a minimum. Do not ask me for confirmation, just continue until you have a full response."
much time, many prompts later...
9 errors. I noticed "EndPointNotFound"
I did a pull request on sparql.py
etc. last night.
GitHub gave me a message that there weren't any type annotations.
I'll probably do those tomorrow.
I started making up some diagrams to use in docs/slides.
Occurred to me, I should have use / not # in the URIs, so they'll be directly retrievable.
HTML links
https://www.w3.org/TR/rdfa-syntax/
git checkout main git merge sparql-01
https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging
path issues.
pip install -U llama-index
original llama_index has changes...
git checkout main git merge sparql-01
bit of fiddling, pushed
export PYTHONPATH=/home/danny/AI/LIBS-under-dev/llama_index
export PYTHONPATH=/home/hkms-apps/llama_index
pip install -e .
python docs/examples/graph_stores/graph-rag-sparql-mini.py
I spent the past few days mostly admin stuff (and stacking logs).
Prior to making a pull request on llama_index I want to :
I'm going to try to be a little more careful with git this time after a screw-up the other day.
So for a clean slate, I backed up and then deleted my fork of llama_index. Then :
sparql-01
git checkout sparql-01
I think the only necessary core files are :
and docs/demos :
I've got a bit distracted trying to set up good Jupyter Notebook hosting on my server. After going round in circles a lot have settled on using JupyterHub, specifically :
https://github.com/jupyterhub/the-littlest-jupyterhub
It's now running at http://hyperdata.it:8484/user/admin/lab
This provides a lot more than I need for this, is multi-user, but doesn't appear to make much demand on resources (assuming very few users).
Looking around this took me into auth territory a bit. I do want this set up for other services on my site, so I'm biting the bullet a bit.
LDAP seems the best base for me, JupyterHub has a plugin. More generally, an LDAP-based identity provider would be nice to have.
But before that - how to add the path of my in-progress LlamaIndex bits so the Notebook can see it. PYTHONPATH isn't working. Virtual envs were mention in the (quick) install procedure, so probe time -
import sys
import os
print('sys.prefix = '+sys.prefix)
print('\nos.environ = '+str(os.environ))
print('\nVIRTUAL_ENV = '+str(os.environ.get('VIRTUAL_ENV')))
print('\nsys.executable = '+sys.executable)
print('\nsys.path = ')
print(sys.path)
gives
sys.prefix = /opt/tljh/user
os.environ = environ({'LANG': 'en_US.UTF-8', 'PATH': '/opt/tljh/user/bin:/opt/tljh/hub/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', '_WSREP_START_POSITION': '', 'HOME': '/home/jupyter-admin', 'LOGNAME': 'jupyter-admin', 'USER': 'jupyter-admin', 'SHELL': '/bin/bash', 'INVOCATION_ID': '0f6f2c87ddd84b6580a092738a5f2828', 'JOURNAL_STREAM': '8:13724686', 'RUNTIME_DIRECTORY': '/run/jupyter-admin', 'SYSTEMD_EXEC_PID': '89780', 'JPY_API_TOKEN': 'a0b308207fff4b1ba0d634c490dd3f33', 'JUPYTERHUB_ACTIVITY_URL': 'http://127.0.0.1:15001/hub/api/users/admin/activity', 'JUPYTERHUB_API_TOKEN': 'a0b308207fff4b1ba0d634c490dd3f33', 'JUPYTERHUB_API_URL': 'http://127.0.0.1:15001/hub/api', 'JUPYTERHUB_BASE_URL': '/', 'JUPYTERHUB_CLIENT_ID': 'jupyterhub-user-admin', 'JUPYTERHUB_DEFAULT_URL': '/lab', 'JUPYTERHUB_HOST': '', 'JUPYTERHUB_OAUTH_ACCESS_SCOPES': '["access:servers!server=admin/", "access:servers!user=admin"]', 'JUPYTERHUB_OAUTH_CALLBACK_URL': '/user/admin/oauth_callback', 'JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES': '[]', 'JUPYTERHUB_OAUTH_SCOPES': '["access:servers!server=admin/", "access:servers!user=admin"]', 'JUPYTERHUB_SERVER_NAME': '', 'JUPYTERHUB_SERVICE_PREFIX': '/user/admin/', 'JUPYTERHUB_SERVICE_URL': 'http://127.0.0.1:35051/user/admin/', 'JUPYTERHUB_USER': 'admin', 'PYDEVD_USE_FRAME_EVAL': 'NO', 'JPY_SESSION_NAME': '/home/jupyter-admin/graph-rag-sparql-mini.ipynb', 'JPY_PARENT_PID': '89780', 'TERM': 'xterm-color', 'CLICOLOR': '1', 'FORCE_COLOR': '1', 'CLICOLOR_FORCE': '1', 'PAGER': 'cat', 'GIT_PAGER': 'cat', 'MPLBACKEND': 'module://matplotlib_inline.backend_inline'})
VIRTUAL_ENV = None
sys.executable = /opt/tljh/user/bin/python
sys.path =
['/home/jupyter-admin', '/opt/tljh/user/lib/python310.zip', '/opt/tljh/user/lib/python3.10', '/opt/tljh/user/lib/python3.10/lib-dynload', '', '/opt/tljh/user/lib/python3.10/site-packages']
root@localhost:/opt/tljh# grep -r sys.path
...
/opt/tljh/hub/lib/python3.10/site-packages/tljh/jupyterhub_config.py contains
# Load arbitrary .py config files if they exist.
# This is our escape hatch
extra_configs = sorted(glob(os.path.join(CONFIG_DIR, "jupyterhub_config.d", "*.py")))
for ec in extra_configs:
load_subconfig(ec)
so...
nano /opt/tljh/config/jupyterhub_config.d/extra-path.py
# temp until I've figured out how JupyterHub does sys.path
import sys
sys.path.append("/home/hkms-apps/llama_index")
systemctl restart jupyterhub
No change.
Oh, rights. chmod on that ^, no change.
Runs at another point?
I don't need to check everything starts ok on reboot on this server, so reboot time.
14066 triple in Guradians graph
All started ok, no change to sys.pth.
Hmm. It doesn't appear to be running in a venv, so better retry
root@localhost:~# export PYTHONPATH=/home/hkms-apps/llama_index root@localhost:~# echo $PYTHONPATH /home/hkms-apps/llama_index
systemctl restart jupyterhub
WARNING: The script openai is installed in '/home/jupyter-admin/.local/bin' which is not on PATH.
Just remembered the actual code is under /home/hkms-apps/llama_index/llama_index
/opt/tljh/hub/lib/python3.10/site-packages/tljh/jupyterhub_config.py
is called from
jupyterhub.service
# Template file for JupyterHub systemd service
# Uses simple string.format() for 'templating'
[Unit]
# Traefik must have successfully started *before* we launch JupyterHub
Requires=traefik.service
After=traefik.service
[Service]
User=root
Restart=always
WorkingDirectory=/opt/tljh/state
# Protect bits that are normally shared across the system
PrivateTmp=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
Environment=TLJH_INSTALL_PREFIX=/opt/tljh
Environment=PATH=/opt/tljh/hub/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Run upgrade-db before starting, in case Hub version has changed
# This is a no-op when no db exists or no upgrades are needed
ExecStart=/opt/tljh/hub/bin/python3 -m jupyterhub.app -f /opt/tljh/hub/lib/python3.10/site-packages/tljh/jupyterhub_config.py
[Install]
# Start service when system boots
WantedBy=multi-user.target
Add OpenAPI keys.
Endpoint : https://fuseki.hyperdata.it/#/dataset/llama_index_sparql-test/
DROP GRAPH <http://purl.org/stuff/guardians>
sudo /usr/local/nebula/scripts/nebula.service start all
cd ~/AI/nlp/GraphRAG/src
export PYTHONPATH=$PYTHONPATH:/home/danny/AI/LIBS-under-dev/llama_index
python graph-rag-sparql-minimal.py
On server for demo.
export PYTHONPATH=$PYTHONPATH:/home/hkms-apps/llama_index
/home/hkms-apps/llama_index
add to nano /etc/profile
Add OpenAPI keys.
Endpoint : https://fuseki.hyperdata.it/#/dataset/llama_index_sparql-test/
DROP GRAPH <http://purl.org/stuff/guardians>
sudo /usr/local/nebula/scripts/nebula.service start all
cd ~/AI/nlp/GraphRAG/src
export PYTHONPATH=$PYTHONPATH:/home/danny/AI/LIBS-under-dev/llama_index
python graph-rag-sparql-minimal.py
Fool Danny!
I'd made rels()
as a placeholder for get_rel_map()
while testing. Forget to wire it in when running end-to-end.
Connecting that up (and a few little tweaks) got rid of the recursion issue.
Yay!! I might be missing something obvious but it now appears to work!
Now to tidy up, then set up a demo Notebook.
Ew, I got the recursion/depth error again.
Ah, interesting!
I was giving it 2 questions :
The first the LLM should be able to answer but the RAG data would be no use, and vice versa.
When I removed the cat query, no recursion error. So I guess maybe the LLM was returning TMI.
So I'll leave cats out for now.
Back to setting up a demo.
Grrr. Accidentally stuck a big file in a repo, got in a tangle with github. Headache, time to stop.
So, eliminating loops...
For now at least I reckon it'd be best to prevent loops before triplets go to the store. The simplest thing that should work is when presented with a triplet (subj, rel, obj)
, check that obj
hasn't already been used as a subj
.
Immediate goal is to recreate Wey's Graph RAG demo but using a SPARQL store rather than NebulaGraph.
I'm running it as graph-rag-sparql-minimal.py. The in-progress sparql.py (rels()
will be called from get_rel_map()
).
I've got data going to the store something like it should using graph_store.upsert()
. Data coming back from calls to graph_store.get_rel_map()
that appears to be more or less the right shape. But an issue with each I would appreciate advice on.
Batching data
Right now I've set up INSERT queries to add triplets (as RDF triples) one at a time to the store. It's really inefficient because each has to be wrapped in a query skeleton and POSTed over HTTP. I imagine some kind of batch handling is already implemented somewhere, but I couldn't find anything I understood. It isn't essential right now, but if there is something, pointers?
Loops in graph data
This is a breaking issue for me right now. I'm sure I can sort it out, but a little more knowledge should make that quicker :)
So at some point the data is fed through llama_index/llama_index/response_synthesizers/tree_summarize.py
.
I get RecursionError: maximum recursion depth exceeded in comparison
, so seems likely there's a loop somewhere. Short term, some kind of check+delete should fix that (algorithm hunt time). But more generally, what is the situation with graphs & loops? Is this an underlying feature of the way Graph RAG operates, or is it just an implementation issue?
cd ~/AI/nlp/GraphRAG/src
export PYTHONPATH=$PYTHONPATH:/home/danny/AI/LIBS-under-dev/llama_index
python /home/danny/AI/LIBS-under-dev/llama_index/tests/storage/graph_stores/test_sparql.py
That took a while but I now have results like :
{Peter Quill: [
'Peter Quill, -[was abducted from Earth]->, as a child',
'Peter Quill, -[is leader of]->, Guardians of the Galaxy',<-[are attacked by]-, Adam',
Ah, but as a string. Need to add a loop on the subjs and tweak data accumulation to conform to:
def get_rel_map(
self, subjs: Optional[List[str]] = None, depth: int = 2
) -> Dict[str, List[List[str]]]:
Hmm, I don't really understand that return shape. And I've overwritten the log that had it...
sudo /usr/local/nebula/scripts/nebula.service start all
cd ~/AI/nlp/GraphRAG/src
export PYTHONPATH=$PYTHONPATH:/home/danny/AI/LIBS-under-dev/llama_index
python graph-rag-nebulagraph-minimal.py
if nebulagraph.py, get_rel_map() :
# We put rels in a long list for depth>= 1, this is different from
# SimpleGraphStore.get_rel_map() though.
# But this makes more sense for multi-hop relation path.
the results in the log look very like what's produced from SPARQL above. Ok, time to try it.
First, at https://fuseki.hyperdata.it/#/dataset/llama_index_sparql-test/query
DROP GRAPH <http://purl.org/stuff/guardians>
little char problem, ChatGPT time
Ah, SPARQL encoding different from Python on Spanish n with wibble
It might now be putting what it should in the store (took a few minutes), but it crashes with :
File "/home/danny/AI/LIBS-under-dev/llama_index/llama_index/response_synthesizers/tree_summarize.py", line 142, in get_response
return self.get_response(
^^^^^^^^^^^^^^^^^^
[Previous line repeated 980 more times]
File "/home/danny/AI/LIBS-under-dev/llama_index/llama_index/response_synthesizers/tree_summarize.py", line 96, in get_response
summary_template = self._summary_template.partial_format(query_str=query_str)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/danny/AI/LIBS-under-dev/llama_index/llama_index/prompts/base.py", line 206, in partial_format
return SelectorPromptTemplate(
^^^^^^^^^^^^^^^^^^^^^^^
...
RecursionError: maximum recursion depth exceeded in comparison
...
File "pydantic/class_validators.py", line 337, in pydantic.class_validators._generic_validator_basic.lambda13
...
Hmm. A loop or just to big?
PREFIX er: <http://purl.org/stuff/er#>
BASE <http://purl.org/stuff/data>
SELECT DISTINCT ?rel1 ?obj1 ?rel2 ?obj2 WHERE {
GRAPH <http://purl.org/stuff/guardians> {
?triplet a er:Triplet ;
er:subject ?subject ;
er:property ?property ;
er:object ?object .
?subject er:value "Peter Quill" .
?property er:value ?rel1 .
?object er:value ?obj1 .
OPTIONAL {
?triplet2 a er:Triplet ;
er:subject ?object ;
er:property ?property2 ;
er:object ?object2 .
?property2 er:value ?rel2 .
?object2 er:value ?obj2 .
}}}
LIMIT 10
WARNING:llama_index.graph_stores.nebulagraph:s =Peter Quill WARNING:llama_index.graph_stores.nebulagraph:rel_map =
Ok, I want rel_map
to take the subject, Peter Quill
, call the SPARQL store and return something in this format :
{'Peter Quill': [ 'Peter Quill, -[would return to the MCU]->, May 2021, <-[would return to the MCU]-, Peter Quill', 'Peter Quill, -[would return to the MCU]->, May 2021', 'Peter Quill, -[was raised by]->, a group of alien thieves and smugglers', 'Peter Quill, -[is leader of]->, Guardians of the Galaxy', 'Peter Quill, -[would return to the MCU]->, May 2021, <-[Gunn reaffirmed]-, Guardians of the Galaxy Vol. 3', ...
Hmm, it takes a list :
def get_rel_map(
self, subjs: Optional[List[str]] = None, depth: int = 2
) -> Dict[str, List[List[str]]]:
Looping through the list to build the query should work, but there might be a more elegant way. Whatever, start with a single subject.
If I build this up in :
llama_index/tests/storage/graph_stores/test_sparql.py
It make a good start to the test.
Probably unnecessary but I've added an `unescape_from_rdf` helper to `sparql.py` to revert the quote escaping that Turtle needed.
cd ~/AI/nlp/GraphRAG/src export PYTHONPATH=$PYTHONPATH:/home/danny/AI/LIBS-under-dev/llama_index python /home/danny/AI/LIBS-under-dev/llama_index/tests/storage/graph_stores/test_sparql.py
> urllib.error.HTTPError: HTTP Error 502: Proxy Error
Oops. Too many results? Check server...
That took me a long time, bit fiddly. But now :
results = graph_store.select_triplets('Peter Quill', 10)
is returning :
{'rel': {'type': 'literal', 'value': 'is leader of'}, 'obj': {'type': 'literal', 'value': 'Guardians of the Galaxy'}} {'rel': {'type': 'literal', 'value': 'is half-human'}, 'obj': {'type': 'literal', 'value': 'half-Celestial'}} {'rel': {'type': 'literal', 'value': 'was abducted from Earth'}, 'obj': {'type': 'literal', 'value': 'as a child'}} {'rel': {'type': 'literal', 'value': 'was raised by'}, 'obj': {'type': 'literal', 'value': 'a group of alien thieves and smugglers'}}
Ok, so now I reckon I need SPARQL UNION (and possibly BIND) to get some <-[backwards]- bits.
Break time.
Hmm, I was playing around with the SPARQL, looks like this dataset (populated from `sparql.py`) is missing a few triples.
For now go with https://fuseki.hyperdata.it/#/dataset/llama_index-test/query which came from NebulaGraph.
Ok, this returns some things of the right shape, will do for now :
PREFIX er: http://purl.org/stuff/er#
BASE http://purl.org/stuff/data
SELECT DISTINCT ?subj ?rel ?obj ?rel2 ?obj2 WHERE {
GRAPH <http://purl.org/stuff/guardians> {
?triplet a er:Triplet ;
er:subject ?subject ;
er:property ?property ;
er:object ?object .
?subject er:value "Peter Quill" .
?property er:value ?rel .
?object er:value ?obj .
OPTIONAL {
?triplet2 a er:Triplet ;
er:subject ?subject2 ;
er:property ?property2 ;
er:object ?object2 .
?subject2 er:value ?obj .
?property2 er:value ?rel2 .
?object2 er:value ?obj2 .
}
}
}
**Property paths!** D'oh! I'd forgotten about them. Probably useful here. https://www.w3.org/TR/sparql11-query/#propertypaths
But for now, get suitable output of `rel_map` from results of the above.
**ChatGPT**
Given the following example :
subj = 'Peter Quill'
rels = {'rel': {'type': 'literal', 'value': 'is leader of'}, 'obj': {'type': 'literal', 'value': 'Guardians of the Galaxy'}, 'rel2': {'type': 'literal', 'value': 'cannot heal'}, 'obj2': {'type': 'literal', 'value': 'Rocket'}}
arp = to_arrows(subj, rels)
write the function to_arrows so this will be the value of string arp :
'Peter Quill, -[would return to the MCU]->, May 2021, <-[Gunn reaffirmed]-, Guardians of the Galaxy Vol. 3'
**didnt really help**
Started doing it manually, now too tired. Night night.
---
I've used this (and almost identical in Java etc) _so often_, but have managed to forget :
> Logger.setLevel() specifies the lowest-severity log message a logger will handle, where debug is the lowest built-in severity level and critical is the highest built-in severity. For example, if the severity level is INFO, the logger will handle only INFO, WARNING, ERROR, and CRITICAL messages and will ignore DEBUG messages.
`:cat AI`
`:tag SPARQL`
`:tag LlamaIndex`
Got some funds so I paid some API credit. But have spent the day so far on admin and bits & pieces that were niggling me. And now, 16:47, it's dogwalk time. But I need a bit of fresh air, clear head for sparql.py
.
Back from dogwalk
Credit balance $29.87
sudo /usr/local/nebula/scripts/nebula.service start all
cd ~/AI/nlp/GraphRAG/src
export PYTHONPATH=$PYTHONPATH:/home/danny/AI/LIBS-under-dev/llama_index
python graph-rag-nebulagraph-minimal.py
5 or 6 runs with little tweaks in between. OpenAI credit still says $29.87.
No idea why, this returned a
It also ends on an exception from calling -
def __del__(self) -> None:
"""Close NebulaGraph session pool."""
self._session_pool.close()
But, before that it says :
Peter Quill is the half-human, half-Celestial leader of the Guardians of the Galaxy...
The log isn't any more useful -
INFO:llama_index.graph_stores.nebulagraph:nebulagraph HERE
INFO:llama_index.graph_stores.simple:simple HERE
INFO:llama_index.graph_stores.sparql:sparql HERE
INFO:__main__:graph-rag-nebulagraph-minimal HERE
INFO:__main__:#### 1.2
INFO:__main__:#### skip 2
INFO:__main__:#### 4
INFO:llama_index.indices.loading:Loading all indices.
INFO:__main__:#### 6.2
INFO:__main__:
Peter Quill is the half-human, half-Celestial leader of the Guardians of the Galaxy...
Hmm, log level..?
Tweaked a bit, noticed it's loading index from ./storage_graph/docstore.json
Oops, I pulled out the code for generating the kg when I was looking at the upload bits without OpenAI
TypeError: can only concatenate str (not "list") to str
ok,
for s in subjs:
logger.warning('s =' + str(s))
a couple more tweaks later it ran through. Much more interesting logs!
First thing of note :
DEBUG:openai:message='Request to OpenAI API' method=post path=https://api.openai.com/v1/completions
DEBUG:openai:api_version=None data='{"prompt": "Some text is provided below. Given the text, extract up to 10 knowledge triplets in the form of (subject, predicate, object). Avoid stopwords.\\n---------------------\\nExample:Text: Alice is Bob\'s mother.Triplets:\\n(Alice, is mother of, Bob)\\nText: Philz is a coffee shop founded in Berkeley in 1982.\\nTriplets:\\n(Philz, is, coffee shop)\\n(Philz, founded in, Berkeley)\\n(Philz, founded in, 1982)\\n---------------------\\nText: The Guardians travel to Orgocorp\'s headquarters to find the switch\'s override code and save Rocket\'s life.As Rocket lies unconscious, he recalls his past.As a baby raccoon, he was experimented on by the High Evolutionary, who sought to enhance and anthropomorphize animal lifeforms to create an ideal society called Counter-Earth.Rocket befriended his fellow Batch 89 test subjects: the otter Lylla, the walrus Teefs, and the rabbit Floor.The High Evolutionary was impressed by Rocket\'s rapidly growing intelligence but became furious once it exceeded his own.The High Evolutionary used Rocket to perfect his Humanimal creations, then planned to harvest Rocket\'s brain for further research and exterminate the obsolete Batch 89.Rocket freed his friends, but the High Evolutionary killed Lylla.Enraged, Rocket mauled the High Evolutionary, but his henchmen killed Teefs and Floor during a firefight with Rocket, before the latter fled Counter-Earth in a spaceship.In the present, the Ravagers, including an alternate version of Gamora, help the Guardians infiltrate Orgocorp.They retrieve Rocket\'s file but discover that the code was removed, with the likely culprit being Theel, one of the High Evolutionary\'s advisors.The Guardians, along with Gamora, depart for Counter-Earth to find him.They are followed by Ayesha and Adam after the High Evolutionary, their race\'s creator, threatened to wipe out the Sovereign if they fail to retrieve Rocket.The Guardians reach Counter-Earth and are guided to the Ar\\u00eate Laboratories complex.Drax and Mantis remain with Gamora and Rocket as Peter Quill, Groot, and Nebula travel to Ar\\u00eate.Nebula is forced to wait outside by guards as Quill and Groot enter Ar\\u00eate, while Drax tricks Mantis into pursuing Quill\'s group.Gamora saves Rocket from being captured by Adam and the High Evolutionary\'s guard War Pig.Questioned by Quill, the High Evolutionary admits disillusionment with the Animen\'s imperfect society.He destroys Counter-Earth, killing the Humanimals and Ayesha.Ar\\u00eate departs as a spaceship, with Nebula, Drax and Mantis boarding to rescue Quill and Groot, who instead escape Ar\\u00eate with Theel, retrieving the code from his corpse before being rescued by Gamora in their ship.\\nTriplets:\\n", "stream": false, "model": "text-davinci-002", "temperature": 0.0, "max_tokens": 3480}' message='Post details'
DEBUG:urllib3.connectionpool:https://api.openai.com:443 "POST /v1/completions HTTP/1.1" 200 None
Lots and lots of upserts, up to around line 1000 in nebby.log
line 1020 #### nebulagraph get_rel_map called
WARNING:llama_index.graph_stores.nebulagraph:
#### nebulagraph get_rel_map called
WARNING:llama_index.graph_stores.nebulagraph:s =Peter Quill
WARNING:llama_index.graph_stores.nebulagraph:#### nebulagraph get_flat_rel_map
WARNING:llama_index.graph_stores.nebulagraph:subs = ['Peter Quill']
WARNING:llama_index.graph_stores.nebulagraph:get_flat_rel_map() subjs_param: ['Peter Quill'], query: WITH map{`true`: '-[', `false`: '<-['} AS arrow_l, map{`true`: ']->', `false`: ']-'} AS arrow_r MATCH (s)-[e:`relationship`*..2]-() WHERE id(s) IN $subjs WITH id(s) AS subj,[rel IN e | [ arrow_l[tostring(typeid(rel) > 0)] + rel.`relationship`+ arrow_r[tostring(typeid(rel) > 0)], CASE typeid(rel) > 0 WHEN true THEN dst(rel) WHEN false THEN src(rel) END ]] AS rels WITH subj, REDUCE(acc = collect(NULL), l in rels | acc + l) AS flattened_rels RETURN subj, REDUCE(acc = subj, l in flattened_rels | acc + ', ' + l ) AS flattened_rels
WARNING:llama_index.graph_stores.nebulagraph:rel_map =
WARNING:llama_index.graph_stores.nebulagraph:s =Peter Quill
WARNING:llama_index.graph_stores.nebulagraph:rel_map =
right, this looks where the triplets for a subj are got & returned
raw snipped
I saved that chunks as rel_map.json and let VSCode try to format it - much clearer. Kinda...
PS. format a bit by hand
{'Peter Quill': [
'Peter Quill, -[would return to the MCU]->, May 2021, <-[would return to the MCU]-, Peter Quill',
'Peter Quill, -[would return to the MCU]->, May 2021',
'Peter Quill, -[was raised by]->, a group of alien thieves and smugglers',
'Peter Quill, -[is leader of]->, Guardians of the Galaxy',
'Peter Quill, -[would return to the MCU]->, May 2021, <-[Gunn reaffirmed]-, Guardians of the Galaxy Vol. 3',
...
Ok, a format in a format. I don't know why there are backwards arrows and apparently repetition, but the basic stuff will be straightforward from SPARQL results. I reckon I'll start with simple forward-arrow triples, see if that communicates enough.
oh wait, line 1124 :
'Request to OpenAI API' method=post path=https://api.openai.com/v1/completions
DEBUG:openai:api_version=None data='{"prompt": "Context information from multiple sources is below.\\n---------------------\\nThe following are knowledge sequence in max depth 2 in the form of directed graph like:\\n`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...
...
Break time. Food & flop.
:cat AI
:tag SPARQL
:tag LlamaIndex
reminders
Kingsley
thread -
https://twitter.com/kidehen/status/1677095027246243840
Woot!!!
#SPARQL plugin for #ChatGPT working, and it was so darn easy to make and deploy too, courtesy of our #VirtuosoRDBMS (designed to bring #Web magic to #DataConnectivity & #DBMS realms).
Session Transcript: https://chat.openai.com/share/ff7ec94e-0b0b-4505-b379-d6be7a91d7bf
#SemanticWeb #AI #LinkedData #AGI
https://twitter.com/mark_l_watson
Wowza! I asked ChatGPT with Diagrams Plugin to generate a UML sequence diagram for a SPARQL query to DBPedia: A+ grade
LLMs are far, far, far! from AGI but very useful. ChatGPT and CoPilot are fixtures of my Emacs setup, and web versions including Bard, Midjourney, and Bing+GPT are solidly in my workflow.
braindump for future me
I've had dataflows on my mind this week, tracing through some code and seeing some coincidentally related material, eg. Tweet from Mark Watson :
Wowza! I asked ChatGPT with Diagrams Plugin to generate a UML sequence diagram for a SPARQL query to DBPedia: A+ grade
The other day I had a skim through some AI books I got in the 1990s, mostly curious what ideas were around then that have been forgotten, could maybe be revitalized. One notable takeaway was how ReLU (rectified linear unit, a diode) has since unseated tanh/sigmoid as the activation function of choice.
Looking at a vanilla neural network, single numeric values flow through, getting modified along the way (and are backpropagated, changing weights). For tracking flows and highlighting things, medical diagnosis can use radioactive tracers.
Could something comparable be used with NNs?
At the input layer a value x is given to a node, subsequently each node receives a bunch of values from connected nodes in the previous layer. What if instead a pair of values were passed (x, C), where C is a marker constant, what should come out of a node, what about side effects?
First pass, how about this : the treatment of the x values stays exactly the same as in the vanilla case - but a C is hitching a ride. A selection function at the node picks the C from the input pairs with the highest value from its inputs. This is the value passed along from this node to nodes in the next layer.
The side effect I have in mind is similar to the way weights are adjusted in backprop, that the node takes on a value of C. This could also occur on the backprop phase, so each node holds (Cf, Cb).
Are there any implementation issues I haven't seen? Might this be any use for anything?
To investigate, I guess trying it in toy network code would be the next step.
There's a kind of bigger picture/generalization aspect to this. What if the values passed around, rather than usual single values or strange pairs as above, are arbitrary data structures? The transfer funtions are arbitrary? I'm pretty sure there'll be net designs which pass matrices of real numbers around, I've a feeling there might be calculation performance optimization potential somewhere around there. But I haven't a clue what activation functions would be appropriate...
On the function question, usually differentiability is canon. But in a view from 1km, this is a special case where optimization is done through convex optimization, hill-climbing over the number spaces. Other optimization techniques exist, eg. combinatorial, integer programming.
I've not read much about optimization techniques, apart from bits in papers along the lines of "We found 5 layers gave accuracy within 2%, 6 or more only gave fractional improvement.". The relative benefits of different activation functions was looked at a lot in the early days of Deep Learning. But nowadays experiments (I've seen) tend to be looking more at large-scale topologies, with the subunits chosen from known good black boxes (from a box of black boxes?).
I don't know, but perhaps the space of possible network functionality could be explored with a bit of meta-programming, trying different setups as above. It feels like it should be automatable.
To borrow from the Outer Limits :
There is nothing wrong with your neural network. We will control the datatypes. We will control the activation functions.
I lost internet connectivity last night, woke up this morning with no electricity. The electric came on pretty soon - I had been warned ENEL were messing with the village wiring. The net took a trip into town to sort out (billing error, and then they took ages to switch me back on). The money I was expecting hasn't yet arrived t the bank, so still no OpenAI credit. So I spent most of the day staring at the wall/Twitter.
Ok, it seems like I was looking at the wrong place to put the query. Not query()
but :
def get(self, subj: str) -> List[List[str]]:
"""Get triplets."""
...
The implementation in nebulagraph.py
is scary!
Fool Danny, I forgot to put a logger bit in to check what gets returned there.
In simple.py
:
def get(self, subj: str) -> List[List[str]]:
"""Get triplets."""
return self._data.graph_dict.get(subj, [])
Wey mentions 'flattened' around get_rel_map()
, where the subject is given with a list of associated rel, obj. But this looks less flat.
I'll assume for now, if data =
"one", "two", "three"
"one", "two", "four"
"two", "two", "four"
get('one')
will return :
[[`one', 'two', 'three'], ['one','two','four']]
Now I'm confused. I did put a logger into get()
, list of lists, lol. Nothing came out.
Ok, I'll make a helper method that acts as above, see where it go when I can see the behaviour again.
@prefix er: http://purl.org/stuff/er# . @base http://purl.org/stuff/data .
Query like :
PREFIX er: <http://purl.org/stuff/er#>
BASE <http://purl.org/stuff/data>
SELECT DISTINCT ?rel ?obj WHERE {
GRAPH <http://purl.org/stuff/guardians> {
?triplet a er:Triplet ;
er:subject ?subject ;
er:property ?property ;
er:object ?object .
?subject er:value ?subj_string .
?property er:value ?rel .
?object er:value ?obj .
}
}
Ok, that worked via Fuseki's UI. For code, replace ?subj_string
with {subj}
. Put it in def sparql_query(self, query_string):
sparqlwrapper will not doubt need help producing what I want, need to RTFM there, results.response
somewhere - if it's available as a list of lists that'd be nice.
I'm using GET method, I don't like POST unless really necessary (it's floppy in the spec).
I must make an endpoint-focused test for this
Wired in to get()
, that'll do for today.
Continuing from yesterday, log-probing modulagraph.py
While added logger probes, having another look at functions in modulagraph.py
, I should implement in sparql.py
. They don't look complicated in what they do, but the shape of the data structures returned is confusing.
def client(self) -> Any:
self._session_pool
Not really sure there, the sparqlwrapper clients aren't reusable, maybe return a generator functions? need to look at the other graph_store
implementations.
def get(self, subj: str) -> List[List[str]]:
I suspect that'll be like SELECT DISTINCT ?p ?o WHERE { <subj> ?p ?o }
https://adamj.eu/tech/2021/07/06/python-type-hints-how-to-use-typing-cast/
def get_rel_map(
self, subjs: Optional[List[str]] = None, depth: int = 2
) -> Dict[str, List[List[str]]]:
"""Get rel map."""
# We put rels in a long list for depth>= 1, this is different from
# SimpleGraphStore.get_rel_map() though.
# But this makes more sense for multi-hop relation path.
Confusing. But this calls get_flat_rel_map
which has a handy comment :
# The flat means for multi-hop relation path, we could get
# knowledge like: subj -rel-> obj -rel-> obj <-rel- obj.
# This type of knowledge is useful for some tasks.
# +-------------+------------------------------------+
# | subj | flattened_rels |
# +-------------+------------------------------------+
# | "player101" | [95, "player125", 2002, "team204"] |
# | "player100" | [1997, "team204"] |
# ...
# +-------------+------------------------------------+
def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
I think the implementation for NebulaGraph makes this look trickier than it is. A templated SPARQL INSERT
block is probably whaqt's needed.
def delete(self, subj: str, rel: str, obj: str) -> None:
Same approach as upsert_triplet
?
def query(self, query: str, param_map: Optional[Dict[str, Any]] = {}) -> Any:
Hmm, the param_map looks painful. This function calls :
def execute(self, query: str, param_map: Optional[Dict[str, Any]] = {}) -> Any:
No really sure how that operates...
I'm not sure how useful the log statements I've added will be with the data structures, I've only got str(structure)
, some may well need more unrolling.
Ok, try that (without any calls to OpenAI API) :
Runs ok, nice :
INFO:llama_index.graph_stores.nebulagraph:get_schema() schema:
Node properties: [{'tag': 'entity', 'properties': [('name', 'string')]}]
Edge properties: [{'edge': 'relationship', 'properties': [('relationship', 'string')]}]
Relationships: ['(:entity)-[:relationship]->(:entity)']
Hmm. How best to express the schema in RDF?
RDFS is the obvious choice. This current version absolutely won't be a general solution so the schema should be associated with the (guardians
) graph. OWL has ontology descriptions..? TODO re-read specs.
Uncommented some of the calls in graph-rag-nebulagraph-minimal.py
that depend on OpenAI API, added a few log statements. Added API key. I've probably not included all the blocks of code needed for this to work end-to-end, but give it a go.
OpenAI API credit balance $0.21
Running...
raise openai.error.AuthenticationError(
openai.error.AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.
Strange, I had os.environ["OPENAI_API_KEY"] = "..."
. Added openai.api_key = "..."
Oops, NameError: name 'openai' is not defined. Did you mean: 'OpenAI'?
import openai
Also uncommented text-to-NebulaGraphCypher bits.
Ok, failed on :
File "/home/danny/AI/LIBS-under-dev/llama_index/llama_index/graph_stores/nebulagraph.py", line 632, in query
logger.info('param_map = '+param_map)
~~~~~~~~~~~~~~^~~~~~~~~~
TypeError: can only concatenate str (not "dict") to str
log got as far as :
#### nebulagraph query called
INFO:llama_index.graph_stores.nebulagraph:query =
MATCH (p:`entity`)-[:relationship]->(m:`entity`) WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p.`entity`.`name`;
tweaked -
# logger.info('param_map = '+param_map)
logger.info('param_map = ')
for key, value in param_map.items():
logger.info(key + ' = '+value)
Good-good, at command line got :
python graph-rag-nebulagraph-minimal.py
Graph Store Query:
MATCH (p:`entity`)-[:relationship]->(m:`entity`) WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p.`entity`.`name`;
Graph Store Response:
{'p.entity.name': ['Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill', 'Peter Quill']}
Final Response:
Peter Quill is a character in the Marvel Universe. He is the son of Meredith Quill and Ego, and the half-brother of Gamora. He was raised by his mother on Earth until he was abducted by aliens at the age of ten. He was raised by the Ravagers, a group of space pirates, and eventually became their leader. He is also known as Star-Lord.
Only a little more in log :
#### nebulagraph get_schema called
INFO:llama_index.graph_stores.nebulagraph:get_schema() schema:
Node properties: [{'tag': 'entity', 'properties': [('name', 'string')]}]
Edge properties: [{'edge': 'relationship', 'properties': [('relationship', 'string')]}]
Relationships: ['(:entity)-[:relationship]->(:entity)']
INFO:__main__:#### 5.2
INFO:__main__:#### 5.3
INFO:__main__:#### 6.1
INFO:llama_index.graph_stores.nebulagraph:
#### nebulagraph query called
INFO:llama_index.graph_stores.nebulagraph:query =
MATCH (p:`entity`)-[:relationship]->(m:`entity`) WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p.`entity`.`name`;
INFO:llama_index.graph_stores.nebulagraph:param_map =
INFO:__main__:#### 6.2
Looks like it's first querying for all the subjects, then using those to get all property, objects.
Credit balance $0.20
Boo! VSCode hung when I was trying to rename a file. For a clean slate I rebooted. Browser opened with :
Credit balance $0.15
Commented out the text-to-NebulaGraphCypher bits again.
So next, attacking sparql.py
again (last time was with ChatGPT helping, only served to confuse me) :
The bits that use sparqlwrapper I'll put in helper functions so connection can be tested independently.
Had a long break, back to desk, noticed the schema in nebulagraphstore.py
appears to be NebulaGraph-specific. Not implemented in simple.py
, nothing to suggest it's essential elsewhere (as long as the DB don't care).
Should probably do anyhow, just for self-domumenting purposes. Couple of relevant terms :
http://www.w3.org/2002/07/owl#Ontology http://www.w3.org/2004/02/skos/core#inScheme
(playing a little fast & loose with domain & range, but I think is valid)
Started filling out sparql.py
. I thought I'd got to the point where it would INSERT DATA to store, instead:
python graph-rag-sparql-minimal.py
[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data] Unzipping tokenizers/punkt.zip.
<IPython.core.display.Markdown object>
Yay!!!
It appears to be passing triples to SPARQL store ok.
There was a char issue, but I've added escaping. But when I ran it :
openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details.
I was ready to call it a day anyway. Some money is due in my bank tomorrow, fingers crossed.
The next bit, the query part, should be straightforward, starting with :
SELECT ?s ?p ?o WHERE {
GRAPH <http://purl.org/stuff/guardians> {
?s ?p ?o
}
}
Ok, time to attack the bits listed at the end of yesterday's post.
Break time. Made some progress, but got a bit distracted starting a master plan. Pretty much than same as I'd already put in the GraphRAG doc, but expanded a bit, more background, hopefully a better explanation for folks that occupy other Venn diagrams.
export PYTHONPATH=$PYTHONPATH:/home/danny/AI/LIBS-under-dev/llama_index
Added extra logging to nebulagraph.py
, just markers on functions defined in types.py are being called -
logging.basicConfig(filename='loggy.log', filemode='w', level=logging.INFO)
logger = logging.getLogger(__name__)
logger.info('nebulagraph HERE')
...
logger.info('#### nebulagraph client(self) called')
After running python graph-rag-nebulagraph-minimal.py
, what was in loggy.log (in the dir I ran from) :
INFO:llama_index.graph_stores.nebulagraph:HERE
INFO:llama_index.indices.loading:Loading all indices.
INFO:llama_index.indices.loading:Loading all indices.
INFO:llama_index.graph_stores.nebulagraph:#### get_schema called
Ok, that's a start. For a bit more coverage, I'll do the same to simple.py
and the skeletal sparql.py
.
INFO:llama_index.graph_stores.nebulagraph:nebulagraph HERE
INFO:llama_index.graph_stores.simple:simple HERE
INFO:llama_index.indices.loading:Loading all indices.
INFO:llama_index.indices.loading:Loading all indices.
INFO:llama_index.graph_stores.nebulagraph:#### nebulagraph get_schema called
grr, forgot again,
sudo /usr/local/nebula/scripts/nebula.service start all
So far nebula-rdf-dump.py
is making quasi-Turtle triples from the NebulaGraph data. If I provide some namespace prefixes that'll make it proper.
er
for my minimal Entity-Relationship vocab, use the base
to complete the instance data URIs.
Easy one, prepend it with :
@prefix er: <http://purl.org/stuff/er#> .
@base <http://purl.org/stuff/data> .
Using f formatting in the code, run...
Check :
rapper: Parsing URI file:///home/danny/AI/nlp/GraphRAG/nebula-sparql-utils/guardians.ttl with parser turtle
rapper: Parsing returned 2380 triples
Good-o.
Now to figure out the SPARQL to pust the stuff to the store.
Different prefix syntax, https://www.w3.org/TR/sparql11-query/#syntaxTerms
BASE <http://purl.org/stuff/data>
PREFIX er: <http://purl.org/stuff/er#>
What queries are needed?
https://www.w3.org/TR/sparql11-update
CREATE ( SILENT )? GRAPH IRIref
DROP ( SILENT )? (GRAPH IRIref | DEFAULT | NAMED | ALL )
INSERT DATA QuadData
where QuadData are formed by TriplesTemplates, i.e., sets of triple patterns, optionally wrapped into a GRAPH block.
Ok, so here I guess an update call for :
CREATE GRAPH <http://purl.org/stuff/guardians>
{prefixes}
INSERT DATA {
GRAPH <http://purl.org/stuff/guardians>
{ {triples} } }
ew, need to escape {} ... nah, less thought needed to concatenate with +
Good-oh, sparqlwrapper has an update example https://sparqlwrapper.readthedocs.io/en/latest/main.html#sparql-update-example (with auth)
danny@danny-desktop:~/AI/nlp/GraphRAG/nebula-sparql-utils$ python nebula-rdf-dump.py
Update succeeded
Update succeeded
Looking good so far...check data at
https://fuseki.hyperdata.it/#/dataset/llama_index-test/query
SELECT ?s ?p ?o WHERE {
GRAPH <http://purl.org/stuff/guardians> {
?s ?p ?o
}
}
LIMIT 50
Nothing!!!
Tried the SPARQL as dumped to file, then SELECT as above - looks fine.
DROP GRAPH <http://purl.org/stuff/guardians>
check code around sparqlwrapper...
Hmm, can you reuse clients (create graph & insert)? Apparenty not. Added code to make a new one - it worked!
yardstone reached!
Tired, but looking at next steps -
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
...
%pip install nebula3-python ipython-ngql
os.environ['NEBULA_USER'] = "root"
os.environ['NEBULA_PASSWORD'] = "nebula" # default password
os.environ['NEBULA_ADDRESS'] = "127.0.0.1:9669" # assumed we have NebulaGraph installed locally
space_name = "guardians"
edge_types, rel_prop_names = ["relationship"], ["relationship"] # default, could be omit if create from an empty kg
tags = ["entity"] # default, could be omit if create from an empty kg
graph_store = NebulaGraphStore(
space_name=space_name,
edge_types=edge_types,
rel_prop_names=rel_prop_names,
tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
Looks like most of what StorageContext does is done through the graph store implementations, eg. SimpleGraphStore
Ok, I reckon next, going back to putting tracer logging in NebulaGraphStore
nebulagraphstore.py
but not the OpenAI APInebulagraphstore.py
currently at :
~/.local/lib/python3.11/site-packages/llama_index
noo...that has a single subdir readers
Ok, this may or may not work - a symlink
Grr...next admin session I must set up systemd
sudo /usr/local/nebula/scripts/nebula.service start all
for ChatGPT :
I'd like a function to remove duplicates from a Python json structure. For example, give the following :
[
{'s': 'production on Vol.3', 'p': 'was put on hold until', 'o': 'February 2021'
},
{'s': 'production on Vol.3', 'p': 'put on hold until', 'o': 'February 2021'
},
{'s': 'production on Vol.3', 'p': 'was put on hold until', 'o': 'February 2021'
}
]
the function should return :
[
{'s': 'production on Vol.3', 'p': 'was put on hold until', 'o': 'February 2021'
},
{'s': 'production on Vol.3', 'p': 'put on hold until', 'o': 'February 2021'
}
]
It got it right first time!
So, next to flip this JSON into RDF/SPARQL.
Not 100% sure, but I don't think I actually need the extract_entities
function, anything useful will also appear in extract_entities
and the types/roles are implied by the triplets.
sanitize
def escape_for_rdf(input_str)
- thank you ChatGPT.
Given the following JSON, how would I retrieve the values of s, p, o? [ {'s': "Industrial Light & Magic's StageCraft", 'p': 'was developed for', 'o': 'Disney+ Star Wars series The Mandalorian' } ]
took a little backwards & forwards, but we got there.
I'm really not sure what level of granularity will eventually be needed, for big lumps of data something else from the SPARQL Protocol would probably be better (like a full-on HTTP PUT). Per-triple DELETE/INSERT might be desirable elsewhere. But here it shouldn't get too big, the INSERT approach should work, and would also work per-triple.
The NebulaGraph notion of a Space appears very similar to RDF/SPARQL Named Graphs, so I'll go with that (more information, and things can get messy doing everything in the default graph).
This is what I came up with the other day :
# Simple Entity-Relation
@base <http://purl.org/stuff/data> .
@prefix er: <http://purl.org/stuff/er> .
<#T123> a er:Triplet ;
er:id "#T123" ;
er:subject <#E123> ;
er:property <#R456> ;
er:object <#E567> .
<#E123> a er:Entity ;
er:value "one" .
<#R456> a er:Relationship ;
er:value "two" .
<#E567> a er:Entity ;
er:value "three" .
I opted for URL-named resources rather than blank nodes or other IRI because soon it may help with sanity-checking, further down the line the potential for HTTP retrieval is nice to have. There are a lot of annotation/meta triples that could be added, but I reckon this is about the minimum necessary to fulfil the Wey Gu's Notebook with SPARQL use case.
I guess I need a fragment ID generator. Suboptimal but so it's not too hard on the eyes for now, I'll got with alpha(capitals)-numeric. How many chars?
Please write a function to generate a random 4-character string using only numeric characters and capital letters.
Ok.
Need to keep identity of entities/relationships. I'm not sure what should happen if the same string is pulled out more than once as entity and/or relationship from different contexts. Should they have different IDs (URLs)? I don't fancy a deep dive into llama_index structures right now, I'll wait for someone to tell me. Whatever, giving identical strings the same URL shouldn't make a huge difference either way.
Strings appearing both in the role of Relationship & Entity might also be thinking about. But again, for now, no big deal.
Ran current version -
<#THKOE> a er:Triplet ;
er:subject <#EK8WH> ;
er:property <#RJSJV> ;
er:object <#ELD8T> .
<#EK8WH> a er:Entity ;
er:value "Vol.3" .
<#RJSJV> a er:Relationship ;
er:value "is the first MCU film to feature" .
<#ELD8T> a er:Entity ;
er:value "the word fuck uncensored" .
Progress!
Continuing from yesterday on getting data from a NebulaGraph store and putting it into a SPARQL store.
I was able to get a condensed JSON rendition of the Entity data from NebulaGraph, now to do the same with Relationships
USE guardians;
MATCH (src:entity)-[e:relationship]->(dst:entity)
RETURN src, e, dst
LIMIT 10;
ChatGPT time -
The task will be to create a function to extend an existing Python file. The purpose of this is to take data of a specific shape from a NebulaGraph DB via the nebula3 API and save it in a condensed form as JSON text. I will first give you the nGQL statements that were used to create the NebulaGraph space, then upload a the file, then describe the function I would like to add to it. Here are the nGQL statements :
CREATE SPACE guardians(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
:sleep 10;
USE guardians;
CREATE TAG entity(name string);
CREATE EDGE relationship(relationship string);
:sleep 10;
CREATE TAG INDEX entity_index ON entity(name(256));
In the file there is the API call : resp = client.execute_json('MATCH (v:entity) RETURN v') I will add : resp_rel = client.execute_json('MATCH (src:entity)-[e:relationship]->(dst:entity) RETURN src, e, dst') json_rel_str = resp_rel.decode('utf-8') Please write a function extract_relationships(json_rel_str) following the pattern of extract_entities(json_data)
What it gave me looks close, but the output was :
{'src': 'N/A', 'relationship': 'N/A', 'dst': 'N/A'
},
{'src': 'N/A', 'relationship': 'N/A', 'dst': 'N/A'
},
...
I will upload a file containing a single result from the database. I would like the data extracted from this by the function extract_relationships to be: [{"src": "production on Vol.3", "rel": "relationship", "dst": "February 2021"}] please modify the function accordingly
ChatGPT Finished working Show work It appears that I forgot to request the file upload. Could you please upload the file containing a single result from the database? This will help me modify the extract_relationships function to meet your requirements.
We went around in circles a lot. ChatGPT's mapping was repeatedly out, I hadn't looked closely neough at the NebulaGraph results.
But finally I ran with :
resp_rel = client.execute_json(
'MATCH (src:entity)-[e:relationship]->(dst:entity) RETURN src, e, dst' LIMIT 1)
effectively make it like :
[{"s": "production on Vol.3", "p": "was put on hold until", "o": "February 2021"}]
Yes! There are a lot of duplicates in the output but I'll just hack that for now. But I think I've done the unfmiliar bits now, the RDF/SPARQL side should be straightforward from here.
Enough for today.
This would probably have been much easier with JSON-LD. But since I started down this path...
danbri mentioned XMPP (for ChatGPT) yesterday, it appears I have ejabberd installed. Is it running?
Browser-based client :
https://conversejs.org/docs/html/quickstart.html
Pro-tip, if you just want to quickly test things locally, you can run make serve inside a checkout of the Converse repo. Converse is then hosted at http://localhost:8000
} from '../../../node_modules/lit/html.js'; // } from './html.js'; danny
cp node_modules/lit/html.js src/headless/utils/
grr, hateful thing, given up.
Installed gajim (via synaptic) instead
no immediate joy (where to cofig servers?)
Installed pidgin
isn't connecting - the Enabled box for the account unchecks itself
Active: activating (start) since Wed 2023-08-30 11:13:37 UTC; 3min 22s ago
...
Aug 30 11:15:27 localhost sh[443214]: 2023-08-30 11:15:27.334892+00:00 [error] Cannot ping watchdog: {badmatch,{error,eacces}}
looks like the install is old, I think done from .deb So,
Unpacking ejabberd (23.04-1) over (21.12-1) ...
...
Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 142.
Oops!
Grrr, Cannot ping watchdog
again.
Google...
https://askubuntu.com/questions/1411679/ubuntu-22-04-ejabberd-apparmour-profile-broken
Description=A distributed, fault-tolerant Jabber/XMPP server
Documentation=https://www.process-one.net/en/ejabberd/docs/
After=epmd.service network.target
Requires=epmd.service
[Service]
Type=forking
User=ejabberd
Group=ejabberd
LimitNOFILE=65536
Restart=on-failure
RestartSec=5
ExecStart=/bin/sh -c '/usr/sbin/ejabberdctl start && /usr/sbin/ejabberdctl started'
ExecStop=/bin/sh -c '/usr/sbin/ejabberdctl stop && /usr/sbin/ejabberdctl stopped'
ExecReload=/bin/sh -c '/usr/sbin/ejabberdctl reload_config'
PrivateTmp=true
ProtectHome=true
ProtectSystem=full
TimeoutSec=300
[Install]
WantedBy=multi-user.target
Active: active (running) since Wed 2023-08-30 11:47:14 UTC; 33s ago
Yay!
From pidgin :
danja@hyperdata.it/ disconnected
Host Unknown
Port problem? Whatever, enough of that for now.
Hah! The second I returned to Lemmy setup, I saw mynotes from when I installed ejabberd last time :
2022-10-27
installed ejabberd
Creating config file /etc/default/ejabberd with new version
Creating config file /etc/ejabberd/ejabberd.yml with new version
Created symlink /etc/systemd/system/multi-user.target.wants/ejabberd.service → /
lib/systemd/system/ejabberd.service.
admin@hyperdata.it
/etc/ejabberd/ejabberd.yml
ejabberdctl reload_config
ejabberdctl register admin hyperdata.it [password]
https://hyperdata.it:5280/admin/
ejabberdctl register danny hyperdata.it [password]
ejabberdctl register device hyperdata.it [password]
ejabberdctl register marinella hyperdata.it [password]
locally
mosquitto_sub -u danny@hyperdata.it -P [password] -t "test/1" -d -v
new terminal
mosquitto_pub -u device@hyperdata.it -P [password] -t "test/1" -d -m "ABC"
...Whatever, enough of that for now.
I'm down to $0.23 OpenAI API credit, so until I next have $ need to look at things that don't need it. sparql.py
doesn't in itself need the OpenAI API, but a SPARQLy version of Wey Gu's Notebook is my target functional E2E test.
I still have NebulaGraph data generated from Wey's Notebook. I can use nebula-python to pull out data from there, RDFLib to build RDF, sparqlwrapper to push to store. The SPARQL needed will be essentially the same as for
sparql.py
. Also NebulaGraph <=> RDF utils would be nice to have (may already exist, but I'll pretend I didn't consider that, need to inform myself).
Then -
MOVE PATH FROM llama_index INSTALLED TO DEV TREE
sudo /usr/local/nebula/scripts/nebula.service start all
Started with :
resp = client.execute_json('MATCH (v:entity) RETURN v')
After converting this to string there were character encoding errors. Should really be investigated properly, but for now I'll just go with some quick & dirty sanitization (Valid RDF literals will probably need something anyhow).
json_str = resp.decode('utf-8')
appears to have fixed the encoding issues.
https://github.com/LemmyNet
recommends Ansible
I opted for the full version -
ew, initial bits painful, found the docs confusing, maybe assume more prior knowledge than I have (only used Docker a bit, have never used Ansible). More worrying, I haven't a clue what it's stuck on my server.
2023-08-30
The Ansible thing did look like it'd mostly run ok, but things definitely fell apart around nginx
. Very likely cause being host:port clash, as I already have an Apache server in front on hyperdata.it/127.0.0.1 :80 and :443
https://www.geeksforgeeks.org/changing-nginx-port-in-linux/
I changed the default config yesterday, but just looked again, there's actually :
/etc/nginx/sites-available/default
/etc/nginx/sites-available/hyperdata.it.conf
/etc/nginx/sites-enabled/default
/etc/nginx/sites-enabled/hyperdata.it.conf
So I've deleted the /etc/nginx/sites-enabled/default
symlink and changed hyperdata.it.conf
to use port 8421
I'm pretty sure I don't need https/443 on nginx, I've got Apache doing that, is proxying to Fuseki etc on plain http on different ports. So I've deleted that section from hyperdata.it.conf
Active: active (running) since Wed 2023-08-30 09:27:02 UTC; 12s ago
Yay!
Hmm. http://hyperdata.it:8421 just redirects to https://hyperdata.it/
On server,
Currently reorganising the server. I'll put my FOAF/Solid here soon, for now my contact details should be at : https://hyperdata.it/ note to self : this is /opt/lampp/htdocs/profile/index.html
Aha! That's my placeholder for https://danny.ayers.name/
, configured in Apache.
Where are my notes on Apache setup..?
~/HKMS/server-setup/
apparently - TODO make that easier to find
distraction danbri mentioned XMPP (for ChatGPT) yesterday, it appears I have ejabberd installed. Is it running? see ejabberd Notes
https://lemmy.hyperdata.it
http://lemmy.hyperdata.it http://lemmy.hyperdata.it:8421 https://lemmy.hyperdata.it:8421 http://hyperdata.it:8421 https://hyperdata.it:8421
lynx http://127.0.0.1:8421
root@localhost:~/temp# wget http://127.0.0.1:8421
--2023-09-01 11:19:35-- http://127.0.0.1:8421/
Connecting to 127.0.0.1:8421... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://127.0.0.1/ [following]
--2023-09-01 11:19:35-- https://127.0.0.1/
Connecting to 127.0.0.1:443... connected.
ERROR: certificate common name ‘danny.ayers.name’ doesn't match requested host name ‘127.0.0.1’.
To connect to 127.0.0.1 insecurely, use `--no-check-certificate'.
The only 301 that looked relevant was in :
nano /etc/nginx/sites-available/hyperdata.it.conf
so I removed that block
systemctl restart nginx.service
https://lemmy.hyperdata.it/ seems to redirect to http://lemmy.hyperdata.it/, nginx starter page.
That redirect was needed!
put it back, random longshot, removed (shouldn't be needed after Apache..?):
location /.well-known/acme-challenge/ {
root /var/www/certbot;
}
in /var/log/nginx/access.log
127.0.0.1 - - [01/Sep/2023:11:31:41 +0000] "GET / HTTP/1.1" 301 162 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
It looks like it's getting that far, but the lemmy bit isn't doing anything
https://join-lemmy.org/docs/administration/troubleshooting.html
root@localhost:/home/hmks-apps/lemmy/hyperdata.it# docker-compose logs -f lemmy
...
lemmy_1 | Is the server running on host "postgres" (172.18.0.3) and accepting
lemmy_1 | TCP/IP connections on port 5432?
...
I was in the middle of typing a message to ask for help on r/lemmy, when something occurred to me. Maybe Ansible doesn't do the external services. It's plausible that I'd already installed nginx, Ansible just added a config. But postgres, pictrs (?), the smpt service..?
sudo apt install postgresql
The postgresql service is an "umbrella" service whose purpose is to start or stop the services named postgresql@version-instance which are the actual services that you are interested in. To get the statuses of these, run sudo systemctl status 'postgresql*'
a PostgreSQL Cluster is running...
https://www3.ntu.edu.sg/home/ehchua/programming/sql/PostgreSQL_GetStarted.html
initial client access :
there is a DB postgres, owner postgres
wait, https://github.com/LemmyNet/lemmy/issues/1847
postgres was already running under docker :
making sure it's using the same password as in lemmy.hjson
VkNyXOFMCgb0sGjraJUz
root@localhost:/home/hmks-apps/lemmy/hyperdata.it# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e519719e228e nginx:1-alpine "/docker-entrypoint.…" 2 days ago Up 2 days 80/tcp, 0.0.0.0:20140->8536/tcp, :::20140->8536/tcp hyperdatait_proxy_1
58f4af628105 dessalines/lemmy-ui:0.18.4 "docker-entrypoint.s…" 2 days ago Up 2 days 1234/tcp hyperdatait_lemmy-ui_1
9da7a413d90b dessalines/lemmy:0.18.4 "/app/lemmy" 2 days ago Up 2 days hyperdatait_lemmy_1
3394683cf319 postgres:15-alpine "docker-entrypoint.s…" 2 days ago Up 2 days 5432/tcp hyperdatait_postgres_1
3792edf55801 asonix/pictrs:0.4.0 "/sbin/tini -- /usr/…" 2 days ago Up 2 days 6669/tcp, 8080/tcp hyperdatait_pictrs_1
157491f6495f mwader/postfix-relay "/root/run" 2 days ago Up 2 days 25/tcp
ew, nginx & posgres
docker stop e519719e228e (nginx) docker restart 3394683cf319 docker restart 3792edf55801 docker restart 157491f6495f docker restart 9da7a413d90b docker restart 58f4af628105
no change
systemctl stop nginx docker start e519719e228e
slight change
https://lemmy.hyperdata.it/ -> http://lemmy.hyperdata.it/
503 Service Unavailable
that's a change. Next need to check the logs. But enough for today.
Aaargh! OpenAI API credit $0.21, and I won't have any money to top it up for another week or so
What doesn't need the API?
FOR TEMP CHANGES, UNMODIFIED FILES ARE IN ../original (I should probably make a branch in git, but then I'm likely to forget...)
My attention's been skipping around, spent a lot of time trying to engage with ChatGPT. There are bits of prompts I tried in llama_index SPARQL Notes 01.
Time to get back to the code proper (continuing from llama_index SPARQL Notes 00).
I've roughed out the shape of RDF I think I need :
# Simple Entity-Relation
@base <http://purl.org/stuff/data> .
@prefix er: <http://purl.org/stuff/er> .
<#T123> a er:Triplet ;
er:id "#T123" ;
er:subject <#E123> ;
er:property <#R456> ;
er:object <#E567> .
<#E123> a er:Entity ;
er:value "one" .
<#R456> a er:Relationship ;
er:value "two" .
<#E567> a er:Entity ;
er:value "three" .
But before writing the SPARQL I want to go back to nebulagraph.py
, add some logging calls to see what it's actually passing around. Then go back to the tests, then forward...
$0.21 mode
Is there still data in my local NebulaGraph? I could populate a SPARQL store with that.
sudo /usr/local/nebula/scripts/nebula.service restart all
./nebula-console -addr 127.0.0.1 -port 9669 -u root -p password
wait, there's the GUI, NebulaGraph Studio -
http://localhost:7001/login
USE guardians;
-- Fetch 10 vertices with the 'entity' tag
MATCH (v:entity)
RETURN v
LIMIT 10;
Yay!
One results column labelled v
("$118.4 million" :entity{name: "$118.4 million"})
...
-- Fetch 10 edges with the 'relationship' type
MATCH (src:entity)-[e:relationship]->(dst:entity)
RETURN src, e, dst
LIMIT 10;
Results table column labels are src, e, dst
("production on Vol.3" :entity{name: "production on Vol.3"}) [:relationship "production on Vol.3"->"February 2021" @-8998665471782897487 {relationship: "was put on hold until"}] ("February 2021" :entity{name: "February 2021"})
Ok, enough for today.
FOR TEMP CHANGES, UNMODIFIED FILES ARE IN ../original
Today's task : as yesterday.
It's occurred to me that it would be helpful to look at the data that nebulagraph.py actually looks at - pop a few logging calls in there.
(Maybe also probe - https://github.com/vesoft-inc/nebula-python )
Before that, need to have python looking at my llama_index tree. PYTHON_PATH is empty so -
>>>import sys
>>> print(sys.path)
>>> ['', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/home/danny/.local/lib/python3.11/site-packages', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.11/dist-packages']
Ok, found it at :
~/.local/lib/python3.11/site-packages/llama_index
Hmm, which of Wey Gu's demos is the most minimal that uses NebulaGraph?
Custom instructions
Act as a expert in Python and graph data structures, in particular the RDF model and SPARQL. Be prepared for questions relating to Large Language Models, Retrieval-Augmented Generation and the llama_index libraries. Follow best practices everywhere, eg. including short comments as necessary in code. Outside of code, keep all responses and descriptions very concise and matter-of-fact. Whenever you think it's possible I have further information or code files to upload that will help in the tasks, please ask.
My name is Danny.
~~Act as a expert in Python and graph data structures as implemented in NebulaGraph and RDF and SPARQL. ~~
Existing code in llama_index includes communications to a NebulaGraph database via the nebula-python library. The goal for us today is to develop code in a file sparql.py having the same functionality as nebulagraph.py, but instead of NebulaGraph will connect to a SPARQL store via the sparqlwrapper library from RDFLib. There are several subtasks to do first, which I will describe after uploading some relevant code.
Here a zip of llama_index/llama_index/graph_stores
Before writing Python, we will look at the data model. The following are the nGQL statements used to prepare the database for for nebulagraph.py. How would this an equilent schema be expressed in RDF, and what would an example of instance data look like?
CREATE SPACE guardians(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
:sleep 10;
USE guardians;
CREATE TAG entity(name string);
CREATE EDGE relationship(relationship string);
:sleep 10;
CREATE TAG INDEX entity_index ON entity(name(256));
ChatGPT returns :
In Nebula Graph, you're defining a space guardians with vertex IDs as fixed-length strings. You also define a tag entity with a property name of type string, and an edge type relationship with a property relationship of type string. Finally, you create an index on the name property of the entity tag.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .
# Define the Entity class
ex:Entity rdf:type rdfs:Class ;
rdfs:label "Entity" ;
rdfs:comment "An entity with a name" .
# Define the name property
ex:name rdf:type rdf:Property ;
rdfs:label "name" ;
rdfs:comment "The name of the entity" ;
rdfs:domain ex:Entity ;
rdfs:range xsd:string .
# Define the Relationship class
ex:Relationship rdf:type rdfs:Class ;
rdfs:label "Relationship" ;
rdfs:comment "A relationship between entities" .
# Define the relationship property
ex:relationship rdf:type rdf:Property ;
rdfs:label "relationship" ;
rdfs:comment "The type of relationship" ;
rdfs:domain ex:Relationship ;
rdfs:range xsd:string .
and
@prefix ex: <http://example.org/> .
# Entities
ex:Alice rdf:type ex:Entity ;
ex:name "Alice" .
ex:Bob rdf:type ex:Entity ;
ex:name "Bob" .
# Relationship
ex:Alice ex:relationship ex:Bob ;
rdf:type ex:Relationship ;
ex:relationship "friend" .
using existing files under llama_index/llama_index/graph_stores as a guide, in particular the interface defined
ex:Relationship rdf:type rdfs:Class ; rdfs:label "Relationship" ; rdfs:comment "A relationship between entities" .
Ok, do it by hand ...
But what/where are the IDs needed? ok, maybe better :
# Simple Entity-Relation
@base <http://purl.org/stuff/data> .
@prefix er: <http://purl.org/stuff/er> .
<#T123> a er:Triplet ;
er:id "#T123" ;
er:subject <#E123> ;
er:property <#R456> ;
er:object <#E567> .
<#E123> a er:Entity ;
er:value "one" .
<#R456> a er:Relationship ;
er:value "two" .
<#E567> a er:Entity ;
er:value "three" .
RDFS something like -
@prefix er: <http://purl.org/stuff/er> .
er:Entity a rdfs:Class ;
rdfs:label "Entity" ;
rdfs:comment "An entity..." .
er:Relationship a rdfs:Class ;
rdfs:label "Relationship" ;
rdfs:comment "A relationship between entities" .
er:Triplet a rdfs:Class ;
rdfs:label "Triplet" ;
rdfs:comment "A 3-tuple expressing a relationship between entities" .
er:subject a rdf:Property ;
rdfs:label "subject" .
er:subject rdfs:domain er:Entity .
er:subject rdfs:range er:Entity .
er:subject a rdf:Property ;
rdfs:label "subject" .
er:subject a rdf:Property ;
rdfs:label "subject" .
rdfs:comment "An entity..." .
Probably not needed.
Time to move onto another doc
Work in progress, scrappy notes & thoughts to pull together. What sprang to mind, not thought through, likely much rubbish.
So I finally got around to reading the recent Graph of Thoughts : Solving Elaborate Problems with Large Language Models paper. "Finally" in the sense of it only appeared at Arxiv just over a week or so ago, I printed a couple of days ago. Things are moving sooo fast...
I do need to re-read it a few more times, also check some of the refs, there are bound to be papers that address the bits mentioned below (several are graph-related).So, initial thoughts. First is meta, the most important, subsuming everything else: It's really well written!. Research only really becomes useful when it's communicated. The text is as plain English as you can get with such material. Loads of acronyms, but that's unavoidable. But where they are significant, they're expanded and explained. Only as much maths as necessary, lovely - so often in this field, complicated sums muddy the water. Lots of clear-enough diagrams. Content-wise : Strong Accept.
Regarding content, in no particular order :
There are really tasty chunks, I really like the general approach.
For starters, identification of concrete goals : How best to aggregate thoughts to maximise accuracy and minimise cost?
.
Some of the terms used (I've no idea which originated here or elsewhere, is irrelevant), like Graph of Operations, GoO
and Graph Reasoning State, GRS
are great markers for particular concepts.
The metric Volume the authors provide looks like it could be incredibly useful. It's stated in a couple of places, the plain English version is :
We define volume - for a given thought t - as the number of preceding LLM thoughts that could have impacted t
The evaluation seems broad enough to be reliable, applied in a way that makes sense. The use cases are seriously uninspiring, but I'd say that adds weight to this thing having potential - I think it would do better on harder problem.
Frankly I am very surprised they got results as good as they did. Ok, I personally think approaches along these lines can offer huge benefits over current Tree-of-Thought etc techniques. But I've had given it another year or so of architecture trial and error before seeing any noticeable gain. One of the use cases is keyword counting - from what I've seen, ChatGPT's ability with arithmetic is below that of a 5-year old, only marginally above that of a certain former US president.
Description Logics as a generalization
is all quasi-procedural logic I things like applying tabl
https://en.wikipedia.org/wiki/Method_of_analytic_tableaux
IBIS
to ping @DrIBIS @doriantaylor
https://web.archive.org/web/20030418021226/http://ideagraph.net/xmlns/ibis/
https://vocab.methodandstructure.com/ibis
I hadn't thought through how to do it (see this paper!), but the Graph of Thoughts notion was a very visible next step after Chain... and Tree.... I discussed it with ChatGPT a little while ago. This reinforced my feeling that it should be a productive avenue, so at the end of our conversation I asked ChatGPT for advertizing copy to promote it :
"Unleash the power of graph-of-thought in LLM dialogues! Represent ideas as nodes, relationships as edges, and explore a web of knowledge. Navigate non-linearly, uncover intricate connections, and foster context-aware discussions."
https://www.siwei.io/en/posts/
Conceptual Maps (Sowa) - his site? Concept Maps, Semantic Web/Linked Data
Layer Cake (bis)! LLM stack https://medium.com/@brian_90925/llms-and-the-emerging-ml-tech-stack-6fa66ee4561a https://outerbounds.com/blog/llm-infrastructure-stack/
WebLM? WebML? Intelligent Web
Agents
Let's see it it works
Note to self : check layoutof of original makesite
hmm...
This is a continuation of notes from GraphRAG. I'm currently writing a SPARQL connector for llama_index, and jotting notes as I go along. My Wordpress install recently broke, but because a SPARQL-backed blog engine was on my TODO list I let it ride. But it occurred to me that an easy stopgap would be to use a minimal static blog/site builder as an interim fix (makesite.py fit the bill), use it for the GraphRAG notes too. Nice layout with bells & whistles to follow!
Implementing a naive SPARQL connector, just enough to replicate the functional of Wey Gu's original demo with a SPARQL store rather than NebulaGraph.
I've read the books, but still my inclination on something like this would usually be to go straight to the implementation, only making tests when it (predictably) doesn't work. But my Python is weak and this codebase is new to me, so test-driven it is.
There don't appear to be any tests around the graph stores yet, so start against existing implementations, so, to build :
llama_index/tests/graph_stores/test_simple.py
llama_index/tests/graph_stores/test_nebulagraph.py
- see how far I get, I probably won't understand what the helpers do, Wey Gu should be happy to improvellama_index/tests/graph_stores/test_sparql.py
llama_index/graph_stores/sparql.py
for reference :
llama_index/graph_stores/types.py
- graph store interface vital ref
llama_index/graph_stores/simple.py
- minimal graph store (in-memory dictionary)
llama_index/graph_stores/nebulagraph.py
- NebularGraph connector
llama_index/tests/vector_stores/test_cassandra.py
- related test implementation
llama_index/tests/vector_stores/test_postgres.py
- related test implementation
/llama_index/docs/examples/index_structs/knowledge_graph/nebulagraph_draw.html
etc
Trying existing tests
pytest tests
missing dependency -
try again, 4 FAILED, all from what I roughed out yesterday :)
5 ERROR, eg.
ERROR tests/llm_predictor/vellum/test_predictor.py::test_predict__basic - ModuleNotFoundError: No module named 'vellum'
dependency?
9 failed, 247 passed, 79 skipped, 31 warnings in 2.75s
not-me failures around vellum
FAILED tests/llm_predictor/vellum/test_prompt_registry.py::test_from_prompt__new - ModuleNotFoundError: No module named 'vellum.core'
https://pypi.org/search/?q=vellum
try -
Lucky! 4 failed, 252 passed, 79 skipped, 31 warnings in 3.40s
Those 4 are my rubbish.
How to run individual test files? https://stackoverflow.com/questions/36456920/specify-which-pytest-tests-to-run-from-a-file
collected 0 items
test naming conventions https://stackoverflow.com/questions/37353960/why-pytest-is-not-collecting-tests-collected-0-items
Ooops - I used the wrong path.
5 failed, 1 passed in 0.25s
1 failed in 0.17s
I'd see the annotation @pytest.fixture
elsewhere, but for pytest the test_naming was what it needed :
def test_one_false():
assert 1 == 0
ok, time to actually write code...
I hadn't really explored ChatGPT's Code Interpreter. Had seen that you could upload source files, best as a zip.
So I thought, see how far it gets with llama_index/graph_stores/
Let's see it it works
Check layoutof of original makesite
far in the future
I feel very stupid.
I've written several blog engines in various languages. Never once started from scratch on design. What I mean is that I've always taken as my requirements the bits I liked about other blog engines, prioritised a bit, followed that to-do list. Never started with an absolutely minimal reverse-chrono
I rather like this mode of blogging, type markdown into VSCode, makesite.py
converts/formats it, use git (in a script) to transfer to server.
First, add a bit to p (publish) to ssh into server, then :
cd /home/hkms-apps/hyperdata-static
git pull
Soonish :
sparqlwrapper
and ChatGPT)I reckon I'll split makesite.py
up, modularise a bit. Even if I move to all SPARQL-backed for the blog, it's bound to be useful somewhere.
Things for later :
similar to https://github.com/ozekik/markdown-ld
https://raw.githubusercontent.com/ozekik/markdown-ld/master/examples/foaf.md
Requirements :
For most of 1 & 2, make it easy to put in pipeline, something like :
markdown, other = mx(markdown_with_extensions)
for 3, maybe :
mx:turtle '''
<this> a x:That.
For 4, for starters I'd probably be good to had some standard prefixes as default, added by the processor, which could be overriden :
mx:turtle mx:prefixes '''
prefix ...
<this> a x:That.
'''
>>> pair = (1, 2)
>>> pair
(1, 2)
>>> x, y = pair
>>> x
1
>>> y
2
>>> pair[0]
1
>>> pair[
Markdown Extension Vocab, mx:
An immediate irritation I have is that I'm coding in VSCode and also writing notes in it. Navigating between the two activities is a pain.
The problem is broader still because during the coding I have to do other bits of context-switch to look at/edit data, project management, bookmarks...
I've got most of these things at various stages of dev in HKMS, but accessing these via a browser bypasses my local FS, access there is desirable.
So I've been looking at Pulsar, a fork of the discontinued Atom editor. Plan is to tweak it to fit my needs.
I did like Atom as an editor, and hopefully as it's built on Electron it'll make a convenient local host for the HKMS apps.
Set up a repo : hkms-desktop
I'm having a few teething problems with Pulsar.
The ppm package manager it uses is in a different repo. The install instructions didn't work for me, so playing it by ear.
hmm, lost the paths
export ATOM_HOME=/home/danny/.pulsar
export APM_PATH=/home/danny/HKMS/hkms-desktop/ppm/bin/apm
export ATOM_ELECTRON_VERSION=12.2.3
ok, now packages seem to work.
Added https://github.com/denieler/save-workspace-atom-plugin not sure it's gonna be useful...
Added PlatformIO IDE Terminal
https://www.electronjs.org/docs/latest/tutorial/using-native-node-modules
I wanted an RDF Schema for the LlamaIndex sparql.py
. It isn't necessary at this point in time, but nice to have for documentation & completeness.
Just links for now.
http://purl.org/stuff/er
Oops! LODE docs are meant to be generated live, so that's missing bits. TODO fix
https://hyperdata.it/xmlns/er/er-schema.ttl
https://hyperdata.it/xmlns/
https://hyperdata.it/schema-editor/
https://github.com/danja/schema-editor
https://essepuntato.it/lode/
https://fuseki.hyperdata.it/#/dataset/schema-test/query
https://fuseki.hyperdata.it/#/dataset/vocabs/query
https://howarddierking.com/2016/12/02/generating-documentation-sites-for-linked-data-vocabularies/
https://github.com/howarddierking/metalsmith-ldschema
https://metalsmith.io/
https://answers.knowledgegraph.tech/t/generating-documentation-from-rdfs-and-or-owl-vocabularies/4829
https://www.reddit.com/r/semanticweb/comments/tr1v5o/tools_to_generate_documentation_site_for_classes/
https://www.w3.org/2011/prov/wiki/Generating_HTML_documentation_of_OWL
https://protege.stanford.edu/software.php
https://www.w3.org/DesignIssues/TopTen.html
A bit of forward-planning for blog engine stuff. This went on my todo list the other day, since then I've had a think, thought I'd better get it down before I forget.
The implementation for makesite.py
should be straightforward, but I've other things to do. But I can start using these extensions in posts today, no matter that they won't do anything yet. When I've got a demo implementation and tried it out a bit I'll give it a repo on GitHub.
How to express RDF statements in Markdown?
Note re. (2) : the markup syntax used will be interpreted as a processing instruction, so while Turtle creation/extraction is the immediate goal, it makes sense for extensibility to consider other possible uses.
` :term fur`
``` :term fur```
TODO express in BNF TODO provide regexes
` :tag fur`
:tag
will be interpreted as the term tag
from the namespace http://purl.org/stuff/mx/
and used as the property of the triplefur
will be used as the literal object of the tripleTODO result
In this example fur
is one word, a simple string delimited by spaces. Alternatives will include quoting of literals "as in Turtle"
for the object as well as the use of URIs using standard Turtle syntax.
TODO longer example
TODO fill out as needed, find standard vocab equivalents
Where a block of Turtle should be extracted, the term mx:x
should be used, e.g.
```:x @base http://example.org/ . @prefix foaf: http://xmlns.com/foaf/0.1/ . @prefix rel: http://www.perceive.net/schemas/relationship/ .
<#green-goblin> rel:enemyOf <#spiderman> ; a foaf:Person ; # in the context of the Marvel universe foaf:name "Green Goblin" . ```
TODO
for eg. mx:tags - provide a simple list syntax
Terms MAY be interpreted as those in the mx namespace and/or well-known equivalents
How to say what should be passed to standard markdown processor, what should be cut?
eg. :
contents = mx(markdown_with_extensions)
markdown = contents['markdown']
turtle = contents['turtle']
html = to_html(markdown)
store.add(turtle)
using standard linux tools, how do I modify a text file to remove the spaces from any strings in it of the following form : '<mbox:dylan_beadle -at- yahoo.com>'. Match the < > characters, I'd like the result 'mbox:dylan_beadle-at-yahoo.com'
(some fails with sed)
perl -pe 's/mbox:([^]+)>/my $x = $&; $x =~ s| -at- |-at-|g; $x =~ s| ||g; $x/ge' main_backup.n3 > main_backup-cleaner.n3
Could you now extend the previous Perl so that : <mbox:dylan_beadle -at- yahoo.com> becomes mbox:dylan_beadle@yahoo.com mbox:thaynes{at}openlinksw.com becomes mbox:thaynes@openlinksw.com
perl -pe 's|mbox:([^]+)>|my $x = $&; $x =~ s|\s*-at-\s*|@|g; $x =~ s|{at}|@|g; $x =~ s|\s||g; $x|ge' main_backup.n3 > main_backup-cleaner.n3
perl -pe 's|mbox:([^]+)>|my $x = $1; $x =~ s|\s*-at-\s*|@|g; $x =~ s|{at}|@|g; $x =~ s|\s||g; "mbox:$x"|ge' main_backup.n3 > main_backup-cleaner.n3