Just what I love as the first thing in my inbox on an early Friday morning: A nine-page, highly technical report that makes my caffeine-deprived head hurt, and ends with a declaration that the author has already made it to the weekend! ;-)
Jokes aside, thanks for this, Paul. Do you think it would be possible to extract from this:
1. Any Issues (bugs, feature requests) against D2RQ, stating what we should change/fix? These should go here: https://github.com/d2rq/d2rq/issues
2. Something tutorial-like that’s clear enough to share on the D2RQ wiki? https://github.com/d2rq/d2rq/wiki
I would in particular appreciate help with getting any issues into the tracker.
Post by Paul Murray
Apologies for the long email, but I'm trying to describe the problem completely. I'll write this mail as I attempt to make what I want to happen, happen. Maybe in the process of writing this I will solve it for myself :)
(TL;DR: I got it going! Yay! But I'll send this saga out anyway because it took me most of the afternoon to write it.)
Post by Richard Cyganiak
Hi Paul,
A quick initial response.
Even though D2RQ is made for directly querying a relational DB with SPARQL, if you need to integrate data from multiple sources, I recommend dumping them all to RDF and loading them into a single RDF store. This is the best way to get performance and reliability. Of course, it may not be possible for you due to database size or quickly changing data.
1 - A set of OWL ontologies in static RDF. I load these so that SPARQL can be written over them - perhaps I might even be able to drive the Pellet reasoner off them. There are 50 or so. These include our local ontologies as well as a couple of standard ones - SKOS, TDWG (Taxonomic Database Working Group), Dublin Core, Darwin Core.
2 - Three separate TDB data sets
** The Australian Faunal Directory (AFD) data. 18G. This gets built by a batch process that takes a day and a bit to complete. We would like to update it once a month, or even weekly.
** The old Australian Plant Names Index (APNI) 10G. This used to be extracted alongside AFD, but this database is no longer being updated. The legacy data needs to remain present on the semantic web.
** A load of the Catalogue of Life, 2011. 34G. Again, this will no longer get updated, I think (mainly because no-one is paying for it to be done). The key part of this data is that it contains mappings from the COL identifiers to AFD and APNI ones.
3 - The new APNI. This is the dataset that I would like to expose via d2r.
I keep this dataset organised by splitting it into named graphs. The named graph named 'meta'
<http://biodiversity.org.au/voc/graph/GRAPH#meta>
describes the named graphs on the service. This is not the default graph of the SPARQL endpoint … but perhaps it should be. At present the default graph on the server is an empty one because I was concerned that I might be exposing it to an update service.
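(Just to give the flavour - hypothetical graph name and properties, not the real vocabulary - the 'meta' graph holds statements roughly like this:)
@prefix dcterms: <http://purl.org/dc/terms/> .
# hypothetical graph name and description, purely to illustrate the idea
<http://biodiversity.org.au/voc/graph/GRAPH#AFD>
    dcterms:title "Australian Faunal Directory" .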
So, you can see that there are reasons for building this heterogeneous mish-mash of multiple types of data sources. I have big data sets, updated on different schedules, using systems that have been in place and working for a couple of years now. Having the datasets stored as RDF and then loading them into memory with a parser is really not a great option. What I want is to add a D2R graph to this mix.
Post by Richard Cyganiak
The D2RQ assembler works for me with the version of Joseki that ships with D2RQ, Joseki 3.4.4.
I’m not sure what you mean by “external assembler file.” Can you explain or give an example?
Ok, this is great news and what I was hoping to hear.
Steps are -
1 - Get a standalone install of Joseki 3.4.4 running with an assembler defining a couple of simple graphs.
2 - Get the Joseki installation inside d2r doing the same - acting as a SPARQL endpoint, serving up static graphs.
3 - Get the Joseki installation inside d2r serving up the static graphs alongside a very simple d2r graph.
4 - Jam the big thing into it.
Currently, the problem is step 2.
=== STEP 1 - A simple assembler that works ok with Joseki 3.4.4 ===
Let's start with the basics - I want to launch a SPARQL endpoint in the d2r distribution with just a couple of static named graphs.
Once I have that going, jamming all of this other stuff in there is simply a matter of editing the assembler.
D2R uses Joseki 3.4.4. Let's write an assembler config that I know works against 3.4.4. To do this, I will grab a copy of Joseki 3.4.4.
. . .
OK. I save this assembler as simpleconfig.ttl, and write a little sampledata.ttl with one row in it.
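Something along these lines is enough for sampledata.ttl (hypothetical contents - any single triple will do):
-----------------------------------
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/thing1> rdfs:label "a sample row" .
-----------------------------------
And the assembler itself: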
-----------------------------------
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix module: <http://joseki.org/2003/06/module#> .
@prefix joseki: <http://joseki.org/2005/06/configuration#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
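# Two in-memory graphs, both loaded from the same sample data file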
<#graph1> a ja:MemoryModel
; ja:content
[ ja:externalContent <file:sampledata.ttl>
]
.
<#graph2> a ja:MemoryModel
; ja:content
[ ja:externalContent <file:sampledata.ttl>
]
.
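# A union of the two graphs above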
<#graph3> a ja:UnionModel
; ja:subModel <#graph1>
; ja:subModel <#graph2>
.
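# The Joseki server, plus a SPARQL service at /sparql exposing all three models as named graphs in one dataset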
<#server> a joseki:Server
.
<#service> a joseki:Service
; joseki:serviceRef "sparql"
; joseki:dataset
[ a ja:RDFDataset
; ja:namedGraph
[ ja:graphName <http://example.org/#graph1>
; ja:graph <#graph1>
]
; ja:namedGraph
[ ja:graphName <http://example.org/#graph2>
; ja:graph <#graph2>
]
; ja:namedGraph
[ ja:graphName <http://example.org/#uniongraph>
; ja:graph <#graph3>
]
]
; joseki:processor
[ a joseki:Processor
; module:implementation
[ a joseki:ServiceImpl
; module:className <java:org.joseki.processors.SPARQL>
]
]
.
-----------------------------------
When I save this into the Joseki directory and execute
$ export JOSEKIROOT=.
$ bin/rdfserver simpleconfig.ttl
And then browse to
http://localhost:2020/sparql?query=select+*+where+%7B+GRAPH+%3Fg+%7B+%3Fs+%3Fp+%3Fo+%7D+%7D&output=text
Then I get back some triples. Outstanding.
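(Decoded, that query string is just the following, which lists every quad in the dataset:)
select * where { GRAPH ?g { ?s ?p ?o } }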
So! Let's try making this go in the d2r copy of joseki!
=== STEP 2 - Make the simple assembler run in Joseki in the d2r installation ===
Ok. The d2r root directory does not have an rdfserver executable. It does have a d2r-server executable. This executable has an option for a d2r mapping file, but not for a Jena assembler.
$ ./d2r-server simpleconfig.ttl
15:44:52 WARN org.eclipse.jetty.webapp.WebAppContext :: Failed startup of context o.e.j.w.WebAppContext{,file:/Users/ibis/git/d2rq-0.8.1/webapp/},webapp
de.fuberlin.wiwiss.d2rq.D2RQException: No d2rq:Database defined in the mapping (E1)
Fine, that's what I expected. The d2r launcher launches an endpoint that has one graph, that graph being a d2r graph initialised by the parameter. It simply doesn't take a general Jena assembler as input.
So let's have a look at the Joseki that's inside d2r. The files of a standalone Joseki installation are not present - there is a joseki.jar in the lib directory, and that's it.
JAVA_ARGS is just "-server -Xmx1G", which we can ignore.
LOG is -Dlog4j.configuration=${LOGCONFIG}, which we can also ignore.
And CP will have to be all the jars in the Joseki lib directory:
$ CP=$(find lib -name \*.jar | while read j ; do echo -n "$j:" ; done)
$ java -cp $CP joseki.rdfserver simpleconfig.ttl
Exception in thread "main" java.lang.NoClassDefFoundError: org/mortbay/jetty/Connector
at joseki.rdfserver.main(rdfserver.java:85)
Riiiiight. In any case, this at the very least isn't going to work because the WEB-INF won't be set up right. The joseki config alone isn't enough - the d2r installation isn't set up correctly to expose the joseki SPARQL endpoint as a service.
================= NEW PLAN =================
OK! New plan - we will include the d2r libraries in the joseki launch!
Well, there's a fair bit of duplication there, which is not a big problem. Except that there are some different versions of some of the libraries, which is bad.
Let's just make a humungous classpath and launch Joseki with the D2R libraries jammed in there. What could possibly go wrong? First, I'll try putting the joseki libraries first.
$ CP=$(find lib -name \*.jar | while read j ; do echo -n "$j:" ; done):$(find ~/git/d2rq-0.8.1/lib -name \*.jar | while read j ; do echo -n "$j:" ; done)
$ java -cp $CP joseki.rdfserver simpleconfig.ttl
SLF4J grumbles about a class being present in two of the jars, but apart from that it's all good.
So - let's try adding in a d2r graph.
Copy simpleconfig.ttl to withd2r.ttl, and add a D2RQ model to the assembler.
<#graph4> a d2rq:D2RQModel;
d2rq:mappingFile <simplemapping.ttl>;
d2rq:resourceBaseURI <http://localhost:2020/test123>;
.
<#service> a joseki:Service
; joseki:serviceRef "sparql"
; joseki:dataset
[ a ja:RDFDataset
; ja:namedGraph
[ ja:graphName <http://example.org/#graph1>
; ja:graph <#graph1>
]
; ja:namedGraph
[ ja:graphName <http://example.org/#graph2>
; ja:graph <#graph2>
]
; ja:namedGraph
[ ja:graphName <http://example.org/#uniongraph>
; ja:graph <#graph3>
]
; ja:namedGraph
[ ja:graphName <http://example.org/#d2rqgraph>
; ja:graph <#graph4>
]
]
; joseki:processor
[ a joseki:Processor
; module:implementation
[ a joseki:ServiceImpl
; module:className <java:org.joseki.processors.SPARQL>
]
]
.
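One thing not shown above: withd2r.ttl also needs a d2rq: prefix declaration at the top - presumably the same namespace the mapping language uses:
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .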
Ok, blows up with a
the root file:///Users/ibis/Software/Joseki/Joseki-3.4.4/withd2r.ttl#graph4 has no most specific type that is a subclass of ja:Object
This is ok - now I understand what that import is for :). So let's include the import:
<> ja:imports <http://d2rq.org/terms/d2rq> .
This blows up with a
Not found: file:///Users/ibis/Software/Joseki/Joseki-3.4.4/simplemapping.ttl
Which is awesome! The assembler loads and tries to do what I told it to do. Booyah! I'm excited, but I've been this excited before and been cruelly disappointed, all my hopes dashed.
So now I need to create a simple d2r mapping file and name it simplemapping.ttl
----------------------------------------------
@prefix map: <#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix jdbc: <http://d2rq.org/terms/jdbc/> .
@prefix d2r: <http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/config.rdf#> .
@prefix nsl: <http://biodiversity.org.au/voc/nsl/NSL#> .
map:Configuration a d2rq:Configuration;
d2rq:serveVocabulary true
.
map:APNI_database a d2rq:Database;
d2rq:jdbcDriver "org.postgresql.Driver";
d2rq:jdbcDSN "jdbc:postgresql://localhost:5432/nsl";
d2rq:username "--DELETED--";
d2rq:password "--DELETED--";
.
map:APNI_Namespace a d2rq:ClassMap;
d2rq:dataStorage map:APNI_database;
d2rq:class nsl:Namespace;
d2rq:class nsl:IdentifiedEntity;
.
--------------------------------------------------
And launch - It launches! Run the query - and it complains that the database is down. Of course.
Start the database, run the query ... And I get an error
java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at de.fuberlin.wiwiss.d2rq.algebra.CompatibleRelationGroup.addNodeRelation(CompatibleRelationGroup.java:53)
at de.fuberlin.wiwiss.d2rq.algebra.CompatibleRelationGroup.groupNodeRelations(CompatibleRelationGroup.java:38)
Damn.
Ok - what if I load up the classpath with the d2r libraries first and the joseki libraries second?
$ CP=$(find ~/git/d2rq-0.8.1/lib -name \*.jar | while read j ; do echo -n "$j:" ; done):$(find lib -name \*.jar | while read j ; do echo -n "$j:" ; done)
$ java -cp $CP joseki.rdfserver withd2r.ttl
and browse to http://localhost:2020/sparql?query=select+*+where+%7B+GRAPH+%3Fg+%7B+%3Fs+%3Fp+%3Fo+%7D+%7D&output=text
and
--------------------------------------------------
OMG! OMG! It works! It works!
So: it seems I can make it go by launching Joseki's rdfserver with all the d2r libraries first on the classpath. It's a little fragile, and what I would like is not to have to do this - to have a prepackaged setup that works. Obviously the d2r installation has a bunch of stuff I don't need - the gear that supports the d2r web app pages. I still have not confirmed that it will co-operate with the multi-gigabyte TDB datasets I need to use. But ... it does seem to go. I can write a java app to scan the jar files and find which ones declare the same class names, to arrive at a set of jars that don't overlap.
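A rough, untested sketch of how that scan might go from the shell instead (assuming unzip is available):
$ for j in $(find lib ~/git/d2rq-0.8.1/lib -name \*.jar) ; do \
    unzip -Z1 "$j" | awk -v jar="$j" '/\.class$/ { print $0, jar }' ; \
  done | sort > all-classes.txt
$ cut -d' ' -f1 all-classes.txt | uniq -d    # class names that appear in more than one jar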
Hopefully, sometime next week I'll be able to show you the new system serving the old and the new data together at biodiversity.org.au. But for now, it's 16:45 on Friday which makes it Beer O'Clock.