  • VirtuosoSesameProvider.WelcomeVisitors (last revision) -- Sdmonroe, 2008-07-08 12:02:22; Sherman D Monroe, 2008-07-08 11:02:22

    Virtuoso Sesame Provider

    Preliminary

    This tutorial assumes you have Virtuoso server installed and that the database is accessible at "localhost:1111". In addition, you will need the latest version of the Virtuoso Sesame Provider, and Sesame 2 or greater installed.

    What is Sesame

    Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally. For example, suppose you need to read a big RDF file, find the relevant information for your application, and use that information. Sesame provides you with the necessary tools to parse, interpret, query and store all this information, embedded in your own application if you want, or, if you prefer, in a separate database or even on a remote server. More generally: Sesame provides an application developer with a toolbox that contains useful hammers, screwdrivers, etc. for doing 'Do-It-Yourself' with RDF.

    This tutorial covers the essentials for connecting to and manipulating data stored in a Virtuoso repository using the Sesame API. More information on the Sesame Framework, including extended examples on how to use the API, can be found in Chapter 8 of the Sesame User's Guide, and in the RepositoryConnection API documentation.

    What is the Virtuoso Sesame Provider

    The Virtuoso Sesame Provider allows users of Virtuoso to leverage the Sesame framework for modifying, querying, and reasoning with the Virtuoso quad store using the Java language. The Sesame Repository API offers a central access point for connecting to the Virtuoso quad store. Its purpose is to provide a Java-friendly access point to Virtuoso. It offers various methods for querying and updating the data, while abstracting the details of the underlying machinery.

    Fig. 1 Sesame Component Stack

    In this tutorial, we explain the basics of how to program against the Sesame Repository API. The interfaces for the Repository API can be found in the packages virtuoso.sesame2.driver and org.openrdf.repository. Several implementations of these interfaces exist in the Virtuoso Provider download package. The Javadoc reference for the API is available online and can also be found in the doc directory of the download.

    If you need more information about how to set up your environment for working with the Sesame APIs, take a look at Chapter 4 of the Sesame User Guide, Setting up to use the Sesame libraries:

    http://www.openrdf.org/doc/sesame2/users/

    Creating a VirtuosoRepository object

    The first step to connecting to Virtuoso through the Sesame API is to create a Repository for it. The Repository object operates on (stacks of) Sail object(s) for storage and retrieval of RDF data.

    One of the simplest configurations is a repository that connects directly to a running Virtuoso server. The following code creates and initializes such a repository:

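    import org.openrdf.repository.Repository;
    import virtuoso.sesame2.driver.VirtuosoRepository;

    // JDBC URL of the Virtuoso server, followed by user name and password
    Repository myRepository = new VirtuosoRepository("jdbc:virtuoso://localhost:1111", "dba", "dba");

    myRepository.initialize();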

    The constructor of the VirtuosoRepository class accepts the JDBC URL of the Virtuoso engine (the default port is 1111), and the username and password of an authorized user. Following this example, the repository needs to be initialized to prepare the Sail(s) that it operates on, which includes operations such as restoring previously stored data, setting up connections to a relational database, etc.
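
    As a small sketch of the first constructor argument described above, the helper below assembles the JDBC URL from a host and a port. The class and method names (VirtuosoUrl, jdbcUrl) are hypothetical and not part of the provider; only the jdbc:virtuoso://host:port form and the default port 1111 come from this tutorial.

```java
// Hypothetical helper for illustration; not part of the Virtuoso Sesame Provider.
final class VirtuosoUrl {
    static final int DEFAULT_PORT = 1111; // default port named in this tutorial

    private VirtuosoUrl() {}

    // Builds a URL of the form jdbc:virtuoso://host:port
    static String jdbcUrl(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:virtuoso://" + host + ":" + port;
    }

    // Same as above, using the default port.
    static String jdbcUrl(String host) {
        return jdbcUrl(host, DEFAULT_PORT);
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost")); // prints jdbc:virtuoso://localhost:1111
    }
}
```

    The resulting string can be passed as the first constructor argument, in place of the literal URL used above.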

    Unlike a Sesame main-memory repository, which is volatile and loses its contents when the object is garbage collected or the program is shut down, a VirtuosoRepository keeps its data in the Virtuoso server, so the data remains available after the Java object is gone.

    Using the Virtuoso repository with RepositoryConnection

    Now that we have created a VirtuosoRepository, we want to do something with it. This is achieved through the use of the VirtuosoRepositoryConnection, which can be created by the VirtuosoRepository.

    A VirtuosoRepositoryConnection represents, as the name suggests, a connection to the actual Virtuoso quad store. We can issue operations over this connection, and close it when we are done to make sure we are not keeping resources unnecessarily occupied.

    In the following sections, we will show some examples of basic operations using the Northwind dataset.

    Adding RDF to Virtuoso

    The Repository, which implements the Sesame Repository API, offers various methods for adding data to a repository. Data can be added programmatically by specifying the location of a file that contains RDF data, and statements can be added individually or in collections.

    We perform operations on the repository by requesting a RepositoryConnection from the repository, which returns a VirtuosoRepositoryConnection object. On this VirtuosoRepositoryConnection object we can perform the various operations, such as query evaluation, getting, adding, or removing statements, etc.

    The following example code adds two files, one local and one located on the WWW, to a repository:

    
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.RepositoryException;
    import org.openrdf.rio.RDFFormat;
    import org.openrdf.rio.RDFParseException;
    import java.io.File;
    import java.io.IOException;
    import java.net.URL;

    File file = new File("/path/to/example.rdf");
    String baseURI = "http://example.org/example/local";
    …
    try {
       RepositoryConnection con = myRepository.getConnection();
       try {
          // add a local RDF/XML file
          con.add(file, baseURI, RDFFormat.RDFXML);
          // add an RDF/XML document located on the web
          URL url = new URL("http://example.org/example/remote");
          con.add(url, url.toString(), RDFFormat.RDFXML);
       }
       finally {
          con.close();
       }
    }
    catch (RepositoryException rex) {
       // handle exception
    }
    catch (RDFParseException e) {
       // handle parse exception
    }
    catch (IOException e) {
       // handle io exception
    }
    
    

    More information on other available methods can be found in the Javadoc reference of the RepositoryConnection interface.

    Querying Virtuoso

    The Repository API has a number of methods for creating and evaluating queries. Three types of queries are distinguished: tuple queries, graph queries and boolean queries. The query types differ in the type of results that they produce.

    Select Query: The result of a select query is a set of tuples (or variable bindings), where each tuple represents a solution of a query. This type of query is commonly used to get specific values (URIs, blank nodes, literals) from the stored RDF data. The method QueryFactory.executeQuery() returns a Value[][] for SPARQL "SELECT" queries. It also calls QueryFactory.setResult(), which populates a set of tuples for SPARQL "SELECT" queries. The tuples are available in the Value[][] returned by executeQuery().

    Graph Query: The result of a graph query is an RDF graph (or set of statements). This type of query is very useful for extracting sub-graphs from the stored RDF data, which can then be queried further, serialized to an RDF document, etc. The method QueryFactory.executeQuery() calls QueryFactory.setGraphResult(), which populates a graph for SPARQL "DESCRIBE" and "CONSTRUCT" queries. The graph can be retrieved using QueryFactory.getGraphResult().

    Boolean Query: The result of a boolean query is a simple boolean value, i.e. true or false. This type of query can be used to check if a repository contains specific information. The method QueryFactory.executeQuery() calls QueryFactory.setBooleanResult(), which sets a boolean value for SPARQL "ASK" queries. The value can be retrieved using QueryFactory.getBooleanResult().
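
    The mapping between SPARQL query forms and the three result types above can be sketched as a small classifier. This is a naive illustration only: QueryKind and its Kind enum are hypothetical names, not part of Sesame or the Virtuoso provider, and the keyword search would be fooled by unusual input such as a prefix label named "ask".

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Sesame or the Virtuoso provider.
final class QueryKind {
    enum Kind { TUPLE, GRAPH, BOOLEAN, UNKNOWN }

    // Matches the first query-form keyword that appears as a whole word.
    private static final Pattern FORM =
        Pattern.compile("\\b(SELECT|CONSTRUCT|DESCRIBE|ASK)\\b", Pattern.CASE_INSENSITIVE);

    private QueryKind() {}

    static Kind classify(String sparql) {
        // Blank out IRIs such as <http://example.org/ask/> so that words inside
        // them are not mistaken for the query form.
        String withoutIris = sparql.replaceAll("<[^>]*>", " ");
        Matcher m = FORM.matcher(withoutIris);
        if (!m.find()) {
            return Kind.UNKNOWN;
        }
        switch (m.group(1).toUpperCase(Locale.ROOT)) {
            case "SELECT":
                return Kind.TUPLE;   // set of variable bindings
            case "CONSTRUCT":
            case "DESCRIBE":
                return Kind.GRAPH;   // set of RDF statements
            case "ASK":
                return Kind.BOOLEAN; // true or false
            default:
                return Kind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("SELECT ?x ?y WHERE { ?x ?p ?y }")); // prints TUPLE
        System.out.println(classify("ASK { ?x ?p ?y }"));                // prints BOOLEAN
    }
}
```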

    Note: Although Sesame 2 currently supports two query languages, SeRQL and SPARQL, the Virtuoso provider only supports the W3C SPARQL specification.

    Evaluating a SELECT Query

    To evaluate a tuple query we simply do the following:

    
    import java.util.List;
    import org.openrdf.OpenRDFException;
    import org.openrdf.model.Value;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.query.TupleQuery;
    import org.openrdf.query.TupleQueryResult;
    import org.openrdf.query.BindingSet;
    import org.openrdf.query.QueryLanguage;
    …
    try {
       RepositoryConnection con = myRepository.getConnection();
       try {
          // match every triple and project its subject and object
          String queryString = "SELECT ?x ?y WHERE { ?x ?p ?y }";
          TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
          TupleQueryResult result = tupleQuery.evaluate();
          try {
             … // do something with the result
          }
          finally {
             result.close();
          }
       }
       finally {
          con.close();
       }
    }
    catch (OpenRDFException e) {
       // handle exception
    }
    
    

    This evaluates a SPARQL query and returns a TupleQueryResult, which consists of a sequence of BindingSet objects. Each BindingSet contains a set of pairs called Binding objects. A Binding object represents a name/value pair for each variable in the query's projection.

    We can use the TupleQueryResult to iterate over all results and get each individual result for x and y:

    
    while (result.hasNext()) {
       BindingSet bindingSet = result.next();
       Value valueOfX = bindingSet.getValue("x");
       Value valueOfY = bindingSet.getValue("y");
       // do something interesting with the query variable values here…
    }
    
    

    As you can see, we retrieve values by name rather than by an index. The names used should be the names of variables as specified in your query. The TupleQueryResult.getBindingNames() method returns a list of binding names, in the order in which they were specified in the query. To process the bindings in each binding set in the order specified by the projection, you can do the following:

    
    List<String> bindingNames = result.getBindingNames();
    while (result.hasNext()) {
       BindingSet bindingSet = result.next();
       // look up values by name, in projection order
       Value firstValue = bindingSet.getValue(bindingNames.get(0));
       Value secondValue = bindingSet.getValue(bindingNames.get(1));
       // do something interesting with the values here…
    }
    
    

    It is important to invoke the close() operation on the TupleQueryResult after we are done with it. A TupleQueryResult evaluates lazily and keeps resources (such as connections to the underlying database) open. Closing the TupleQueryResult frees up these resources. Do not forget that iterating over a result may cause exceptions! The best way to make sure no connections are kept open unnecessarily is to invoke close() in the finally clause.

    An alternative to producing a TupleQueryResult is to supply an object that implements the TupleQueryResultHandler interface to the query's evaluate() method. The main difference is that when using a return object, the caller has control over when the next answer is retrieved, whereas with the use of a handler, the connection simply pushes answers to the handler object as soon as it has them available.

    As an example we will use SPARQLResultsXMLWriter, which is a TupleQueryResultHandler implementation that writes SPARQL Results XML documents to an output stream or to a writer:

    
    import java.io.FileOutputStream;
    import org.openrdf.query.resultio.sparqlxml.SPARQLResultsXMLWriter;
    …
    FileOutputStream out = new FileOutputStream("/path/to/result.srx");
    try {
       SPARQLResultsXMLWriter sparqlWriter = new SPARQLResultsXMLWriter(out);
       RepositoryConnection con = myRepository.getConnection();
       try {
          String queryString = "SELECT * WHERE { ?x ?p ?y }";
          TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
          // the connection pushes each solution to the writer
          tupleQuery.evaluate(sparqlWriter);
       }
       finally {
          con.close();
       }
    }
    finally {
       out.close();
    }
    
    

    You can just as easily supply your own application-specific implementation of TupleQueryResultHandler though.
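
    To illustrate what an application-specific handler might look like, here is a minimal, library-free sketch of the push style. RowHandler, TsvCollector and evaluate are stand-ins invented for this example; a real handler would implement Sesame's TupleQueryResultHandler and receive BindingSet objects instead of the plain maps used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All types here are stand-ins for illustration; none of them are Sesame classes.
final class PushDemo {
    // Stand-in for a tuple result handler with start, per-row and end callbacks.
    interface RowHandler {
        void startResult(List<String> bindingNames);
        void handleRow(Map<String, String> row);
        void endResult();
    }

    // Application-specific handler: collects each row as one tab-separated line,
    // with values in projection order.
    static final class TsvCollector implements RowHandler {
        private final List<String> lines = new ArrayList<>();
        private List<String> names = new ArrayList<>();

        @Override
        public void startResult(List<String> bindingNames) {
            names = new ArrayList<>(bindingNames);
            lines.add(String.join("\t", names)); // header line
        }

        @Override
        public void handleRow(Map<String, String> row) {
            List<String> cells = new ArrayList<>();
            for (String name : names) {
                cells.add(row.getOrDefault(name, "")); // unbound variable -> empty cell
            }
            lines.add(String.join("\t", cells));
        }

        @Override
        public void endResult() {
            // nothing to release in this sketch
        }

        List<String> lines() {
            return lines;
        }
    }

    // Plays the role of the connection: it pushes rows to the handler one by one.
    static void evaluate(List<String> names, List<Map<String, String>> rows, RowHandler handler) {
        handler.startResult(names);
        for (Map<String, String> row : rows) {
            handler.handleRow(row);
        }
        handler.endResult();
    }

    static List<String> demo() {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("x", "http://example.org/a");
        row.put("y", "http://example.org/b");
        TsvCollector collector = new TsvCollector();
        evaluate(Arrays.asList("x", "y"), Arrays.asList(row), collector);
        return collector.lines();
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

    The caller never asks for the next row here; the evaluate method decides when each callback fires, which is the difference from iterating over a returned result object.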

    Lastly, an important warning: as soon as you are done with the RepositoryConnection object, you should close it. Notice that during processing of the TupleQueryResult object (for example, when iterating over its contents), the RepositoryConnection should still be open. We can invoke con.close() after we have finished with the result.
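
    The ordering described above (result first, connection last, even when iteration fails) follows directly from nesting the try/finally blocks. The sketch below demonstrates this with a stand-in class, Tracked, that only records when it is closed; it is not a Sesame type, and the simulated failure is an assumption made for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in classes for illustration; they only record the order of close() calls.
final class CloseOrderDemo {
    static final class Tracked {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        void close() {
            log.add("close " + name);
        }
    }

    // Mirrors the nested try/finally layout used in this tutorial.
    static List<String> run(boolean failWhileIterating) {
        List<String> log = new ArrayList<>();
        try {
            Tracked con = new Tracked("connection", log);
            try {
                Tracked result = new Tracked("result", log);
                try {
                    if (failWhileIterating) {
                        throw new IllegalStateException("simulated failure during iteration");
                    }
                    log.add("iterated");
                } finally {
                    result.close(); // the innermost resource is released first
                }
            } finally {
                con.close(); // the connection outlives the result
            }
        } catch (IllegalStateException e) {
            log.add("handled");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [iterated, close result, close connection]
        System.out.println(run(true));  // prints [close result, close connection, handled]
    }
}
```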

    Evaluating a CONSTRUCT query

    The following code evaluates a graph query on a repository:

    
    import org.openrdf.model.Statement;
    import org.openrdf.query.GraphQueryResult;

    // copy every matching triple into the result graph
    GraphQueryResult graphResult = con.prepareGraphQuery(
          QueryLanguage.SPARQL, "CONSTRUCT { ?x ?p ?y } WHERE { ?x ?p ?y }").evaluate();

    A GraphQueryResult is similar to TupleQueryResult in that it is an object that iterates over the query results. However, for graph queries the query results are RDF statements, so a GraphQueryResult iterates over Statement objects:

    while (graphResult.hasNext()) {
       Statement st = graphResult.next();
       // … do something with the resulting statement here.
    }

    The TupleQueryResultHandler equivalent for graph queries is org.openrdf.rio.RDFHandler. Again, this is a generic interface; each object implementing it can process the reported RDF statements in any way it wants.
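
    As one example of processing statements "in any way it wants", the sketch below counts the reported statements per predicate. Triple, TripleHandler and PredicateCounter are stand-ins invented for this illustration and are not Sesame classes; a real implementation would implement RDFHandler and receive Statement objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// All types here are stand-ins for illustration; none of them are Sesame classes.
final class StatementHandlerDemo {
    // Minimal stand-in for an RDF statement.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Stand-in for the statement callback of a handler.
    interface TripleHandler {
        void handleTriple(Triple triple);
    }

    // One possible way to process statements: count them per predicate.
    static final class PredicateCounter implements TripleHandler {
        private final Map<String, Integer> counts = new TreeMap<>();

        @Override
        public void handleTriple(Triple triple) {
            counts.merge(triple.predicate, 1, Integer::sum);
        }

        Map<String, Integer> counts() {
            return counts;
        }
    }

    static Map<String, Integer> demo() {
        List<Triple> triples = Arrays.asList(
            new Triple("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
            new Triple("http://example.org/b", "http://example.org/knows", "http://example.org/c"),
            new Triple("http://example.org/a", "http://example.org/name", "Alice"));
        PredicateCounter counter = new PredicateCounter();
        for (Triple triple : triples) {
            counter.handleTriple(triple); // a query engine would make these calls
        }
        return counter.counts();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```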

    All writers from Rio (such as the RDFXMLWriter, TurtleWriter, TriXWriter, etc.) implement the RDFHandler interface. This allows them to be used in combination with querying quite easily. In the following example, we use a TurtleWriter to write the result of a SPARQL graph query to standard output in Turtle format:

    
    import org.openrdf.rio.turtle.TurtleWriter;
    …
    RepositoryConnection con = myRepository.getConnection();
    try {
       TurtleWriter turtleWriter = new TurtleWriter(System.out);
       // the connection pushes each resulting statement to the writer
       con.prepareGraphQuery(QueryLanguage.SPARQL, "CONSTRUCT { ?x ?p ?y } WHERE { ?x ?p ?y }").evaluate(turtleWriter);
    }
    finally {
       con.close();
    }
    
    

    Again, note that as soon as we are done with the result of the query (either after iterating over the contents of the GraphQueryResult or after invoking the RDFHandler), we invoke con.close() to close the connection and free resources.