Kingsley Uyi Idehen
Lexington, United States

URIBurner: Painless Generation & Exploitation of Linked Data (Update 1 - Demo Links Added)

What is URIBurner?

A service from OpenLink Software, available at: http://uriburner.com, that enables anyone to generate structured descriptions -on the fly- for resources that are already published to HTTP based networks. These descriptions exist as hypermedia resource representations where links are used to identify:

  • the entity (data object or datum) being described,
  • each of its attributes, and
  • each of its attribute values (optionally).

The hypermedia resource representation outlined above is what is commonly known as an Entity-Attribute-Value (EAV) Graph. The use of generic HTTP scheme based Identifiers is what distinguishes this type of hypermedia resource from others.

Why is it Important?

The virtues (dual pronged serendipitous discovery) of publishing HTTP based Linked Data across public (World Wide Web) or private (Intranets and/or Extranets) networks are rapidly becoming clearer to everyone. That said, the nuance-laced nature of Linked Data publishing presents significant challenges to most. Thus, for Linked Data to really blossom, the process of publishing needs to be simplified, i.e., "just click and go" (for human interaction) or REST-ful orchestration of HTTP CRUD (Create, Read, Update, Delete) operations between Client Applications and Linked Data Servers.

How Do I Use It?

In a similar vein to the role played by FeedBurner with regard to Atom and RSS feed generation during the early stages of the Blogosphere, URIBurner enables anyone to publish Linked Data bearing hypermedia resources on an HTTP network. Its usage covers two profiles: Content Publisher and Content Consumer.

Content Publisher

The steps that follow cover all you need to do:

  • place a <link/> tag within your HTTP based hypermedia resource (e.g. within the <head/> section for HTML)
  • use a URL via the @href attribute value to identify the location of the structured description of your resource; in this case it takes the form: http://linkeddata.uriburner.com/about/id/{scheme-or-protocol}/{your-hostname-or-authority}/{your-local-resource}
  • for human visibility, consider associating a button (as you do with Atom and RSS) with the URL above.

That's it! The discoverability (Serendipitous Discovery Quotient, or SDQ) of your content has just multiplied significantly; its structured description is now part of the Linked Data Cloud with a reference back to your site (which is now a bona fide HTTP based Linked Data Space).

Examples

HTML+RDFa based representation of a structured resource description:

<link rel="describedby" title="Resource Description (HTML)" type="text/html" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>

JSON based representation of a structured resource description:

<link rel="describedby" title="Resource Description (JSON)" type="application/json" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>

N3 based representation of a structured resource description:

<link rel="describedby" title="Resource Description (N3)" type="text/n3" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>

RDF/XML based representation of a structured resource description:

<link rel="describedby" title="Resource Description (RDF/XML)" type="application/rdf+xml" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
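The four tags above differ only in title and media type, so they can be generated from the resource URL. The following is a minimal sketch, assuming the URL pattern given in the steps above; the function names are illustrative rather than part of any URIBurner API, and query strings and fragments are ignored because the post does not say how they should be encoded.

```python
from html import escape
from urllib.parse import urlsplit

# Base of the description URLs, taken from the pattern given in the steps above.
DESCRIPTION_BASE = "http://linkeddata.uriburner.com/about/id"

# Labels and media types mirror the four <link/> examples above.
FORMATS = {
    "HTML": "text/html",
    "JSON": "application/json",
    "N3": "text/n3",
    "RDF/XML": "application/rdf+xml",
}


def description_url(resource_url):
    """Map a resource URL onto /about/id/{scheme}/{authority}/{local-resource}."""
    parts = urlsplit(resource_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {resource_url!r}")
    # Assumption: only scheme, authority, and path are carried over.
    return f"{DESCRIPTION_BASE}/{parts.scheme}/{parts.netloc}{parts.path}"


def describedby_links(resource_url):
    """Return one <link rel="describedby"/> tag per format, as a list of strings."""
    href = escape(description_url(resource_url), quote=True)
    return [
        f'<link rel="describedby" title="Resource Description ({label})" '
        f'type="{media_type}" href="{href}"/>'
        for label, media_type in FORMATS.items()
    ]


if __name__ == "__main__":
    for tag in describedby_links("http://example.org/xyz.html"):
        print(tag)
```

Run as a script, this prints the same four tags shown above for http://example.org/xyz.html.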

Content Consumer

As an end-user, obtaining a structured description of any resource published to an HTTP network boils down to the following steps:

  1. go to: http://uriburner.com
  2. drag the Page Metadata Bookmarklet link to your Browser's toolbar
  3. whenever you encounter a resource of interest (e.g. an HTML page) simply click on the Bookmarklet
  4. you will be presented with an HTML representation of a structured resource description (i.e., identifier of the entity being described, its attributes, and its attribute values will be clearly presented).

Examples

If you are a developer, you can simply perform an HTTP operation request (from your development environment of choice) using any of the URL patterns presented below:

HTML:
  • curl -I -H "Accept: text/html" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}

JSON:

  • curl -I -H "Accept: application/json" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/json/{scheme}/{authority}/{local-path}

Notation 3 (N3) and Turtle:

  • curl -I -H "Accept: text/n3" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/n3/{scheme}/{authority}/{local-path}
  • curl -I -H "Accept: text/turtle" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/ttl/{scheme}/{authority}/{local-path}

RDF/XML:

  • curl -I -H "Accept: application/rdf+xml" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/xml/{scheme}/{authority}/{local-path}
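For developers working outside the shell, the same two request styles can be prepared with Python's standard library. This is a sketch only: the helper names are made up for illustration, the URL patterns and media types are copied from the curl examples above, and nothing is sent until the request object is handed to `urllib.request.urlopen`.

```python
from urllib.parse import urlsplit
from urllib.request import Request

SERVICE = "http://linkeddata.uriburner.com/about"

# Short names from the /about/data/{format}/ pattern, paired with the
# Accept header values used in the curl examples above.
FORMATS = {
    "json": "application/json",
    "n3": "text/n3",
    "ttl": "text/turtle",
    "xml": "application/rdf+xml",
}


def _tail(resource_url):
    """Turn http://host/path into the {scheme}/{authority}/{local-path} suffix."""
    parts = urlsplit(resource_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {resource_url!r}")
    return f"{parts.scheme}/{parts.netloc}{parts.path}"


def negotiated_request(resource_url, media_type):
    """Content-negotiated HEAD request, like `curl -I -H "Accept: ..."`."""
    return Request(
        f"{SERVICE}/id/{_tail(resource_url)}",
        headers={"Accept": media_type},
        method="HEAD",
    )


def direct_request(resource_url, fmt):
    """Plain GET against the format-specific /about/data/ URL."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(f"{SERVICE}/data/{fmt}/{_tail(resource_url)}")


if __name__ == "__main__":
    req = negotiated_request("http://example.org/xyz.html", FORMATS["ttl"])
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Building the request separately from sending it keeps the URL construction easy to check before any network traffic occurs.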

Conclusion

URIBurner is a "deceptively simple" solution for cost-effective exploitation of HTTP based Linked Data meshes. It doesn't require any programming or customization en route to immediately realizing its virtues.

If you like what URIBurner offers, but prefer to leverage its capabilities within your own domain (such that resource description URLs reside in your domain), all you have to do is perform the following steps:

  1. download a copy of Virtuoso (for local desktop, workgroup, or data center installation) or
  2. instantiate Virtuoso via the Amazon EC2 Cloud
  3. enable the Sponger Middleware component via the RDF Mapper VAD package (which includes cartridges for over 30 different resources types)

When you install your own URIBurner instances, you also have the ability to perform customizations that increase resource description fidelity in line with your specific needs. All you need to do is develop a custom extractor cartridge and/or meta cartridge.

03/10/2010 12:52 GMT Modified: 03/11/2010 10:08 GMT
Meshups Demonstrating How SPARQL-GEO Enhances Linked Data Exploitation (Update 1)

Deceptively simple demonstrations of how Virtuoso's SPARQL-GEO extensions to SPARQL lay a critical foundation for Geo Spatial solutions that seek to leverage the burgeoning Web of Linked Data.

Setup Information

SPARQL Endpoint: Linked Open Data Cache (8.5 Billion+ Quad Store which includes data from Geonames and the Linked GeoData Project Data Sets).

03/06/2010 17:43 GMT Modified: 03/08/2010 09:52 GMT
Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added)

Motivation for this post arose from a series of Twitter exchanges between Tony Hirst and me, in relation to his blog post titled: So What Is It About Linked Data that Makes it Linked Data™ ?

At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)

What is Linked Data?

"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.

Why is it Important?

Information makes the world tick!

Information doesn't exist without data to contextualize.

Information is inaccessible without a projection (presentation) medium.

All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.

How is Linked Data Delivered?

Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).

How are Linked Data Object Representations Structured?

A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.

Examples of structured data representation formats (content types) associated with Linked Data Objects include:

  • text/html
  • text/turtle
  • text/n3
  • application/json
  • application/rdf+xml
  • Others

How Do I Create Linked Data oriented Hypermedia Resources?

You mark up resources by expressing distinct entity-attribute-value statements (basically, these are 3-tuple records) using a variety of notations:

  • (X)HTML+RDFa,
  • JSON,
  • Turtle,
  • N3,
  • TriX,
  • TriG,
  • RDF/XML, and
  • Others (for instance you can use Atom data format extensions to model EAV graph as per OData initiative from Microsoft).
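To make the 3-tuple idea concrete, here is a small sketch that holds entity-attribute-value statements in memory and writes them out one statement per line in N-Triples syntax (a line-based subset of Turtle). The `Statement` type and the example.org identifiers are hypothetical; the two attribute URIs come from the FOAF vocabulary.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    entity: str                 # HTTP URI naming the thing being described
    attribute: str              # HTTP URI naming the attribute
    value: str                  # HTTP URI, or plain text for a literal value
    value_is_uri: bool = True   # attribute values are only optionally URIs


def to_ntriples(statements):
    """Serialize statements as N-Triples, one '<s> <p> o .' line each."""
    lines = []
    for s in statements:
        if s.value_is_uri:
            obj = f"<{s.value}>"
        else:
            # Escape the characters that would break a quoted literal.
            text = (
                s.value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            obj = f'"{text}"'
        lines.append(f"<{s.entity}> <{s.attribute}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    alice = "http://example.org/people#alice"  # hypothetical identifiers
    bob = "http://example.org/people#bob"
    print(to_ntriples([
        Statement(alice, "http://xmlns.com/foaf/0.1/name", "Alice", value_is_uri=False),
        Statement(alice, "http://xmlns.com/foaf/0.1/knows", bob),
    ]))
```

Each output line carries exactly one entity, one attribute, and one value, which is the structure the other notations in the list express with different syntax.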

You can achieve this task using any of the following approaches:

  • Notepad
  • WYSIWYG Editor
  • Transformation of Database Records via Middleware
  • Transformation of XML based Web Services output via Middleware
  • Transformation of other Hypermedia Resources via Middleware
  • Transformation of non Hypermedia Resources via Middleware
  • Use a platform that delivers all of the above.

Practical Examples of What Linked Data Objects Enable

  • Describe Who You Are, What You Offer, and What You Need via your structured profile, then leave your HTTP network to perform the REST (serendipitous discovery of relevant things)
  • Identify (via map overlay) all items of interest based on a 2km+ radius of my current location (this could include vendor offerings or services sought by existing or future customers)
  • Share the latest and greatest family photos with family members *only* without forcing them to sign up for Yet Another Web 2.0 service or Social Network
  • No repetitive signup and username and password based login sequences per Web 2.0 or Mobile Application combo
  • Going beyond imprecise Keyword Search to the new frontier of Precision Find. For example, Find Data Objects associated with the keyword "Tiger", while enabling the seeker to disambiguate across the "Who", "What", "Where", "When" dimensions (with negation capability)
  • Determine how two Data Objects are Connected - person to person, person to subject matter etc. (LinkedIn outside the walled garden)
  • Use any resource address (e.g. blog or bookmark URL) as the conduit into a Data Object mesh that exposes all associated Entities and their social network relationships
  • Apply patterns (social dimensions) above to traditional enterprise data sources in combination (optionally) with external data without compromising security etc.

How Do OpenLink Software Products Enable Linked Data Exploitation?

Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden, but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).

Our main Linked Data oriented products include:

  • OpenLink Data Explorer -- visualizes Linked Data or Linked Data transformed "on the fly" from hypermedia and non hypermedia data sources
  • URIBurner -- a "deceptively simple" solution that enables the generation of Linked Data "on the fly" from a broad collection of data sources and resource types
  • OpenLink Data Spaces -- a platform for enterprises and individuals that enhances distributed collaboration via Linked Data driven virtualization of data across its native and/or 3rd party content manager for: Blogs, Wikis, Shared Bookmarks, Discussion Forums, Social Networks etc
  • OpenLink Virtuoso -- a secure and high-performance native hybrid data server (Relational, RDF-Graph, Document models) that includes in-built Linked Data transformation middleware (aka. Sponger).

03/04/2010 10:16 GMT Modified: 03/08/2010 09:51 GMT
Linked Data & Socially Enhanced Collaboration (Enterprise or Individual) -- Update 1

Socially enhanced enterprise and individual collaboration is becoming a focal point for a variety of solutions that offer erstwhile distinct content management features across the realms of Blogging, Wikis, Shared Bookmarks, Discussion Forums etc. as part of an integrated platform suite. Recently, Socialtext has caught my attention courtesy of its nice features and benefits page. In addition, I've also found the Mike 2.0 portal immensely interesting and valuable, for those with an enterprise collaboration bent.

Anyway, Socialtext and Mike 2.0 (they aren't identical, and the juxtaposition isn't seeking to imply this) provide nice demonstrations of what socially enhanced collaboration for individuals and/or enterprises is all about:

  1. Identifying Yourself
  2. Identifying Others (key contributors, peers, collaborators)
  3. Serendipitous Discovery of key contributors, peers, and collaborators
  4. Serendipitous Discovery by key contributors, peers, and collaborators
  5. Develop and sustain relationships via socially enhanced professional network hybrid
  6. Utilize your new "trusted network" (which you've personally indexed) when seeking help or propagating a meme.

As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys, in a sense) for data items, data containers, and data creators (individuals and groups) is overlooked, albeit unintentionally.

How HTTP based Linked Data Addresses the Identifier Issue

Rather than using platform constrained identifiers such as:

  • email address (a "mailto" scheme identifier),
  • a dbms user account,
  • application specific account, or
  • OpenID,

HTTP based Linked Data enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:

  1. You,
  2. Your Peers,
  3. Your Groups, and
  4. Your Activity Generated Data,

simply become conduits into a mesh of HTTP -- referenceable and accessible -- Linked Data Objects endowed with High SDQ (Serendipitous Discovery Quotient). For example, my Personal WebID is all anyone needs to know if they want to explore:

  1. My Profile (which includes references to data objects associated with my interests, social-network, calendar, bookmarks etc.)
  2. Data generated by my activities across various data spaces (via data objects associated with my online accounts e.g. Del.icio.us, Twitter, Last.FM)
  3. Linked Data Meshups via URIBurner (or any other Virtuoso instance) that provide an extended view of my profile

How FOAF+SSL adds Socially aware Security

Even when you reach a point of equilibrium where your daily activities trigger orchestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, which include the following:

  1. Single Sign On,
  2. Authentication, and
  3. Data Access Policies.

FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifier (or WebID) via the following steps (performed by a FOAF+SSL compliant platform):

  1. Imprint WebID within a self-signed x.509 based public key (certificate) associated with your private key (generated by FOAF+SSL platform or manually via OpenSSL)
  2. Store public key components (modulus and exponent) into your FOAF based profile document which references your Personal HTTP Identifier as its primary topic
  3. Leverage HTTP URL component of WebID for making public key components (modulus and exponent) available for x.509 certificate based authentication challenges posed by systems secured by FOAF+SSL (directly) or OpenID (indirectly via FOAF+SSL to OpenID proxy services).
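The core check implied by the steps above can be sketched as a comparison between the public key presented in the certificate and the keys published in the profile document. This is a simplified illustration, not a FOAF+SSL implementation: the TLS handshake (which proves possession of the private key), dereferencing the WebID, and parsing the profile are all assumed to have happened elsewhere, and every name below is hypothetical.

```python
from typing import NamedTuple


class RSAPublicKey(NamedTuple):
    modulus: int
    exponent: int


def parse_hex_modulus(text):
    """Parse a hex modulus, tolerating the ':' separators and whitespace
    found in OpenSSL's text output."""
    cleaned = "".join(ch for ch in text if ch not in ": \n\t")
    return int(cleaned, 16)


def webid_claim_holds(certificate_key, profile_keys):
    """True when the certificate's key is among the keys published in the
    profile document retrieved from the WebID's URL."""
    return any(certificate_key == key for key in profile_keys)


if __name__ == "__main__":
    # Toy values only; real moduli are 2048 bits or longer.
    presented = RSAPublicKey(parse_hex_modulus("00:c3:5a"), 65537)
    published = [RSAPublicKey(0xC35A, 65537)]
    print(webid_claim_holds(presented, published))
```

Comparing integers rather than strings sidesteps differences in how the certificate and the profile format the same modulus.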

Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.

Conclusions

Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.

If you want to understand real world problem solution #1 with regards to HTTP based Linked Data look no further than the issues of secure, socially aware, and platform independent identifiers for data objects, that build bridges across erstwhile data silos.

If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS), which is a distributed collaboration engine (enterprise or individual) built around the Virtuoso database engine. It simply enhances existing collaboration tools via the following capabilities:

  1. Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
  2. Ability to integrate across a myriad of Data Source Types rather than a select few across RDBMS Engines, LDAP, Web Services, and various HTTP accessible Resources (Hypermedia or Non Hypermedia content types)
  3. Addition of FOAF+SSL based authentication
  4. Addition of FOAF+SSL based Access Control Lists (ACLs) for policy based data access.

03/02/2010 15:47 GMT Modified: 03/04/2010 10:19 GMT
OpenLink Virtuoso - Product Value Proposition Overview

Situation Analysis

Since the beginning of the modern IT era, each period of innovation has inadvertently introduced its fair share of Data Silos. The driving force behind this anomaly remains an overemphasis on the role of applications when selecting problem solutions. Unfortunately, most solution selecting decision makers remain oblivious to the fact that most applications are architecturally monolithic; i.e., they fail to separate the following five layers that are critical to all solutions:

  1. Data Unit (Datum or Data Object) Identity,
  2. Data Storage/Persistence,
  3. Data Access,
  4. Data Representation, and
  5. Data Presentation/Visualization.

The rise of the Internet, and its exponentially-growing user-friendly enclave known as the World Wide Web, is bringing the intrinsic costs of the monolithic application architecture anomaly to bear -- in manners unanticipated by many. For example, the emergence of network-oriented solutions across the realms of Enterprise 2.0-based Collaboration and Web 2.0-based Software-as-a-Service (SaaS), combined with the overarching influence of Social Media, is producing more heterogeneously-structured and disparately-located data sources than people can effectively process.

As is often the case, a variety of problem and product monikers have emerged for the data access and integration challenges outlined above. Contemporary examples include Enterprise Information Integration, Master Data Management, and Data Virtualization. Labeling aside, the fundamental issues of the unresolved Data Integration challenge boil down to the following:

  • Data Model Heterogeneity
  • Data Quality (Cleanliness)
  • Semantic Variance across Contexts (e.g., weights and measures).

Effectively solving today's data integration challenges requires a move away from monolithic application architecture to loosely-coupled, network-centric application architectures. Basically, we need a ubiquitous network-centric application protocol that lends itself to loosely-coupled across-the-wire orchestration of data interactions. In short, this will be what revitalizes the art of application development and deployment.

The World Wide Web is built around a network application protocol called HTTP. This protocol intrinsically separates the five layers listed earlier, thereby enabling:

  • Use of Generic HTTP URIs as Data Object (Entity) Identifiers;
  • Identifier Co-reference, such that multiple Data Object Identifiers may reference the same Data Object;
  • Use of the Entity-Attribute-Value Model to describe Data Objects using real world modeling friendly conceptual graphs;
  • Use of HTTP URLs to Identify Locations of Resources that bear (host) Data Object Descriptions (Representations);
  • Data Access mechanism for retrieving Data Object Representations from persistent or transient storage locations.

What is Virtuoso?

A product uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards combined with unique technology innovation that transcends erstwhile distinct realms.

When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representation formats. The benefits of such power extend across multiple profiles.

Product Benefits Summary

  • Enterprise Agility — Virtuoso lets you mix-&-match best-of-class combinations of Operating Systems, Programming Environments, Database Engines and Data-Access Middleware when building or tweaking your IS infrastructure, without the typical impedance of vendor-lock-in.
  • Data Model Dexterity — By supporting multiple protocols and data models in a single product, Virtuoso protects you against costly vulnerabilities such as: perennial acquisition and accumulation of expensive data model specific DBMS products that still operate on the fundamental principle of: proprietary technology lock-in, at a time when heterogeneity continues to intrinsically define the information technology landscape.
  • Cost-effectiveness — By providing a single point of access (and single-sign-on, SSO) to a plethora of Web 2.0-style social networks, Web Services, and Content Management Systems, and by using Data Object Identifiers as units of Data Virtualization that become the focal points of all data access, Virtuoso lowers the cost to exploit emerging frontiers such as socially-enhanced enterprise collaboration.
  • Speed of Exploitation — Virtuoso provides the ability to rapidly assemble 360-degree conceptual views of data, across internal line-of-business application (CRM, ERP, ECM, HR, etc.) data and/or external data sources, whether these are unstructured, semi-structured, or fully structured.

Bottom line, Virtuoso delivers unrivaled flexibility and scalability, without compromising performance or security.

02/26/2010 14:12 GMT Modified: 02/27/2010 12:53 GMT
Re-introducing the Virtuoso Virtual Database Engine

In recent times a lot of the commentary and focus re. Virtuoso has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked is the sophisticated Virtual Database Engine that provides the foundation for all of Virtuoso's data integration capabilities.

In this post I provide a brief re-introduction to this essential aspect of Virtuoso.

What is it?

This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).

Why is it important?

In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.

In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.

How do I use it?

The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:

Relational Database Federation

You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
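The shape of such a federated query, one connection with a join spanning separately hosted tables, can be illustrated with a small stand-in. The sketch below does not use Virtuoso at all: it swaps in SQLite's ATTACH DATABASE to place two independent in-memory databases behind one connection, and the table names and rows are invented for the example.

```python
import sqlite3


def federated_totals():
    """Join tables living in two separately attached databases via one connection."""
    conn = sqlite3.connect(":memory:")  # stands in for the single client connection
    try:
        # Each ATTACH creates an independent database, playing the role of
        # an external data source (e.g. the HR system and the Sales system).
        conn.execute("ATTACH DATABASE ':memory:' AS hr")
        conn.execute("ATTACH DATABASE ':memory:' AS sales")
        conn.execute("CREATE TABLE hr.employees (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE sales.orders (employee_id INTEGER, amount REAL)")
        conn.executemany(
            "INSERT INTO hr.employees VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
        )
        conn.executemany(
            "INSERT INTO sales.orders VALUES (?, ?)",
            [(1, 120.0), (1, 80.0), (2, 50.0)],
        )
        # One query, two data sources: the join is resolved behind the connection.
        return conn.execute(
            """
            SELECT e.name, SUM(o.amount)
            FROM hr.employees AS e
            JOIN sales.orders AS o ON o.employee_id = e.id
            GROUP BY e.name
            ORDER BY e.name
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, total in federated_totals():
        print(name, total)
```

In the scenario described above, the attached sources would be remote engines reached through ODBC or JDBC rather than local in-memory databases, but the client-side experience of writing a single join is the point being illustrated.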

Conceptual Level Data Access using the RDF Model

You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).

You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).

It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
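As a rough illustration of the "URL in the FROM clause" idea, the sketch below builds such a query and encodes it for transmission over HTTP using the `query` parameter defined by the SPARQL Protocol. The endpoint address is a placeholder and the function names are hypothetical; whether a given server fetches and transforms the named resource on demand depends on its configuration and permissions, which are not shown here.

```python
from urllib.parse import urlencode


def resource_query(resource_url, limit=10):
    """Build a SPARQL query that names a Web Resource URL in its FROM clause."""
    if any(ch in resource_url for ch in '<>" \n\t'):
        raise ValueError(f"unsafe characters in URL: {resource_url!r}")
    return (
        "SELECT ?s ?p ?o\n"
        f"FROM <{resource_url}>\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(endpoint, query):
    """SPARQL Protocol GET form: the query text travels in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    query = resource_query("http://example.org/xyz.html", limit=5)
    print(query)
    # Placeholder endpoint; substitute the address of an actual SPARQL service.
    print(sparql_get_url("http://example.org/sparql", query))
```

The resulting URL can be opened in a Web Browser or fetched with any HTTP client, which matches the two access routes described above.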

Conceptual Level Data Access using ADO.NET Entity Frameworks

As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.

02/17/2010 16:38 GMT Modified: 02/17/2010 16:54 GMT
Virtuoso Chronicles from the Field: Nepomuk, KDE, and the quest for a sophisticated RDF DBMS.

For this particular user experience chronicle, I've simply inserted the content of Sebastian Trueg's post titled: What We Did Last Summer (And the Rest of 2009) – A Look Back Onto the Nepomuk Development Year ..., directly into this post, without any additional commentary or modification.

2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out - if only for my archive. So here goes.

Virtuoso

Let’s start with the major topic of 2009 (and also the beginning of 2010): The new Nepomuk database backend: Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend which depends on Java and steals all of your memory or you were stuck with Redland which had the worst performance and missed some SPARQL features making important parts of Nepomuk  like queries unusable. So more than a year ago I had the idea to use the one GPL’ed database server out there that supported RDF in a professional manner: OpenLink’s Virtuoso. It has all the features we need, has a very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn’t they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable for the desktop but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later I could finally announce the Virtuoso integration as usable.

Then end of last year I dropped the support for sesame2 and redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest - not only in terms of performance - and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full text index which makes the previously used CLucene index unnecessary. Thus, we can finally combine full text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of  search results since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss later.

So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :)

The Nepomuk Query API

Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements.

With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won’t go into much detail here since I did that before.

All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application - have a look at libnepomukquery. It is very likely that you can avoid the hassle of debugging a query by using the query API.

The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings.

Dolphin Search Panel in KDE SC 4.4

Shared Desktop Ontologies

An important part of the Nepomuk research project was the creation of a set of ontologies for describing desktop resources and their metadata. After the Xesam project under the umbrella of freedesktop.org had been convinced to use RDF for describing file metadata they developed their own ontology. Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the Xesam ontology and the Nepomuk Information Elements Ontology were already very close in design. Thus, it was relatively easy to merge the two and be left with only one ontology to support. Since then not only KDE but also Strigi and Tracker are using the Nepomuk ontologies.

At the Gran Canaria Desktop Summit I met some of the guys from Tracker and we tried to come up with a plan to create a joint project to maintain the ontologies. This got off to a rough start as nobody really felt responsible. So I simply took the initiative and released the shared-desktop-ontologies version 0.1 in November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking the KDE build. But in the end it was worth it. Now the package is established and other projects can start to pick it up to create data compatible with the Nepomuk system and Tracker.

Today the ontologies (and the shared-desktop-ontologies package) are maintained in the Oscaf project at Sourceforge. The situation is far from perfect but it is a good start. If you need specific properties in the ontologies or are thinking about creating one for your own application - come and join us in the bug tracker.

Timeline KIO Slave

It was at the Akonadi meeting that Will Stephenson and I got to talking about mimicking some Zeitgeist functionality through Nepomuk. Basically it meant gathering some data when opening and when saving files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and allowed us to track when a file was modified and by which application. This little experiment did not leave that state though (it will, however, this year) but another one did: Zeitgeist also provides a fuse filesystem which allows browsing files by modification date. Well, whatever fuse can do, KIO can do as well. Introducing the timeline:/ KIO slave, which gives a calendar view onto your files.

Tips And Tricks

Well, I thought I would mention the Tips And Tricks section I wrote for the techbase. It might not be a big deal but I think it contains some valuable information in case you are using Nepomuk as a developer.

Google Summer Of Code 2009

This time around I had the privilege to mentor two students in the Google Summer of Code. Alessandro Sivieri and Adam Kidder did outstanding work on Improved Virtual Folders and the Smart File Dialog.

Adam’s work led me to make some heavy improvements in the Nepomuk KIO slaves myself, which I only finished this week (more details on that coming up). Alessandro continued his work on faceted file browsing in KDE and created:

Sembrowser

Alessandro is following up on his work to make faceted file browsing a reality in 2010 (and KDE SC 4.5). Since it was too late to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file browser which will be the grounds for experiments until the code is merged into Dolphin.

Faceted Browsing in KDE with Sembrowser

Nepomuk Workshops

In 2009 I organized the first Nepomuk workshop in Freiburg, Germany. And also the second one. While I reported properly on the first one I still owe a summary for the second one. I will get around to that - sooner or later. ;)

CMake Magic

Soprano gives us a nice command line tool to create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds another tool named nepomuk-rcgen. Both were a bit clumsy to use before. Now we have nice cmake macros which make it very simple to use both.

See the techbase article on how to use the new macros.

Bangarang

Without my knowledge (imagine that!) Andrew Lake created an amazing new media player named Bangarang - a Jamaican word for noise, chaos or disorder. This player is Nepomuk-enabled in the sense that it has a media library which lets you browse your media files based on the Nepomuk data. It remembers the number of times a song or a video has been played and when it was played last. It allows adding details such as the TV series name, season, episode number, or actors that are in the video - all through Nepomuk (I hope we will soon get tvdb integration).

Edit metadata directly in Bangarang

Dolphin showing TV episode metadata created by Bangarang

And of course searching for it works, too...

And it is pretty, too...

I am especially excited about this since finally applications not written or mentored by me start contributing Nepomuk data.

Gran Canaria Desktop Summit

2009 was also the year of the first Gnome-KDE joint conference. For completeness, let me just mention it here and refer to my previous blog post reporting on my experiences on the island.

Well, that was by far not all I did in 2009 but I think I covered most of the important topics. And after all it is ‘just a blog entry’ - there is no need for completeness. Thanks for reading.

"
01/28/2010 11:14 GMT Modified: 01/28/2010 21:58 GMT
One Technology That Will Rock 2010 (Update 1)

Thanks to the TechCrunch post titled: Ten Technologies That Will Rock 2010, I've been able to quickly construct a derivative post that condenses the ten item list down to a Single Technology That Will Rock 2010 :-)

Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:

  1. The Tablet: a new form factor addition re. Internet and Web application hosts which is just another way of saying: Linked Data will be accessible from Tablet applications.
  2. Geo: GPS chips are now standard features of mobile phones, so geolocation is increasingly becoming a necessary feature for any killer app. Thus, GeoSpatial Linked Data and GeoSpatial Queries are going to be a critical success factor for any endeavor that seeks to engage mobile applications developers and ultimately their end-users. Basically, you want to be able to perform Esoteric Search from these devices of the form: Find Vendors of a Camcorder (e.g., with a Zoom Factor: Weight Ratio of X) within a 2km Radius of my current location. Or how many items from my WishList are available from a Vendor within a 2km radius of my current location. Conversely, provide Vendors with the ability to spot potential Customers within 2km of a given "clicks & mortar" location (e.g. BestBuy store).
  3. Realtime Search: Rich Structured Profiles that leverage standards such as FOAF and FOAF+SSL will enable Highly Personalized Realtime Search (HPRS) without compromising privacy. Technically, this is about WebIDs securely bound to X.509 Certificates, providing access to verifiable and highly navigable Personal Profile Data Spaces that also double as personal search index entry points.
  4. Chrome OS: Just another operating system for exploiting the burgeoning Web of Linked Data
  5. HTML5: Courtesy of RDFa, just another mechanism for exposing Linked Data by making HTML+RDFa a bona fide markup for metadata (i.e., format for describing real world objects via their attribute-value graphs)
  6. Mobile Video: Simplifies the production and sharing of Video annotations (comments, reviews etc.) en route to creating rich Linked Discourse Data Spaces.
  7. Augmented Reality: Ditto
  8. Mobile Transactions: As per points 1&2 above, Vendor Discovery and Transaction Consummation will increasingly be driven by high SDQ applications. The "Funnel Effect" (more choices based on individual preferences) will be a critical success factor for anyone operating in the Mobile Transaction realm. Note, without Linked Data you cannot deliver scalable solutions that handle the combined requirements of SDQ, the "Funnel Effect", and the Mobile Device form factor; these requirements will simply magnify the importance of Web accessible Linked Data.
  9. Android: An additional platform for items 1-8; basically, 2010 isn't going to be an iPhone only zone. Personally, this reminds me of a battle from the past i.e., Microsoft vs Apple, re. desktop computing dominance. Google has studied history very well :-)
  10. Social CRM: this is simply about applying points 1-9 alongside the construction of Linked Data from eCRM Data Spaces.
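
The "within a 2km radius of my current location" search from item 2 boils down to a distance filter over geo-tagged entities. The sketch below shows that filter in plain Python using the haversine great-circle formula; the vendor records, coordinates, and function names are made-up assumptions for illustration, and a Linked Data store would normally evaluate such a condition inside the query itself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def vendors_within(vendors, here, radius_km=2.0):
    """Return vendor names within radius_km of `here`, nearest first."""
    lat, lon = here
    hits = [(haversine_km(lat, lon, v["lat"], v["lon"]), v["name"]) for v in vendors]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


if __name__ == "__main__":
    # Hypothetical data: two vendors around an arbitrary location.
    vendors = [
        {"name": "Near Store", "lat": 38.0450, "lon": -84.5000},
        {"name": "Far Store", "lat": 38.2000, "lon": -84.5000},
    ]
    print(vendors_within(vendors, here=(38.0406, -84.5037)))
```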

As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:

  1. Data Item or Object Identity
  2. Data Structure -- Data Models
  3. Data Representation -- Data Model Entity & Relationships Representation mechanism (as delivered by metadata oriented markup)
  4. Data Storage -- Database Management Systems
  5. Data Access -- Data Access Protocols
  6. Data Presentation -- How you present Views and Reports from Structured Data Sources
  7. Data Security -- Data Access Policies

The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.

Conclusion

I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce or share Knowledge.

01/02/2010 12:30 GMT Modified: 01/02/2010 14:05 GMT
Why Do I Need To Pay For ODBC , JDBC, ADO.NET, OLE-DB Drivers? (Update 3)

Payment is a function of pain alleviation (opportunity cost) monetization.

This post highlights the real pains behind the $0.00 misconception associated with Data Access Drivers: ODBC, JDBC, ADO.NET, OLE-DB etc.

In the most basic sense, there are some fundamental aspects of data access that are complex to implement and rarely implemented (if at all) by free drivers. The list includes:

  1. Escape Syntaxes for Dates and Functions
  2. Metadata Calls which enable smarter ODBC compliant applications (this feature is typically missing on Driver Side and abused on the Client side i.e., making clients DBMS specific by testing for specific DBMS names)
  3. Scrollable Cursors: this is how you deal with change sensitivity; most drivers actually fake support and get away with it due to a shortage of applications that test proper cursor types (Static, Forward-Only, Key-Set, Dynamic, and Mixed models).
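
For readers who have not met item 1, ODBC escape sequences are the brace-delimited forms such as `{d '...'}`, `{ts '...'}`, and `{fn ...}` that a driver is expected to translate into the back-end's native SQL. The following minimal Python sketch only formats such sequences as text; the helper names, the table, and the column are hypothetical, and real applications would usually bind values as parameters rather than splice them into SQL.

```python
from datetime import date, datetime


def odbc_date(value: date) -> str:
    """Format a date as an ODBC date escape sequence."""
    return "{d '" + value.strftime("%Y-%m-%d") + "'}"


def odbc_timestamp(value: datetime) -> str:
    """Format a datetime as an ODBC timestamp escape sequence."""
    return "{ts '" + value.strftime("%Y-%m-%d %H:%M:%S") + "'}"


def odbc_fn(name: str, *args: str) -> str:
    """Wrap a scalar function call in the ODBC function escape sequence."""
    return "{fn " + name + "(" + ", ".join(args) + ")}"


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    sql = (
        "SELECT " + odbc_fn("UCASE", "name") + " FROM customers "
        "WHERE created >= " + odbc_date(date(2009, 12, 16))
    )
    print(sql)
```

The same statement can then be sent unchanged to any back-end, provided the driver really implements the translation, which is the point being made in the list above.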

Okay, so we're done with actual driver sophistication re. implementation of critical features. Let's up the ante by veering into the area of security. At the most basic level, it's extremely important to understand that all data access driver types provide read-write access to your databases; thus, it's imperative that data access drivers address the following:

  1. Read-Only or Read-Write Access scoped to specific Users
  2. Ditto applied to specific User Groups
  3. Ditto applied to Database Names
  4. Ditto applied to specific ODBC compliant applications
  5. Ditto applied to specific ODBC host operating systems
  6. Ditto applied to specific IP addresses or Ranges on your Network
  7. Any combination of items 1-6 as part of a configurable data access rules/policy system.
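
One way to picture the rules/policy system in item 7 is a first-match rule list in which each rule may constrain any of the dimensions from items 1-6. The Python sketch below is an assumption-laden illustration of that idea only: the `Rule` fields, the `resolve_access` function, and the "deny" default are invented here and do not describe how any particular driver stores or evaluates its rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One policy rule; a field left as None matches anything (hypothetical model)."""
    access: str                        # "read-only" or "read-write"
    user: Optional[str] = None
    group: Optional[str] = None
    database: Optional[str] = None
    application: Optional[str] = None
    host_os: Optional[str] = None
    network: Optional[str] = None      # CIDR range, e.g. "10.1.0.0/16"

    def matches(self, request: dict) -> bool:
        for field in ("user", "group", "database", "application", "host_os"):
            wanted = getattr(self, field)
            if wanted is not None and request.get(field) != wanted:
                return False
        if self.network is not None:
            client = ipaddress.ip_address(request["ip"])
            if client not in ipaddress.ip_network(self.network):
                return False
        return True


def resolve_access(rules, request, default="deny"):
    """Return the access level of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(request):
            return rule.access
    return default


if __name__ == "__main__":
    rules = [
        Rule("read-write", user="alice", database="sales", network="10.1.0.0/16"),
        Rule("read-only", group="analysts"),
    ]
    request = {"user": "alice", "group": "analysts", "database": "sales", "ip": "10.1.2.3"}
    print(resolve_access(rules, request))
```

Ordering the rules from most to least specific keeps the evaluation simple; a real system would also need to decide how conflicts and missing attributes are handled.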

Once you're done with security, you then have the thorny issue of data access and data flow management. In a nutshell, your driver needs to be able to handle:

  1. Protection against cartesian product network flooding (e.g., user clicks on Customer Table via an ODBC compliant application without comprehension of back-end implications)
  2. Enabling or Disabling of key DBMS engine data access optimization features (e.g. DBMS specific extensions exposed via Environment Variables or SQL command based settings)
  3. Conditional Connection Pooling across User, User Groups, Applications, Host Operating System, IP Address dimensions.
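
The flooding protection in item 1 can be approximated by capping how many rows a single request may pull across the wire. Below is a minimal client-side sketch using Python's DB-API with an in-memory SQLite database as a stand-in back-end; `fetch_capped`, `demo_connection`, and the table layout are assumptions for illustration, and a driver-level implementation would enforce the limit inside the driver rather than in application code.

```python
import sqlite3


def fetch_capped(cursor, sql, max_rows):
    """Run a query but return at most max_rows rows plus a truncation flag."""
    cursor.execute(sql)
    rows = cursor.fetchmany(max_rows + 1)   # one extra row reveals truncation
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


def demo_connection():
    """Create a throwaway database with two small tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(i,) for i in range(100)])
    conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(100)])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # A join without a condition yields 100 x 100 rows; the cap stops at 50.
    rows, truncated = fetch_capped(conn.cursor(), "SELECT * FROM customers, orders", 50)
    print(len(rows), truncated)
```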

Once you've dealt with Security and Data Flow, you then have to address the enforcement of these settings across a myriad of ODBC compliant hosts, which is where Zeroconfig and centralized data access administration come into play i.e., configure once (locally) and enforce globally.

When OpenLink Software entered the ODBC Driver Market segment in 1992, the issues above were the fundamental basis of our Multi-Tier Drivers. Thus, although we distinguished ourselves via performance, stability, and specification adherence, our fundamental engineering focus has always been skewed towards security and configurability, alongside high-performance and scalability.

As we close 2009, the security issues that pervade Native DBMS Drivers and ODBC, JDBC, ADO.NET, OLE-DB etc. Drivers have only increased, courtesy of ubiquitous computing. Sadly though, there remains a fundamental illusion that Data Access Drivers simply connect you to DBMS back-ends, and that since you can get these drivers at $0.00 from most DBMS vendors, they can't be that important.

I hope that this post brings some clarity to the very serious security and general configuration management issues associated with Data Access Drivers. Free ODBC Drivers offer nothing when it comes to the real issues of Open Data Access. If they did, they wouldn't be worth $0.00!

Note: wondering if this has anything to do with Linked Data (my current data access focal point)? Well, remember, the Linked Data meme is fundamentally about REST based Open Data Access & Integration via HTTP; thus, what applies to Relational Model databases naturally applies to their more granular Graph Model relatives. Basically, data access security never goes away, it just gets more granular, complex, and ultimately, mercurial.

12/16/2009 17:53 GMT Modified: 12/31/2009 11:40 GMT