Details

Kingsley Uyi Idehen
Lexington, United States

Subscribe

Post Categories

Recent Articles

Display Settings

articles per page.
order.

Translate

Showing posts in all categories RefreshRefresh
Re-introducing the Virtuoso Virtual Database Engine

In recent times a lot of the commentary and focus re. Virtuoso has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked is the sophisticated Virtual Database Engine that provides the foundation for all of Virtuoso's data integration capabilities.

In this post I provide a brief re-introduction to this essential aspect of Virtuoso.

What is it?

This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).

Why is it important?

In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other of ODBC, JDBC, ADO.NET, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools

In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.

How do I use it?

The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:

Relational Database Federation

You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!

Conceptual Level Data Access using the RDF Model

You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).

You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).

It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.

Conceptual Level Data Access using ADO.NET Entity Frameworks

As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.

Related

# PermaLink Comments [0]
02/17/2010 16:38 GMT Modified: 02/17/2010 16:54 GMT
Why Do I Need To Pay For ODBC , JDBC, ADO.NET, OLE-DB Drivers? (Update 3)

Payment is a function of pain alleviation (opportunity cost) monetization.

This post is about highlighting the real pains associated with the $0.00 misconception associated with Data Access Drivers: ODBC, JDBC, ADO.NET, OLE-DB etc.

In the most basic sense, there are some fundament aspects of data access that are complex to implement and rarely implemented (if at all) by free drivers, the list includes:

  1. Escape Syntaxes for Dates and Functions
  2. Metadata Calls which enable smarter ODBC compliant applications (this feature is typically missing on Driver Side and abused on the Client side i.e., making clients DBMS specific by testing for specific DBMS names)
  3. Scrollable Cursors, this is how you deal with change sensitivity, and most drivers actually fake support and get away with it due to shortage of applications to test proper cursor types (Static, Forward-Only, Key-Set, Dynamic, and Mixed models).

Okay, so we're done with actual driver sophistication re. implementation of critical features. Let's Up the ante by veering into the area of security. At the most basic level, It's extremely important to understand that all data access driver types provide read-write access to your databases; thus, it's imperative that data access drivers address the following:

  1. Read-Only or Read-Write Access scoped to specific Users
  2. Ditto applied to specific User Groups
  3. Ditto applied to Database Names
  4. Ditto applied to specific ODBC compliant applications
  5. Ditto applied to specific ODBC host operating systems
  6. Ditto applied to specific IP addresses or Ranges on your Network
  7. Any combination of items 1-6 as part of a configurable data access rules/policy system.

Once you're done with security, you then have the thorny issue of data access and data flow management. In a nutshell, your driver needs to be able to handle:

  1. Protection against cartesian product network flooding (e.g., user clicks on Customer Table via an ODBC compliant application without comprehension of back-end implications)
  2. Enabling or Disabling of key DBMS engine data access optimization features (e.g. DBMS specific extensions exposed via Environment Variables of SQL commands based settings)
  3. Conditional Connection Pooling across User, User Groups, Applications, Host Operating System, IP Address dimensions.

Once you've dealt with Security and Data Flow, you then have to address the enforcement of these settings across a myriad of ODBC compliant host, which is where Zeroconfig and centralized data access administration comes into play i.e., configure once (locally) and enforce globally.

When OpenLink Software entered the ODBC Driver Market segment in 1992, the issues above where the fundamental basis of our Multi-Tier Drivers. Thus, although we distinguished ourselves via performance, stability, and specification adherence, our fundamental engineering focus has always been skewed towards security and configurability, alongside high-performance and scalability.

As we close 2009, the security issues that pervade Native DBMS Drives, ODBC, JDBC, ADO.NET, OLE-DB etc. Drivers have only increased, courtesy of ubiquitous computing, sadly though, there remains a fundamental illusion that Data Access Drivers simply connect you to DBMS back-ends, and since you can get these drivers at $0.00 from most DBMS vendors they can't be that important.

I hope that this post brings some clarity to a very serious security and general configuration management issues associated with Data Access Drivers. Free ODBC Drivers offer nothing, when it comes to the real issues of Open Data Access. If they did, they wouldn't be worth $0.00!

Note: wondering if this has anything to do with Linked Data (my current data access focal point)? Well, remember, the Linked Data meme is fundamentally about REST based Open Data Access & Integration via HTTP; thus, what applies to Relational Model databases naturally applies to their more granular Graph Model relatives. Basically, data access security never goes away, it just gets more granular, complex, and ultimately, mercurial.

Related

# PermaLink Comments [0]
12/16/2009 17:53 GMT Modified: 12/31/2009 11:40 GMT
5 Very Important Things to Note about HTTP based Linked Data
  1. It isn't World Wide Web Specific (HTTP != World Wide Web)
  2. It isn't Open Data Specific
  3. It isn't about "Free" (Beer or Speech)
  4. It isn't about Markup (so don't expect to grok it via "markup first" approach)
  5. It's about Hyperdata - the use of HTTP and REST to deliver a powerful platform agnostic mechanism for Data Reference, Access, and Integration.

When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer) think:

  • Open Database Connectivity (ODBC) without operating system, data model, or wire-protocol specificity or lock-in potential
  • Java Database Connectivity (JDBC) without programming language specificity
  • ADO.NET without .NET runtime specificity and .NET bound language specificity
  • OLE-DB without Windows operating system & programming language specificity
  • XMLA without XML format specificity - with Tabular and Multidimensional results formats expressible in a variety of data representation formats.
  • All of the above scoped to the Record rather than Container level, with Generic HTTP scheme URIs associated with each Record, Field, and Field value (optionally)

Remember the need for Data Access & Integration technology is the by product of the following realities:

  1. Human curated data is ultimately dirty, because:
    • our thick thumbs, inattention, distractions, and general discomfort with typing, make typos prevalent
    • database engines exist for a variety of data models - Graph, Relational, Hierarchical;
    • within databases you have different record container/partition names e.g. Table Names;
    • within a database record container you have records that are really aspects of the same thing (different keys exist in a plethora of operational / line of business systems that expose aspects of the same entity e.g., customer data that spans Accounts, CRM, ERP application databases);
    • different field names (one database has "EMP" while another has "Employee") for the same record
    • .
  2. Units of measurement is driven by locale, the UK office wants to see sales in Pounds Sterling while the French office prefers Euros etc.
  3. All of the above is subject to context halos which can be quite granular re. sensitivity e.g. staff travel between locations that alter locales and their roles; basically, profiles matters a lot.

Related

# PermaLink Comments [0]
11/19/2009 14:49 GMT Modified: 01/31/2010 17:24 GMT
Time for RDBMS Primacy Downgrade is Nigh! (No Embedded Images Edition)

As the world works it way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS, from its pivotal position at the apex of the data access and data management pyramid is nigh.

What is the Data Access, and Data Management Value Pyramid?

As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.

See: AVF Pyramid Diagram.

The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determine the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.

In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operation excellence, or customer intimacy.

Why has RDBMS Primacy has Endured?

Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.

See: RDBMS Primacy Diagram.

For more then 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) has been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:

"Future of Database Research is excellent, but what is the future of data?"

"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."

-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.

"One size fits all: A concept whose time has come and gone

  1. They are direct descendants of System R and Ingres and were architected more than 25 years ago
  2. They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.

-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry.

Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured it position of primacy albeit on a "one size fits all basis".

Circumstantial Pain

As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in a era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect inline with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).

Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:

Government (Globally) -

Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.

Enterprises -

Banks still don't understand that capital really does exists in tangible and intangible forms; with the intangible being the variant that is inherently dynamic. For example, a tech companies intellectual capital far exceeds the value of fixture, fittings, and buildings, but you be amazed to find that in most cases this vital asset has not significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.

In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a miniscule number of executives dare fantasize about being anywhere within distance of the: relevant information at your fingertips vision.

Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, service (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yes, but even today "rip and replace" is still the norm pushed by most vendors; pitting one mono culture against another as exemplified by irrelevances such as: FOSS/LAMP vs Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues are recognized let alone addressed (see: Applications are Like Fish and Data Like Wine).

Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.

Technology

There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:

  1. Query language standardization - nothing close to SQL standardization
  2. Data Access API standardization - nothing close to ODBC, JDBC, OLE-DB, or ADO.NET
  3. Wire protocol standardization - nothing close to HTTP
  4. Distributed Identity infrastructure - nothing close to the non-repudiatable digital Identity that foaf+ssl accords
  5. Use of Identifiers as network based pointers to data sources - nothing close to RDF based Linked Data
  6. Negotiable data representation - nothing close to Mime and HTTP based Content Negotiation
  7. Scalability especially in the era of Internet & Web scale.

Entity-Attribute-Value with Classes & Relationships (EAV/CR) data models

A common characteristic shared by all post-relational DBMS management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.

What Comes Next?

The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:

  1. The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore e.g., the current global economic crisis remains centered on the inability to connect dots across "Open World" and "Closed World" data frontiers
  2. Entity-Attribute-Value with Classes & Relationships (EAV/CR) based DBMS models are more effective when dealing with disparate data associated with disparate schemas, across disparate DBMS engines, host operating systems, and networks.

Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:

  1. Every item of data (Datum/Entity/Object/Resource) has Identity
  2. Identity is achieved via Identifiers that aren't locked at the DBMS, OS, Network, or Application levels
  3. Object Identifiers and Object values are independent (extricably linked by association)
  4. Object values should be de-referencable via Object Identifier
  5. Representation of de-referenced value graph (entity, attributes, and values mesh) must be negotiable (i.e. content negotiation)
  6. Structured query language must provide mechanism for Creation, Deletion, Updates, and Querying of data objects
  7. Performance & Scalability across "Closed World" (enterprise) and "Open World" (Internet & Web) realms.

Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.

The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS which is rooted in "Closed World" assumptions re., data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc.

It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need to the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.

EAV/CR Oriented Data Access & Management Technology

Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:

The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.

See: New EAV/CR Primacy Diagram.

Related

# PermaLink Comments [0]
01/27/2009 19:19 GMT Modified: 05/29/2009 10:39 GMT
The Time for RDBMS Primacy Downgrade is Nigh!

As the world works it way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS, from its pivotal position at the apex of the data access and data management pyramid is nigh.

What is the Data Access, and Data Management Value Pyramid?

As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.

Image

The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determine the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.

In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operation excellence, or customer intimacy.

Why has RDBMS Primacy has Endured?

Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.

Image

For more then 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) has been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:

"Future of Database Research is excellent, but what is the future of data?"

"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."

-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.

"One size fits all: A concept whose time has come and gone

  1. They are direct descendants of System R and Ingres and were architected more than 25 years ago
  2. They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.

-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry.

Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured it position of primacy albeit on a "one size fits all basis".

Circumstantial Pain

As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in a era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect inline with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).

Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:

Government (Globally) -

Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.

Enterprises -

Banks still don't understand that capital really does exists in tangible and intangible forms; with the intangible being the variant that is inherently dynamic. For example, a tech companies intellectual capital far exceeds the value of fixture, fittings, and buildings, but you be amazed to find that in most cases this vital asset has not significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.

In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a miniscule number of executives dare fantasize about being anywhere within distance of the: relevant information at your fingertips vision.

Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, service (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yes, but even today "rip and replace" is still the norm pushed by most vendors; pitting one mono culture against another as exemplified by irrelevances such as: FOSS/LAMP vs Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues are recognized let alone addressed (see: Applications are Like Fish and Data Like Wine).

Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.

Technology

There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:

  1. Query language standardization - nothing close to SQL standardization
  2. Data Access API standardization - nothing close to ODBC, JDBC, OLE-DB, or ADO.NET
  3. Wire protocol standardization - nothing close to HTTP
  4. Distributed Identity infrastructure - nothing close to the non-repudiatable digital Identity that foaf+ssl accords
  5. Use of Identifiers as network based pointers to data sources - nothing close to RDF based Linked Data
  6. Negotiable data representation - nothing close to Mime and HTTP based Content Negotiation
  7. Scalability especially in the era of Internet & Web scale.

Entity-Attribute-Value with Classes & Relationships (EAV/CR) data models

A common characteristic shared by all post-relational DBMS management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.

What Comes Next?

The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:

  1. The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore e.g., the current global economic crisis remains centered on the inability to connect dots across "Open World" and "Closed World" data frontiers
  2. Entity-Attribute-Value with Classes & Relationships (EAV/CR) based DBMS models are more effective when dealing with disparate data associated with disparate schemas, across disparate DBMS engines, host operating systems, and networks.

Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:

  1. Every item of data (Datum/Entity/Object/Resource) has Identity
  2. Identity is achieved via Identifiers that aren't locked at the DBMS, OS, Network, or Application levels
  3. Object Identifiers and Object values are independent (extricably linked by association)
  4. Object values should be de-referencable via Object Identifier
  5. Representation of de-referenced value graph (entity, attributes, and values mesh) must be negotiable (i.e. content negotiation)
  6. Structured query language must provide mechanism for Creation, Deletion, Updates, and Querying of data objects
  7. Performance & Scalability across "Closed World" (enterprise) and "Open World" (Internet & Web) realms.

Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.

The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS which is rooted in "Closed World" assumptions re., data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc.

It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need to the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.

EAV/CR Oriented Data Access & Management Technology

Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:

The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.

Image

Related

# PermaLink Comments [0]
01/24/2009 20:04 GMT Modified: 07/26/2009 10:20 GMT
New ADO.NET 3.x Provider for Virtuoso Released (Update 2)

I am pleased to announce the immediate availability of the Virtuoso ADO.NET 3.5 data provider for Microsoft's .NET platform.

What is it?

A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.

Benefits?

Technical:

It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.

The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.

Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.


Strategic:

You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.


How do I use it?

Simply follow one of guides below:

Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.

Related

# PermaLink Comments [0]
01/08/2009 04:36 GMT Modified: 01/08/2009 09:05 GMT
Where Are All the RDF-based Semantic Web Applications?

In response to the "Semantic Web Technology" application classification scheme espoused by ReadWriteWeb (RWW), emphasized in the post titled: Where are all the RDF-based Semantic Web Apps?, here is my attempt to clarify and reintroduce what OpenLink Software offers (today) in relation to Semantic Web technology.

From the RWW Top-Down category, which I interpret as: technologies that produce RDF from non RDF data sources. Our product portfolio is comprised of the following; Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes ubiquity commands).

Virtuoso Universal Server functionality summary:

  1. Generation of RDF Linked Data Views of SQL, XML, and Web Services in general
  2. Deployment of RDF Linked Data
  3. "On the Fly" generation of RDF Linked Data from Document Web information resources (i.e. distillation of entities from their containers e.g. Web pages) via Cartridges / Drivers
  4. SPARQL query language support
  5. SPARQL extensions that bring SPARQL closer to SQL e.g Aggregates, Update, Insert, Delete Named Graph support (i.e. use of logical names to partition RDF data within Virtuoso's multi-model dbms engine)
  6. Inference Engine (currently in use re. DBpedia via Yago and UMBEL)
  7. Host and exposes data from Drupal, Wordpress, MediaWiki, phpBB3 as RDF Linked Data via in-built support for PHP runtime
  8. Available as an EC2 AMI
  9. etc..

OpenLink Data Spaces functionality summary:

  1. Simple mechanism for Linked Data Web enabling yourself by giving you an HTTP based User ID (a de-referencable URI) that is linked to a FOAF based Profile page and OpenID
  2. Binds all your data sources (blogs, wikis, bookmarks, photos, calendar items etc. ) to your URI so can "Find" things by only remembering your URI
  3. Makes your profile page and personal URI the focal point of Linked Data Web presence
  4. Delivers Data Portability (using data access by value or data access by reference) across data silos (e.g. Web 2.0 style social networks)
  5. Allows you make annotations about anything in your own Data Space(s) on the Web without exposure to RDF markup
  6. A Briefcase feature that provides a WebDAV driven RDF Linked Data variant of functionality seen in Mac OS X Spotlight and WinFS with the addition of SPARQL compliance
  7. Automatically generates RDFa in its (X)HTML pages
  8. Blog, Wiki, WebDAV File Server, Shared Bookmarks, Calendar, and other applications that look and feel like Web 2.0 counterparts but emitt RDF Linked Data amongst a plethora of data exchange formats
  9. Available as an EC2 AMI
  10. etc..

OpenLink Ajax Toolkit functionality summary:

  1. Provides binding to SQL, RDF, XML, and Web Services via Ajax Database Connectivity Layer (you only need an ODBC, JDBC, OLE-DB, ADO.NET, XMLA Driver, or Web Service on the backend for dynamic data access from Javascript)
  2. All controls are Ajax Database Connectivity bound (widgets get their data from Ajax Database Connectivity data sources)
  3. Bundled with Virtuoso and ODS installations.
  4. etc.

OpenLink Data Explorer functionality summary

  1. Distills entities associated with information resource style containers (e.g. Web Pages or files) as RDF Linked Data
  2. Exposes the RDF based Linked Data graph associated with information resources (see the Linked Data behind Web pages)
  3. Ubiquity commands for invoking the above
  4. Available as a Hosted Service or Firefox Extension
  5. Bundled with Virtuoso and ODS installations
  6. etc.

Note:

Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)

Related

# PermaLink Comments [0]
10/01/2008 19:09 GMT Modified: 10/02/2008 15:27 GMT
Crunchbase & Semantic Web Interview (Remix - Update 1)

After reading Bengee's interview with CrunchBase, I decided to knock up a quick interview remix as part of my usual attempt to add to the developing discourse.

CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).

Image


CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed Crunchbase data with related data in DBpedia and Wikicompany data.

CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.

CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998 as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998 we were clear about two things, in relation to the effects of the Web and Internet data management infrastructure inflections: 1) Existing DBMS technology had reached it limits 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle from completing its technical roadmap.

CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc.. DBMS using SQL. That's it in a nutshell.

CrunchBase: On your website you wrote that “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value have been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power, well there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always be grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.

CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.

Here are some examples of the CrunchBase Linked Data Space, as projected via our CruncBase Sponger Cartridge:

  1. Amazon.com
  2. Microsoft
  3. Google
  4. Apple
# PermaLink Comments [0]
08/27/2008 18:16 GMT Modified: 08/27/2008 20:35 GMT
Response to: Whole Data Post (Update 3)

This post is in response to Glenn McDonald's post titled: Whole Data, where he highlights a number of issues relating to "Semantic Web" marketing communications and overall messaging, from his perspective.

By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.

I've provided a dump of Glenn's issues and my responses below:

Issue - RDF

  • Ingenious data decomposition idea, but:
  • too low-level; the assembly language of data, where we need Java or Ruby
  • "resource" is not the issue; there's no such thing as "metadata", it's all data; "meta" is a perspective
  • lists need to be effortless, not painful and obscure
  • nodes need to be represented, not just implied; they need types and literals in a more pervasive, integrated way.

Response:

RDF is a Graph based Data Model it stands for Resource Description Framework. The Metadata data angle comes from it's Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using: Turtle, N3, TriX, N-Triples, and RDF/XML.

Issue - SPARQL (and Freebase's MQL)

These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.

Response:

SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.

Issue - Linked Data

Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!

Response:

Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.

When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:

  • Naming is scoped to the entity level rather than container level
  • HTTP's use within the data source naming scheme expands the referencability of the Named Entity Descriptions beyond traditional confines such as applications, operating systems, and database engines.

Issue - Giant Global Graph

Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...). And my thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.

Response:

Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".

Multi-Model Database technology that meshes the best of the Graph & Relational Models exist. In a nutshell, this is what Virtuoso is all about and it's existed for a very long time :-)

Virtuoso Universal Sever (recap)

Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Sever).

The issue isn't the "Semantic Web" moniker per se., it's about how Linked Data (foundation layer of Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:

  1. View page in rendered form (default)
  2. View page source (i.e., how you see the markup behind the page)

By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web becomes somewhat clearer.

The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enrichens the things you can do with the Web. It's predominance, like any application feature, will be subject to the degrees to which it delivers tangible value or matrializes internal and external opportunity costs.

Note: The Web isn't ubiquitous today becuase all it's users groked HTML Markup. It's ubquitity is a function of opportunity costs: there simply came a point in the Web boostrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.

Links:
  1. Linked Data Journey part of my Linked Data Planet Presentation Remix(from slides 15 to 22 - which include bits from TimBL's presentation)
  2. OpenLink Data Explorer
  3. OpenLink Data Explorer Screenshots and examples.
# PermaLink Comments [0]
08/15/2008 13:06 GMT Modified: 08/15/2008 19:11 GMT
CrunchBase gets hooked up with the Linked Data Web!

It's getting really hot in Linked Data land! Two days ago Benjamin Nowack pinged the LOD community about his RDFization of Crunchbase (sample (X)HTML view: http://cb.semsol.org/company/opera-software) courtesy of Crounchbase releasing an API. As you know, I've always equated Web Service API to Database CLIs (ODBC, JDBC, ADO.NET etc.) as both offer code level hooks into Data Spaces.

Naturally, we've decided to join the Crunchbase RDFization party, and have just completed a Virtuoso Sponger Cartridge (an RDFizer) for Crouncbase. What we add in our particular cartridge is additional meshing with DBpedia and Wikicompany Linked Data Spaces, plus RDFizaton of the Crunchbase (X)HTML pages :-)

As I've postulated for a while, Linked Data is about data "Meshing" and "Meshups". This isn't a buzzword play. I am pointing out an important distinction between "Mashups" and "Meshpus". Which goes as follows: "Mashups" are about code level joining devoid of structured modelling, hence the revelation of code as opposed to data when you look behind a "Mashup". "Meshups" on the other hand, are about joining disparate structured data sources across the Web. And when you look behind a "Meshup" you see structured data (preferably Linked Data) that enables further "Meshing".

I truly believe that we are now inches away from critical mass re. Linked Data, and because we are dealing with data, the network-effect will be sky-high! I shudder to think about the state of the Linked Data Web in 12 months time. Yes, I am giving the explosion 12 months (or less). These are very exciting times.

Demo Links:

For best experience I encourage you to look at the OpenLink Data Explorer extension for Firefox (2.x - 3.x). This enables you to go to Crunchbase (X)HTML pages (and other sites on the Web of course), and then simply use the "View | Linked Data Sources" main or context menu sequence to unveil the Linked Data Sources associated with any Web Page.

Of course there is much more to come!

Tags: | |
# PermaLink Comments [0]
07/25/2008 14:01 GMT Modified: 07/29/2008 21:43 GMT
 <<     | 1 | 2 | 3 | 4 | 5 | 6 |     >>
Powered by OpenLink Virtuoso Universal Server
Running on Linux platform