Kingsley Uyi Idehen
Lexington, United States
Re-introducing the Virtuoso Virtual Database Engine
In recent times a lot of the commentary and focus re. Virtuoso has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked is the sophisticated Virtual Database Engine that provides the foundation for all of Virtuoso's data integration capabilities.
In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
What is it?
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).
Why is it important?
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
How do I use it?
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
Relational Database Federation
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
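To make the distributed join idea concrete, here is a minimal sketch in Python. It uses SQLite's ATTACH DATABASE purely as a stand-in for linking external data sources, so it is not Virtuoso's own interface; the hr and sales databases, their tables, and the sample rows are all hypothetical.

```python
import sqlite3

# One connection plays the role of the single federating connection.
conn = sqlite3.connect(":memory:")

# ATTACH stands in for linking external data sources; each attached
# in-memory database is independent of the others.
conn.execute("ATTACH DATABASE ':memory:' AS hr")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

# Hypothetical tables and rows, one table per "remote" database.
conn.execute("CREATE TABLE hr.employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.orders (employee_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO hr.employee VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                 [(1, 100.0), (1, 250.0), (2, 75.0)])


def sales_by_employee(connection):
    """Join across the two attached databases in a single query."""
    query = """
        SELECT e.name, SUM(o.amount)
        FROM hr.employee AS e
        JOIN sales.orders AS o ON o.employee_id = e.id
        GROUP BY e.name
        ORDER BY e.name
    """
    return connection.execute(query).fetchall()


totals = sales_by_employee(conn)
```

The point of the sketch is only the shape of the query: the client sees one connection and one SQL statement, while the tables live in separate databases.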
Conceptual Level Data Access using the RDF Model
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
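As a rough illustration of what an Entity-Attribute-Value view over a relational row looks like, the Python sketch below turns one row into triples. The URI layout (base URI, table name, key value, and the "#this" suffix), the function name, and the sample customer row are assumptions made for this example, not Virtuoso's actual mapping rules.

```python
def row_to_triples(base_uri, table, key_column, row):
    """Turn one relational row (a dict) into (entity, attribute, value) triples."""
    # Hypothetical naming scheme: one HTTP URI per row, built from the key.
    subject = f"{base_uri}/{table}/{row[key_column]}#this"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject URI; skip NULLs
        predicate = f"{base_uri}/schema/{table}#{column}"
        triples.append((subject, predicate, value))
    return triples


customer = {"id": 7, "name": "Acme", "city": "Lexington"}
triples = row_to_triples("http://example.com", "customer", "id", customer)
```

Each non-key column becomes one attribute of the entity named by the row's URI, which is the "on the fly" part: nothing is copied out of the table until someone asks for the description.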
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
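For the "SPARQL over HTTP" part, here is a small Python sketch that builds a query URL with a Web Resource in the FROM clause. The endpoint address and resource URL are placeholders, and only the standard "query" parameter of the SPARQL protocol is used; whether a given server dereferences the FROM URL on the fly depends on how it is configured.

```python
from urllib.parse import urlencode


def build_sparql_url(endpoint, resource_url, limit=10):
    """Return a GET URL that asks a SPARQL endpoint about one Web Resource."""
    query = (
        "SELECT ?s ?p ?o "
        f"FROM <{resource_url}> "      # the Web Resource named as the graph
        "WHERE { ?s ?p ?o } "
        f"LIMIT {limit}"
    )
    # urlencode percent-encodes the query text so it is safe in a URL.
    return endpoint + "?" + urlencode({"query": query})


# Both URLs below are hypothetical placeholders.
url = build_sparql_url("http://example.com/sparql", "http://example.org/page")
```

The resulting URL can be pasted into a browser or fetched with any HTTP client, which is the sense in which the link doubles as a Data Source Name.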
Conceptual Level Data Access using ADO.NET Entity Frameworks
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.
02/17/2010 16:38 GMT | Modified: 02/17/2010 16:54 GMT
Why Do I Need To Pay For ODBC, JDBC, ADO.NET, OLE-DB Drivers? (Update 3)
Payment is a function of pain alleviation (opportunity cost) monetization.
This post highlights the real pains behind the $0.00 misconception about Data Access Drivers: ODBC, JDBC, ADO.NET, OLE-DB, etc.
In the most basic sense, there are some fundamental aspects of data access that are complex to implement and rarely implemented (if at all) by free drivers. The list includes:
- Escape Syntaxes for Dates and Functions
- Metadata Calls which enable smarter ODBC compliant applications (this feature is typically missing on the Driver side and abused on the Client side, i.e., making clients DBMS specific by testing for specific DBMS names)
- Scrollable Cursors: this is how you deal with change sensitivity. Most drivers actually fake support and get away with it because few applications test the proper cursor types (Static, Forward-Only, Key-Set, Dynamic, and Mixed models).
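To show what the first item involves, here is a deliberately naive Python sketch that rewrites the ODBC date, time, and timestamp escape sequences ({d '...'}, {t '...'}, {ts '...'}) into SQL-92 style literals. The target literal form is an assumption chosen for illustration; a real driver has to emit whatever its back-end DBMS expects, skip escapes that appear inside string literals, and also handle the {fn ...} function escapes, none of which this sketch attempts.

```python
import re

# Matches {d '...'}, {t '...'} and {ts '...'} escape sequences.
_ESCAPE = re.compile(r"\{(d|t|ts)\s+'([^']*)'\}")

# Assumed target syntax: SQL-92 style typed literals.
_KEYWORDS = {"d": "DATE", "t": "TIME", "ts": "TIMESTAMP"}


def expand_datetime_escapes(sql):
    """Replace ODBC date/time escapes with typed literals (naive version)."""
    return _ESCAPE.sub(
        lambda m: f"{_KEYWORDS[m.group(1)]} '{m.group(2)}'", sql
    )


portable = "SELECT * FROM orders WHERE placed >= {d '2009-12-16'}"
native = expand_datetime_escapes(portable)
```

The application writes the portable form once, and the driver is responsible for the translation. That translation is the part that tends to be missing or incomplete.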
Okay, so we're done with actual driver sophistication re. implementation of critical features. Let's up the ante by veering into the area of security. At the most basic level, it's extremely important to understand that all data access driver types provide read-write access to your databases; thus, it's imperative that data access drivers address the following:
1. Read-Only or Read-Write Access scoped to specific Users
2. Ditto applied to specific User Groups
3. Ditto applied to Database Names
4. Ditto applied to specific ODBC compliant applications
5. Ditto applied to specific ODBC host operating systems
6. Ditto applied to specific IP addresses or Ranges on your Network
7. Any combination of items 1-6 as part of a configurable data access rules/policy system.
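A configurable rules system of this kind can be sketched in a few lines of Python. Everything here is hypothetical: the Rule fields, the first-match-wins ordering, and the "deny" default are assumptions for illustration, not a description of how any particular driver stores or evaluates its policy.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    access: str                        # "read-only", "read-write", or "deny"
    user: Optional[str] = None         # None means "any"
    group: Optional[str] = None
    database: Optional[str] = None
    application: Optional[str] = None
    host_os: Optional[str] = None
    network: Optional[str] = None      # CIDR range, e.g. "10.0.0.0/16"

    def matches(self, request):
        """True when every constraint set on this rule fits the request."""
        for name in ("user", "group", "database", "application", "host_os"):
            wanted = getattr(self, name)
            if wanted is not None and request.get(name) != wanted:
                return False
        if self.network is not None:
            if ip_address(request["ip"]) not in ip_network(self.network):
                return False
        return True


def resolve_access(rules, request, default="deny"):
    """First matching rule wins; fall back to the default otherwise."""
    for rule in rules:
        if rule.matches(request):
            return rule.access
    return default


rules = [
    Rule("read-write", group="finance", network="10.0.0.0/16"),
    Rule("read-only", application="Microsoft Access"),
]
request = {"user": "kim", "group": "finance", "database": "sales",
           "application": "Microsoft Access", "host_os": "Windows",
           "ip": "10.0.5.20"}
decision = resolve_access(rules, request)
```

Moving the same user to an address outside 10.0.0.0/16 drops the decision to the read-only rule, which is the kind of combination the last item in the list is about.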
Once you're done with security, you then have the thorny issue of data access and data flow management. In a nutshell, your driver needs to be able to handle:
- Protection against Cartesian product network flooding (e.g., user clicks on Customer Table via an ODBC compliant application without comprehension of back-end implications)
- Enabling or Disabling of key DBMS engine data access optimization features (e.g., DBMS specific extensions exposed via Environment Variables or SQL command-based settings)
- Conditional Connection Pooling across User, User Groups, Applications, Host Operating System, IP Address dimensions.
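The first item, protection against runaway result sets, can be illustrated with a simple row cap. This Python sketch uses SQLite and a hypothetical customer table; the fetch_capped helper and its max_rows parameter are made up for the example, and a real driver would enforce such a limit on the server side of the connection rather than in client code.

```python
import sqlite3


def fetch_capped(cursor, max_rows):
    """Return at most max_rows rows plus a flag saying whether more existed."""
    rows = cursor.fetchmany(max_rows + 1)   # one extra row reveals truncation
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?)", [(i,) for i in range(5)])

# An accidental Cartesian product: 5 x 5 = 25 rows.
cursor = conn.execute("SELECT a.id, b.id FROM customer AS a, customer AS b")
rows, truncated = fetch_capped(cursor, max_rows=10)
```

With five rows the damage is trivial; with a few million customers the same careless query is what floods the network.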
Once you've dealt with Security and Data Flow, you then have to address the enforcement of these settings across a myriad of ODBC compliant hosts, which is where Zeroconfig and centralized data access administration come into play, i.e., configure once (locally) and enforce globally.
When OpenLink Software entered the ODBC Driver Market segment in 1992, the issues above were the fundamental basis of our Multi-Tier Drivers. Thus, although we distinguished ourselves via performance, stability, and specification adherence, our fundamental engineering focus has always been skewed towards security and configurability, alongside high-performance and scalability.
As we close 2009, the security issues that pervade Native DBMS Drivers and ODBC, JDBC, ADO.NET, OLE-DB, etc. Drivers have only increased, courtesy of ubiquitous computing. Sadly, though, there remains a fundamental illusion that Data Access Drivers simply connect you to DBMS back-ends, and since you can get these drivers at $0.00 from most DBMS vendors, they can't be that important.
I hope that this post brings some clarity to the very serious security and general configuration management issues associated with Data Access Drivers. Free ODBC Drivers offer nothing when it comes to the real issues of Open Data Access. If they did, they wouldn't be worth $0.00!
Note: wondering if this has anything to do with Linked Data (my current data access focal point)? Well, remember, the Linked Data meme is fundamentally about REST based Open Data Access & Integration via HTTP; thus, what applies to Relational Model databases naturally applies to their more granular Graph Model relatives. Basically, data access security never goes away, it just gets more granular, complex, and ultimately, mercurial.
12/16/2009 17:53 GMT | Modified: 12/31/2009 11:40 GMT
5 Game Changing Things about the OpenLink Virtuoso + AWS Cloud Combo
Here are 5 powerful benefits you can immediately derive from the combination of Virtuoso and Amazon's AWS services (specifically the EC2 and EBS components):
- Acquire your own personal or service specific data space in the Cloud. Think DBase, Paradox, FoxPRO, Access of yore, but with the power of Oracle, Informix, Microsoft SQL Server, etc., using a Conceptual, as opposed to solely Logical, model based DBMS (i.e., a Hybrid DBMS Engine for: SQL, RDF, XML, and Full Text)
- Ability to share and control access to your resources using innovations like FOAF+SSL, OpenID, and OAuth, all from one place
- Construction of personal or organization based FOAF profiles in a matter of minutes, by simply creating a basic DBMS (or ODS application layer) account and then using this profile to create strong links (references) to all your Data silos (esp. those from the Web 2.0 realm)
- Load data sets from the LOD cloud or Sponge existing Web resources (i.e., on the fly data transformation to RDF model based Linked Data) and then use the combination to build powerful lookup services that enrich the value of URLs (think: Web addressable reports holding query results) that you publish
- Bind all of the above to a domain that you own (e.g. a .Name domain) so that you have an attribution-friendly "authority" component for resource URLs and Entity URIs published from your Personal Linked Data Space on the Web (or private HTTP network).
In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)
11/18/2009 14:12 GMT | Modified: 01/31/2010 17:24 GMT
New ADO.NET 3.x Provider for Virtuoso Released (Update 2)
I am pleased to announce the immediate availability of the Virtuoso ADO.NET 3.5 data provider for Microsoft's .NET platform.
What is it?
A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.
Benefits?
Technical:
It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.
Strategic:
You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.
How do I use it?
Simply follow one of the guides below:
Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.
01/08/2009 04:36 GMT | Modified: 01/08/2009 09:05 GMT
Dog-fooding: Linked Data and OpenLink Product Portfolio
Thanks to RDF and Linked Data, it's becoming a lot easier for us to explain and reveal the depth of the OpenLink technology portfolio.
Here is a look at our offerings by product family:
As you explore the Linked Data graph exposed via our product portfolio, I expect you to experience, or at least spot, the virtuous potential of high SDQ (Serendipitous Discovery Quotient) courtesy of Linked Data, which is Web 3.0's answer to SEO. For instance, how Database, Operating System, and Processor family paths in the product portfolio graph (data network) unveil a lot more about OpenLink Software than meets the proverbial "eye" :-)
10/24/2008 22:05 GMT | Modified: 10/24/2008 18:33 GMT
Crunchbase & Semantic Web Interview (Remix - Update 1)
After reading Bengee's interview with CrunchBase, I decided to knock up a quick interview remix as part of my usual attempt to add to the developing discourse.
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany data.
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998 as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998 we were clear about two things, in relation to the effects of the Web and Internet data management infrastructure inflections: 1) Existing DBMS technology had reached its limits 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
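For readers who prefer code to prose, the Subject, Predicate, Object principle and the idea of querying by pattern can be sketched in plain Python. The people URIs and the match helper are made up for illustration (only the FOAF namespace is real), and the pattern matching is a toy version of what a SPARQL triple pattern does, not a SPARQL engine.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("http://example.org/people#alice", FOAF + "name", "Alice"),
    ("http://example.org/people#alice", FOAF + "knows",
     "http://example.org/people#bob"),
    ("http://example.org/people#bob", FOAF + "name", "Bob"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a variable."""
    pattern = (subject, predicate, obj)
    return [
        triple for triple in graph
        if all(want is None or want == have
               for want, have in zip(pattern, triple))
    ]


# Roughly: SELECT ?name WHERE { ?s foaf:name ?name }
names = [o for _, _, o in match(triples, predicate=FOAF + "name")]
```

Leaving a position as None is the toy equivalent of a SPARQL variable, so the same helper answers "who does Alice know?" by fixing the subject and predicate instead.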
CrunchBase: On your website you wrote that “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power. Well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages). Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
- Amazon.com
- Microsoft
- Google
- Apple
08/27/2008 18:16 GMT | Modified: 08/27/2008 20:35 GMT
Hello Data Web (Take 3 - Feel The "RDF" Force)
As I have stated, and implied, in various posts about the Data Web and burgeoning Semantic Web in general, the value of RDF is felt rather than seen (driven by presence as opposed to web sites). That said, it is always possible to use the visual Interactive-Web dimension (Web 1.0) as a conduit to the Data-Web dimension.
In this third take on my introduction to the Data Web I would like to share a link with you (a Dynamic Start Page in Web 2.0 parlance) with a Data Web twist: You do not have to preset the Start Page Data Sources (this is a small-big thing, if you get my drift, hopefully!).
Here are some Data Web based Dynamic Start Pages that I have built for some key players from the Semantic Web realm (in random order):
- Dan Brickley
- Tim Berners-Lee
- Dan Connolly
- Danny Ayers
- Planet RDF
"These are RDF prepped Data Sources....", you might be thinking, right? Well here is the reminder: The Data Web is a Global Data Generation and Integration Effort. Participation may be active (Semantic Web & Microformats Community), or passive (web sites, weblogs, wikis, shared bookmarks, feed subscription, discussion forums, mailing lists etc..). Irrespective of participation mode, RDF instance can be generated from close to anything (I say this because I plan to add binary files holding metadata to this mix shortly). Here are examples of Dynamic Start Pages for non RDF Data Sources:
- del.icio.us Web 2.0 Events Bookmarks
- Vecosys
- Techcrunch
- Jon Udell's Blog
- Dave Winer's Scripting News
- Robert Scoble's Blog
What about Microformats, you may be wondering? Here goes:
- Microformats Wiki (click on the Brian Suda link for instance)
- Microformats Planet
- Del.icio.us Microformats Bookmarks
- Ben Adida's home page (RDFa)
Let's carry on.
How about some traditional Web Sites? Here goes:
- OpenLink Software's Home Page
- Oracle's Home Page
- Apple's Home Page
- Microsoft's Home Page
- IBM's Home Page
And before I forget, here is My Data Web Start Page .
Due to the use of Ajax in the Data Web Start Pages, IE6 and Safari will not work. For Mac OS X users, Webkit works fine. Ditto re. IE7 on Windows.
02/24/2007 21:43 GMT | Modified: 03/31/2007 21:51 GMT
Birds of a Feather Flock Together - Mac OS X & Rails
A very cool video promo for Ruby on Rails and Mac OS X, or should I say: 37 Signals & Apple :-) Either way, very cool!
BTW - We have just released a collection of High-Performance Data Providers for ActiveRecord. Our providers deliver Consistent Functionality to RoR developers across Virtuoso, Oracle, SQL Server, Sybase, DB2, Ingres, Informix, and others without compromising performance or cross platform portability.
10/21/2006 00:55 GMT | Modified: 05/28/2007 16:19 GMT
Recent Virtuoso Developments
(Cut & Pasted near-verbatim from Orri Erling's Weblog).
Recent Virtuoso Developments:
We have been working extensively on virtual database refinements. There are many SQL cost model adjustments to better model distributed queries, and we now support direct access to Oracle and Informix statistics system tables. Thus, when you attach a table from one or the other, you automatically get up-to-date statistics. This helps Virtuoso optimize distributed queries. The documentation has also been updated to cover these changes, with a new section on distributed query optimization.
On the applications side, we have been keeping up with the SIOC RDF ontology developments. All ODS applications now make their data available as SIOC graphs for download and SPARQL query access.
What is most exciting however is our advance in mapping relational data into RDF. We now have a mapping language that makes arbitrary legacy data in Virtuoso or elsewhere in the relational world RDF queriable. We will put out a white paper on this in a few days.
Also, we have some innovations in mind for optimizing the physical storage of RDF triples. We keep experimenting, now with our sights set on the high end of triple storage, towards billion-triple data sets. We are experimenting with a new, more space-efficient index structure for better working set behavior. Next week will yield the first results.
09/26/2006 20:20 GMT | Modified: 05/28/2007 16:20 GMT