Details
Kingsley Uyi Idehen
Lexington, United States
Subscribe
Post Categories
Recent Articles
Display Settings
Translate
|
Showing posts in all categories Refresh
Meshups Demonstrating How SPARQL-GEO Enhances Linked Data Exploitation (Update 1)
Deceptively simple demonstrations of how Virtuoso's SPARQL-GEO extensions to SPARQL lay critical foundation for Geo Spatial solutions that seek to leverage the burgeoning Web of Linked Data.
SPARQL Endpoint: Linked Open Data Cache (8.5 Billion+ Quad Store which includes data from Geonames and the Linked GeoData Project Data Sets) .
Live Linked Data Meshup Links:
Related
|
03/06/2010 17:43 GMT
|
Modified:
03/08/2010 09:52 GMT
|
Linked Data & Socially Enhanced Collaboration (Enterprise or Individual) -- Update 1
Socially enhanced enterprise and invididual collaboration is becoming a focal point for a variety of solutions that offer erswhile distinct content managment features across the realms of Blogging, Wikis, Shared Bookmarks, Discussion Forums etc.. as part of an integrated platform suite. Recently, Socialtext has caught my attention courtesy of its nice features and benefits page . In addition, I've also found the Mike 2.0 portal immensely interesting and valuable, for those with an enterprise collaboration bent.
Anyway, Socialtext and Mike 2.0 (they aren't identical and juxtaposition isn't seeking to imply this) provide nice demonstrations of socially enhanced collaboration for individuals and/or enterprises is all about:
- Identifying Yourself
- Identifying Others (key contributors, peers, collaborators)
- Serendipitous Discovery of key contributors, peers, and collaborators
- Serendipitous Discovery by key contributors, peers, and collaborators
- Develop and sustain relationships via socially enhanced professional network hybrid
- Utilize your new "trusted network" (which you've personally indexed) when seeking help or propagating a meme.
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys in sense) for data items, data containers, and data creators (individuals and groups) is overlooked albeit unintentionally.
How HTTP based Linked Data Addresses the Identifier Issue
Rather than using platform constrained identifiers such as:
- email address (a "mailto" scheme identifier),
- a dbms user account,
- application specific account, or
- OpenID.
It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:
- You,
- Your Peers,
- Your Groups, and
- Your Activity Generated Data,
simply become conduits into a mesh of HTTP -- referencable and accessible -- Linked Data Objects endowed with High SDQ (Serendipitious Discovery Quotient). For example my Personal WebID is all anyone needs to know if they want to explore:
- My Profile (which includes references to data objects associated with my interests, social-network, calendar, bookmarks etc.)
- Data generated by my activities across various data spaces (via data objects associated with my online accounts e.g. Del.icio.us, Twitter, Last.FM)
-
Linked Data Meshups via URIBurner (or any other Virtuoso instance) that provide an extend view of my profile
How FOAF+SSL adds Socially aware Security
Even when you reach a point of equilibrium where: your daily activities trigger orchestratestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, that includes the following:
- Single Sign On,
- Authentication, and
- Data Access Policies.
FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifer (or WebID) via the following steps (peformed by a FOAF+SSL compliant platform):
- Imprint WebID within a self-signed x.509 based public key (certificate) associated with your private key (generated by FOAF+SSL platform or manually via OpenSSL)
- Store public key components (modulous and exponent) into your FOAF based profile document which references your Personal HTTP Identifier as its primary topic
- Leverage HTTP URL component of WebID for making public key components (modulous and exponent) available for x.509 certificate based authentication challenges posed by systems secured by FOAF+SSL (directly) or OpenID (indirectly via FOAF+SSL to OpenID proxy services).
Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.
Conclusions
Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with regards to HTTP based Linked Data look no further than the issues of secure, socially aware, and platform independent identifiers for data objects, that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS) which is a distributed collaboration engine (enterprise of individual) built around the Virtuoso database engines. It simply enhances existing collaboration tools via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
- Ability to integrate across a myriad of Data Source Types rather than a select few across RDBM Engines, LDAP, Web Services, and various HTTP accessible Resources (Hypermedia or Non Hypermedia content types)
- Addition of FOAF+SSL based authentication
- Addition of FOAF+SSL based Access Control Lists (ACLs) for policy based data access.
Related:
|
03/02/2010 15:47 GMT
|
Modified:
03/04/2010 10:19 GMT
|
One Technology That Will Rock 2010 (Update 1)
Thanks to the TechCrunch post titled: Ten Technologies That Will Rock 2010, I've been able to quickly construct a derivative post that condenses the ten item list down to a Single Technology That Will Rock 2010 :-)
Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:
-
The Tablet: a new form factor addition re. Internet and Web application hosts which is just another way of saying: Linked Data will be accessible from Tablet applications.
-
Geo: GPS chips are now standard features of mobile phones, so geolocation is increasingly becoming a necessary feature for any killer app. Thus, GeoSpatial Linked Data and GeopSpatial Queries are going to be a critical success factor for any endeavor that seeks to engage mobile applications developers and ultimately their end-users. Basiacally, you want to be able to perform Esoteric Search from these devices of the form: Find Vendors of a Camcorder (e.g., with a Zoom Factor: Weight Ratio of X) within a 2km Radius of my current location. Or how many items from my WishList are available from a Vendor within a 2km radius of my current location. Conversely, provide Vendors with the ability to spot potential Customers within a 2km of a given "clicks & mortar" location (e.g. BestBuy store).
-
Realtime Search: Rich Structured Profiles that leverage standards such as FOAF and FOAF+SSL will enable Highly Personalized Realtime Search (HPRS) without compromisng privacy. Tecnically, this is about WebIDs securely bound to X.509 Certificates, providing access to verifiable and highly navigable Personal Profile Data Spaces that also double as personal search index entry points.
-
Chrome OS: Just another operating system for exploiting the burgeoning Web of Linked Data
-
HTML5: Courtesy of RDFa, just another mechanism for exposing Linked Data by making HTML+RDFa a bona fide markup for metadata (i.e., format for describing real world objects via their attribute-value graphs)
-
Mobile Video: Simplifies the production and sharing of Video annotations (comments, reviews etc.) en route to creating rich Linked Discourse Data Spaces.
-
Augmented Reality: Ditto
-
Mobile Transactions: As per points 1&2 above, Vendor Discovery and Transaction Conusmation will increasingly be driven by high SDQ applications. The "Funnel Effect" (more choices based on individual preferences) will be a critical success factor for any one operating in the Mobile Transaction realm. Note, without Linked Data you cannot deliver scalable solutions that handle the combined requirements of: SDQ, "Funnel Effect", and Mobile Device form factor, will simply maginify the importance of Web accessible Linked Data.
-
Android: An additional platform for items 1-8; basically, 2010 isn't going to be an iPhone only zone. Personally, this reminds me of a battle from the past i.e., Microsoft vs Apple, re. desktop computing dominance. Google has studied history very well :-)
-
Social CRM: this is simply about applying points 1-9 alongide the construction of Linked Data from eCRM Data Spaces.
As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:
- Data Item or Object Identity
- Data Structure -- Data Models
- Data Representation -- Data Model Entity & Relationships Representation mechanism (as delivered by metadata oriented markup)
- Data Storage -- Database Management Systems
- Data Access -- Data Access Protocols
- Data Presentation -- How you present Views and Reports from Structured Data Sources
- Data Security -- Data Access Policies
The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.
Conclusion
I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce of share Knowledge.
Related
|
01/02/2010 12:30 GMT
|
Modified:
01/02/2010 14:05 GMT
|
One Technology That Will Rock 2010 (Update 1)
Thanks to the TechCrunch post titled: Ten Technologies That Will Rock 2010, I've been able to quickly construct a derivative post that condenses the ten item list down to a Single Technology That Will Rock 2010 :-)
Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:
-
The Tablet: a new form factor addition re. Internet and Web application hosts which is just another way of saying: Linked Data will be accessible from Tablet applications.
-
Geo: GPS chips are now standard features of mobile phones, so geolocation is increasingly becoming a necessary feature for any killer app. Thus, GeoSpatial Linked Data and GeopSpatial Queries are going to be a critical success factor for any endeavor that seeks to engage mobile applications developers and ultimately their end-users. Basiacally, you want to be able to perform Esoteric Search from these devices of the form: Find Vendors of a Camcorder (e.g., with a Zoom Factor: Weight Ratio of X) within a 2km Radius of my current location. Or how many items from my WishList are available from a Vendor within a 2km radius of my current location. Conversely, provide Vendors with the ability to spot potential Customers within a 2km of a given "clicks & mortar" location (e.g. BestBuy store).
-
Realtime Search: Rich Structured Profiles that leverage standards such as FOAF and FOAF+SSL will enable Highly Personalized Realtime Search (HPRS) without compromisng privacy. Tecnically, this is about WebIDs securely bound to X.509 Certificates, providing access to verifiable and highly navigable Personal Profile Data Spaces that also double as personal search index entry points.
-
Chrome OS: Just another operating system for exploiting the burgeoning Web of Linked Data
-
HTML5: Courtesy of RDFa, just another mechanism for exposing Linked Data by making HTML+RDFa a bona fide markup for metadata (i.e., format for describing real world objects via their attribute-value graphs)
-
Mobile Video: Simplifies the production and sharing of Video annotations (comments, reviews etc.) en route to creating rich Linked Discourse Data Spaces.
-
Augmented Reality: Ditto
-
Mobile Transactions: As per points 1&2 above, Vendor Discovery and Transaction Conusmation will increasingly be driven by high SDQ applications. The "Funnel Effect" (more choices based on individual preferences) will be a critical success factor for any one operating in the Mobile Transaction realm. Note, without Linked Data you cannot deliver scalable solutions that handle the combined requirements of: SDQ, "Funnel Effect", and Mobile Device form factor, will simply maginify the importance of Web accessible Linked Data.
-
Android: An additional platform for items 1-8; basically, 2010 isn't going to be an iPhone only zone. Personally, this reminds me of a battle from the past i.e., Microsoft vs Apple, re. desktop computing dominance. Google has studied history very well :-)
-
Social CRM: this is simply about applying points 1-9 alongide the construction of Linked Data from eCRM Data Spaces.
As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:
- Data Item or Object Identity
- Data Structure -- Data Models
- Data Representation -- Data Model Entity & Relationships Representation mechanism (as delivered by metadata oriented markup)
- Data Storage -- Database Management Systems
- Data Access -- Data Access Protocols
- Data Presentation -- How you present Views and Reports from Structured Data Sources
- Data Security -- Data Access Policies
The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.
Conclusion
I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce of share Knowledge.
Related
|
01/02/2010 12:30 GMT
|
Modified:
01/02/2010 16:07 GMT
|
5 Game Changing Things about the OpenLink Virtuoso + AWS Cloud Combo
Here are 5 powerful benefits you can immediately derive from the combination of Virtuoso and Amazon's AWS services (specifically the EC2 and EBS components):
- Acquire your own personal or service specific data space in the Cloud. Think DBase, Paradox, FoxPRO, Access of yore, but with the power of Oracle, Informix, Microsoft SQL Server etc.. using a Conceptual, as opposed to solely Logical, model based DBMS (i.e., a Hybrid DBMS Engine for: SQL, RDF, XML, and Full Text)
- Ability to share and control access to your resources using innovations like FOAF+SSL, OpenID, and OAuth, all from one place
- Construction of personal or organization based FOAF profiles in a matter of minutes; by simply creating a basic DBMS (or ODS application layer) account; and then using this profile to create strong links (references) to all your Data silos (esp. those from the Web 2.0 realm)
- Load data sets from the LOD cloud or Sponge existing Web resources (i.e., on the fly data transformation to RDF model based Linked Data) and then use the combination to build powerful lookup services that enrich the value of URLs (think: Web addressable reports holding query results) that you publish
- Bind all of the above to a domain that you own (e.g. a .Name domain) so that you have an attribution-friendly "authority" component for resource URLs and Entity URIs published from your Personal Linked Data Space on the Web (or private HTTP network).
In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)
|
11/18/2009 14:12 GMT
|
Modified:
01/31/2010 17:24 GMT
|
Exploring the Value Proposition of Linked Data
The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web).
There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.
What's Special about HTTP URIs?
They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following:
- Identity or Name Anything of Interest
- Describe Anything of Interest by associating the Description Subject's Identity with a constellation of Attribute and Value pairs (technically: an Entity-Attribute-Value or Subject-Predicate-Object graph)
- Make the Description of Named Things of Interest discoverable on the Web by implicitly binding the aforementioned to Documents that hold their descriptions (technically: metadata documents or information resources)
What's the basic value proposition of the Linked Data meme?
Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types.
Note: Hyperdata Linking is simply what an HTTP URI facilitates.
Examples problems solved by injecting Linked Data into the Web:
- Federated Identity by enabling Individuals to unambiguously Identify themselves (Profiles++) courtesy of existing Internet and Web protocols (e.g., FOAF+SSL's WebIDs which combine Personal Identity with X.509 certificates and HTTPs based client side certification)
- Security and Privacy challenge alleviation by delivering a mechanism for policy based data access that feeds off federated individual identity and social network (graph) traversal
- Spam Busting via the above
.
-
Increasing the Serendipitous Discovery Quotient (SDQ) of Web accessible resources by embedding Rich Metadata into (X)HTML Documents e.g., structured descriptions of your "WishLists" and "OfferLists" via a common set of terms offered by vocabularies such as GoodRelations and SIOC
- Coherent integration of disparate data across the Web and/or within the Enterprise via "Data Meshing" rather than "Data Mashing"
- Moving beyond imprecise statistically driven "Keyword Search" (e.g. Page Rank) to "Precision Find" driven by typed link based Entity Rank plus Entity Type and Entity Property filters.
Conclusion
If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- that goes beyond data containers (documents or files).
The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true, because the Linked Data meme is ultimately about an enhancement of the current Web; achieved by reintroducing its architectural essence -- in new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs.
As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferList", an Idea, etc.. in addition to links for Properties (Attributes & Values) of the aforementioned. Ultimately, all of these links will be indexed in a myriad of ways providing the substrate for the next major period of Internet & Web driven innovation, within our larger human-ingenuity driven innovation continuum.
Related
|
07/23/2009 20:17 GMT
|
Modified:
07/24/2009 08:33 GMT
|
Take N: Yet Another OpenLink Data Spaces Introduction
Problem:
Your Life, Profession, Web, and Internet do not need to become mutually exclusive due to "information overload".
Solution:
A platform or service that delivers a point of online presence that embodies the fundamental separation of: Identity, Data Access, Data Representation, Data Presentation, by adhering to Web and Internet protocols.
How:
Typical post installation (Local or Cloud) task sequence:
-
Identify myself (happens automatically by way of registration)
- If in an LDAP environment, import accounts or associate system with LDAP for account lookup and authentication
-
Identify Online Accounts (by fleshing out profile) which also connects system to online accounts and their data
- Use Profile for granular description (Biography, Interests, WishList, OfferList, etc.)
- Optionally upstream or downstream data to and from my online accounts
- Create content Tagging Rules
- Create rules for associating Tags with formal URIs
- Create automatic Hyperlinking Rules for reuse when new content is created (e.g. Blog posts)
- Exploit Data Portability virtues of RSS, Atom, OPML, RDFa, RDF/XML, and other formats for imports and exports
- Automatically tag imported content
- Use function-specific helper application UIs for domain specific data generation e.g. AddressBook (optionally use vCard import), Calendar (optionally use iCalendar import), Email, File Storage (use WebDAV mount with copy and paste or HTTP GET), Feed Subscriptions (optionally import RSS/Atom/OPML feeds), Bookmarking (optionally import bookmark.html or XBEL) etc..
- Optionally enable "Conversation" feature (today: Social Media feature) across the relevant application domains (manage conversations under covers using NNTP, the standard for this functionality realm)
- Generate HTTP based Entity IDs (URIs) for every piece of data in this burgeoning data space
- Use REST based APIs to perform CRUD tasks against my data (local and remote) (SPARQL, GData, Ubiquity Commands, Atom Publishing)
- Use OpenID, OAuth, FOAF+SSL, FOAF+SSL+OpenID for accessing data elsewhere
- Use OpenID, OAuth, FOAF+SSL, FOAF+SSL+OpenID for Controlling access to my data (Self Signed Certificate Generation, Browser Import of said Certificate & associated Private Key, plus persistence of Certificate to FOAF based profile data space in "one click")
- Have a simple UI for Entity-Attribute-Value or Subject-Predicate-Object arbitrary data annotations and creation since you can't pre model an "Open World" where the only constant is data flow
- Have my Personal URI (Web ID) as the single entry point for controlled access to my HTTP accessible data space
I've just outlined a snippet of the capabilities of the OpenLink Data Spaces platform. A platform built using OpenLink Virtuoso, architected to deliver: open, platform independent, multi-model, data access and data management across heterogeneous data sources.
All you need to remember is your URI when seeking to interact with your data space.
Related
-
Get Yourself a URI (Web ID) in 5 Minutes or Less!
-
Various posts over the years about Data Spaces
-
Future of Desktop Post
-
Simplify My Life Post by Bengee Nowack
|
04/22/2009 14:46 GMT
|
Modified:
04/22/2009 15:46 GMT
|
Simple Compare & Contrast of Web 1.0, 2.0, and 3.0 (Update 3)
Here is a tabulated "compare and contrast" of Web usage patterns 1.0, 2.0, and 3.0.
| |
Web 1.0 |
Web 2.0 |
Web 3.0 |
| Simple Definition |
Interactive / Visual Web |
Programmable Web |
Linked Data Web |
| Unit of Presence |
Web Page |
Web Service Endpoint |
Data Space (named structured data enclave) |
| Unit of Value Exchange |
Page URL |
Endpoint URL for API |
Resource / Entity / Object URI |
| Data Granularity |
Low (HTML) |
Medium (XML) |
High (RDF) |
| Defining Services |
Search |
Community (Blogs to Social Networks) |
Find |
| Participation Quotient |
Low |
Medium |
High |
| Serendipitous Discovery Quotient |
Low |
Medium |
High |
| Data Referencability Quotient |
Low (Documents) |
Medium (Documents) |
High (Documents and their constituent Data) |
| Subjectivity Quotient |
High |
Medium (from A-list bloggers to select source and partner lists) |
Low (everything is discovered via URIs) |
|
Transclusence
|
Low |
Medium (Code driven Mashups) |
HIgh (Data driven Meshups) |
| What You See Is What You Prefer (WYSIWYP) |
Low |
Medium |
High (negotiated representation of resource descriptions) |
| Open Data Access (Data Accessibility) |
Low |
Medium (Silos) |
High (no Silos) |
| Identity Issues Handling |
Low |
Medium (OpenID) |
High (FOAF+SSL) |
| Solution Deployment Model |
Centralized |
Centralized with sprinklings of Federation |
Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia) |
| Data Model Orientation |
Logical (Tree based DOM) |
Logical (Tree based XML) |
Conceptual (Graph based RDF) |
| User Interface Issues |
Dynamically generated static interfaces |
Dyanically generated interafaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) |
Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF |
| Data Querying |
Full Text Search |
Full Text Search |
Full Text Search + Structured Graph Pattern Query Language (SPARQL) |
| What Each Delivers |
Democratized Publishing |
Democratized Journalism & Commentary (Citizen Journalists & Commentators) |
Democratized Analysis (Citizen Data Analysts) |
|
Star Wars Edition Analogy
|
Star Wars (original fight for decentralization via rebellion) |
Empire Strikes Back (centralization and data silos make comeback) |
Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation") |
Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain facinating discourse for a long time to come :-)
Related
|
03/14/2009 14:20 GMT
|
Modified:
03/16/2009 12:04 GMT
|
Time for RDBMS Primacy Downgrade is Nigh! (No Embedded Images Edition)
As the world works it way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS, from its pivotal position at the apex of the data access and data management pyramid is nigh. What is the Data Access, and Data Management Value Pyramid? As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services. See: AVF Pyramid Diagram. The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determine the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy. In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operation excellence, or customer intimacy. Why has RDBMS Primacy has Endured? Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte. See: RDBMS Primacy Diagram. For more then 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) has been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the future of data?" "..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular." -- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come and gone - They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.
-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry. Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured it position of primacy albeit on a "one size fits all basis". Circumstantial Pain As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in a era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect inline with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management). Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually: Government (Globally) - Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging. Enterprises - Banks still don't understand that capital really does exists in tangible and intangible forms; with the intangible being the variant that is inherently dynamic. For example, a tech companies intellectual capital far exceeds the value of fixture, fittings, and buildings, but you be amazed to find that in most cases this vital asset has not significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings. In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a miniscule number of executives dare fantasize about being anywhere within distance of the: relevant information at your fingertips vision. Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, service (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yes, but even today "rip and replace" is still the norm pushed by most vendors; pitting one mono culture against another as exemplified by irrelevances such as: FOSS/LAMP vs Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues are recognized let alone addressed (see: Applications are Like Fish and Data Like Wine). Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid. Technology There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative: - Query language standardization - nothing close to SQL standardization
- Data Access API standardization - nothing close to ODBC, JDBC, OLE-DB, or ADO.NET
- Wire protocol standardization - nothing close to HTTP
- Distributed Identity infrastructure - nothing close to the non-repudiatable digital Identity that foaf+ssl accords
- Use of Identifiers as network based pointers to data sources - nothing close to RDF based Linked Data
- Negotiable data representation - nothing close to Mime and HTTP based Content Negotiation
- Scalability especially in the era of Internet & Web scale.
Entity-Attribute-Value with Classes & Relationships (EAV/CR) data models A common characteristic shared by all post-relational DBMS management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm. What Comes Next? The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons: - The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore e.g., the current global economic crisis remains centered on the inability to connect dots across "Open World" and "Closed World" data frontiers
- Entity-Attribute-Value with Classes & Relationships (EAV/CR) based DBMS models are more effective when dealing with disparate data associated with disparate schemas, across disparate DBMS engines, host operating systems, and networks.
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following: - Every item of data (Datum/Entity/Object/Resource) has Identity
- Identity is achieved via Identifiers that aren't locked at the DBMS, OS, Network, or Application levels
- Object Identifiers and Object values are independent (extricably linked by association)
- Object values should be de-referencable via Object Identifier
- Representation of de-referenced value graph (entity, attributes, and values mesh) must be negotiable (i.e. content negotiation)
- Structured query language must provide mechanism for Creation, Deletion, Updates, and Querying of data objects
- Performance & Scalability across "Closed World" (enterprise) and "Open World" (Internet & Web) realms.
Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over. The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS which is rooted in "Closed World" assumptions re., data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc. It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need to the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues. EAV/CR Oriented Data Access & Management Technology Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include: The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions. See: New EAV/CR Primacy Diagram. Related
|
01/27/2009 19:19 GMT
|
Modified:
05/29/2009 10:39 GMT
|
The Time for RDBMS Primacy Downgrade is Nigh!
As the world works it way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS, from its pivotal position at the apex of the data access and data management pyramid is nigh. What is the Data Access, and Data Management Value Pyramid? As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services. The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determine the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy. In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operation excellence, or customer intimacy. Why has RDBMS Primacy has Endured? Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte. For more then 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) has been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below: "Future of Database Research is excellent, but what is the future of data?" "..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular." -- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers. "One size fits all: A concept whose time has come and gone - They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.
-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry. Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured it position of primacy albeit on a "one size fits all basis". Circumstantial Pain As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in a era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect inline with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management). Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually: Government (Globally) - Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging. Enterprises - Banks still don't understand that capital really does exists in tangible and intangible forms; with the intangible being the variant that is inherently dynamic. For example, a tech companies intellectual capital far exceeds the value of fixture, fittings, and buildings, but you be amazed to find that in most cases this vital asset has not significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings. In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a miniscule number of executives dare fantasize about being anywhere within distance of the: relevant information at your fingertips vision. Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, service (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yes, but even today "rip and replace" is still the norm pushed by most vendors; pitting one mono culture against another as exemplified by irrelevances such as: FOSS/LAMP vs Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues are recognized let alone addressed (see: Applications are Like Fish and Data Like Wine). Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid. Technology There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative: - Query language standardization - nothing close to SQL standardization
- Data Access API standardization - nothing close to ODBC, JDBC, OLE-DB, or ADO.NET
- Wire protocol standardization - nothing close to HTTP
- Distributed Identity infrastructure - nothing close to the non-repudiatable digital Identity that foaf+ssl accords
- Use of Identifiers as network based pointers to data sources - nothing close to RDF based Linked Data
- Negotiable data representation - nothing close to Mime and HTTP based Content Negotiation
- Scalability especially in the era of Internet & Web scale.
Entity-Attribute-Value with Classes & Relationships (EAV/CR) data models A common characteristic shared by all post-relational DBMS management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm. What Comes Next? The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons: - The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore e.g., the current global economic crisis remains centered on the inability to connect dots across "Open World" and "Closed World" data frontiers
- Entity-Attribute-Value with Classes & Relationships (EAV/CR) based DBMS models are more effective when dealing with disparate data associated with disparate schemas, across disparate DBMS engines, host operating systems, and networks.
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following: - Every item of data (Datum/Entity/Object/Resource) has Identity
- Identity is achieved via Identifiers that aren't locked at the DBMS, OS, Network, or Application levels
- Object Identifiers and Object values are independent (extricably linked by association)
- Object values should be de-referencable via Object Identifier
- Representation of de-referenced value graph (entity, attributes, and values mesh) must be negotiable (i.e. content negotiation)
- Structured query language must provide mechanism for Creation, Deletion, Updates, and Querying of data objects
- Performance & Scalability across "Closed World" (enterprise) and "Open World" (Internet & Web) realms.
Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over. The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS which is rooted in "Closed World" assumptions re., data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc. It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need to the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues. EAV/CR Oriented Data Access & Management Technology Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include: The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions. Related
|
01/24/2009 20:04 GMT
|
Modified:
07/26/2009 10:20 GMT
|
|
|