
Berners-Lee Keeps WWW2004 Focused on Semantic Web

by Paul Ford
May 20, 2004

New York City, May 19 -- A peculiar buzz is back in the halls of WWW2004 -- the mix of hubris and geek name dropping, cheap suits and over-eager handshakes that last prevailed in 2000. "I nearly invented the web," says a fellow with a large stack of promotional postcards advertising new social networking software. "People are downloading our new XML API almost before we upload it," said another, making introductions to anyone who wanders within distance.

Whether the days of the dot-com boom are back, or people simply wish they were, there is an optimistic tone to the WWW2004 conference, the 13th International World Wide Web Conference, held at the Sheraton New York. The core technologies of the Web, like XML, XHTML, CSS, and Web Services, have all been accepted by the IT world in general, and issues like accessibility, far from being fringe concerns, are now understood by all. "We've come a long way from the days of telegraphs and typewriters," said Gino P. Menchini, commissioner of the Department of Information Technology (the Mayor himself was at the 9/11 hearings, and couldn't make it). Menchini went on to name May 19 "World Wide Web day in New York City," to much applause from the several hundred assembled around tables in the Sheraton's Imperial Ballroom.

But now that the Web is unquestioned as a basic medium, of a piece with television, publishing, and radio, there is a risk of stagnation. To that end, Tim Berners-Lee, creator of the first Web browser and server, and inventor of HTML, gave an open-ended plenary talk focused on two open questions: What should we do with top level domain names (TLDs)? And what should we do with the Semantic Web? The latter question is clearly the more important one to Berners-Lee, and over the course of the speech, as he encouraged developers to begin using the Semantic Web, it became clear that Berners-Lee is less than satisfied with the current state of the web -- and not entirely clear as to the best way to proceed.

Dealing with Domains

Berners-Lee pointed out that the new TLDs (like .biz, .info, and .xxx) attempt to sort domain names into a semantic tree, hopefully making it simple to identify the essence of a site (is it business? is it information? is it porn?) by its name. But he questioned the concept at its root. "When people are looking for global brands, it's a flat space. You're much more likely to look for johnstravel.com than johns.travel," he said.

According to Berners-Lee, the entire concept of TLDs as a means to expand the domain space is suspect. What, he asked, does the .xxx TLD mean? For Americans, this brings back the debates over pornography in the 80s, when judges were trying to find a balance between smut and free speech, and found it exceptionally difficult to find where art ended and porn began. "I have a high tolerance for people with no clothes on, and a low tolerance for violence," said Berners-Lee. But that level of tolerance "might be different for someone from the Christian right."

So, if this semantic ambiguity is an unavoidable part of the TLD system, and thus makes the entire TLD enterprise somewhat suspect, what are TLDs good for? Berners-Lee suggested that we use TLDs to identify content type -- we could start, for instance, by using .mobi for mobile applications, and enforce content type standards within the .mobi domain. As a result, a web of reliably mobile device-accessible content would emerge. By promoting a "device-centric" use for TLDs, he implied, we make TLDs less useful and fragment the web into multiple parts.

Challenges to the Semantic Web Community

Berners-Lee quickly moved from the discussion of TLDs to the Semantic Web. With OWL and RDF as official W3C recommendations, he told the crowd, the foundation of the Semantic Web is in place, and it is time to move on to Phase II: "a time of less constraint." He acknowledged that given multiple ontologies and different kinds of data, "[the Semantic Web is] bound to be inconsistent. Well, so is the web."

"People ask," he said, "so what's the Semantic Web killer app going to be? That's not the right question." The real proof of the Semantic Web, he said, is when new connections are made, and new links between information emerge.

Rather than concerning themselves unduly with hewing to existing ontologies, Berners-Lee pushed developers to start using RDF and triples more aggressively. In particular, he wants to see existing databases exported as RDF, with ontologies created ad-hoc to match the structure of that data. Rather than using PHP scripts only to produce HTML, he suggested, create RDF as well. Then, when all of the RDF is aggregated, apply rules and see what happens. "Let's not fall back on handmade markup." Later in the talk, he described a cascade of Semantic Web connections, postulating that one day, individuals may be able to follow links from a parts catalog to order status, from location to weather to taxes.
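
As a rough illustration of that workflow, here is a minimal sketch in Python, using only the standard library. Everything in it is an assumption made for the example rather than anything shown in the talk: the `http://example.org/` namespace, the table and column names, and the single toy rule are all hypothetical. Rows from two unrelated tables are exported as triples with an ad-hoc vocabulary, pooled, and then a rule links resources that share a value.

```python
# Hypothetical namespace for an ad-hoc vocabulary; not a real ontology.
BASE = "http://example.org/"


def row_to_triples(table, key, row):
    """Turn one database row (a dict) into (subject, predicate, object) triples."""
    subject = f"{BASE}{table}/{row[key]}"
    return [
        (subject, f"{BASE}{table}#{column}", str(value))
        for column, value in row.items()
        if column != key
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every object as a plain literal."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def apply_rule(triples, pred_a, pred_b, new_pred):
    """Toy rule: if X pred_a V and Y pred_b V share the value V, assert X new_pred Y."""
    by_value = {}
    for s, p, o in triples:
        if p == pred_b:
            by_value.setdefault(o, []).append(s)
    return [
        (s, new_pred, other)
        for s, p, o in triples
        if p == pred_a
        for other in by_value.get(o, [])
    ]


# Two made-up tables standing in for existing databases.
parts = [{"id": 7, "name": "gasket", "warehouse": "NYC"}]
forecasts = [{"id": 1, "city": "NYC", "outlook": "rain"}]

# Aggregate the exported triples from both sources.
triples = []
for row in parts:
    triples += row_to_triples("part", "id", row)
for row in forecasts:
    triples += row_to_triples("forecast", "id", row)

# Link each part to the forecast for the city its warehouse is in.
links = apply_rule(
    triples,
    f"{BASE}part#warehouse",
    f"{BASE}forecast#city",
    f"{BASE}link#hasForecast",
)
```

Here `links` ends up holding one new triple connecting the part to the forecast, a connection that neither source table stated on its own. A real deployment would presumably use an RDF library and a proper rule language instead of hand-rolled tuples.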

Berners-Lee acknowledged that the Semantic Web framework is in opposition to the conventional wisdom regarding who controls the display of information. "The person publishing the data will feel that they have the right to tell it how to look," he said, and content producers who fund their work through advertising will be resistant to hand over their content in a neutral RDF form that can be displayed and linked in unpredictable ways. "But that's just one side of it. It's the [user] who's really in control." He challenged the audience to create a Semantic Web browser that would address these issues, an "open application, pulling in style information from lots of different places."

He then moved away from the idea of the Semantic Web as connecting individuals to information and promoted it as an automated disambiguation layer within an operating system. In particular, he described a potential "RDF clipboard" that would automatically translate content types between applications. For instance, it would "copy a piece of SVG, which is vector graphics, and paste it into something that can only handle a bitmap graphic." The RDF clipboard, as a set of rules, would automatically know how to translate between data types, converting on the fly.
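
One way to picture such a clipboard is as a table of conversion rules plus a search for a chain of rules that gets from the copied type to one the target application accepts. The sketch below is only that picture, written in Python with a plain dictionary rather than RDF; the rule table, the media types chosen, and the stand-in converters (which merely tag the payload instead of rendering anything) are all hypothetical.

```python
from collections import deque

# Hypothetical conversion rules: (source type, target type) -> converter.
# The converters are stand-ins that only tag the payload; no real rendering happens.
RULES = {
    ("image/svg+xml", "image/png"): lambda data: f"rasterized({data})",
    ("image/png", "image/bmp"): lambda data: f"bmp({data})",
    ("text/html", "text/plain"): lambda data: f"stripped({data})",
}


def find_path(source, target):
    """Breadth-first search for the shortest chain of rules from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for (src, dst) in RULES:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(src, dst)]))
    return None


def paste(data, source, accepted):
    """Convert clipboard data into the first accepted type that is reachable."""
    for target in accepted:
        path = find_path(source, target)
        if path is not None:
            for step in path:
                data = RULES[step](data)
            return target, data
    raise ValueError(f"no conversion from {source} to any of {accepted}")


# Copy SVG, paste into an application that only accepts bitmaps.
result = paste("<svg/>", "image/svg+xml", ["image/bmp"])
```

No single rule goes from SVG to a bitmap here, so the search chains two of them, which is the "on the fly" behavior described above in miniature.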

Close to the end of the talk, Berners-Lee discussed the Semantic Web bus, the model whereby data, ontologies, rules, and logic interoperate. He then showed the audience a real Semantic Web bus (exterior, interior), created by the W3C's Spain office. The bus will promote the W3C's agenda throughout Spain by operating as a classroom on wheels.

The Semantic Web's Uncertain Destiny

At the conclusion of Berners-Lee's speech, and by reading through the papers in the conference proceedings, it is clear that the Semantic Web has not yet entered adulthood; it is rather in a somewhat uncomfortable adolescence. While there is no shortage of suggested commercial applications, and more prototype frameworks than one can count (from HP Labs' Jena to the Haystack framework), very few have made their way to end users. The users of the Semantic Web are currently those who deeply care about the Semantic Web as a concept; the concept of total connectivity of data has yet to catch on.

Berners-Lee acknowledged this by issuing challenges to the WWW community, seeking to seed the Semantic Web in the minds of developers. But at the same time, his challenges themselves show the ambiguity inherent in the Semantic Web project. Is the SemWeb a layer above (or below) the Web, as in Berners-Lee's proposed SemWeb browser, linking people, ideas, and resources together? Or is it something fundamental to the computing experience, like the RDF clipboard?

Obviously it can be both -- there are no hard-set technological limits on the proper domain for using triples and logical rules. But in the short term, while no one can fully agree on what the Semantic Web is, a clearly articulated vision is of the essence in order to move the Semantic Web further. It is surprising to see that its leading evangelist is also uncertain as to whether the Semantic Web's immediate destiny is on the server or the desktop.


  • RDF/XML
    2008-01-02 09:33:21 alphasun [Reply]

    In my opinion, rather than RDF or XML leading to the sort of convenient result described in the mother's medical arrangements example, I believe that an AI application will be needed to heuristically extract associations between content and add them to databases, in other words an extension of what search engines already do for e-commerce. Human labour will not be an effective or economical means of achieving the goals outlined. The application will learn about the content independently of the user and generate the kinds of relevant associations between things that we do ourselves. However, unlike our brains the system will not be affected by the boring and voluminous nature of some associations and details.
    This implies the tracking of just about every action done on the user's IT system, plus as much other data (GPS, textbooks etc.) as can be accommodated. The AI core will also need to have quasi-human linguistic ability. I suspect that this aspect of AI is the key to the whole process, as effectively what is needed is an intelligence capable of indefatigable research and administration based on it.
    Since the forties, if not earlier, we have had several predictions of the cybernetically controlled house -- the fridge that replenishes itself by ordering fresh food supplies etc. That such things have not come to pass to a significant extent demonstrates the importance of the economic factor and the need for an ingenious programme that will be able to complete the vast data accumulation and other processing tasks required.
    I believe also that imminent improvements in memory will favour this development.

  • Semantics and who controls "look" of content?
    2004-05-24 08:15:55 rgbiggs [Reply]

    The conclusion posited (paragraph quoted below) by the interviewee states: "It's the [user] who's really in control." It takes two to communicate, and the one encoding the message for delivery structures and styles it to include more "semantics," if you will, where communication also resides in the structure and style of any communication. Of course, you have to style and structure your communication in a way appropriate to the recipient, but that does not mean abdicating control! I find it ironic that a piece on "semantic" anything would miss that point.



    ================================
    Berners-Lee acknowledged that the Semantic Web framework is in opposition to the conventional wisdom regarding who controls the display of information. "The person publishing the data will feel that they have the right to tell it how to look," he said, and content producers who fund their work through advertising will be resistant to hand over their content in a neutral RDF form that can be displayed and linked in unpredictable ways. "But that's just one side of it. It's the [user] who's really in control."

  • RDF - ready but not willing?
    2004-05-21 06:04:44 Daniel Zambonini [Reply]

    Excellent reporting; thanks for keeping us well informed.


    I had written a few thoughts about why RDF hasn't (yet) achieved its potential - I was going to put these in my blog, but I think this may have a better audience here:



    * Commercial Incentives


    Organisations adopting early HTML had obvious commercial incentives – an extremely low cost, low risk route to increased marketing, customer interaction, and additional sales channels.


    XML presented more cost-saving opportunities – share data more easily with other organisations, migrate data across applications and platforms, and create content once for multiple delivery channels – "future-proof your data!"


    RDF though, is harder to sell: "So – let me get this straight... If we invest thirty man days into creating and publishing RDF data, our customers may be able to find us on a search engine that may exist in the future which may be better than Google? Sign me up!"


    Of course, RDF is much more than search-engine metadata – it's the data equivalent of Kazaa, BitTorrent, or any other de-centralised, peer-to-peer architecture which brings power to the people.



    * Killer Application


    Which brings us to the Killer Application, or lack thereof. For HTML, there was e-commerce. For XML, there was XSLT. RDF needs an equivalent – something that uses RDF in such a way that you just have to use it – without it, you're losing out to your competitors.


    Perhaps it will be the RDF enabled search engine. Perhaps a popular, RDF-aware P2P application. Or perhaps Microsoft will integrate RDF-aware tools into Internet Explorer or Office applications.


    Whatever it is – it doesn't exist – yet. That isn't to say that RDF has no current uses – FOAF, RSS, etc. – it just doesn't possess a must-have application.


    * Understanding


    If there's one thing worse than a Reese Witherspoon movie, it's hearing a respected technical team member explain to their peers that "choosing between RDF and XML" is arbitrary, as if the two are interchangeable syntaxes.


    I've witnessed the "We’ll use XML now and convert to RDF later if necessary" conversation a few times – the model and implications of the two either side-stepped or, more usually, not understood.


    It's easy to appreciate why this happens – metadata models often have separate 'XML' and 'RDF' bindings, reinforcing the stereotype of RDF as an XML alternative. Similarly, most developers will be aware of XML Schema and RDF Schema, the names of the two specifications sounding so similar that surely they are alternatives for the same domain?


    If it's difficult for technical users to grasp the RDF model, then highlighting the benefits of RDF to managers and senior decision makers is verging on impossible.



    * Software for Imbeciles


    Let's be honest about RDF – the model is confusing, useful data is difficult to create, and the learning curve is steep. In some ways, it's similar to assembly language – a simple model and syntax that becomes more complex as you build useful, larger applications with it.


    High-level, user-friendly editors are therefore essential for the widespread adoption, and ultimate success of RDF. However, the majority of RDF editors available today continue to confuse the user with predicates, reification, ontology editors, 3-dimensional webs of orbiting triplets, and other low-level data and terms.


    Most users would just like to record some basic information about their web page, without any mention of statements or unique resource identifiers that may be created or implied from their original input.



    * Press Coverage and The Buzz


    XML was - and continues to be - warmly received by the press. Before XML was widely understood, magazines and newsletters would wax lyrical on the new XML technology, which would solve every I.T. problem from the last 20 years. Even now, additional minor XML features from major vendors (e.g. Microsoft, Macromedia, etc.) are widely publicised. Conversely, RDF functionality (e.g. Adobe Acrobat's support for Dublin Core based RDF) rarely excites journalists.


    Without a buzz of excitement, the press will ignore the technology. Without the press, commercial decision makers will remain unaware of the technology, and therefore uninterested in its implementation – locking RDF into a minority 'geeks and nerds only' club.



    * Common Classification System


    Due to the lack of a central, widely-adopted common taxonomy (classification system, restricted vocabulary, whatever you want to call it), even current RDF data isn't living up to its abilities. We need some common unique identifiers for common terms (words, topics, people, places), so that the real 'web' of information can begin to interlock.


    Creating and maintaining these identifiers, and keeping them non-proprietary, is an essential yet mammoth task.

  • Challenges to the Semantic Web Community
    2004-05-21 01:05:57 Philip Fennell [Reply]

    Having had my own personal epiphany regarding RDF late last year I now see no reason not to use RDF as and where appropriate. The key to that statement is 'as and where appropriate'. While working for large corporate customers, any data model that contains information that could/should be shared within the business or sold-on to other companies is an ideal candidate for RDF.


    The extensive activity in areas such as Dublin Core, FOAF and the like is just what I need because it provides me with useful, off-the-peg vocabularies that allow me to get on with my job and add value to the work I do for my customers.


    However, the bane of my life - how I am actually allowed to do my job - would benefit greatly from the ability to capture data in a semantically useful way. All those lost notes, annotations, links and info about how, why and where things went wrong and how we fixed them. Being able to query that kind of knowledge repository would really make a difference to me and is a pet project of mine that is yet to take flight.


    If you want to get into the Semantic Web and feel the benefit, deploy it internally within your business, use the tools that are available now to improve your processes and pass the knowledge and experience on to your customers by building them semantically aware solutions.

    • Challenges to the Semantic Web Community
      2004-11-09 05:37:37 davidbrucehughes [Reply]

      I think you have the right idea, Philip. My particular area of expertise is badly in need of a clear ontology. We will certainly pass the benefits of clarifying our thinking and working on to our clients. But first we have to educate users at least to the point where the word 'ontology' doesn't set their heads spinning.


      David