Where the Web Leads Us
October 6, 1999
Editor's Note: This article is based on Tim O'Reilly's Keynote address for Linux World in Tokyo on September 29, 1999. This may not be the usual fare you find on XML.com. Nonetheless, I think it's worthwhile to offer this rather long, somewhat rambling run through a lot of the issues that impact us today: Linux, Open Source, XML and ultimately, the future of the Web. I'd argue that XML itself would not make sense to anyone without this greater context in which open standards and open source have emerged. -- DD.
Almost everyone who talks about Open Source software wants to know whether or not Linux stands a chance of dethroning Windows. I'm here to talk about something completely different -- the role of open source software and Linux in building the future of the Internet, and more specifically, the future of the World Wide Web.
The Web is changing the way we use computers and the way we're delivering new applications. Understanding the role of open source on the Web as opposed to its role as a competitor to Windows gives us a perspective that is extremely important for the open source community.
In order to clarify this perspective, I'm going to review the early history of the personal computer industry, and talk about the parallels with the Internet. After that I'm going to talk about how people make money with open source businesses. And finally I'm going to come back to how the open source community needs to think about the impact of the Web on open source in the future.
|" Traditional software embeds small amounts of information in a lot of software; infoware embeds small amounts of software in a lot of information. "|
Opening Up the World of Hardware
The computer industry, in the late 1970s, was a hardware-dominated industry. The companies that I dealt with were "computer companies." For example, in the early days of my business, I did a lot of work for Digital, and Hewlett Packard, and Data General. Any company that wanted to make an impact on the computer industry had to do two things: they had to come up with a new hardware architecture and a new operating system to go with it. You may have read The Soul of a New Machine, which was about the design of a new computer at Data General and one of the frontiers of the computer industry. If you were Data General, the way you got ahead was to design a new piece of hardware. There were some companies devoted only to software, but mostly, they were satellites of hardware companies in one way or another.
Something happened in the early 80s that was fairly revolutionary, something that had enormous impact for many years to come. That was a decision by IBM to release the specification for the IBM PC as an open hardware specification. IBM said, "Anybody can make these computers."
Once IBM released the specifications for the PC, everything changed. The market broke wide open in a variety of ways. First of all, companies no longer had to invent new hardware in quite the same way. There were (and still are) thousands of garage-shop entrepreneurs creating their own PCs out of industry-standard parts. Out of that competition, some huge new players eventually emerged: companies like Compaq, Dell, and Gateway 2000. Dell is particularly interesting: Michael Dell started the company while he was still a college student, working out of his dorm room. That illustrates just how low the barriers to entry into the market had become.
But that wasn't all that changed. All of a sudden, you no longer had to be a hardware company, or a satellite of a hardware company, to be a player in the computer industry.
For example, before Mitch Kapor started Lotus, he was trying to decide whether to become a family therapist, or to try to turn his interest in programming into a business. We all know what he chose. He wrote Lotus 1-2-3, and launched phase two of the PC revolution with its first "killer application." Here was just this guy off the street who built a company that had an enormous impact. Even Bill Gates, whose company has gone on to dominate the software industry, was an outsider and an upstart at the beginning.
We are at a similar transition point today. The open hardware standard of the IBM PC didn't just change the computer hardware market; it created innovative new markets in software. Now, the open software standards of the Internet have created a new class of applications. And I don't mean web browsers and email clients.
Infoware: The Age of Information Applications
Some friends of mine were thinking of buying their first computer so they could use Amazon.com to buy books and CDs. Not to use "the Internet," not to use "the Web," but to use Amazon.com.
Now, that's the classic definition of a "killer application": one that makes someone go out to buy a computer. What's interesting is that the killer application is no longer a desktop productivity application or even the web as a whole, but an individual web site. And once you start thinking of web sites as applications, you soon come to realize that they represent an entirely new breed, something you might call an "information application," or perhaps even "infoware."
Information applications are used to computerize tasks that just couldn't be handled in the old computing model. A few years ago, if you wanted to search a database of a million books, you talked to a librarian who knew the arcane search syntax. If you wanted to buy a book, you went to a bookstore and looked through its relatively small selection. Now, tens of thousands of people with no specialized training find and buy books online from that million-record database every day. As a result of information applications, computers have come one step closer to the way that people communicate with each other. Web-based applications use plain English to build their interface--words and pictures, not specialized little controls that acquire meaning only as you learn the software.
Traditional software embeds small amounts of information in a lot of software; infoware embeds small amounts of software in a lot of information. The "actions" in an infoware product are generally fairly simple: make a choice, buy or sell, enter a small amount of data, and get back a customized result.
|" Programmers don't think of HTML as source code. But people copied HTML pages wholesale; they built on other people's web pages in a frenzy of networked sharing. "|
These actions are often accomplished by scripts attached to a hypertext link using an interface such as CGI (the Common Gateway Interface), although lately we have many new and improved ways to do the same thing. CGI defines a way for a web server to call any external program and return the output of that program as a web page. CGI programs may simply be small scripts that perform a simple calculation, or they may connect to a full-fledged back-end database server. But even when there's a heavy-duty software engine behind a site, the user interface itself is not composed of traditional software. The interface consists of web pages (which may well have been created by a writer, editor, or designer rather than by a programmer).
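To make the CGI model concrete, here is a minimal sketch of the kind of script the paragraph describes, written in Python rather than the Perl of the era; the field name and greeting are purely illustrative, not drawn from any real site. The web server sets environment variables describing the request, runs the program, and returns whatever it prints to the browser as a web page:

```python
#!/usr/bin/env python3
# A minimal sketch of a CGI program. The server passes GET parameters
# in the QUERY_STRING environment variable; the script's stdout becomes
# the HTTP response sent back to the browser.
import os
import urllib.parse

def handle_request(query_string: str) -> str:
    """Turn a raw query string like 'name=Tim' into a small HTML page."""
    params = urllib.parse.parse_qs(query_string)
    name = params.get("name", ["stranger"])[0]
    # A CGI response is just a header block, a blank line, then the body.
    return (
        "Content-Type: text/html\r\n"
        "\r\n"
        f"<html><body><h1>Hello, {name}!</h1></body></html>"
    )

if __name__ == "__main__":
    print(handle_request(os.environ.get("QUERY_STRING", "")), end="")
```

A request for `/cgi-bin/hello.py?name=Tim` would run this script with `QUERY_STRING` set to `name=Tim`; the same mechanism scales from toy greetings to scripts that query a full back-end database.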
Information interfaces are typically dynamic. For example, Amazon.com's presentation of books is driven by sales rankings that are updated every hour. Customers can add comments and ratings on the fly, which then become a key part of the information-rich decision-support interface for purchasers. A site designed to help someone buy or sell stocks online needs to not only present updated share prices, but also the latest relevant news stories, insider trading information, analyst recommendations, and perhaps user discussion groups. The information interface thus typically consists of a rich mix of hand-crafted documents, program-generated data, and links to specialized application servers (such as ecommerce back ends, email, chat, or conferencing).
Information interfaces are not as efficient for tasks that you do over and over as pure software interfaces, but they are far better for tasks you do only rarely, or differently each time. In particular, they are good for interfaces in which you make choices based on information presented to you. Whether you're buying a book or CD at Amazon.com, or a stock at E*Trade, the actual purchase is a fairly trivial part of the interaction. It's the quality of the information provided to help you make a decision that forms the heart of the application you interact with.
Open Source Software Drives the Web
So what does all this have to do with Open Source software? There's one obvious answer: most of the technologies that make the Web possible are Open Source. The Internet itself, including the TCP/IP network protocol and key infrastructure elements such as the Domain Name System (DNS), was developed through the open-source process. It's easy to argue that the open-source BIND (Berkeley Internet Name Domain) program that runs the DNS is the single most mission-critical Internet application. Even though most web browsing is done with proprietary products (Netscape's Navigator and Microsoft's Internet Explorer), both are outgrowths of Tim Berners-Lee's original open-source web implementation and open protocol specification. According to the automated Netcraft web server survey (http://www.netcraft.co.uk/survey), more than 60% of all visible web sites are served by the open-source Apache web server. The majority of web-based dynamic content is generated by open-source scripting languages such as Perl, Python, and Tcl.
But this obvious answer is only part of the story. After all, why is it the Web and not some proprietary technology that is the basis for the networked information applications of the future? Microsoft actually was ahead of the curve in realizing the power of online multimedia. In 1994, when the Web started to take off, Microsoft's CD-ROM products like Encarta, their online encyclopedia, and Cinemania, their online movie reference, were ahead of the Web in providing online hyperlinked documents with rich multimedia capabilities. Microsoft even realized that it was important to provide information resources via online networks.
There was only one problem with Microsoft's vision of the Microsoft Network: barriers to entry were high. Publishers were expected to use proprietary Microsoft tools, to apply and be approved by Microsoft, and to pay to play. By contrast, anyone could start a web site. The software you needed was free. The specifications for creating documents and dynamic content were simple, open, and clearly documented.
Perhaps even more important, both the technology and the Internet ethic made it legitimate to copy features from other people's web sites. HTML (HyperText Markup Language) pages that were used to implement various features on a web site could be easily saved and imitated. Even the CGI scripts used to create dynamic content were available for copying. Although traditional computer languages like C run faster, Perl became the dominant language for CGI because it was more accessible. While Perl is powerful enough to write major applications, it is possible for amateurs to write small scripts to accomplish specialized tasks. Even more important, because Perl is not a compiled language, the scripts that are used on web pages can be viewed, copied, and modified by users. In addition, archives of useful Perl scripts were set up and freely shared among web developers. The easy cloning of web sites built with the combination of HTML+CGI+Perl meant that, for the first time, powerful applications could be created by non-programmers.
People in the open source community don't really think about HTML as an open source technology. They're focused on software and programming, and more particularly, on software that's distributed under a specific open source license, such as the GNU General Public License (GPL). They don't think of HTML as source code. But just imagine, for example, if the Mosaic and Netscape browsers had not included a "View Source" menu item. Would the Web have taken off the way that it did? People copied HTML pages wholesale; they built on other people's web pages in a frenzy of networked sharing. HTML is one of the great open revolutions. Anybody can build an HTML interface.
You notice I'm skirting the strict definition of "open source," because while licenses are very important, what's more important is what people do in practice. Do they share? Do they copy? Is it easy to build on what other people do? The success of the Web was driven by the low barriers to entry that were implicit in having that HTML source available all the time.
But even apart from HTML and Perl, this is perhaps the most important point to make about open-source software: it lowers the barriers to entry into the software market. You can try a new product for free--and even more than that, you can build your own custom version of it, also for free. Source code is available for massive independent peer review. If someone doesn't like a feature, they can add to it, subtract from it, or re-implement it. If they give their fix back to the community, it can be adopted widely very quickly.
|" Red Hat understands that the rules of the software business are changing in much the same way that Michael Dell grasped the change in the PC hardware business. "|
What's more, because developers (at least initially) aren't trying to compete on the business end, but instead focus simply on solving real problems, there is room for experimentation in a less punishing environment. As has often been said, open-source software "lets you scratch your own itch." Because of the distributed development paradigm, with new features being added by users, open-source programs "evolve" as much as they are designed. Indeed, the evolutionary forces of the market are freer to operate as nature "intended" when unencumbered by marketing barriers or bundling deals, the equivalent of prosthetic devices that help the less-than-fit survive.
Now I realize that that's a long historical preamble. What I want you to take from it is that many of the most interesting new applications we're facing today are actually web sites, not desktop-style applications. It's the Amazons, Yahoos, EBays, ETrades of the world that are delivering new computer functionality. And that functionality doesn't look a whole lot like the applications we saw during the desktop generation. We're facing a power shift in the computer industry from software to web-based infoware that is as significant as the shift from hardware to software back in the early 80's. But I also want to suggest that if that historical pattern continues, we're going to see the open standards of the web becoming the basis for new proprietary empires.
Open Source Strategies
So let me move to my second topic, which is what the web teaches us about the strategic priorities of the open source community. One way to do that is to look at who made money and a secure industry position during the early days of the personal computer industry.
Obviously, there were many winners on the hardware front. Not only new companies but also existing companies benefited from the enormous expansion of the market that followed. But one of the things that we're all very aware of is that there was an enormous shift of power from IBM to Microsoft as the most feared company in the computer industry.
A second lesson is that there were some unique niches exploited by companies who realized there are different business models when you have commodity products instead of proprietary products. In his recent book, Michael Dell talks about an interesting turning point in the company's history. Like everyone else, Dell believed that their competitive edge lay in creating some new proprietary hardware on top of the open PC architecture. They had a big project that was falling further and further behind, and they decided to cancel it. They realized that in the end their advantage was not going to be some new hardware edge. It was in getting better at sales, marketing, and distribution of what was essentially a commodity product.
This is a point hammered home again and again by Bob Young of Red Hat, who insists that he's branding and distributing a commodity software product. He's someone who is trying to play by the new rules rather than the old ones, which would suggest he should try to add in some kind of proprietary features. Now whether or not Red Hat in fact is the winner in what's shaping up to be the Linux distribution wars and probably the next big Wall Street feeding frenzy, is beside the point. Red Hat understands that the rules of the software business are changing in much the same way that Michael Dell grasped the change in the PC hardware business.
But clearly, there are even more people who made a proprietary business out of an open platform. Who else besides Microsoft was a huge winner in that hardware-to-software transition? Well of course, Intel. They made a heck of a lot of money because most of those open-architecture PCs ended up with an Intel processor chip. If you think that the lessons of Microsoft and Intel's success in the PC revolution have been lost in the age of the Internet, think again.
|" The Linux community is far too focused on the battle with Microsoft's current operating system. "|
The very first time I gave a version of this talk was back in Würzburg, Germany, at the very same Linux Kongress where Eric Raymond first gave his seminal paper on open source software, The Cathedral and the Bazaar. I made some of these observations about hardware, software, and infoware, followed by the boastful claim that "Open source is the Intel inside of the next generation of computer applications."
At the time, I was talking about the role of FreeBSD and Linux, Apache and Perl in building the web. This role has expanded since then, with many new back-end services built with open source software. For example, an increasing number of web sites are using third-party search engines. The downloadable Excite search engine is built with Perl; the AtomZ search service and many others like it are hosted services built using FreeBSD, Apache, and Perl. The list is far longer than you might suspect. One of the hottest areas in the computer industry over the next couple of years is going to be in web sites that provide services to other web sites over the web, using the same model as the private-label search engines are doing now. This is part of a larger trend, in which web sites themselves are starting to be used as software components.
But despite the growing role of open source in building new web-based services, I spoke too soon because while open source is right now the "Intel inside" of the next generation, there's no guarantee that it will stay that way.
The Linux community is far too focused on the battle with Microsoft's current operating system. Some see that the big goal is to develop a competing desktop and compatible desktop applications. And while I think that's a worthy goal and Linux is doing pretty well at it, I see Microsoft much more clearly and strategically focused on what kind of software will be needed to support that next generation of computer applications, and that worries me. Bill Gates made a speech back in 1997 in which he said, "Netscape isn't our biggest competitor in the web server space. Apache is." That was great for those of us in the open source community because it put us on the map, but when you think about it, that's not entirely a good thing. Microsoft rarely gets it right the first time, but they go back and try again.
Gates saw that Apache was a pretty clear target. Similarly, when technologies like Microsoft's ActiveX had been rejected by the web developers who liked HTML and Perl/CGI, what did Microsoft do? They went back to the drawing board, and they produced ASP and development environments that were much more acceptable to the scripting crowd. If you go out there and you do an AltaVista search for files ending with ".asp" versus ".pl," you see they are making pretty good progress. They are attacking the open source community where it really counts.
For that matter, they are attacking other parts of the Internet as well. Exchange Server is targeted pretty clearly at the market that Sendmail now dominates. In fact, that's one of the reasons why Eric Allman felt that he had to go commercial and obtain additional resources to compete. He asks a key question: "Do you really believe that SMTP will stay an open standard if Microsoft Exchange becomes the dominant mail server on the Internet?"
Open source plays a largely unheralded role in enforcing Internet standards. Similarly, would the HTTP standard still exist in its current form if Apache didn't hold true to it? Wouldn't it have dissolved into partisan one-upmanship between Microsoft and Netscape?
Nor is Microsoft the only company targeting functionality originally provided by the Open Source community. Companies from Allaire to Vignette are building software that nibbles away at Perl's dominance in gluing infoware front ends to software back ends. Services like RealNames and browser features from Microsoft and Netscape chip away at the function of the DNS. (Fortunately, there's also some interesting action in the open source arena as well, with the Java Servlet code being contributed to the Apache Project, and with Python-based web development environments like Zope.)
So there are some very important battlegrounds in the Internet standards area that far overshadow the battle for the desktop. And we can't just look at current standards. Nor is the only danger that software vendors will try to replace open Internet standards with proprietary software.
Can Infoware Become Proprietary?
The biggest danger, to my mind, is that even if we win the battle to keep Internet software open and non-proprietary, the new applications built on top of the web, the infoware applications I've been talking about, will form a new proprietary layer.
If you think about Amazon and other web sites as applications, and infoware as the next frontier, there are several implications. The first is a positive one: if someone like Amazon or Yahoo uses free software, they are a possible ally. But in practice, these companies don't actually think of themselves as part of the Linux or open source community, and I think that that will have a long term negative effect.
There's also one important point that many of the most serious open source partisans haven't thought enough about. If it is the future of computer applications to be hosted on a web site, many of the ways the open source community has thought about software licensing are going to fall by the wayside. When somebody uses any web-based application, they use it in a time-sharing way; the developer doesn't actually have to distribute any software. A company could build all kinds of cool, interesting, perhaps even dominant proprietary applications on top of the open software architecture of the Internet, using open source software as a platform, and all of our vaunted open source licenses wouldn't even touch them.
Now I'm not saying that web companies are intent on subverting the open source movement, but I do think that it's important to get people thinking about these issues before there's too much money at stake. If you don't form good habits early, it's a lot harder to form them later.
Going back to my earlier story about the PC, on top of that open PC hardware standard we ended up with some really closed software standards. So "openness" doesn't mean much when you're making one of these transitions to a new layer. And I think we are at that transition point. And so, for example, one area that may be very important in this new transition layer will be what you might think of as a high-level protocol for communication between cooperating computers on the Internet. Right now we have low-level protocols like TCP/IP, we have medium-level protocols like HTTP. On top of that, we're going to have various kinds of XML-based data-exchange protocols. Dave Winer's XML-RPC is a sign of things to come. It's a bad sign that Microsoft knows more about this than the leaders of the Linux community. They've already incorporated it into a new protocol that they are calling SOAP.
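To give a feel for what such a high-level, XML-based protocol looks like on the wire, here is a small sketch using Python's standard xmlrpc library; the method name examples.getStateName is the illustrative example from the XML-RPC specification, not a real service:

```python
# Sketch of an XML-RPC exchange: a remote procedure call is encoded
# as a small, human-readable XML document carried over HTTP.
import xmlrpc.client

# Encode a call to a (hypothetical) remote method with one argument.
payload = xmlrpc.client.dumps((41,), methodname="examples.getStateName")
print(payload)  # an XML <methodCall> document naming the method and argument

# The receiving side decodes it back into arguments and a method name.
params, method = xmlrpc.client.loads(payload)
print(method, params)  # examples.getStateName (41,)
```

Because the request is just structured text, any web site can both speak and understand it, which is precisely what makes such protocols a natural glue layer for website-to-website services, and such a valuable thing to control.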
The question is, who's going to control those protocols? And when are the companies who have dominant market share around one of those website-to-website or website-to-consumer protocols going to realize that they may have the world by the cojones, just like Microsoft did in the PC space?
Let me give you one small example. In my publishing business we have to provide electronic catalog information to all the online booksellers. Our webmaster is having conversations with barnes&noble.com, amazon.com, borders.com, and all the independent bookstores who want this same kind of information, and they all have slightly different formats. Now, Allen Noren, the webmaster, is trying to get all the vendors to agree on a standard. But if they don't agree, whose format do you think he's going to implement first?
Amazon, of course. Now I don't want to say anything about Amazon's intentions. I believe they mean well. But the fact is, once a company gets enormous market power, it's very easy to slip into habits of abuse. I don't know that even Microsoft started out from the beginning wanting to get everybody by the short hairs. Once you get in that position it takes a lot of strength of will and strength of purpose and high moral fiber not to take advantage of it.
So, part of what I'm trying to encourage companies like Amazon and Yahoo to do is to think further ahead, to think about their responsibilities to the Internet and to the future. All of these companies have benefited enormously from the openness that gave rise to the Internet.
(Note that when I say that the open source community was in some way responsible for the Internet, I'm not talking about the role of specific open source licenses or software projects but about the kind of collaborative, wide area computing that grew out of the university community. In some ways I think of software-sharing over the Usenet as the real beginning of the open source movement. At the end of the day what's really important is sharing, that people find ways to work together collaboratively. That's what really brought us to where we are: While the commercial software industry was bloating up the original innovative PC applications with unneeded features, and Microsoft was working on such splendid innovations as Microsoft BOB (that's the failed attempt to make Windows even more friendly, whose sole survivor is that hideous talking paper clip in the Microsoft Office applications), this worldwide collaborative community came up with the Web, they came up with email, and all the things that make the Internet so interesting.)
It's very important for companies who are involved in building and using some of these new kinds of applications to create good habits, habits of openness, habits of sharing. But I'm not necessarily talking to web developers now, I'm talking to Linux developers. And for you, the message is simple: pay attention to the web, and the way it's changing what users need, and the whole shape of the competitive landscape. If you just focus on Windows, you're missing a lot of the most exciting action. I'd like to see Linux not just as a desktop alternative, but as the best web development platform.
But Is There a Business Model?
I said I'd talk about how open source businesses make money. When talk turns to who's made money from open source, people used to point to me and say, "You've made more money from open source than anyone. They give away the software, you sell the manual." And even though I actually have made quite a lot of money because of open source (I think I've probably had more open source-related revenue this past year than Linux distributors like SuSE or Red Hat, and that was even before Red Hat's stunning stock market performance), I'm not the person who's made the most money from open source.
The person who's made the most money from open source so far is Bill Gates. Did people buy upgrades to Office 97 and Windows 98, all those billions of dollars of upgrade fees, did they shell out that money for Microsoft BOB? No, they shelled it out for Internet functionality. They shelled it out for the functionality that had been developed over a period of 15 or 20 years by a community of collaborating researchers, hackers, people who built tools for their own use. So, the people who profit aren't necessarily the people who build the software. And we have to think about that. We have to think about how we make sure to create business opportunities that encourage the further development of free software, of open source software, of software that really matters. And companies like Microsoft that incorporate that functionality into their products ought to be figuring out how they can give something back to this community because they have profited immensely, and will continue to profit immensely, because an open community is always better at innovation than a closed one. So my message to the commercial software world is to support open source because it's good for the computer industry, and good for you. Without it, you'll run out of ideas.
I also have a message to entrepreneurs who are trying to come up with new open source businesses. With commodity software, the rules are different. We need new business models. And those models are not always what you might expect. Let me illustrate once again with a story.
One person who's made buckets of money from open source software is Rick Adams, the founder of UUNet. How many of you remember when Rick was the hostmaster at seismo, the world's largest Usenet hub, and the author of B News, the most widely used Usenet news software? Rick didn't say, "Oh, I'm going to put this software in a box and sell it." When Rick saw that his bosses at the U.S. Geological Survey, or wherever it was that seismo was housed, were starting to ask, "Why are our phone bills several hundred thousand dollars a month for passing Usenet feeds to anyone who asks?" he realized that we needed some way for Usenet to pay for itself. And he really invented what we now take for granted, the commercial Internet service provider business. But when people think about free software and money, they very often play right into Bill Gates's hands, because they think about the paradigm that he has perfected so well: put software in a box, ship it, get a locked-in customer base, then upgrade 'em. Rick went sideways from that. He was the first person to say, "I'm going to build a serious business that is based on free software," and the business was in providing a service needed by the people who use that software, who talk to each other, who distribute it, who work with each other online.
Now I've said that the Web companies like Yahoo are the next generation of application based on open source software. That's a variation of the same message. The way that you often will make money with free software is to deliver a service based on it; either a service that underlies it like the ISP service or a service that's built on top of it like Amazon or Yahoo, or business to business services like the web search engines.
Now if this is the case, we have a good answer for all those people saying, "This free software thing must be a bubble because we can't figure out how anybody's going to make money at it." My argument is that people are already making more money at it than we can count. So that's a very important paradigm shift and it brings me back again to this idea about where the open source community should be focusing its energy. If the frontier is in developing applications to deliver online services to people, and those applications are not distributed as software, they're just software in use, then, first of all, we have to make sure that the people who are using that software realize it, and that they contribute to the community, that they keep that engine of innovation going.
And secondly, in our development work, we need to focus on the technologies that are needed for this next generation of applications. Now I'm not saying that more traditional software is unimportant. Far from it. Advances in operating systems are incredibly important, just like advances in hardware. In fact, in that previous transition, the hardware companies continued to do very well. There's been an enormous growth of functionality in hardware and I think there will be enormous growth of functionality in the software layer. But I can't emphasize strongly enough how important it is for us not to let the engines of the Web go proprietary. It's essential for us to engage the companies that use Web software in the open source community, and in open standards, because if they go proprietary it won't really matter if Linux wins the server wars or the operating system wars, because the battle ground is moving on.
I want you to think in a bigger way about both the impact of open source, the enormous positive impact, and the dangers if the web becomes proprietary. Open source has been one of the engines of enormous change in the computer industry, and it will be the engine of even greater change as we go forward. We're really heading into a very interesting time, and open source can be one of the key engines of that revolution if we focus our energy in the right place.