From P2P to Web Services: Addressing and Coordination

April 07, 2004

As they modernize their use of the Internet and intranets, organizations face new technological challenges, often perplexing and sometimes seemingly insoluble. The common element these problems share is that their solutions go beyond technology: they require a social infrastructure, the framework that determines whether or not technological change succeeds. This article summarizes the tentative attempts of researchers and standards committees to create that infrastructure.

After making great strides in the 1990s to install LANs, web servers, virtual private networks, and other facets of the TCP/IP revolution, system administrators notice that:

  • mobile users are logging in from all over the continent;
  • employees are attaching wireless devices to the network at heaven knows what access point from minute to minute;
  • people are exchanging sensitive data over instant messaging, an outrageously insecure protocol by default, and one that additionally replicates many of the problems of email, such as viruses;
  • employees are collaborating with people outside the company, using email or more sophisticated collaboration tools, freely sharing company data in ways the company can't control;
  • people are running servers on their PCs for the first time and thus exposing services to the network--theoretically opening their systems to compromise--through technologies such as Rendezvous.

These problems have to be solved at a fundamental level, involving changes in business requirements, training, and organizational communication patterns. In short, they require a social infrastructure, which makes them harder to solve. Computer and software vendors sell technical solutions for these problems; the products are probably good ones. But they are not enough.

Perhaps we can gain some insight by looking at the ways in which peer-to-peer technology was received and accommodated a few years ago. No one ever offered a great definition of "peer-to-peer". Sometimes the term is used to cover only file-sharing systems (Napster, Gnutella, and Kazaa), despite the fact that peer-to-peer researchers and implementers were looking beyond file sharing. Other people defined peer-to-peer so broadly that it included email. When used appropriately, "peer-to-peer" covers grid computing, the new generation of collaborative tools (for example, Groove), and new types of distributed databases and distributed filesystems (for example, OceanStore).

For our purposes here, it's fine to think of peer-to-peer as any networking technology where crucial responsibility lies at the end-points. This definition includes all the issues I mentioned earlier. It may also characterize some aspects of Microsoft's Office 2003 suite.

In fact, definitional inadequacies aside, peer-to-peer isn't really a set of technologies so much as a set of problems. And now the problems of peer-to-peer are the problems we all face. Peer-to-peer exposed the weaknesses in the current implementation of the Internet; it was an avant-garde. And while few peer-to-peer technologies have been adopted thus far, I expect they will be in a decade or so, because the problems of social infrastructure must now be solved.

The challenges of and lessons from peer-to-peer fall into three categories: addressing, coordination, and trust. I discuss the first two in the present article, taking up the third in a future article.

Addressing

Most of us don't run applications that require personal, persistent addresses. Suppose you have a great sale to offer customers and want to promote it through a web service. SOAP offers a way to expose the information to your customers, who can query you for promotions through a SOAP call:

<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
    <s:Body>
       <p:QueryPromotions xmlns:p="urn:Promotions">
          <category>travel</category>
          <expiration>2003-10-31</expiration>
       </p:QueryPromotions>
    </s:Body>
</s:Envelope>
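For concreteness, here is one way a client might deliver that envelope: an HTTP POST with a SOAPAction header. The endpoint URL and action value are invented for this sketch; a real service advertises its own.

import urllib.request

# A minimal sketch of posting the envelope above to a hypothetical
# endpoint; the URL and SOAPAction are placeholders, not a real API.
envelope = """<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
    <s:Body>
       <p:QueryPromotions xmlns:p="urn:Promotions">
          <category>travel</category>
          <expiration>2003-10-31</expiration>
       </p:QueryPromotions>
    </s:Body>
</s:Envelope>"""

request = urllib.request.Request(
    "http://example.com/promotions",              # hypothetical endpoint
    data=envelope.encode("utf-8"),                # data= makes this a POST
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "urn:Promotions#QueryPromotions"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))        # the server's reply envelope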

But why wait for users to think about querying you? Perhaps this promotion lasts only one week and you want to reach out to loyal customers in time. You want push technology. Companies do this now through email. And you're stuck with email; you can't do push through web services. The problem is that there is no persistent address where a user can be reached by way of a web service. Web services are asymmetric: users can query a server, but the server can't query users. It would also be good if a user could make a web service request and then disconnect, letting you send the results to the user at a later time. You're going to have to use email for that too. Web services are synchronous: the sender has to wait for the reply.

The Real Problem

The two situations I've just described are related. I could restate these points by saying that our current social infrastructure provides only one persistent address for a user, an email address, and that it cannot currently be used easily for other protocols. And even email lacks adequate persistence; just look at the thousands of MediaOne subscribers who had to change to AT&T accounts and then change again to Comcast.

But the venerable email address, for lack of anything better, is being used as a unique identifier. In other words, web clients lack a robust return address. In theory, I could send you an IP address, properly encrypted and signed to prevent spoofing. And a protocol could be developed for you to send the results of your long-running operation to the IP address when you're done. But there is no guarantee I'll be at that IP address; I may have logged off long ago and my neighbor, who may be my sworn enemy and work for a competing company, may have logged in and received my old address.

When the addressing problem, which is related to resource discovery, is raised, some people say, "Implement IPv6, thus providing enough addresses for every device to be manufactured over the next several hundred years. Give every device an address; and, while you're at it, eliminate Network Address Translation and DHCP, and the problem is solved." No, it's not. People are not tied to individual devices. We go to work, we go home, we log in through PDAs and telephones. I am not my computer device.

Furthermore, I need to change addresses as I move. If all of us could use the Internet using the same IP address whether we were in Boston, Montreal, or Helsinki, Internet routing would bog down and become unmanageable. IPv6 does not provide for the use of addresses in different geographic locations; there is only an extension called Mobile IP [http://www.ietf.org/html.charters/mobileip-charter.html], an extra layer designed for cellular phone networks. Implementing IPv6 and eliminating NAT have benefits, but they don't remove the addressing problem.

P2P Faces the Problem

The peer-to-peer movement had to face the problem of addressing head on, because people at individual PCs had to be reached in a wide variety of environments. One of the cleverest ways to solve the addressing problem is to design applications so that user addresses don't matter. This is the solution chosen by Gnutella and many related file-sharing systems: you just broadcast that you want a certain file. The request passes from your system to a few systems you know and then to a few systems each of them knows; eventually some system comes back saying, "Here is the file." Another way of saying this is that the addressing problem is moved from the user to the desired resource. Individual users are free of the addressing problem.
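A toy model of that broadcast search, with invented names throughout: each node forwards the query to its neighbors until a time-to-live counter runs out, and any node that holds the file answers.

class Node:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []

    def query(self, filename, ttl=4, seen=None):
        seen = set() if seen is None else seen
        if self.name in seen or ttl == 0:
            return None
        seen.add(self.name)
        if filename in self.files:
            return self.name                  # "Here is the file."
        for peer in self.neighbors:           # pass the request along
            hit = peer.query(filename, ttl - 1, seen)
            if hit is not None:
                return hit
        return None

a, b, c = Node("a"), Node("b"), Node("c", files={"song.ogg"})
a.neighbors, b.neighbors = [b], [c]
print(a.query("song.ogg"))                    # -> c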

It's interesting how many applications can function with anonymity. As we have seen, the Web requires the client to identify the server, but the server does not have to identify the client (except to obtain a temporary IP address); the server is happy to display its home page to anyone. Once someone wants to view sensitive data or buy something, the server will put up a password dialog box or require a credit card; that's a more advanced situation.

On the other hand, anonymity is currently being allowed in many places where it's creating trouble, largely because of the rise of wireless networks and the risk of drive-by intruders. For instance, corporate file servers routinely put up public sites that anyone on the network can read; it's assumed that everybody behind the firewall can be trusted. This was always a bad assumption, but if the network adds a wireless hub, the administrator has to worry constantly about who's sneaking up to it and snooping around. In a similar fashion, many corporate mail servers accept mail from anyone on the LAN. That problem has been highly publicized because intruders have been using wireless hubs to relay unsolicited bulk email. So more and more, we are discovering the need to assign persistent identities to users. In the case of wireless, organizations are doing so by making users log in before using the network.

Sender Policy Framework, which has been in the news a lot as email software designers and ISPs call for its adoption, works on a slightly different level. It doesn't identify end users. Instead, it provides checks to ensure that mail messages correctly identify the hubs and relays through which they pass. This is more of a routing issue than an addressing issue; the basic form of addressing (DNS) is no different when SPF is used.
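For the record, an SPF policy is simply a TXT record published in DNS. A sketch for a hypothetical domain: mail claiming to come from example.com passes only if it arrives from the domain's MX hosts or the listed network block, and everything else fails.

example.com.  IN  TXT  "v=spf1 mx ip4:192.0.2.0/24 -all"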

Solutions to the addressing problem fall into a few categories. In the case of a wireless LAN, the first solution is simply to make users authenticate themselves with a central repository that contains their identities and to record their addresses for a single session. This is what most instant messaging systems do. It was also the quick-and-dirty solution chosen by Napster, which is why it could easily be dismantled for vicarious and contributory copyright infringement while modern Gnutella-based file-sharing systems cannot.

This dependence on central servers scales well. AOL Instant Messenger shows that such a system can serve millions of users. Still the system suffers from a flat namespace (once someone chooses the name John, no one else can use it), and it puts control in the hands of the people who run the servers.
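Part of the appeal of this first solution is how little machinery it needs. Here's a toy sketch (all names invented) of a central session registry; the flat namespace is visible in the single dictionary:

sessions = {}    # screen name -> (ip, port) of the current session

def register(name, address):
    if name in sessions:
        raise ValueError("name %r is already taken" % name)
    sessions[name] = address

def locate(name):
    return sessions.get(name)      # None if the user is offline or unknown

register("John", ("203.0.113.7", 5190))
print(locate("John"))              # ('203.0.113.7', 5190)
register("John2", ("198.51.100.2", 5190))   # the second John settles for a variant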

The second solution is used in Apple's Rendezvous (an implementation of the Zero Configuration Networking or Zeroconf standard) and in many other network systems meant for LANs, including some Microsoft domains. Each would-be peer announces the address or name it wants, and if it hasn't been claimed already by another peer, it is assigned to the newcomer and recorded by each of the other peers. This solution requires all participants to be on the same LAN, for several reasons: it depends on broadcasts, it doesn't scale up to huge numbers, and it's open to many attacks if a peer isn't well-behaved.
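For illustration, here is a loose sketch of that announce-and-claim dance. The wire format and port are invented for this sketch; real Zeroconf uses multicast DNS on UDP port 5353, with carefully specified probing and conflict-resolution rules.

import socket

PORT = 5354      # invented for this sketch; real mDNS uses UDP port 5353
TIMEOUT = 1.0    # how long to wait for an objection, in seconds

def claim_name(name):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(TIMEOUT)
    # Probe: ask the LAN whether any peer already owns this name.
    sock.sendto(("PROBE " + name).encode(), ("<broadcast>", PORT))
    try:
        reply, _peer = sock.recvfrom(1024)
        if reply.decode().startswith("TAKEN"):
            return False             # an existing peer objected
    except socket.timeout:
        pass                         # silence means no one objected
    # Claim: tell the other peers to record the name as ours.
    sock.sendto(("CLAIM " + name).encode(), ("<broadcast>", PORT))
    return True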

The most robust and scalable solution in current use, the Domain Name System, was created twenty years ago. DNS was extended long ago with special records (MX records) to support email, which I mentioned as the one form of persistent address in our social infrastructure. DNS makes it easier to maintain a network of mail servers. It would be interesting to see whether support in DNS for a more generalized addressing solution could allow other services to support persistent addresses or lead to a more general form of addressing that could be used by many applications and protocols.

Such support was actually added about five years ago, in the form of SRV records as specified in RFC 2782. These records can specify any well-known server and provide the information needed to reach it in a flexible manner. SRV records have not been generally adopted and are not being pushed by the IETF; but they are in widespread use: by Apple's Rendezvous, by Kerberos, and by Microsoft's Active Directory.
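In zone-file terms (example.com is a placeholder), the email-specific record and its generalized successor look like this:

; Email's persistent addressing: mail for anyone@example.com goes here.
example.com.                    IN  MX   10  mail.example.com.

; RFC 2782's generalization: a service name, a protocol, and the
; host and port that provide it (here, XMPP client connections).
_xmpp-client._tcp.example.com.  IN  SRV  5 0 5222  jabber.example.com.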

The Jabber instant messaging service, an XML message-passing system that is not yet highly popular but whose protocol was officially standardized by the IETF as the Extensible Messaging and Presence Protocol (XMPP), partly solves the addressing problem by depending on DNS and by suggesting that each user run his or her own server. Running a server is not required, but doing so automatically gives each user an address.
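To make that concrete: given the SRV record above, a client can resolve the domain half of an address like alice@example.com into a server to contact. A sketch using the third-party dnspython package:

import dns.resolver   # third-party package: dnspython

answers = dns.resolver.resolve("_xmpp-client._tcp.example.com", "SRV")
# Prefer lower priority values; break ties by higher weight.
for rec in sorted(answers, key=lambda r: (r.priority, -r.weight)):
    print(rec.target, rec.port)    # e.g. jabber.example.com. 5222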

Domain names are perhaps the best solution in the ideal sense but the worst in their practical implementation. They remain relatively heavy-weight and present many barriers to the average user, partly thanks to the original implementation of the system and partly thanks to persistent intervention by large corporations and their legal representatives. Particularly in the global top-level domains like COM and ORG, this intervention has been effective in keeping most individuals from taking advantage of the persistent names offered by DNS.

The supply of domain names is artificially limited, so much so that a whole business has grown up around notifying someone when his desired name becomes available. VeriSign fought with registrars over who gets to dominate this activity, which adds no value to society. Compared to the cost of actually administering DNS servers, prices of domain names in the popular top-level domains amount to information highway robbery. Even if you get past these barriers and obtain your own domain name, you cannot consider it safe unless you also invest thousands of dollars to obtain a national trademark. Furthermore, registration requires you to make your contact information public, an anti-privacy measure that renders the system inappropriate for individuals.

When you turn to country domains, the situation is much more user-friendly. But registering is still too much trouble and too expensive for most people; compare its difficulty to the convenience of getting a login account on one of the major instant messaging services.

Researchers have been searching for years for a distributed system of addressing and resource discovery. The more heavy-weight peer-to-peer systems, such as Chord and Tapestry, both still experimental, define their own addressing and routing systems.

Each node that joins one of these systems is assigned a unique, random identifier. Certain nodes know how to reach others with similar numbers. When trying to reach another node, you start by choosing one you know whose first few bits match the identifier of the node you want. The system is a lot like standard Internet routing at the IP layer.

Thus, if you want to reach 12345 and you have two choices, 12862 and 12347, you choose 12347 because more of its leading digits match (decimal digits stand in here for the bits a real system compares). 12347 should be fewer hops away from 12345. This kind of system is intriguing, but we don't know yet how practical it is.
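A sketch of that routing rule, using decimal strings so it matches the example above; real systems such as Chord and Tapestry work with bits and add far more care around node failure:

def shared_prefix(a, b):
    """Length of the common leading prefix of two identifiers."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(target, known):
    """Forward toward the known node sharing the longest prefix with target."""
    return max(known, key=lambda node: shared_prefix(node, target))

print(next_hop("12345", ["12862", "12347"]))    # -> 12347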

Much of the p2p network research was subsidized by the music industry, for which we should offer our sincere thanks. Without that subsidy, how could researchers collect statistics over months and years based on participation by literally tens of millions of nodes? It was sheer genius to offer popular recordings to sign up users, and the world will benefit from the testbeds that these systems provide. The music industry botched the PR, of course. So did the universities, which also subsidized the research by providing large amounts of bandwidth but tried to strangle the traffic once they noticed it.
