Content Delivery Networks (CDNs): How They Work and How They're Used?
Judith Boettcher [JB]
Howard Strauss [HS]
Micah Beck [MB]
September 20, 2001
JB: Welcome to the CREN Tech Talk series for Fall of 2001 and to this launching session on Content Delivery Networks and How They Fit Into the Networking Landscape. You are here because it's time to discuss the core technologies for your future campus. This is Judith Boettcher, your CREN host, and our session today is coming to you with the support of the CREN member institutions and Lokomo Systems. Lokomo is your source for a software suite that goes beyond simple web acceleration, helping you to manage active edge servers anywhere on the Internet. And many of you may not know what edge servers are, but we'll be talking about those as part of our content delivery session today. Let me welcome back Howard Strauss of Princeton as the technology anchor for Tech Talk as we launch this fifth season of Tech Talks. Howard is a well-known web technology expert and portal expert. Welcome back, Howard!
HS: Thank you, Judith, and it really is so good to be back for yet another season of Tech Talk! In this webcast, I invite you to join Judith and me in a lively technical dialogue with our guest expert, Micah Beck, that will answer the questions you'd like answered and will also ask those very important follow-up questions. You can join in this dialogue by sending your own questions via e-mail to expert@cren.net anytime during this webcast. If we don't get to your questions during the webcast, we'll provide an answer in the webcast archives. It has been said that you can never be too rich. Most of us would quickly agree with that! Less obvious, perhaps, is that you can never have a computer that is too fast, can never have too much computer storage and can never have too much bandwidth. Well, it would take most of us a bit of time to exceed the capacity of the one petaflop Blue Gene computer being built by IBM. But I'm certain before it's built we'll find some application that will require an exaflop computer, a thousand times faster. The Internet will always be too slow, no matter how fast we make it. While packets can travel at the speed of light (about seven and a half times around the world in a second), the modems, routers, web servers and sheer amount of traffic create a nearly permanent traffic jam that can bring the web to a crawl. We all know some tricks to speed things up. Get faster modems! Get DSL! Build multiple web servers! Use local caching! Caching trades the disk space of the cache for what appears to be additional bandwidth. The first time we use a frequently-accessed website, we load all the high bandwidth static information such as images and logos into the cache as well as displaying it on our screens. When we access the site again, the information in our cache will come from our disk at very high speed, totally independent of the state of the network. The information that changes will, of course, have to make its way across the Internet.
The local cache helps the user that has it, but it is no help to the rest of the university. What we need is a university-wide cache. If we had a cache that could be shared by the entire university, it would of course have to be quite big and very fast. But what would we put in it and who would pay for it? As we'll soon see, we'd have the start of a Content Delivery Network and it will often be free. But speeding up the network for just static content is not enough. Today much of the dynamic content is streaming media, live video and other forms of information that have a voracious appetite for bandwidth. We'll need a solution that can handle this as well. A cache is a hiding place for treasure. The treasure you'll find in caching, Content Delivery Networks and distributed storage infrastructures is an Internet that appears faster than it could be, although I'm sure it'll never seem quite fast enough. All of this will soon be perfectly clear to you as we explore some secrets of speeding information to you on today's webcast of Tech Talk. Judith?
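[Editorial illustration] The local caching behavior Howard describes (pay the network cost on the first access, serve repeats from local storage) can be sketched roughly as follows. This is a minimal sketch, not any browser's actual implementation: `LocalCache`, `fake_origin` and the example URL are hypothetical names, and the "network" is simulated by a plain function.

```python
from typing import Callable, Dict


class LocalCache:
    """Keeps a copy of each fetched object so repeat requests skip the network."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # hypothetical function that goes out to the network
        self._store: Dict[str, bytes] = {}  # stands in for the disk space the cache uses
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1                  # repeat access: served locally
            return self._store[url]
        self.misses += 1                    # first access: fetch once, then remember
        body = self._fetch(url)
        self._store[url] = body
        return body


def fake_origin(url: str) -> bytes:
    """Stand-in for a real network fetch (an assumption of this sketch)."""
    return ("content of " + url).encode()


cache = LocalCache(fake_origin)
cache.get("http://www.example.edu/logo.gif")  # miss: goes to the "network"
cache.get("http://www.example.edu/logo.gif")  # hit: comes from the local store
print(cache.hits, cache.misses)
```

A real cache would also need expiry and size limits, which this sketch leaves out.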
JB: Thank you, Howard, and I think our dream is always finding those computers and networks that actually keep up with how fast our brains generally work, so we'll hear more about that today. With that, let me welcome our expert for today, Micah Beck, who is the Associate Research Professor of Computer Science at the University of Tennessee-Knoxville and Lead Investigator in the Internet 2 Distributed Storage Infrastructure Project. Micah is very active in many areas in both storage and distributed storage arenas, serving as the chair of the Internet 2 Network Storage working group and also as leader of the Internet 2 Distributed Storage Infrastructure Project, about which I think we'll hear more today. Micah's Ph.D. is from Cornell and he has also worked in industry at Bell Labs. And we found out that he and Howard share that particular experience. At any rate, we're very pleased to have you, Micah. Welcome to Tech Talks.
MB: Thank you, I'm very pleased to be here.
JB: Great!
HS: Micah, perhaps the first thing you could tell us is what is a Content Delivery Network anyway? I think it's a term that most people haven't heard, and I never even heard it before this Tech Talk came up.
MB: Well, a Content Delivery Network is a variant on a network of caches. You mentioned that a cache for a community like a university campus can act to cut down the bandwidth requirements of that campus when serving content that's popular on the campus by storing it locally. Now turn that around: suppose someone were to place a cache on the campus that held content that they want to be able to distribute to the university. Say, for example, CNN were to place a cache on the campus and have it hold only CNN content, so that people in our dorms and in offices who were accessing CNN could get content out of that cache. That would be a very simple Content Delivery Network serving just one content provider.
HS: So CNN would wake up in the morning and they would put all their static stuff out there? Is that
MB: Well, in fact it's done on demand. It's not that they push it out there; rather, it's a special cache, say, for CNN content, which other websites couldn't use. Access to other websites wouldn't go through that cache. So it would still be on demand, like an ordinary cache, but it would be serving only a particular site.
HS: And there would be a physical disk that would live on your own campus, so there'd be one for every campus?
MB: Right. And of course, to do that for just a single content provider like CNN would be rather extravagant. So instead, a Content Delivery Network provider, for example Akamai or Digital Island, will place a box on the campus and then rent out space in that box (that is, a portion of that cache) to different content providers, to CNN, Yahoo, a number of providers that all want to use that cache to accelerate delivery of their content to the campus.
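[Editorial illustration] The shared edge box described above, which fills on demand but only for the content providers that rent space in it, might be modeled along these lines. This is a sketch under stated assumptions, not how Akamai or Digital Island actually build their servers: the class `EdgeCache`, the customer hostnames and the `origin` function are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set
from urllib.parse import urlsplit


class EdgeCache:
    """On-demand cache that serves only the sites of paying content providers."""

    def __init__(self, customers: Set[str], fetch_from_origin: Callable[[str], bytes]) -> None:
        self.customers = customers                  # hostnames renting space in the box
        self.fetch_from_origin = fetch_from_origin  # hypothetical call to the provider's central server
        self.store: Dict[str, bytes] = {}

    def handle(self, url: str) -> Optional[bytes]:
        host = urlsplit(url).hostname
        if host not in self.customers:
            return None                             # not a customer: the request bypasses this box
        if url not in self.store:
            # Nothing is pushed in advance; content is pulled on first request.
            self.store[url] = self.fetch_from_origin(url)
        return self.store[url]


origin_calls: List[str] = []


def origin(url: str) -> bytes:
    """Simulated central web server that records how often it is contacted."""
    origin_calls.append(url)
    return b"payload for " + url.encode()


edge = EdgeCache({"news.example.com", "portal.example.com"}, origin)
first = edge.handle("http://news.example.com/top.jpg")         # pulled from the origin
second = edge.handle("http://news.example.com/top.jpg")        # served from the edge
other = edge.handle("http://unrelated.example.org/page.html")  # not handled by this box
print(len(origin_calls), other)
```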
HS: And the people who pay for this are the content providers themselves?
MB: Exactly! The content providers pay for it and the result for the campus is presumably reduced external bandwidth on their link to the Internet because a lot of popular content is served out of this dedicated cache. And there can even be benefits to the campus if they choose to serve content from that cache to other users and to garner income from it.
HS: And the advantage to the content provider is?
MB: Well, so there are a couple of advantages. The most direct advantage is it offloads the content provider's central web server from having to serve out large rich media objects: big pictures, video clips. If those can be served from an edge server, which is what this content delivery cache is, then the work of serving doesn't have to be done at the central server. I mean, one thing to note is that the job of serving highly popular commercial content is highly computationally intensive, and it's difficult to build centralized servers that can do it all from one location. So their benefit is that it offloads their server, and the secondary benefit is that it hopefully provides better access for their users. If you're accessing a local cache, you'll get a snappier response. You won't be contending as much across the backbone, so you should get better service at the end user.
HS: Both Judith and you now have said something about edge servers or the edge of the network. When I think of a network, I don't think of it having any edges. What's this concept of servers on the edge of the network all about?
MB: Well, if you look at the architecture of the network, it is very hierarchical. There are backbone networks, which you can think of in telephony terms as trunk lines that carry large amounts of information, often over long distances or at least on very popular routes. And then there are networks that feed off of that. So for instance, AT&T or Sprint or Qwest are providers of backbone networking, but they can connect to, say, an Internet service provider that might then connect out to a lot of individual end users. So we consider the Internet service provider to be closer to the edge of the network, and actually the end user's PC is the very edge of the network.
JB: So in terms of where on a campus generally, where within their campus network might the edge server be located?
MB: From the point of view of a commercial provider, you want to locate one of these servers close to, again, the center of the campus network, at a point that's well connected to the entire campus, because there it can serve the entire community. So you will often find it at the edge of the campus network, in the machine room where the campus network connects to the Internet service provider that the campus uses. You'll put the box there.
JB: So it might, in fact, be right in the campus machine room at the primary external connection, then?
MB: That's where you're likely to find it, yes.
HS: How is this different from mirroring? It sounds a little bit like mirroring, doesn't it?
MB: So it does, and in fact, we wrote a paper in which we did a direct comparison of caching and mirroring as ways to distribute the work of content delivery around the network. Caching is the process of capturing the output of a web server and then later, if you see the same request again, replaying that output, okay? And so you're really focusing on what the server produces. What we call mirroring is when you actually copy the source files that the server uses and actually put a working server at each location where you're making a copy. So it's a matter of capturing and replaying the output of the server vs. replicating the input to the server.
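[Editorial illustration] The distinction drawn here, capturing and replaying a server's output versus replicating its input, can be made concrete with a toy example. This is only a sketch: the `serve` function, the tiny document collection and the query words are invented for illustration, and a real web server is of course far more involved.

```python
from typing import Dict, List, Optional


def serve(source: Dict[str, str], query: str) -> List[str]:
    """Toy 'server': lists the documents whose text contains the query word."""
    return sorted(name for name, text in source.items() if query in text.split())


# The origin site: source files (the server's input) plus the server code above.
origin_source = {
    "a.txt": "caching saves bandwidth",
    "b.txt": "mirroring copies source files",
    "c.txt": "caching replays output",
}

# Caching: capture the OUTPUT of requests that have already been made.
captured: Dict[str, List[str]] = {"caching": serve(origin_source, "caching")}


def cache_lookup(query: str) -> Optional[List[str]]:
    """Replays a captured response; returns None for a request never seen before."""
    return captured.get(query)


# Mirroring: copy the INPUT (source files) and run the same server code locally.
mirror_source = dict(origin_source)


def mirror_lookup(query: str) -> List[str]:
    return serve(mirror_source, query)


print(cache_lookup("caching"))     # replayed from the captured output
print(cache_lookup("mirroring"))   # the cache has nothing to replay
print(mirror_lookup("mirroring"))  # the mirror computes a fresh answer
```

The trade-off in this sketch is that the cache needs no knowledge of how `serve` works, while the mirror must be able to run it.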
HS: And how do you decide which one you want to do?
MB: Well, so that's an interesting question. The reason that caching is such an appealing technology is because the output of the server is standardized. HTTP is a standard and it has a standard definition, and so it's possible to capture the output of a web server and then replay it and it doesn't matter what the architecture or the operating system of that web server was. You have a standard definition of what an HTTP response looks like. So that means that it's rather easy to, for instance, insinuate a cache between a server and a client and perhaps not even tell either of them that the cache is there and still have it work. So that makes it appealing from the point of view of compatibility.
HS: How would a Content Delivery Network, then, handle streaming media or live video or something like that?
MB: Right! So that becomes a harder problem because those kinds of output from servers are harder to capture and replay. For a long time it was considered that, for instance, streaming media was not able to be cached because you couldn't really capture the output of a server, store it and later replay it. So it was considered that you really had to use mirroring for those kinds of services, that they weren't amenable to caching. Someone actually has done work and has managed for certain kinds of streaming protocols to capture them, store them to disk and be able to later replay them. And so it turns out if you do some work, you can get streaming media cached, but that's not for all protocols, and in fact you don't get all the features of the streaming media. You only get the basic video or audio service. Any kinds of fancier additions that really have to be understood by the server and implemented by the server are lost when you capture the output of a server.
HS: Are there lots of people using these CDNs? How commonly are these things being used on campus?
MB: So that's a very good question about campuses. They are commonly used in commercial ISPs. The business model of the content distribution networks, the folks who are building these, was to give away the hardware, or really not to give it away, but to own the hardware and place it for free in the ISP or campus machine room. And so it was really a very good deal for an ISP or a campus, so this has been very widely distributed in the commercial Internet. And I would say I really couldn't quantify that in the university Internet, but pretty much, it's a savings of bandwidth. Pretty much everyone could use a savings of bandwidth on their commercial Internet connection. And so I would certainly say that there are hundreds of institutions that are using it at this point.
HS: To a user of this thing, with a Content Delivery Network, it sounds like the users think they're going out to CNN. But they're not, right? They're really going out to this little local disk.
MB: I mean, so it's a very interesting question. It depends on what you mean by "going out to CNN." I mean, yes, they think that they
HS: I type in the CNN URL and I think that's where it's going to go.
MB: Right, well, so if you interpret that URL as identifying a server and what you think is happening is that you are making a connection to that server and talking to that one server to get your content, then yes. It's true that if that's what's in your mind, that's not what's happening. Instead, depending on the configuration, it may be that some of your requests (since a web page is made up of a lot of different pieces, you know, text and pictures and so on)...
HS: Sure.
MB: Some of those pieces may be served from caches that are closer to you in the network and hopefully, that process is well-controlled, well-managed and is transparent to the end user.
HS: That means, when you say "transparent to the user," you mean I don't have to change my browser, I don't have to change anything I'm doing on my local computer?
MB: That's right. And in fact, unless you pay very close attention to what URLs are displayed as the page is loaded, you may never know that you aren't getting all your content directly from a central web server operated by the content provider.
HS: So the client doesn't have to do anything different. What about the people on the server side, the CNN folks?
MB: Well, so that depends a lot on the...
HS: Because it seems like somebody has to do something different.
MB: Well, how much you have to do differently depends a lot on the content delivery software that you're using to implement the content delivery system. There certainly are things that have to be done. Some of those things can be done automatically. In the best world, we would take a website that is written to be displayed on a single web server and any changes that would have to be made to the HTML source and other parts of it in order to make it run in a Content Delivery Network would be done automatically. That's the best world. Certainly, in the early Content Delivery Networks, that was not true. There were changes, particularly to embedded URLs, that had to be made to allow a level of indirection to be implemented so that it could be directed to one or another of the caching services around the network. But I think that there have been actually advances made and today the burden on the content provider is very low.
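[Editorial illustration] The kind of change to embedded URLs mentioned here can be sketched as a small rewriting step that points large media objects at an edge hostname while leaving ordinary page links alone. This is a rough sketch of the general idea only: the hostname `edge.cdn.example.net`, the URL layout and the list of media extensions are assumptions made for illustration, not the scheme of any particular CDN.

```python
import re

EDGE_HOST = "edge.cdn.example.net"  # hypothetical CDN hostname
MEDIA_SUFFIXES = (".jpg", ".gif", ".png", ".mpg", ".mp3")

# Matches src="http://host/path" or href="http://host/path" attributes.
_EMBEDDED_URL = re.compile(r'(src|href)="http://([^/"]+)(/[^"]*)"')


def rewrite_embedded_urls(html: str, origin_host: str) -> str:
    """Redirect the origin's media objects to the edge; keep everything else."""

    def repl(match: "re.Match[str]") -> str:
        attr, host, path = match.group(1), match.group(2), match.group(3)
        if host == origin_host and path.lower().endswith(MEDIA_SUFFIXES):
            # Level of indirection: the edge host is given the origin host
            # and path so it knows where to pull the object from on a miss.
            return f'{attr}="http://{EDGE_HOST}/{host}{path}"'
        return match.group(0)

    return _EMBEDDED_URL.sub(repl, html)


page = (
    '<a href="http://www.example.com/story.html">Story</a>'
    '<img src="http://www.example.com/img/top.jpg">'
)
rewritten = rewrite_embedded_urls(page, "www.example.com")
print(rewritten)
```

In this sketch the HTML page itself still comes from the central server; only the heavy embedded objects are redirected.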
HS: Now, I understand that you can do all this stuff on the regular network. We don't need Internet 2 or anything like that, special modems or anything. This just runs on the plain commodity Internet.
MB: This runs on the commodity Internet. I mean, this is a technology
[Automatic Message: Your conference call will end in ten minutes.]
HS: We hope not!
JB: I hope not!
MB: This is really about integrating networking, storage and processing resources. And yes, the networking resources could be really any networking resources, and certainly situations where there is problematic connectivity are some of the places where Content Delivery Networks can do the most good, by decreasing the reliance of every request on the connection to the Wide Area Network.
HS: But I understand that there is an Internet 2 project that seems related to these Content Delivery Networks, the Distributed Storage Infrastructure thing. Maybe you could tell us a little bit about that and what it is and why it requires Internet 2? Or does it?
MB: Well, so, the idea behind the Internet 2 Distributed Storage Infrastructure Project is really that if Internet 2 is about advanced networking, that need not simply mean bigger pipes. It really means making the network work better and work at much higher capacity for the academic community, and if using storage resources along with networking resources can give a synergistic effect which is greater than either one on its own, well, then that should also be part of the Internet 2 mission. So from that point of view, as I say, it's a combining of storage and networking and processing. As for the portion that relies on Internet 2, this is a project which does content delivery in the Internet 2 community. The sites, or as we call them, channels, that are being delivered, some of them are very, very large and you really would not hope to be able to do content delivery on that scale in an ordinary network.
HS: But is it really a Content Delivery Network? I mean, do we really just have a cache here and there or do we have entire websites? How's this different from what you described?
MB: Okay, that's another difference. So as I said, one thing is the scale of objects that are being moved around and of collections that are being moved around. Let me say, as leader of that project and just to clarify, the Internet 2 Distributed Storage Infrastructure Project actually predated the sort of commercial debut of Content Delivery Networks. So in some ways, it does what those networks do and more, really with an eye towards what the requirements are of the academic networking community instead of the commercial. One of the things that we looked at in the design of that project was that we were not happy with the idea that you could only capture and replay content that the cache understood. What that meant was there was a limited number of data types, and they tend to be very simple data types, that could be distributed through a Content Delivery Network. So we decided instead of going with a cache-based infrastructure, to go with a mirroring-based infrastructure. We do not capture and replay the output of servers. Instead, we distribute the input to the servers across multiple sites, actually implement the server software on each site and provide the service on demand in a distributed way. So that means if we have a service that requires us to run a program to produce the output, we're not going to be able necessarily to cache that because every time the program runs, it may give a different response. But if we can make a copy of the program and copy it onto a bunch of servers and run it on the server whenever the website is hit, well, we can replicate the service that that website is providing.
HS: These Content Delivery Network caches actually live in the server or machine rooms of universities, right? Do I have that right?
MB: That's right.
HS: And what about these things for this Distributed Storage Infrastructure, these DSI mirror servers? Where do they live?
MB: They live right next to the commercial boxes, generally, in those same machine rooms.
HS: So there's really one of these in every campus?
MB: Well, now, the DSI, the Distributed Storage Infrastructure Project, is an experimental project and it only has a relatively small number of sites, about six active sites at the moment.
HS: They're scattered across the country or...
MB: Scattered across the country and we have had international sites from time to time.
JB: Who's involved with that project, Micah?
MB: So the lead institutions are the University of Tennessee (my group here) and the University of North Carolina-Chapel Hill. And we have a collaborator at the School of Information Science there, Bert Dempsey, and other collaborators there. Also more recently, we've started working with Ed Fox at Virginia Tech, working on replicating open archives in the Digital Library world. That's from an academic point of view, who's involved. The institutions include University of Texas... I'm sorry, Texas A&M. I'll be in trouble with someone over that!
JB: Right!
MB: University of Hawaii has been involved, the EROS Data Center has been hosting one of our machines. We've had a machine at Indiana University. I'm sure I'll miss someone, but we've had these machines in various places around the country and we've had collaborators (not all of them necessarily still with the project) in Singapore and in other foreign locations also.
JB: Okay, well, let me just jump in here right now for a moment and remind our listeners that they can send questions in to you, Micah, our expert at expert@cren.net and hopefully we have solved the extension problem, by the way, too, so we should be all right.
HS: Okay, Micah, on this DSI thing, I mean, with the Content Delivery Networks you said we had things out there like CNN and that kind of stuff. With DSI, what kind of things do we have? What kind of things are we mirroring that have huge amounts of data?
MB: We have a number of channels that are currently active and we're always in the process of looking for more. We have, for instance, the Open Video channel, which holds a collection of high quality video that is available to the information science community for public domain use. It turns out that it's difficult to get high quality video for experimentation, for various kinds of work that they do with it, and so Gary Marchionini at UNC Chapel Hill has this project and he's using a channel within the Internet 2 Distributed Storage Infrastructure to store and distribute that video. That's one example. We are helping to distribute Linux, which is highly popular. Distributing the Linux source and the Linux binaries to the world puts a huge strain on, actually, again the UNC campus network, and if that content can be distributed around the network and people can get copies from servers that are close to them in terms of network topology, that will greatly offload it. So we have a Linux channel also. These are a couple of examples of the channels we have. We're very interested in branching out, finding collections of content for which hosting them at a single university is not adequate.
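[Editorial illustration] One simple way a client might decide which copy is "close" is to compare measured round-trip times to each mirror and pick the lowest. This is only a sketch of that idea, not the selection mechanism the project itself uses (the transcript does not describe one): the mirror hostnames and the latency samples below are made up, and no real network measurement is performed.

```python
from statistics import median
from typing import Dict, List


def pick_mirror(rtt_samples_ms: Dict[str, List[float]]) -> str:
    """Return the mirror with the lowest median round-trip time.

    Mirrors with no samples (for example, unreachable ones) are skipped.
    """
    usable = {host: median(samples) for host, samples in rtt_samples_ms.items() if samples}
    if not usable:
        raise ValueError("no mirror has any latency samples")
    return min(usable, key=lambda host: usable[host])


# Hypothetical measurements, in milliseconds, taken from one client.
samples = {
    "mirror-east.example.edu": [18.0, 22.0, 19.5],
    "mirror-west.example.edu": [71.0, 69.0, 75.0],
    "mirror-intl.example.org": [],  # no replies received
}
best = pick_mirror(samples)
print(best)
```

Using the median rather than the mean keeps a single slow sample from changing the choice.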
JB: So maybe when we take a look at something like film studies and film productions at campuses, that something like that would be good to have on this type of channel. Would that be correct?
MB: That's certainly true. And the other thing is that any channel which has active content, where you have scripts that have to be run as part of the channel, is often difficult to serve to the whole world from a single copy, particularly if you have international users or users in different networks. Mirroring has been used to mirror static content; FTP mirroring has been around a long time. We are very interested in actually mirroring the applications that have to run in a website to make it an active website, and the simplest example of that, found on very many websites, is search engines. Those are things that often are not mirrored, even if the content that they are indexing is mirrored. And so again, that's a simple example, but we're looking for cases in which people have scripts or programs that are part of their academic application but hosting one copy of it on their campus is not adequate in terms of the connectivity and the latency that the end users are seeing.
HS: If there�s someone out there who is interested in joining in this effort, how do they do that? If they have something that they think would be a good application for this.
MB: At the moment, what they do is they contact me and we talk about what the nature of their application is, what the resource requirements are and what the community is that they are trying to reach. Because we're not currently in a business mode, we generate the technology. We have a certain number of servers that we can give people the use of, if they're in good places for them. But really, what we're probably looking for is people who would be putting their own resources into mirroring but would use this kind of a system to automate that and to make it more scalable, to make it work much better for them. And that may mean asking them or their community to provision the servers themselves, and that's one very big difference between sort of the academic picture and sort of the industry picture, which is that we are not in a position to buy machines and place them in other people's networks because we have no model for recovering income.
JB: So if campuses are looking forward to planning in the future in this area, what should they do?
MB: If they're looking forward to planning in this area, I guess the important thing here is their communities, their research communities. Are there people in their communities who are hosting content that is very popular or that is used very broadly, or do they know of campus researchers who have difficulty in pushing their content out to the communities that they want to have access to? If they do, then they have to look at the question of how can they work with the other network providers in that community to provision it, to provision that community with servers and then use (hopefully) our tools, but perhaps commercial tools, to distribute content to servers throughout the community.
HS: Do you imagine at any point that you'd have a bunch of these servers, like you do right now? You say you have six scattered across the world here, or the United States. But do you imagine that you would have a time when these things are not inside people's campuses but sitting out in the world so that people can get them?
MB: So I'm not sure what you mean by people getting them.
HS: Well, what I'm thinking about is if a university has something and they say, "Gee, this would be really useful in this DSI model, but we can't be putting it in 200 campuses or 3,000 campuses or whatever. But maybe we'd consider just having our one copy of it, we could have six copies of it, so you'd get some of the benefits of this because you wouldn't have to go to our one copy. You'd go to one of the other six servers."
MB: Right. So that would be a possibility. I mean, this gets down to business models, really. And it would be certainly possible for one of the commercial providers, the commercial Content Delivery providers to work in that mode. In fact, I've heard of folks working with Internet 2 who are offering some access to the Content Delivery Network servers that are already out there for academic content. That's sort of a special case, to give them either cut rates or maybe even free access to them. So that would be possible to have that kind of an infrastructure that people could perhaps rent some use of the same way that commercial sites do. But the other possibility would be to do it cooperatively, namely as we once did with building the Internet itself. Namely, we put the servers on our own networks and then we reach cooperative arrangements with our colleagues on other networks about pooling the resources and spreading the content amongst them.
HS: One campus says, "I'll do this type of content and these 12 universities can access it," and the next university will take some other content and a bunch of universities will be able to access it?
MB: And that's already done informally by people who do mirroring, okay? It's done generally not by the networking groups but by people who run their own servers on campus and what do they do? They create a mirror site and advertise that to the world and that acts something like a manually operated Content Delivery node, except that it doesn't use any of this technology and is not at all under the control...
[Automatic Message: Your conference call will end in ten minutes.]
HS: Interesting thing here! Micah, is your goal here just to save bandwidth or do you have quality of service goals? How are these things balanced?
MB: So that, I guess, depends on who you're talking about. In the case of the Internet 2 Distributed Storage Infrastructure Project, the main goal really is to improve the quality of service, and to do so for a wide range of applications, okay? Because we're talking about working in the Internet 2 network, bandwidth is not really a constraint, but latency and location still can be. And so while there can be bandwidth savings, we are really focused to a great extent on improving the quality of service of delivery through proximity between the client and the server. And there is also then the aspect of wanting to be able to do that particularly with highly interactive applications where latency and quality of service are particularly important. And that's why it's important to us to be able to not only make copies of static content, but also to make copies of applications. And that desire to make copies of applications actually led us fairly quickly to realize that, as with any time that you mirror applications, there are immediately going to be problems with portability. And we have that problem with mirroring that you don't have with caching because we're not working on the basis of standards.
HS: Okay, we have a question in from our neighbors up north in Canada, at Laurentian University. We have a question from Professor Richard Danielson and he says, "At what point does it make sense for a small university..." I assume he's talking about Laurentian. "...small university to abandon the quest for more bandwidth and move toward the CDN? For example, what is the base amount of bandwidth that a small university, about 5,000 students..." Which is not much smaller than Princeton, as it turns out. "...should have before needing to move toward a CDN?"
MB: It's an interesting question. I would say it doesn't depend so much on what your total bandwidth to the Internet is, but on what proportion of that bandwidth is being taken up with commodity traffic, if we're talking about a commercial CDN now, rather than something like DSI.
HS: Yeah, that sounds like what Professor Danielson's talking about.
MB: Right. So the issue is going to be... actually, in some sense, the slower the link, the more important the bandwidth savings can be. Even on a very slow link, if a high proportion of your content is popular content that is going to be served by that CDN, then it's going to make sense for you. You're going to save bandwidth. If, however, it just so happens that the sites that are accessed by students in your dorms aren't clients of the CDN that you would do business with, or maybe they aren't clients of any CDN, then putting that CDN box in won't do you any good. Those accesses will still have to go out over the backbone. So it really takes something of an analysis of your traffic, and the way it's usually done is just by accepting the box for a certain period of time and then analyzing what that does to your traffic.
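[Editorial illustration] The traffic analysis suggested here comes down to asking what share of a campus's external traffic goes to sites that are customers of the CDN. A back-of-the-envelope version might look like the following. This is a sketch under stated assumptions: the per-host byte counts, the customer list and especially the `hit_ratio` (the fraction of eligible bytes the box actually serves) are invented numbers, not measurements.

```python
from typing import Dict, Set


def estimated_offload_fraction(
    bytes_by_host: Dict[str, int],
    cdn_customers: Set[str],
    hit_ratio: float,
) -> float:
    """Estimate the fraction of external traffic a CDN box could absorb.

    Only traffic to the CDN's customers is eligible, and only the assumed
    hit_ratio share of that traffic is served locally.
    """
    total = sum(bytes_by_host.values())
    if total == 0:
        return 0.0
    eligible = sum(count for host, count in bytes_by_host.items() if host in cdn_customers)
    return hit_ratio * eligible / total


# Hypothetical one-day totals of external traffic, in megabytes per destination.
traffic = {
    "news.example.com": 600,
    "portal.example.com": 200,
    "obscure.example.org": 1200,  # popular on campus, but not a CDN customer
}
fraction = estimated_offload_fraction(
    traffic, {"news.example.com", "portal.example.com"}, hit_ratio=0.5
)
print(f"{fraction:.0%} of external traffic could be served locally")
```

With these made-up numbers, 800 of 2,000 megabytes are eligible and half of that is assumed to be served by the box, so the estimate is 20 percent. Note that the total size of the link does not appear anywhere in the calculation, only the proportion.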
HS: So that implies that you don't really, as a campus, have control over what content is on a box.
MB: None at all.
HS: The CDN provider does.
MB: Right, and in fact, the usual types of contract that are offered not only say that you don't have any control over what's on the box; generally there are real limits on even snooping the traffic going into the box and out of it, beyond the statistics that they provide, because there is commercially valuable information in that traffic that they don't want the campuses getting hold of. So it's really, in a sense, a kind of foreign territory placed within your own machine room.
HS: Okay, Professor Danielson has another little question at the end of his first question here and he says, "How much does CDN software cost?"
MB: How much does CDN software cost?
HS: Yeah, I think what he's saying is what does it cost to have this CDN type thing in your university? You say, "Great, I'll put this box in my machine room."
MB: If you are accepting a commercial CDN and you're accepting their clients in your network, generally the cost is zero. And as I said, you may even be able to arrange to be paid to host their box if you're willing to serve other people from your campus server, using outgoing bandwidth which may be extra.
HS: So your only cost is a few square feet of floor space?
MB: Exactly. And ceding a bit of the sovereignty of your machine room to the CDN, that's the only problem.
HS: But the CDN people don't come tramping into your machine room or anything and saying, "It's my box, move away!"
MB: Generally not, but I would have to read the fine print on the machine to see. There are really limits, for instance, on what you are allowed to do with the box. Generally, you have to set it up yourself and they don't come into your machine room. And I know, I have no...
HS: Take your coffee cup off of there!
MB: I have no idea what the actual... whether physical access has to be guaranteed to them. But even something as simple as deciding to turn the box off can be ruled out by the contract.
HS: We have another question. We have a question from Tom Weber and he's from Information and Communication Technologies at Penn State and he has two questions. His first question is, "When you referred to 'popular sites' or information sources, do you have harder numbers?" He says, "Like one million hits a month, ten million hits a month? What do you mean by popular sites?"
MB: So most of the sites that commercial Content Distribution or Delivery Networks are serving are sites that are getting millions of hits a day.
JB: Millions of hits a day?
MB: Yeah.
JB: Okay, so CNN might get, what?
MB: I actually don't know what the number is on CNN but I would imagine that they are at the higher end of what they serve. Yahoo...
HS: Also, you can... we do have a link to a list on Akamai of the sites that they provide off the CREN web page for this talk, so you really can go out and look at the sites that they provide. I think that gives you some insight into at least what they think are the popular sites.
MB: But it has to be, in order to make sense commercially, it has to be some site that is popular enough that it is placing a strain on the people building the server to provision enough resources. Otherwise it doesn't really make financial sense to go to a CDN. It's not a cheap service.
HS: Okay, and Tom has another question which might be more challenging to you here. He says, "Also, how do you determine or evaluate the user perspective and 'prove'..." He has "prove" in quotes. "...that I2 DSI is valuable?" What's the case for DSI?
MB: So I would not say at this point that we have proven it, but I can tell you how we are evaluating it and how we plan to evaluate it. And that is by the experience of the end user, so there is subjective experience. There is also measurable experience in the duration of downloads of various kinds or latencies measured at the end user site. So in order to... I'll be honest, the I2 DSI project has been in the tool building phase and does not have a lot of channels that are very widely distributed. We have our six servers right now. Our plan is that when we have a greater distribution and a greater variety of channels that reach a community that we can get our hands on, to do some measurements and also questionnaire type subjective evaluation of the difference in the service provided by the site. I think ultimately it does come down to what the end user sees.
HS: And the people who are using this, do you have a bunch of people trying this right now?
MB: As I say, we have the channels up and in fact, we have a fair amount. We have four channels up and we have a fair amount of traffic to some of them; for instance, the Linux channel is our highest volume one. People who use Linux have a lot of issues with getting hold of new releases and congestion of bandwidth in doing so. We also have, for instance, Open Video, which does not have nearly as high traffic, but each of the downloads is large and so has, again, an impact on the researchers who are using it. So we have a number of them up and like I say, we're in the process of trying to work with communities. The Digital Library community is one that we're very eager to partner with and to work more with some electronic publishing outlets to work with their content provider communities. One of the challenges for us: I'm in computer science, I'm not in information science or publishing. And so having developed these mechanisms, now bringing them to the content provider community and getting them used and evaluating their use out in the end user community is something that really requires partnering with people in other disciplines.
JB: We have another question that I think probably is good at this time, Micah, and it's from Mark Talmadge from Nortel Networks, actually. He's saying, "Does the campus really need a CDN? And couldn't the campus just set up a cache or server plus cache software?" I think the question could maybe use some clarification, too.
MB: So I mean, they serve... a general cache serves a different function from a CDN. The CDN is going to be very focused in the service it provides. It only provides services to the content providers that are buying space or using space on that CDN, and so you're likely to get much better speedup of those specific sites from a CDN than from a general cache. In the general cache, you have a much broader mix of use and some of that is not going to use the cache as effectively because it's not as high demand sites. On the other hand, you know, if you're not... again, this does not rule out the need for a cache, particularly for people who are going to sites other than those that are paying to be on the CDN.
JB: It sounds as if it might be that campuses would have both a cache and a CDN.
MB: It certainly is possible. The other thing to say is that caches are not all that popular on American campuses because it's possible to work without them. In a lot of other parts of the world where bandwidth is more scarce or has been traditionally, caches were really a necessary part of the infrastructure and in fact are enforced sometimes by the university or government infrastructure providers that provide the universities. But there are issues of correctness. If you place a cache between a client and the server, sometimes you may get stale content or the cache may not handle that content correctly. People frankly tend not to like them and tend to avoid using them if they can. The Content Delivery Network, because it is part of the server infrastructure and because using it is enforced in various ways by the content provider and the Content Delivery Network, it is something which even in a campus where you can't get people to use a cache, the Content Delivery Network is not an optional thing if you install it. So in that sense, it's a different business model and it is... because someone is getting a financial benefit from going through the CDN directly, someone off-campus, they put in the time and effort and in some ways sometimes bend the rules of networking to make sure that your traffic has to go through that CDN.
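The stale-content problem mentioned here can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions, not a description of any particular cache product: the class name `TTLCache`, the 60-second lifetime, and the `/scores` path are hypothetical, and the clock is passed in explicitly so that the behavior is easy to follow.

```python
class TTLCache:
    """Toy cache that reuses a stored copy until its time-to-live expires."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # url -> (body, time_fetched)

    def get(self, url, fetch, now):
        entry = self.store.get(url)
        if entry is not None and now - entry[1] < self.ttl:
            # Copy is still considered fresh, so the origin is not contacted.
            return entry[0]
        # Missing or expired: go to the origin and remember the result.
        body = fetch(url)
        self.store[url] = (body, now)
        return body


# A stand-in for the origin server (illustrative data only).
origin = {"/scores": "Home 1, Away 0"}
cache = TTLCache(ttl_seconds=60)

first = cache.get("/scores", origin.get, now=0)   # fetched from the origin
origin["/scores"] = "Home 2, Away 0"              # content changes at the origin
stale = cache.get("/scores", origin.get, now=30)  # cache still serves the old copy
fresh = cache.get("/scores", origin.get, now=90)  # expired, so it is fetched again

print(first, "|", stale, "|", fresh)
```

Between the change at the origin and the expiry of the stored copy, the client sees the old body. That window is the correctness issue being described; a shorter lifetime narrows it at the cost of more wide-area fetches.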
HS: Another thing that we hear a lot about that seems to fit in with all these caches and CDNs and DSIs and all the rest of this stuff are proxy servers. How are proxy servers different than all the things you've talked about?
MB: Well, these terms are not necessarily well defined and are not always used in the same ways, but if we take a proxy server to be a cache that, instead of being in the client network, is somewhere else, either in the same network as the server or somewhere in between and is in some sense helping out the server in the process of delivering content, then there's sort of a continuum between proxy servers and the CDN. But essentially proxy servers that sit in the same machine room as the server and just accelerate the delivery of content are sort of like a mini CDN, all in one machine room. And you could say a CDN is a way of taking that architecture which is all driven by the server's economic needs and distributing it so that part of it lives in the client network and can act more effectively.
HS: Can people or do people use all these things in combination? We've talked about caches, mirrors, proxy servers, CDN...
[Automatic Message: Your conference call will end in ten minutes.]
HS: Okay, we'll ignore that again.
JB: We could do this indefinitely!
HS: Right.
JB: I'm sorry. You were in the middle.
HS: Yeah. Someday we'll have to do a talk on how to get this telephone technology to work better also. I think we were told what we're doing now is an improvement!
JB: Well, who knows? Anyway, Micah, I think you were in the middle of a very important question here.
HS: Yeah. Do people use all these things in combination, or do you choose one of them? What do we do here?
MB: They are all...
HS: There's a whole sea of these things.
MB: They are all used in combination and often, if they are conforming to Net protocol standards, it may not even be known, it may not be visible that they are being used. So for instance, if I have a cache on my campus, it can cache the output of a server but what we think of as the output of a server may actually be coming from a proxy server, not from the server machine itself. But that may not be visible outside of the machine room that the proxy server lives in.
HS: On the campus now, and I've just heard this, I was just saying, "Gee! Should I do this CDN thing, should I do this DSI thing, should I have proxy servers, caches?" How does one begin to decide what to do?
MB: Well, you have to ask yourself what your problems are, what your issues are. If your problem is that your overall bandwidth requirements are too high and it's not necessarily bandwidth going to sites that are CDN clients, then you may need to use a client side cache and get your campus to direct its traffic through that cache. If you can't do that, or maybe even in addition to doing that, you may want to put in a CDN which will particularly accelerate those very high demand sites and further cut down your bandwidth. That can work together with your cache. Those are not... If you, for instance, have a project that uses content that is in a channel that the I2 DSI infrastructure distributes and it isn't getting good quality of service, good access to that channel because the closest server is too far away, well, you might want to get an I2 DSI node to put on your campus to hold a mirror of that channel. So these are all different issues that are addressed with slightly different technologies or different ways of configuring particularly caching technologies.
JB: Okay, I'd like to just remind everyone that we probably have time for one or two more questions if you get them in very quickly. With that, Micah, did we talk and answer the question on the quantity of bandwidth, you know, to a campus that is sufficient for most needs right now before they move into Content Delivery Networks?
MB: What I was saying is that there is no magic number about how much bandwidth you're going to have to have. Actually, the thing that's going to be more likely is if you don't have enough eyeballs (let's put it that way, in marketing-speak), if you don't have enough users that are using a CDN node, then they won't give it to you for free. That is, if you have few enough users, it's not going to be worth their while provisioning your network with a free server. And so that is probably... again, it's their business model and it may have changed with recent economic downturns. The number may be higher than it used to be.
HS: Okay, we'll let you go.
MB: Exactly! They're not going to give you a box to put at home so that you can yourself get your sports results faster. But if you have certainly thousands of students in your dorms, it's the usual thing because that's where we get the hits to the high demand sites, then you're likely to be able to use a CDN. If you don't have students on your campus or on your network, then it's very unlikely that a CDN, a commercial CDN, is going to be of any value to you at all unless your faculty are spending all their time surfing commercial sites.
HS: Well, even if they are, this speeds things up, right? Takes the load off.
JB: Takes the load off the rest of the network there.
MB: Right. I'm saying if that's the case, but faculty members tend to spend a lot of time looking at sites that are not going to be the clients of a CDN. So it's not going to help for them.
HS: We actually did get a question in here and it's somebody from a place called Net Edge Technology, which is kind of nice since we're talking about edges here, and I only see... oh, I see a full name here. Rahma Mehnin says, "I don't really see the need for a CDN in the campus. Typically CDNs are needed when one needs to distribute content over a wide network like the big Internet and the content experiences heavy usage volume." What do you say to Rahma?
MB: Generally what we find on campus networks is that the link between the campus and what we call the commodity Internet, the commercial Internet, is saturated. And it's saturated with hits to very popular sites, many of which are clients of the CDN. So let's just assume that as a given. If that's not true, then the CDN won't do any good. But if that is true, then the use of the CDN will ensure that those popular hits are cached and that means that the traffic will be localized to the university campus and those hits will not require wide area bandwidth. So that's a win for the campus networking provider.
HS: Okay.
JB: I think it might be time to see if there's one question or one message that you'd like to leave with our audience today about Content Delivery Networks. Do you have a final comment?
MB: I guess my final comment would be just that the model that we've got for the web of client and server is a very simple one and one that it's very easy to add more servers and more clients to, but in a lot of ways does not generalize that well. When we scale it to applications that have a lot of either high bandwidth requirements or latency requirements that we want to deliver across a global network, we often get poor performance and we've just learned to live with it. And the solutions are to use distributed systems kinds of technology that have been known for decades within the distributed systems world. They complicate the infrastructure and make it... one has to know much more about the details of what the traffic looks like in order to optimize it, but it has the potential of providing much better service to a much broader audience.
JB: Okay, well, and it certainly sounds like the need for Content Delivery Networks is going to grow as sites with much richer and more complicated content grow in the future as well.
MB: That's a race. There's always this kind of a race between how complicated is our use of technology. There's Moore's law that says that there's much more bandwidth and processing available, and then there's just the question of scalability. How practical is it to grow the network in these ways and to predict the outcome is difficult. However, to bet always on the fact that we will have more resources than we can possibly use is sometimes perhaps being more optimistic than we should.
JB: Okay, Howard, how have we done on our questions and did we cover everything today?
HS: Oh, we never cover everything! In fact, it's rare that we even cover 20% of what we have here. But I think that we've taken a very difficult and new topic for lots of people, including myself, and have covered quite a bit of territory here. It's been very, very interesting for me.
JB: Well, great. It has for me as well, and I really encourage people to go in and take a look at the website and the rich resources that are linked off that site. I know it was quite fun exploring all those resources and finding out about these new technologies. Micah, thank you so much for today, and with that, let me have my closing comments here. Thanks to everyone for being with us here today and be sure to block off your Thursdays this fall, and particularly two weeks from today when our topic will be "Emerging Technologies in Securing the Network." You will want to download and print, by the way... we've just made available a DO NOT DISTURB sign to enable you to focus on participating in the networks. It's all ready for downloading.
HS: I'm doing it right now.
JB: What is it, Howard?
HS: There's one outside my door right now.
JB: All right! And tell your friends who missed the session that Tech Talks will be available as well in the new MP3 format for downloading for when jogging and driving. Many thanks to the CREN member institutions and Lokomo Systems for their sponsorship of today's sessions. Remember, Lokomo for software to support your content distribution. A special thanks to our Tech Talk expert today, Micah Beck, and to technology anchor, Howard Strauss; to Terry Calhoun, Tech Talk web producer; to Jason Russell, Gayle Terkeurst and the support team at Merit; and to Susie Berneis, our audio file transcriber. And finally, a thanks to all of you for being here. You were here because it's time. Bye, Micah. Bye, Howard.
HS: Bye, Judith. Bye, Micah.
JB: Take care. We'll see you all.
HS: Bye-bye.
MB: Bye-bye.
END OF WEBCAST