Power Browsing on the Web
Judith Boettcher [JB]
Greg Marks [GM]
Howard Strauss [HS]
January 27, 1998
JB: Welcome to the CREN Virtual Seminar Expert Series, Untangling the Web, for Spring of 1998. Whether you are joining us by phone or by Internet audio, you are here because it is time to discuss one of the leading core technologies in your future, the World Wide Web.
This is Judith Boettcher of CREN, one of your hosts for today's session. Today's co-host is Greg Marks from MERIT. Good afternoon, Greg. I hope the real audio software is working well.
GM: I think it is. Thank you, good to be here again, Judith.
JB: All right, thank you. Our guest expert today is World Wide Web expert Howard Strauss from Princeton University. Howard is the manager of advanced applications at Princeton, and a well-known presenter on Web issues and futures. Howard is also the seminar leader for the CREN Virtual Seminar on untangling the Web. Welcome, Howard. Thanks for being here today.
HS: Thank you, Judith. I'm glad to be here again today. And this may be one of the few chances people out there are going to get to hear something that doesn't have to do with the scandal in Washington today. Although, if you were following the story--which I guess everybody was--you might have heard that the story actually broke earlier on the Web than probably anywhere else.
JB: I hadn't heard that, Howard, and you're breaking a promise about not having this be talking about The Event in Washington.
GM: Whatever it is. Let me also break in and just remind anybody who is listening in that you have an option as to which way you're joining us. You can either do so via the phone conference which is 734-647-2802, and you will be immediately joining us. There won't be someone sort of cueing you to join. You'll be right in the conversation with us. Or you can join us on the Internet -- and an easy way to do that is from the CREN homepage, www.cren.net.
JB: Okay. Greg, do you want to go ahead and tell them also about the e-mail?
GM: Sure. You can send us e-mail at utw@cren.net with, if possible, your name and institution and title, or you can skip that if you like. And that e-mail will come directly to us while we're on the air, and we can respond to your questions or comments. All three of us will be getting it at the same time, so we can just pick up and do that immediately. I should also say that our process during these 45 minutes is going to be question and answer between Howard and Judith and myself, and discussion of some of the hot topics here. But we really encourage your calling in or sending us e-mail so that we can interact with you, too.
JB: Okay, very good, thanks, Greg. Also, just a reminder that this is our second series of these expert events, and we really enjoyed the feedback from the first series. We did hear from folks asking us to definitely keep these expert events going, and also asking about whether or not we might be thinking about offering audio recordings of the events, so you can tell that we've heard from a few people about this, so do keep your notes coming back in and let us know.
At any rate, we're very glad to be back and hope that you will enjoy the session today with Howard. He's got some exciting new things to talk with us about. Before we do launch into our questions, Howard, did you get any feedback from your friends about the series?
HS: Yeah, actually I perhaps got strange feedback (because we have strange people out here), but a lot of folks whom I talked to had missed the first series and in fact were kind of surprised to even hear that it was going on. They were very glad that we're doing this again. So this is almost like me saying, "Hi, mom!" But to the folks who missed the first one and tune into this, I'm glad to have you here and I hope that during the session you feel free to ask questions or send e-mail or participate in the thing.
JB: Okay, very good. Well, Howard, we're all gathered here today and our focus is getting to be one of our favorite topics, and that is power browsing. Even though it's been only four months since our last event on this topic, it can be a long time in what we now are calling Net Time. What do you think is the most significant Web happening of the last 120 days or so?
HS: Well, I hate to point to the legal side of the issue, but there's been some interesting stuff on the legal side and that is with respect to Microsoft and IE 4.0, where Microsoft was asked to make it possible for vendors of microcomputers to unbundle IE 4.0. At first, they did some cute things -- they played some games with the government, made the people in the courts a little bit unhappy, but ultimately they agreed that they would take IE 4.0 off the screen for vendors who wanted it off. Actually, I think that most people who sell microcomputers are going to keep it on anyway. I mean, they're going to be allowed to take the thing off, but why would anybody do that?
JB: Okay.
GM: Is it the case that you think -- from what you've seen -- that most vendors are indeed going to leave it there and not touch it, just because they don't want to offend Microsoft anyway?
HS: No, I think they're going to leave it there because -- that's an interesting point. You're right, they probably will leave it there because they don't want to offend Microsoft. But whether they wanted to offend Microsoft or not, I think they'd leave it there because it's a good browser. And there's nothing that prevents them from putting it on there and putting the Netscape browser on there. You could certainly put both on the machine and then let people who are buying the machine decide which one they want to get rid of.
An interesting point about the getting rid of the thing, which was what really tied the courts up a bit, was that Microsoft claimed that getting rid of the thing would make the operating system inoperable, whereas there was some little demonstration in front of the court where somebody just went in and did the Add/Remove process on Windows 95 and the thing seemed to disappear. Microsoft claimed it hadn't disappeared at all, just the icon and a few files had disappeared. And actually Microsoft was accurate in that case.
JB: Well, Howard, actually when the court did ask them to remove the browser, was it in fact indeed possible to do that? Can you say a little bit more about how it's "integrated" into the system?
HS: Yeah, Microsoft was actually telling the truth, because it's a difficult thing. Just like I don't understand too much about what lawyers do, it became, I think, a difficult thing for the courts and the lawyers to understand what was happening inside here.
But in a nutshell, what's happened is that Microsoft has taken the browser, and just like they've taken many of their other products and they've cut the products into little modules so that the modules could be shared. An example of that is that you can put a URL right in the middle of a Word document or in the middle of an Excel document or a PowerPoint presentation or just about in any Microsoft product, you can put a URL in the thing. And you need some bit of code, some routine, some module that can take that URL and interpret it. And in the operating system in Windows 95, there's just one such module and everybody uses it. And of course, IE 4.0 uses it, too. The browser uses it. So if you were to take that module out, not only would the browser not run, but none of the other facilities in Windows 95 or Windows NT would be able to interpret URL's. And there's a whole bunch of modules like that. There's a dozen or so modules like that. The part of IE 4.0 that goes beyond the modules that are used by the rest of the operating system is relatively small -- maybe 10 or 12% of IE 4.0. So, Microsoft is actually correct: if they took all those things out, the operating system would be unable to do a whole bunch of things.
One of the things I think that fools people is that you can buy IE 4.0 -- it's in a little box, shrink-wrapped, with little disks in it and things like that. You put the disks in and it's complete, because it actually includes a copy of all the modules it needs, including the modules that are already in the operating system. If you have them installed, it either just doesn't install them or it installs a later version of them.
JB: So Howard, what is your best guess at this point as to what's going to be the next step in this whole -- dare I call it a game?
HS: Well, I think it's called a game. I mean, Microsoft has now agreed because they didn't like the idea of paying a million dollars a day in fines. They've agreed to allow vendors to take the icon off the screen and remove the few files that are part of IE 4.0 but are not integral to the operating system. But again, I would be really surprised. If you could buy a computer and everything else was identical, but one of them had IE 4.0 already installed on it, I think that's a nice thing. You'd probably want the thing there.
GM: Isn't it the case, though, that in addition to this question of whether Microsoft has unfairly integrated the browser, that in fact the work that they've done on Internet Explorer 4.0 does raise it to being a tool that really is of serious interest to people who want to browse the Internet? It's not just a second-rate browser that they're forcing on you.
HS: It's not only to browse the Internet. It's a fine product, but Netscape's browser is also a fine product. But if you install IE 4.0 on Windows 95, what you'll discover is that the whole file system behaves differently. That is, when you look up files on your local disks, all of a sudden features inside the browser are changing the way you look at your files. All of a sudden, the file display actually is created using HTML. When you look at things on your C drive, the display is done with HTML, and the HTML can be modified.
So it's just woven, the whole thing is woven all through the operating system, and it has many, many features beyond browsing the Web that I think people would really like. In Windows 98, they're not going to have a choice. They're going to see all these features. You'll still be able to remove the icon off the screen, if you don't like the way the icon looks or something. If you absolutely do not want to use IE 4.0 for some religious conviction or other, you'll be able to get rid of it. But a lot of the features and facilities of the thing you will use every time you look at a file on your C drive -- or your D drive or wherever -- or around in your network.
JB: Well, Howard, do you think, then, we're going to be getting to the point where I'm going to be using Internet Explorer for a lot of things, then I'll also be keeping Netscape on my system and using that for other things as well?
HS: I guess I don't think so. I think what's going to happen is you're not going to think about using a browser very much. The browser's going to become a little bit like TCP/IP. In the past (and not the too far distant past), TCP/IP stacks were something that you bought extra -- you bought from companies and you added it to your operating system. And now you don't even think about it. You just say it's part of the operating system.
So I think you're going to say, "Gee, when I go into Excel and in one of the cells there, there's an underlined thing there, well, of course I expect that I can click on it and that cell will be replaced by something that is out in Berkeley," or something like that. You're not going to think about using a browser explicitly all the time. You're going to think about the fact that everything has hyperlinks, and some of the hyperlinks happen to be to local files, but some of the hyperlinks happen to be out all over the world. So I think this whole browser thing is going to kind of fade and people won't be talking too much about browsers.
GM: Let's talk a little bit about searching and what you can do with the search engines and so on. Can we turn to that topic?
HS: Sure. You want to start in any particular place?
GM: Let's take the questions off the -- maybe even a quick overview of the quality of the search engines that we've got today. And how much stuff is out there that they've got to be searching for. I mean, it's a terrible problem just to keep up with the growth of the Internet.
HS: Yeah, it certainly is, and in fact, I'm almost in awe of the fact that anybody can go out and index the entire Internet, which when I last saw data on this (which was just a few days ago), there were 110,000,000 URL's that the good search engines were indexing. So things like AltaVista and that bunch are going out and indexing 110,000,000 of these things.
And what they do, the way the process works, is there are things called robots or crawlers or spiders -- whatever you want to call these things (they've got lots of names). And they wander around at night, building a huge index of everything they can get their hands on -- which right now is 110,000,000 sites.
By the way, there's about 120 sites an hour being added to the World Wide Web, so during this chat we're having over here, another 100 sites are out there. So I know, Greg, you probably looked at all the sites -- but when we're done here, you're going to have 100 more!
JB: Between 110,000,000 and 120, right? Sites out there?
HS: There's going to be another 120 sites out there. But when you're searching, what you're doing is you're searching a big database.
So when you're choosing a search engine, you're really choosing two things: One is you're trying to decide what's a good search engine in terms of its ability to search, but you'd also like a search engine that has a huge database.
And there's actually more search engines than there are databases, because some of the folks who build these big index databases actually sell them -- rent them. They let other people use them.
GM: That makes a lot of sense. I've sometimes wondered to myself, don't we end up with an index that's bigger than the Web itself?
HS: Well, I think we don't because English is fortunately very redundant.
By the way, these things do index every word. I mean, they literally take every document and every word in every document and they index them. So somewhere in this index is the word "green" and next to the word "green" is every site, every Web page, every URL that has the word "green" in it. And that's done for every word. And not only that, they have all kinds of other information about, say, the word "green." Like what words it's near and how far away from other words it is and what words are immediately adjacent to it and on and on. It's really an incredible thing. It's terabytes of data.
GM: What about searching for things other than text? Images and so forth. What's happening there?
HS: Well, nobody today is searching for images in a sense that we can sit there and look at the bit patterns that make up an image, but you raise an interesting question anyway.
And that is, if I did want to look for an image, the way images are usually defined on the Web is with an image tag. In fact, the way a lot of things are defined on the Web is with a tag. And it is possible to search for some subset of the tags. The image tag is an example of a tag that's searchable.
But we've sort of gotten off on a little corner here. If you want to search on one of these things, it really depends on what search engine you're using. Different search engines have different syntax for searching. So we can't say, "Here's how you search for all the pictures of ostriches." I can only say, "Given this particular search engine, here's how you would do that."
JB: Is there a search engine, Howard, that actually is better at finding images over the letters?
HS: That's almost an impossible question to answer without looking at the search engines every day, and they keep changing. In fact, what I would suggest to listeners is to find a search engine that they particularly like and learn everything about it. And until somebody convinces them to use another one for some wonderful reason, to keep doing that. I think that if you really learned how to use any one of the search engines, you'll discover you can find pretty much anything you want to find.
I use AltaVista because early on, it had the biggest database, and I thought that was a real positive thing. And so when I talk about how to do things, I will talk about how to do them with AltaVista, but there are many other search engines that are fine search engines that can probably do the same things.
JB: Howard, let's perhaps look at searching from different perspectives. One perspective is when I, as a user, am out there trying to find something. Another perspective is if I have a website where I have some content of value, and I would like people to find that site: what can I do to make it easier for people to find that site and perhaps to get good information about what's on that site?
HS: There's a couple things that we can mention about that. First is that if there's a link from a site that's already indexed to your site, then you don't have to do anything special to at least get it out there. All of the spiders, crawlers, robots will find it if there's any link that points to it.
But it might be that nothing points to it, and in that case, nothing will ever find it -- none of these things will ever index it unless you specifically go out to one of the search engines and tell it that you're there.
And that's very easy to do. If you just go to one of the search engines, you'll find some item, some hypertext link somewhere on the thing. In the case of AltaVista, for example, right under the place where you would type in the search, you'll see there's an Add/Remove URL. And if you go off to that and just tell it your URL, it'll go off and it will add your URL to the list of things it's going to index. And in fact, if you want to get one of these things to index it right away rather than waiting the almost two weeks it sometimes takes for the spider to discover it, you can just go out and say, "Add this right away," and it will add it that night.
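Editor's sketch: the discovery process described here can be illustrated with a toy crawler. This is a minimal sketch, not how any real search engine's spider is built. The link graph and the `example` host names are invented for illustration, and `crawl` is a hypothetical helper. It only shows the point made above: a page that nothing links to is never reached unless its URL is submitted by hand.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Return every page reachable by following links from the seed pages."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found


# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "indexed.example/home": ["indexed.example/news", "linked.example/"],
    "indexed.example/news": [],
    "linked.example/": [],
    "orphan.example/": ["indexed.example/home"],  # links out, nothing links in
}

already_indexed = ["indexed.example/home"]

# The spider starts from pages it already knows about.
discovered = crawl(LINKS, already_indexed)

# Submitting a URL by hand amounts to adding it as one more starting point.
after_submission = crawl(LINKS, already_indexed + ["orphan.example/"])

print(sorted(discovered))
print(sorted(after_submission))
```

In this toy graph, `linked.example/` is picked up automatically because an indexed page points to it, while `orphan.example/` only appears after it is submitted.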
But let's assume that you have a thing [website or page] out there. If you have a thing out there, you've probably noticed that when you get a hit on the search that displays the first two or three lines of whatever the text is in your document, that sometimes that's not the two or three lines you'd like displayed.
You've probably noticed other things where there's a nice description of the thing. Those descriptions are actually written when you build the website. And so using a metatag, a description metatag, you can write a description. Actually, you can write a description up to one K, so you can have up to 1,024 characters describing what this document is. And then if that document is ever discovered by searching, then the description that you wrote will appear, which is kind of nice.
JB: The full description will in fact be displayed?
HS: Everything up to 1,024 characters that you put after the word "content" in the description metatag will appear -- upper and lower case letters, punctuation, whatever. Whatever text you put out there.
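Editor's sketch: a description metatag is an HTML `meta` element with `name="description"` and the text in its `content` attribute. The Python below is a minimal sketch of how an indexer might pull that text out of a page. The sample page and the names `DescriptionFinder` and `summary_for` are invented for illustration, and the 1,024-character cap simply mirrors the figure quoted above rather than any documented engine behavior.

```python
from html.parser import HTMLParser

MAX_DESCRIPTION = 1024  # the limit quoted in the discussion (an assumption here)


class DescriptionFinder(HTMLParser):
    """Remember the content of the first description metatag seen."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            text = attributes.get("content") or ""
            self.description = text[:MAX_DESCRIPTION]


def summary_for(html_text):
    """Return the author-written description, or None if the page has none."""
    finder = DescriptionFinder()
    finder.feed(html_text)
    return finder.description


# Hypothetical page: the body text is not what the author wants displayed.
PAGE = """<html><head>
<title>Ostrich Farm</title>
<meta name="description" content="Photos and care notes from a small ostrich farm.">
</head><body><p>Welcome! Click around.</p></body></html>"""

print(summary_for(PAGE))
```

When `summary_for` returns `None`, an engine would presumably fall back to the first lines of the page text, which is the behavior described earlier.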
In addition, if you're going to do that, another thing you ought to do is you ought to put a keyword metatag in your document. (And by the way, both these things are described both in the virtual seminar and at the websites for almost any of the searchers.) But if you put a keyword metatag in there, then you can put keywords that might not appear in your document but that nonetheless you would like indexed out on the big Web index.
So for example, if you were talking about the Vietnamese war and you were talking about POW's, but you never said the words "prisoner of war," you could go into the keywords and say "prisoner of war," or "POW" or "MIA" or whatever and put these other terms there. And then if somebody searched for one of those terms, even if that word did not appear in your document, then your document will be on the list of hits.
GM: Now, you described the spider or crawler or robots as taking up to perhaps two weeks to find things. Do you have an understanding as to whether major news and information sites really in effect push information towards the search engines daily, so that when you do a search, you really get pretty up to date information -- or is that very chancy at this point?
HS: Well, it's chancy. I know that a lot of them actually try to do that, and if you come out and you say, "I have a new URL, make sure it's indexed well," then the spider will try to do it that night. So I think if you are looking for some news or something like that and you don't find it by searching, it might be that it just didn't get out there. And so you really have to take some other path.
JB: Howard, right now for our listeners, I think we ought to move back to the area of what if I'm doing my searching and how I can perhaps speed up my uses of the Web. What about -- for example, someone has told me that there are things like short URLs or smart URLs, or whatever we want to call them. It's just that I don't have to -- can I get by and not have to type in http?
HS: Yeah, well, actually that's usually the first step toward searching more efficiently or finding things on the Web more efficiently is to realize that you don't have to type in the http://. It's assumed if you leave it off, so that will save you typing it.
By the way, you might say, if I never have to type it in, why is it there at all? It's there because it could be other things, other than http (which is "hypertext transfer protocol"), which is normally what you get on the Web. But it could be something like ftp or it could be Gopher, if anybody remembers Gopher. Or it could be files. There's a whole bunch of other things that could be there. So if you want one of those other things, other than a regular webpage, you actually have to type it. But if you leave it out, it's assumed that it's a regular Web document or http://.
Another thing you can leave out: most companies have websites that are www.the-company-name.com, so you can just type in a word like ABC. What happens then is actually a two-step process. First it will try to see if there is a machine called ABC in your domain. So here at Princeton, for example, our domain is Princeton.edu. If some student had a machine called abc.princeton.edu and I typed in abc, it would find that machine. And that's kind of nice. It means that in your own domain, all you have to do is type in the name of a machine. You don't have to type in your university name.edu. Kind of handy.
If there is no such machine, then what it does is it prefixes "www" in front of the thing and suffixes "com," so it would turn "abc" into "www.abc.com" and get you the webpage of the ABC network. What that means, for virtually any company that you can imagine -- for Home Depot or Disney or Sony or whatever -- just type in Sony or type in IBM and you'll get the website, assuming that you don't have one of those things right in your own domain.
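Editor's sketch: the shortcut rules described above can be written as a small function. This is a minimal sketch of the steps as stated in the discussion, not the actual code of any browser. The name `expand_address` is hypothetical, and `host_exists` stands in for whatever name lookup the browser really performs.

```python
def expand_address(typed, local_domain, host_exists):
    """Expand what the user typed into a full URL, following the steps described.

    host_exists is a hypothetical callback standing in for a real name lookup.
    """
    if "://" in typed:
        return typed  # a scheme such as ftp:// was typed explicitly
    if "." in typed:
        return "http://" + typed  # only the http:// prefix was left off
    word = typed.lower()
    local_host = f"{word}.{local_domain}"
    if host_exists(local_host):
        return "http://" + local_host  # step 1: a machine in your own domain
    return f"http://www.{word}.com"  # step 2: assume a company website


# Pretend lookup table in place of a real name server.
KNOWN_HOSTS = {"abc.princeton.edu"}


def lookup(host):
    return host in KNOWN_HOSTS


print(expand_address("abc", "princeton.edu", lookup))
print(expand_address("sony", "princeton.edu", lookup))
print(expand_address("www.cren.net", "princeton.edu", lookup))
```

With the pretend table above, `abc` resolves to the local machine, which is exactly the situation in which a student's machine can shadow a company name.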
We have some students who have actually -- I'm not going to give out any names because the sites will be flooded by people -- but we have some students who have named their machines the names of famous companies so that if you just type in the name of a company, every now and again I get some student's webpage.
JB: Is there anything else -- what if I'm going to be looking for some German sites or something like that, and I just want to find something in that area. Is there an easy way of doing that?
HS: Yeah, there's a couple approaches to that. You're asking one of two questions, so I'll answer both of them. One of the questions you might be asking is, "I want sites that are in the German language," and in that case, at least in AltaVista, there's a little pull-down menu that says you want documents in some language. You could restrict the language, for example, to English. Let's say you're hopeless at understanding any other language except English, which would include people like me. You might say, "Just give me documents in English." But you also could just say, "Just get me documents that are in German or in French or Spanish."
Another thing you could do is, you might say, "Gee, I just want to get webpages from Germany." Now they might be in English, they might be in German, they might be in whatever language people in Germany are writing documents in. And in that case, you need to know the domain name of the country, which actually earlier today we discovered was DE, I believe, for Germany. So if you searched on domain: and some country code -- like de or uk, for the United Kingdom, or ca for Canada. That would restrict your search just to documents from that country.
So you could do both of those things -- either pick a particular language, or pick a particular country. And in fact, you don't have to pick a particular country with domain:. You could pick something like edu, that is just educational places, or just commercial places or just government places. And there are dozens of ways to restrict searches like that, which means you're going to get fewer hits, but the hits are going to be better.
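Editor's sketch: restricting results by country code or by a suffix such as edu comes down to checking the end of each host name. The Python below is a minimal sketch of that check, not AltaVista's implementation. The helper names and the sample URLs are invented for illustration.

```python
from urllib.parse import urlsplit


def in_domain(url, suffix):
    """True if the URL's host name is the suffix itself or ends with .suffix"""
    host = urlsplit(url).hostname or ""
    suffix = suffix.lower().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


def restrict(urls, suffix):
    """Keep only the hits whose host falls under the given domain suffix."""
    return [url for url in urls if in_domain(url, suffix)]


# Hypothetical hit list.
HITS = [
    "http://www.uni-example.de/index.html",
    "http://www.princeton.edu/",
    "http://www.example.co.uk/",
    "http://www.example.com/de/",  # "de" in the path does not count
]

print(restrict(HITS, "de"))
print(restrict(HITS, "edu"))
```

Matching on the host name rather than the whole URL matters: the last sample URL contains "de" in its path but is a .com site, so it is not treated as German.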
GM: And all of this reinforces what you said a few minutes ago -- that a really effective thing to do to become a power browser is to really get to know what your specific search engine will do. Because there's really a lot of capabilities buried away in the instructions that, if you understand them, give you a tremendous amount of ability.
HS: Yeah, that is certainly true. The more you understand the search engine you're using, yeah. I think what you're hoping to do is you're hoping to -- I'd like to do a search and get four hits, but I'd like them to be just the four that I want. When I do a search and get -- what is more typical -- 34,000 hits, I don't know what to do with them. There's too many of them. What can you do with those?
JB: I have recently discovered, and I forget which search engine I was using, Howard, that when you get 34,000 or 50,000 hits, some of the search engines do a very nice job of then searching within those 50,000. So you can gradually narrow your search a little bit. And certainly for novice users, that's one approach to perhaps getting more familiar with a browser and familiar with a search engine and finding what you want.
HS: Yeah, I think you're correct -- all of the search engines allow you to do that. Once you get a subset of these things, they let you search through them.
But it's certainly more effective to get a small subset right up front by learning some tricks and techniques. And again, let's say you got a huge number and you wanted to subset the thing, if you didn't know how to restrict the search, say, to just sites in Spain, then just about anything you did would not give you that effect, whereas if you knew how to do that, you'd start out by just looking at sites in Spain or wherever.
JB: Howard, what about the fact that often when I'm confronted with that nice blank space to start my search, I'm struggling with which keywords I should put in, and all the rest of that. Is there anything happening to make it even easier, like even using what they call "natural language" searches?
HS: Yeah, I think if you're not sure exactly of what phrase or few things to type, natural language searches work okay. At least they're a good first cut at the thing. So you might say, "Tell me about bears in national parks." or something like that -- just a regular phrase with a period at the end, a regular sentence like that.
What will happen, or what you want to do if you do a natural language search and you don't find what you want in the first few (the first three or four), is to think about your natural language search. What you want to do is make your natural language search more specific, and to do this you ought to use words that are rarer.
One of the interesting things about when they decide what's going to float to the top in the list of hits, one of the things that they actually consider is how rare the words are. So they simply index every word in all 110,000,000 documents on the Web; they know, for example, that the word "the" occurs pretty commonly. So if the word "the" is in one of your search requests, you've been getting hits on the thing, they decide that that's not very important and that will not move something up toward the top. But a word like "Alaska," well, that's rarer than the word "the." So if you had the words "the" and "Alaska" in your search, Alaska would be considered more important.
So when you're doing a natural language search, if we use the example I just gave that said something like "tell me about bears in national parks," if we said something like "tell me about grizzly bears in Glacier National Park," the word "grizzly" is relatively rare. "Glacier," especially with a capital G, is relatively rare. So for one, we're going to get fewer hits, and for another, the hits that we get with the word Glacier and with the word grizzly in it are going to float up toward the top. So natural language searches are a good first cut, but you ought to try to make them as specific as you can, and you will include what you know to be the rare words that occur naturally in your search. Just make sure you get them in there.
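Editor's sketch: the two ideas in this part of the discussion, an index of every word and extra weight for rare words, can be shown on a toy scale. The actual ranking formula of any search engine is not given in this session, so the sketch swaps in a standard technique, inverse document frequency, as an assumption. The four sample documents and the function names are invented, and the code ignores capitalization for simplicity.

```python
import math
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def rarity(word, index, total_docs):
    """Smoothed inverse document frequency: larger for rarer words."""
    return math.log((1 + total_docs) / (1 + len(index.get(word, ()))))


def rank(query, documents, index):
    """Order matching documents by the summed rarity of the query words they hold."""
    total = len(documents)
    scores = defaultdict(float)
    for word in set(re.findall(r"[a-z]+", query.lower())):
        for doc_id in index.get(word, ()):
            scores[doc_id] += rarity(word, index, total)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical four-page "Web".
DOCS = {
    "a": "The bears in the national park sleep all winter.",
    "b": "The grizzly bears in Glacier National Park fish in the river.",
    "c": "The park has a visitor center and the gift shop.",
    "d": "Alaska has glaciers and the longest coastline.",
}

INDEX = build_index(DOCS)
QUERY = "tell me about grizzly bears in Glacier National Park"

print(rank(QUERY, DOCS, INDEX))
```

In this toy collection, "the" appears in every document and so contributes nothing, while "grizzly" and "glacier" appear in only one and carry the page that contains them to the top of the list.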
JB: Well, I didn't know that we could actually use these kinds of natural language searches. I'm excited to actually go back and try one of those search engines and see how it works, Howard.
HS: Well, I think they're a good first approximation. And again, if you get a thousand hits, don't go looking down past the first half dozen or so, because the ones at the top -- since they are ordered by what the search engine believes are the best hits -- the ones at the top really are going to be the best. So if the first half a dozen aren't good, boy, those bottom thousand are not going to be good either. They're going to be worse.
GM: I want to go back to the foreign language search example. I know, because we were trying it a little bit earlier, that AltaVista has some translation capabilities built into it, and I'd appreciate your commenting on that. Is that the only one that's well-known right now that has that?
HS: I don't know if that's the only one that can do that. One of the interesting things is that four months ago, when we did this, none of them could do this. And now I know that AltaVista can, and I really don't know if any of the others can do this. I know that AltaVista did not write this. This is some group of routines to do language translation that they got from another company, whose name escapes me right now. But it's kind of an interesting thing.
One of the things that used to happen to me, but doesn't happen as much anymore, is that I would stumble across a website that happened to be all in Spanish or German or French or something like that. Of course, I couldn't read it, so I'd scrounge around the university to find somebody who could understand Spanish. But I kind of needed them sitting behind my machine all day as I went and searched from site to site.
But what you can do now is if you find a webpage, AltaVista -- for many, many webpages -- offers translation. Just find the page and there will be a little underlined tag that says "Translate," and it offers you many choices. It lets you translate -- I think today it's Spanish, French, Portuguese, German. Did I miss something? Oh, and Italian. So there's translations from a number of languages into English.
GM: We have a message coming in from Rob Von Baren as to setting up search engines to index your own website. Is that something that's practical and makes sense for someone as sort of a next step from their browsing activities?
HS: Yeah, actually, there's really two levels that you could do that on, and they're both kind of interesting functions.
One is if you were a university -- like here at Princeton, for example. We have a search engine that indexes all -- I think we have about 100,000 pages out here -- that indexes all 100,000 pages that are related to Princeton, so that when folks want to search locally here, we don't have to search 110,000,000 Web pages. Instead we just search 100,000. The index is smaller, and since we put the thing up, we have a lot more control over it. And it never takes us two weeks to index the thing. We re-index every night because it's controllable. So if you're a university or a company or something like that, and you want people to get much better access to your site, that's a great thing to do.
But there's also the possibility of having a search engine that just indexes your own machine -- that you use for your own personal use. And you might say, "I have six webpages. Why would I ever need one of these personal things?" Well, I have a few hundred webpages, but even there, I sort of know what they are. But these things will also let you index your file system, so they will let you index your C disk and your D disk, and if you ever had to look for a document where you said, "Gee, in the middle of the document, I was talking about that trip to Walden Pond, and the document was named nothing like that," how would you ever find that? With one of these little search engines, all of a sudden all your local files look just like webpages -- are indexed the same way, with the same rules. You can do all the same kind of things. So that's kind of a neat thing.
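[Editor's illustration, not part of the seminar: the idea of indexing local files so they can be searched like webpages can be sketched as a small inverted index. This is a minimal sketch under stated assumptions, not how AltaVista's personal search product works. The function names and the choice to index only `.txt` files are hypothetical.]

```python
import re
from collections import defaultdict
from pathlib import Path


def tokenize(text):
    """Split text into lowercase words so matching ignores case and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def read_text_files(folder):
    """Load every .txt file under a folder (an assumed, simplified file layout)."""
    return {
        str(path): path.read_text(errors="ignore")
        for path in Path(folder).rglob("*.txt")
    }
```

[A document named `notes1.txt` that mentions the trip in its body would be returned by `search(index, "walden pond")` even though the file name says nothing about it, which is the situation described above.]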
And since we've been talking about AltaVista, AltaVista makes both a little personal searcher like that, and they also make a searcher for a university-wide kind of thing. And the little personal ones, I don't know the price exactly, but they're relatively inexpensive.
GM: Is a variation on this theme likely to be of some value for people who are concerned about sort of the reviewing of information that's on the Web -- particularly in, for example, K-12, where there's really an issue of kids getting at information, and do they know if it's good information or bad information? Well, if a set of librarians, for example, as the American Library Association actually has done, points to a large number of sites, could they go that one step farther and set up a search engine that only searches those sites -- so you use a search engine to find the stuff you want and all of it is from reputable sites?
HS: That's an interesting idea, and I think that's something that folks have talked about a great deal -- to have some kind of authority list or something like that that would say which sites. I mean, if we're going out and looking at brain surgery, which ones were done by doctors and which ones were done by 12-year-olds?
GM: And which ones would you trust?
HS: Right. Unfortunately, there's a lot of things on the Web, and you cannot tell which ones on the Web are done by 12-year-olds and which ones are done by doctors.
GM: Right.
JB: And who knows? Some of the ones done by 12-year-olds are really pretty good, right?
HS: I don't know.
JB: Perhaps not on brain surgery. Let me go back and correct that.
HS: Right. Brain surgery may be -- even very precocious 12-year-olds, you want to give them a little more time to mature.
JB: That's right.
HS: There's another thing out there, sort of like the rating system that's on movies and on TV now, I guess. Now and again, I see one of these things on TV. I'm actually surprised to see it. But there's something called PICS, which is the "Platform for Internet Content Selection." This is done by a group called the Recreational Software Advisory Council. I don't know, there are two different groups. And what they're trying to do -- in fact, what they've done, is they've gone off and they've come up with a rating system for websites and are trying to get folks on websites to actually rate their sites as to violence, language, nudity, that kind of thing, out there. And then you could build Web browsers where you could set the thing to say, "If this thing has a rating above a certain level or below a certain level or whatever, restrict things that have a lot of violence." Then so you could put those things up for home. The standards have been set, and IE 4.0 actually looks for those PICS ratings, but less than 1% of the sites are rated. Nobody seems to have any real interest in rating their sites.
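[Editor's illustration, not part of the seminar: the browser-side check described above can be sketched as a comparison of a site's ratings against limits set at home. This is a simplified sketch and does not parse real PICS labels. The category names follow the ones mentioned above; the numeric levels, the `LIMITS` values, and the handling of unrated sites are assumptions made for the example.]

```python
# Hypothetical per-category limits chosen by whoever configures the browser.
# Higher numbers are assumed to mean more of that kind of content.
LIMITS = {"violence": 1, "language": 2, "nudity": 0}


def is_allowed(ratings, limits=LIMITS, allow_unrated=False):
    """Decide whether a page may be shown.

    ratings: dict of category -> level for a rated site, or None if unrated.
    allow_unrated: policy for sites with no rating at all. Since very few
    sites are rated, this one setting decides what happens to most of the Web.
    """
    if ratings is None:
        return allow_unrated
    # A category missing from the site's rating is treated as level 0
    # (an assumption made for this sketch).
    return all(ratings.get(category, 0) <= limit
               for category, limit in limits.items())
```

[With `allow_unrated=False`, nearly every site is blocked when almost nothing is rated; with `allow_unrated=True`, nearly every site gets through. That trade-off is one way to see why a low rating rate limits the usefulness of the scheme.]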
JB: In terms of how those ratings would work, if everything searches by virtue of the indexes, Howard, is the rating linked to the indexed words, then?
HS: Yeah, I would assume you'd have to do that. Well, no, you'd probably discover them. I mean, if the search engines gave you back the sites, even if it was a site you weren't supposed to see, that as soon as you clicked on it, you wouldn't be able to see it. You'd know the name, so you'd get back from your search engine something that said, "Bad stuff! You shouldn't see!" or something. You're getting a site back that you aren't supposed to see, but when you try to go to it, you wouldn't be able to get to it.
GM: Can we switch to a different topic here for a few minutes -- we're getting close to the end of our time -- and talk a little bit about cookies as ways that basically sites out there store information about you, and some of the ways that cookies get used that are to your benefit?
HS: Yeah, what a cookie is -- I mean, we all know what cookies are.
JB: I really like molasses, (inaudible.)
HS: I know. I have my favorites too! It's perhaps an unfortunate term. But what a cookie is in respect to the Web, it's a little bit of information that is stored on your machine, and it is stored on your machine related to a URL or to a group of URLs. It might be a group of URLs instead of just one URL. And what happens is once that little bit of information is stored, and it's stored there as a result of you looking at some website, then whenever that URL or that group of URLs is sent, the cookie gets sent along with it.
Now, to take a very, very simple example of what a cookie might do: if you were a university and you really wanted a different website for alums and for grad students and for undergrads and faculty and staff and so forth, you might go off and say, "Well, I'll build six different websites, and then I'll have six different Web addresses. And every time I want to go out and advertise where my website is, I'll give out all six addresses and say, 'If you're an alum, use www.whatever,' and if you're an undergrad, do this, etc. and so forth." That'd be a nuisance.
With cookies, what you can do is you can send everybody the same address the first time, and the first time they went to that Web address, there would be a little box that they would check that would say, Are you an alum, undergrad, or whatever? And when they checked that box, what you would do is you would store a cookie saying what kind of person they were, alum, undergrad, grad student. You'd store that on that machine. The next time they went to that URL, in addition to the URL being sent to the server, the cookie would be sent along with it saying, "Gee! I'm an alum!" And then you could deliver to that alum the alumni page. Or if they said they were an undergrad, you could deliver the undergrad homepage. That kind of thing. So that's kind of a nice thing to be able to do.
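[Editor's illustration, not part of the seminar: the alum/undergrad example can be sketched on the server side with Python's standard `http.cookies` module. This is a minimal sketch under stated assumptions, not Princeton's actual setup. The cookie name `audience` and the page file names are hypothetical.]

```python
from http.cookies import SimpleCookie

# Hypothetical pages for each kind of visitor.
PAGES = {
    "alum": "alumni.html",
    "undergrad": "undergrad.html",
    "grad": "grad.html",
}
# Page shown on a first visit, with the box asking what kind of visitor you are.
DEFAULT_PAGE = "choose_audience.html"


def page_for_request(cookie_header):
    """Pick the page to serve from the Cookie header the browser sent, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("audience")
    if morsel is None:
        return DEFAULT_PAGE
    return PAGES.get(morsel.value, DEFAULT_PAGE)


def remember_audience(audience):
    """Build the Set-Cookie header value that stores the visitor's choice."""
    cookie = SimpleCookie()
    cookie["audience"] = audience
    cookie["audience"]["path"] = "/"  # send the cookie for every URL on the site
    return cookie["audience"].OutputString()
```

[On the first visit there is no cookie, so `page_for_request` returns the page with the check box. After the visitor answers, the server sends the value from `remember_audience` in a `Set-Cookie` header, and later requests from that same machine carry it back in the `Cookie` header.]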
GM: Seems like there's a lot of options here that, within a university environment, for example, you can do to customize the environment to serve your campus. We were talking a few minutes ago about how you might customize the search environment. Now we're talking about things that can be used to customize information and what's delivered in the browsing process.
HS: Yeah. If folks want to see a very nice example of this, without sounding too much like a commercial for Microsoft (who needs no commercials, really), if you go to IE 4.0 and you go to a place called home.microsoft.com, the first time you go there, it's kind of a strange looking homepage with lots of buttons and weird things on it. What it lets you do, it lets you customize the homepage, and it doesn't store one cookie, it stores a dozen cookies or more. It stores lots and lots of cookies so that in the future, when you go back to the same URL, for example, you'll have little pictures of clouds and sun and whatever, telling you what the next five day forecast for your area, not for where I live, is going to be. And if you have favorite stocks, it will give you the current price of those things, and if you have favorite sports sites, it will give you headlines from those favorite sports sites, which will change every time you look at this thing. And if you don't like sports, well, then it won't give you those. It will give you business sites if you prefer those. You can actually go out and see what it might look like -- in fact, what it does look like if you pick up a bunch of cookies and customize a page that belongs to you.
Just one word of caution about this whole thing, and that is that the cookies live on a [specific] machine. They don't follow you around. So when you go home (inaudible)
GM: Cookies don't follow you?
HS: No, they don't follow you around. So if you have this wonderful homepage that you've built at home.microsoft.com that has all the wonderful things you want on it, if you then take your laptop machine, go home, and you dial into the same URL, you're going to discover that it looks entirely different unless you set it up exactly the same way. It also means that if you have a machine in a laboratory or a shared machine of any kind, the cookies are not going to do the kind of things that you want them to do because they're specific to the machine, not to a person.
GM: Sounds like for the university environment something that a browser manufacturer might well think about as a feature to add so that in some way those indeed can follow students.
HS: Well, I just assume that these shared machines are going to disappear, that everyone's going to have their own laptop and they're going to carry it around.
GM: We have time, I hope, for just one question which has come in via e-mail from Sherry Castro, asking about basically the search engines that look for people. And she's saying that she does searches such as on Yahoo's people search and finds a number of people, but there are others that she knows that have got e-mail addresses and they don't turn up. Are there other search engines she ought to try, or other ways she can approach finding e-mail addresses for these missing ones?
HS: I wish I could give her a solution to this problem. It's a problem that exists everywhere. Most of the search engines do a people search. AltaVista does a people search, too. In fact, to me, one of the most disappointing things is the first time I go to one of these people search things, I search for myself. And I'm missing in most of them. It's something she might try, too. She'll discover she doesn't exist. Right now, there doesn't seem to be a really good way of these things finding out the e-mail addresses of everybody -- not to mention the fact that I have four e-mail addresses, for example, which creates a problem.
JB: But none of them are findable?
HS: When I go off to most of these things, I type in my full first name, my full last name, the city, all that kind of stuff, and most of these say I'm not there. So my assumption is they can't find me. I'm not going to trust it to find people. Sometimes it does find people. But whereas with locating websites they take a bunch of websites and follow all the links, e-mail addresses don't link to each other. So they've got to use other tools to try to discover these things, and it's a more difficult task.
JB: Well, it sounds like that's something that we can look forward to for development in the future, Howard.
HS: I hope so. That would be nice to have that done really effectively.
JB: Okay. Good.
HS: That will give us another chance to do more futures.
JB: That's right. That has brought us, I think, to the end of our session for today, so I would like to thank everyone for being here today. And if you do have any follow-up questions, the e-mail of utw@cren.net is a good one to use, and we'll be using that same e-mail address for the entire series this spring. I'd also like to remind you all that the next session with Howard will be February 10, and the topic at that time will be the future of HTML. I hope that HTML will be around for a while, since I just purchased another book on this!
HS: Right, it'll be good for three months.
JB: For three months, right. All right. I'd also like to alert everyone to watch for a similar message about access instructions with the new phone number and URL just prior to the next Expert Event, but you can also get this information by going to the CREN Website and looking for the information and the schedule there.
Howard and all of the Event participants, thanks for being here, and I'd also like to thank everyone who made this possible today. That includes the board of CREN, Corporation for Research in Educational Networking; our guest expert, Howard Strauss from Princeton; Greg Marks from MERIT in Michigan; Brian Vaughn for his work at UM Online for the audio services; and all of you with us on the phone and on the Web. You were here because it's time. 'Bye, Howard.
HS: 'Bye, Judith.
JB: And 'bye, Greg.
GM: Good afternoon.
JB: See you next time.
GM: Yes, indeed.