Class WebSpider

java.lang.Object
  |
  +--WebSpider

public class WebSpider
extends java.lang.Object

Web-crawling objects. Instances of this class will crawl a given web site in breadth-first order.
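A small sketch can illustrate breadth-first order. It is not the WebSpider implementation. The web is replaced by a hypothetical in-memory map from each page to the links it contains, and no pages are fetched. The names BreadthFirstCrawlSketch and crawlOrder are illustrative only. The sketch uses a FIFO queue of discovered pages and a set of pages already queued. Together they produce breadth-first order and ensure no page is visited twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: breadth-first traversal of an in-memory link graph.
// The Map stands in for fetched pages; no network access happens here.
public class BreadthFirstCrawlSketch {

    /** Returns pages in the order a breadth-first crawl from start would visit them. */
    public static List<String> crawlOrder(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();       // pages already queued, so none is visited twice
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        seen.add(start);
        while (!frontier.isEmpty()) {
            String page = frontier.remove();      // FIFO: earliest discovered page first
            order.add(page);
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) {             // add() returns false if already present
                    frontier.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/c"),
            "/b", List.of("/a", "/d"));
        System.out.println(crawlOrder(links, "/"));
    }
}
```

Running main prints [/, /a, /b, /c, /d]. Both pages linked from the start page are visited before /c and /d, which are two links away.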


Field Summary
 int crawlLimitDefault
          The maximum number of pages to crawl.
private  WebIndex i
           
private  java.net.URL u
           
 
Constructor Summary
WebSpider(java.net.URL u, WebIndex i)
          Create a new web spider.
 
Method Summary
 WebIndex crawl()
          Crawl the web, up to the default number of web pages.
 WebIndex crawl(int limit)
          Crawl the web, up to a certain number of web pages.
private  java.lang.String StripPalm(java.lang.String s)
          Strip out all '#' and '/' characters in the URL in order to avoid intra-page links.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

crawlLimitDefault

public int crawlLimitDefault
The maximum number of pages to crawl.


u

private java.net.URL u

i

private WebIndex i

Constructor Detail

WebSpider

public WebSpider(java.net.URL u,
                 WebIndex i)
Create a new web spider.

Parameters:
u - The URL of the web site to crawl.
i - The initial web index object to extend.

Method Detail

crawl

public WebIndex crawl(int limit)
Crawl the web, up to a certain number of web pages.

Parameters:
limit - The maximum number of pages to crawl.
Returns:
The web index resulting from this crawl (and any previous ones).
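The following is a minimal sketch of how a page limit and an extended index might fit together. It makes two assumptions. The index is modeled as a set of visited page names, a hypothetical stand-in for WebIndex. The web is modeled as an in-memory link map. The crawl stops once limit pages have been visited. It returns the same index object it was given, which is one way to read "this crawl (and any previous ones)". The names LimitedCrawlSketch and crawl here are illustrative, not the real class.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: breadth-first crawl capped at `limit` pages.
// `index` stands in for WebIndex and is extended in place, then returned.
public class LimitedCrawlSketch {

    public static Set<String> crawl(Map<String, List<String>> links, String start,
                                    int limit, Set<String> index) {
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        seen.add(start);
        int visited = 0;
        while (!frontier.isEmpty() && visited < limit) { // stop at the page limit
            String page = frontier.remove();
            index.add(page);                             // "add this page to the index"
            visited++;
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return index;                                    // same object the caller passed in
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/c"));
        Set<String> index = new LinkedHashSet<>(List.of("/old"));
        System.out.println(crawl(links, "/", 2, index));
    }
}
```

With limit 2, the call visits / and /a, so main prints [/old, /, /a]. The entry /old stands for a page indexed by an earlier crawl.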

crawl

public WebIndex crawl()
Crawl the web, up to the default number of web pages.

Returns:
The web index resulting from this crawl (and any previous ones).
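The no-argument overload presumably forwards to crawl(int) with crawlLimitDefault as the limit. The sketch below shows that delegation pattern under this assumption. The default value of 10 and the String return type are placeholders: the real default is not documented here, and WebIndex is not shown. Because crawlLimitDefault is declared as a public int (neither static nor final), each instance has its own value, and callers may change it.

```java
// Hypothetical sketch of the overload pattern: crawl() forwards to crawl(int)
// using the instance field crawlLimitDefault. Values and return type are placeholders.
public class OverloadSketch {
    public int crawlLimitDefault = 10; // assumed value; the real default is not documented

    public String crawl(int limit) {
        return "crawled up to " + limit + " pages";
    }

    public String crawl() {
        return crawl(crawlLimitDefault); // no-arg form uses the per-instance default
    }

    public static void main(String[] args) {
        OverloadSketch s = new OverloadSketch();
        System.out.println(s.crawl());
        s.crawlLimitDefault = 3;         // public, non-final field: callers may change it
        System.out.println(s.crawl());
    }
}
```

Running main prints "crawled up to 10 pages", then "crawled up to 3 pages".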

StripPalm

private java.lang.String StripPalm(java.lang.String s)
Strip out all '#' and '/' characters in the URL in order to avoid intra-page links.
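StripPalm's exact behavior is not documented beyond this line. Taken literally, removing every '/' would also remove the separators in the URL's path. The sketch below therefore assumes a narrower intent. It drops the fragment (everything from '#' onward, which only points within a page) and any trailing '/', so that links to the same page compare equal. The method name stripFragment is hypothetical.

```java
// Hypothetical sketch of URL normalization in the spirit of StripPalm.
// Assumption: drop the "#fragment" part and any trailing '/' characters.
public class StripFragmentSketch {

    public static String stripFragment(String url) {
        int hash = url.indexOf('#');
        String s = (hash >= 0) ? url.substring(0, hash) : url; // drop "#..." to the end
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);                 // drop trailing slashes
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripFragment("http://example.com/docs/#intro"));
        System.out.println(stripFragment("http://example.com/docs"));
    }
}
```

Both calls print http://example.com/docs, so a crawler keyed on the stripped form would treat them as one page.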