Class HttpTokenizer

java.lang.Object
  |
  +--HttpTokenizer

public class HttpTokenizer
extends java.lang.Object

A simple tokenizer for web pages. Objects of this class provide low-level, stream-based parsing of web pages. Given a Reader object that reads a web page, an HttpTokenizer provides a stream of tokens, in the same style as the StreamTokenizer class in java.io. The possible tokens are as follows:

HT_EOF: The end of file
HT_NUMBER: A number, converted to a double
HT_WORD: A word, converted to all lowercase
HT_STRING: A quoted string
HT_TAGOPEN: A "<" character
HT_TAGCLOSE: A ">" character
HT_EQUALS: A "=" character
HT_SLASH: A "/" character
HT_DASH: A "-" character
HT_BANG: A "!" character
HT_A: The keyword "a"
HT_HREF: The keyword "href"
HT_IMG: The keyword "img"

When HT_NUMBER is returned by the nextToken() method, the instance variable nval contains the double representation of the number. When HT_WORD or HT_STRING is returned by the nextToken() method, the instance variable sval contains its string representation.
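The source of HttpTokenizer is not shown on this page, so the following is only a sketch of the token-loop style the description refers to, written against the standard java.io.StreamTokenizer. The class name TokenLoopSketch, the tokenize helper, and the sample markup are assumptions made for illustration. A loop over an HttpTokenizer would presumably have the same shape, with the HT_ constants in place of the TT_ constants and character codes used here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the documented API.
public class TokenLoopSketch {

    // Reads tokens until end of input and describes each one as text.
    static List<String> tokenize(String html) {
        List<String> out = new ArrayList<>();
        try (Reader page = new StringReader(html)) {
            StreamTokenizer tokens = new StreamTokenizer(page);
            tokens.lowerCaseMode(true); // words are reported in lowercase
            tokens.ordinaryChar('/');   // '/' is a comment character by default
            int type;
            while ((type = tokens.nextToken()) != StreamTokenizer.TT_EOF) {
                switch (type) {
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER " + tokens.nval); // value is in nval
                        break;
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD " + tokens.sval);   // text is in sval
                        break;
                    case '"':
                        out.add("STRING " + tokens.sval); // quoted body is in sval
                        break;
                    default:
                        out.add("CHAR " + (char) type);   // single ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("<A HREF=\"x.html\">42</a>")) {
            System.out.println(token);
        }
    }
}
```

As with the fields documented here, nval and sval are only meaningful immediately after the call that returned the matching token type.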


Field Summary
static int HT_A
          A constant indicating an "a" has been read.
static int HT_BANG
          A constant indicating a '!' has been read.
static int HT_DASH
          A constant indicating a '-' has been read.
static int HT_EOF
          A constant indicating the end of the web document has been reached.
static int HT_EQUALS
          A constant indicating a '=' has been read.
static int HT_HREF
          A constant indicating an "href" has been read.
static int HT_IMG
          A constant indicating an "img" has been read.
static int HT_NUMBER
          A constant indicating a number token has been read.
static int HT_SLASH
          A constant indicating a '/' has been read.
static int HT_STRING
          A constant indicating a string token has been read.
static int HT_TAGCLOSE
          A constant indicating a '>' has been read.
static int HT_TAGOPEN
          A constant indicating a '<' has been read.
static int HT_WORD
          A constant indicating a word token has been read.
 double nval
          If the current token is a number, this field contains the value of that number.
 java.lang.String sval
          If the current token is a word or string, this field gives the string.
private  java.io.StreamTokenizer tokens
           
private  java.util.StringTokenizer word
           
 
Constructor Summary
HttpTokenizer(java.io.Reader page)
          Create an HTTP tokenizer, given a Reader for the web page.
 
Method Summary
 int nextToken()
          Parses the next token from the web page.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

HT_EOF

public static final int HT_EOF
A constant indicating the end of the web document has been reached.

See Also:
Constant Field Values

HT_NUMBER

public static final int HT_NUMBER
A constant indicating a number token has been read.

See Also:
Constant Field Values

HT_WORD

public static final int HT_WORD
A constant indicating a word token has been read.

See Also:
Constant Field Values

HT_STRING

public static final int HT_STRING
A constant indicating a string token has been read.

See Also:
Constant Field Values

HT_TAGOPEN

public static final int HT_TAGOPEN
A constant indicating a '<' has been read.

See Also:
Constant Field Values

HT_TAGCLOSE

public static final int HT_TAGCLOSE
A constant indicating a '>' has been read.

See Also:
Constant Field Values

HT_EQUALS

public static final int HT_EQUALS
A constant indicating a '=' has been read.

See Also:
Constant Field Values

HT_SLASH

public static final int HT_SLASH
A constant indicating a '/' has been read.

See Also:
Constant Field Values

HT_DASH

public static final int HT_DASH
A constant indicating a '-' has been read.

See Also:
Constant Field Values

HT_BANG

public static final int HT_BANG
A constant indicating a '!' has been read.

See Also:
Constant Field Values

HT_A

public static final int HT_A
A constant indicating an "a" has been read.

See Also:
Constant Field Values

HT_HREF

public static final int HT_HREF
A constant indicating an "href" has been read.

See Also:
Constant Field Values

HT_IMG

public static final int HT_IMG
A constant indicating an "img" has been read.

See Also:
Constant Field Values

sval

public java.lang.String sval
If the current token is a word or string, this field gives the string.


nval

public double nval
If the current token is a number, this field contains the value of that number.


tokens

private java.io.StreamTokenizer tokens

word

private java.util.StringTokenizer word

Constructor Detail

HttpTokenizer

public HttpTokenizer(java.io.Reader page)
              throws java.io.IOException
Create an HTTP tokenizer, given a Reader for the web page.
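The constructor only requires a java.io.Reader, so any character source can stand in for a web page. The sketch below shows one way to obtain such a Reader from a byte stream using standard library classes. The class name PageReaderSketch, the helpers openPage and readAll, and the choice of UTF-8 are assumptions for illustration; the in-memory bytes stand in for a downloaded page.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the documented API.
public class PageReaderSketch {

    // Wraps a byte stream in a buffered Reader, decoding it as UTF-8 (assumed charset).
    static Reader openPage(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Drains the Reader into a String; used here only to show the Reader works.
    static String readAll(Reader page) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = page.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<a href=\"x.html\">link</a>".getBytes(StandardCharsets.UTF_8);
        try (Reader page = openPage(new ByteArrayInputStream(bytes))) {
            // A Reader like this is what would be passed to new HttpTokenizer(page).
            System.out.println(readAll(page));
        }
    }
}
```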

Method Detail

nextToken

public int nextToken()
              throws java.io.IOException
Parses the next token from the web page.

Returns:
The code of the next token.
Throws:
java.io.IOException