Class PageLexer

java.lang.Object
  |
  +--PageLexer
All Implemented Interfaces:
java.util.Iterator

public class PageLexer
extends java.lang.Object
implements java.util.Iterator

A lexical analyzer for web documents, based on a finite-state machine. Instances of this class are Iterators that produce PageElement objects (each of which is a keyword, number, or hyperlink). Bad hyperlinks are discarded. This code is incomplete: for Part 1 of HW5, you are to finish writing the code for this class.


Nested Class Summary
private  class PageLexer.Action
          A private class that performs the action for a given state.
 
Field Summary
private  PageLexer.Action action
          The action table. action.doit(state) performs the action for the given state.
private  int[][] delta
          The state-transition table.
private  java.util.Vector elts
           
private  HttpTokenizer tokenStream
           
private  java.net.URL url
           
 
Constructor Summary
PageLexer(java.io.Reader page, java.net.URL u)
          Creates a new web page lexer.
 
Method Summary
 boolean hasNext()
          Determine whether there are more PageElements in the page.
 java.lang.Object next()
          Return the next PageElement in the page.
 void remove()
          Unimplemented
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

url

private java.net.URL url

elts

private java.util.Vector elts

tokenStream

private HttpTokenizer tokenStream

delta

private int[][] delta
The state-transition table. delta[state][token] yields the next state of the finite-state machine; a transition to -1 means halt. This is a very simple FSM, and modifying it might improve web indexing, but making such improvements is optional extra credit for this assignment.
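
As an illustration of how a table such as delta can drive a finite-state machine, here is a minimal sketch. The state names, token codes, and table contents are hypothetical and are not taken from PageLexer; only the delta[state][token] lookup and the -1 halt convention come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaSketch {
    // Hypothetical states and token codes; the real PageLexer defines its own.
    public static final int HALT = -1;
    public static final int TEXT = 0, IN_TAG = 1;
    public static final int TOK_WORD = 0, TOK_OPEN = 1, TOK_CLOSE = 2, TOK_EOF = 3;

    // DELTA[state][token] yields the next state; -1 means halt.
    static final int[][] DELTA = {
        /* TEXT   */ { TEXT,   IN_TAG, TEXT, HALT },
        /* IN_TAG */ { IN_TAG, IN_TAG, TEXT, HALT },
    };

    /** Runs the FSM over the token codes and records each state entered. */
    public static List<Integer> run(int[] tokens) {
        List<Integer> visited = new ArrayList<>();
        int state = TEXT;
        for (int token : tokens) {
            state = DELTA[state][token];
            if (state == HALT) {
                break; // a transition to -1 stops the machine
            }
            visited.add(state);
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] tokens = { TOK_WORD, TOK_OPEN, TOK_WORD, TOK_CLOSE, TOK_EOF };
        System.out.println(run(tokens)); // prints [0, 1, 1, 0]
    }
}
```

Keeping the transitions in a table means the machine can be changed by editing data rather than control flow, which is what makes the optional FSM improvements a matter of modifying delta.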


action

private PageLexer.Action action
The action table. action.doit(state) performs the action for the given state.
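
One way a per-state action object might be organized is sketched below. This is an assumption for illustration, not the assignment's solution: the state numbers, the currentText field, and the feed helper are all hypothetical; only the private inner class and the doit(state) call shape come from the description above.

```java
import java.util.Vector;

public class ActionSketch {
    // Hypothetical state numbers; PageLexer's actual states are not shown here.
    public static final int SKIP = 0, SAW_WORD = 1, SAW_NUMBER = 2;

    private final Vector<String> elts = new Vector<>();
    private String currentText = ""; // hypothetical: text of the current token
    private final Action action = new Action();

    /** Inner class, so doit can reach the enclosing object's fields. */
    private class Action {
        void doit(int state) {
            switch (state) {
                case SAW_WORD:
                    elts.add("keyword:" + currentText);
                    break;
                case SAW_NUMBER:
                    elts.add("number:" + currentText);
                    break;
                default:
                    break; // states with no side effect
            }
        }
    }

    /** Hypothetical driver: performs the action for each (state, text) pair. */
    public Vector<String> feed(int[] states, String[] texts) {
        for (int i = 0; i < states.length; i++) {
            currentText = texts[i];
            action.doit(states[i]);
        }
        return elts;
    }
}
```

Because Action is an inner class, doit needs only the state as an argument; everything else it touches is a field of the enclosing object.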

Constructor Detail

PageLexer

public PageLexer(java.io.Reader page,
                 java.net.URL u)
          throws java.io.IOException
Creates a new web page lexer. This constructor is missing code: for Part 1 of HW5, fill in the missing code and then test it using the WebReader class. (See the writeup for more details.)

Parameters:
page - A reader for the web page
u - The URL of this page
Method Detail

hasNext

public boolean hasNext()
Determine whether there are more PageElements in the page.

Specified by:
hasNext in interface java.util.Iterator
Returns:
true if there are more PageElements, else false

next

public java.lang.Object next()
Return the next PageElement in the page.

Specified by:
next in interface java.util.Iterator
Returns:
the next PageElement, or null if there are none.

remove

public void remove()
Unimplemented

Specified by:
remove in interface java.util.Iterator
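
The iterator contract described above can be sketched with a stand-in class. ElementIterator below is hypothetical and is not PageLexer: it simply walks a pre-filled Vector, returns null from next() when nothing is left (as documented above), and assumes that the unimplemented remove() throws UnsupportedOperationException, which is the usual convention but is not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

public class IteratorSketch {
    /** Hypothetical stand-in for PageLexer, backed by a pre-filled Vector. */
    public static class ElementIterator implements Iterator<Object> {
        private final Vector<Object> elts;
        private int pos = 0;

        public ElementIterator(Vector<Object> elts) {
            this.elts = elts;
        }

        public boolean hasNext() {
            return pos < elts.size();
        }

        public Object next() {
            // Returns null when exhausted, matching the documented behavior.
            return hasNext() ? elts.get(pos++) : null;
        }

        public void remove() {
            // Assumption: "Unimplemented" is modeled as an unsupported operation.
            throw new UnsupportedOperationException("remove");
        }
    }

    /** Collects every remaining element from the iterator. */
    public static List<Object> drain(Iterator<Object> it) {
        List<Object> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Vector<Object> v = new Vector<>();
        v.add("keyword");
        v.add(Integer.valueOf(7));
        System.out.println(drain(new ElementIterator(v)));
    }
}
```

Callers should guard each next() call with hasNext(), since a null return is otherwise indistinguishable from a missing element.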