0. Abstract
      Most XML processing algorithms begin by extracting token content from the
      source document into many discrete string objects. We propose a
      "non-extractive" tokenization approach that maintains the source document 
      intact in memory. Using a binary encoding specification called Virtual 
      Token Descriptor (VTD), the processing model represents tokens exclusively
      by their starting offsets and lengths. To create a hierarchical view of the
      data encapsulated in XML, the parser further indexes elements of the same
      depth using directory-like structures we call location caches. Through a
      demonstration of navigating the document hierarchy using VTD and location 
      caches, we show that it is indeed possible to create a cursor-based API 
      that retains most of DOM's random-access capabilities at a fraction 
      of its memory usage. Furthermore, by analyzing key design constraints of 
      custom hardware, we reason that the memory-conserving characteristics of
      the processing model simultaneously make possible "XML on a chip" and 
      "binary-enhanced XML." The benchmark results show that the reference 
      implementation of our processing model significantly outperforms Xerces
      DOM in both memory usage and processing speed.
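
The core idea of non-extractive tokenization can be illustrated with a minimal Python sketch. It is not the reference implementation: the 64-bit record layout (offset in the high 32 bits, length in the low 32 bits), the regular expression used to find start-tag names, and the helper names pack, unpack, tokenize_names, and token_bytes are all assumptions made for illustration. The point it shows is that each token is stored as a single integer that refers back into the untouched source buffer, rather than as a separate string object.

```python
import re

# Hypothetical record layout, for illustration only (not the actual VTD
# specification): high 32 bits = starting offset, low 32 bits = length.
OFFSET_SHIFT = 32
LENGTH_MASK = (1 << 32) - 1


def pack(offset, length):
    """Encode a token as one integer holding its offset and length."""
    return (offset << OFFSET_SHIFT) | length


def unpack(record):
    """Decode a record back into an (offset, length) pair."""
    return record >> OFFSET_SHIFT, record & LENGTH_MASK


def tokenize_names(doc):
    """Return one packed record per start-tag name; doc is never copied.

    A simplified regex stands in for a real XML tokenizer here.
    """
    pattern = re.compile(rb"<([A-Za-z_][\w.\-]*)")
    return [pack(m.start(1), m.end(1) - m.start(1))
            for m in pattern.finditer(doc)]


def token_bytes(doc, record):
    """Return a zero-copy view of the token inside the source buffer."""
    offset, length = unpack(record)
    return memoryview(doc)[offset:offset + length]


doc = b"<order><item>pen</item><item>ink</item></order>"
records = tokenize_names(doc)

# A string is materialized only when the caller actually asks for one.
names = [bytes(token_bytes(doc, r)).decode() for r in records]
print(names)
```

In this sketch the only per-token cost is one integer, and a string is created only on demand, which is the property the abstract credits for the reduced memory usage.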