If you want to do that, you'd probably want to use a fast SAX parser rather than something that naively looks at one byte at a time.
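
For illustration, here's roughly what the SAX approach looks like using Python's standard `xml.sax` module (which is backed by expat). This is only a sketch: the `TagCounter` handler, the `count_tags` helper, and the sample document are made up for the example, and counting element names is just a stand-in for whatever processing you actually need.

```python
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts how often each element name appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Called by the parser once per opening tag; no tree is built in memory.
        self.counts[name] += 1


def count_tags(xml_bytes):
    """Stream the document through the SAX parser and return the tag counts."""
    handler = TagCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


# Made-up sample input, just to exercise the handler.
sample = b"<root><item/><item/><note>hi</note></root>"
counts = count_tags(sample)
print(dict(counts))
```

The point is that the parser does the tokenizing in optimized native code and only calls back into your handler on events, so you never touch individual bytes yourself. For large files you'd likely use `xml.sax.parse` with a filename or file object instead of `parseString`, so the whole document doesn't have to sit in memory.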