The following are the areas where Libhubbub still falls short, although it is now reliable in most respects.
1) The element stack grows on repeated pushes but never shrinks on pop. The proposed solution is to halve its capacity once the used proportion falls below 1/3, and similarly to double its capacity once it is full. Approval from the core developers is necessary before I try to implement this.
2) The tokeniser has become very messy and unreadable because of the introduction of script-related states. The proposed solution is to have a standalone handler for each state, but this may mean a significant increase in code size and redundant code, which would hurt code reuse.
3) The library has slowed down significantly because it is now required to store tag attributes. I currently store them in the context details, but the repeated use of strndup to copy attribute strings during stack pushes, as well as during formatting list pushes, has severely slowed things down. However, it works reliably.
4) Assumption: the client currently doesn't support creation of template elements. Libhubbub can now handle template tags properly, treating template as equivalent to any other tag. Once template creation support is provided, the only remaining work will be to incorporate it into the insert_element method of the treebuilder.
5) Handling script tags in SVG mode requires the client to support it too. The spec is a bit hazy here, and any input on it would be appreciated: www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inforeign (look under the script end tag).
6) I couldn't work out how to determine whether the document is an iframe-source document. Any input on this would be helpful.
7) The charset detection mechanism previously prescanned the document up to 512 bytes to find the meta tag. I have increased this to 1024 bytes, which requires the approval of the core developers. Also, no algorithms are currently implemented to auto-detect the document encoding. If appropriate sources are provided, I will try implementing them in Hubbub.
8) XML violations are a special set of rules that make the API safe for the XML pipeline, and Hubbub currently doesn't support them. If the core developers consider this necessary, I will try implementing it. Ref: http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#coercing-an-html-dom-into-an-infoset
9) Some errors that I am unaware of may have crept into the library. After all, to err is human.
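
The resizing policy proposed in item 1 could be sketched roughly as follows. This is only an illustration under assumptions: sketch_stack, sketch_push, sketch_pop, sketch_resize and SKETCH_MIN_CAPACITY are hypothetical names, the int payload stands in for the real element context, and none of this is Libhubbub's actual treebuilder code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the element stack; not Libhubbub's real type. */
typedef struct {
    int *items;        /* placeholder payload instead of element contexts */
    size_t used;       /* number of occupied slots */
    size_t capacity;   /* number of allocated slots */
} sketch_stack;

/* Assumed lower bound so that small stacks do not keep reallocating. */
#define SKETCH_MIN_CAPACITY 4

/* Change the allocation to new_cap slots; returns 0 on success, -1 on failure. */
static int sketch_resize(sketch_stack *s, size_t new_cap)
{
    int *p;

    assert(new_cap >= s->used);   /* never drop live entries */
    p = realloc(s->items, new_cap * sizeof *p);
    if (p == NULL)
        return -1;                /* the old buffer is still valid */
    s->items = p;
    s->capacity = new_cap;
    return 0;
}

/* Push a value, doubling the capacity when the stack is full. */
int sketch_push(sketch_stack *s, int value)
{
    if (s->used == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : SKETCH_MIN_CAPACITY;
        if (sketch_resize(s, new_cap) != 0)
            return -1;
    }
    s->items[s->used++] = value;
    return 0;
}

/* Pop a value, halving the capacity once less than 1/3 of it is in use. */
int sketch_pop(sketch_stack *s, int *out)
{
    if (s->used == 0)
        return -1;
    *out = s->items[--s->used];

    if (s->capacity > SKETCH_MIN_CAPACITY && s->used * 3 < s->capacity) {
        size_t new_cap = s->capacity / 2;
        if (new_cap < SKETCH_MIN_CAPACITY)
            new_cap = SKETCH_MIN_CAPACITY;
        (void) sketch_resize(s, new_cap);  /* failing to shrink is harmless */
    }
    return 0;
}
```

Shrinking at 1/3 rather than at 1/2 leaves a gap between the grow and shrink thresholds: right after a halving the stack is less than 2/3 full, so a push/pop sequence that hovers around one size does not reallocate on every call.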
Rupinder Singh Khokhar