Added support for UTF-8 documents.

Benjamin Mako Hill