TextDecoder decodes non-UTF-8 byte sequences to valid UTF-8, in accordance with WHATWG's encoding standard.
This is a low-level interface; you may also be interested in decoder, which provides equally efficient high-level wrapper procedures.
The implementation consists of a single procedure: decode, which dispatches on the charset field to pick the desired decoder. To decode an input stream, call decode on any number of chunks with finish = false, then with finish = true on the last chunk. (If you don't know which is the last chunk, just use an empty chunk at the end.)
Each decode call returns one of tdrDone, tdrReadInput, tdrReqOutput, or tdrError. It takes input from iq (the input queue) and places output in oq (the output queue). The parameter n is always set to the last byte written to in the output queue.
tdrReqOutput signals that the output queue was too small to fit output of the decoder. The consumer should provide more space, e.g. by copying contents of the output queue elsewhere and resetting n, or by growing the output queue in size.
At this point, the decoder's field i points to the last input byte consumed; bytes before it may be safely discarded, provided you adjust i accordingly (by subtracting the number of input bytes removed).
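The two ways of making room, and the input compaction just described, could look roughly like the sketch below. The helper names flushOutput, growOutput and discardConsumed are hypothetical and not part of this module, and the code assumes that the first n bytes of oq are the ones written so far.

```nim
# Hypothetical helpers, not part of the module.

proc flushOutput(dest: var string; oq: openArray[uint8]; n: var int) =
  ## Option 1: copy the pending output elsewhere, then reset n.
  ## Assumption: oq[0 ..< n] holds the bytes written so far.
  for k in 0 ..< n:
    dest.add(char(oq[k]))
  n = 0

proc growOutput(oq: var seq[uint8]) =
  ## Option 2: leave n alone and give the decoder a larger queue.
  oq.setLen(max(oq.len * 2, 16))

proc discardConsumed(iq: var seq[uint8]; i: var int) =
  ## Drops the input bytes before index i and moves i back by the
  ## number of bytes removed, so it still refers to the same byte.
  let removed = min(i, iq.len)
  let rest = iq.len - removed
  for k in 0 ..< rest:
    iq[k] = iq[k + removed]
  iq.setLen(rest)
  i -= removed
```

With a decoder td and a seq-backed input buffer, the last helper would be called as discardConsumed(buffer, td.i).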
tdrReadInput instructs the consumer to read the input queue between the bytes pi..<ri (exclusive) as decoded output. WARNING: this does not mean that oq is left unmodified.
In particular, in the UTF-8 decoder, if the previous iq ended in the middle of a UTF-8 character, the next pass fills oq with the remains of that character before returning tdrReadInput. Make sure to process oq up to n before you process iq.
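The ordering requirement can be illustrated with a small sketch. The helper takeReadInput is hypothetical; it assumes that oq[0 ..< n] is the pending decoder output and that pi and ri are copied from the decoder's fields of the same name.

```nim
# Hypothetical helper, not part of the module.
proc takeReadInput(dest: var string; oq: openArray[uint8]; n: var int;
    iq: openArray[uint8]; pi, ri: int) =
  ## Handles a tdrReadInput result in the required order.
  # 1. Pending decoder output comes first (assumed to be oq[0 ..< n]).
  for k in 0 ..< n:
    dest.add(char(oq[k]))
  n = 0
  # 2. Then the pass-through range pi ..< ri of the input queue.
  for k in pi ..< ri:
    dest.add(char(iq[k]))
```

Appending the iq range first would put the bytes that follow a split character ahead of the character itself.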
tdrError is returned for all decoding errors encountered. For compliance with the encoding standard, callers must either abort decoding the input stream (error mode "fatal"), or manually append a U+FFFD replacement character (error mode "replacement").
Note that even if finish is true, decoding of the chunk is not complete after receiving tdrError if you're using error mode "replacement".
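A minimal sketch of the two error modes on the caller's side follows. ErrorMode and handleError are hypothetical names introduced for illustration; the only fixed fact used is that U+FFFD is encoded in UTF-8 as the bytes EF BF BD.

```nim
# Hypothetical names, not part of the module.
type ErrorMode = enum
  emFatal, emReplacement

const replacementChar = "\uFFFD" # UTF-8 bytes EF BF BD

proc handleError(dest: var string; mode: ErrorMode): bool =
  ## Called after decode returned tdrError.
  ## Returns true if the caller should keep calling decode.
  case mode
  of emFatal:
    result = false # abort decoding the input stream
  of emReplacement:
    dest.add(replacementChar)
    result = true # the chunk is not finished yet, so call decode again
```

In "replacement" mode the caller keeps looping on the same chunk until decode returns tdrDone.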
tdrDone is returned once decoding of iq has finished. If finish was set to true, it can be assumed that decoding is complete; otherwise, you should call decode again on the next buffer. (i is reset to 0 automatically, so there's no need to do anything before the next call.)
Using TextDecoder objects after setting finish = true is valid, but not well tested, so it is recommended that you reset your decoder after the last chunk.
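Putting the result codes together, a chunked decoding loop might look like the sketch below. It is not the library's own wrapper: decodeChunk is a hypothetical helper, the import paths and the CHARSET_UTF_8 constant are assumptions that do not appear in this documentation, and n is treated as the number of valid bytes at the start of oq.

```nim
# Assumption: these module paths and the CHARSET_UTF_8 constant match the
# installed library version; they are not shown in this documentation.
import chagashi/charset
import chagashi/decodercore

proc decodeChunk(td: var TextDecoder; iq: openArray[uint8]; dest: var string;
    finish: bool) =
  ## Hypothetical driver: decodes one chunk into dest, using error mode
  ## "replacement". Assumes oq[0 ..< n] holds the pending output.
  var oq: array[256, uint8]
  var n = 0
  while true:
    let res = td.decode(iq, oq, n, finish)
    # Always drain oq first, then reset n so the queue can be reused.
    for k in 0 ..< n:
      dest.add(char(oq[k]))
    n = 0
    case res
    of tdrDone:
      break
    of tdrReadInput:
      # This range of the input is already valid output.
      for k in td.pi ..< td.ri:
        dest.add(char(iq[k]))
    of tdrReqOutput:
      discard # oq was drained above, so there is room again
    of tdrError:
      dest.add("\uFFFD")

var td = TextDecoder(charset: CHARSET_UTF_8)
var text = ""
let chunks = ["hel", "lo"]
for c in chunks:
  td.decodeChunk(c.toOpenArrayByte(0, c.high), text, finish = false)
# Empty final chunk with finish = true, as suggested above.
let last = ""
td.decodeChunk(last.toOpenArrayByte(0, last.high), text, finish = true)
# Start from a fresh decoder before handling another stream.
td = TextDecoder(charset: CHARSET_UTF_8)
```

Draining oq on every pass keeps the output queue small and fixed in size. Growing a single buffer instead would save copies at the cost of holding the whole output in memory.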
Types
TextDecoder = object
  i*: int
  ri*: int
  pi*: int
  charset*: Charset

TextDecoderFinishResult = enum
  tdfrDone, tdfrError

TextDecoderResult = enum
  tdrDone, tdrReadInput, tdrReqOutput, tdrError
Procs
proc decode(td: var TextDecoder; iq: openArray[uint8]; oq: var openArray[uint8]; n: var int; finish = false): TextDecoderResult {.raises: [], tags: [], forbids: [].}