README
@agen/encoding
This package contains methods to encode/decode strings using async generators.
- High-level methods:
- Main methods for encoding / decoding messages. These methods provide
detailed information about exact positions of returned blocks in the stream:
- encodeBlocks encodes strings to UTF-8 binary blocks and returns byte position and length of the encoded chunks
- decodeBlocks decodes binary blocks and returns decoded strings with their byte-level positions in the original stream
- decodeLines decodes individual lines and returns
their position in the stream; returned lines have the trailing
"\n"
symbol
- Low-level functions accepting one character / one byte and performing encoding / decoding
at the byte level:
- newEncoder encodes individual characters
- newDecoder decodes individual bytes
method encode
Encodes strings to UTF-8 binary blocks. Paramters:
blockSize
- maximal length of encoded binary chunksonBlock
- an optional method recieves the full position information of returned blocks; this method is called just before the block is returned by the iterator. Structure of the information object:data
- a byte sequence encoding a string chunkcharIdx
- position of the first encoded character in the initial streamcharLen
- length of the encoded block of charactersbyteIdx
- byte position of the encoded block in the resulting binary streambyteLen
- length of the encoded data block (equals to the data.length)
Example:
import * as agen from '@agen/encoding';
const blockSize = 64;
const f = agen.encode(blockSize);
const textChunks = [
`съешь же\n ещё`,
` этих\n мягких французских \nбулок, `,
`да выпей\nчаю`
]
for await (let chunk of f(textChunks)) {
console.log('*', chunk);
}
// Output:
// * Uint8Array(63)[
// 209, 129, 209, 138, 208, 181, 209, 136, 209, 140,
// 32, 208, 182, 208, 181, 10, 32, 208, 181, 209,
// 137, 209, 145, 32, 209, 141, 209, 130, 208, 184,
// 209, 133, 10, 32, 208, 188, 209, 143, 208, 179,
// 208, 186, 208, 184, 209, 133, 32, 209, 132, 209,
// 128, 208, 176, 208, 189, 209, 134, 209, 131, 208,
// 183, 209, 129
// ]
// * Uint8Array(42)[
// 208, 186, 208, 184, 209, 133, 32, 10,
// 208, 177, 209, 131, 208, 187, 208, 190,
// 208, 186, 44, 32, 208, 180, 208, 176,
// 32, 208, 178, 209, 139, 208, 191, 208,
// 181, 208, 185, 10, 209, 135, 208, 176,
// 209, 142
// ]
method decode
Decodes binary blocks and returns decoded strings.
Paramters:
onBlock
- an optional method recieving full information about the decoded string chunk; this method is called just before the decoded string is returned by the iterator. Structure of the information object:data
- the string chunkbyteIdx
- position of the returned string in the original byte streambyteLen
- byte lengths of the stringcharIdx
- position of the first character of the returned stringcharLen
- length of the string chunk (the same as str.length)firstLine
- line of the first character of the returned stringfirstPos
- position on the line for the first characterlastLine
- the line of the last character of the returned stringlastPos
- line position of the last returned character
Example:
import * as agen from '@agen/encoding';
const f = agen.decode();
const blocks = [
Uint8Array.from([
209, 129, 209, 138, 208, 181, 209, 136, 209, 140,
32, 208, 182, 208, 181, 10, 32, 208, 181, 209,
137, 209, 145, 32, 209, 141, 209, 130, 208, 184,
209, 133, 10, 32, 208, 188, 209, 143, 208, 179,
208, 186, 208, 184, 209, 133, 32, 209, 132, 209,
128, 208, 176, 208, 189, 209, 134, 209, 131, 208,
183, 209, 129
]),
Uint8Array.from([
208, 186, 208, 184, 209, 133, 32, 10,
208, 177, 209, 131, 208, 187, 208, 190,
208, 186, 44, 32, 208, 180, 208, 176,
32, 208, 178, 209, 139, 208, 191, 208,
181, 208, 185, 10, 209, 135, 208, 176,
209, 142
])
];
for await (let str of f(blocks)) {
console.log('*', str);
}
// Output:
// * съешь же
// ещё этих
// мягких французс
// * ких
// булок, да выпей
// чаю
method lines
Decodes binary stream and returns individual lines.
The returned lines contain the trailing "\n"
symbol (except, eventually, the last line).
Paramters:
onBlock
- an optional method recieving full information about the decoded string chunk; this method is called just before the decoded string is returned by the iterator. Structure of the information object:data
- the string chunkbyteIdx
- position of the returned string in the original byte streambyteLen
- byte lengths of the stringcharIdx
- position of the first character of the returned stringcharLen
- length of the string chunk (the same as str.length)firstLine
- line of the first character of the returned stringfirstPos
- position on the line for the first characterlastLine
- the line of the last character of the returned stringlastPos
- line position of the last returned character
Example:
import * as agen from '@agen/encoding';
const f = agen.lines();
const blocks = [
Uint8Array.from([
209, 129, 209, 138, 208, 181, 209, 136, 209, 140,
32, 208, 182, 208, 181, 10, 32, 208, 181, 209,
137, 209, 145, 32, 209, 141, 209, 130, 208, 184,
209, 133, 10, 32, 208, 188, 209, 143, 208, 179,
208, 186, 208, 184, 209, 133, 32, 209, 132, 209,
128, 208, 176, 208, 189, 209, 134, 209, 131, 208,
183, 209, 129
]),
Uint8Array.from([
208, 186, 208, 184, 209, 133, 32, 10,
208, 177, 209, 131, 208, 187, 208, 190,
208, 186, 44, 32, 208, 180, 208, 176,
32, 208, 178, 209, 139, 208, 191, 208,
181, 208, 185, 10, 209, 135, 208, 176,
209, 142
])
];
for await (let line of f(blocks)) {
console.log('*', line);
}
// Output:
// * съешь же
//
// * ещё этих
//
// * мягких французских
//
// * булок, да выпей
//
// * чаю
method encodeBlocks
This method recieves strings and transforms them to byte chunks with associated position information.
Parameters:
blockSize
(default:1024
) - size of generated chunksArrayType
(default:Uint8Array
) - type of the generated byte arrays
Returned values have the following structure:
data
- a byte sequence encoding a string chunkcharIdx
- position of the first encoded character in the initial streamcharLen
- length of the encoded block of charactersbyteIdx
- byte position of the encoded block in the resulting binary streambyteLen
- length of the encoded data block (equals to the data.length)
Example:
import * as agen from '@agen/encoding';
const blockSize = 64;
const f = agen.encodeBlocks(blockSize);
const textChunks = [
`съешь же\n ещё`,
` этих\n мягких французских \nбулок, `,
`да выпей\nчаю`
]
for await (let info of f(textChunks)) {
console.log(`-----------`)
console.log('*', info);
}
// Output:
// -----------
// * {
// byteIdx: 0,
// charIdx: 0,
// byteLen: 63,
// charLen: 35,
// data: Uint8Array(63)[
// 209, 129, 209, 138, 208, 181, 209, 136, 209, 140,
// 32, 208, 182, 208, 181, 10, 32, 208, 181, 209,
// 137, 209, 145, 32, 209, 141, 209, 130, 208, 184,
// 209, 133, 10, 32, 208, 188, 209, 143, 208, 179,
// 208, 186, 208, 184, 209, 133, 32, 209, 132, 209,
// 128, 208, 176, 208, 189, 209, 134, 209, 131, 208,
// 183, 209, 129
// ]
// }
// -----------
// * {
// byteIdx: 63,
// charIdx: 35,
// byteLen: 42,
// charLen: 24,
// data: Uint8Array(42)[
// 208, 186, 208, 184, 209, 133, 32, 10,
// 208, 177, 209, 131, 208, 187, 208, 190,
// 208, 186, 44, 32, 208, 180, 208, 176,
// 32, 208, 178, 209, 139, 208, 191, 208,
// 181, 208, 185, 10, 209, 135, 208, 176,
// 209, 142
// ]
// }
method decodeBlocks
Decodes provided stream of UTF-8 texts. Returns information about indididual blocks in the stream - their byte position, character position, byte and char lengths, also the line numbers and line positions of the beginning and block ends.
Parameters:
splitter
- optional splitter; with this parameter this method additionally splits blocks; it can be useful to split the strim to new lines (by the '\n' symbol). Note that to re-create full lines the returned blocks should be analized and concatenated together if they are started on the same line.
The returned messages have the following structure:
data
- the string chunkbyteIdx
- position of the returned string in the original byte streambyteLen
- byte lengths of the stringcharIdx
- position of the first character of the returned stringcharLen
- length of the string chunk (the same as str.length)firstLine
- line of the first character of the returned stringfirstPos
- position on the line for the first characterlastLine
- the line of the last character of the returned stringlastPos
- line position of the last returned character
Example:
import * as agen from '@agen/encoding';
const f = agen.decodeBlocks();
const blocks = [
Uint8Array.from([
209, 129, 209, 138, 208, 181, 209, 136, 209, 140,
32, 208, 182, 208, 181, 10, 32, 208, 181, 209,
137, 209, 145, 32, 209, 141, 209, 130, 208, 184,
209, 133, 10, 32, 208, 188, 209, 143, 208, 179,
208, 186, 208, 184, 209, 133, 32, 209, 132, 209,
128, 208, 176, 208, 189, 209, 134, 209, 131, 208,
183, 209, 129
]),
Uint8Array.from([
208, 186, 208, 184, 209, 133, 32, 10,
208, 177, 209, 131, 208, 187, 208, 190,
208, 186, 44, 32, 208, 180, 208, 176,
32, 208, 178, 209, 139, 208, 191, 208,
181, 208, 185, 10, 209, 135, 208, 176,
209, 142
])
];
for await (let info of f(blocks)) {
console.log(`-----------`)
console.log('*', info);
}
// Output:
// -----------
// * {
// charIdx: 0,
// charLen: 35,
// byteIdx: 0,
// byteLen: 63,
// firstLine: 0,
// firstPos: 0,
// lastLine: 2,
// lastPos: 16,
// data: 'съешь же\n ещё этих\n мягких французс'
// }
// -----------
// * {
// charIdx: 35,
// charLen: 24,
// byteIdx: 63,
// byteLen: 42,
// firstLine: 2,
// firstPos: 16,
// lastLine: 4,
// lastPos: 3,
// data: 'ких \nбулок, да выпей\nчаю'
// }
method decodeLines
Decodes and returns individual lines from the byte stream. The returned lines
are associated with their position information.
All lines contain the trailing "\n"
symbol except, eventually, the last one.
The returned messages have the following structure:
data
- the line string; it contains the\\n
symbol at the end.byteIdx
- byte position of the returned line in the original byte streambyteLen
- byte lengths of the linecharIdx
- position of the first character of the linecharLen
- length of the line (the same as data.length)firstLine
- line of the first character of the returned stringfirstPos
- position on the line for the first character (it is always 0)lastLine
- the line of the last character of the returned stringlastPos
- line position of the last returned character; it is 0 most of the time and can be different for the last line Example:
import * as agen from '@agen/encoding';
const f = agen.decodeLines();
const blocks = [
Uint8Array.from([
209, 129, 209, 138, 208, 181, 209, 136, 209, 140,
32, 208, 182, 208, 181, 10, 32, 208, 181, 209,
137, 209, 145, 32, 209, 141, 209, 130, 208, 184,
209, 133, 10, 32, 208, 188, 209, 143, 208, 179,
208, 186, 208, 184, 209, 133, 32, 209, 132, 209,
128, 208, 176, 208, 189, 209, 134, 209, 131, 208,
183, 209, 129
]),
Uint8Array.from([
208, 186, 208, 184, 209, 133, 32, 10,
208, 177, 209, 131, 208, 187, 208, 190,
208, 186, 44, 32, 208, 180, 208, 176,
32, 208, 178, 209, 139, 208, 191, 208,
181, 208, 185, 10, 209, 135, 208, 176,
209, 142
])
];
for await (let info of f(blocks)) {
console.log(`-----------`)
console.log('*', info);
}
// Output:
// -----------
// * {
// charIdx: 0,
// charLen: 9,
// byteIdx: 0,
// byteLen: 16,
// firstLine: 0,
// firstPos: 0,
// lastLine: 1,
// lastPos: 0,
// data: "съешь же\n"
// }
// -----------
// * {
// charIdx: 9,
// charLen: 10,
// byteIdx: 16,
// byteLen: 17,
// firstLine: 1,
// firstPos: 0,
// lastLine: 2,
// lastPos: 0,
// data: " ещё этих\n"
// }
// -----------
// * {
// charIdx: 19,
// charLen: 21,
// byteIdx: 33,
// byteLen: 38,
// firstLine: 2,
// firstPos: 0,
// lastLine: 3,
// lastPos: 0,
// data: " мягких французских \n"
// }
// -----------
// * {
// charIdx: 40,
// charLen: 16,
// byteIdx: 71,
// byteLen: 28,
// firstLine: 3,
// firstPos: 0,
// lastLine: 4,
// lastPos: 0,
// data: "булок, да выпей\n"
// }
// -----------
// * {
// charIdx: 56,
// charLen: 3,
// byteIdx: 99,
// byteLen: 6,
// firstLine: 4,
// firstPos: 0,
// lastLine: 4,
// lastPos: 3,
// data: "чаю"
// }
method newEncoder
This method accepts one character code, updates the given buffer and returns the number of written bytes.
Returned method accepts the following parameters:
code
- code of the character to encodebuf
- buffer where character bytes should be writtenidx
- start position in the buffer where encoded bytes should be written
This method returns the number of encoded bytes.
Example:
import * as agen from '@agen/encoding';
const enc = agen.newEncoder();
const charCode = 'ё'.charCodeAt(0);
const buf = [0, 0, 0, 0];
const len = enc(charCode, buf, 0);
console.log('Number of written bytes:', len);
console.log('Buffer content:', buf);
// Output:
// Number of written bytes: 2
// Buffer content: [209, 145, 0, 0]
method newDecoder
This method accepts one byte and returns a string with the decoded characters.
Returned method accepts the following parameters:
byte
- one byte from the stream
This method returns a string with the decoded characters. If there is no characters ready to output then it returns an empty string.
Example:
import * as agen from '@agen/encoding';
const dec = agen.newDecoder();
const buf = [ 209, 145 ];
for (let i = 0; i < buf.length; i++) {
const byte = buf[i];
let s = dec(byte);
console.log(`${i}) ${byte} - "${s}"`);
}
// Output:
// 0) 209 - ""
// 1) 145 - "ё"