ch any erroneous encoding of UTF-8, including encodings of surrogate codepoints. And, of course, for all of Unicode (`[000000-10FFFF]`): ```text [0-7F] [C2-DF][80-BF] [E0][A0-BF][80-BF] [E1-EC][80-BF][80-BF] [ED][80-9F][80-BF] [EE-EF][80-BF][80-BF] [F0][90-BF][80-BF][80-BF] [F1-F3][80-BF][80-BF][80-BF] [F4][80-8F][80-BF][80-BF] ``` This module automates the process of creating these byte ranges from ranges of Unicode scalar values. # Lineage I got the idea and general implementation strategy from Russ Cox in his [article on regexps](https://web.archive.org/web/20160404141123/https://swtch.com/~rsc/regexp/regexp3.html) and RE2. Russ Cox got it from Ken Thompson's `grep` (no source, folk lore?). I also got the idea from [Lucene](https://github.com/apache/lucene-solr/blob/ae93f4e7ac6a3908046391de35d4f50a0d3c59ca/lucene/core/src/java/org/apache/lucene/util/automaton/UTF32ToUTF8.java), which uses it for executing automata on their term index.