Posted by matthewolfe 1 day ago
I’m teaching myself LLM internals by re-implementing the stack from first principles. Profiling tiktoken’s Python/Rust implementation showed that a lot of time was spent on regex matching. Most of my perf gains come from (a) using a faster JIT-compiled regex engine, and (b) simplifying the algorithm to forgo regex matching for special tokens entirely.
Benchmarking code is included. Notable results:

- 4x faster tokenization of code samples on a single thread.
- 2-3x higher throughput on a 1 GB natural-language text file.
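To make the special-token idea concrete, here's a minimal sketch (not the actual implementation — the token name and id below are assumed for illustration): rather than folding special tokens into the main regex pattern, scan for them with plain substring search and only run the expensive regex+BPE path on the ordinary-text spans between them.

```python
# Hypothetical special-token table for illustration; a real tokenizer
# would load this from its vocabulary.
SPECIAL_TOKENS = {"<|endoftext|>": 100257}

def split_on_specials(text: str, specials=SPECIAL_TOKENS):
    """Yield (segment, token_id_or_None) pairs. Plain-text segments get
    None (to be tokenized later by the regex+BPE path); special tokens
    are emitted directly with their ids, skipping the regex entirely."""
    pos = 0
    while pos < len(text):
        # Find the earliest special token at or after pos via str.find,
        # which is a simple substring search, not a regex match.
        best = None
        for tok in specials:
            i = text.find(tok, pos)
            if i != -1 and (best is None or i < best[0]):
                best = (i, tok)
        if best is None:
            yield (text[pos:], None)  # no more specials; emit the rest
            return
        i, tok = best
        if i > pos:
            yield (text[pos:i], None)  # ordinary text before the special
        yield (tok, specials[tok])     # the special token itself
        pos = i + len(tok)

segments = list(split_on_specials("hello<|endoftext|>world"))
# segments == [("hello", None), ("<|endoftext|>", 100257), ("world", None)]
```

This keeps the hot regex loop free of special-token alternations, at the cost of one linear scan per special token per segment.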
During this process I also asked ChatGPT a lot of questions.
I'm definitely open to suggestions about "how to learn" with all the new tools we have; I haven't found it straightforward to figure out.