How fast can you parse a CSV file in C#?

Posted by greghn 10/27/2024

How fast can you parse a CSV file in C#?(lemire.me)

25 points | 11 comments

unsnap_biceps 10/27/2024|

> I am loading the file from disk. However, my disk has bandwidth higher than 1.2 GB/s, and the file is small enough to end up in cache. Thus we are already limited by the processor. And we are not yet doing any parsing!

I really wish the author would show some proof for claims like this. io is a complicated beast and it's not certain that the subsystem could go faster with the patterns the code is doing or not. Plus there's no indication of what else is going on simultaneously on the system, so perhaps a background job spin up and the disk is io bound. We don't know and it appears that neither does the author.

kevinskii 10/27/2024||

Not every blog post written in the author’s free time needs to be manuscript-worthy. Informal benchmarks can be often really helpful in solving real world problems where it’s not worthwhile to try to squeeze out every last nanosecond of performance.

unsnap_biceps 10/27/2024||

As I started my comment out with, I wish, not demand, not expect, just wish there was a little more technical meat on an article written by an absolute expert in the performance computing field. This is the author's day job. They've written a number of books on high performance and is currently an professor teaching high performance.

That said, I apologize if I came off more critical than I intended.

spease 10/27/2024|||

Worked on a relevant project for Meta.

There’s a lot of overhead as soon as you involve a filesystem rather than a block device, even on a dedicated disk, particularly with btrfs. I don’t know if the same is true with MacOS and APFS; this isn’t the area I usually work in. However copy-on-write file systems (which I believe apfs is) are somewhat predisposed to fragment files as part of the dedup process; I don’t know if apfs runs it online in some way so it could have affected the article’s author’s results.

The standard library implementation details can also have a huge impact, eg I observed with Rust for a prior project when I started fiddling with the read buffer size:

https://github.com/rust-lang/rust/issues/49921

The other issue that I see is that their I/O is implicitly synchronous and requires a memory copy. They might see better performance if they can memmap the file, which can probably solve both issues. Then if C# allows it, they can just parse the CSV in-place; with a language like Rust, you can even trivially do this in a zero-copy manner, though I suspect it’s more involved with C# since this requires setting up strings / parsing that point at the memmaped file.

At that point, the OS should be theoretically able to serve up the cached file for the application to do some logic with, without ever needing to copy the full contents again into separate strings.

neonsunset 10/27/2024||

C# has an abstraction for memory-mapped files. You can always use raw pointers and directly call the corresponding OS APIs with interop too.

However, the fastest-performing implementations in 1BRC challenge that were written in C# ended up with inconclusive results whether using memory-mapping over RandomAccess.Read API (which is basically a thin wrapper over read/pread calls) is faster or not: https://github.com/noahfalk/1brc/?tab=readme-ov-file#file-re...

You can relatively easily do 2 GiB/s reads with RandomAccess/FileStream as long as sufficiently large buffer size is used. FileStream default settings already provide a quite good performance, and make it use adaptive buffer size under the hood. Memory-mapping is convenient but it's not a silver bullet (in this context) and page-faulting then mapping the page and filling it with data by performing the read within kernel space is not necessarily cheaper than passing a pointer to a buffer to read into.

The challenges in Rust and C# are going to be very similar in this type of task since C# can just pin the GC-allocated arrays to read into, call into malloc or 'stackalloc' the temporary buffer inline, and the rest of implementation will be subject to more or less identical constraints. C# is probably the closest* "high-level" language in feature set to Rust, even if this sounds strange. There's a sibling submission that covers an another angle to this: https://news.ycombinator.com/item?id=41963259

* have not looked through Swift 6 changes in detail yet

theamk 10/27/2024||

It's really not that hard, I think you are making this trivial benchmarking task sound complicated.

The link to source code is there. It uses BenchmarkRunner class, which handles warmup and multiple runs. I am assuming author ensured that stddev was small enough that the raw numbers is valid. And with 11MB file size, it will certainly be cached between runs - even if something else evicts the cache, it will show up as high stddev and then presumably author would rr-run it again on quiter systems.

mattewong 10/28/2024||

Haven't yet seen any of these beat https://github.com/liquidaty/zsv (of which I'm an author) when real-world constraints are applied (e.g. we no longer assume that line ends are always \n, or that there are no dbl-quote chars, embedded commas/newlines/dbl-quotes). And maybe under the artificial conditions as well.

kristianp 10/28/2024||

Lemire himself also worked on a C++ CSV parser with Geoff Langdale that uses SIMD to accelerate the parsing. [1].

Also the fastest library in TFAs benchmark is Sep [2], which makes use of SIMD.

[1] https://github.com/geofflangdale/simdcsv

[2] https://github.com/nietras/Sep

MeteorMarc 10/27/2024|

The author does not specify whether the tested libraries contain native code, cq. c. This obscures how much improvement is possible.

pjmlp 10/27/2024||

Contrary to Python, .NET can do plenty on its JIT and AOT compilers, and low level features used to support C++/CLI on the runtime.

neonsunset 10/27/2024||

All tested libraries here are pure C#. You simply do not need "native" components* for tasks like these in C#, since JIT/ILC is more than capable at producing performant "native" code.

* for all intents and purposes that's what C# compiles to under NAOT. Even with JIT there is no interpreter stage, CIL is always compiled to native code.