Previous Efforts
A couple weeks ago I wrote a post that examined performance of the C# and Go programming languages when parsing large XML documents. Why compare C# and Go? Because the goals of the languages are similar: to provide programmers with a statically typed, compiled syntax paired with a runtime that enforces memory safety and automatic garbage collection (reclaiming memory from unreferenced objects).
I compared the performance of my C# code with Go code written by Eli Bendersky in his Faster XML Stream Processing in Go blog post. I concluded my C# code ran slightly faster. Then I made a series of performance optimizations to my C# code that demonstrated fully managed C# code (with no P/Invoke calls into native C or C++ code) can be made to run really fast if the programmer employs appropriate data structures and algorithms.
My testing technique, however, was unfair. I did not use the same XML file as Eli (he did not provide it). And I compared the performance of Go code running on Eli’s PC with C# code running on my PC. My comparison measured software differences but did not account for hardware differences, as Eli noted in an email message to me and I acknowledged in a footnote in my blog post.
An Apples to Apples Comparison
So, let me correct the record. To create an apples to apples comparison, I’ll run a Go executable and a C# executable on my PC; I’ll ensure these two programs use similar XML parsing algorithms; and I’ll configure them to parse the same XML file. First, I’ll install the Go distribution for an AMD CPU on my Windows OS.
Next, I’ll modify Eli’s program to use the same XML parsing algorithm as my C# program. The example code Eli provided in his blog post counts occurrences of the value “Africa” in any XML element named <location>, regardless of its depth in the XML document tree. The <location> element could be located at the root, one level deep, two levels deep, and so on, following any sequence of preceding XML elements. My C# program doesn’t look for a specific XML element value; it simply counts XML elements that occur at a specified path in the XML document tree. So I made the following changes to Eli’s code:
- Read XPath expression from command line.
- Use an int variable and string array to track current depth and previous XML element names as the document is parsed.
- Count a match only if the current XML element and its predecessors match the XPath expression provided on the command line.
- Calculate duration from the line of code immediately preceding creation of an XML decoder to the line of code immediately following end of file.
Here’s the revised Go code:
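What follows is a minimal sketch of the four changes listed above, not a verbatim listing. It assumes the path argument is a plain slash-separated list of element names matched from the document root (as in "icjzlzuydtq/foo/bar/baz") rather than a full XPath expression. The names countMatches, samePath, and countInString are hypothetical, and a slice whose length doubles as the current depth stands in for the int variable and string array.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"os"
	"strings"
	"time"
)

// countMatches streams tokens from r and counts elements whose path from the
// root equals path, for example "root/foo/bar". The function name and the "/"
// separator are assumptions made for this sketch.
func countMatches(r io.Reader, path string) (int, error) {
	want := strings.Split(path, "/")
	var open []string // names of the currently open elements; len(open) is the depth
	count := 0

	decoder := xml.NewDecoder(r)
	for {
		token, err := decoder.Token()
		if err == io.EOF {
			return count, nil // end of file: parsing is complete
		}
		if err != nil {
			return count, err
		}
		switch t := token.(type) {
		case xml.StartElement:
			open = append(open, t.Name.Local)
			if samePath(open, want) {
				count++
			}
		case xml.EndElement:
			open = open[:len(open)-1] // leave the element, one level up
		}
	}
}

// samePath reports whether the open elements match the wanted path exactly.
func samePath(open, want []string) bool {
	if len(open) != len(want) {
		return false
	}
	for i := range want {
		if open[i] != want[i] {
			return false
		}
	}
	return true
}

// countInString is a hypothetical helper for trying the counter on a small
// in-memory document instead of a file.
func countInString(doc, path string) (int, error) {
	return countMatches(strings.NewReader(doc), path)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: parseXml <file.xml> <path/to/element>")
		return
	}
	file, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer file.Close()

	start := time.Now() // immediately before the decoder is created
	count, err := countMatches(file, os.Args[2])
	elapsed := time.Since(start) // immediately after end of file
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("XML parsing took %v\n", elapsed)
	fmt.Println("count =", count)
}
```

Because the decoder is driven one token at a time, memory use depends on the nesting depth of the document rather than on its size, which is what makes this a forward-only streaming approach.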
On Your Mark, Get Set, Go!
Let’s build and run the code.
PS C:\Users\Erik\Go\src\parseXml> go build
PS C:\Users\Erik\Go\src\parseXml> .\parseXml.exe "C:\Users\Erik\...\LargeDocument.xml" "icjzlzuydtq/foo/bar/baz"
XML parsing took 10.2763089s
count = 8
PS C:\Users\Erik\Go\src\parseXml> .\parseXml.exe "C:\Users\Erik\...\LargeDocument.xml" "icjzlzuydtq/foo/bar/baz"
XML parsing took 10.2284479s
count = 8
PS C:\Users\Erik\Go\src\parseXml> .\parseXml.exe "C:\Users\Erik\...\LargeDocument.xml" "icjzlzuydtq/foo/bar/baz"
XML parsing took 10.2084803s
count = 8
Unless I’ve made a serious mistake in my Go code (I’m not familiar with the language), this means my C# code outperforms equivalent Go code by a wider margin than I had originally thought. The three Go runs above average about 10.24 seconds. The streaming technique of using a forward-only cursor to parse XML tokens is almost twice as fast in C# as in Go. Here’s the updated performance comparison chart:
Updated Performance Summary
Technique | Duration (sec) | Speedup versus C# Baseline | Speedup versus Go Baseline |
---|---|---|---|
Go Encoding / XML Streaming | 10.24 | 1.34x | +1.00x |
.NET XmlDocument | 13.72 | 1.00x | -1.34x |
.NET XPathDocument | 13.67 | 1.00x | -1.33x |
.NET XmlReader | 5.58 | 2.46x | +1.84x |
.NET Char Parser | 1.79 | 7.66x | +5.72x |
.NET Char Parser (CoreRT RyuJIT) | 1.50 | 9.13x | +6.83x |
Positive (+) speedups indicate faster code. Negative (-) speedups indicate slower code.
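The signed figures in the Go column can be reproduced from the durations. The sketch below assumes the convention is baseline divided by duration when the code is faster, and the negated inverse ratio when it is slower. That convention is inferred from the table, and signedSpeedup is a hypothetical name. Because the durations shown are rounded, recomputed values may differ from the table in the last digit.

```go
package main

import "fmt"

// signedSpeedup compares a duration against a baseline, both in seconds.
// Faster-or-equal results return baseline/duration (>= 1); slower results
// return -(duration/baseline) (<= -1). This sign convention is an assumption
// inferred from the table.
func signedSpeedup(baseline, duration float64) float64 {
	if duration <= baseline {
		return baseline / duration
	}
	return -(duration / baseline)
}

func main() {
	const goBaseline = 10.24 // seconds, Go Encoding / XML Streaming row
	rows := []struct {
		name    string
		seconds float64
	}{
		{".NET XmlDocument", 13.72},
		{".NET XmlReader", 5.58},
		{".NET Char Parser", 1.79},
	}
	for _, row := range rows {
		// %+.2f prints an explicit sign, matching the Go baseline column.
		fmt.Printf("%-18s %+.2fx\n", row.name, signedSpeedup(goBaseline, row.seconds))
	}
}
```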
So yeah, C# is a fast programming language / managed runtime. Eli pointed out in his email message that C# and .NET have been around much longer than Go and have been optimized more for performance than Go. My testing confirms it.