Previous Efforts

A couple of weeks ago I wrote a post that examined the performance of the C# and Go programming languages when parsing large XML documents. Why compare C# and Go? Because the languages have similar goals: to provide programmers with a statically typed, compiled language paired with a runtime that enforces memory safety and performs automatic garbage collection (reclaiming memory from unreferenced objects).

I compared the performance of my C# code with Go code written by Eli Bendersky in his Faster XML Stream Processing in Go blog post and concluded that my C# code ran slightly faster. Then I made a series of performance optimizations to my C# code, demonstrating that fully managed C# code (with no P/Invoke calls into native C or C++ code) can run very fast if the programmer employs appropriate data structures and algorithms.

My testing technique, however, was unfair. I did not use the same XML file as Eli (he did not provide it). And I compared the performance of Go code running on Eli’s PC with C# code running on my PC. My comparison measured software differences but did not account for hardware differences, as Eli noted in an email message to me and I acknowledged in a footnote in my blog post.

An Apples to Apples Comparison

So, let me correct the record. To create an apples-to-apples comparison, I’ll run a Go executable and a C# executable on my PC, ensure the two programs use similar XML parsing algorithms, and configure them to parse the same XML file. First, I’ll install the Go distribution for Windows on 64-bit x86 (amd64).

Next, I’ll modify Eli’s program to use the same XML parsing algorithm as my C# program. The example code Eli provided in his blog post counts occurrences of the value “Africa” in any XML element named <location>, regardless of its depth in the XML document tree. The <location> element could be located at the root, one level deep, two levels deep, and so on, below any sequence of ancestor elements. My C# program doesn’t look for a specific XML element value; it simply counts the XML elements that occur at a specified path in the XML document tree. So I made the following changes to Eli’s code:

  1. Read an XPath expression from the command line.
  2. Use an int variable and a string array to track the current depth and the names of the enclosing (ancestor) XML elements as the document is parsed.
  3. Count a match only if the current XML element and its ancestors match the XPath expression provided on the command line.
  4. Measure the duration from the line of code immediately preceding creation of the XML decoder to the line of code immediately following detection of end of file.

Here’s the revised Go code:

On Your Mark, Get Set, Go!

Let’s build and run the code.

PS C:\Users\Erik\Go\src\parseXml> go build
PS C:\Users\Erik\Go\src\parseXml> .\parseXml.exe "C:\Users\Erik\...\LargeDocument.xml" "icjzlzuydtq/foo/bar/baz"
XML parsing took  10.2763089s
count = 8

PS C:\Users\Erik\Go\src\parseXml> .\parseXml.exe "C:\Users\Erik\...\LargeDocument.xml" "icjzlzuydtq/foo/bar/baz"
XML parsing took  10.2284479s
count = 8

PS C:\Users\Erik\Go\src\parseXml> .\parseXml.exe "C:\Users\Erik\...\LargeDocument.xml" "icjzlzuydtq/foo/bar/baz"
XML parsing took  10.2084803s
count = 8

Unless I’ve made a serious mistake in my Go code (a language I’m not familiar with), this means my C# code outperforms equivalent Go code by an even wider margin than I originally thought. The streaming technique of using a forward-only cursor to parse XML tokens is almost twice as fast in C# (XmlReader, 5.58 s) as in Go (10.24 s). Here’s the updated performance comparison chart:

Updated Performance Summary

Technique                         Duration (sec)   Speedup versus C# Baseline   Speedup versus Go Baseline
Go encoding/xml streaming         10.24            +1.34x                       +1.00x
.NET XmlDocument                  13.72            1.00x                        -1.34x
.NET XPathDocument                13.67            1.00x                        -1.33x
.NET XmlReader                    5.58             +2.46x                       +1.84x
.NET Char Parser                  1.79             +7.66x                       +5.72x
.NET Char Parser (CoreRT RyuJIT)  1.50             +9.13x                       +6.83x

Positive (+) speedups indicate faster code. Negative (-) speedups indicate slower code.
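The speedup columns appear to be plain ratios of durations, with the sign flipped when a technique is slower than its baseline. Assuming that convention (the helper name signedSpeedup is hypothetical), the table can be recomputed from its own durations:

```go
package main

import "fmt"

// signedSpeedup (hypothetical helper) compares a run against a baseline,
// both in seconds. +k means k times faster than the baseline; -k means
// k times slower. This sign convention is assumed from the table's legend.
func signedSpeedup(baseline, duration float64) float64 {
	if duration <= baseline {
		return baseline / duration
	}
	return -(duration / baseline)
}

func main() {
	const csBaseline = 13.72 // .NET XmlDocument, seconds
	const goBaseline = 10.24 // Go encoding/xml streaming, seconds
	rows := []struct {
		name string
		secs float64
	}{
		{"Go encoding/xml streaming", 10.24},
		{".NET XmlDocument", 13.72},
		{".NET XPathDocument", 13.67},
		{".NET XmlReader", 5.58},
		{".NET Char Parser", 1.79},
		{".NET Char Parser (CoreRT RyuJIT)", 1.50},
	}
	// Print each technique with its signed speedup against both baselines.
	for _, r := range rows {
		fmt.Printf("%-34s %6.2f  %+.2fx  %+.2fx\n", r.name, r.secs,
			signedSpeedup(csBaseline, r.secs), signedSpeedup(goBaseline, r.secs))
	}
}
```

Recomputing from the two-decimal durations reproduces the table to within rounding. For example, 13.72 / 5.58 ≈ 2.46 for XmlReader, while the CoreRT row comes out near +9.15x rather than +9.13x, presumably because the unrounded duration was slightly above 1.50 seconds.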

So yeah, C# is a fast programming language, and .NET is a fast managed runtime. Eli pointed out in his email message that C# and .NET have been around much longer than Go and have received more performance optimization. My results are consistent with that.
