Interest Piqued

Recently I read Eli Bendersky’s Faster XML Stream Processing in Go blog post. While the point of his post was to explain the difference between in-memory and stream parsing, and then to examine various stream-parsing techniques in the Go programming language… I got hung up on his choice of language.

Update 2019 Aug 17: I’ve run an apples to apples comparison test. See Updated Comparison of C# and Go Performance

Go is a relatively new programming language developed at Google. It’s statically typed and compiled, with a runtime that enforces memory safety and provides automatic garbage collection (reclaiming memory from unreferenced objects). Don’t we already have two mature languages / runtimes with these characteristics? Java and C#. Why use Go? Is there some fundamental deficiency in Java or C#? Eli’s blog post focuses on performance, so let’s examine the question from that perspective. I can’t speak for Java (not my forte), so I’ll ask the following question of C#: Is there a serious performance deficiency in C#, not present in Go, that makes Go a superior choice for projects whose teams prefer to work with a managed runtime?

My intuition says no, C# is not outperformed by Go. C# is very fast for a managed language. But let’s not rely on anecdotal evidence; let’s collect empirical data.

<tl;dr>Jump to the Summary section.</tl;dr>

Generating Test Data

Eli gives performance metrics for parsing a large (230 MB) XML file using various techniques. So our first assignment is to write code that generates large XML documents.
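
The core of the generator looks roughly like this. (The class name, element-name alphabet, name lengths, and flush interval below are my own illustrative choices, not the exact code behind the timings in this post.)

```csharp
// Sketch of a large-XML-document generator: write random elements
// until the file reaches the requested size.
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlGenerator
{
    private const string Alphabet = "abcdefghijklmnopqrstuvwxyz";

    // Writes a well-formed XML document of roughly targetMegabytes to path.
    public static void Generate(string path, int targetMegabytes)
    {
        long targetBytes = targetMegabytes * 1024L * 1024L;
        var random = new Random();
        using var stream = new FileStream(path, FileMode.Create);
        using var writer = XmlWriter.Create(stream, new XmlWriterSettings { Indent = true });
        writer.WriteStartElement(RandomName(random)); // document root
        long elements = 0;
        while (stream.Position < targetBytes)
        {
            writer.WriteElementString(RandomName(random), RandomName(random));
            if (++elements % 1000 == 0) writer.Flush(); // update stream.Position periodically
        }
        writer.WriteEndElement();
    }

    private static string RandomName(Random random)
    {
        var name = new StringBuilder();
        int length = random.Next(3, 12);
        for (int i = 0; i < length; i++)
            name.Append(Alphabet[random.Next(Alphabet.Length)]);
        return name.ToString();
    }
}
```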

PS C:\Users\Erik\...\Sandbox\Xml Parser> dotnet run -c release -- -o "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -s 230
Creating file... done.
Creation of 230 MB file took 5.405 seconds.

After generating the file, I manually inserted a few occurrences of the following sequence of XML elements under the document’s <icjzlzuydtq> root element:

<foo><bar><baz>Testing</baz></bar></foo>

Next, let’s write boilerplate code that reads command-line arguments, generates a file, and parses the file using various techniques.
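
The argument-parsing boilerplate isn’t interesting, but for completeness it amounts to something like this. (The switch names mirror the command lines shown in this post; the structure is an illustrative sketch, not my exact code.)

```csharp
// Minimal "-key value" command-line parser, e.g. -i file.xml -x /a/b/c -p char.
using System;
using System.Collections.Generic;

public static class CommandLine
{
    public static Dictionary<string, string> Parse(string[] args)
    {
        var options = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        // Walk the arguments in pairs: a "-key" switch followed by its value.
        for (int i = 0; i + 1 < args.Length; i += 2)
            options[args[i].TrimStart('-')] = args[i + 1];
        return options;
    }
}
```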

Go and C# Baseline Performance

Eli gives a baseline of 6.24 seconds to parse a 230 MB XML file using streaming APIs in the Go standard library’s encoding/xml package. Let’s see if we can do better. We’ll start with what I call a naive solution, XmlDocument, a class provided by Microsoft in the .NET Base Class Library. We’ll load the entire file into memory, then search for the above sequence of XML elements. We know this won’t perform as well as streaming techniques, but the code is simple to write, so let’s write it anyhow. We’ll use this algorithm to establish a baseline .NET performance metric.
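
The naive approach amounts to something like this. (The wrapper class and method names are illustrative, not my exact code.)

```csharp
// In-memory approach: load the whole file into a DOM, then query it with XPath.
using System.Xml;

public static class XmlDocumentParser
{
    public static int CountNodes(string path, string xpath)
    {
        var document = new XmlDocument();
        document.Load(path); // reads and materializes the entire document in memory
        XmlNodeList nodes = document.SelectNodes(xpath);
        return nodes.Count;
    }
}
```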

PS> dotnet run -c release -- -i "C:\...\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xmldocument
Found 8 nodes.
Parsing of 230 MB file took 14.766 seconds.

PS> dotnet run -c release -- -i "C:\...\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xmldocument
Found 8 nodes.
Parsing of 230 MB file took 13.156 seconds.

PS> dotnet run -c release -- -i "C:\...\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xmldocument
Found 8 nodes.
Parsing of 230 MB file took 13.231 seconds.

Eh, that’s not great. Let’s see if XPathDocument, also found in the .NET Base Class Library, performs better.
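
XPathDocument builds a read-only tree optimized for XPath queries. The code is equally simple (again, an illustrative sketch):

```csharp
// In-memory approach via the XPath-optimized, read-only XPathDocument tree.
using System.Xml.XPath;

public static class XPathDocumentParser
{
    public static int CountNodes(string path, string xpath)
    {
        var document = new XPathDocument(path); // still loads the whole file
        XPathNavigator navigator = document.CreateNavigator();
        XPathNodeIterator nodes = navigator.Select(xpath);
        return nodes.Count;
    }
}
```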

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xpathdocument
Found 8 nodes.
Parsing of 230 MB file took 13.167 seconds.

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xpathdocument
Found 8 nodes.
Parsing of 230 MB file took 13.177 seconds.

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xpathdocument
Found 8 nodes.
Parsing of 230 MB file took 14.671 seconds.

No better.

.NET Streaming API

Let’s try a streaming API from the .NET Base Class Library, XmlReader.
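
XmlReader streams through the document one node at a time, so we track the path of open elements ourselves and count matches. A simplified sketch (my actual code differs in details, and this version ignores namespaces):

```csharp
// Streaming approach: read forward-only, track the stack of open elements,
// and count elements whose full path matches the target, e.g. /a/b/c.
using System.Collections.Generic;
using System.Xml;

public static class XmlReaderParser
{
    public static int CountNodes(string path, string xpath)
    {
        string[] target = xpath.Trim('/').Split('/');
        var openElements = new Stack<string>();
        int found = 0;
        using var reader = XmlReader.Create(path);
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                openElements.Push(reader.LocalName);
                if (PathMatches(openElements, target)) found++;
                if (reader.IsEmptyElement) openElements.Pop(); // <x/> raises no EndElement
            }
            else if (reader.NodeType == XmlNodeType.EndElement)
            {
                openElements.Pop();
            }
        }
        return found;
    }

    private static bool PathMatches(Stack<string> openElements, string[] target)
    {
        if (openElements.Count != target.Length) return false;
        int index = target.Length - 1; // Stack<T> enumerates from most recently pushed
        foreach (string name in openElements)
            if (name != target[index--]) return false;
        return true;
    }
}
```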

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xmlreader
Found 8 nodes.
Parsing of 230 MB file took 5.626 seconds.

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xmlreader
Found 8 nodes.
Parsing of 230 MB file took 5.512 seconds.

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p xmlreader
Found 8 nodes.
Parsing of 230 MB file took 5.605 seconds.

Much better! Our C# XmlReader code is slightly faster than Eli’s Go code (1). This is quite impressive considering Go compiles to machine code, whereas C# compiles to Intermediate Language (IL) that must be JIT-compiled to machine code at runtime. But I believe we can do even better if we write our own code.

Invoking Unmanaged Code

Eli also wrote his own code. He wrote Go code that calls libxml, a library written in C. This program ran in 4.03 seconds. He wrote Python code that calls libxml, which ran in 3.7 seconds. And he wrote C code that calls libxml (no transition from managed to native runtime), which ran in 0.56 seconds. Wow! However, none of these techniques measure the performance of Go. Nor do they address the premise of my question, which is whether Go is a better choice (in terms of performance) for teams who prefer to work with a managed runtime. Presumably, these teams prefer a managed runtime because they don’t want to accept the burden of manual memory management (malloc and free, RAII, marshalling, etc) imposed by low-level languages like C and C++. I can accomplish the same in C# using its P/Invoke (Platform Invoke) feature to call into C or C++ libraries, but that’s not what I intend to investigate in this blog post.


.NET Zero-Allocation Custom Code

OK, back to our investigation of the performance of C# code: I think we can do even better if we write our own code as opposed to using classes from Microsoft’s Base Class Library. Let’s read a character at a time from the file stream into a buffer, careful to keep memory allocations to a minimum. We’ll allocate data structures at program startup, then allocate nothing as we read through the file stream. (My experience with chess programming has taught me to avoid allocating memory inside tight loops, such as a chess engine’s search function.) To be more precise, while reading characters from the file stream, we’ll allocate primitive value-types on the stack but allocate no reference-types on the heap. (See the discussion of value types versus reference types in Types : C# Programming Guide.) This should improve performance.
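
Here’s a simplified sketch of the idea. (My actual parser handles more of the XML grammar; this version ignores comments, CDATA sections, namespaces, and ‘>’ inside quoted attribute values, and its fixed-size buffers are my own illustrative choices.) All buffers are allocated once in the fields; the read loop allocates nothing on the heap.

```csharp
// Zero-allocation streaming sketch: read one character at a time,
// track element depth, and match element names against the target path
// without allocating any reference types inside the read loop.
using System;
using System.IO;

public sealed class CharParser
{
    private readonly char[] _name = new char[256];   // preallocated element-name buffer
    private readonly bool[] _matched = new bool[64]; // per-depth "path matches so far" flags

    public int CountNodes(string path, string xpath)
    {
        string[] target = xpath.Trim('/').Split('/'); // one-time allocation, outside the loop
        int depth = 0, found = 0;
        using var reader = new StreamReader(path);
        int c;
        while ((c = reader.Read()) >= 0)
        {
            if (c != '<') continue;
            c = reader.Read();
            if (c == '/') { depth--; SkipTo(reader, '>'); }       // end tag </name>
            else if (c == '?' || c == '!') SkipTo(reader, '>');   // declaration, DOCTYPE
            else
            {
                // Start tag: accumulate the element name into the preallocated buffer.
                int nameLength = 0;
                while (c >= 0 && c != '>' && c != '/' && !char.IsWhiteSpace((char)c))
                {
                    _name[nameLength++] = (char)c;
                    c = reader.Read();
                }
                bool parentMatched = depth == 0 || _matched[depth - 1];
                _matched[depth] = parentMatched && depth < target.Length
                    && NameEquals(target[depth], nameLength);
                if (_matched[depth] && depth == target.Length - 1) found++;
                depth++;
                // Consume the rest of the tag; "/>" closes the element immediately.
                int previous = 0;
                while (c >= 0 && c != '>') { previous = c; c = reader.Read(); }
                if (previous == '/') depth--; // self-closing tag <name/>
            }
        }
        return found;
    }

    private static void SkipTo(StreamReader reader, char terminator)
    {
        int c;
        while ((c = reader.Read()) >= 0 && c != terminator) { }
    }

    private bool NameEquals(string target, int nameLength)
    {
        if (target.Length != nameLength) return false;
        for (int i = 0; i < nameLength; i++)
            if (_name[i] != target[i]) return false;
        return true;
    }
}
```

StreamReader.Read() buffers internally, so reading a character at a time doesn’t mean one I/O operation per character.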

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p char
Found 8 nodes.
Parsing of 230 MB file took 1.805 seconds.

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p char
Found 8 nodes.
Parsing of 230 MB file took 1.791 seconds.

PS> dotnet run -c release -- -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p char
Found 8 nodes.
Parsing of 230 MB file took 1.761 seconds.

The zero-allocation char-buffered C# code is fast!

CoreRT

Now let’s put the pedal to the metal. Let’s use Microsoft’s CoreRT runtime. CoreRT is still in alpha (early) development. To quote the project description from its GitHub repository, CoreRT is “a .NET Core runtime optimized for AOT (Ahead of Time compilation) scenarios, with the accompanying compiler toolchain.” What does that mean? It means we can now compile C# directly to machine code instead of Intermediate Language (IL). The resulting .exe file executes CPU instructions directly instead of relying on a Just In Time (JIT) compiler to transform IL into machine code as it’s encountered.

Let’s compile our code using CoreRT and RyuJIT. RyuJIT is the JIT compiler used by the .NET Framework and .NET Core; here, though, we’re using it to compile our code ahead of time.

Install CoreRT (instructions) and add the following import statements and properties to the .csproj project file.

<Import Project="$(MSBuildSDKsPath)\Microsoft.NET.Sdk\Sdk\Sdk.props" />
<Import Project="$(MSBuildSDKsPath)\Microsoft.NET.Sdk\Sdk\Sdk.targets" />
<Import Project="$(IlcPath)\build\Microsoft.NETCore.Native.targets" />
<PropertyGroup>
    <IlcInvariantGlobalization>true</IlcInvariantGlobalization>
    <RootAllApplicationAssemblies>false</RootAllApplicationAssemblies>
    <IlcOptimizationPreference>Speed</IlcOptimizationPreference>
    <IlcGenerateCompleteTypeMetadata>false</IlcGenerateCompleteTypeMetadata>
    <IlcGenerateStackTraceData>false</IlcGenerateStackTraceData>
    <IlcFoldIdenticalMethodBodies>true</IlcFoldIdenticalMethodBodies>
</PropertyGroup>

Compile the code.

C:\Users\Erik\Documents\Visual Studio 2019\Projects\Sandbox\Xml Parser>dotnet publish -c release -r win10-x64
Microsoft (R) Build Engine version 15.9.20+g88f5fadfbe for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Xml Parser -> C:\Users\Erik\...\win10-x64\ErikTheCoder.Sandbox.XmlParser.dll
  Generating compatible native code. To optimize for size or speed, visit https://aka.ms/OptimizeCoreRT
  Xml Parser -> C:\Users\Erik\...\win10-x64\publish\

The above command produces a native .exe for a Windows 10 64-bit PC. Run the program.

PS> .\...XmlParser.exe -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p char
Found 8 nodes.
Parsing of 230 MB file took 1.503 seconds.

PS> .\...XmlParser.exe -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p char
Found 8 nodes.
Parsing of 230 MB file took 1.502 seconds.

PS> .\...XmlParser.exe -i "C:\Users\Erik\Documents\Temp\LargeDocument.xml" -x "/icjzlzuydtq/foo/bar/baz" -p char
Found 8 nodes.
Parsing of 230 MB file took 1.497 seconds.

The AOT-compiled program is faster than our JIT-compiled program.

I tried compiling the code with CoreRT and C++ optimizations. To quote the CoreRT documentation, “This approach uses a transpiler to convert IL to C++, and then uses a platform-specific C++ compiler and linker for compiling / linking the application.” However, I ran into numerous errors, and even after I’d resolved them, the resulting .exe was 200 milliseconds slower. Clearly, the CoreRT / C++ transpiler is not ready for prime time.

Performance Summary

Update 2019 Aug 17: I’ve run an apples to apples comparison test. See Updated Comparison of C# and Go Performance

Technique                           Duration (sec)   Speedup versus C# Baseline   Speedup versus Go Baseline
Go encoding/xml Streaming                 6.24                 +2.20x                       1.00x
.NET XmlDocument                         13.72                  1.00x                      -2.20x
.NET XPathDocument                       13.67                  1.00x                      -2.20x
.NET XmlReader                            5.58                 +2.46x                      +1.12x
.NET Char Parser                          1.79                 +7.66x                      +3.49x
.NET Char Parser (CoreRT RyuJIT)          1.50                 +9.13x                      +4.16x

Positive (+) speedups indicate faster code. Negative (-) speedups indicate slower code.

C# Advocacy

So yeah, C# is a fast programming language / managed runtime. It’s also mature, with many advanced features for code conciseness and performance (generics, tuples, LINQ, expression bodies, pattern matching, async / await, ref locals, Span&lt;T&gt;, unsafe pointers, P/Invoke). It provides sophisticated frameworks for web and desktop application development (web via ASP.NET Core MVC and WebAPI; desktop via Windows Forms, WPF, and UWP). And it’s open-source, cross-platform, and supported by a large community of developers.

Conclusion

Of course all of us in the software development profession should support the research and development of new programming languages, runtimes, frameworks, etc. We all need to push the state-of-the-art forward. However, let’s not lose track of what we already have in the world of managed code. The existing programming languages, runtimes, and frameworks set an extraordinarily high bar.

You may review the full source code in the XML Parser folder of my Sandbox project in GitHub.


(1) Though of course I’m not accounting for unmeasured hardware differences between our PCs. See My Chess PC on my chess programming blog for detailed specs.
