I hope you and your loved ones are doing as well as can be expected in these difficult times. I feel very fortunate to be gainfully employed during the Coronavirus pandemic.
Introduction
I ran into an interesting coding issue at work I’d like to discuss here. It involves a subtle object reference inequality bug in my implementation of the Record Pattern. At least that’s what we call the pattern at work. To discuss the issue here I’ll change the names of classes to avoid revealing any intellectual property and simplify the code to the minimum necessary to demonstrate the issue and the solution.
Before I discuss the issue I should discuss the Record Pattern. The Record Pattern is an OO (Object Oriented) technique for saving and loading data inside domain classes to / from long-term storage such as a file system or database. We use it primarily to serialize and de-serialize an object graph (objects that refer to other objects that in turn refer to other objects, etc) to / from JSON for storage in a database nvarchar(max) column. Usually we persist domain classes to normalized database tables, rows, and columns. This of course allows fast lookup of data via SQL queries against indexed tables. However, sometimes we persist domain classes to JSON in a nvarchar(max) column when we’re certain we won’t need to query the domain classes by their properties. Meaning, we know we’ll only load the root domain class given its ID.
Why use the Record Pattern? Why not simply persist and restore a domain class directly via the following code?
string json = JsonConvert.SerializeObject(domainClassInstance);
Because of property getter and setter side effects. If the domain class is serialized instead of its record, the act of writing a property (calling setter) or reading a property (calling getter) may trigger code in the property body that mutates (alters) data elsewhere in the object graph. Consequently, the data persisted to JSON may not reflect the corresponding object graph in memory. Conversely, after de-serialization, the object graph in memory may not reflect the corresponding data read from JSON.
The Record Pattern avoids potential data corruption that may occur when a serializer invokes domain class properties.
Side Effects
OK, that’s a bit of hand-waving on my part. I’m attempting to establish our motivation to use the Record Pattern: it avoids data corruption issues caused by property getter and setter side effects. The engineering tradeoff here is accepting increased complexity in OO class design (writing a record class for each domain class) in exchange for a corruption-free, full-fidelity data-persistence cycle of serialization and de-serialization. In order to proceed I’m sure you need more than hand-waving. Fair enough. I’ll write code that demonstrates domain class side effects due to logic embedded in a property setter. Then I’ll demonstrate bugs caused by the side effect.
I’m not going to duplicate my work code here. I need a domain for example code. Well, it’s spring and due to the Coronavirus pandemic I’m missing baseball. Let’s model a baseball team. First, let’s write some interfaces.
In the above code I define interfaces for a team, team member (player or coach) and a repository (another OO design pattern). My employer uses a variation of the Repo Pattern where the Repo implements Create and Read methods and domain classes implement Update and Delete methods. Once a class instance is “handed out” by the repo, so to speak, the instance is responsible for its own data persistence. Other companies may choose to implement all CRUD operations in the Repo on the principle that domain classes should remain ignorant of how they’re persisted. I’ll write my example code using the Repo Pattern we use at work.
Let’s write code that implements these interfaces.
Did you spot the side effect?
In the TeamMemberBase class, increasing a player’s salary triggers code that determines if the salary increase puts the team over the league-mandated team salary cap. If so, it then reduces salaries of the lowest paid members of the team until the team’s total salary is at or under the salary cap.
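For readers who want to see the shape of such a setter, here is a minimal sketch. It is not the Baseball Library source: the Team and TeamMember classes, their members, and the exact trimming rule are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model. Names and structure are assumptions, not the post's code.
public class Team
{
    public decimal SalaryCap { get; set; }
    public List<TeamMember> Members { get; } = new List<TeamMember>();
    public decimal TotalSalary => Members.Sum(m => m.Salary);
}

public class TeamMember
{
    private decimal _salary;

    public TeamMember(string name, Team team, decimal salary)
    {
        Name = name;
        Team = team;
        _salary = salary;          // Assign the field directly: no cap logic during construction.
        team.Members.Add(this);
    }

    public string Name { get; }
    public Team Team { get; }

    public decimal Salary
    {
        get => _salary;
        set
        {
            _salary = value;

            // Side effect: setting one member's salary may rewrite other members' salaries.
            decimal excess = Team.TotalSalary - Team.SalaryCap;
            foreach (TeamMember other in Team.Members.Where(m => m != this).OrderBy(m => m._salary).ToList())
            {
                if (excess <= 0) break;
                decimal cut = Math.Min(excess, other._salary);
                other._salary -= cut;   // Write the field so this setter is not re-entered.
                excess -= cut;
            }
        }
    }
}
```

With a cap of 10 and salaries of 5, 3, and 1, setting the first salary to 8 leaves the other two at 2 and 0. The caller assigned one property and three values changed.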
A contrived example? Sure. Please suspend your disbelief for a moment and imagine each team member hired an incompetent agent that overlooked fine print stating “team management reserves the right to decrease pay at any moment, including removing all pay ($0 salary).”
Now let’s instantiate class instances to represent the 1984 Chicago Cubs and print their (totally fictional) salaries.
PS C:\Users\Erik\...\Baseball App> dotnet run -c release -- testsalaries
Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000
The team’s total salary is $44 million, which is under the $50 million cap. Let’s give Ryne Sandberg a raise for his MVP performance. Let’s also raise Jody Davis’ salary for being King of Wrigley Field. Refer to the TestSalaryUpdate_23_7() method in the Program.cs source code above. These raises will bump the team salary over the cap.
PS C:\Users\Erik\...\Sandbox\Baseball App> dotnet run -c release -- testsalaryupdate_23_7
Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000

Increase Sandberg's salary by $7,000,000
Increase Davis' salary by $2,000,000

Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Ryne Sandberg                $11,000,000
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Billy Connors                 $2,000,000
Jody Davis                    $2,000,000
Don Zimmer                            $0
========================================
Total                        $50,000,000
What happened? Well, the team now is at the league-mandated $50 million salary cap. Unfortunately, coach Don Zimmer had to sacrifice his salary to pay Sandberg and Davis. And Jody Davis didn’t receive his full raise. He was cheated out of $1 million. What happens if we change the order in which we increase salaries? Let’s increase Jody Davis’ salary first, then increase Ryne Sandberg’s salary second. Refer to the TestSalaryUpdate_7_23() method in the Program.cs source code above.
PS C:\Users\Erik\Documents\Visual Studio 2019\Projects\Sandbox\Baseball App> dotnet run -c release -- testsalaryupdate_7_23
Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000

Increase Davis' salary by $2,000,000
Increase Sandberg's salary by $7,000,000

Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Ryne Sandberg                $11,000,000
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Jody Davis                    $3,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Billy Connors                 $1,000,000
Don Zimmer                            $0
========================================
Total                        $50,000,000
The team still is at the league-mandated $50 million salary cap. Sandberg still receives an $11 million salary. However, Jody Davis now receives his expected $3 million salary. And coaches Billy Connors and Don Zimmer had to sacrifice their salary to pay Sandberg and Davis. So the order in which we adjust player salaries affects the final result. Yikes. A different order of operations produces a different result. This illustrates a nasty bug caused by placing team salary cap logic in the team member salary property. That design choice breaks the commutative nature of setting properties. Suddenly the order in which properties are set matters. The bug is quite insidious. You may not have discovered it unless specifically testing the condition where a salary increase causes the lowest paid team member to “leapfrog” over other poorly paid team members. Also note it’s quite arbitrary that, in the second case, Don Zimmer lost all of his salary and Billy Connors lost only $1 million, considering both started with the same $2 million salary. Coach Zimmer was the victim of Microsoft’s implementation of a sorting algorithm.
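The order dependence can be reproduced with a small stand-alone sketch. It is not the Baseball Library code: the Member class, CapDemo.Raise, and the trimming rule (cut the lowest-paid other members first, with ties resolved by roster order because LINQ’s OrderBy is a stable sort) are assumptions chosen to match the printed results above. Salaries are in millions of dollars.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Member
{
    public string Name = "";
    public decimal Salary;   // In millions of dollars.
}

// Hypothetical helper, for illustration only.
public static class CapDemo
{
    public const decimal Cap = 50m;

    public static List<Member> NewRoster() => new List<Member>
    {
        new Member { Name = "Rick Sutcliffe", Salary = 8 },
        new Member { Name = "Leon Durham",    Salary = 6 },
        new Member { Name = "Jim Frey",       Salary = 4 },
        new Member { Name = "Ryne Sandberg",  Salary = 4 },
        new Member { Name = "Larry Bowa",     Salary = 4 },
        new Member { Name = "Gary Matthews",  Salary = 4 },
        new Member { Name = "Ron Cey",        Salary = 3 },
        new Member { Name = "Bobby Dernier",  Salary = 3 },
        new Member { Name = "Keith Moreland", Salary = 3 },
        new Member { Name = "Don Zimmer",     Salary = 2 },
        new Member { Name = "Billy Connors",  Salary = 2 },
        new Member { Name = "Jody Davis",     Salary = 1 },
    };

    // Raises one salary, then trims the lowest-paid other members until the total fits the cap.
    public static void Raise(List<Member> roster, string name, decimal amount)
    {
        Member target = roster.Single(m => m.Name == name);
        target.Salary += amount;

        decimal excess = roster.Sum(m => m.Salary) - Cap;
        foreach (Member other in roster.Where(m => m != target).OrderBy(m => m.Salary).ToList())
        {
            if (excess <= 0) break;
            decimal cut = Math.Min(excess, other.Salary);
            other.Salary -= cut;
            excess -= cut;
        }
    }

    public static decimal SalaryOf(List<Member> roster, string name) =>
        roster.Single(m => m.Name == name).Salary;
}
```

Raising Sandberg by 7 and then Davis by 2 leaves Davis at 2, Connors at 2, and Zimmer at 0. Applying the same two raises in the opposite order on a fresh roster leaves Davis at 3, Connors at 1, and Zimmer at 0. Same inputs, different order, different data.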
Do you see where I’m going with this? What is de-serialization? It’s reading a sequence of bytes from storage (or a network stream) and recreating an object graph from those bytes by setting class instance properties. In what order does the serializer set those properties? “Oh crap, I don’t know,” you may say. “I didn’t write that code. We use Newtonsoft’s Json.NET package.” Because Json.NET is open source, we can read its source code and determine the order in which properties are set. Yeah, we could but… Do you see how a design choice has increased the mental burden on you and your humble blogger when reasoning about the code?
Of course the simplest way to eliminate side effects is not to write them (in property getters and setters) in the first place. However, that may not be feasible. You may have inherited an older codebase, you may need to implement complex validation logic, you may need to implement lazy loading, etc. Before I show you how the Record Pattern avoids unwanted side effects during serialization, let me first demonstrate how direct serialization of domain classes does trigger side effects.
Serialization Triggering a Side Effect
Let’s examine the property setter’s impact on the data-persistence cycle of serialization and de-serialization. We’ll instantiate class instances to represent the 1984 Chicago Cubs. Then serialize that object graph to JSON, save it to a text file, then read back from the text file and de-serialize it to an object graph. Will the data be restored correctly? Next, we’ll lower the league’s team salary cap from $50 million to $40 million. This simulates a common condition in data-intensive applications. The “state of the world” changes while data sits idle in long-term storage. What will happen when we de-serialize the JSON text file back to an object graph? The team will be in violation of the league’s team salary cap. Refer to the TestFileSerialization() method in the Program.cs source code above.
PS C:\Users\Erik\Documents\Visual Studio 2019\Projects\Sandbox\Baseball App> dotnet run -c release -- testfileserialization
Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000

Saving team to JSON text file.
Loading team from JSON text file.

Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000

Saving team to JSON text file.
Changing team salary cap to $40,000,000.
Loading team from JSON text file.

Team Salary Cap = $40,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000
What happened? We got lucky. It worked by accident. The data “survived” the data persistence cycle. We saved a team with a $44 million payroll and we restored that team with the same $44 million payroll, both in the case of leaving the league’s team salary cap in place and the case where the salary cap was lowered below the Cubs’ payroll. If I had to venture a guess (instead of reading the Json.NET source code) I’d say this is due to the sequence in which Json.NET instantiates Coach or Player classes and adds them to collections. Note we do not enforce the salary cap when adding team members to a collection, only when setting their salary.
Let’s perform the same test except this time let’s simulate data persistence to a database. Refer to the TestSqlSerialization() method in the Program.cs source code above.
PS C:\Users\Erik\...\Baseball App> dotnet run -c release -- testsqlserialization
Changing team salary cap to $40,000,000.
Loading team from database.

Team Salary Cap = $40,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Jim Frey                      $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Billy Connors                 $1,000,000
Jody Davis                            $0
Don Zimmer                            $0
========================================
Total                        $40,000,000
PS C:\Users\Erik\Documents\Visual Studio 2019\Projects\Sandbox\Baseball App>
Now we’re unlucky.
We saved a team with a $44 million payroll to a database. When we restored that team the Salary property setter noticed the team was above the league’s $40 million salary cap (because the salary cap had been lowered from $50 million to $40 million between the time-of-write and time-of-read) and consequently adjusted team member salaries. Why? Because a Player or Coach is instantiated, associated with a team, and added to the Players or Coaches collection before their Salary is set. When the Salary property is set, it finds an associated team and, when one of the last team members is added, notices the team is over the salary cap. This triggers the side effect that mutates data.
To make matters even more confusing, if I move line 7 after line 10 (so the team member is added after their salary is set), the salary cap logic is triggered and it adjusts team member salaries to align with an incorrect salary cap of $42 million. I’ll leave it to you to work out why.
PS C:\Users\Erik\...\Baseball App> dotnet run -c release -- testsqlserialization
Changing team salary cap to $40,000,000.
Loading team from database.

Team Salary Cap = $40,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Jim Frey                      $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Billy Connors                 $2,000,000
Don Zimmer                    $1,000,000
Jody Davis                            $0
========================================
Total                        $42,000,000
This is a pernicious bug. The act of reading has mutated data. Mutations (modifications of data) should only occur during writes, not during reads. A more appropriate behavior would be to load the team as it was written, with a $44 million payroll, and set a flag indicating the team is not in compliance with league regulations. The application, seeing this flag is set, can inform the user of the violation and ask them to manually correct salaries.
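As a rough illustration of that alternative, the sketch below keeps loaded salaries exactly as stored and reports the violation through a computed flag. The LoadedTeam class and its IsOverSalaryCap property are hypothetical names, not part of the Baseball Library.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-side model: loading never adjusts data, it only reports on it.
public class LoadedTeam
{
    public LoadedTeam(decimal salaryCap, IEnumerable<decimal> salaries)
    {
        SalaryCap = salaryCap;
        Salaries = salaries.ToList();   // Stored exactly as read from long-term storage.
    }

    public decimal SalaryCap { get; }
    public IReadOnlyList<decimal> Salaries { get; }
    public decimal TotalSalary => Salaries.Sum();

    // The flag the application can check before asking the user to correct salaries.
    public bool IsOverSalaryCap => TotalSalary > SalaryCap;
}
```

Loading the $44 million Cubs payroll under a $40 million cap would leave every salary untouched and set IsOverSalaryCap to true.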
How do we solve this undesirable behavior of reads mutating data? We use the Record Pattern.
Record Pattern Prevents Data Mutations Upon Read
To clearly set expectations, let me state the Record Pattern will not eliminate all side effects. It will, however, eliminate the especially pernicious side effect of mutating data upon reading it. So what is the Record Pattern? Simply put, expressed in the C# programming language, it’s…
- A class used internally by a domain class to store data.
- The record class defines fields, not properties.
- Domain classes are persisted to (or loaded from) long-term storage by serializing (or de-serializing) their records.
- Because fields cannot have method bodies, no side-effects of getting or setting fields are possible.
- Therefore, serializing or de-serializing an object graph cannot possibly mutate data in the object graph.
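The points above can be sketched in a few lines. This is an illustration under stated assumptions, not the Baseball Library code: PlayerRecord, Player, ToJson, FromJson, and SalarySetterCalls are hypothetical names, and System.Text.Json (with IncludeFields, available in .NET 5 and later) stands in for Json.NET so the example needs no external package.

```csharp
using System.Text.Json;

// Record: public fields only, so reading or writing it cannot run any logic.
public class PlayerRecord
{
    public string Name = "";
    public decimal Salary;
}

// Domain class: properties pass through to the record and may carry logic.
public class Player
{
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { IncludeFields = true };

    private readonly PlayerRecord _record;

    public Player(PlayerRecord record) { _record = record; }

    public int SalarySetterCalls { get; private set; }   // Stand-in for a real side effect.

    public string Name
    {
        get => _record.Name;
        set => _record.Name = value;
    }

    public decimal Salary
    {
        get => _record.Salary;
        set { SalarySetterCalls++; _record.Salary = value; }
    }

    // Persistence touches only the record, never the properties above.
    public string ToJson() => JsonSerializer.Serialize(_record, Options);

    public static Player FromJson(string json) =>
        new Player(JsonSerializer.Deserialize<PlayerRecord>(json, Options)
                   ?? throw new JsonException("JSON did not contain a player record."));
}
```

A player restored with FromJson reports the saved name and salary while SalarySetterCalls stays at zero, because the serializer wrote to the record’s fields and never invoked the Salary setter.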
Let’s write a version V2 of the baseball API using the Record Pattern.
Note how property getters and setters for primitive types simply pass through to their backing record’s field. Also note the initialization logic in the TeamBase constructor. It passes a child property of the given record to the constructor of the associated child domain class.
Also note how simple it is to reconstruct the entire object graph from a JSON text file. We simply de-serialize the root record (the record of the root domain class, TeamFile) and call the Initialize() method.
Now let’s examine the Record Pattern’s impact on the data-persistence cycle of serialization and de-serialization.
PS C:\Users\Erik\Documents\Visual Studio 2019\Projects\Sandbox\Baseball App> dotnet run -c release -- testfileserialization
Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000

Saving team to JSON text file.
Loading team from JSON text file.

Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
========================================
Total                                 $0

Saving team to JSON text file.
Changing team salary cap to $40,000,000.
Loading team from JSON text file.

Team Salary Cap = $40,000,000.

Team Member                       Salary
========================================
========================================
Total                                 $0
Ah, we’ve hit the bug that motivated me to write this blog post.
An Object Reference Inequality Bug
If we pause the debugger before writing the object graph to a JSON text file, we see it does contain data. Likewise, if we load the JSON file into a text editor, we see it contains data.
So why wasn’t the object graph successfully restored from the JSON file? Because in C# value-equality is different from reference-equality. Value types are copied by value. So setting an int or DateTime property will copy the value from the caller to the callee. After the operation, two variables contain the same data at two separate locations in memory. (A string is technically a reference type, but because it’s immutable it behaves like a value for this purpose.) However, setting a complex object property (like a Coach) copies the reference (location in memory) from the caller to the callee. After the operation, two variables contain pointers to the same location in computer memory. For example, in the Program.PopulateCubs1984Team method, the team’s head coach is set by reference (line 3 below).
The bug is caused by orphaned Coach and Player records. For example, when the Repo created the Coach instance (line 2 above), it did not pass a CoachRecord to the Coach constructor. Therefore, the coach’s data is backed by its own record. The root record on the Team instance (which is what is serialized to JSON) has no knowledge of the coach’s record. Similarly, when a Coach instance is added to the AssistantCoaches collection, the root record has no knowledge of the coach’s record.
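The orphaned-record situation can be reduced to a few lines. This is a deliberately buggy sketch with hypothetical, simplified classes. The records are public here only to keep the example short, whereas the post keeps them internal.

```csharp
public class CoachRecord
{
    public string Name = "";
}

// The root record: the only thing the serializer would see.
public class TeamRecord
{
    public CoachRecord HeadCoach;
}

public class Coach
{
    // No record was passed in, so the coach creates and owns its own record.
    public CoachRecord Record { get; } = new CoachRecord();

    public string Name
    {
        get => Record.Name;
        set => Record.Name = value;
    }
}

public class Team
{
    private Coach _headCoach;

    public TeamRecord Record { get; } = new TeamRecord();

    // Buggy on purpose: stores the domain object but never links
    // the coach's record to the root record.
    public Coach HeadCoach
    {
        get => _headCoach;
        set => _headCoach = value;
    }
}
```

After assigning a Coach named Jim Frey to HeadCoach, team.HeadCoach.Name returns the name but team.Record.HeadCoach is still null. Serializing the root record therefore writes no coach at all, which matches the empty roster printed above.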
Solution
How do we fix the bug? We graft the new coach instance’s record onto the root record. However, we want to do so without leaking data persistence details. The record classes are an implementation detail best kept encapsulated (C# internal) within the Baseball Library (DLL). We don’t want to add record classes to public interfaces because that would make them visible in the Baseball App (EXE). Here’s the solution I devised:
That’s a lot of code to grok. So I’ll highlight the essential features:
Property setters for complex objects set both a domain class field and a record field. This is accomplished via casting. Normally I view casting as an anti-pattern, that is, an indication of poor design. However, I feel casting is warranted here. It makes it possible to graft records onto the root record (enabling persistence of the entire object graph by serializing the root record) without exposing records in the public interface. For example, here’s the implementation of the HeadCoach property in the TeamBase class.
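A minimal sketch of that grafting setter follows. The type names (ICoach, IHasCoachRecord, Coach, Team) and the single Name field are assumptions for illustration and the real TeamBase implementation may differ.

```csharp
// Records stay internal to the library assembly.
internal class CoachRecord
{
    public string Name = "";
}

internal class TeamRecord
{
    public CoachRecord HeadCoach;
}

// Public interface: no mention of records.
public interface ICoach
{
    string Name { get; set; }
}

// Internal interface (hypothetical name): the only way to reach a coach's record.
internal interface IHasCoachRecord
{
    CoachRecord Record { get; }
}

public class Coach : ICoach, IHasCoachRecord
{
    private readonly CoachRecord _record = new CoachRecord();

    // Explicit implementation keeps the record off the public surface.
    CoachRecord IHasCoachRecord.Record => _record;

    public string Name
    {
        get => _record.Name;
        set => _record.Name = value;
    }
}

public class Team
{
    internal readonly TeamRecord Record = new TeamRecord();
    private ICoach _headCoach;

    public ICoach HeadCoach
    {
        get => _headCoach;
        set
        {
            _headCoach = value;
            // Cast to the internal interface and graft the coach's record onto the root record.
            // An ICoach implemented outside the library would fail this cast.
            Record.HeadCoach = ((IHasCoachRecord)value)?.Record;
        }
    }
}
```

Because the root record now holds a reference to the same CoachRecord instance the coach uses, a later change to coach.Name is visible through team.Record.HeadCoach.Name without any copying.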
Collections are explicitly implemented (as opposed to declaring List<T>) by deriving from Collection<T> and overriding four protected methods that intercept all mutating operations. For example, here’s the implementation of mutation methods in the Players class.
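The four protected virtual methods on Collection<T> are InsertItem, SetItem, RemoveItem, and ClearItems. The sketch below shows one way a collection might mirror each mutation into a parallel list of records. The Player and PlayerRecord shapes, and passing the record list through the constructor, are simplifying assumptions rather than the post’s actual design.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class PlayerRecord
{
    public string Name = "";
}

public class Player
{
    public Player(string name) { Record = new PlayerRecord { Name = name }; }

    public PlayerRecord Record { get; }   // Public only to keep the sketch short.
    public string Name => Record.Name;
}

// Mirrors every change to the domain collection into a parallel list of records.
public class Players : Collection<Player>
{
    private readonly List<PlayerRecord> _records;

    public Players(List<PlayerRecord> records) { _records = records; }

    // Called by Add and Insert.
    protected override void InsertItem(int index, Player item)
    {
        base.InsertItem(index, item);
        _records.Insert(index, item.Record);
    }

    // Called by the indexer setter.
    protected override void SetItem(int index, Player item)
    {
        base.SetItem(index, item);
        _records[index] = item.Record;
    }

    // Called by Remove and RemoveAt.
    protected override void RemoveItem(int index)
    {
        base.RemoveItem(index);
        _records.RemoveAt(index);
    }

    // Called by Clear.
    protected override void ClearItems()
    {
        base.ClearItems();
        _records.Clear();
    }
}
```

Deriving from List<T> would not work the same way, because its Add, Remove, and indexer members are not virtual and so cannot be intercepted.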
Internal interfaces expose a record that’s only visible… well, internally. For example, here’s the record exposed on the Player class.
The record is not visible in the Baseball App (EXE).
Overloaded constructors ensure the correct runtime record type is passed to base classes. This is not possible via an optional parameter in a single (non-overloaded) constructor because optional parameters are limited to compile-time constants. Calling this(...) is the solution. See the code above.
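Here is a small sketch of the constructor chaining described above. The class names echo the post, but the members (including JerseyNumber) are hypothetical and the real classes carry more state.

```csharp
public class TeamMemberRecord
{
    public string Name = "";
}

public class PlayerRecord : TeamMemberRecord
{
    public int JerseyNumber;
}

public abstract class TeamMemberBase
{
    protected TeamMemberBase(TeamMemberRecord record) { Record = record; }

    public TeamMemberRecord Record { get; }   // Public only to keep the sketch short.
}

public class Player : TeamMemberBase
{
    // "public Player(PlayerRecord record = new PlayerRecord())" does not compile:
    // a default parameter value must be a compile-time constant.

    // New player: chain to the other overload with a fresh record of the derived type.
    public Player() : this(new PlayerRecord()) { }

    // Loaded player: reuse the record that came from long-term storage.
    public Player(PlayerRecord record) : base(record) { }
}
```

Either way the base class receives a PlayerRecord rather than a plain TeamMemberRecord, so player-specific fields are not lost when the record is serialized.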
Internal interfaces expose overloaded repo methods that accept a record parameter. Again, because an internal interface is used, the overloaded method is only visible… well, internally.
Copy data from long-term storage to a record, then pass that record to a domain class constructor (via an overloaded repo method mentioned above).
The code works. Run TestFileSerialization and notice the Cubs player salaries are restored correctly from the JSON text file.
PS C:\Users\Erik\...\Baseball App> dotnet run -c release -- testfileserialization
Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000

Saving team to JSON text file.
Loading team from JSON text file.

Team Salary Cap = $50,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000

Saving team to JSON text file.
Changing team salary cap to $40,000,000.
Loading team from JSON text file.

Team Salary Cap = $40,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Jim Frey                      $4,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000
Run TestSqlSerialization and notice the Cubs player salaries are restored correctly from the database.
PS C:\Users\Erik\...\Baseball App> dotnet run -c release -- testsqlserialization
Changing team salary cap to $40,000,000.
Loading team from database.

Team Salary Cap = $40,000,000.

Team Member                       Salary
========================================
Rick Sutcliffe                $8,000,000
Leon Durham                   $6,000,000
Ryne Sandberg                 $4,000,000
Larry Bowa                    $4,000,000
Gary Matthews                 $4,000,000
Jim Frey                      $4,000,000
Ron Cey                       $3,000,000
Bobby Dernier                 $3,000,000
Keith Moreland                $3,000,000
Don Zimmer                    $2,000,000
Billy Connors                 $2,000,000
Jody Davis                    $1,000,000
========================================
Total                        $44,000,000
Conclusion
If you’ve made it this far, clearly you have an interest in software architecture. I congratulate you on making the mental effort to get your head around a subtle and advanced programming topic. I hope my baseball analogy and example code helped illustrate the motivation to use the Record Pattern (avoid side effects when serializing data), the problem I encountered when employing it (object reference inequality), and my solution to that problem.
You may review the full source code in the Baseball Library and Baseball App folders of my Sandbox project in GitHub.