The story of storage — part 2

Pavel Novikov
9 min read · Apr 28, 2019

Well, okay. It seems the previous article was too vague and overloaded with terms. I’ve realized my mistake and decided to stay closer to the code this time.

First of all, let’s narrow our interest to queries for now, that is, to the read operations in your business logic. We know that a read operation does not produce any side effects (pretty obvious, I think there is nothing to prove here). Hmm… we have heard this definition somewhere, right? Exactly. It sounds like the definition of a pure function. In OO languages, pure functions are technically written as static methods. Can we say that all static methods are pure functions? Well, almost. Besides being static, a pure function must not depend on any mutable static state: it must neither write to static variables nor read static variables that are changed somewhere else. Pretty clear, eh? Unfortunately, modern OO languages give us no tools to enforce this condition, so we have to check manually that our code does not touch any mutable static state. In practice it rarely becomes a noticeable problem as long as you use your brain while writing the code :)
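To make the distinction concrete, here is a minimal sketch. The `Pricing` class and its members are hypothetical names made up for illustration; the point is only which static methods qualify as pure under the definition given here.

```csharp
using System;

public static class Pricing
{
    // Mutable static state: anything that reads or writes it is not pure.
    private static decimal _lastTotal;

    // Pure: the result depends only on the arguments and nothing else is touched.
    public static decimal Total(decimal price, int quantity) => price * quantity;

    // Static but NOT pure: writes to a static field as a side effect.
    public static decimal TotalAndRemember(decimal price, int quantity)
    {
        _lastTotal = price * quantity;
        return _lastTotal;
    }

    // Static but NOT pure: the result depends on state that is changed elsewhere.
    public static decimal DifferenceFromLast(decimal total) => total - _lastTotal;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Pricing.Total(2.5m, 4));           // same answer for the same inputs, always
        Console.WriteLine(Pricing.DifferenceFromLast(10m)); // answer depends on earlier calls
    }
}
```

Nothing in the compiler distinguishes the first method from the other two, which is exactly why the discipline has to be manual.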

Image illustrating my personal concern about queries in your application

Note that in C# we have an exciting opportunity to easily turn a simple static method into an extension method. This actually opens a portal to the brave new world of behavioral mixins. And all of it does not violate OO principles (technically), ensures type safety and is backward-compatible with all other language features. That’s why I love C#. That’s why we all do :) Honestly, I believe that C# is our brave new future. But before we get to it, let’s talk a little about…
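As a small illustration of that conversion (the `TextRules` class and its methods are hypothetical names, not taken from any library), the only difference between the two methods below is the `this` modifier on the first parameter:

```csharp
using System;

public static class TextRules
{
    // A plain static method: called as TextRules.IsShortTitle(title).
    public static bool IsShortTitle(string title) => title.Length <= 10;

    // Same body, but `this` on the first parameter makes it an extension
    // method, so it can be called as title.IsShort().
    public static bool IsShort(this string title) => title.Length <= 10;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TextRules.IsShortTitle("Test")); // ordinary static call
        Console.WriteLine("Test".IsShort());               // reads like an instance method
        Console.WriteLine(TextRules.IsShort("Test"));      // still a static method underneath
    }
}
```

The extension form adds behavior to `string` without touching the type itself, which is what makes the "behavioral mixin" reading possible.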

The Glorious Old Past

Let’s begin with the classics and proceed to their derivatives. Consider the following code.
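A minimal sketch of the kind of repository plus unit-of-work code meant here. All names (`Book`, `IBookRepository`, `UnitOfWork`) are hypothetical, and a plain list stands in for the database context so that the sketch runs on its own; it is not the exact listing the points below refer to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public interface IBookRepository
{
    IEnumerable<Book> GetAll();
    Book GetById(int id);
    void Add(Book book);
}

public interface IUnitOfWork : IDisposable
{
    IBookRepository Books { get; }
    void Commit();
}

// In a real project this would wrap a DbContext; a list stands in for it here.
public class BookRepository : IBookRepository
{
    private readonly List<Book> _store;
    public BookRepository(List<Book> store) { _store = store; }

    public IEnumerable<Book> GetAll() => _store.ToList();
    public Book GetById(int id) => _store.FirstOrDefault(b => b.Id == id);
    public void Add(Book book) => _store.Add(book);
}

public class UnitOfWork : IUnitOfWork
{
    private readonly List<Book> _store = new List<Book>();

    public UnitOfWork() { Books = new BookRepository(_store); }

    public IBookRepository Books { get; }
    public void Commit() { /* a real implementation would call SaveChanges() here */ }
    public void Dispose() { /* a real implementation would dispose the context here */ }
}

public static class Program
{
    public static void Main()
    {
        using (var uow = new UnitOfWork())
        {
            uow.Books.Add(new Book { Id = 1, Title = "Test" });
            uow.Commit();
            Console.WriteLine(uow.Books.GetById(1).Title);
        }
    }
}
```

Note how much of this is ceremony: two interfaces and two classes before a single business question has been asked of the data.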

You all know what it is. It is the classical repository pattern used along with the UoW (unit of work) pattern. I bet you have seen such code at least once in your career.

What is wrong with it? Well… Too many things to tell. First of all, my personal sorrow, twisting in my heart like a big blunt rusty knife: that is not C# code.

No, seriously, Microsoft has released 8 versions of a modern multi-paradigm language with strong compile-time type safety and tons of modern features for writing neat code. But when it comes to business, we deny all these achievements and use it like… like Java? Or like C# 2.0? Where are you from, folks? 2005? 1998? I state that I strongly do not want to hear a single word about F#, Haskell, DDD and microservices as long as we have pieces of sh*t like this code in our C# projects. Amen.

Well, okay. The rant is over, let’s proceed to the exact points:

  • It is not mockable. Yes, formally you can derive from these classes or implement the accompanying interfaces, mock all calls and redirect them to collections of test data carefully typed in by hand. But come on! I bet that only a maniac will actually do (and maintain) that. The industry approach here would be… hm… to deploy a test database, stuff it with data, run your code and validate the results? Moreover, I bet it would be considered magnificent if SQLite were used for that instead of a full-sized MSSQL instance. Great. So am I right that we have to write a ton of code and create a fat testing infrastructure, incl. a separate database, deployment, data transfer and cleanup scripts, just to f*kin test that the result of GetAll() contains our predefined book with the title “Test”? Any questions why IT is still so expensive?
  • It is not scalable. Not “scalable” in the sense of doing something different under high load, no. “Not scalable” here means that it simply will not work when you have 500 entity types. Look, we have 18 lines of code per repository implementation. With 500 entities there will be 500 additional class files containing o̵v̵e̵r̵ exactly 9000 lines of boilerplate code. Solution? T4 templates? And how long will it take to generate all that boilerplate? Even if you have the latest Core i9 and a modern Samsung V-NAND SSD (who said “expensive”?), it will take no less than 30 seconds, I believe. Please keep in mind that 500 entities is actually only a mid-size project. Also, the code generation approach smells like… huh. Something from the early 2000s. Another option is to use generic repositories, and it can actually help. Until you realize the next point:
  • Your IoC will not like that. We use IoC frameworks to build our apps, right? Autofac, Ninject, MS Unity, MEF. You use them to glue all the components of your app into one piece. IoC frameworks require you to register things and assign them some lifetime. Like that

I haven’t met teams that start using IoC modules from the very beginning (understandable: at the early stages modules are overhead). But I have met people who reach their first refactoring stage with 1500 entries registered in the container from different places. It will take 2 (expensive) person-weeks to sort this out somehow;

  • It is not testable at all. Apart from the huge IoC registration file, I’d point out a well-known effect that such a variety of small units produces. I believe everyone who has followed this pattern has thought about it at least once. Every attempt to write a unit test starts with the question “I need to test some method in class A. Now how the hell do I instantiate it?”. And you start to dig: “service A needs repository B, and also service C, which needs repositories D and E, and”… In the end half of your unit test code consists of initializing/mocking things, with 2 small asserts at the end, because by this time you don’t have the energy to write more. Soon you will get tired of creating all the required services, units and repositories manually and decide to bring IoC into your tests. It will simplify things, but not for long. After a while you quit creating unit tests, with the full support of the team & management. Because creating tests becomes expensive;
  • Statefulness. Do you realize that you are adding state to things that do not actually require it? Instead of applying some functionality to an existing stateful entity (the database in our case), we create another stateful entity (the repository, whose state is the database). Moreover, the repository instance is bound by lifetime to the database. It means that it becomes useless as soon as the database is disposed. I know, it is very old-fashioned to complain about this, but please, people, take pity on your RAM and GC! Instead of passing a simple command to the O/RM (which also produces tons of sh*t, btw) you build a small city in memory via reflection (remember IoC) and politely ask it for some result. If that is your price for readability, I’d say that it is too… expensive, yes.

The Complicated Present

Well, okay. Let’s set repository and UoW aside. They are already considered an anti-pattern. Let’s consider a CQS example instead. It must be the modern one, if I’m not mistaken. Long story short: it suggests creating 3 classes for each query. Like this:
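A minimal sketch of one common arrangement, assuming the three classes are a query, a result and a handler (which three are meant is my assumption, and every name below is hypothetical). An in-memory collection stands in for the database so the sketch is runnable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
}

// Class 1: the query, a bag of parameters.
public class GetBookTitleQuery
{
    public int BookId { get; set; }
}

// Class 2: the result.
public class GetBookTitleResult
{
    public string Title { get; set; }
}

// Shared contract, declared once for all queries.
public interface IQueryHandler<TQuery, TResult>
{
    TResult Handle(TQuery query);
}

// Class 3: the handler, stateful because it holds on to its data source.
public class GetBookTitleHandler : IQueryHandler<GetBookTitleQuery, GetBookTitleResult>
{
    private readonly IEnumerable<Book> _books; // stands in for a database context

    public GetBookTitleHandler(IEnumerable<Book> books) { _books = books; }

    public GetBookTitleResult Handle(GetBookTitleQuery query)
    {
        var book = _books.FirstOrDefault(b => b.Id == query.BookId);
        return new GetBookTitleResult { Title = book?.Title };
    }
}

public static class Program
{
    public static void Main()
    {
        var handler = new GetBookTitleHandler(new[] { new Book { Id = 1, Title = "Test" } });
        var result = handler.Handle(new GetBookTitleQuery { BookId = 1 });
        Console.WriteLine(result.Title);
    }
}
```

All of this wraps what is, in effect, a one-line lookup by Id.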

Correct. 3 classes for every query.

I won’t even point out that in such an implementation you cannot chain your queries. I won’t even point at the bloated IoC registration code. I won’t even complain about memory.

I just think that creating 3 stateful classes for each query is pure madness. How are you going to navigate a solution containing 3 classes per query?

Just… just imagine that! Complex projects have thousands of queries: small ad-hoc “get by Id” ones, large pivot reporting queries, mid-sized utility queries consisting of 3–4 lines of code. For each of them you will have to create 3 classes. At least 1 class. How much code will you have? Thousands of classes? Fans of CQS will argue that it is testable: okay, I agree. But am I right that in addition to the 3 existing classes we will also get a unit test class? Honestly, I think there is one indisputable advantage of this approach: if you use it, you can be sure you will not lose your job in the next 2–3 years, because the customer’s business side will definitely need a separate IT branch to support this mess. Or bankruptcy. Or both.

It seems there are not so many approaches left. QueryObject? Oh, wait, it is from Java, where we have neither expression trees nor LINQ-to-SQL translation. Maybe this one will work? Ah, damn, it contains a repository again.

Let me briefly explain the idea of QueryObject and the derived Query Specification pattern in C#: both are about creating classes that store query criteria (an instance of Expression<>) and (obviously) separating them from execution. The difference is that Query Specification is more “dynamic”: you can construct a new specification out of several existing ones. Under the hood it usually works via reflection and the APIs of the System.Linq.Expressions namespace.

But-but, hold on guys! I want to reveal the ancient truth: Microsoft has already invented an API for storing expression trees in .NET without executing them! Yeah, you didn’t know?

Moreover, shocking! It also has fluent extensions for building query criteria dynamically!

Ladies and gentlemen, are you ready? Meet…

The IQueryable<> interface.

No comment.

The Brave New Future

Let me tell you about the future. In the future simplicity drives everything. Less code = fewer bugs, faster creation, faster maintenance. I believe that in the future we neither reinvent wheels nor repeat ourselves.

Let’s take the IQueryable interface and play around with it a little. IQueryable basically holds 3 things:

  • a reference to the data source
  • a reference to the expression tree that describes the data set we’d like to obtain
  • a reference to the query provider that can create new IQueryables and execute existing ones

In the .NET Framework we have a number of extension methods for IQueryable. Some of them build up the expression tree (in an immutable manner), others execute the query in various ways (ToArray(), ToList(), First(), Single() etc).
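A small sketch of that split between building and executing, using an in-memory list wrapped with `AsQueryable()` as a stand-in for a real database provider (the `DeferredDemo` class is a hypothetical name). `Where` and `OrderBy` only grow the expression tree; nothing runs until `ToArray()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int[] Run()
    {
        var source = new List<int> { 5, 1, 4, 2, 3 };
        IQueryable<int> query = source.AsQueryable();

        // Building: each call returns a NEW IQueryable with a larger
        // expression tree; the original `query` is left unchanged.
        IQueryable<int> filtered = query.Where(x => x > 2).OrderBy(x => x);

        // The stored tree and the provider are available without executing anything.
        Console.WriteLine(filtered.Expression);
        Console.WriteLine(filtered.Provider.GetType().Name);

        // Added after the query was built, but before it was executed.
        source.Add(6);

        // Executing: only now is the source actually enumerated.
        return filtered.ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(string.Join(", ", DeferredDemo.Run())); // 3, 4, 5, 6
    }
}
```

The value 6 shows up in the result even though it was added after the query was composed, which is the visible sign that the query object is only a description until something executes it.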

But! No one forbids you from writing your own extensions for IQueryable. Even generic extensions for IQueryable:
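A minimal sketch of what such extensions might look like. The `IEntity` interface, the `Book` class and the method names are assumptions made up for illustration, and an in-memory list stands in for the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker interface for everything that has an integer key.
public interface IEntity
{
    int Id { get; }
}

public class Book : IEntity
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class EntityQueries
{
    // Generic: works for any IQueryable of entities.
    // The `class` constraint is a common precaution with LINQ providers.
    public static T ById<T>(this IQueryable<T> source, int id) where T : class, IEntity
        => source.FirstOrDefault(x => x.Id == id);

    // Specific: only available on IQueryable<Book>, and it returns
    // IQueryable again, so it can be chained with other extensions.
    public static IQueryable<Book> TitleContains(this IQueryable<Book> source, string part)
        => source.Where(b => b.Title.Contains(part));
}

public static class Program
{
    public static void Main()
    {
        IQueryable<Book> books = new List<Book>
        {
            new Book { Id = 1, Title = "Test" },
            new Book { Id = 2, Title = "Another test book" },
        }.AsQueryable();

        Console.WriteLine(books.ById(1).Title);
        Console.WriteLine(books.TitleContains("book").ById(2).Title);
    }
}
```

Both methods are static and hold no state of their own: everything they need arrives through the `IQueryable` they extend.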

Is it enough? Unfortunately, no. From time to time we also need other entities to join with inside our LINQ query. So let’s create another interface that wraps IQueryable and suits our query extensions. I call it IQueryFor:

Good, only a few strokes left. Now we need a common access point for queries. I called it Storage:
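One possible shape for these two interfaces, as a sketch only. The interface names come from the article, but every member (`All`, `Joined`, `Get`), the `InMemoryStorage` class and the `Book`/`Author` entities are my assumptions, and this is not the Reinforced.Storage API. The in-memory implementation exists just to make the sketch runnable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Author { public int Id { get; set; } public string Name { get; set; } }
public class Book { public int Id { get; set; } public int AuthorId { get; set; } public string Title { get; set; } }

// Wraps the IQueryable for T and gives access to other sets for joins.
public interface IQueryFor<T>
{
    IQueryable<T> All { get; }
    IQueryable<TOther> Joined<TOther>() where TOther : class;
}

// The common access point for queries.
public interface IStorage
{
    IQueryFor<T> Get<T>() where T : class;
}

// Hypothetical in-memory implementation backed by plain lists.
public class InMemoryStorage : IStorage
{
    private readonly Dictionary<Type, object> _sets = new Dictionary<Type, object>();

    public InMemoryStorage With<T>(params T[] items) where T : class
    {
        _sets[typeof(T)] = items.ToList();
        return this;
    }

    private IQueryable<T> Set<T>() where T : class
        => _sets.TryGetValue(typeof(T), out var list)
            ? ((List<T>)list).AsQueryable()
            : Enumerable.Empty<T>().AsQueryable();

    public IQueryFor<T> Get<T>() where T : class => new QueryFor<T>(this);

    private sealed class QueryFor<T> : IQueryFor<T> where T : class
    {
        private readonly InMemoryStorage _storage;
        public QueryFor(InMemoryStorage storage) { _storage = storage; }
        public IQueryable<T> All => _storage.Set<T>();
        public IQueryable<TOther> Joined<TOther>() where TOther : class => _storage.Set<TOther>();
    }
}

// A query written as a static extension: no service class, no registration.
public static class BookQueries
{
    public static IQueryable<string> TitlesByAuthor(this IQueryFor<Book> q, string authorName)
        => from b in q.All
           join a in q.Joined<Author>() on b.AuthorId equals a.Id
           where a.Name == authorName
           select b.Title;
}

public static class Program
{
    public static void Main()
    {
        IStorage storage = new InMemoryStorage()
            .With(new Author { Id = 1, Name = "Orwell" })
            .With(new Book { Id = 10, AuthorId = 1, Title = "1984" });

        foreach (var title in storage.Get<Book>().TitlesByAuthor("Orwell"))
            Console.WriteLine(title);
    }
}
```

An EF-backed implementation would presumably return the context's sets from `All` and `Joined` instead of lists, while extensions such as `TitlesByAuthor` stay unchanged.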

Finally. Let’s take a look at some of its use cases. First, you can use it directly:

Or you can register IStorage in the container and use it in services/controllers:

Extremely flexible storage extensions that I’ve made in one of my projects:

This approach allows you to pull a subset of fields from the DB instead of fetching the whole entity (which might be expensive):
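A sketch of how a projecting lookup could be written. The name `ByIdRequired` appears later in the article, but its signature here, the `IEntity` interface and the throw-when-missing behavior are my assumptions. It is written against plain `IQueryable<T>` and run over an in-memory list so that it is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public interface IEntity { int Id { get; } }

public class Book : IEntity
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string FullText { get; set; } // the expensive column we would like to skip
}

public static class ProjectionQueries
{
    // Filters by Id, applies the projection, and throws when nothing matches.
    // The projection is an expression tree, so a LINQ provider can translate it.
    public static TResult ByIdRequired<T, TResult>(
        this IQueryable<T> source, int id, Expression<Func<T, TResult>> projection)
        where T : class, IEntity
    {
        var rows = source.Where(x => x.Id == id).Select(projection).Take(1).ToList();
        if (rows.Count == 0)
            throw new InvalidOperationException($"{typeof(T).Name} with Id {id} was not found.");
        return rows[0];
    }
}

public static class Program
{
    public static void Main()
    {
        IQueryable<Book> books = new List<Book>
        {
            new Book { Id = 1, Title = "Test", FullText = "…a very long text…" },
        }.AsQueryable();

        string title = books.ByIdRequired(1, b => b.Title); // only Title is selected
        Console.WriteLine(title);
    }
}
```

Against a SQL-backed provider, the `Select` should end up in the generated query, so only the projected columns travel over the wire; here the list merely demonstrates the call shape.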

If your implementation of IQueryFor<> does not fetch the underlying IQueryable immediately, you can easily flip the semantics:

In conclusion

That is my suggestion for queries in a system. Since queries do not produce side effects, turn all code that just retrieves data into static pure extensions for IQueryFor<> instead of scattering it here and there among various stateful services. Such an approach is:

  • Easily testable. The only thing you need is to make one single instance of an IQueryFor<> implementation, stuff it with your data, and then… well… call the static extensions and compare their outcome against the expected results. You do not even need to clean up the test environment! Here we have to work under the assumption that the LINQ-to-SQL translation that EF provides is correct, but that is out of the scope of this article. From my personal experience, in most cases it works as expected;
  • Elegant. You do not need to create a service class or register something in IoC. You just write your query right away, without any noticeable amount of glue code. Do exactly what is required, without creating any middleware abstractions;
  • Well-structured. You just create a static class or pick an existing one. It does not matter, since extension methods automatically attach to the types they are declared for. Even if you stack all your queries into a single file, it will not change a thing from the user’s side. You have complete freedom in how you spread your code among files;
  • RAM-friendly. You do not create a hierarchy of stateful entities/services/repositories in order to obtain a documents report. Yes, of course, one small IQueryFor<> implementation will be instantiated, but it is nothing in comparison with the traditional through-IoC approach. You can also get bonus memory points by using ByIdRequired with projection (described above).

Where are implementations?

You can make implementations of IQueryFor<> and IStorage yourself. It is pretty simple. Treat it as your homework. :) In fact, Reinforced.Storage of course contains its own implementations of these interfaces, but they are more complex, account for lots of small details and also have some advanced extensions (e.g. for the testing pipeline). If I showed the implementations right now, you would be confused and start asking questions that would require separate articles to answer. So for now you can take the described interfaces and the overall approach and try them in your project.

While I prepare next article.

See ya.
