During this time, the monolith at the heart of our system has gone through many changes, and the most significant happened this year: we migrated it to .NET 6 and moved it to Kubernetes. The transition was not an easy task and took about a year in total.

In this article, we will share the details of this large-scale project: the quirks of the monolith that complicated the migration, and the improvements that saved our developers a lot of pain.

What’s wrong with the monolith

A monolith by itself is a perfectly normal architectural pattern, and some companies live with one just fine. In our case, however, it had become a serious architectural problem that prevented the development teams from delivering value to the business.

The project started on .NET Framework 4.0 and grew huge: at one point it had 2 million lines of code. We began breaking the monolith apart in 2016, but the process was slow. Today it contains 600 thousand lines of C# code across 200 projects, 16 of which are deployable applications. At the beginning of 2021, the monolith was still on the .NET Framework and running on Windows Server, and it was causing a whole bunch of problems.

Inconvenient development

To develop the monolith, you needed Windows. But not all developers wanted to work on Windows: most use a Mac, and some work on Linux. Mac users had to run Windows in Parallels Desktop, which meant extra cost and inconvenience.


In addition, to start developing the monolith on a fresh machine, you had to spend almost a whole day setting up the environment: installing a bunch of different SDKs, a specific version of Visual Studio, IIS Express and much more. Even running the monolith locally was an adventure. Our homegrown configuration system, which generated XML files, added its own problems (more on this later).

Long and expensive builds

The build was driven by a 500-line Cake script that ran quickly only on 16-32 core Windows servers. Even then, each build of the monolith took 15 minutes, because we had to produce our own special artifacts. All build steps ran sequentially, and everything ran on TeamCity, for which we also had to prepare the build agent images ourselves.

Difficult to test

The monolith’s integration tests, “for historical reasons”, were written in .NET Core and lived in a separate repository. Running them required a fully deployed environment and chained 6 builds in TeamCity. The whole process was hard to maintain, the tests were hard to debug, new ones were hard to write, and everything had to be kept in sync across two repositories.

Slow deployments

We use our own deployment system because of the nature of the code and of the deployment itself. Or rather, because we once made a gross and terrible mistake.

That is why our monolith is now sharded 16 times, once per country. And we launch not 16 applications but 256 (actually more, because replicas are needed). All new services are built to work with several countries at once (we call this country-agnostic).

First, we decided to find out what others were doing. But there were hardly any articles about sites on the .NET Framework with similar specifics: most were simply published via WebDeploy or had a very different load profile. In the end, we based the new deployment on the process described in the Stack Overflow articles, because we had a similar stack. All we had to do was repeat their build and deployment, adding only our complications with configuration and countries.

Deployment happens like this: we connect to the server via PowerShell Remoting, configure IIS there (for example, create the websites), take the server out of the load balancer, update the sites, and put the server back into the balancer.

This is how we ended up with our own “poor man’s Kubernetes”

The deployment process itself was automated, but setting up and maintaining the virtual machines was manual: adding a new balancer or server, for example, took a lot of time.

Implementing autoscaling, auto-healing or any other basic Kubernetes feature on top of this system would have been very difficult. And why bother, when we already had Kubernetes: we adopted it in 2018 for .NET Core and all our other services.

The production environment ran on 11 servers. The servers were pets, not cattle: we knew them by name, and each performed a single role.

A Windows Server virtual machine in the cloud costs at least twice as much as a Linux one. The price is understandable: Windows Server is an all-in-one platform with Active Directory, Storage Spaces and many other capabilities. But our web servers run nothing except the web server itself, so we were paying for features we never used. We have no other infrastructure on Windows Server.

The SRE team spent time supporting two infrastructures: Windows Server VMs and Kubernetes. For example, updating the logging subsystem had to be done twice: on K8s it took about a week in total, on Windows Server about two months.

3 attempts to fix everything

Kubernetes for Windows

It worked, but it was hard to imagine the investment needed to bring K8s for Windows to parity with K8s for Linux in our environment: logs, metrics, builds and so on. Besides, K8s for Windows was not fully mature: for example, in 2020 Azure Kubernetes Service did not yet offer Windows Server 2022, and the Windows Server 2019 images weighed 10 gigabytes.

Mono

It didn’t work out of the box, but it could be made to start. Since the quality of our monolith’s code left much to be desired, Mono ran into some amusing bugs: for example, some built sites were missing assemblies. On Windows they were apparently picked up from the GAC, which Mono could not do.

Splitting the monolith completely

We considered this option too, because if you split the monolith completely, its problems disappear with it. But we estimated a complete split at 2-3 years, with one or two more dedicated teams allocated to it, because extracting one service with full reengineering takes 9-12 months, and there are 16 of them. So a complete split looks more like a long-term strategy; there is no benefit from it in the short term.

Moreover, to split a monolith you still have to keep developing it, and if development on the monolith is slow, splitting it will also take a long time. It is a vicious circle.

As a result, we calculated that by switching to .NET 6 and Kubernetes we would save about 10% of our monthly infrastructure cost on servers alone, not counting the opportunity cost of faster development and deployment. We would reduce costs by dropping Windows licenses and by increasing server utilization, for example through autoscaling.

Migrating the monolith to .NET 6

The project started in May 2021. At that point, 4 of the 16 services were already on .NET Core 2.1-3.1, and most of the libraries were on .NET Standard 2.0. That left 12 services of different sizes to update: some had a couple of controllers, others had 300 Razor Views and their controllers.

We started by installing .NET 5 on the Windows virtual machines and build agents, setting up the test-environment infrastructure, and cleaning up the repository and projects. Then we switched to other tasks and went on vacation.

AlertServer.Web was migrated over the summer. In the process, we discovered that the solution was tightly coupled, which made it hard to update anything in isolation; that packages.config made updating and restoring packages painful; and that Rider was almost unusable on it.

The thing is, projects that use packages.config do not support transitive package references, they download all of their packages into the ./packages directory, and Rider (and Visual Studio) slow to a crawl on any attempt to do something. Once a csproj is switched to PackageReference, packages update almost instantly and are restored into the global cache, from where they are picked up via the references in project.assets.json.

As a result, along with the main project, we had to move half of the repository and its libraries to .NET Standard 2.0 and PackageReference. After that things went faster, because the bulk of the migration is updating packages.
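To illustrate the difference, here is a minimal before/after snippet; the package name and version are just an example, not a statement about our dependencies:

<!-- Before: the reference lives in packages.config (plus a HintPath in the csproj) -->
<packages>
  <package id="Newtonsoft.Json" version="12.0.3" targetFramework="net48" />
</packages>

<!-- After: a single PackageReference line directly in the csproj -->
<ItemGroup>
  <PackageReference Include="Newtonsoft.Json" Version="12.0.3" />
</ItemGroup>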

From the end of August until November we migrated Auth, LegacyFacade & Consumers, and got sidetracked by tests, getting the pipeline green, and moving our private NuGet packages from myget.org to GitHub Packages.

In November, two new people joined us, bringing the project to three backend developers. We started and completed the migration of two services at once, CallCenter.Api and Api, moved the autotests to .NET 5, and started working on CallCenter.Web.

By the end of February, the entire solution had been updated to .NET 6.0. Four more teams joined to migrate the remaining four services: Admin.Web, CashHardware.Web, OfficeManager.Web and RestaurantCashier.Web. At that point 15 people were working on the project, and the developers who had joined the migration at the start acted as consultants for the newcomers.

In March, we mainly worked on minor “tweaks” and began moving to Kubernetes:

  • set up the build and testing on Linux & GitHub Actions;
  • made the applications cross-platform;
  • removed System.Drawing;
  • fixed paths everywhere (at least the separators, \ to /) and garbled characters (HTML encoding); a typical path fix is sketched right after this list.
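The path fixes themselves were trivial but widespread. A hypothetical example (the variable names are invented):

// Before: Windows-only because of the hard-coded backslashes
var templatePath = basePath + "\\Templates\\" + templateName + ".cshtml";

// After: cross-platform, the runtime chooses the correct separator
var templatePath = Path.Combine(basePath, "Templates", templateName + ".cshtml");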

In May, we started deploying the monolith to Kubernetes and moved over the first test environments and low-load canary production instances.

During load testing, we found a lot of bugs, mainly sync-over-async code (a typical example is sketched after the list below). How we found and fixed it:

  • used Ben.BlockingDetector to find the blocking code;
  • enabled Microsoft.VisualStudio.Threading.Analyzers;
  • used Jaeger tracing.
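Here is a hypothetical illustration of the kind of sync-over-async code we were hunting for and its fix; the client class, endpoint and record are invented for the sketch:

using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public record Order(int Id, string Status);

public class OrdersClient
{
    private readonly HttpClient _httpClient;

    public OrdersClient(HttpClient httpClient) => _httpClient = httpClient;

    // Before: blocks a thread-pool thread while the HTTP call is in flight,
    // which starves the pool under load (sync over async).
    public Order? GetOrderBlocking(int id) =>
        _httpClient.GetFromJsonAsync<Order>($"/orders/{id}").Result;

    // After: the thread is returned to the pool while the call is awaited.
    public Task<Order?> GetOrderAsync(int id) =>
        _httpClient.GetFromJsonAsync<Order>($"/orders/{id}");
}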

We also discovered a problem with graceful shutdown: if it is not handled, pods get killed while requests are still being routed to them, and clients see errors. To work correctly in Kubernetes, an application must handle the external stop signal (SIGTERM) properly. How to configure SIGTERM handling is described here.
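As a rough sketch of the idea (the timeouts are illustrative, not our production values): on .NET 6 the generic host already listens for SIGTERM, so the main work is to give in-flight requests time to drain and, optionally, to delay shutdown briefly so the pod is removed from load balancing before it stops accepting connections.

var builder = WebApplication.CreateBuilder(args);

// Allow up to 30 seconds for in-flight requests to finish after SIGTERM.
builder.Services.Configure<HostOptions>(options =>
    options.ShutdownTimeout = TimeSpan.FromSeconds(30));

var app = builder.Build();

// Optional: pause briefly on shutdown so the endpoint is removed from
// the Kubernetes Service before the app stops serving traffic.
app.Lifetime.ApplicationStopping.Register(() =>
    Thread.Sleep(TimeSpan.FromSeconds(5)));

app.MapGet("/healthz", () => Results.Ok());

app.Run();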

And finally, in June, the entire Dodo IS was fully operational in Kubernetes!

An example checklist for upgrading a service from .NET Framework 4.8 to .NET 6

We start either with the .NET Upgrade Assistant or by converting packages.config to PackageReference (there is a built-in tool for this in Rider/Visual Studio).

Update the csproj file (a minimal example of the resulting project file follows these steps):

  • update the project header to <Project Sdk="Microsoft.NET.Sdk.Web">;
  • update the target framework (TargetFramework) to net6.0;
  • remove from the project all targets unknown to science;

We delete all remaining PropertyGroups except the ones that are actually needed.

We delete all ItemGroups that explicitly include .cs and other files in the project (the SDK picks them up automatically).

We remove the import of Microsoft.CSharp.targets.

We remove all System.* and Microsoft.* packages.

We try to remove all packages that are referenced only as dependencies of other packages.

If binary serialization (BinaryFormatter) is used, it must be explicitly enabled with the EnableUnsafeBinaryFormatterSerialization property.
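For reference, after these steps a minimal SDK-style project file can look roughly like this; the packages and versions are illustrative, not our actual set:

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
    <!-- Only if the project still relies on BinaryFormatter -->
    <EnableUnsafeBinaryFormatterSerialization>true</EnableUnsafeBinaryFormatterSerialization>
  </PropertyGroup>

  <ItemGroup>
    <!-- No Compile items: the SDK includes *.cs files automatically -->
    <PackageReference Include="MassTransit" Version="8.0.0" />
    <PackageReference Include="prometheus-net.AspNetCore" Version="6.0.0" />
  </ItemGroup>

</Project>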

Update Startup:

  • delete Global.asax;
  • remove unnecessary dependencies;
  • remove OWIN;
  • use the .NET 6 generic host.

We use UseForwardedHeaders in the startup pipeline, since the application now runs behind a proxy (a minimal host setup is sketched below).
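A rough sketch of the .NET 6 minimal hosting setup with forwarded headers; this is the general shape, not our actual Program.cs:

using Microsoft.AspNetCore.HttpOverrides;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllersWithViews();
builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    // Trust X-Forwarded-For / X-Forwarded-Proto set by the proxy in front of the pod.
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    options.KnownNetworks.Clear();
    options.KnownProxies.Clear();
});

var app = builder.Build();

app.UseForwardedHeaders();
app.UseStaticFiles();
app.UseRouting();
app.UseAuthentication();
app.UseAuthorization();
app.MapDefaultControllerRoute();

app.Run();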

We move the configuration to Microsoft.Extensions.Configuration (a small binding example follows).
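For example, a custom XML-generated config can usually be replaced by an appsettings.json section bound to a strongly typed options class; the QueueSettings name and its fields here are hypothetical:

// appsettings.json:  { "Queue": { "Host": "rabbitmq", "PrefetchCount": 16 } }
public class QueueSettings
{
    public string Host { get; set; } = "";
    public int PrefetchCount { get; set; }
}

// In Program.cs: bind the section; consumers then inject IOptions<QueueSettings>.
builder.Services.Configure<QueueSettings>(builder.Configuration.GetSection("Queue"));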

Updating logging: switching to Microsoft.Extensions.Logging.

We update the metrics: either the latest version of prometheus-net or OpenTelemetry.

If possible, we abandon Autofac in favor of Microsoft.Extensions.DependencyInjection.

We pay attention to anything that starts background work: such services are converted to hosted services (a minimal sketch follows).
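A minimal, hypothetical example of such a conversion; the OutboxDispatcher worker and its logic are invented for illustration:

public class OutboxDispatcher : BackgroundService
{
    private readonly ILogger<OutboxDispatcher> _logger;

    public OutboxDispatcher(ILogger<OutboxDispatcher> logger) => _logger = logger;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // The token is signalled on shutdown, so the loop participates in graceful shutdown.
        while (!stoppingToken.IsCancellationRequested)
        {
            _logger.LogInformation("Dispatching pending messages");
            await Task.Delay(TimeSpan.FromSeconds(10), stoppingToken);
        }
    }
}

// Registration in Program.cs:
builder.Services.AddHostedService<OutboxDispatcher>();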

Updating Swagger to ASP.NET Core version.

Updating SignalR to ASP.NET Core version.

We update MassTransit to the latest version.

We update all middleware.

If there is an HtmlString in the views, change it to IHtmlContent.

Update authentication and authorization:

  • analyze the use of Session and SystemUser (our own abstraction over the authenticated user);

Don’t forget that Session in ASP.NET Core is loaded synchronously by default. Call .LoadAsync() before doing any work with it so that the load happens asynchronously (see the sketch below).
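A purely illustrative sketch of what that looks like in a controller action:

public class ProfileController : Controller
{
    public async Task<IActionResult> Index()
    {
        // Load the session once, asynchronously; subsequent reads hit the already-loaded data.
        await HttpContext.Session.LoadAsync();

        var userId = HttpContext.Session.GetString("UserId");
        return View(model: userId);
    }
}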

Updating controllers:

we adjust the controller method signatures so that IActionResult is returned everywhere, for example:

was

public PartialViewResult PartialIndex()

became

public IActionResult PartialIndex()

we check the serialization: either we keep Newtonsoft.Json, or we move everywhere to System.Text.Json;

the default property-name casing in JSON serialization has changed to camelCase; take this into account (JsonSerializerOptions.PropertyNamingPolicy = null restores the old behaviour, as in the snippet below).
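A minimal way to keep the old PascalCase behaviour for System.Text.Json when registering MVC (a sketch, not our exact setup):

builder.Services.AddControllersWithViews()
    .AddJsonOptions(options =>
    {
        // null disables the default camelCase policy and preserves property names as declared.
        options.JsonSerializerOptions.PropertyNamingPolicy = null;
    });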

We configure routing and model binding.

Don’t forget to migrate the test projects.

Summary and results

Updating a monolith is a long process. We managed to do it relatively quickly because people were added to the project in stages. We started with one developer in May 2021, added two more in November, and some of the services were migrated by their owning teams. At the end of February we decided to speed up and brought 4 more teams into the project, and suddenly everything was finished in three weeks. This was possible because we had already built up migration expertise that we could share with those 4 teams, acting as visiting experts for them.

Much of the complexity lay in the infrastructure libraries. That is, the C# code itself worked more or less the same way, but the infrastructure libraries and the framework behaved completely differently. So in the simplest case the whole update came down to updating all external libraries (NLog, MassTransit, etc.) and rewriting Startup/Program.cs. For example, we repeatedly wanted to drop Autofac in favor of Microsoft DI and kept putting it off, because that would mean touching literally the entire solution.

We migrated 16 sites to .NET 6, touched more than 400 Razor Views, cut out countless amounts of old code and libraries, decommissioned more than 100 servers and several domains, and removed a pile of configurations from TeamCity.

Now we have a single infrastructure for everyone. The monolith can be developed on any platform (Windows, Linux, macOS), a build takes 6 minutes, and a production update takes 10 minutes. For .NET 6 we use GitHub Actions with hosted runners, which are billed by the second and require no setup on our side. As a result, we saved 10% of our monthly expenses.

I hope our experience will help you avoid mistakes if you have to face a similar task.