Let’s talk about the terrible codebase you are most likely dealing with right now. There is a project, and you are expected to add new features and fix bugs. But you open the IDE, pull the repository – and you want to cry. It seems impossible to work with this code.

The picture is a little depressing, of course, but let’s put emotions aside and see what we can do quickly to ease the suffering.

Why do we feel unhappy working with legacy code?

If you think about it, legacy is not so bad: it means the project is alive, it makes money, and it provides real value. Of course, it’s nice when we get to choose the language, the database, and the messaging system ourselves and design the architecture from scratch, but a greenfield startup may never make it into production. Legacy code is already serving users.

But from the point of view of a senior developer, it often looks like this:

  • the application is very fragile: fix a bug in one place, and two new ones appear in response;
  • regression bugs are like vampires that cannot be killed: they come back release after release;
  • new features take more and more time: releases slip, deadlines get squeezed, and you start deliberately padding your estimates.

These troubles are usually caused by a large pile of technical debt – and by the team’s attitude towards it. Tests and refactoring take time, and when shipping features is the priority, it is tempting to forget about them. It seems we as an industry allowed this to happen ourselves.

When you are asked to do something, a finished result is expected of you. If a feature is needed so urgently that even a “fire alarm” in the form of tests is skipped, this should be agreed upon explicitly at the very start. Our task as developers is to convey this to the business.

Step one: write smoke tests that show the application is at least alive

Having tests really matters to the business. If there were no tests before you arrived and you have decided to improve things, start with tests. At this stage you may not need the “proper” tests – unit tests, acceptance tests, and so on. Start with “just tests” that treat the application as a black box and cover at least the critical endpoints. For example, make sure that after refactoring the registration flow, the application at least does not respond with a 500 error.

What to use? One option is Codeception: it has a nice interface, you can quickly throw together smoke tests for the most critical endpoints, and the tests themselves are concise and understandable. Its API is expressive enough that a test reads almost like a user story.

What to cover? The rule is simple: we cover the functionality that is responsible for the business domain.

We explicitly write out the endpoint we are testing and point to the controller via a @see annotation. This is convenient: clicking it in the IDE takes you straight to the controller, and clicking on the controller action takes you back to the test.

The test itself is as simple as possible: we prepare the data, save it to the database, log in as a user, make a GET request to the API, and check that we receive a 200.
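
Here is a minimal sketch of such a smoke test in Codeception, assuming an API suite with the REST and Db modules enabled; the endpoint, the ProfileController reference, the table and field names, and the auth token are all invented for illustration:

```php
<?php

use Codeception\Util\HttpCode;

class ProfileSmokeCest
{
    // The endpoint is spelled out explicitly so the test documents the contract.
    private const ENDPOINT = '/api/v1/profile';

    /**
     * Smoke test: the profile endpoint must not blow up.
     *
     * @see \App\Controller\ProfileController::show() — hypothetical controller
     */
    public function profileEndpointIsAlive(ApiTester $I): void
    {
        // Prepare the data and save it to the database (Db module).
        $userId = $I->haveInDatabase('users', [
            'email'    => 'smoke@example.com',
            'password' => password_hash('secret', PASSWORD_DEFAULT),
        ]);

        // Log in as that user (the details depend on your auth scheme).
        $I->amBearerAuthenticated('test-token-for-user-' . $userId);

        // Make a GET request and check that we received a 200, not a 500.
        $I->sendGET(self::ENDPOINT);
        $I->seeResponseCodeIs(HttpCode::OK);
    }
}
```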

Such a test is the easiest thing you can write, but it already provides a safety net: now we can be sure that after refactoring the application will not start returning 500 errors.

Later we can improve the test: additionally check the response schema and verify that it really returns what we need.
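
For example, Codeception’s REST module can assert the shape of the JSON response right inside the same test method; the field names below are an assumed contract, not taken from a real project:

```php
// Inside the same Cest method, after the 200 check:
$I->seeResponseIsJson();

// Assert the response schema: these fields and types are a hypothetical contract.
$I->seeResponseMatchesJsonType([
    'id'    => 'integer',
    'email' => 'string:email',
    'name'  => 'string',
]);

// And check that it really returned the user we created.
$I->seeResponseContainsJson(['email' => 'smoke@example.com']);
```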

This way, you can gradually cover all the critical parts of the application – and then move on to the code itself.

Step two: remove unused code

The project does not need code that you “might need later”. Git already remembers it; if it is ever needed, we can restore it.

How do you decide what to delete and what to keep? With domain or infrastructure code it is usually easy to figure out. APIs and endpoints are a different matter. To deal with them, you can look at New Relic and at the projects on GitHub, but it is better to go straight to the other teams and ask around – it may turn out that you have an endpoint from which analytics pulls important data once a year, and it would be very unpleasant to remove it even though, by all indications, nobody has called it for almost a year.
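
One extra precaution before actually deleting a suspicious endpoint is to leave it running for a while but log every call, so a once-a-year consumer has a chance to reveal itself. A sketch with a PSR-3 logger; the controller and route names are invented:

```php
<?php

use Psr\Log\LoggerInterface;

final class LegacyExportController
{
    public function __construct(private LoggerInterface $logger)
    {
    }

    // Candidate for removal: keep it in place for now, but log every call,
    // so a rare consumer such as analytics has a chance to show up.
    public function export(): array
    {
        $this->logger->warning('Deprecated endpoint /api/v1/export was called');

        return $this->buildLegacyExport();
    }

    private function buildLegacyExport(): array
    {
        // ...the existing legacy logic stays untouched for now.
        return [];
    }
}
```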

Step three: introduce an output layer

Very often in legacy applications, either entities are exposed directly, or an array is assembled right in the controller and returned. This is bad: most likely your encapsulation is broken and the entity has public properties. It is also hard to refactor: clients already expect certain fields in the response, and if we want to rename a property or make it private, then even after finding and fixing every usage we still risk breaking something and having to sort it out with the frontend.

For example, we simply fetch an entity from the repository and return it as is – which means that as soon as we refactor the entity, the API the client sees changes with it.
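
A sketch of that problematic pattern, assuming a Symfony-style controller; UserController and UserRepository are made-up names:

```php
<?php

use Symfony\Component\HttpFoundation\JsonResponse;

// Anti-example: the controller returns the entity itself,
// so the entity's fields *are* the public API contract.
final class UserController
{
    public function __construct(private UserRepository $users)
    {
    }

    public function show(int $id): JsonResponse
    {
        $user = $this->users->find($id);

        // Whatever the serializer extracts from the entity goes straight
        // to the client: rename a property and the frontend breaks.
        return new JsonResponse($user);
    }
}
```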

To solve this, always return a simple DTO in responses instead – it only needs the fields that clients actually consume. We add a mapping from the entity and return the DTO in the controller.
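
A minimal sketch of such a DTO with a mapping from the entity, assuming PHP 8.1+ and the same hypothetical User entity and controller as above:

```php
<?php

// The response DTO contains only the fields clients actually rely on.
final class UserResponse
{
    public function __construct(
        public readonly int $id,
        public readonly string $email,
        public readonly string $name,
    ) {
    }

    // The mapping from the domain entity lives in one place.
    public static function fromEntity(User $user): self
    {
        return new self($user->getId(), $user->getEmail(), $user->getName());
    }
}

// In the controller we now return the DTO instead of the entity.
public function show(int $id): JsonResponse
{
    $user = $this->users->find($id);

    return new JsonResponse(UserResponse::fromEntity($user));
}
```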

Now the output layer is no longer tied to the domain code in any way, and the code itself has become more maintainable.

You can safely start improving and refactoring without fear of breaking API backward compatibility.

Step four: static code analysis

Modern PHP has strict types, but even with them you can still reassign a variable to a completely different type inside the body of a method or function.
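
Here is a tiny example, invented for illustration, of what declare(strict_types=1) does not protect you from:

```php
<?php

declare(strict_types=1);

function applyDiscount(int $price): int
{
    // strict_types guards the function boundary, but inside it
    // nothing stops us from turning $price into a string…
    $price = $price . ' RUB';

    // …and the mistake only surfaces here, at runtime, as a TypeError.
    return $price;
}
```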

And since types are checked at runtime, the checks can also fail at runtime. Yes, it is better to fail fast than to chase problems caused by a string that quietly turned into a number (and not the number we expected to see). But sometimes you would rather not crash in production at all. We can run the checks before the code is executed using additional tools: Psalm, PHPStan, Phan, and so on.

You may ask: why drag static analysis into a legacy project if we already know the code there is not great? Because it helps you check things that are not obvious. Legacy projects often suffer from so-called array-driven development, where data structures are represented by plain arrays. For example, there is a legacy service that registers users, and it accepts an array $params.
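
Something along these lines – the service and its fields are invented for illustration:

```php
<?php

class RegistrationService
{
    // What has to be in $params? Email? Password? A promo code?
    // The signature says nothing — you have to read the whole method to find out.
    public function register(array $params): void
    {
        // ...dozens of lines reaching into $params['email'], $params['password'], …
    }
}
```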

The code is not self-explanatory: we have to dig into the service to find out which fields it actually uses. With static analysis, we can state explicitly that this method expects an array with particular keys, and that each key holds a value of a particular type.
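
Psalm (and PHPStan) understand array shapes in docblocks; a sketch of the same hypothetical method annotated this way:

```php
<?php

class RegistrationService
{
    /**
     * The array shape documents the contract and lets Psalm check every call site.
     *
     * @param array{
     *     email: string,
     *     password: string,
     *     referral_code?: string|null
     * } $params
     */
    public function register(array $params): void
    {
        // Psalm now knows $params['email'] is a string and will complain
        // about callers that omit a required key or pass the wrong type.
    }
}
```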

With the format and data types written down in docblocks, during the next refactoring we run Psalm: it checks all possible call sites and tells us in advance whether the code will break. That is very convenient when working with legacy.

What’s next?

Take control of your existing codebase, understand how it works, and tidy it up a bit. With tests and static analysis in your arsenal, and with the code gradually being cleaned up, you can move on to more serious refactoring and improve the quality of the product.

And then – just think about what else you need to feel happy working with this code. What else is missing? Make a plan for yourself and gradually improve the code you inherited.