What is better – Oracle or Redis or How to justify the choice of the platform
By seeing such a headline, you probably have already decided that the article is either stupidity or a provocation. But do not rush to conclusions: employees of large corporations, especially corporations with state participation, quite often have to compare different platforms, including completely opposite ones, for example, like those in the title.
Of course, no one compares DBMS like that, because their strengths and weaknesses are well known. As a rule, platforms that solve any applied problem are subject to comparison.
When you start an educational project or a hobby project, the motivation for choosing a platform can be very diverse: “I know this platform very well”, “I am interested in understanding this one”, “here is the best documentation” … In the case of a commercial company, the selection criterion is one: how much will you have to pay and what willyou get for this money.
Naturally, you want to pay less and get more. However, it is necessary to decide which is more important – pay less or get more, and assign a weight to each node. Let’s assume that a high-quality solution is more important to us than a cheap one, and we assign the “Cost” node a weight of 40%, and the “Opportunities” node – 60%.
In large corporations, the opposite is usually the case – the weight of the value does not fall below 50%, and maybe more than 60%. In our example, it is only important that the total weight of the child nodes of any parent node must be 100%.
Db-engines.com knows about 500 database management systems. Naturally, if you choose a target platform from so many options, you may end up with a review article, but not a commercial project. In order to reduce the choice space, cutoff criteria are formulated, and if the platform does not meet these criteria, then it is not considered.
Cutoff criteria can relate to technological features, for example:
- ACID guarantees;
- relational data model;
- SQL language support (note, this is not the same as “relational model”);
- possibility of horizontal scaling.
There may be general criteria:
- availability of commercial support;
- the presence of the platform in some rating (for example, in the first hundred of the db-engines.com rating);
- presence of experts in the market.
After all, there may be enterprise-specific criteria:
- availability of specialists in the team;
- compatibility with monitoring system X or with backup system Y, to which all maintenance is connected …
The most important thing is to have a list of cut-off criteria. Otherwise, there will certainly be some expert (or “expert”) who enjoys the special confidence of the management, who will say “why didn’t you choose platform Z, I know it’s the best.”
The cost of the solution obviously consists of the cost of licenses, the cost of maintenance and the cost of equipment.
If the systems are of approximately the same class (for example, Microsoft SQL Server and PostgreSQL), then for simplicity, we can assume that the amount of equipment for both solutions will be approximately the same. This will allow you not to evaluate the equipment, thereby saving a lot of time and effort. If you have to compare completely different systems (Oracle vs. Redis), then it is obvious that for a correct assessment it is necessary to do sizing (calculation of the amount of equipment). Sizing a non-existent system is a very thankless task, so they still try to avoid such a comparison. It is easy to do this: zero data loss and a relational model are written in clipping conditions, or vice versa – a load of 50 thousand transactions per second.
To evaluate licenses, it is enough to ask the vendor or its partners for the cost of a license for a fixed number of cores and support for a fixed period. As a rule, companies have already built strong relationships with software vendors, and if the database operation department cannot answer the cost question on its own, then one letter is enough to receive this information.
Different vendors may have different licensing metrics: by the number of cores, the amount of data, or the number of nodes. The standby database can be free, or it can be licensed in the same way as the main one. If only some differences in metrics were found, you will have to describe in detail the model stand and calculate the cost of licenses for the stand.
An important point for a correct comparison is the same support conditions. For example, Oracle support costs 22% of the license price per year, and PostgreSQL support costs nothing. Is it correct to compare? No, because an error that cannot be eliminated on our own has completely different consequences: in the first case, support specialists will quickly help to fix it, and in the second case there is a risk of project delay or downtime of the finished system for an indefinite period.
There are three ways to equalize the calculation conditions:
- Use Oracle without support (in reality, this does not happen).
- Buy PostgreSQL support – for example, from Postgres Professional.
- Include risks associated with a lack of support.
For example, the calculation of risks may look like this: in the case of a fatal database failure, the downtime of the system will be 1 business day. The planned profit from the use of the system is 40 billion Mongolian tugriks per year, the frequency of accidents is estimated at 1/400, thus, the risk of lack of support is estimated at about 100 million dollars per year. Obviously, the “projected profit” and the “estimated accident rate” are virtual quantities, but it is much better to have such a model than to have none.
In reality, the system may be too important, and reputation losses from extended downtime will be unacceptable, so support will be required. If downtime is allowed, then dropping support can sometimes be a good way to save money.
Suppose that after all the calculations, the cost of operating platform A for 5 years turned out to be 800 million, the cost of operating platform B – 650 million, and the cost of operating platform C – 600 million. Platform C, as a winner, receives a full point for the cost and platforms A and B – slightly less, in proportion to how many times they are more expensive. In this case – 0.75 and 0.92 points, respectively.
The assessment of opportunities is divided into many groups, the number of which is limited only by the imagination of the person making the assessment. The best option seems to be the division of capabilities into teams that will use these capabilities; in our example, these are developers, administrators, and information security officers. Let’s assume that the weights of these functions are distributed as 40:40:20.
Development functions include:
- ease of data manipulation;
- presence of secondary indexes.
The list of criteria, like their weights, are very subjective. Even when solving the same problem, these lists, item weights and answers will differ significantly depending on the composition of your team. For example, Facebook uses MySQL to store data, and Instagram is built on top of Cassandra. It is unlikely that the developers of these applications filled out such tables. One can only guess that Mark Zuckerberg chose a full-fledged relational model, paying for this with the need for applied sharding, while Kevin Systrom laid down scaling by means of the platform, sacrificing the convenience of data access.
Administration functions include:
- backup system capabilities;
- ease of monitoring;
- ease of capacity management – disks and nodes;
- data replication capabilities.
Please note that the wording of the questions must be quantifiable. You can even agree on how to evaluate a particular function. Let’s, for example, try to rate the backup tools using the example of the tools supplied with the Oracle DBMS:
|imp/exp||Upload and download data||0.1|
|begin/end backup||Copy files||0.3|
|ZDLRA||Incremental copy only, fastest recovery to point||1.0|
If there are no clear assessment criteria, it makes sense to ask several experts to give marks and then average them.
Finally, let’s just list the information security features:
- availability of password management policies;
- ability to connect external authentication tools (LDAP, Kerberos);
- role model of access;
- audit capabilities;
- disk encryption;
- encryption over the network (TLS);
- data protection from the administrator.
Separately, I would like to warn against using the results of any load tests that were not done by you as arguments.
First, the data structure and load profile of the applications under test may differ significantly from the task that you are going to solve. About 10-15 years ago, database makers loved to flaunt the results achieved in the TPC benchmarks, but now no one seems to take these results seriously.
Secondly, the performance of the system depends quite strongly on the platform for which the code was originally written and on what hardware the test was carried out. I have seen many benchmarks comparing Oracle to PostgreSQL. The results range from the unconditional superiority of one system to the equally unconditional superiority of another.
And finally, thirdly, you do not know anything about who conducted the test. Both qualifications, which affect the quality of OS and platform customization, are important, as well as motivation, which affects the test results more than all other factors combined.
If performance is critical, do the test yourself instead.