SQL DB Is Not ''The Only Thing You Need''
How to deal with legacy SQL DB spaghetti.
Join the DZone community and get the full member experience.Join For Free
After facing some legacy systems lately and complaints that 'SQL DB is bottleneck' and 'we have huge SQL cluster', usually accompanying with something like 'we plan to switch to microservices.' The idea came up to my mind that somehow 'microservices pattern' is usually interpreted as separating 'calculation' or 'processing' but not about data storage separation.
SQL database in legacy solution is usually used for storing and sometimes processing of:
- Business data.
- 'Stateful activity.'
- Event logs (business, IoT, system).
- Business logic execution.
Let's dive deeper into each scenario.
Business Domain Data
Business data — even though you can move it to some document storage that makes sense sometimes, just leave it there for now. There are usually better options to start optimization with.
Any activity that will end up one day but needs some persistence till that happy moment. Usually, it's a huge piece of work to do it in the proper way but it looks pretty small until you dive deeper. The good news there are already solutions to handle all the heavy load.
- User interaction requests. When the system needs some input from a user. Common cases for enterprise solutions are all sorts of business process automation tasks ('approve document,' 'fulfill form,' etc.).
- Workflows ('saga pattern').
- Message queuing.
- ETL artifacts. Some data is processed on a regular basis and the process is usually multistage, so intermediary results are stored in tables.
- A table that is constantly updating (number of update operations is significant) and 'loosely coupled' with other tables in DB.
- Table with 'status'/'processed' field or 'task' in the table name.
|User Interaction||Tables refer to the 'user data' table and in the permanent update process.||BPMS.||Camunda|
|WF, Sagas||Check for common symptoms.||Specialized WF solutions.||Temporal, Cadence, Netflix Conductor|
|Queues||Look at solution code — one or several 'workers' is the key.||AMQP, the Message bus.||RabbitMq, Azure MessageBus|
|ETL||Same as queues (look for 'workers' in the code) but tables usually contains bigger records.||Build datapipe or use actor approach for 'stateful serverless.'||Airflow, Luigi vs. Akka, Orleans|
Journals of 'actual events' that are not always well structured.
- IoT/IIoT data. The user interacts with some hardware and you log this event or some device's sensor sends data system — think about it as IoT-data-streams and handle these events in 'modern way.'
- Business events. The external system generates data that needs to be processed.
- System, application, solution events, user UI interaction events, etc.
- huge (in terms of the number of rows) tables that constantly growing with no links to other tables.
- 'append only' tables that periodically cleaned up (mostly manually).
|IoT||*Simple Processing||MongoDB, Kafka, Clickhouse.|
|Business Events||*Simple Processing||MongoDB, MessageBus, Data Warehouse.|
|System Events||Logging Solutions||ELK, Prometheus.|
- Store raw incoming events in NoSQL DB (MongoDB) with TTL.
- Create data-pipe to process them (Airflow, Luigi, Spark) or go with RabbitMQ and console-apps.
- Save consumable results in Data Warehouse or OLAP-friendly DB (Clickhouse).
Check 'stream processing' approaches, but it might be too much at this stage.
Stored Procedures and Triggers
Last but not least; you have tons of stored procedures and triggers, in other words, lots of solutions business logic leaves there:
|Triggers||Start 'next step' of processing after 'status' is changed.||Check 'stateful activities' part of this article.|
|Heavy Aggregating Procedures||Generating reports or spreadsheets for export.||Check data pipes.|
|Scheduled Heavy Updating Procedures||Monthly salary payments in a big enterprise.||BPMS/WF and/or immutable data pattern.|
(Very opinionated) Usually, lots of stored procedures and/or triggers are pure evil.
There are definitely more cases of using SQL DBs and other persistence approaches than mentioned in this short article. Nothing wrong to use DB for all the things mentioned above, but there are special solutions to handle these tasks with:
- More functionality. Even if don't need it now, it's like a good old hammer — you'll see many nails around.
- Better tested, production-ready, and better optimized.
- Community to support and developers to find.
Published at DZone with permission of Sergey Kovalev. See the original article here.
Opinions expressed by DZone contributors are their own.