Conventional relational database management systems such as Oracle, MS SQL Server, and MySQL are still relied on by many large-scale corporations. And operating at that scale brings gigabytes of data and tables with 150+ columns (apparently, an Oracle table can have up to 1,000 columns).
Naturally, such tables inevitably have unique key columns, typically populated by sequence objects in Oracle and MS SQL Server, serial columns in PostgreSQL, and so on. Those columns are crucial for creating relations between tables and accessing the desired data.
CREATE SEQUENCE sequence_name
  MINVALUE value
  MAXVALUE value
  START WITH value
  INCREMENT BY value
  CACHE value;
While accessing the aforementioned data or inserting new rows into the table, those unique key columns are mostly consumed in increments of one. This means that in the application layer (Java, C#, etc.), you can store the current value in a variable and increment it on the spot, saving resources for other transactions, because every extra transaction means more effort to reach physical storage.
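The caching idea above can be sketched in Java roughly as follows. This is a hypothetical helper, not a library API: fetchBlockStart() stands in for a one-time database call such as selecting NEXTVAL, and everything else is served from memory without touching the database.

```java
// Sketch: hand out sequence values from an in-memory block so that only one
// database round trip is needed per block, not per value (hypothetical class).
public class SequenceCache {
    private final long blockSize;
    private long current;   // last value handed out
    private long ceiling;   // top of the currently fetched block

    public SequenceCache(long blockSize) {
        this.blockSize = blockSize;
        this.ceiling = -1;  // forces a fetch on first use
    }

    // Stand-in for a real query like "SELECT my_seq.NEXTVAL FROM dual";
    // here it just continues from the previous block for demonstration.
    private long fetchBlockStart() {
        return ceiling + 1;
    }

    public long next() {
        if (ceiling < 0 || current >= ceiling) {
            long start = fetchBlockStart();   // the only "database" access
            current = start;
            ceiling = start + blockSize - 1;
            return current;
        }
        return ++current;                     // in-memory increment, no round trip
    }
}
```

Note that the values are held in a long on purpose; the pitfall discussed next is exactly what happens when a narrower type is chosen. Oracle's own CACHE clause achieves a similar effect on the server side.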
With this in mind, a flaw can creep into the design of the application: a developer may fail to foresee that the unique key column of that enormous, gigabyte-sized table can exceed the numerical limit of the variable designated to hold it. The database won't control which type you assign the number to after the SELECT statement, and Java or C# won't know that your data access object returns an unsuitable value until you run the code. For example, if the number in that column exceeds 2147483647 (the maximum value of a 32-bit integer) and you try to assign it to an integer variable, the code will fail at runtime, and there is no way of knowing before it crashes whether you designed your variables accordingly.
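A small Java example of this failure mode (the exact behavior at the JDBC boundary depends on the driver; here the conversion itself is isolated): a plain cast from long to int wraps silently past 2147483647, while a checked conversion like Math.toIntExact at least fails loudly.

```java
// Demonstrates what happens when a sequence value past Integer.MAX_VALUE
// ends up in an int: a plain cast wraps silently, a checked cast throws.
public class OverflowDemo {
    static int unsafeCast(long sequenceValue) {
        return (int) sequenceValue;            // silently wraps around
    }

    static int safeCast(long sequenceValue) {
        return Math.toIntExact(sequenceValue); // throws ArithmeticException on overflow
    }
}
```

Neither variant is detectable at compile time when the value comes out of a data access object, which is exactly why the problem only surfaces at runtime.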
SELECT SEQUENCE_NAME, LAST_NUMBER
FROM DBA_SEQUENCES
WHERE LAST_NUMBER > 999999999 AND LAST_NUMBER < 2147483647;
Then come our beloved logging tools to the rescue: Splunk, ELK/Logstash, and the like. These tools give you perspective on large amounts of data of any kind and let you search and analyze that data. That means they can look through your source code files, too! (Check out the References section for more information on Splunk and tools for logging big data.)
After that, the first thing you need to do is watch your sequence objects for a period of time and look for a trend, to see whether they are worth being concerned about: you don't want to end up chasing a large number that hasn't been incremented in months, or maybe even years. That could mean the table with the large ID column is a look-up or catalog table that has hardly been inserted into since its creation.
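One rough way to turn that watching into a number: take two snapshots of a sequence's LAST_NUMBER some days apart (for example, from the DBA_SEQUENCES query above) and project when it would cross the 32-bit limit. The figures below are purely illustrative.

```java
// Sketch: estimate days remaining before a sequence crosses Integer.MAX_VALUE,
// given two snapshots of its LAST_NUMBER taken daysBetween days apart.
public class SequenceTrend {
    static final long INT_LIMIT = 2147483647L;

    // Returns the estimated number of days until the limit,
    // or Long.MAX_VALUE if the sequence is not growing (dormant).
    static long daysUntilLimit(long earlier, long later, long daysBetween) {
        long growth = later - earlier;
        if (growth <= 0) {
            return Long.MAX_VALUE; // dormant look-up table, not worth chasing
        }
        long perDay = Math.max(1, growth / daysBetween);
        return (INT_LIMIT - later) / perDay;
    }
}
```

A sequence that grew by 30 million over a month is on a very different clock than one that has not moved at all, and this simple projection separates the two.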
Once you know which sequences to keep an eye on, you can use Splunk to search the application code for each sequence object's name, as if it had been selected from the table and assigned to a variable. For every hit from that search, check whether the value is being assigned to an unsuitable type.
Below, you can see a Splunk search over Java source files (the object name is a placeholder):
search index=sourcecode "name of the object here" sourcetype=java
The next step is to keep a whitelist, so that you don't have to re-examine the sequences already verified as safe every time you run this check.
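A minimal sketch of such a whitelist, assuming the check is driven from Java (the sequence names below are hypothetical):

```java
import java.util.Set;

// Sketch: skip sequences that a previous run already verified as safe,
// so each run of the check only surfaces new or unreviewed hits.
public class SequenceWhitelist {
    private final Set<String> verifiedSafe;

    public SequenceWhitelist(Set<String> verifiedSafe) {
        this.verifiedSafe = verifiedSafe;
    }

    public boolean needsReview(String sequenceName) {
        return !verifiedSafe.contains(sequenceName);
    }
}
```

In practice the set would be loaded from a file or table so it survives between runs.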
To wrap up: of course, it's crucial to design your application precisely for the behavior of your data, the client's traffic, and the requirements. But if you want to be sure that code written in the past keeps being checked for trouble in the future, bringing the database and the blessings of industrial data logging solutions together at the application layer is the way to go.