Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

How to Generate SQL Linked List Data With Python Generator

DZone's Guide to

How to Generate SQL Linked List Data With Python Generator

Here's how to generate fast table values with Python Generator, a part of dbForge Data Generator.

· Database Zone ·
Free Resource

Databases are better when they can run themselves. CockroachDB is a SQL database that automates scaling and recovery. Check it out here.

In this post, we would like to show how to quickly generate table values that are based on the values of another column of the same table. For this purpose, we will use the Python Generator being a part of dbForge Data Generator for SQL Server.

First, let’s create a test table using the following script:

CREATE TABLE [dbo].[TestLinkedListTable](
[id] [int] NULL,
[name] [nchar](100) NULL,
[previous_id] [int] NULL,
[next_id] [int] NULL
)
GO

Then, we run the Data Generation wizard, select the Python generator for the previous_id column and enter the following script:

import clr
clr.AddReference("System")
from System import DBNull

def main():
  i = 0
  while i != config["n_rows"]:
    if (i == 0):
  yield DBNull.Value
    i = i + 1
    yield i

main()

previous_id-linked-list











For the next_id column, we also select the Python generator and enter the following script:

import clr
clr.AddReference("System")
from System import DBNull

def main():
  i = 2
  while 1 == 1:
   yield i
   i = i + 1
   if (i == config["n_rows"] + 1):
  yield DBNull.Value
  break

main()

next_id-linked-list











Voila! As a result of the above manipulations, we’ve got the records, where each record references to a next record using the next_id column:

result-linked-list


result2-linked-list









That was a good start, but let’s complicate the task a bit, and try generating several linked lists of various sizes, whereas each list must be related to a unique Parent. And furthermore, the value in each child must be computed from a value column in its Parent.

So again, let’s create two tables:

CREATE TABLE [dbo].[TestLinkedListTableChild](

[id] [int] NULL,

[name] [nchar](100) NULL,

[previous_id] [int] NULL,

[next_id] [int] NULL,

[valueX] [int] NULL,

[commonParent] [int] NULL

)

CREATE TABLE [dbo].[Parent](

[id] [int] NULL,

[name] [nchar](100) NULL,

[value] [int] NULL

)

Then, we run the Data Generation wizard and clear the Include NULL values option.

Let’s enter the following Python codes:

  1. For the previous_id column:
    import clr
    import random
    clr.AddReference("System")
    from System import DBNull
    
    clr.AddReference("System")
    from System import Random
    random = Random(1)
    
    def main():
      i = 0
    
      max_possible_count_in_chain = 5 
    
      while 1 == 1:
         items_count_in_chain = random.Next(1, max_possible_count_in_chain)
    
         while 1 == 1:
           if (items_count_in_chain == 0 or i == 0):
          yield DBNull.Value
    
           if (items_count_in_chain == 0):
         i+= 1
         break
           i+= 1
           yield i
           items_count_in_chain -= 1
    
    main()
  2. For the next_id column:
    import clr
    import random
    clr.AddReference("System")
    from System import DBNull
    
    clr.AddReference("System")
    from System import Random
    random = Random(1)
    
    def main():
      i = 1
      max_possible_count_in_chain = 5 
    
      while 1 == 1:
         items_count_in_chain = random.Next(1, max_possible_count_in_chain)
    
         while 1 == 1:
           if (items_count_in_chain == 0):
          yield DBNull.Value
    
           if (items_count_in_chain == 0):
         i+= 1
         break
           i+= 1
           yield i
           items_count_in_chain -= 1
    
    main()
  3. For the commonParent table:
    import clr
    import random
    clr.AddReference("System")
    from System import DBNull
    
    clr.AddReference("System")
    from System import Random
    random = Random(1)
    
    def main():
      i = 0
      max_possible_count_in_chain = 5 
    
      while 1 == 1:
         i +=1 
         items_count_in_chain = random.Next(1, max_possible_count_in_chain) + 1
    
         while 1 == 1:
            if (items_count_in_chain == 0):
               break
            yield i
            items_count_in_chain -= 1
    
    main()

After that, we generate a script and run it against the database.

To remove the records from the Parent table which didn’t have corresponding records in theTestLinkedListTableChild table, we run the following script:

delete parent
where not exists 
(select 1 from TestLinkedListTableChild tl where tl.commonParent = parent.id) 

Since the last chain couldn’t be not closed,
unclosed-chain-linked-list







Let’s run the following script to remove the last broken record:

delete TestLinkedListTableChild where commonParent not in (
  select commonParent from TestLinkedListTableChild tl
    where exists (
    select commonParent from TestLinkedListTableChild tl_child
    where next_id is null and tl.commonParent = tl_child.commonParent
  ) and exists (
  select commonParent from TestLinkedListTableChild tl_child
  where previous_id is null and tl.commonParent = tl_child.commonParent
 )
)

Finally, we’ve got the desired result–totally valid chains:

result-2-linked-list

Databases should be easy to deploy, easy to use, and easy to scale. If you agree, you should check out CockroachDB, a scalable SQL database built for businesses of every size. Check it out here. 

Topics:
sql server ,data generation ,python ,test data

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}