DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Express Hibernate Queries as Type-Safe Java Streams
  • Introduction: Querydsl vs. JPA Criteria
  • How Doris Connects to Multiple Databases for Federated Analytics and Data Migration
  • Writing DTOs With Java8, Lombok, and Java14+

Trending

  • How to Submit a Post to DZone
  • Using Python Libraries in Java
  • The Smart Way to Talk to Your Database: Why Hybrid API + NL2SQL Wins
  • How To Build Resilient Microservices Using Circuit Breakers and Retries: A Developer’s Guide To Surviving
  1. DZone
  2. Data Engineering
  3. Databases
  4. An Advanced Complex Data Migration Solution

An Advanced Complex Data Migration Solution

In this article, learn about a data migration solution for cases where Java processing is needed on the records to migrate.

By 
Sven Loesekann user avatar
Sven Loesekann
·
Oct. 09, 23 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
5.3K Views

Join the DZone community and get the full member experience.

Join For Free

In production systems, new features sometimes need a data migration to be implemented. Such a migration can be done with different tools. For simple migrations, SQL can be used. It is fast and easily integrated into Liquibase or other tools to manage database migrations. This solution is for use cases that can not be done in SQL scripts. 

The Use Case

The MovieManager project stores the keys to access TheMovieDB in the database. To improve the project, the keys should now be stored encrypted with Tink. The existing keys need to be encrypted during the data migration, and new keys need to be encrypted during the sign-in process. The movie import service needs to decrypt the keys to use them during the import. 

The Data Migration

Update the Database Table

To mark migrated rows in the "user1" table, a "migration" column is added in this Liquibase script: 

<changeSet id="41" author="angular2guy">
    <addColumn tableName="user1">
        <column defaultValue="0" type="bigint" name="migration"/>
    </addColumn>
</changeSet>


The changeSet adds the "migration" column to the "user1" table and sets the default value "0".

Executing the Data Migration

The data migration is started with the startMigration(...) method in the CronJobs class:

...
private static volatile boolean migrationsDone = false;
...
@Scheduled(initialDelay = 2000, fixedRate = 36000000)
@SchedulerLock(name = "Migrations_scheduledTask", lockAtLeastFor = "PT2H", 
  lockAtMostFor = "PT3H")
public void startMigrations() {
  LOG.info("Start migrations.");
  if (!migrationsDone) {
    this.dataMigrationService.encryptUserKeys().thenApplyAsync(result -> {
      LOG.info("Users migrated: {}", result);
      return result;
    });
  }
  migrationsDone = true;
}


The method startMigrations() is called with the @Scheduled annotation because that enables the use of @SchedulerLock. The @SchedulerLock annotation sets a database lock to limit the execution to one instance to enable horizontal scalability. The startMigrations() method is called 2 seconds after startup and then every hour with the @Scheduled annotation. The encryptUserKeys() method returns a CompletableFuture that enables the use of thenApplyAsync(...) to log the amount of migrated users nonblocking. The static variable migrationsDone makes sure that each application instance calls the dataMigrationService only once and makes the other calls essentially free. 

Migrating the Data

To query the Users, the JpaUserRepository has the method findOpenMigrations:

public interface JpaUserRepository extends CrudRepository<User, Long> {
  ...
  @Query("select u from User u where u.migration < :migrationId")
  List<User> findOpenMigrations(@Param(value = "migrationId") 
    Long migrationId);
}


The method searches for entities where the migration property has not been increased to the migrationId that marks them as migrated.

The DataMigrationService contains the encryptUserKeys() method to do the migration:

@Service
@Transactional(propagation = Propagation.REQUIRES_NEW)
public class DataMigrationService {
...    
  @Async
  public CompletableFuture<Long> encryptUserKeys() {
    List<User> migratedUsers = this.userRepository.findOpenMigrations(1L)
      .stream().map(myUser -> {          
        myUser.setUuid(Optional.ofNullable(myUser.getUuid())
         .filter(myStr -> !myStr.isBlank())
         .orElse(UUID.randomUUID().toString()));             
        myUser.setMoviedbkey(this.userDetailService
          .encrypt(myUser.getMoviedbkey(), myUser.getUuid()));
        myUser.setMigration(myUser.getMigration() + 1);
        return myUser;
    }).collect(Collectors.toList());
    this.userRepository.saveAll(migratedUsers);
    return CompletableFuture.completedFuture(
      Integer.valueOf(migratedUsers.size()).longValue());
  }
}


The service has the Propagation.REQUIRES_NEW in the annotation to make sure that each method gets wrapped in its own transaction. 

The encryptUserKeys() method has the Async annotation to avoid any timeouts on the calling side. The findOpenMigrations(...) method of the repository returns the not migrated entities and uses map for the migration. In the map it is first checked if the user's UUID is set, or if it is created and set. Then the encrypt(...) method of the UserDetailService is used to encrypt the user key, and the migration property is increased to show that the entity was migrated. The migrated entities are put in a list and saved with the repository. Then the result CompletableFuture is created to return the amount of migrations done. If the migrations are already done, findOpenMigrations(...) returns an empty collection and nothing is mapped or saved.

The UserDetailServiceBase does the encryption in its encrypt() method:

...
@Value("${tink.json.key}")
private String tinkJsonKey;
private DeterministicAead daead;
...
@PostConstruct
public void init() throws GeneralSecurityException {
  DeterministicAeadConfig.register();
  KeysetHandle handle = TinkJsonProtoKeysetFormat.parseKeyset(
    this.tinkJsonKey, InsecureSecretKeyAccess.get());
  this.daead = handle.getPrimitive(DeterministicAead.class);
}
...
public String encrypt(String movieDbKey, String uuid) {
  byte[] cipherBytes;
  try {
    cipherBytes = daead.encryptDeterministically(
       movieDbKey.getBytes(Charset.defaultCharset()),
       uuid.getBytes(Charset.defaultCharset()));
  } catch (GeneralSecurityException e) {
      throw new RuntimeException(e);
  }
  String cipherText = new String(Base64.getEncoder().encode(cipherBytes), 
    Charset.defaultCharset());
  return cipherText;
}


  • The tinkJsonKey is a secret, and must be injected as an environment variable or Helm chart value into the application for security reasons. 
  • The init() method is annotated with @PostConstruct to run as initialization, and it registers the config and creates the KeysetHandle with the tinkJsonKey. Then the primitive is initialized.
  • The encrypt(...) method creates the cipherBytes with encryptDeterministcally(...) and the parameters of the method. The UUID is used to have unique cipherBytes for each user. The result is Base64 encoded and returned as String. 

Conclusion: Data Migration

This migration needs to run as an application and not as a script. The trade-off is that the migration code is now in the application, and after the migration is run it, is dead code. That code should be removed then, but in the real world, the time to do this is limited and after some time it is forgotten. The alternative is to use something like Spring Batch, but doing that will take more effort and time because the JPA entities/repos can not be reused that easily. A TODO to clean up the method in the DataMigrationService should do the trick sooner or later. 

One operations constraint has to be considered: during migration, the database is in an inconsistent state and the user access to the applications should be stopped. 

Finally Using the Keys

The MovieService contains the decrypt(...) method:

@Value("${tink.json.key}")
private String tinkJsonKey;
private DeterministicAead daead;
...
@PostConstruct
public void init() throws GeneralSecurityException {
  DeterministicAeadConfig.register();
  KeysetHandle handle = TinkJsonProtoKeysetFormat
    .parseKeyset(this.tinkJsonKey, InsecureSecretKeyAccess.get());
  this.daead = handle.getPrimitive(DeterministicAead.class);
}
...
private String decrypt(String cipherText, String uuid) 
  throws GeneralSecurityException {
  String result = new String(daead.decryptDeterministically(
    Base64.getDecoder().decode(cipherText),
    uuid.getBytes(Charset.defaultCharset())));
  return result;
}


The properties and the init() method are the same as with the encryption. The decrypt(...) method first Base64 decodes the cipherText and then uses the result and the UUID to decrypt the key and return it as a String. That key string is used with the movieDbRestClient methods to import movie data into the database. 

Conclusion

The Tink library makes using encryption easy enough. The tinkJsonKey has to be injected at runtime and should not be in a repo file or the application jar. A tinkJsonKey can be created with the EncryptionTest createKeySet(). The ShedLock library enables horizontal scalability, and Spring provides the toolbox that is used. The solution tries to balance the trade-offs for a horizontally scalable data migration that can not be done in a script.

Data migration sql Strings Java (programming language)

Published at DZone with permission of Sven Loesekann. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Express Hibernate Queries as Type-Safe Java Streams
  • Introduction: Querydsl vs. JPA Criteria
  • How Doris Connects to Multiple Databases for Federated Analytics and Data Migration
  • Writing DTOs With Java8, Lombok, and Java14+

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: