How to Batch DELETE Statements with Hibernate
Join the DZone community and get the full member experience.
Join For FreeIntroduction
In my , I explained the Hibernate configurations required for batching INSERT and UPDATE statements. This post will continue this topic with DELETE statements batching.
Domain model entities
We’ll start with the following entity model:
The Post entity has a one-to-many association to a Comment and a one-to-one relationship with the PostDetails entity:
@OneToMany(cascade = CascadeType.ALL, mappedBy = "post",
orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();
@OneToOne(cascade = CascadeType.ALL, mappedBy = "post",
orphanRemoval = true, fetch = FetchType.LAZY)
private PostDetails details;
The up-coming tests will be run against the following data:
doInTransaction(session -> {
int batchSize = batchSize();
for(int i = 0; i < itemsCount(); i++) {
int j = 0;
Post post = new Post(String.format(
"Post no. %d", i));
post.addComment(new Comment( String.format(
"Post comment %d:%d", i, j++)));
post.addComment(new Comment(String.format(
"Post comment %d:%d", i, j++)));
post.addDetails(new PostDetails());
session.persist(post);
if(i % batchSize == 0 && i > 0) {
session.flush();
session.clear();
}
}
});
Hibernate Configuration
As , the following properties are required for batching INSERT and UPDATE statements:
properties.put("hibernate.jdbc.batch_size",
String.valueOf(batchSize()));
properties.put("hibernate.order_inserts",
"true");
properties.put("hibernate.order_updates",
"true");
properties.put("hibernate.jdbc.batch_versioned_data",
"true");
Next, we are going to check if DELETE statements are batched as well.
JPA Cascade Delete
Because is convenient, I’m going to prove that CascadeType.DELETE and JDBC batching don’t mix well.
The following tests is going to:
- Select some Posts along with Comments and PostDetails
- Delete the Posts, while propagating the delete event to Comments and PostDetails as well
@Test
public void testCascadeDelete() {
LOGGER.info("Test batch delete with cascade");
final AtomicReference<Long> startNanos =
new AtomicReference<>();
addDeleteBatchingRows();
doInTransaction(session -> {
List<Post> posts = session.createQuery(
"select distinct p " +
"from Post p " +
"join fetch p.details d " +
"join fetch p.comments c")
.list();
startNanos.set(System.nanoTime());
for (Post post : posts) {
session.delete(post);
}
});
LOGGER.info("{}.testCascadeDelete took {} millis",
getClass().getSimpleName(),
TimeUnit.NANOSECONDS.toMillis(
System.nanoTime() - startNanos.get()
));
}
Running this test gives the following output:
Query:{[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][56,0]}
Query:{[delete from PostDetails where id=?][3]}
Query:{[delete from Post where id=? and version=?][3,0]}
Query:{[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][53,0]}
Query:{[delete from PostDetails where id=?][2]}
Query:{[delete from Post where id=? and version=?][2,0]}
Query:{[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,0]}
Query:{[delete from PostDetails where id=?][1]}
Query:{[delete from Post where id=? and version=?][1,0]}
Only the Comment DELETE statements were batched, the other entities being deleted in separate database round-trips.
The reason for this behaviour is given by the ActionQueue sorting implementation:
if ( session.getFactory().getSettings().isOrderUpdatesEnabled() ) {
// sort the updates by pk
updates.sort();
}
if ( session.getFactory().getSettings().isOrderInsertsEnabled() ) {
insertions.sort();
}
While INSERTS and UPDATES are covered, DELETE statements are not sorted at all. A JDBC batch can only be reused when all statements belong to the same database table. When an incoming statement targets a different database table, the current batch has to be released, so that the new batch matches the current statement database table:
public Batch getBatch(BatchKey key) {
if ( currentBatch != null ) {
if ( currentBatch.getKey().equals( key ) ) {
return currentBatch;
}
else {
currentBatch.execute();
currentBatch.release();
}
}
currentBatch = batchBuilder().buildBatch(key, this);
return currentBatch;
}
If you enjoy reading this article, you might want to subscribe to my newsletter and get a discount for my book as well.
Orphan removal and manual flushing
A work-around is to dissociate all Child entities while manually flushing the HibernateSession before advancing to a new Child association:
@Test
public void testOrphanRemoval() {
LOGGER.info("Test batch delete with orphan removal");
final AtomicReference<Long> startNanos =
new AtomicReference<>();
addDeleteBatchingRows();
doInTransaction(session -> {
List<Post> posts = session.createQuery(
"select distinct p " +
"from Post p " +
"join fetch p.details d " +
"join fetch p.comments c")
.list();
startNanos.set(System.nanoTime());
posts.forEach(Post::removeDetails);
session.flush();
posts.forEach(post -> {
for (Iterator<Comment> commentIterator =
post.getComments().iterator();
commentIterator.hasNext(); ) {
Comment comment = commentIterator.next();
comment.post = null;
commentIterator.remove();
}
});
session.flush();
posts.forEach(session::delete);
});
LOGGER.info("{}.testOrphanRemoval took {} millis",
getClass().getSimpleName(),
TimeUnit.NANOSECONDS.toMillis(
System.nanoTime() - startNanos.get()
));
}
This time all DELETE statements are properly batched:
Query:{[delete from PostDetails where id=?][2]} {[delete from PostDetails where id=?][3]} {[delete from PostDetails where id=?][1]}
Query:{[delete from Comment where id=? and version=?][53,0]} {[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][56,0]} {[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,
Query:{[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]} {[delete from Post where id=? and version=?][1,0]}
SQL Cascade Delete
A better solution is to use SQL cascade deletion, instead of JPA entity state propagation mechanism. This way, we can also reduce the DML statements count. Because Hibernate Session acts as a , we must be extra cautious when mixing entity state transitions with database-side automatic actions, as the Persistence Context might not reflect the latest database changes.
The Post entity one-to-manyComment association is marked with the Hibernate specific @OnDelete annotation, so that the auto-generated database schema includes the ON DELETE CASCADE directive:
@OneToMany(cascade = {
CascadeType.PERSIST,
CascadeType.MERGE},
mappedBy = "post")
@OnDelete(action = OnDeleteAction.CASCADE)
private List<Comment> comments = new ArrayList<>();
Generating the following DDL:
alter table Comment add constraint
FK_apirq8ka64iidc18f3k6x5tc5 foreign key (post_id)
references Post on delete cascade
The same is done with the PostDetails entity one-to-one Post association:
@OneToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "id")
@MapsId
@OnDelete(action = OnDeleteAction.CASCADE)
private Post post;
And the associated DDL:
alter table PostDetails add constraint
FK_h14un5v94coafqonc6medfpv8 foreign key (id)
references Post on delete cascade
The CascadeType.ALL and orphanRemoval were replaced with CascadeType.PERSIST and CascadeType.MERGE, because we no longer want Hibernate to propagate the entity removal event.
The test only deletes the Post entities.
doInTransaction(session -> {
List<Post> posts = session.createQuery(
"select p from Post p")
.list();
startNanos.set(System.nanoTime());
for (Post post : posts) {
session.delete(post);
}
});
The DELETE statements are properly batched as there’s only one target table.
Query:{[delete from Post where id=? and version=?][1,0]} {[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]}
If you enjoyed this article, I bet you are going to love my book as well.
Conclusion
If INSERT and UPDATE statements batching is just a matter of configuration, DELETE statements require some additional steps, which may increase the data access layer complexity.
Code available on GitHub.
If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider .
Published at DZone with permission of Vlad Mihalcea. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments