10 More Common Mistakes Java Developers Make when Writing SQL

By Lukas Eder · Oct. 03, 22
37.5K Views


I was positively surprised to see how popular my recent listing about 10 Common Mistakes Java Developers Make When Writing SQL was, both on my own blog and on my syndication partner DZone. The popularity shows a couple of things:

  • How important SQL is to the professional Java world.
  • How common it is to forget about some basic SQL things.
  • How well SQL-centric libraries such as jOOQ or MyBatis are responding to market needs by embracing SQL. An amusing fact is that users have even mentioned my blog post on Slick's mailing list. Slick is a non-SQL-centric database access library in Scala. Like LINQ (and LINQ-to-SQL), it focuses on language integration, not on SQL code generation.

Anyway, the common mistakes I listed previously are far from complete, so I will treat you to a sequel of 10 subtly less common, yet equally interesting mistakes Java developers make when writing SQL.

1. Not Using PreparedStatements

Interestingly, this mistake or misbelief still surfaces in blogs, forums, and mailing lists many years after the appearance of JDBC, even though it is about a very simple thing to remember and to understand. It appears that some developers refrain from using PreparedStatements for any of these reasons:

  • They don't know about PreparedStatements.
  • They think that PreparedStatements are slower.
  • They think that writing a PreparedStatement takes more effort.

First off, let's bust the above myths. In 96% of the cases, you're better off writing a PreparedStatement rather than a static statement. Why? For simple reasons:

  • You can avoid syntax errors originating from bad string concatenation when inlining bind values.
  • You can avoid SQL injection vulnerabilities resulting from bad string concatenation when inlining bind values.
  • You can avoid edge cases when inlining more "sophisticated" data types, such as TIMESTAMP, binary data, and others.
  • You can keep open PreparedStatements around for a while, reusing them with new bind values instead of closing them immediately (useful in Postgres, for instance).
  • You can make use of adaptive cursor sharing (Oracle-speak) in more sophisticated databases. This helps prevent hard-parsing SQL statements for every new set of bind values.

Convinced? Yes. Note that there are some rare cases when you actually want to inline bind values in order to give your database's cost-based optimiser some heads-up about what kind of data is really going to be affected by the query. Typically, this results in "constant" predicates such as:

  • deleted = 1
  • status = 42

But it shouldn't result in "variable" predicates such as:

  • first_name like 'Jon%'
  • amount > 19.95

Note that modern databases implement bind-variable peeking. Hence, by default, you might as well use bind values for all your query parameters. Note also that higher-level APIs such as the JPA CriteriaQuery or jOOQ will help you generate PreparedStatements and bind values very easily and transparently when writing embedded JPQL or embedded SQL.

More background info:

  • Caveats of bind value peeking: an interesting blog post by Oracle guru Tanel Poder on the subject.
  • Cursor sharing: an interesting Stack Overflow question.

The cure:

By default, always use PreparedStatements instead of static statements. By default, never inline bind values into your SQL.
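To make the contrast concrete, here is a minimal sketch (table and column names are made up for illustration) showing why concatenating bind values into the SQL string is fragile, and what the PreparedStatement version looks like:

```java
public class PreparedStatementDemo {

    // Naive: inlines the bind value via string concatenation.
    // A single quote in the input breaks the statement syntax
    // (or worse, opens the door to SQL injection).
    static String naiveSql(String lastName) {
        return "select * from customer where last_name = '" + lastName + "'";
    }

    // Preferred: a placeholder. The JDBC driver transfers the value
    // separately, so quoting and escaping are never your problem.
    static final String PREPARED_SQL =
        "select * from customer where last_name = ?";

    public static void main(String[] args) {
        // The unescaped quote corrupts the generated SQL:
        System.out.println(naiveSql("O'Brien"));

        // With a real java.sql.Connection you would write:
        // try (PreparedStatement ps = conn.prepareStatement(PREPARED_SQL)) {
        //     ps.setString(1, "O'Brien");
        //     try (ResultSet rs = ps.executeQuery()) { /* ... */ }
        // }
    }
}
```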

2. Returning Too Many Columns

This mistake is quite frequent and can lead to very bad effects both in your database's execution plan and in your Java application. Let's look at the second effect first:

Bad effects on the Java application:

If you're selecting * (star) or a "default" set of 50 columns, which you're reusing among various DAOs, you're transferring lots of data from the database into a JDBC ResultSet. Even if you're not reading the data from the ResultSet, it has been transferred over the wire and loaded into your memory by the JDBC driver. That's quite a waste of I/O and memory if you know that you're only going to need 2-3 of those columns.

This was obvious, but beware also of…

Bad effects on the database execution plan:

These effects may actually be much worse than the effects on the Java application. Sophisticated databases perform a lot of SQL transformation when calculating the best execution plan for your query. It may well be that some parts of your query can be "transformed away", knowing that they won't contribute to the projection (SELECT clause) or to the filtering predicates. I've recently blogged about this in the context of schema meta data:
How Schema Meta Data Impacts Oracle Query Transformations

Now, this is quite a beast. Think about a sophisticated SELECT that will join two views:

select *
from   customer_view c
join   order_view o
  on   c.cust_id = o.cust_id

Each of the views that are joined to the above joined table reference might again join data from dozens of tables, such as CUSTOMER_ADDRESS, ORDER_HISTORY, ORDER_SETTLEMENT, etc. Given the SELECT * projection, your database has no choice but to fully perform the loading of all those joined tables, when in fact the only thing that you were interested in was this:

select c.first_name, c.last_name, o.amount
from   customer_view c
join   order_view o
  on   c.cust_id = o.cust_id

A good database will transform your SQL in a way that most of the "hidden" joins can be removed, which results in much less I/O and memory consumption within the database.

The cure:

Never execute SELECT *. Never reuse the same projection for various queries. Always try to reduce the projection to the data that you really need.

Note that this can be quite hard to achieve with ORMs.

3. Thinking That JOIN Is a SELECT Clause

This isn't a mistake with a lot of impact on performance or SQL correctness, but nevertheless, SQL developers should be aware of the fact that the JOIN clause is not part of the SELECT statement per se. The SQL standard 1992 defines a table reference as such:

6.3 <table reference>

<table reference> ::=
    <table name> [ [ as ] <correlation name>
      [ <left paren> <derived column list> <right paren> ] ]
  | <derived table> [ as ] <correlation name>
      [ <left paren> <derived column list> <right paren> ]
  | <joined table>

The FROM clause and also the joined table can then make use of such table references:

7.4 <from clause>

<from clause> ::= 
    from <table reference> [ { <comma> <table reference> }... ]

7.5 <joined table>

<joined table> ::=
    <cross join>
  | <qualified join>
  | <left paren> <joined table> <right paren>

<cross join> ::=
    <table reference> cross join <table reference>

<qualified join> ::=
    <table reference> [ natural ] [ <join type> ] join
      <table reference> [ <join specification> ]

Relational databases are very table-centric. Many operations are performed on physical, joined, or derived tables in one way or another. To write SQL effectively, it is important to understand that the SELECT .. FROM clause expects a comma-separated list of table references, in whatever form they may be provided.

Depending on the complexity of the table reference, some databases also accept sophisticated table references in other statements, such as INSERT, UPDATE, DELETE, or MERGE. See Oracle's manuals, for instance, explaining how to create updatable views.

The cure:

Always think of your FROM clause as expecting a table reference as a whole. If you write a JOIN clause, think of this JOIN clause as being part of a complex table reference:

select c.first_name, c.last_name, o.amount
from
 
    customer_view c
      join order_view o
      on c.cust_id = o.cust_id

4. Using Pre-ANSI JOIN Syntax

Now that we've clarified how table references work (see the previous point), it should become a bit more obvious that the pre-ANSI JOIN syntax should be avoided at all costs. To execution plans, it usually makes no difference whether you specify join predicates in the JOIN .. ON clause or in the WHERE clause. But from a readability and maintenance perspective, using the WHERE clause for both filtering predicates and join predicates is a major quagmire. Consider this simple example:

select c.first_name, c.last_name, o.amount
from   customer_view c,
       order_view o
where  o.amount > 100
and    c.cust_id = o.cust_id
and    c.language = 'en'

Can you spot the join predicate? What if we joined dozens of tables? This gets much worse when applying proprietary syntaxes for outer join, such as Oracle's (+) syntax.

The cure:

Always use the ANSI JOIN syntax. Never put join predicates into the WHERE clause. There is absolutely no advantage to using the pre-ANSI JOIN syntax.
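For instance, the query above can be rewritten with ANSI JOIN syntax; the join predicate moves into the ON clause, leaving only the filtering predicates in WHERE:

```sql
select c.first_name, c.last_name, o.amount
from   customer_view c
join   order_view o
  on   c.cust_id = o.cust_id
where  o.amount > 100
and    c.language = 'en'
```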

5. Forgetting to Escape Input to the LIKE Predicate

The SQL standard 1992 specifies the LIKE predicate as such:

8.5 <like predicate>

<like predicate> ::=
    <match value> [ not ] like <pattern>
      [ escape <escape character> ]

The ESCAPE keyword should be used almost always when allowing user input to be used in your SQL queries. While it may be rare that the percent sign (%) is actually supposed to be part of the data, the underscore (_) might well be:

select *
from   t
where  t.x like 'some!_prefix%' escape '!'

The cure:

Always think of proper escaping when using the LIKE predicate.
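A sketch of such escaping in Java (the helper name and the choice of '!' as escape character are illustrative; the escaped value is then bound via a PreparedStatement to a LIKE ? ESCAPE '!' predicate):

```java
public class LikeEscaper {

    // Escapes the LIKE wildcards % and _ (and the escape character
    // itself) so that user input is matched literally.
    static String escapeLike(String input) {
        return input.replace("!", "!!")
                    .replace("%", "!%")
                    .replace("_", "!_");
    }

    public static void main(String[] args) {
        System.out.println(escapeLike("some_prefix")); // some!_prefix
        System.out.println(escapeLike("100%!"));       // 100!%!!

        // Usage with a PreparedStatement (conn is an open Connection):
        // PreparedStatement ps = conn.prepareStatement(
        //     "select * from t where t.x like ? escape '!'");
        // ps.setString(1, escapeLike(userInput) + "%");
    }
}
```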

6. Thinking That NOT (A IN (X, Y)) Is the Boolean Inverse of A IN (X, Y)

This one is subtle but very important with respect to NULLs! Let's review what a in (x, y) really means:

                  a in (x, y)
is the same as    a = any (x, y)
is the same as    a = x or a = y

When at the same time, not (a in (x, y)) really means:

                  not (a in (x, y))
is the same as    a not in (x, y)
is the same as    a != any (x, y)
is the same as    a != x and a != y

That looks like the boolean inverse of the previous predicate, but it isn't! If any of x or y is NULL, the NOT IN predicate will result in UNKNOWN, whereas the IN predicate might still return a boolean value.

Or in other words: when a in (x, y) yields TRUE or FALSE, not (a in (x, y)) may still yield UNKNOWN instead of FALSE or TRUE. Note that this is also true if the right-hand side of the IN predicate is a subquery.

Don't believe it? See this SQL Fiddle for yourself. It shows that the following query yields no result:

select 1
where     1 in (null)
union all
select 2
where not(1 in (null))

More details can be seen in my previous blog post on that subject, which also shows some SQL dialect incompatibilities in that area.

The cure:

Beware of the NOT IN predicate when nullable columns are involved!
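To see the mechanics, here is a small simulation of SQL's three-valued logic in Java (an illustration only; a null Boolean stands in for UNKNOWN):

```java
import java.util.Arrays;
import java.util.List;

public class ThreeValuedLogic {

    // a = x yields UNKNOWN (null) if either operand is NULL
    static Boolean eq(Integer a, Integer x) {
        return (a == null || x == null) ? null : a.equals(x);
    }

    // a IN (values) is the OR over the individual equalities
    static Boolean in(Integer a, List<Integer> values) {
        Boolean result = Boolean.FALSE;
        for (Integer x : values) {
            Boolean e = eq(a, x);
            if (Boolean.TRUE.equals(e)) return true; // TRUE short-circuits OR
            if (e == null) result = null;            // UNKNOWN taints FALSE
        }
        return result;
    }

    // NOT(UNKNOWN) is still UNKNOWN
    static Boolean not(Boolean b) {
        return b == null ? null : !b;
    }

    public static void main(String[] args) {
        // 1 in (null) is UNKNOWN, so the WHERE clause filters the row out...
        System.out.println(in(1, Arrays.asList((Integer) null)));      // null
        // ...and not(1 in (null)) is also UNKNOWN, not TRUE:
        System.out.println(not(in(1, Arrays.asList((Integer) null)))); // null
        // ...whereas 1 in (1, null) is simply TRUE:
        System.out.println(in(1, Arrays.asList(1, null)));             // true
    }
}
```

Both branches of the SQL Fiddle query above evaluate to UNKNOWN under these rules, which is why neither returns a row.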

7. Thinking That NOT (A IS NULL) Is the Same as A IS NOT NULL

Right, so we remembered that SQL implements three-valued logic when it comes to handling NULL. That's why we can use the NULL predicate to check for NULL values. Right? Right.

But even the NULL predicate is subtle. Beware that the two following predicates are only equivalent for row value expressions of degree 1:

                   not (a is null)
is not the same as a is not null

If a is a row value expression with a degree of more than 1, then the truth table is transformed such that:

  • a is null yields TRUE only if all values in a are NULL
  • not (a is null) yields FALSE only if all values in a are NULL
  • a is not null yields TRUE only if all values in a are NOT NULL
  • not (a is not null) yields FALSE only if all values in a are NOT NULL

See more details in my previous blog post on that subject.

The cure:

When using row value expressions, beware of the NULL predicate, which might not work as expected.
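As a quick illustration of that truth table (standard SQL:1992 semantics; individual dialects may differ), take the degree-2 row (null, 1). Not all of its values are NULL, and not all of them are NOT NULL, so both predicates yield FALSE while their negations yield TRUE:

```sql
-- (null, 1) is null      yields false (not all values are null)
-- (null, 1) is not null  yields false (not all values are not null)
-- hence not (a is null) and a is not null disagree for this row:
select 1
where  not ((null, 1) is null)
and    not ((null, 1) is not null)
-- yields one row, e.g. on PostgreSQL
```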

8. Not Using Row Value Expressions Where They Are Supported

Row value expressions are an awesome SQL feature. While SQL is a very table-centric language, tables are also very row-centric. Row value expressions let you describe complex predicates much more easily by creating local ad-hoc rows that can be compared with other rows of the same degree and row type. A simple example is to query customers for first names and last names at the same time:

select c.address
from   customer c
where (c.first_name, c.last_name) = (?, ?)

As can be seen, this syntax is slightly more concise than the equivalent syntax where each column from the predicate's left-hand side is compared with the corresponding column on the right-hand side. This is particularly true if many independent predicates are combined with AND. Using row value expressions allows you to combine correlated predicates into one. This is most useful for join expressions on composite foreign keys:

select c.first_name, c.last_name, a.street
from   customer c
join   address a
  on  (c.id, c.tenant_id) = (a.id, a.tenant_id)

Unfortunately, not all databases support row value expressions in the same way. But the SQL standard defined them already in 1992, and if you use them, sophisticated databases like Oracle or Postgres can use them for calculating better execution plans. This is explained on the popular Use The Index, Luke page.

The cure:

Use row value expressions whenever you can. They will make your SQL more concise and possibly even faster.

9. Not Defining Enough Constraints

So, I'm going to cite Tom Kyte and Use The Index, Luke again. You cannot have enough constraints in your meta data. First off, constraints help you keep your data from corrupting, which is already very useful. But to me, more importantly, constraints will help the database perform SQL transformations, as the database can decide that:

  • Some values are equivalent
  • Some clauses are redundant
  • Some clauses are "void" (i.e., they will not return any values)

Some developers may think that constraints are slow. The opposite is the case, unless you insert lots and lots of data, in which case you can either disable constraints for a large operation or use a temporary "load table" without constraints, transferring data offline to the real table.

The cure:

Define as many constraints as you can. They will help your database perform better when querying.

10. Thinking That 50ms Is Fast Query Execution

The NoSQL hype is still ongoing, and many companies still think they're Twitter or Facebook, in dire need of faster, more scalable solutions, escaping ACID and relational models to scale horizontally. Some may succeed (e.g., Twitter or Facebook); others may run into this:

[Tweet screenshot]

Found here: https://twitter.com/codinghorror/status/347070841059692545

For the others who are forced (or chose) to stick with proven relational databases: don't be tricked into thinking that modern databases are slow. They're hyper fast. In fact, they're so fast that they can parse your 20kB query text, calculate 2000-line execution plans, and actually execute that monster in less than a millisecond, if you and your DBA get along well and tune your database to the max.

They may be slow because of your application misusing a popular ORM, or because that ORM won't be able to produce fast SQL for your complex querying logic. In that case, you may want to choose a more SQL-centric API like JDBC, jOOQ, or MyBatis that will let you get back in control of your SQL.

So, don't think that a query execution of 50ms is fast or even acceptable. It's not. If you get these speeds at development time, make sure you investigate execution plans. Those monsters might explode in production, where you have more complex contexts and data.

Conclusion

SQL is a lot of fun, but also very subtle in various ways. It's not easy to get it right, as my previous blog post about 10 common mistakes has shown. But SQL can be mastered, and it's worth the trouble. Data is your most valuable asset. Treat data with respect and write better SQL.



Published at DZone with permission of Lukas Eder. See the original article here.

Opinions expressed by DZone contributors are their own.
