JSON Is Case Sensitive but You Don't Have to Be

In this article, we'll discuss dealing with data case sensitivity.

By Keshav Murthy, DZone Core · Apr. 07, 2020 · Tutorial

"A RoSe by any other case would smell as sweet." (William Shakespeare)

You must have learned capitalization rules in grammar school, but real-world search is not so sensitive to capitalization. Charles de Gaulle uses lower case for the middle "de" and Tony La Russa uses upper case for "La"; there may be etymological reasons for these, but your customer service agent is unlikely to remember them.

Databases have a variety of sensitivities. SQL is, by default, case insensitive to identifiers and keywords but case sensitive to data. JSON is case sensitive to both field names and data, and so is N1QL. A single JSON object can legally contain all of the fields below; N1QL selects, joins, and projects each field and value as distinct.

SQL

SELECT {"City": "San Francisco", "city": "san francisco", "citY": "saN fanciscO"};

[
  {
    "$1": {
      "City": "San Francisco",
      "citY": "saN fanciscO",
      "city": "san francisco"
    }
  }
]

Your field references are still case sensitive. If you use the wrong case for a field name, N1QL treats it as a missing field and assigns that field the value MISSING.
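As a quick illustration (a sketch only: recent Couchbase versions accept an array constant in the FROM clause, and the alias and data here are made up), referencing a field with the wrong case simply drops it from the projection:

```sql
SELECT t.City, t.city
FROM [{"City": "San Francisco"}] AS t;

-- t.City matches; t.city evaluates to MISSING, so it is
-- absent from the projected object.
```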

Let's consider a simple predicate in N1QL to look up all case permutations of a name.

SQL

WHERE name IN ["joe", "joE", "jOe", "Joe", "JoE", "JOe", "JOE"]

This requires seven different lookups into the index; "John" requires more index lookups, and "Fitzgerald" requires even more. The standard way to handle this is simple: lower the case of both the field and the literal.

SQL

WHERE LOWER(name) = "joe"

This lookup can be made faster by creating the index with the right expression.

SQL

CREATE INDEX i1 ON customer(LOWER(name));

Ensure that your query picks up the right index and pushes the predicate down to the index scan. That's the whole idea: queries whose predicates are pushed to the index scan run much faster than queries whose predicates are not. The same holds for aggregate pushdown.

SQL

EXPLAIN SELECT * FROM `customer` WHERE LOWER(name) = "joe";

{
  "#operator": "IndexScan3",
  "index": "i1",
  "index_id": "c117bdf583c2e276",
  "index_projection": {
    "primary_key": true
  },
  "keyspace": "customer",
  "namespace": "default",
  "spans": [
    {
      "exact": true,
      "range": [
        {
          "high": "\"joe\"",
          "inclusion": 3,
          "low": "\"joe\""
        }
      ]
    }
  ],

Case Insensitivity With a Composite Index

The same approach extends to composite indexes: apply LOWER() to every string key in both the predicate and the index definition.

SQL

WHERE LOWER(name) = "joe"
AND zip = 94821
AND salary > 500
AND join_date <= "2017-01-01"
AND LOWER(county) LIKE "san%"

SQL

CREATE INDEX i2 ON customer(LOWER(name),
                            zip,
                            LOWER(county),
                            join_date,
                            salary);

Case Insensitivity in Array Functions

String functions like SPLIT() and SUFFIXES(), as well as many of the array and object functions, return arrays. So how do you use them in a case-insensitive way?

We follow the same principle as before: create an expression that lowers the case of the values before processing them with these functions.

SQL

SELECT SPLIT("Good Morning, Joe") AS splitresult;

    "splitresult": [
      "Good",
      "Morning,",
      "Joe"
    ]

SELECT SPLIT(LOWER("Good Morning, Joe")) AS splitresult;

    "splitresult": [
      "good",
      "morning,",
      "joe"
    ]

Now, what you really want is to filter based on a value within the string.

SQL

WHERE LOWER(xyz) LIKE "%good%";

This is probably the worst-performing predicate in SQL: the leading wildcard means an index cannot narrow the scan. A better approach is to split the string into tokens and match them exactly.

SQL

SELECT * FROM customer
WHERE ANY x IN SPLIT(LOWER(xyz)) SATISFIES x = "good" END;

Now, what index would you create for this? The ADVISE statement comes in handy.

SQL

CREATE INDEX adv_DISTINCT_split_lower_xyz ON `customer`
           (DISTINCT ARRAY `x` FOR x in split(lower((`xyz`))) END)

As usual, verify the plan with EXPLAIN.

JSON

{
    "#operator": "DistinctScan",
    "scan": {
        "#operator": "IndexScan3",
        "index": "adv_DISTINCT_split_lower_xyz",
        "index_id": "552ab6c643616fbc",
        "index_projection": {
            "primary_key": true
        },
        "keyspace": "customer",
        "namespace": "default",
        "spans": [
            {
                "exact": true,
                "range": [
                    {
                        "high": "\"good\"",
                        "inclusion": 3,
                        "low": "\"good\""
                    }
                ]
            }
        ],


If you'd like to UNNEST and have a simple WHERE clause, use this query. As always, verify the plan to ensure the predicates are pushed to the index scan.

SQL

SELECT * FROM customer
UNNEST SPLIT(LOWER(xyz)) AS x
WHERE x = "good";


Using Tokens 

The TOKENS() function makes it simple to get lower-case tokens by taking the case as an option argument. See the article "More Than LIKE: Efficient JSON Searching With N1QL" for details and examples.
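As a hedged sketch (TOKENS() accepts an options object whose "case" field requests lower- or upper-casing; the index name, field name, and keyspace here are assumptions), tokenizing and indexing in lower case looks roughly like this:

```sql
SELECT TOKENS("Good Morning, Joe", {"case": "lower"}) AS toks;

-- A matching array index, so predicates on the lowered tokens
-- can be pushed to the index scan:
CREATE INDEX idx_tokens_lower ON customer
(DISTINCT ARRAY t FOR t IN TOKENS(xyz, {"case": "lower"}) END);
```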


Complex Expressions

SQL

SELECT *
FROM customer
WHERE LOWER(fname) || LOWER(mname) || LOWER(lname) = "joemsmith"

Note that because the left-hand side is lowered, the literal must be lower case as well; a mixed-case literal like "JoeMSmith" would never match.
How could we optimize this? The Index Advisor to the rescue, again.

SQL

CREATE INDEX adv_lower_fname_concat_lower_mname_concat_lower_lname
  ON `customer`(lower((`fname`))||lower((`mname`))||lower((`lname`)))

Explain to confirm the plan:

JSON

{
    "#operator": "IndexScan3",
    "index": "adv_lower_fname_concat_lower_mname_concat_lower_lname",
    "index_id": "aaa14cbdf14e9cd8",
    "index_projection": {
        "primary_key": true
    },
    "keyspace": "customer",
    "namespace": "default",
    "spans": [
        {
            "exact": true,
            "range": [
                {
                    "high": "\"joemsmith\"",
                    "inclusion": 3,
                    "low": "\"joemsmith\""
                }
            ]
        }
    ],
    "using": "gsi"
},



Bringing in the Big Guns: Full-Text Search

As you've realized, this is a text processing and querying problem. Full-text search (FTS) can scan, store, and search text in various ways; case-insensitive search is one of them. Let's see the plan for a simple search query.

SQL

SELECT * FROM customer
WHERE SEARCH(name, "joe");

"~children": [
    {
        "#operator": "PrimaryScan3",
        "index": "#primary",
        "index_projection": {
            "primary_key": true
        },
        "keyspace": "customer",
        "namespace": "default",
        "using": "gsi"
    },
    {
        "#operator": "Fetch",
        "keyspace": "customer",
        "namespace": "default"
    },
    {
        "#operator": "Parallel",
        "~child": {
            "#operator": "Sequence",
            "~children": [
                {
                    "#operator": "Filter",
                    "condition": "search((`customer`.`name`), \"joe\")"
                },
                {
                    "#operator": "InitialProject",
                    "result_terms": [
                        {
                            "expr": "self",
                            "star": true
                        }
                    ]
                }
            ]
        }
    }
]
}


This is NOT the plan you want: it is using a primary scan!

After creating the text index on the bucket customer, things are much better:

SQL

SELECT * FROM customer
WHERE SEARCH(name, "joe");

{
    "#operator": "Sequence",
    "~children": [
        {
            "#operator": "IndexFtsSearch",
            "index": "trname",
            "index_id": "3bdb61e5010e8838",
            "keyspace": "customer",
            "namespace": "default",
            "search_info": {
                "field": "\"`name`\"",
                "outname": "out",
                "query": "\"joe\""
            },
            "using": "fts"
        },



The default standard analyzer lowercases all the tokens, and therefore you'll find all the "joe"s: JOE, joe, Joe, JOe, etc. You can also define a custom analyzer with specific instructions to lowercase the tokens. Here's an example.

JSON

"mapping": {
  "analysis": {
    "analyzers": {
      "mylower": {
        "token_filters": [
          "to_lower"
        ],
        "tokenizer": "unicode",
        "type": "custom"
      }
    }
  },


Here's how you add it in the UI.
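With the custom analyzer defined, a query can reference it explicitly. This is a hedged sketch: the FTS match-query object accepts an "analyzer" field, but the index setup and the field name here are assumptions.

```sql
SELECT * FROM customer
WHERE SEARCH(name, {"match": "Joe", "analyzer": "mylower"});

-- The "mylower" analyzer lowercases the tokens at query time,
-- so "Joe", "JOE", and "joe" all match the same indexed term.
```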



References

1. Couchbase Documentation: https://docs.couchbase.com/server/6.5/introduction/intro.html

2. Couchbase N1QL: https://docs.couchbase.com/server/6.5/n1ql/n1ql-language-reference/index.html

3. Couchbase FTS:  https://docs.couchbase.com/server/6.5/fts/full-text-intro.html

Published at DZone with permission of Keshav Murthy, DZone MVB.
