Testing N1QL: SQL for JSON (Part 5)
Now that we've learned how to use the MySQL Random Query Generator, let's use it to debug SQL and N1QL to find out where errors come from and how to fix them.
Join the DZone community and get the full member experience.
Join For FreeWelcome back to Testing N1QL! If you're just getting in, check out part one, part two, part three, and part four. Now that you're familiar with MySQL's RQG, let's take a look at the execution environment.
Our test case assumes that we have MySQL and the Python connector for MySQL installed on the machine where we want to run our test framework: testrunner. Links to our GitHub repository of testrunner are provided below. The test case can run for single or multiple tables in MySQL. Since it is hard to store expected results for millions of queries, we want to compute it on the fly. This requires the installation of MySQL.
The test creates the database in MySQL as specified by the database schema. MySQL tables get created and data is populated. Various data types considered here are Datetime, String, Numeric, Boolean.
We load the MySQL database created into Couchbase buckets. Each MySQL table is loaded into a Couchbase bucket. Each row in the MySQL table is converted into JSON docs in the bucket. The primary key value of a MySQL table, which is unique, defines the unique key of each Couchbase document.
Query template files are loaded and unzipped. Then we build the primary index on Couchbase buckets and MySQL tables. A few tests build different secondary indexes on Couchbase buckets by taking different possible combinations of fields. The tests using secondary indexes verify for both views and GSI indexes.
We vary the number of queries to be generated for our testing purposes as below:
Sanity Runs |
Number of Queries |
Run Frequency |
Dev Sanity |
1000 |
Upon introduction of a new feature or optimization of an existing feature by Dev. |
Daily QE Sanity |
10000 |
Every 24 hours with a new development build by QE. |
BiWeekly Full RQG |
550K |
Every 2 weeks. |
Though there is no limitation, we validate up to 550K queries.
Figure3: Visualization of Framework
The figure above gives us an idea of how we compare actual JSON results generated by N1QL queries with expected results from SQL queries after inserting the exact same data in both a MySQL database and Couchbase buckets — then running the same queries respectively.
Once we have the respective queries, we run SQL queries against MySQL and generate the expected results. Then, we run N1QL queries against Couchbase and generate the actual results. Now, in order to compare results, we need to normalize the data, since MySQL generates resultsets in rows/columns while N1QL generates JSON.
What Is the Meaning of Normalization Here?
We add an order by clause in the respective SQL and N1QL query so that results are ordered based on the key, and then we compare them.
sql_query becomes sql_query order by <primary_key_id>
n1ql_query becomes n1ql_query order by <document key>
First, we compare the lengths of actual and expected results. Then we convert SQL results to JSON and compare them with N1QL JSON results.
For tests with multiple indexes, we specify N1QL queries without index hints and see if the correct index is being used in explain clause.
The difference in results is displayed very clearly for debugging purposes as shown below.
There is an AssertionError giving an analysis of pass percentage, total queries run, and distribution of failure.
Example of results from a failed run:
AssertionError: Total Queries Run = 10000, Pass = 9997, Fail = 3, Pass Percentage = 99 %
[ KEYWORD FAILURE DISTRIBUTION ]
AND :: 0%
BETWEEN :: 0%
FALSE :: 0%
DISTINCT :: 0%
IS :: 0%
IN :: 0%
>= :: 0%
= :: 0%
KEY :: 0%
< :: 0%
NOT :: 1%
> :: 0%
NULL :: 0%
TRUE :: 0%
OR :: 0%
<= :: 0%
In this example, 0% mentioned with keywords in [ KEYWORD FAILURE DISTRIBUTION ] indicates that N1QL queries with these keywords gave correct results while those with the NOT keyword gave in 1% of results different results from SQL queries. This is how we identify the pattern where failures occurred.
The test also displays the exact SQL and N1QL query, which gave different results in the end.
Example:
sql_query |
n1ql_query |
SELECT primary_key_id, |
SELECT primary_key_id, |
Example of Results From a Successful Run
2017-01-16 19:23:40 | INFO | MainProcess | test_thread | [test_rqg._test_result_analysis] Total Queries Run = 10000, Pass = 10000, Fail = 0, Pass Pecentage = 100 %
2017-01-16 19:23:40 | INFO | MainProcess | test_thread | [test_rqg.test_rqg_crud_ops]
ok
Ran 1 test in 16045.208s
OK
MySQLCursor: DROP SCHEMA IF EXISTS crud_db_..
summary so far suite rqg.test_rqg.RQGTests , pass 1 , fail 0
This helps the QE and developers to analyze the differences and figure out the reason for incorrect results in an efficient way.
We caught a lot of bugs related to incorrect results in N1QL last year using our rqg testrunner framework.
The following table shows some of the bugs caught by this RQG framework:
Issue # |
Description |
Comments |
With secondary indexing Having clause with group by returns inconsistent result as compared to MySQL |
On comparing the explain it was seen that the spans array was different in both cases. |
|
Difference in the number of results seen for few aggregate queries |
This error was seen as part of intersect scan optimizations being made in N1QL. |
|
Merge into and delete queries reporting wrong results |
This error was introduced as a result of a race condition in merge queries. |
|
Mismatch in Queries seen for count(*) and min functions |
||
Secondary indexes without index hints not giving correct results for sum,average,count |
Problem / Solution: If an index scan produces multiple spans that might overlap, the results of the index scan must be de-duped. An additional optimization would be to eliminate duplicate spans. |
Details on an Example bug: MB-19171
This bug shows how QE daily RQG run caught the regression which was introduced as part of our N1QL optimizations.
The following queries gave different results, and the mismatch in results shows the expected results from a particular SQL query and actual results from the same n1ql query.
sql_query :: SELECT MIN( int_field1 ) FROM simple_table_1 WHERE char_field1 >= "N" n1ql_query :: SELECT MIN( int_field1 ) FROM simple_table_1 WHERE char_field1 >= "N" mismatch in results :: expected :: [{u'MIN( int_field1 )': 34}], actual :: [] |
sql_query :: SELECT COUNT(*) FROM simple_table_1 WHERE (int_field1 > 5042) AND (decimal_field1 > 5201) n1ql_query :: SELECT COUNT(*) FROM simple_table_1 WHERE (int_field1 > 5042) AND (decimal_field1 > 5201) mismatch in results :: expected :: [236], actual :: [500] |
sql_query :: SELECT MIN( int_field1 ) FROM simple_table_1 WHERE varchar_field1 > "aYTUOeXhmw" n1ql_query :: SELECT MIN( int_field1 ) FROM simple_table_1 WHERE varchar_field1 > "aYTUOeXhmw" mismatch in results :: expected :: [{u'MIN( int_field1 )': 69}], actual :: [] |
sql_query :: SELECT MIN( int_field1 ) FROM simple_table_1 WHERE NOT (varchar_field1 < "aYTUOeXhmw") n1ql_query :: SELECT MIN( int_field1 ) FROM simple_table_1 WHERE NOT (varchar_field1 < "aYTUOeXhmw") mismatch in results :: expected :: [{u'MIN( int_field1 )': 69}], actual :: [] |
Dev could identify the patterns in the queries where we saw the mismatch and fix the issue quickly.
Advantages of This Framework
The main advantage of this framework is to catch regressions in N1QL, since the random field combinations are not picked by functional or unit tests. When developers change N1QL code, they can be at ease as most of the issues will be immediately caught by RQG.
Installation of RQG
If you want to install RQG on your system,here are the instructions:
Install Perl: Make sure it is Perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level (with two registered patches, see perl -V for more detail)
Install MySQL and then link MySQL
Install DBI:
Install cpanm with this command:
sudo cpan App::cpanminus
.Install DBI:
sudo cpanm DBI
.
Install DBD::MySQL:
Get the necessary files from CPAN. Open the CPAN terminal:
sudo perl -MCPAN -e 'shell'.
Then, run these commands: get DBD::MySQL and exit.
Before compiling you have to create some alias because MySQL
It's installed differently on a Mac than on Linux.
cd /usr/local
sudo mkdir lib
cd lib:
sudo ln -s /usr/local/MySQL/lib/*.dylib
Finally, you must install and compile the library:
cd ~/.cpan/build/DBD*/
sudo perl Makefile.PL --testuser='yourMySQLuser' --testpassword='
yourMySQLpassword'-
sudo make
sudo make test
sudo make install
Pre-Requisites for RQG
MySQL, Python MySQL connector (Python lib that talks to MySQL).
Relevant GitHub Links
https://github.com/Couchbase/testrunner/tree/master/b/resources/rqg
https://github.com/Couchbase/testrunner/tree/master/pytests/rqg
https://github.com/Couchbase/testrunner/blob/master/lib/Couchbase_helper
RQG for MySQL:
N1QL Documentation:
If you have any feedback or questions on this article, please email me at prerna.manaktala@couchbase.com
Acknowledgements:
Raju Suravarjjala <raju@couchbase.com>
Parag Agarwal <>
Keshav Murthy <keshav@couchbase.com>
Opinions expressed by DZone contributors are their own.
Comments