As we know, the database is among the most important and powerful parts of any organization. It is the collection of Schema, Tables, Relationships, Queries, and Views. It is an organized collection of data. But do you ever think about these questions?
- How does the database manage all the tables?
- How does the database manage all the relationships?
- How do we perform all our operations easily?
- Is there any way for us to know about all this?
There is one answer for all these question, and that anwer is Metastore. In the Metastore, the database keeps all the information related to our databases, tables, and relations as Metadata. Whenever you want to know something about your database, then you can find it easily in the Metastore.
Here, we will talk about a Hive-Metastore system, where we keep all the information about tables and relations.
All Hive implementations need a metastore service, where it stores metadata. It is implemented using tables in a relational database. By default, Hive uses a built-in Derby SQL server. It provides single process storage, so when we use Derby, we cannot run instances of Hive CLI. Whenever we want to run Hive on a personal machine or for some developer task, then it is good, but when we want to use it in a cluster, then MySQL or any other similar relational database is required.
Now when you run your Hive query and you are using the default Derby database, you will find that your current directory now contains a new sub-directory, metastore_db. Also, the metastore will be created if it doesn’t already exist. The property of interest here is javax.jdo.option.ConnectionURL.
The default value of this property is jdbc:derby:;databaseName=metastore_db;create=true. This value specifies that you will be using the embedded Derby as your Hive metastore, and the location of the metastore is metastore_db.
We can also configure the directory for the Hive to store table information. By default, the location of the warehouse is file:///user/hive/warehouse and we can also use the hive-site.xml file for the local or remote metastore.
We used hive-site.xml for changing the configuration to specify to Hive where the database is stored. We used a JDBC-compliant database for the metastore because the default embedded database is not suitable for production. For providing these configurations, we used the hive-site.xml file. Below is an example of hive-site.xml for using a MySQL database:
<configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/metastore</value> <description>metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionURL.metastore_spark_plsql</name> <value>jdbc:mysql://localhost/metastore_sql</value> <description>user metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>MySQL JDBC driver class</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>user</value> <description>user name for connecting to mysql server </description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>password</value> <description>password for connecting to mysql server </description> </property> </configuration>
When we use the MySQL JDBC driver, then we download Jconnector (MySQL JDBC Driver) and place it in the $HIVE_HOME/lib and place hive-site.xml in $HIVE_HOME/conf. After this, we will be able to store our metastore in MySQL.
To know about metastore tables, field, and relation please look into this diagram:
Here in this diagram, we can find all the answers to any questions we have about the metastore. We can find out how metastore stores database and table information and how these tables are internally connected with each other. In these tables, we can find all the information that's related to the tables.