Over a million developers have joined DZone.

Solr 4.2: Index Structure Reading API

· Big Data Zone

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

With the release of Solr 4.2 we’ve got the possibility to use the HTTP protocol to get information about Solr index structure. Of course, if one wanted to do that prior to Solr 4.2 it could be achieved by fetching the schema.xml file, parsing it and then getting the needed information. However when Solr 4.2 was released we’ve got a dedicated API which can return the information we need without the need of parsing the whole schema.xml file.

Possibilities

Let’s look at the new API by example.

Getting information in XML format

Many Solr users are used to getting their data in the XML format, at least when using Solr HTTP API. However, the schema API uses JSON as the default format. In order to get the data in the XML format in all the below examples, you’ll need to appeng the wt=xml parameter to the call, for example like that:

$curl 'http://localhost:8983/solr/collection1/schema/fieldtypes?wt=xml'

Defined fields information

Let’s start by looking at how to fetch information about the fields that are defined in Solr. In order to do that we have the following possibilities:

  1. Get information about all the fields defined in the index
  2. Get information for a one, explicitly defined field

In the first case we should use the following command:

$curl 'http://localhost:8983/solr/collection1/schema/fields'

In second case we should add the / character and the field name to the above command. For example in order to get the information about the author field we should use the following command:

$curl 'http://localhost:8983/solr/collection1/schema/fields/author'

Solr response for the first command will be similar to the following one:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "fields":[{
      "name":"_version_",
      "type":"long",
      "indexed":true,
      "stored":true},
    {
      "name":"author",
      "type":"text_general",
      "indexed":true,
      "stored":true},
    {
      "name":"cat",
      "type":"string",
      "multiValued":true,
      "indexed":true,
      "stored":true},
    {
      "name":"category",
      "type":"text_general",
      "indexed":true,
      "stored":true},
    {
      "name":"id",
      "type":"string",
      "multiValued":false,
      "indexed":true,
      "required":true,
      "stored":true,
      "uniqueKey":true},
    {
      "name":"url",
      "type":"text_general",
      "indexed":true,
      "stored":true},
    {
      "name":"weight",
      "type":"float",
      "indexed":true,
      "stored":true}]}

On the other hand the response for the second command would be as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "field":{
    "name":"author",
    "type":"text_general",
    "indexed":true,
    "stored":true}}

Getting information about defined dynamic fields

Similar to what information we can get about the fields defined in the schema.xml we can get the information about dynamic fields. Again we have to options:

  1. Get information about all dynamic fields
  2. Get information about specific dynamic field pattern

In order to get all the information about dynamic fields we should use the following command:

$curl 'http://localhost:8983/solr/collection1/schema/dynamicfields'

In order to get information about a specific pattern we append the / character followed by the pattern, for example like this:

$curl 'http://localhost:8983/solr/collection1/schema/dynamicfields/random_*'

Solr will return the following response for the first query:

{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "dynamicfields":[{
      "name":"*_coordinate",
      "type":"tdouble",
      "indexed":true,
      "stored":false},
    {
      "name":"ignored_*",
      "type":"ignored",
      "multiValued":true},
    {
      "name":"random_*",
      "type":"random"},
    {
      "name":"*_p",
      "type":"location",
      "indexed":true,
      "stored":true},
    {
      "name":"*_c",
      "type":"currency",
      "indexed":true,
      "stored":true}]}

And the following response will be returned for the second command:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "dynamicfield":{
    "name":"random_*",
    "type":"random"}}

Getting field types

As you probably guess, in a way similar to the above describes examples, we can also get the information about the field types defined in our schema.xml files. We can fetch the following information:

  1. All the field types defined in the schema.xml file
  2. A single type

To get all the defined field types we should run the following command:

$curl 'http://localhost:8983/solr/collection1/schema/fieldtypes'

The get information about a single type we should again add the / character and append the field type name to it, for example like this:

$curl 'http://localhost:8983/solr/collection1/schema/fieldtypes/text_gl'

Solr will return the following information in response to the first command:

{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "fieldTypes":[{
      "name":"alphaOnlySort",
      "class":"solr.TextField",
      "sortMissingLast":true,
      "omitNorms":true,
      "analyzer":{
        "class":"solr.TokenizerChain",
        "tokenizer":{
          "class":"solr.KeywordTokenizerFactory"},
        "filters":[{
            "class":"solr.LowerCaseFilterFactory"},
          {
            "class":"solr.TrimFilterFactory"},
          {
            "class":"solr.PatternReplaceFilterFactory",
            "replace":"all",
            "replacement":"",
            "pattern":"([^a-z])"}]},
      "fields":[],
      "dynamicFields":[]},
    {
      "name":"boolean",
      "class":"solr.BoolField",
      "sortMissingLast":true,
      "fields":["inStock"],
      "dynamicFields":["*_bs",
        "*_b"]},
    {
      "name":"text_gl",
      "class":"solr.TextField",
      "positionIncrementGap":"100",
      "analyzer":{
        "class":"solr.TokenizerChain",
        "tokenizer":{
          "class":"solr.StandardTokenizerFactory"},
        "filters":[{
            "class":"solr.LowerCaseFilterFactory"},
          {
            "class":"solr.StopFilterFactory",
            "words":"lang/stopwords_gl.txt",
            "ignoreCase":"true",
            "enablePositionIncrements":"true"},
          {
            "class":"solr.GalicianStemFilterFactory"}]},
      "fields":[],
      "dynamicFields":[]},
    {
      "name":"tlong",
      "class":"solr.TrieLongField",
      "precisionStep":"8",
      "positionIncrementGap":"0",
      "fields":[],
      "dynamicFields":["*_tl"]}]}

In response to the second command Solr will return the following:

{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "fieldType":{
    "name":"text_gl",
    "class":"solr.TextField",
    "positionIncrementGap":"100",
    "analyzer":{
      "class":"solr.TokenizerChain",
      "tokenizer":{
        "class":"solr.StandardTokenizerFactory"},
      "filters":[{
          "class":"solr.LowerCaseFilterFactory"},
        {
          "class":"solr.StopFilterFactory",
          "words":"lang/stopwords_gl.txt",
          "ignoreCase":"true",
          "enablePositionIncrements":"true"},
        {
          "class":"solr.GalicianStemFilterFactory"}]},
    "fields":[],
    "dynamicFields":[]}}

As you can see, the amount information is nice as we are getting all the information about the field types and in addition to that the information which field are using give field (both dynamic and non-dynamic.

Retrieving information about copyFields

In addition to what we’ve discussed so far we are able to get information about copyFields section from the schema.xml. In order to do that one should run the following command:

$curl 'http://localhost:8983/solr/collection1/schema/copyfields'

And in response we will get the following data:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "copyfields":[{
      "source":"author",
      "dest":"text"},
    {
      "source":"cat",
      "dest":"text"},
    {
      "source":"content",
      "dest":"text"},
    {
      "source":"content_type",
      "dest":"text"},
    {
      "source":"description",
      "dest":"text"},
    {
      "source":"features",
      "dest":"text"},
    {
      "source":"author",
      "dest":"author_s",
      "destDynamicBase":"*_s"}]}

The future

In Solr 4.3 the described API was improved and is now being prepared to enable not only reading of the index structure, but also writing modifications to it with the use of HTTP requests. We can expect that feature in one of the upcoming versions of Apache Solr, so its worth waiting in my opinion, at least by those who needs it.

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Topics:

Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}