
Parse Elasticsearch Results Using Ruby


One of the modules in our project is an Elasticsearch cluster.

In order to fine-tune the configuration (shards, replicas, mapping, etc.) and the queries, we created a JMeter environment.

I wanted to test a simple query with many different input parameters, all of which would return results; in other words, queries for documents that actually exist.

The setup for JMeter is simple. I created the query I wanted to check as the POST body. In that query, instead of hard-coding one specific value, which would mean sending the same query over and over, I used a parameter and directed JMeter to read the parameter values from a CSV file.
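
For illustration, here is a minimal sketch of what such a parameterized POST body might look like, assuming a JMeter CSV Data Set Config variable named FIELD_1 (our real test queries were different):

{
   "query":{
      "constant_score":{
         "filter":{
            "term":{
               "FIELD_1":"${FIELD_1}"
            }
         }
      }
   }
}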

The next step was creating that data file: a file whose rows contain real values from the cluster.
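
Each row of the file holds a single value; for example, using the sample values that appear in the results below:

123
12345
4456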

For that, I used another query, which I ran against the cluster using curl (I've changed some of the parameter names for this post):

{
   "fields":[
      "FIELD_1"
   ],
   "size":10000,
   "query":{
      "constant_score":{
         "filter":{
            "bool":{
               "must":[
                  {
                     "term":{
                        "LIVE":true
                     }
                  },
                  {
                     "exists":{
                        "field":"FIELD_1"
                     }
                  }
               ]
            }
         }
      }
   }
}
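
The curl call itself looked roughly like this (hypothetical host and file names, with the query above saved in query.json):

curl -XPOST 'http://localhost:9200/my_index/the_document/_search?pretty' -d @query.json > results.json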

I piped the result into a file. Here's a sample of it (I changed the index name, document type, and values for this example):

{
  "took" : 586,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 63807792,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "1111111",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "123"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "22222222",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "12345"
      }
    }, {
      "_index" : "my_index",
      "_type" : "the_document",
      "_id" : "33333333",
      "_score" : 1.0,
      "fields" : {
        "FIELD_1" : "4456"
      }
    } ]
  }
}

The next step was parsing this JSON file, taking only FIELD_1 and putting each value into a new file.
For that, I used Ruby:

#!/usr/bin/ruby

require 'rubygems'
require 'json'

# Usage: ruby parse_results.rb <results.json> <output_file>
input_file = ARGV[0]
output_file = ARGV[1]

# Read the whole Elasticsearch response and parse it as JSON
json = File.read(input_file)
obj = JSON.parse(json)

# The matching documents live under hits -> hits
hits = obj['hits']
actual_hits = hits['hits']

begin
  file = File.open(output_file, "w")
  # Write one FIELD_1 value per line
  actual_hits.each do |hit|
    fields = hit['fields']
    field1 = fields['FIELD_1']
    file.puts(field1)
  end
rescue IOError => e
  warn "Error writing #{output_file}: #{e.message}"
ensure
  file.close unless file.nil?
end
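
Running it is then just a matter of passing the input and output paths (file names here are hypothetical):

ruby parse_results.rb results.json field1_values.csv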

Important note:
There's a shorter, better way to write to a file in Ruby:

File.write(output_file, field1)

Unfortunately, I can't use it: File.write was only added in Ruby 1.9.3, and our sandbox environment has an older Ruby that I can't upgrade.
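
For reference, here is a minimal sketch of what that shorter form could look like on Ruby 1.9.3 or later. Since File.write truncates the file on every call, the values have to be collected and written in one go rather than inside the loop:

values = actual_hits.map { |hit| hit['fields']['FIELD_1'] }
File.write(output_file, values.join("\n") + "\n")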
