Searchable Documents? Yes You Can. Another Reason to Choose AsciiDoc

by Alex Soto · Jun. 10, 13 · Interview

Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine for the cloud. It is based on Apache Lucene, which provides its full-text search capabilities. It is document oriented and schema free.


Asciidoctor is a pure Ruby processor for converting AsciiDoc source files and strings into HTML 5, DocBook 4.5 and other formats. Apart from the Ruby implementation, there is the Asciidoctor-java-integration project, which lets us call Asciidoctor functions from Java without noticing that Ruby code is being executed.


In this post we are going to see how to use Elasticsearch over AsciiDoc documents to make them searchable by their header information or by their content.


Let's add the required dependencies:

<dependencies>
	
	<dependency>
		<groupId>junit</groupId>
		<artifactId>junit</artifactId>
		<version>4.11</version>
		<scope>test</scope>
	</dependency>
	<dependency>
		<groupId>com.googlecode.lambdaj</groupId>
		<artifactId>lambdaj</artifactId>
		<version>2.3.3</version>
	</dependency>
	<dependency>
		<groupId>org.elasticsearch</groupId>
		<artifactId>elasticsearch</artifactId>
		<version>0.90.1</version>
	</dependency>
	<dependency>
		<groupId>org.asciidoctor</groupId>
		<artifactId>asciidoctor-java-integration</artifactId>
		<version>0.1.3</version>
	</dependency>
		
</dependencies>
The Lambdaj library is used to convert AsciiDoc files to JSON documents.
Now we can start an Elasticsearch instance, which in our case is going to be an embedded instance.
node = nodeBuilder().local(true).node();

The next step is to parse the AsciiDoc document header, read the document content, and convert both into a JSON document.

An example of a JSON document stored in Elasticsearch could be:

{
   "title":"Asciidoctor Maven plugin 0.1.2 released!",
   "authors":[
      {
         "author":"Jason Porter",
         "email":"example@mail.com"
      }
   ],
   "version":null,
   "content":"= Asciidoctor Maven plugin 0.1.2 released!.....",
   "tags":[
      "release",
      "plugin"
   ]
}

To convert an AsciiDoc file to a JSON document we are going to use the XContentBuilder class, which is provided by the Elasticsearch Java API to create JSON documents programmatically.

package com.lordofthejars.asciidoctor;
 
import static org.elasticsearch.common.xcontent.XContentFactory.*;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
 
import org.asciidoctor.Asciidoctor;
import org.asciidoctor.Author;
import org.asciidoctor.DocumentHeader;
import org.asciidoctor.internal.IOUtils;
import org.elasticsearch.common.xcontent.XContentBuilder;
 
import ch.lambdaj.function.convert.Converter;
 
public class AsciidoctorFileJsonConverter implements Converter<File, XContentBuilder> {
	
	private Asciidoctor asciidoctor;
	
	public AsciidoctorFileJsonConverter() {
		this.asciidoctor = Asciidoctor.Factory.create();
	}
	
	public XContentBuilder convert(File asciidoctor) {
		
		DocumentHeader documentHeader = this.asciidoctor.readDocumentHeader(asciidoctor);
		
		XContentBuilder jsonContent = null;
		try {
			jsonContent = jsonBuilder()
				    .startObject()
				    .field("title", documentHeader.getDocumentTitle())
				    .startArray("authors");
				    
					Author mainAuthor = documentHeader.getAuthor();
			
					jsonContent.startObject()
								.field("author", mainAuthor.getFullName())
								.field("email", mainAuthor.getEmail())
								.endObject();
					
					List<Author> authors = documentHeader.getAuthors();
					
					for (Author author : authors) {
						jsonContent.startObject()
								.field("author", author.getFullName())
								.field("email", author.getEmail())
								.endObject();
					}
					
				    jsonContent.endArray()
				    		.field("version", documentHeader.getRevisionInfo().getNumber())
				    		.field("content", readContent(asciidoctor))
				    		.array("tags", parseTags((String)documentHeader.getAttributes().get("tags")))
				    .endObject();
		} catch (IOException e) {
			throw new IllegalArgumentException(e);
		}
		
		return jsonContent;
	}
 
	private String[] parseTags(String tags) {
		tags = tags.substring(1, tags.length()-1);
		return tags.split(", ");
	}
	
	private String readContent(File content) throws FileNotFoundException {
		return IOUtils.readFull(new FileInputStream(content));
	}
	
}
Basically, we build the JSON document by calling the startObject method to start a new object, the field method to add new fields, and the startArray method to start an array. The builder is then used to render the equivalent object in JSON format. Notice that we are using the readDocumentHeader method of the Asciidoctor class, which returns the header attributes of an AsciiDoc file without reading and rendering the whole document. Finally, the content field is set to the full content of the document.
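
One caveat about the parseTags helper in the converter: it assumes the tags attribute is always present and arrives as a bracketed, comma-separated string such as [release, plugin], so a document without tags would cause a NullPointerException. A more defensive variant could look like the following sketch. TagParser is a hypothetical name used only for illustration; it is not part of Asciidoctor or Elasticsearch.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Asciidoctor or Elasticsearch.
final class TagParser {

    private TagParser() {}

    // Assumes the attribute looks like "[release, plugin]".
    // Returns an empty array when the attribute is missing or holds no tags.
    static String[] parse(String raw) {
        if (raw == null) {
            return new String[0];
        }
        String trimmed = raw.trim();
        // Strip the surrounding brackets only when both are present.
        if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
            trimmed = trimmed.substring(1, trimmed.length() - 1).trim();
        }
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Tolerate any amount of whitespace around the commas.
        return trimmed.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("[release, plugin]"))); // prints [release, plugin]
    }
}
```

With this variant, a document that has no tags attribute would simply be indexed with an empty tags array.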
Now we are ready to start indexing documents. Note that the populateData method receives a Client object as a parameter. This object comes from the Elasticsearch Java API and represents a connection to the Elasticsearch database.
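import static ch.lambdaj.Lambda.convert;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.admin.indices.refresh.RefreshRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;
//....
 
private void populateData(Client client) throws IOException {
	List<File> asciidoctorFiles = new ArrayList<File>() {{
		add(new File("target/test-classes/java_release.adoc"));
		add(new File("target/test-classes/maven_release.adoc"));
	}};

	List<XContentBuilder> jsonDocuments = convertAsciidoctorFilesToJson(asciidoctorFiles);

	// Index each document into the "docs" index with type "asciidoctor", using its list position as the id.
	for (int i = 0; i < jsonDocuments.size(); i++) {
		client.prepareIndex("docs", "asciidoctor", Integer.toString(i))
				.setSource(jsonDocuments.get(i))
				.execute()
				.actionGet();
	}

	// Refresh the index so the documents just inserted are visible to the searches that follow.
	client.admin().indices().refresh(new RefreshRequest("docs")).actionGet();
}
 
private List<XContentBuilder> convertAsciidoctorFilesToJson(List<File> asciidoctorFiles) {
	return convert(asciidoctorFiles, new AsciidoctorFileJsonConverter());
}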

It is important to note that the first part of the algorithm converts all our AsciiDoc files (in our case two) to XContentBuilder instances by using the previous converter class and the convert method of the Lambdaj project.
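
Conceptually, this conversion step applies the converter to each file and collects the results in a list. The plain-Java sketch below illustrates that idea only; it is not Lambdaj's implementation, and SimpleConverter and ConvertSketch are hypothetical names that merely mirror the shape of the Converter interface used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that mirrors the shape of Lambdaj's Converter interface.
interface SimpleConverter<F, T> {
    T convert(F from);
}

// Illustration only; not Lambdaj code.
final class ConvertSketch {

    private ConvertSketch() {}

    // Applies the converter to each element, in order, and collects the results.
    static <F, T> List<T> convertAll(List<F> items, SimpleConverter<F, T> converter) {
        List<T> result = new ArrayList<T>(items.size());
        for (F item : items) {
            result.add(converter.convert(item));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("java_release.adoc", "maven_release.adoc");
        // Convert each file name to its length, just to show the mapping.
        List<Integer> lengths = convertAll(names, new SimpleConverter<String, Integer>() {
            public Integer convert(String from) {
                return from.length();
            }
        });
        System.out.println(lengths); // prints [17, 18]
    }
}
```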

If you want, you can take a look at both documents used in this example at https://github.com/asciidoctor/asciidoctor.github.com/blob/develop/news/asciidoctor-java-integration-0-1-3-released.adoc and https://github.com/asciidoctor/asciidoctor.github.com/blob/develop/news/asciidoctor-maven-plugin-0-1-2-released.adoc.

The next part inserts the documents into one index. This is done by using the prepareIndex method, which requires an index name (docs), an index type (asciidoctor), and the id of the document being inserted. Then we call the setSource method, which transforms the XContentBuilder object to JSON, and finally, by calling execute().actionGet(), the data is sent to the database.

The final step refreshes the index by calling the refresh method, so that the documents just inserted are visible to the searches that follow. It is only required because we are using an embedded instance of Elasticsearch and querying it right away; in production this part should not be required.

From that point on we can start querying Elasticsearch to retrieve information from our AsciiDoc documents.

Let's start with a very simple example, which returns all the documents inserted:

SearchResponse response = client.prepareSearch().execute().actionGet();

Next we are going to search for all documents that have been written by Alex Soto, which in our case is one.

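import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import org.elasticsearch.index.query.QueryBuilder;
//....
// Search by the author name.
QueryBuilder matchQuery =  matchQuery("author", "Alex Soto");
 
// Alternative to the query above (use one or the other): a longer form of the name.
QueryBuilder matchQuery =  matchQuery("author", "Alexander Soto");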
Note that I am searching the author field for the string Alex Soto, which returns only one document; the other document was written by Jason. Interestingly, searching for Alexander Soto returns the same document. This is not because Elasticsearch knows that Alex and Alexander are similar names: by default a match query analyzes the text into separate terms and returns the documents that contain any of them, so the document still matches on the term soto.
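
One way to see why both queries return the same document is to model a match query with its default OR operator: the query text is split into terms, and a field matches when it shares at least one of them. The sketch below is a toy model of that behavior, not Elasticsearch code; MatchAnySketch is a hypothetical name, and the lowercase-and-split step is only a rough stand-in for a real analyzer.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of a match query with the default OR operator; not Elasticsearch code.
final class MatchAnySketch {

    private MatchAnySketch() {}

    // Rough stand-in for an analyzer: lowercase, then split on anything
    // that is not a letter or a digit.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // A field matches when it shares at least one term with the query text.
    static boolean matches(String fieldValue, String queryText) {
        Set<String> shared = terms(fieldValue);
        shared.retainAll(terms(queryText));
        return !shared.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(matches("Alex Soto", "Alexander Soto"));    // prints true (shared term: soto)
        System.out.println(matches("Jason Porter", "Alexander Soto")); // prints false
    }
}
```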


Let's try more queries. How about finding documents written by someone who is called Alex, but not Soto?

import static org.elasticsearch.index.query.QueryBuilders.fieldQuery;
 
//....
 
QueryBuilder matchQuery =  fieldQuery("author", "+Alex -Soto");
And of course no results are returned in this case. Note that here we are using a field query instead of a match query, and we use the + and - symbols to include and exclude words.
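
The effect of the + and - prefixes can be modeled in a few lines: a term prefixed with + must be present in the field, and a term prefixed with - must be absent. The sketch below is a toy model under those assumptions, not the query parser Elasticsearch uses; RequiredProhibitedSketch is a hypothetical name, "Alex Smith" is an invented author used only for the example, and terms without a prefix are simply ignored here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of required (+) and prohibited (-) terms; not Elasticsearch code.
final class RequiredProhibitedSketch {

    private RequiredProhibitedSketch() {}

    // Rough stand-in for an analyzer: lowercase and split on whitespace.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<String>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Returns true when every +term is present and no -term is present.
    // Clauses without a prefix are ignored in this simplified model.
    static boolean matches(String fieldValue, String query) {
        Set<String> fieldTerms = terms(fieldValue);
        for (String clause : query.trim().split("\\s+")) {
            if (clause.startsWith("+")) {
                String required = clause.substring(1).toLowerCase(Locale.ROOT);
                if (!fieldTerms.contains(required)) {
                    return false;
                }
            } else if (clause.startsWith("-")) {
                String prohibited = clause.substring(1).toLowerCase(Locale.ROOT);
                if (fieldTerms.contains(prohibited)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("Alex Soto", "+Alex -Soto"));    // prints false
        System.out.println(matches("Alex Smith", "+Alex -Soto"));   // prints true
        System.out.println(matches("Jason Porter", "+Alex -Soto")); // prints false
    }
}
```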


You can also find all documents that contain the word released in the title.

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
 
//....
 
QueryBuilder matchQuery =  matchQuery("title", "released");

And finally, let's find all documents that talk about the 0.1.2 release. In this case only one document talks about it; the other one talks about 0.1.3.

QueryBuilder matchQuery =  matchQuery("content", "0.1.2");

Now we only have to send the query to the Elasticsearch database, which is done by using the prepareSearch method.

SearchResponse response = client.prepareSearch("docs")
			  .setTypes("asciidoctor")
			  .setQuery(matchQuery)
			  .execute()
			  .actionGet();
		
SearchHits hits = response.getHits();
	
for (SearchHit searchHit : hits) {
      System.out.println(searchHit.getSource().get("content"));
}
Note that in this case we are printing the AsciiDoc content to the console, but you could use the asciidoctor.render(String content, Options options) method to render the content into the required format.

So in this post we have seen how to index documents using Elasticsearch, how to get some important information from AsciiDoc files using the Asciidoctor-java-integration project, and finally how to execute some queries against the inserted documents. Of course there are more kinds of queries in Elasticsearch, but the intent of this post wasn't to explore all the possibilities of Elasticsearch.
Also, as a corollary, note how important it is to use the AsciiDoc format for writing your documents. Without much effort you can build a search engine for your documentation. On the other hand, imagine all the code that would be required to implement the same thing using a proprietary binary format like Microsoft Word. So we have shown another reason to use AsciiDoc instead of other formats.



Published at DZone with permission of Alex Soto, DZone MVB. See the original article here.
