Build a search engine with strus

By Patrick Frey · Jun. 23, 15 · News

What is strus?

The strus project is a collection of C++ libraries and tools for building a competitive search engine. It is currently a one-person project that started in September 2014, so its competitiveness in terms of features is more a promise than a fact. It definitely needs more minds working on it to catch up with the big players among open source search engines, Lucene and Xapian. But strus is not just a me-too project for search:

  • Strus introduces expression matching and information extraction on a different level than other known open source engines (read more…).
  • Strus simplifies the architecture of a search engine by “outsourcing” components such as the key/value store database that stores the data blocks. This componentization (see components of strus) drastically reduces the amount of code and creates opportunities for experts on a specific topic to contribute (read more…). Strus is not the first attempt at this, but it is the first open source project to try it with performance within reach of the big open source search engines. And it does so without ten years of optimization history behind it. Strus might not be at eye level yet, but let’s see what happens when more diverse reasoning and competition go into it.

Who is strus for?

The people I would primarily like to address with this blog are developers and hackers, as potential contributors or as sources of feedback. The project could also already be interesting for experimental projects that can afford to go along with the development of strus. As a stakeholder you can influence the project too. As the demo project, a search on the complete English Wikipedia collection, shows, it is already possible to build projects with strus. But you have to be aware that deadlines should not exist, because you might hit a point where a feature you need is not immediately available. Project planning is difficult at the current stage. Furthermore, the documentation is still quite poor.

Programming paradigms

All interfaces of strus are pure. No inheritance is used in the main header files. Strus is more a set of Lego bricks than a provider of solution classes. If, for example, you want to build a sequence of terms as a feature for your search, you have to build its expression tree with the help of a stack rather than pick a class that implements a sequence query. In PHP this looks as follows:

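$terms = [ "hello", "world" ];
// Push both terms onto the stack as leaves of the expression tree
$query->pushTerm( "word" /*feature type*/, $terms[0] );
$query->pushTerm( "word" /*feature type*/, $terms[1] );
// Replace the two topmost stack elements with a "sequence" expression
$query->pushExpression( "sequence", 2 /*nof terms*/, 2 /*position range*/ );
// Assign the expression on top of the stack to a named feature set
$query->defineFeature( "docfeat" /*name addressing this feature set*/ );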
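
The stack discipline behind these calls can be illustrated with a small toy model. The following Python sketch is not the strus API: the class ToyQuery, the Node type, and all method names are hypothetical and only mirror the PHP calls above. It assumes that an expression operator pops its operands from the stack and pushes the combined node back.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the expression tree: a term leaf or an operator with children."""
    op: str
    args: tuple = ()
    children: list = field(default_factory=list)


class ToyQuery:
    """Hypothetical stand-in for a stack-based query builder (not the strus API)."""

    def __init__(self):
        self.stack = []      # partially built expressions
        self.features = {}   # feature set name -> list of expression trees

    def push_term(self, feature_type, value):
        self.stack.append(Node("term", (feature_type, value)))

    def push_expression(self, op, nof_args, position_range):
        if nof_args > len(self.stack):
            raise ValueError("not enough operands on the stack")
        # Pop the top nof_args operands, keeping their original order.
        first = len(self.stack) - nof_args
        children = self.stack[first:]
        del self.stack[first:]
        self.stack.append(Node(op, (position_range,), children))

    def define_feature(self, name):
        # The finished expression leaves the stack and joins the named set.
        self.features.setdefault(name, []).append(self.stack.pop())


query = ToyQuery()
query.push_term("word", "hello")
query.push_term("word", "world")
query.push_expression("sequence", 2, 2)
query.define_feature("docfeat")

tree = query.features["docfeat"][0]
print(tree.op, [child.args[1] for child in tree.children])
# prints: sequence ['hello', 'world']
```

Because every operator only looks at the top of the stack, nested expressions need no additional classes: you push the inner expressions first and then combine them.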

The number of interface classes is small (see for example the interface classes of the core), but you have to understand them. If you want to contribute, you should also have a closer look at the programming guidelines.

Try it

There is a guide on how to fetch, build, and install strus. Unfortunately, a tutorial is still missing. There will be one soon!

Support

I will reply to questions. Please email me at contact at project dash strus dot net.


Thanks

I want to thank the authors of LevelDB here. For some time I was looking for a key/value store database with an upper bound seek function in its interface. The upper bound seek is crucial because it lets you minimize block accesses on disk when joining sets. A key/value store without upper bound seek would have forced me to create virtual blocks that point to other blocks, which would mean more disk accesses to fetch the data blocks needed. LevelDB has it. Any alternative candidate for implementing the database interface has to have it too.
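
To see why such a seek helps when joining sets, consider intersecting a short list of document numbers with a very long one. The following Python sketch is only an illustration under stated assumptions, not how strus or LevelDB is implemented: the sorted in-memory lists stand in for data blocks on disk, the class SortedList and the function intersect are hypothetical, and seek is assumed to return the first element greater than or equal to the target, which is how LevelDB's iterator Seek positions itself. Document numbers are assumed to be positive integers.

```python
from bisect import bisect_left


class SortedList:
    """In-memory stand-in for a sorted posting list kept in a key/value store."""

    def __init__(self, docnos):
        self.docnos = sorted(docnos)
        self.seeks = 0  # number of seek calls, a rough proxy for block accesses

    def seek(self, target):
        """Return the first document number >= target, or None if there is none."""
        self.seeks += 1
        i = bisect_left(self.docnos, target)
        return self.docnos[i] if i < len(self.docnos) else None


def intersect(a, b):
    """Join two posting lists by leapfrogging with seek instead of scanning."""
    result = []
    candidate = a.seek(0)
    while candidate is not None:
        other = b.seek(candidate)
        if other is None:
            break
        if other == candidate:
            result.append(candidate)
            candidate = a.seek(candidate + 1)
        else:
            # b has no such element; let a jump directly to b's position.
            candidate = a.seek(other)
    return result


rare = SortedList([7, 5000, 90000])
common = SortedList(range(1, 100001))
matches = intersect(rare, common)
print(matches, rare.seeks + common.seeks)
# prints: [7, 5000, 90000] 7
```

In this toy setting the join finishes after seven seeks, although one of the lists has 100,000 entries. A store that could only iterate step by step would have to touch far more of the long list to reach the same result.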


Social Media

Github: patrickfrey
Twitter: @ProjectStrus

