Lexicographically Sorting Large Files in Linux

When I hear the word “sort” my first thought is usually “Hadoop”! Yes, sorting is one thing that Hadoop does well, but if you’re working with large files in Linux the built-in sort command is often all you need.

Let’s say you have a large file on a host with 2GB or more of free main memory. The following sort command is an efficient way to lexicographically order large files:

LC_COLLATE=C sort --buffer-size=1G --temporary-directory=./tmp --unique bigfile.txt
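
A minimal end-to-end sketch, assuming GNU coreutils sort and a POSIX shell. Only the sort invocation comes from the command above. The contents of bigfile.txt and the output file name sorted.txt are hypothetical, so run it in a scratch directory. Two practical details are handled explicitly: LC_ALL takes precedence over LC_COLLATE when it is set, and sort does not create the --temporary-directory for you.

```shell
#!/bin/sh
# Sketch: run the command above on a tiny sample file (assumes GNU coreutils sort).
set -eu

# LC_ALL overrides LC_COLLATE when set, so clear it for this demo.
unset LC_ALL

# sort does not create the --temporary-directory; it must already exist
# by the time sort needs to write temporary files.
mkdir -p ./tmp

# Hypothetical input: mixed case plus a duplicate "banana".
printf '%s\n' banana Apple cherry apple banana > bigfile.txt

LC_COLLATE=C sort --buffer-size=1G --temporary-directory=./tmp --unique bigfile.txt > sorted.txt

cat sorted.txt
# Apple
# apple
# banana
# cherry
```

With C collation, lines are compared byte by byte, so every uppercase ASCII letter sorts before every lowercase one. This is why "Apple" precedes "apple", and --unique drops the second "banana".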

Let’s break this command down and examine each part in detail.
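
The sketch below annotates each part of the command, assuming GNU coreutils sort. The sample data and the output file names (long.txt, short.txt, dups.txt) are hypothetical. It also shows the equivalent short options (-S, -T, -u) and what changes when --unique is left off. When the input does not fit in the buffer, GNU sort writes sorted runs to the temporary directory and merges them afterwards. This is why --buffer-size and --temporary-directory matter for large files.

```shell
#!/bin/sh
# Annotated breakdown of the command (assumes GNU coreutils sort; sample data is made up).
set -eu

unset LC_ALL     # LC_ALL, if set, would override LC_COLLATE
mkdir -p ./tmp   # sort will not create the temporary directory itself

printf '%s\n' b a B a > bigfile.txt

# LC_COLLATE=C           compare lines byte by byte instead of by locale rules
# --buffer-size=1G       use up to 1 GiB of RAM before spilling sorted runs to disk
# --temporary-directory  where those spilled runs are written before being merged
# --unique               output only the first of each run of equal lines
LC_COLLATE=C sort --buffer-size=1G --temporary-directory=./tmp --unique bigfile.txt > long.txt

# The same command using GNU sort's short option names.
LC_COLLATE=C sort -S 1G -T ./tmp -u bigfile.txt > short.txt

# Without --unique, the duplicate "a" is kept.
LC_COLLATE=C sort -S 1G -T ./tmp bigfile.txt > dups.txt

# long.txt and short.txt: B a b    dups.txt: B a a b
```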

Published at DZone with permission of Alex Holmes, DZone MVB.
