LZOP Decompression - Revenge of the Useless Cat

By Alex Holmes · Feb. 11, 2013


For me, LZOP is the go-to compression codec when working with large text files in HDFS, because its splittable compression preserves MapReduce data locality. As a result, when I want to peek at LZOP-compressed files in HDFS I use a command such as:

shell$ hadoop fs -cat /some/file.lzo | lzop -dc | head

With this command the contents of an LZOP-compressed file in HDFS are piped to the lzop utility, where the -dc flags tell lzop to decompress the stream and write the uncompressed data to standard out, and the final head will show the first 10 lines of the data. I may substitute head with other utilities such as awk or sed, but I always follow this general pattern of piping the lzop output to another utility.
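
The same peek-without-extracting pattern can be sketched locally with gzip (which is more widely installed than lzop); the /tmp/peek-demo.gz file here is a stand-in for the HDFS file above:

```shell
# Create a small compressed file, then peek at its first lines
# without ever writing the uncompressed data to disk.
printf 'line1\nline2\nline3\n' | gzip -c > /tmp/peek-demo.gz
gzip -dc /tmp/peek-demo.gz | head -2
```

Swapping head for sed -n '1,5p' or an awk filter works the same way, since everything downstream of the decompressor just reads a plain text stream.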

Imagine my surprise the other day when I tried the same command on a smaller file (hence not needing to use the head command), only to see this error:

shell$ hadoop fs -cat /some/file.lzo | lzop -dc
lzop: <stdout>: uncompressed data not written to a terminal

What just happened? Why did the first command work, but not the second? My guess is that the authors of the lzop utility are safeguarding us against accidentally flooding the terminal with uncompressed data: in the first command lzop's output went into a pipe, whereas here it goes straight to the terminal. This is frustrating, because as you can see from the following example, the authors of gunzip took a different route:

shell$ echo "the cat" | gzip -c | gunzip -c
the cat
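
lzop's refusal is presumably based on the isatty(3) check, which tells a program whether a file descriptor is attached to a terminal; the shell exposes the same test as [ -t 1 ]. A minimal sketch:

```shell
# [ -t 1 ] is the shell's isatty() test: true only when
# stdout (file descriptor 1) is attached to a terminal.
check_stdout() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# Inside a pipeline stdout is a pipe, not a terminal:
check_stdout | cat
# prints: not a terminal
```

This is why piping to head in the first command sidestepped the safeguard: lzop's stdout was a pipe, so the terminal check never fired.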

If we run the same command with lzop we see the same error as we saw earlier:

shell$ echo "the cat" | lzop -c | lzop -dc
lzop: <stdout>: uncompressed data not written to a terminal

A crude way to solve this problem is to pipe the lzop output to cat (a necessary violation of the useless cat pattern):

shell$ hadoop fs -cat /some/file.lzo | lzop -dc | cat

Luckily, lzop has a -f option that removes the need for the cat:

shell$ hadoop fs -cat /some/file.lzo | lzop -dcf
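
As an aside, gzip applies the same safeguard in the opposite direction: it refuses to write compressed data to a terminal unless forced with -f. The round trip below (shown with gzip rather than lzop, since gzip is more commonly installed) passes -f on both sides, which is harmless when the destination is already a pipe or a file:

```shell
# -c writes to stdout; -f forces output even to a terminal.
echo "the cat" | gzip -cf | gunzip -cf
# prints: the cat
```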

It turns out that the lzop man page is instructive with regard to the -f option; it lists various scenarios where the flag can be helpful:

shell$ man lzop
...
-f, --force
   Force lzop to

    - overwrite existing files
    - (de-)compress from stdin even if it seems a terminal
    - (de-)compress to stdout even if it seems a terminal
    - allow option -c in combination with -U

   Using -f two or more times forces things like

    - compress files that already have a .lzo suffix
    - try to decompress files that do not have a valid suffix
    - try to handle compressed files with unknown header flags

   Use with care.


Published at DZone with permission of Alex Holmes, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
