LZ* compression algorithms

By Oliver Hookins · May. 26, 14 · Performance Zone · Interview

While implementing a small, naive log aggregation tool, I had a moment to consider which compression to use for the log files at rest. Beyond the compression ratio and how much space the files take up, the main implication of this choice is how the files can be used once stored. If you process them with gzip/gunzip and awk or other simple command-line tools, or from scripts written in common languages like Python or Ruby, gzip compression poses very little problem, since in most cases you are processing one file at a time.

If you want to use a distributed computation system like Hadoop, on the other hand, this can be a problem. Gzip files can't be split, so only a single mapper can work on a given file at a time. If your files are small this may not matter, but with large files it can. Other tools such as Impala will outright refuse to work on gzipped data, which may further limit your options. I started looking into alternative compression algorithms that these tools do support, and one name that kept coming up was LZO. Its Wikipedia entry doesn't offer much insight into what it actually is. Since I was implementing my aggregation tool in Go, I checked the standard library, and the only LZ-named algorithm I found there was LZW.

Are LZO and LZW the same? Are they related? Is one better than the other? Google results were of very little help either (but perhaps that is just me). In the end I implemented gzip compression in my program, little knowing that even gzip belongs to the same family as these LZ-prefixed algorithms: it uses DEFLATE, which combines LZ77 with Huffman coding.

I just started watching a new series of videos on the Google Developers YouTube channel called Compressor Head. Episode 2 covers the LZ compression family: how these algorithms work, in easy-to-understand terms, and which programs we know today descend from the fundamental algorithms LZ77 and LZ78. I highly recommend watching them!
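To give a rough feel for the LZ77 idea before watching, here is a toy sketch in Go. It is not taken from the videos and is not how gzip, LZO, or any production compressor is implemented: it does a brute-force search over a sliding window and emits (offset, length, next byte) tokens. The `token` type, the function names, and the window size are all made up for illustration.

```go
package main

import "fmt"

// token is a hypothetical LZ77 output unit: copy Length bytes starting
// Offset bytes back in the output, then append the literal Next.
type token struct {
	Offset int
	Length int
	Next   byte
}

// lz77Encode finds, for each position, the longest match in the
// preceding window bytes (brute force) and emits one token per step.
func lz77Encode(data []byte, window int) []token {
	var out []token
	for i := 0; i < len(data); {
		start := i - window
		if start < 0 {
			start = 0
		}
		// Leave at least one byte to serve as the literal Next.
		maxLen := len(data) - i - 1
		bestLen, bestOff := 0, 0
		for j := start; j < i; j++ {
			l := 0
			for l < maxLen && data[j+l] == data[i+l] {
				l++
			}
			if l > bestLen {
				bestLen, bestOff = l, i-j
			}
		}
		out = append(out, token{Offset: bestOff, Length: bestLen, Next: data[i+bestLen]})
		i += bestLen + 1
	}
	return out
}

// lz77Decode rebuilds the original bytes from the token stream.
func lz77Decode(tokens []token) []byte {
	var out []byte
	for _, t := range tokens {
		start := len(out) - t.Offset
		// Copy byte by byte so matches that overlap the output being
		// written (Length > Offset) are handled correctly.
		for k := 0; k < t.Length; k++ {
			out = append(out, out[start+k])
		}
		out = append(out, t.Next)
	}
	return out
}

func main() {
	input := []byte("abracadabra abracadabra abracadabra")
	tokens := lz77Encode(input, 32)
	decoded := lz77Decode(tokens)
	fmt.Printf("bytes=%d tokens=%d roundtrip_ok=%v\n",
		len(input), len(tokens), string(decoded) == string(input))
}
```

Real members of the family differ mainly in how they find matches quickly and how they encode the tokens; DEFLATE, for example, Huffman-codes them.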

Published at DZone with permission of Oliver Hookins, DZone MVB. See the original article here.
