DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
The Latest "Software Integration: The Intersection of APIs, Microservices, and Cloud-Based Systems" Trend Report
Get the report
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Bonus Algorithm: Run-Length Encoding

Bonus Algorithm: Run-Length Encoding

Stoimen Popov user avatar by
Stoimen Popov
·
May. 09, 12 · Interview
Like (0)
Save
Tweet
Share
7.03K Views

Join the DZone community and get the full member experience.

Join For Free

Introduction

Run-length encoding is a data compression algorithm that helps us encode large runs of repeating items by only sending one item from the run and a counter showing how many times this item is repeated. Unfortunately this technique is useless when trying to compress natural language texts, because they don’t have long runs of repeating elements. In the other hand RLE is useful when it comes to image compression, because images happen to have long runs pixels with identical color.

As you can see on the following picture we can compress consecutive pixels by only replacing each run with one pixel from it and a counter showing how many items it contains.

Lossless RLE for Images

Although lossless RLE can be quite effective for image compression, it is still not the best approach!

In this case we can save only counters for pixels that are repeated more than once. Such the input stream “aaaabbaba” will be compressed as “[4]a[2]baba”.

Actually there are several ways run-length encoding can be used for image compression. A possible way of compressing a picture can be either row by row or column by column, as it is shown on the picture below.

Row by row or column by column compression

Row by row or column by column compression.

The problem in practice is that sometimes compressing row by row may be effective, while in other cases the same approach is very ineffective. This is illustrated by the image below.

Effective and Ineffective Compression

Sometimes image compression may be done only after some preprocessing that can help us understand the best compression approach!

Obviously run-length encoding is a very good approach when compressing images, however when we talk about big images with millions of pixels it’s somehow natural to come with some lossy compression.

Overview

Lossy RLE is a very suitable algorithm when it comes to images, because in most of the cases large images do appear to have big spaces of identical pixel colors, i.e. when the half of the picture is the blue sky. By using lossy compression we can skip very short runs.

First we’ve to say how long will be the shortest run that we will keep in the compression. For instance if 3 is the shortest run, then runs of 2 consecutive elements will be skipped.

Lossless Pixel Row

Lossless compression of a pixel row in some cases can be very inefective!

Of course if we set the shortest run to be only one element long, this will make our compression completely lossless, which isn’t very effective. However when we talk about millions of pixels even runs of three or more elements are very short, so it’s up to the developer to decide how long will be the shortest run.

Some Examples

Let’s first define the shortest run that we will keep untouched to be at least three element long.

Lossy Pixel Row

We can lose some information that is invisbile to the eye.

The above image is compressed more effectively than the lossless pixel row from the previous picture.

The thing is how to merge short runs. For instance the following three runs have to be blended into one color run.

Blending Short Runs

We must chose how to blend short runs!

We can choose the middle color (option #1) or not, but this will always depend on the picture and it will be effective in some cases and ineffective in other.

Implementation

Implementing run-length encoding is easy in general. Here’s a simple PHP code that shows a lossy RLE.

/**
 * Compresses an input list of objects by losing some data using
 * run-length encoding
 * 
 * @param mixed $objectList
 * @param int $minLength
 */
function lossyRLE($objectList, $minLength)
{
	$len 		= is_string($objectList) 
			? strlen($objectList)		// string as an input stream
			: count($objectList);		// array as an input stream
	$j 		= 1;
	$compressed = array();				// compressed output
 
	for ($i = 0; $i < $len; $i++) {
		if (isset($objectList[$i+1]) && $objectList[$i] == $objectList[$i+1]) {
			$j++;
		} else {
			$l = count($compressed);
			// This is where RLE is converted to a lossy algorithm!
			// In case the run is shorter than a predefined length the
			// algorithm will skip these elements and will stretch the last
			// saved run.
			// NOTE: this logic can be changed in order to take other
			// decisions depending on the goals.
			if ($j < $minLength && $j < $l) {
				$compressed[$l-1]['count'] += $j;
			} else {
				$compressed[] = array('count' => $j, $objectList[$i]);
			}
			$j = 1;
		}
	}
 
	return $compressed;
}
 
$input = 'aaaabbaabbbbba';
 
// aaaaaaaabbbbbb
lossyRLE($input, 3);

The code above can be modified in order to work with more complex data. Let’s say we have a “pixel” abstraction as on the example above.

/**
 * Pixel abstraction
 */
class Pixel
{
	private $_color = null;
 
	public function __construct($color = '')
	{
		$this->_color = $color;
	}
 
	public function getColor() 
	{
		return $this->_color;
	}
}
 
/**
 * Inits the pixels array
 * 
 * @param array $pixels
 */
function init(array &$pixels = array())
{
	$colors = array('red', 'green', 'blue');
 
	for ($i = 0; $i < 100; $i++) {
		$pixels[] = new Pixel($colors[mt_rand(0, 2)]);
	}
}
 
/**
 * Compresses an input list of objects by losing some data using
 * run-length encoding
 * 
 * @param mixed $objectList
 * @param int $minLength
 */
function lossyRLE($objectList, $minLength)
{
	$len 		= is_string($objectList) 
			? strlen($objectList)		// string as an input stream
			: count($objectList);		// array as an input stream
	$j 		= 1;
	$compressed = array();				// compressed output
 
	for ($i = 0; $i < $len; $i++) {
		if (isset($objectList[$i+1]) && $objectList[$i] == $objectList[$i+1]) {
			$j++;
		} else {
			$l = count($compressed);
			// This is where RLE is converted to a lossy algorithm!
			// In case the run is shorter than a predefined length the
			// algorithm will skip these elements and will stretch the last
			// saved run.
			// NOTE: this logic can be changed in order to take other
			// decisions depending on the goals.
			if ($j < $minLength && $l > $j) {
				$compressed[$l-1]['count'] += $j;
			} else {
				$compressed[] = array('count' => $j, $objectList[$i]);
			}
			$j = 1;
		}
	}
 
	return $compressed;
}
 
$pixels = array();
 
// initializes the pixels array
init($pixels);
 
$compressed = lossyRLE($pixels, 3);
 
print_r($compressed);

Complexity

In general lossless RLE compelxity is linear – O(n) where n is the number of items from the input stream. Even with the small modification above the complexity remains linear. However we can modify the compression in a slightly different manner (in order to get the middle value from consecutive short runs). This will somehow affect the complexity of the algorithm, of course.

Application

Run-length encoding isn’t a very effective option when compressing texts, but for images where long runs of the identical pixels happen to occur it is quite useful.

Lossy RLE in Practice

 

Nevertheless RLE is easy to convert into a lossy algorithm, that makes it very suitable for image compression.

Related posts:

  1. Computer Algorithms: Data Compression with Run-length Encoding
  2. Computer Algorithms: Data Compression with Bitmaps
  3. Computer Algorithms: Data Compression with Diagram Encoding and Pattern Substitution
Algorithm

Published at DZone with permission of Stoimen Popov, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Spring Boot, Quarkus, or Micronaut?
  • HTTP vs Messaging for Microservices Communications
  • Fargate vs. Lambda: The Battle of the Future
  • Full Lifecycle API Management Is Dead

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: