Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Base64 Encoding: A Visual Explanation

DZone's Guide to

Base64 Encoding: A Visual Explanation

A software engineer provides an introduction to the concept of Base64 encoding, giving some visual aids and sample code to help.

· Web Dev Zone
Free Resource

Add user login and MFA to your next project in minutes. Create a free Okta developer account, drop in one of our SDKs to your application and get back to building.

Base64 encoding appears here and there in web development. Perhaps its most familiar usage is in HTML image tags when we inline our image data (more on this later):

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAALCAYAAABCm8wlAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4QoPAxIb88htFgAAABl0RVh0Q29tbWVudABDcmVhdGVkIHdpdGggR0lNUFeBDhcAAACxSURBVBjTdY6xasJgGEXP/RvoonvAd8hDyD84+BZBEMSxL9GtQ8Fis7i6BkGI4DP4CA4dnQON3g6WNjb2wLd8nAsHWsR3D7JXt18kALFwz2dGmPVhJt0IcenUDVsgu91eCRZ9IOMfAnBvSCz8I3QYL0yV6zfyL+VUxKWfMJuOEFd+dE3pC1Finwj0HfGBeKGmblcFTIN4U2C4m+hZAaTrASSGox6YV7k+ARAp4gIIOH0BmuY1E5TjCIUAAAAASUVORK5CYII=">

As a programmer, it is easy to accept this random-looking ASCII string as the “Base64 encoded” abstraction and move on. To go from raw bytes to the Base64 encoding, however, is a straightforward process, and this post illustrates how we get there. We’ll also discuss some of the why behind Base64 encoding and a couple places you may see it.

A Visualization

The gist of the encoding process is captured in the interactive CodePen linked via the image below (or here). Type in some ASCII characters in the top input and hit the “Encode” button.

Image title

If you run a few strings through this visualization, you may notice that the encoding process is simply a pair of nested loops. The outer loop iterates over the data in 24-bit increments; the spec refers to these as “input groups.” The inner loop iterates over each input group 6 bits at a time. Each 6-bit value is interpreted as an unsigned integer that is used to index an alphabet of 64 characters. The indexed alphabet value is the output. With the help of ES6 generators, this encoding process can be implemented with just a handful of functions:

/**
 * @param {Uint8Array} bytes
 * @return {string} Base64 encoded string
 */
function base64Encode(bytes) {
   let encoding = '';
   for (let group of groups24Bits(bytes)) {
      for (let value of values6Bits(group)) {
         if (value !== undefined) {
            encoding += ALPHABET[value];
         } else {
            encoding += PAD;
         }
      }
   }
   return encoding;
}

const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
const PAD = '=';

/**
 * @param {Uint8Array} bytes
 * @return {Uint8Array} The next input group (yielded on each execution)
 */
function* groups24Bits(bytes) {
   for (let i = 0; i < bytes.length; i += 3) {
      yield bytes.slice(i, i + 3); // 3 bytes/3 octets/24 bits
   }
}

/**
 * @param {Uint8Array} group Expected to be array of 1 to 3 bytes
 * @return {number|undefined} The next 6-bit value from the 
 * input group (yielded on each execution)
 */
function* values6Bits(group) {
   const paddedGroup = Uint8Array.from([0, 0, 0]);
   paddedGroup.set(group);

   let numValues = Math.ceil((group.length * 8) / 6);
   for (let i = 0; i < numValues; i++) { let base64Value; if (i == 0) { base64Value = (paddedGroup[0] & 0b11111100) >> 2;
      } else if (i == 1) {
         base64Value = (paddedGroup[0] & 0b00000011) << 4; base64Value = base64Value | ((paddedGroup[1] & 0b11110000) >> 4);
      } else if (i == 2) {
         base64Value = (paddedGroup[1] & 0b00001111) << 2; base64Value = base64Value | ((paddedGroup[2] & 0b11000000) >> 6);
      } else if (i == 3) {
         base64Value = paddedGroup[2] & 0b00111111;
      }
      yield base64Value;
   }

   let numPaddingValues = 4 - numValues;
   for (let j = 0; j < numPaddingValues; j++) {
      yield undefined;
   }
}

If there is an “interesting” part to the encoding process, it is the ending conditions where we must apply padding. Each input group is required to be 24 bits long (or equivalently three 8-bit bytes). (It seems likely the spec writers chose 24-bit input groups since 24 is the least common multiple of 6 and 8.) In the implementation given above, we pad the final group with bytes of zeroes when the final input group is only 1 or 2 bytes long. As we iterate over this final input group, if the 6-bit value consists entirely of padding bits, then = is the output character, the designated padding character. If, however, the 6-bit value straddles “real” bits and padding bits—as can be seen in the input “foob”—then the alphabet is still indexed and the padding bits are taken to be zeroes.

A Couple Usages

You will not find any mention of “HTML” in the Base64 spec. Instead, the authors simply mention that Base64 encoding is used in environments where, “perhaps for legacy reasons,” the “storage or transfer” of data is limited to ASCII characters. More or less, this idea sums up the browser and its heavy consumption of HTML, JSON, CSS, and JavaScript. Increasingly, this text is encoded using UTF-8, a superset of ASCII. In this text-heavy ecosystem, Base64 encoding finds various niche applications.

Data URLs

The first part of a URL is the scheme. It is the prefix string that goes before the first colon; for example, it is the https in https://example.com or the beginning ftp in ftp://ftp.funet.fi/pub/standards/RFC/rfc4648.txt. The scheme tells the client (a browser or a different network app) how to retrieve the resource and what protocol to follow. The scheme prefix also makes URLs extensible and suitable for future protocols. If a new protocol comes along, we can create a new URL scheme for it and still identify resources by URL.

The data scheme is one such extension, which we saw in the image encoded in the introduction. This scheme tells clients, “My resource’s data is located right here in the rest of this URL string.” URLs that use the data scheme follow this format:

data:[<mediatype>][;base64],<data>

You can find more particulars about this format in the spec, but we will focus on the image “data URLs” we mentioned at the outset. Here are a couple examples of the big three binary image formats used in a few different contexts:

.triangle-icn {
    background-image: url(data:image/gif;base64,R0lGODlhCAAHAIABAGSV7f///yH+EUNyZWF0ZWQgd2l0aCBHSU1QACH5BAEKAAEALAAAAAAIAAcAAAINjI8BkMq41onRUHljAQA7);
    background-repeat: no-repeat;
    background-position: center;
}
const image = new Image();
image.src =   "data:image/jpg;base64,/9j/4AAQSkZJRgABAQEAWQBZAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wgARCAAHAAgDAREAAhEBAxEB/8QAFAABAAAAAAAAAAAAAAAAAAAAB//EABUBAQEAAAAAAAAAAAAAAAAAAAUG/9oADAMBAAIQAxAAAAFXph//xAAWEAADAAAAAAAAAAAAAAAAAAABFRb/2gAIAQEAAQUCoS8//8QAGBEAAgMAAAAAAAAAAAAAAAAAABEUI0H/2gAIAQMBAT8Bk3PD/8QAGxEAAAcBAAAAAAAAAAAAAAAAAAESFBUxQuH/2gAIAQIBAT8Bjyap1fB//8QAGxAAAQQDAAAAAAAAAAAAAAAAEQABEhQzUWH/2gAIAQEABj8CmXrYxza//8QAGxAAAQQDAAAAAAAAAAAAAAAAAQARIUFRcZH/2gAIAQEAAT8hI7j3Vy86hf/aAAwDAQACAAMAAAAQf//EABoRAAEFAQAAAAAAAAAAAAAAAJEAESFRgaH/2gAIAQMBAT8QeLlnkL//xAAaEQACAgMAAAAAAAAAAAAAAAABIRFhAFHB/9oACAECAQE/EARV18QtS8//xAAYEAEBAAMAAAAAAAAAAAAAAAABEQAh8P/aAAgBAQABPxBQyNdodCLh/9k=" 

const canvas = document.getElementById("myCanvas");
const ctx = canvas.getContext('2d');
ctx.drawImage(image, 0, 0);
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAHCAYAAAA1WQxeAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAA3XAAAN1wFCKJt4AAAAB3RJTUUH4QoQAiIwiYqPWwAAAHNJREFUCNd1zaENQjEAhOGvgAFbAwnMwhoMUFeBYBEEVXQAGIIFGARXjXkC817SEPjdXfLfBR2ptD0WeNQcGUPPDUPNcTcVs85OWGObSjtNfUilwRxDt/SuOa7SpQmjfcbx6+5eczyEVNoGL79ZznD1n+cHk7cb99sXV8cAAAAASUVORK5CYII=">

Source Maps

Another common but less visible usage of Base64 encoding is in source maps. Below is a source map generated by Google’s Closure compiler:

{
    "version":3,
    "file":"",
    "lineCount":1,
    "mappings":"AAUIA,OAAAC,IAAA,CAAYC,CAGFC,IATAC,QAAQ,EAAW,CAE7B,IAAAF,EAAA,CAOsBG,cATO,CAMjBH,GAAZ;",
    "sources":["greet.js"],
    "names":["console","log","_greeting","greeter","Greeter","greeting"]
}

Here Base64 encoding is used for the mappings field. The comma and semicolon delimited snippets are the Base64 encoded binary data of integers encoded as variable-length quantities (VLQ).

Images and source maps are just a couple places Base64 encoding is used. If you know of others or any novel uses of Base64 encoding, please mention them in the comments below. It also might be worth “inspecting” page sources to find others. For example, in Chrome, if you go to chrome://dino  you can find that the offline dinosaur game’s image assets (and it appears sound assets) are Base64 encoded (examining these assets—which are also embedded on YouTube’s homepage—is how I discovered the dinosaur can duck under the low-flying pterodactyls).

Launch your application faster with Okta’s user management API. Register today for the free forever developer edition!

Topics:
web developement ,javascript ,base64

Published at DZone with permission of Ty Lewis, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}