How to retrieve/extract metadata information from audio files using Java and Apache Tika API?

By Singaram Subramanian · Oct. 20, 2011

I guess I'm writing this post after a long time. This time, I'm writing about the Apache Tika API, which a friend of mine and I tried out to extract metadata from the audio formats it supports: .mp3, .aiff, .au, .midi, and .wav.

To make it clear, here's a screenshot of the information Windows Vista shows about an audio file:

We wanted to extract this information using Java, and after some googling we found that Apache Tika would help. We needed this metadata to index audio files so that they are searchable in a search application we're building with Apache Lucene.
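To make "index audio files so they're searchable" a bit more concrete, here's a minimal, JDK-only sketch of the idea behind an inverted index: every word of every metadata value points back to the files that contain it. This is not Lucene's API (in Lucene you would add the values as fields of a `Document` and let an `IndexWriter` build the index); `MetadataIndex` and its methods are hypothetical names for illustration, and the whitespace tokenizing is deliberately naive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, JDK-only sketch of an inverted index over audio metadata.
 * Each lower-cased word of each metadata value maps to the files containing it.
 * This is an illustration of the idea, NOT Lucene's API.
 */
final class MetadataIndex {
    // term -> sorted set of file paths whose metadata contains that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes every whitespace-separated word of every non-null metadata value. */
    void add(String filePath, Map<String, String> metadata) {
        for (String value : metadata.values()) {
            if (value == null) {
                continue; // e.g. a tag the parser found but left empty
            }
            for (String term : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(filePath);
                }
            }
        }
    }

    /** Returns the files whose metadata contains the given single word (case-insensitive). */
    Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        MetadataIndex index = new MetadataIndex();
        Map<String, String> song = new HashMap<>();
        song.put("title", "show me the meaning of being lonely");
        song.put("xmpDM:artist", "backstreet boys");
        index.add("pop/lonely.mp3", song);
        System.out.println(index.search("Backstreet")); // prints [pop/lonely.mp3]
    }
}
```

Searching for "Backstreet" finds the file because both the indexed values and the query are lower-cased; a real analyzer would also deal with punctuation, stemming, and so on.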

Here's a sample Java program that extracts metadata from an MP3 file:

package singz.samples.search.audio.metadata;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.mp3.Mp3Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * @author Singaram Subramanian
 * Extract metadata of an audio file using the Apache Tika API
 */
public class AudioMetadataExtractorDemo {

    public static void main(String[] args) {

        // Tika's Mp3Parser reads the file's ID3 tags and reports them under
        // property names from the XMP (Extensible Metadata Platform) Dynamic
        // Media schema, hence the "xmpDM:" prefix. XMP is a standard created
        // by Adobe Systems Inc.; it standardizes the definition, creation, and
        // processing of extensible metadata.

        String audioFileLoc = "c:\\pop\\backstreetboys_showmethemeaningofbeinglonely.mp3";

        // try-with-resources closes the stream even if parsing fails
        try (InputStream input = new FileInputStream(new File(audioFileLoc))) {

            // DefaultHandler discards the body content; we only need the metadata
            ContentHandler handler = new DefaultHandler();
            Metadata metadata = new Metadata();
            Parser parser = new Mp3Parser();
            ParseContext parseCtx = new ParseContext();
            parser.parse(input, handler, metadata, parseCtx);

            // List all metadata
            String[] metadataNames = metadata.names();

            for (String name : metadataNames) {
                System.out.println(name + ": " + metadata.get(name));
            }

            // Retrieve the necessary info from metadata.
            // Names (title, xmpDM:artist, etc.) used below may differ based
            // on the standard used for processing and storing standardized and/or
            // proprietary information relating to the contents of a file.
            // Note that the keys are case-sensitive.

            System.out.println("title: " + metadata.get("title"));
            System.out.println("artists: " + metadata.get("xmpDM:artist"));
            System.out.println("genre: " + metadata.get("xmpDM:genre"));

        } catch (IOException | SAXException | TikaException e) {
            // IOException also covers FileNotFoundException
            e.printStackTrace();
        }
    }
}
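Since the key names depend on the metadata standard (the sample output reports the artist twice, as `Author` and as `xmpDM:artist`), one way to cope is to try several candidate keys in order. The helper below is a hypothetical sketch, not part of Tika: it accepts any `String -> String` lookup, so with Tika you could pass `metadata::get`, while here a plain `Map` stands in for Tika's `Metadata` so the example runs with only the JDK. The candidate list, including `dc:creator`, is an assumption you would tune for your own files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Tika API): returns the first non-blank value
 * found among several candidate metadata keys, or null if none has a value.
 */
final class MetadataKeys {
    private MetadataKeys() {}

    static String firstPresent(Function<String, String> lookup, String... candidateKeys) {
        for (String key : candidateKeys) {
            String value = lookup.apply(key);
            if (value != null && !value.trim().isEmpty()) {
                return value;
            }
        }
        return null; // none of the candidate keys had a value
    }

    public static void main(String[] args) {
        // A Map standing in for Tika's Metadata; only "Author" is present here
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Author", "backstreet boys");
        // "xmpDM:artist" and "dc:creator" are missing, so the lookup falls through to "Author"
        System.out.println(firstPresent(metadata::get, "xmpDM:artist", "dc:creator", "Author"));
        // prints backstreet boys
    }
}
```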

Maven pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>singz.samples.search.audio</groupId>
	<artifactId>audiometadataextractor</artifactId>
	<version>0.0.1</version>
	<packaging>jar</packaging>

	<name>audiometadataextractor</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.apache.tika</groupId>
			<artifactId>tika-core</artifactId>
			<version>0.10</version>
		</dependency>

		<dependency>
			<groupId>org.apache.tika</groupId>
			<artifactId>tika-parsers</artifactId>
			<version>0.10</version>
		</dependency>
	</dependencies>
</project>

Output

xmpDM:releaseDate: 2001
xmpDM:audioChannelType: stereo
xmpDM:album: top 100 pop
Author: backstreet boys
xmpDM:artist: backstreet boys
channels: 2
xmpDM:audioSampleRate: 44100
xmpDM:logComment: eng
xmpDM:trackNumber: 04
version: mpeg 3 layer iii version 1
xmpDM:composer: null
xmpDM:audioCompressor: mp3
title: show me the meaning of being lonely
samplerate: 44100
xmpDM:genre: pop
Content-Type: audio/mpeg
title: show me the meaning of being lonely
artists: backstreet boys
genre: pop

About Apache Tika

http://tika.apache.org/index.html

“The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.”
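As a rough illustration of how content type detection can work, here's a simplified sketch that sniffs the leading "magic bytes" of the audio formats mentioned at the start of this post (.mp3, .wav, .aiff, .au, .midi). It is not Tika's detector (Tika ships its own MIME-type database and detectors); the class name `AudioMagicSniffer` and the exact set of checks are assumptions, and variants such as AIFF-C are not recognized.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical, simplified magic-byte sniffer for a few audio formats.
 * Not Tika's detector; real detection handles many more cases.
 */
final class AudioMagicSniffer {
    private AudioMagicSniffer() {}

    /** Reads up to 12 header bytes from the stream and classifies them. */
    static String detect(InputStream in) throws IOException {
        byte[] header = new byte[12];
        int n = 0;
        while (n < header.length) {
            int r = in.read(header, n, header.length - n);
            if (r < 0) break; // end of stream before 12 bytes
            n += r;
        }
        return detect(Arrays.copyOf(header, n));
    }

    /** Classifies an already-read header. */
    static String detect(byte[] h) {
        if (matchesAt(h, 0, "ID3")) return "audio/mpeg"; // ID3v2 tag in front of MP3 frames
        if (h.length >= 2 && (h[0] & 0xFF) == 0xFF && (h[1] & 0xE0) == 0xE0) {
            return "audio/mpeg"; // MPEG audio frame sync: 11 leading bits set
        }
        if (matchesAt(h, 0, "RIFF") && matchesAt(h, 8, "WAVE")) return "audio/x-wav";
        if (matchesAt(h, 0, "FORM") && matchesAt(h, 8, "AIFF")) return "audio/x-aiff";
        if (matchesAt(h, 0, ".snd")) return "audio/basic"; // Sun/NeXT .au
        if (matchesAt(h, 0, "MThd")) return "audio/midi";
        return "application/octet-stream"; // unknown
    }

    /** True if the ASCII text appears in buf at the given offset. */
    private static boolean matchesAt(byte[] buf, int offset, String ascii) {
        byte[] expected = ascii.getBytes(StandardCharsets.US_ASCII);
        if (offset + expected.length > buf.length) return false;
        for (int i = 0; i < expected.length; i++) {
            if (buf[offset + i] != expected[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] wavHeader = "RIFF\0\0\0\0WAVE".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(new ByteArrayInputStream(wavHeader))); // prints audio/x-wav
    }
}
```

Only the first few bytes are read, which is why detection like this is cheap even for large files.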

http://www.lucidimagination.com/devzone/technical-articles/content-extraction-tika#article.tika

“Apache Tika is a content type detection and content extraction framework. Tika provides a general application programming interface that can be used to detect the content type of a document and also parse textual content and metadata from several document formats. Tika does not try to understand the full variety of different document formats by itself but instead delegates the real work to various existing parser libraries such as Apache POI for Microsoft formats, PDFBox for Adobe PDF, Neko HTML for HTML etc.

The grand idea behind Tika is that it offers a generic interface for parsing multiple formats. The Tika API hides the technical differences of the various parser implementations. This means that you don't have to learn and consume one API for every format you use but can instead use a single API: the Tika API. Internally Tika usually delegates the parsing work to existing parsing libraries and adapts the parse result so that client applications can easily manage variety of formats.

Tika aims to be efficient in using available resources (mainly RAM) while parsing. The Tika API is stream oriented so that the parsed source document does not need to be loaded into memory all at once but only as it is needed. Ultimately, however, the amount of resources consumed is mandated by the parser libraries that Tika uses.

At the time of writing this, Tika supports directly around 30 document formats. See list of supported document formats. The list of supported document formats is not limited by Tika in any way. In the simplest case you can add support for new document formats by implementing a thin adapter that implements the Parser interface for the new document format.”
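The "one generic interface, thin adapters underneath" design described in that quote can be sketched in plain Java. Everything here is hypothetical: `FormatParser`, `ParserRegistry`, `KeyValueParser` and the `text/x-key-value` MIME type are made-up names, not Tika's `Parser` interface. The registry gives callers a single entry point, each format gets a small adapter, and the adapter reads its stream line by line rather than loading the whole document into memory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single interface every format adapter implements (not Tika's Parser). */
interface FormatParser {
    /** Reads the stream and adds whatever metadata it finds to the map. */
    void parse(InputStream in, Map<String, String> metadata) throws IOException;
}

/** Facade: callers use one parse method; the MIME type selects the adapter. */
final class ParserRegistry {
    private final Map<String, FormatParser> byMimeType = new HashMap<>();

    void register(String mimeType, FormatParser parser) {
        byMimeType.put(mimeType, parser);
    }

    Map<String, String> parse(String mimeType, InputStream in) throws IOException {
        FormatParser parser = byMimeType.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + mimeType);
        }
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", mimeType);
        parser.parse(in, metadata); // delegate the format-specific work
        return metadata;
    }

    public static void main(String[] args) throws IOException {
        ParserRegistry registry = new ParserRegistry();
        registry.register("text/x-key-value", new KeyValueParser()); // hypothetical MIME type
        InputStream doc = new ByteArrayInputStream("title=demo\ngenre=pop\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(registry.parse("text/x-key-value", doc).get("genre")); // prints pop
    }
}

/** Thin adapter for a made-up "key=value per line" format; streams line by line. */
final class KeyValueParser implements FormatParser {
    @Override
    public void parse(InputStream in, Map<String, String> metadata) throws IOException {
        // The caller owns the stream, so it is not closed here
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq > 0) { // ignore lines without a key
                metadata.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
    }
}
```

Adding a new format then means writing one more small `FormatParser` and registering it; callers of `ParserRegistry.parse` don't change.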

About the XMP Standard

http://en.wikipedia.org/wiki/Extensible_Metadata_Platform

“The Adobe Extensible Metadata Platform (XMP) is a standard, created by Adobe Systems Inc., for processing and storing standardized and proprietary information relating to the contents of a file.

XMP standardizes the definition, creation, and processing of extensible metadata. Serialized XMP can be embedded into a significant number of popular file formats, without breaking their readability by non-XMP-aware applications. Embedding metadata avoids many problems that occur when metadata is stored separately. XMP is used in PDF, photography and photo editing applications.

XMP can be used in several file formats such as PDF, JPEG, JPEG 2000, JPEG XR, GIF, PNG, HTML, TIFF, Adobe Illustrator, PSD, MP3, MP4, Audio Video Interleave, WAV, RF64, Audio Interchange File Format, PostScript, Encapsulated PostScript, and proposed for DjVu. In a typical edited JPEG file, XMP information is typically included alongside Exif and IPTC Information Interchange Model data.”
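Because serialized XMP lives inside the host file, a crude way to check whether a file carries any is to scan its bytes for the XMP wrapper element. This naive sketch assumes the common `<x:xmpmeta ...>...</x:xmpmeta>` wrapper, UTF-8 serialization, and an uncompressed packet; real files can differ (UTF-16 packets, for example), and a proper parser such as Tika is the better tool. `XmpPacketScanner` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

/**
 * Naive, hypothetical scanner that extracts an embedded XMP packet by looking
 * for the x:xmpmeta wrapper. Assumes UTF-8 and an uncompressed packet.
 */
final class XmpPacketScanner {
    private XmpPacketScanner() {}

    /** Returns the x:xmpmeta element as text, or null if none is found. */
    static String findXmp(byte[] fileBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets
        String raw = new String(fileBytes, StandardCharsets.ISO_8859_1);
        int start = raw.indexOf("<x:xmpmeta");
        if (start < 0) return null;
        String endTag = "</x:xmpmeta>";
        int end = raw.indexOf(endTag, start);
        if (end < 0) return null;
        int endExclusive = end + endTag.length();
        // Decode only the packet's bytes, this time as UTF-8
        return new String(fileBytes, start, endExclusive - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fake = "binary...<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"><rdf:RDF/></x:xmpmeta>...more";
        System.out.println(findXmp(fake.getBytes(StandardCharsets.UTF_8)));
        // prints <x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF/></x:xmpmeta>
    }
}
```

This is why non-XMP-aware applications can still read such files: they simply skip over bytes they don't understand, while an XMP-aware tool knows where to look.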

From http://singztechmusings.wordpress.com/2011/10/17/how-to-retrieveextract-metadata-information-from-audio-files-using-java-and-apache-tika-api/


Opinions expressed by DZone contributors are their own.
