Turning Recursive File System Traversal Into Stream

By Tomasz Nurkiewicz · Jul. 09, 14

When I was learning programming, back in the days of Turbo Pascal, I managed to list files in a directory using the FindFirst, FindNext and FindClose functions. First I came up with a procedure that printed the contents of a given directory. You can imagine how proud I was to discover that I could actually call that procedure from itself to traverse the file system recursively. Well, I didn't know the term recursion back then, but it worked. Similar code in Java would look something like this:

public void printFilesRecursively(final File folder) {
    for (final File entry : listFilesIn(folder)) {
        if (entry.isDirectory()) {
            printFilesRecursively(entry);
        } else {
            System.out.println(entry.getAbsolutePath());
        }
    }
}

private File[] listFilesIn(File folder) {
    final File[] files = folder.listFiles();
    return files != null ? files : new File[]{};
}

Didn't know File.listFiles() can return null, did ya? That's how it signals I/O errors, as if IOException never existed. But that's not the point. System.out.println() is rarely what we need, so this method is neither reusable nor composable. It is probably the best counterexample of the Open/Closed principle. I can imagine several use cases for recursive traversal of the file system:

  1. Getting a complete list of all files for display purposes
  2. Looking for all files matching a given pattern/property (also check out File.list(FilenameFilter) - see the sketch after this list)
  3. Searching for one particular file
  4. Processing every single file, e.g. sending it over the network
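
For use case 2, the pre-streams File.list(FilenameFilter) API mentioned above is often enough on its own, as long as you don't need recursion. A minimal sketch, assuming we only care about a single directory level (the ".txt" extension and the home directory are just arbitrary examples, not something from this article):

// Non-recursive filtering with the old API: list only *.txt entries in a folder.
final File folder = new File(System.getProperty("user.home"));
final String[] textFiles = folder.list((dir, name) -> name.endsWith(".txt"));
if (textFiles != null) {  // list(), like listFiles(), returns null on I/O errors
    for (final String name : textFiles) {
        System.out.println(name);
    }
}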

Every use case above has a unique set of challenges. For example, we don't want to build a list of all files up front, because it would take a significant amount of time and memory before we could start processing it. We would like to process files lazily, as they are discovered - by pipelining the computation (but without a clumsy visitor pattern). Also, we want to be able to short-circuit searching to avoid unnecessary I/O. Luckily, in Java 8 some of these issues can be addressed with streams:

final File home = new File(FileUtils.getUserDirectoryPath());
final Stream<Path> files = Files.list(home.toPath());
files.forEach(System.out::println);

Remember that Files.list(Path) (new in Java 8) does not look into subdirectories - we'll fix that later. The most important lesson here is: Files.list() returns a Stream<Path> - a value that we can pass around, compose, map, filter, etc. It's extremely flexible, e.g. it's fairly simple to count how many files I have in a directory per extension:

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;

import static java.util.stream.Collectors.groupingBy;

//...

final File home = new File(FileUtils.getUserDirectoryPath());
final Stream<Path> files = Files.list(home.toPath());
final Map<String, List<Path>> byExtension = files
        .filter(path -> !path.toFile().isDirectory())
        .collect(groupingBy(path -> getExt(path)));

byExtension.forEach((extension, matchingFiles) ->
        System.out.println(extension + "\t" + matchingFiles.size()));

//...

private String getExt(Path path) {
    return FilenameUtils.getExtension(path.toString()).toLowerCase();
}
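
As a side note, if all we need are the counts rather than the lists of matching files themselves, the standard Collectors.counting() collector can do the tallying in the same pass. A minimal sketch, reusing the home variable and getExt() helper from above (exception handling omitted, just like in the snippet it mirrors):

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

//...

// Group by extension and count directly, without materializing intermediate lists.
final Map<String, Long> countsByExtension = Files.list(home.toPath())
        .filter(path -> !path.toFile().isDirectory())
        .collect(groupingBy(path -> getExt(path), counting()));

countsByExtension.forEach((extension, count) ->
        System.out.println(extension + "\t" + count));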

OK, just another API, you might say. But it becomes really interesting once we need to go deeper, recursively traversing subdirectories. One amazing feature of streams is that you can combine them with each other in various ways. The old Scala saying "flatMap that shit" is applicable here as well - check out this recursive Java 8 code:

//WARNING: doesn't compile, yet:

private static Stream<Path> filesInDir(Path dir) {
    return Files.list(dir)
            .flatMap(path ->
                    path.toFile().isDirectory() ?
                            filesInDir(path) :
                            singletonList(path).stream());
}

The Stream<Path> lazily produced by filesInDir() contains all files within the directory, including subdirectories. You can use it like any other stream by calling map(), filter(), anyMatch(), findFirst(), etc.

But how does it really work? flatMap() is similar to map(), but while map() is a straightforward 1:1 transformation, flatMap() allows replacing a single entry in the input Stream with multiple entries. If we had used map(), we would have ended up with a Stream<Stream<Path>> (or maybe a Stream<List<Path>>). flatMap() flattens this structure, in a way exploding the inner entries. Let's see a simple example. Imagine Files.list() returned two files and one directory. For each file, the lambda passed to flatMap() must return a Stream, so we wrap the file in a one-element stream - essentially a no-op. It gets way more interesting for a directory: in that case we call filesInDir() recursively and, as a result, get a stream of the contents of that directory, which we inject into our outer stream.
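
To make the map() vs. flatMap() difference concrete, here is a sketch of the two variants side by side for some directory Path dir. It assumes the same filesInDir() helper and, like the snippet above, ignores the checked IOException for a moment:

// map() keeps the nesting: each directory entry becomes a whole inner stream.
final Stream<Stream<Path>> nested = Files.list(dir)
        .map(path ->
                path.toFile().isDirectory() ?
                        filesInDir(path) :
                        singletonList(path).stream());

// flatMap() "explodes" those inner streams into one flat stream of paths.
final Stream<Path> flat = Files.list(dir)
        .flatMap(path ->
                path.toFile().isDirectory() ?
                        filesInDir(path) :
                        singletonList(path).stream());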

The code above is short, sweet and... doesn't compile. These pesky checked exceptions again. Here is a fixed version, wrapping checked exceptions for sanity:

import com.google.common.base.Throwables;

import static java.util.Collections.singletonList;

//...

public static Stream<Path> filesInDir(Path dir) {
    return listFiles(dir)
            .flatMap(path ->
                    path.toFile().isDirectory() ?
                            filesInDir(path) :
                            singletonList(path).stream());
}

private static Stream<Path> listFiles(Path dir) {
    try {
        return Files.list(dir);
    } catch (IOException e) {
        // Re-throw the checked IOException as an unchecked exception (Guava).
        throw Throwables.propagate(e);
    }
}
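
Usage is then just a matter of composing standard stream operations, e.g. the "searching for one particular file" use case from point 3 above. A sketch, reusing the home variable from the earlier snippets (the file name is a made-up example):

// Search recursively under the home directory for the first file with a matching name.
final Optional<Path> found = filesInDir(home.toPath())
        .filter(path -> path.getFileName().toString().equals("notes.txt"))
        .findFirst();

found.ifPresent(path -> System.out.println("Found: " + path));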

Unfortunately this quite elegant code is not lazy enough. flatMap() evaluates eagerly, so it always traverses all subdirectories, even if we only ask for the first file. You can try my tiny LazySeq library, which tries to provide an even lazier abstraction, similar to streams in Scala or lazy-seq in Clojure. But even the standard JDK 8 solution might be really helpful and simplify your code significantly.


Published at DZone with permission of Tomasz Nurkiewicz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
