DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

Mobile Database Essentials: Assess data needs, storage requirements, and more when leveraging databases for cloud and edge applications.

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Related

  • Creating Customized and Bootable Disk Images of Host Systems
  • Reactive Event Streaming Architecture With Kafka, Redis Streams, Spring Boot, and HTTP Server-Sent Events (SSE)
  • Superior Stream Processing: Apache Flink's Impact on Data Lakehouse Architecture
  • SeaweedFS vs. JuiceFS Design and Features

Trending

  • TDD With FastAPI Is Easy
  • Running End-To-End Tests in GitHub Actions
  • Demystifying Enterprise Integration Patterns: Bridging the Gap Between Systems
  • How to Configure Istio, Prometheus and Grafana for Monitoring
  1. DZone
  2. Coding
  3. JavaScript
  4. Turning Recursive File System Traversal Into Stream

Turning Recursive File System Traversal Into Stream

Tomasz Nurkiewicz user avatar by
Tomasz Nurkiewicz
CORE ·
Jul. 09, 14 · Interview
Like (0)
Save
Tweet
Share
6.10K Views

Join the DZone community and get the full member experience.

Join For Free

When I was learning programming, back in the days of Turbo Pascal, I managed to list files in directory using FindFirst,FindNext and FindClose functions. First I came up with a procedure printing contents of a given directory. You can imagine how proud I was to discover I can actually call that procedure from itself to traverse file system recursively. Well, I didn't know the term recursion back then, but it worked. Similar code in Java would look something like this:

public void printFilesRecursively(final File folder) {
    for (final File entry : listFilesIn(folder)) {
        if (entry.isDirectory()) {
            printFilesRecursively(entry);
        } else {
            System.out.println(entry.getAbsolutePath());
        }
    }
}

private File[] listFilesIn(File folder) {
    final File[] files = folder.listFiles();
    return files != null ? files : new File[]{};
}

Didn't know File.listFiles() can return null, did ya? That's how it signals I/O errors, like if IOException never existed. But that's not the point. System.out.println() is rarely what we need, thus this method is neither reusable nor composable. It is probably the best counterexample of Open/Closed principle. I can imagine several use cases for recursive traversal of file system:

  1. Getting a complete list of all files for display purposes
  2. Looking for all files matching given pattern/property (also check out File.list(FilenameFilter))
  3. Searching for one particular file
  4. Processing every single file, e.g. sending it over network

Every use case above has a unique set of challenges. For example we don't want to build a list of all files because it will take a significant amount of time and memory before we can start processing it. We would like to process files as they are discovered and lazily - by pipe-lining computation (but without clumsy visitor pattern). Also we want to short-circuit searching to avoid unnecessary I/O. Luckily in Java 8 some of these issues can be addressed with streams:

final File home = new File(FileUtils.getUserDirectoryPath());
final Stream<Path> files = Files.list(home.toPath());
files.forEach(System.out::println);

Remember that Files.list(Path) (new in Java 8) does not look into subdirectories - we'll fix that later. The most important lesson here is: Files.list() returns a Stream<Path> - a value that we can pass around, compose, map, filter, etc. It's extremely flexible, e.g. it's fairly simple to count how many files I have in a directory per extension:

import org.apache.commons.io.FilenameUtils;

//...

final File home = new File(FileUtils.getUserDirectoryPath());
final Stream<Path> files = Files.list(home.toPath());
final Map<String, List<Path>> byExtension = files
        .filter(path -> !path.toFile().isDirectory())
        .collect(groupingBy(path -> getExt(path)));

byExtension.
        forEach((extension, matchingFiles) ->
                System.out.println(
                        extension + "\t" + matchingFiles.size()));

//...

private String getExt(Path path) {
    return FilenameUtils.getExtension(path.toString()).toLowerCase();
}

OK, just another API, you might say. But it becomes really interesting once we need to go deeper, recursively traversing subdirectories. One amazing feature of streams is that you can combine them with each other in various ways. Old Scala saying "flatMap that shit" is applicable here as well, check out this recursive Java 8 code:

//WARNING: doesn't compile, yet:

private static Stream<Path> filesInDir(Path dir) {
    return Files.list(dir)
            .flatMap(path ->
                    path.toFile().isDirectory() ?
                            filesInDir(path) :
                            singletonList(path).stream());
}

Stream<Path> lazily produced by filesInDir() contains all files within directory including subdirectories. You can use it as any other stream by calling map(), filter(), anyMatch(), findFirst(), etc. But how does it really work?flatMap() is similar to map() but while map() is a straightforward 1:1 transformation, flatMap() allows replacing single entry in input Stream with multiple entries. If we had used map(), we would have end up with Stream<Stream<Path>>(or maybe Stream<List<Path>>). But flatMap() flattens this structure, in a way exploding inner entries. Let's see a simple example. Imagine Files.list() returned two files and one directory. For files flatMap() receives a one-element stream with that file. We can't simply return that file, we have to wrap it, but essentially this is no-operation. It gets way more interesting for a directory. In that case we call filesInDir() recursively. As a result we get a stream of contents of that directory, which we inject into our outer stream.

Code above is short, sweet and... doesn't compile. These pesky checked exceptions again. Here is a fixed code, wrapping checked exceptions for sanity:

public static Stream<Path> filesInDir(Path dir) {
    return listFiles(dir)
            .flatMap(path ->
                    path.toFile().isDirectory() ?
                            filesInDir(path) :
                            singletonList(path).stream());
}

private static Stream<Path> listFiles(Path dir) {
    try {
        return Files.list(dir);
    } catch (IOException e) {
        throw Throwables.propagate(e);
    }
}

Unfortunately this quite elegant code is not lazy enough. flatMap() evaluates eagerly, thus it always traverses all subdirectories, even if we barely ask for first file. You can try with my tiny LazySeq library that tries to provide even lazier abstraction, similar to streams in Scala or lazy-seq in Clojure. But even standard JDK 8 solution might be really helpful and simplify your code significantly.

File system Stream (computing)

Published at DZone with permission of Tomasz Nurkiewicz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Creating Customized and Bootable Disk Images of Host Systems
  • Reactive Event Streaming Architecture With Kafka, Redis Streams, Spring Boot, and HTTP Server-Sent Events (SSE)
  • Superior Stream Processing: Apache Flink's Impact on Data Lakehouse Architecture
  • SeaweedFS vs. JuiceFS Design and Features

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: