
How to repair a full Unix directory

· Java Zone

The Java Zone is brought to you in partnership with ZeroTurnaround. Discover how you can skip the build and redeploy process by using JRebel by ZeroTurnaround.

A reader wrote me this week that his bash scripts were complaining "out of memory"; what should he do? It didn't take long to get him moving again.

While my colleague Sandra Henry-Stocker usually covers this territory in her "Unix as a Second Language", the ideas involved in this episode apply nicely in common situations developers and Windows administrators encounter, so I think there is value in reporting them here. My correspondent knew that he wanted to run

    find . -type f -exec grep -i -l -H "keyword" '{}' + | xargs rm -rf

but he was getting "out of memory" because he had millions (!) of files in his directory tree, and, if I understood him correctly, was operating with an older host that only had 256 megabytes of main memory. What should he do?

My first thought:

    # Caution: this coding is fragile, in that it mishandles filenames which
    # embed blanks. Accommodating those is a story for another day.
    find . -type f -exec grep -i -l -H "$keyword" {} \; > $INTERMEDIATE_FILE
    while read NAME; do
        rm -f "$NAME"
    done < $INTERMEDIATE_FILE
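For readers who do need to handle blanks in filenames today, a null-delimited variant sidesteps that fragility. This is my own sketch, not part of the original exchange; it assumes GNU-style `grep -Z` (print a NUL byte after each matching filename) paired with `xargs -0`, which batches the deletions without ever building one giant argument list:

```shell
# Sketch (assumes GNU grep's -Z/--null option): -l -Z emits each matching
# filename terminated by a NUL byte, and xargs -0 consumes those safely,
# even when names contain blanks.
workdir=$(mktemp -d)
printf 'keyword here\n' > "$workdir/with blank.txt"   # contains the keyword
printf 'nothing\n'      > "$workdir/keep.txt"         # does not

find "$workdir" -type f -exec grep -i -l -Z "keyword" {} + | xargs -0 rm -f

ls "$workdir"   # only keep.txt should remain
```

Because `xargs -0` splits its input into as many `rm` invocations as needed, the pipeline stays within argument-list and memory limits no matter how many files match.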

Did that help? "Yes!", the report came back--well, "yes and no." As I'm a big believer that long journeys begin with small steps, I found more encouragement than discouragement in the answer. Apparently the questioner needed to do several waves of cleanup, and "unrolling" the one-liner with an $INTERMEDIATE_FILE helped with some of the out-of-memory situations, but not all.

"One step at a time", I thought. After a little more negotiation, we reduced his symptoms to "out of memory" faults with...



    find ./ -size -6k -type f >> $INTERMEDIATE_FILE

Did I have any tricks left for those?

Sure; in fact, I have a history of creating this situation for myself. I often use temporary files for various test automations I run, and, unless I'm scrupulous about cleaning up after the tests, it's easy to find myself with tens of thousands of files named, for example, /tmp/tmp${RANDOM}.log. I've often had so many of these that trying to clean up the mess with rm /tmp/tmp*log does just what my questioner described: complains "out of memory". In a case like this, it's time to "eat the elephant one bite at a time", which translates, in this case, to something like

    rm /tmp/tmp*a*.log
    rm /tmp/tmp*b*.log
    rm /tmp/tmp*[g-j]*.log
    rm /tmp/tmp*[A-H]*.log

In English, the idea is to specify a subset of /tmp/tmp*.log small enough to fit in memory, but large enough to nibble away at the whole list. After slicing out a few "chunks", we quickly reduce the whole collection of remaining /tmp/tmp*.log to a manageable size, where more traditional bash programming can take over.
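The slicing idea is easy to demonstrate in miniature. The toy below (hypothetical names and directory, scaled down from millions of files to three) deletes a glob one character-class slice at a time, so that no single expansion has to cover the whole set:

```shell
# Toy demonstration of deleting a large glob in slices: each rm expands
# only a subset of the full tmp*.log pattern, keeping argument lists small.
scratch=$(mktemp -d)
touch "$scratch"/tmp_a1.log "$scratch"/tmp_b2.log "$scratch"/tmp_x3.log

for slice in 'a' 'b' '[c-z]'; do
    # $slice is deliberately unquoted so the shell expands the glob
    rm -f "$scratch"/tmp*${slice}*.log
done
```

The slices here ('a', 'b', '[c-z]') are arbitrary; in practice you choose classes coarse enough to finish quickly but fine enough that each expansion fits in memory.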

For find, an analogous approach would be something like

    find . -name "*a*" -size -6k -type f >> $INTERMEDIATE_FILE
    find . -name "*[bc]*" -size -6k -type f >> $INTERMEDIATE_FILE
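Put end to end, the sliced-find technique looks like the sketch below. The names and the extra `! -name "*a*"` guard (which keeps a file from landing in the intermediate file twice) are my own illustration; the `-size` test from the article is dropped so the toy files qualify:

```shell
# Sketch of the sliced find approach: each pass appends one name-subset
# to an intermediate file, and the deletions then run off that file.
root=$(mktemp -d)
touch "$root/above.txt" "$root/below.txt" "$root/cider.txt"
INTERMEDIATE_FILE=$(mktemp)

find "$root" -name "*a*" -type f >> "$INTERMEDIATE_FILE"
# Exclude names already captured by the first pass:
find "$root" -name "*[bc]*" ! -name "*a*" -type f >> "$INTERMEDIATE_FILE"

# Fragile with blanks in names, just like the article's loop:
while read -r name; do
    rm -f "$name"
done < "$INTERMEDIATE_FILE"
```

Each individual find traverses the whole tree but only accumulates a slice of the results, so the intermediate file grows in manageable installments.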

The excitement wasn't quite over yet, of course; situations like this seem always to have "loose ends". In the case of my questioner, he had many files whose names included non-ASCII Unicode characters. I've got plenty of tricks for dealing with those, too, including switching to Tcl for my scripting. This time, though, we started with the files whose names were easy to express, processed all of them, and then determined, to my non-surprise, that the residuum which remained was small enough that the questioner could use his usual bash coding skills. Mission accomplished.

What's the conclusion? I don't have a particularly polished aphorism to summarize what happened. I do know, though, that many cases that look like "show-stoppers" the first time encountered turn out to be easy to solve for someone with just a little more experience. If you're feeling stuck, be clear with yourself what your true requirements are, what you're getting, and what appears to constrain you. Ask for help; someone else, with a different perspective, might quickly see a way to fit together all the elements of your problem to make a solution.

There's also a lesson here about craft-work that I don't yet know how to put into words. Part of the difference between "textbook learning" and the kind of professional training that diesel mechanics, physicians, lawyers, and plumbers all practice has to do with learning how to handle novel situations. It involves thorough apprenticeship in the basics, followed by exposure to progressively more challenging variations. If rm * doesn't give you what you want, break down the * part into pieces small enough to handle.



Published at DZone with permission of Cameron Laird.

Opinions expressed by DZone contributors are their own.
