After years of working with BSD and Linux, this morning I ran into a limit in rm. Apparently there is a maximum number of files that can be passed as arguments to rm. Thankfully, there's an easy workaround.
Using rm to delete files is as natural to many as breathing, particularly in the BSD/Linux community. Whether it's removing a single named file, a directory, or a set of files matched by a complex wildcard pattern, rm is generally used with ease on a daily basis. So imagine my surprise when an apparently simple rm command didn't work.
I use amavisd-new to perform spam/virus checks on my email. I think it's an excellent method for controlling email of questionable content, since it gives the administrator a great deal of control over what action to take when dealing with spam/virus-laden messages. Without getting into a lot of detail, one of the things I have amavisd-new configured to do is move spam/virus mail that scores at or above a particular threshold to /var/virusmails. These messages aren't delivered to the end user, and they are kept in a location accessible only to root.
As you can see, each message classified as spam is gzipped and named with a timestamp. The same goes for the viruses, although they aren't gzipped.
root@mx /var/virusmails # ls
razor-agent.log
spam-3398a20c9a59797df9b57fbe34feeace-20040519-084342-19051-05.gz
spam-57e230b6d1dca0dadf83d858d0b10788-20040519-084400-19144-03.gz
spam-6f3be6d2304f90e418db23443916101a-20040519-082357-18227-10.gz
virus-20040419-091017-12544-01
virus-20040419-130621-14993-07
virus-20040421-120113-57877-07
virus-20040421-165651-61698-07
virus-20040423-020850-90966-03
virus-20040423-090733-97665-04
virus-20040427-211030-99133-07
virus-20040427-225312-01622-01
virus-20040428-190241-18845-05
virus-20040505-103654-59956-10
Over time, these messages fill /var rather needlessly, and my /var partition isn't monstrously large. So from time to time I go in and remove the spam archives. The listing above was taken shortly after a recent cleanup, hence only three spam messages.
So I went in and attempted to remove all the files starting with "spam-". Simple enough, right?
root@mx /var/virusmails # rm spam-*
/bin/rm: Argument list too long.
Wrong. Instead I got this irritating (if slightly elusive) error message. How many files was I dealing with here?
root@mx /var/virusmails # ls -1 | grep virus | wc -l
1667
After a brief search (checking for limitations in rm and tcsh), the only useful information I could find about a maximum number of arguments was in rm's manpage. It states that "[t]he rm command uses getopt(3) to parse its arguments," which doesn't tell me a whole lot. Perhaps it can't handle more than 1024 arguments?
The workaround is simple enough. Use find to list all the matching files and pipe them to xargs, which runs rm as many times as needed, each time with an argument list small enough to be accepted.
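For reference, the limit in question can be checked with getconf ARG_MAX, which POSIX defines as the maximum combined length, in bytes, of the arguments and environment handed to the exec functions. In other words, it is a byte budget rather than a count of arguments. The sketch below assumes a POSIX sh run from the directory holding the files, and adds up roughly how many bytes the spam-* expansion would need. The helper name arg_bytes is made up for illustration, and the total ignores the environment and any per-pointer overhead, so treat it as a lower bound.

```shell
#!/bin/sh
# Rough estimate of the argument bytes execve() must copy for a command.
# Each argument costs its length plus a terminating NUL byte. This ignores
# the environment and pointer overhead, so it is only a lower bound.
# arg_bytes is a hypothetical helper name, not a standard tool.
arg_bytes() {
    total=0
    for a in "$@"; do
        total=$((total + ${#a} + 1))
    done
    echo "$total"
}

# The limit covers arguments plus environment, measured in bytes.
limit=$(getconf ARG_MAX)

# Let the shell expand the same glob rm would receive. If nothing matches,
# the pattern is passed through literally, which only adds a few bytes.
set -- spam-*
echo "arguments: $# using about $(arg_bytes rm "$@") bytes (ARG_MAX is $limit)"
```

If the byte count plus the size of your environment gets close to ARG_MAX, a plain rm spam-* is likely to fail the same way.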
root@mx /var/virusmails # find . -name 'spam-*' | xargs rm
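find can also run rm on its own, without xargs. This is a minimal sketch assuming a POSIX-style sh and a find that supports -exec ... {} + (specified by POSIX and available in both GNU and FreeBSD find). It works in a throwaway directory created with mktemp -d and uses made-up file names, so nothing real gets deleted. Like xargs, the {} + form packs as many names into each rm call as will fit. GNU and FreeBSD find also offer a non-standard -delete action that does the same job.

```shell
#!/bin/sh
# Demonstration in a scratch directory, so nothing real is deleted.
# The file names are invented to resemble the amavisd-new archive names.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Create 5000 dummy spam-* files plus one file that must survive.
i=0
while [ "$i" -lt 5000 ]; do
    : > "spam-$i-20040519-084342.gz"
    i=$((i + 1))
done
: > razor-agent.log

# find builds the argument lists itself and runs rm as often as needed;
# "-exec rm {} +" batches many names per rm call, much like xargs does.
# "-type f" keeps directories out of the match.
find . -type f -name 'spam-*' -exec rm {} +

remaining=$(find . -name 'spam-*' | wc -l)
echo "spam files left: $remaining"
```

On the real archive, the equivalent would be find /var/virusmails -type f -name 'spam-*' -exec rm {} +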
Works like a charm.
May 25, 2004 - Update
I've received a handful of emails from Slashdot readers about this article, offering more information on this issue. The best-researched one came from John Simpson, who originally saw (and answered) a similar question on the Linux Enthusiasts & Professionals mailing list. While my solution does work for these particular files, John's comments describe some of its limitations, as well as some history on why the limits exist in the first place.
i found your site through a slashdot link, and your comment about "rm: Argument list too long" caught my eye... i have a correction and an explanation of what the problem really is. it's not a limitation of "rm" or of your shell, it's a limit inside the kernel.

first the correction: the "find | xargs rm" thing only worked correctly because none of the filenames involved had any spaces in them. if the filenames involved have spaces, you will need to use find's "-print0" option in conjunction with xargs's "-0" option. otherwise xargs will treat the space as a token separator when it builds the "rm" command line, thereby treating the name as two (or more) names, none of which are the thing you're actually trying to delete. the command line should look like this...

find . -name 'spam-*' -print0 | xargs -0 rm

and the explanation: in the linux kernel is the function execve(), which is how all of the other exec() functions (execl, execlp, execle, etc.) are actually implemented. the way it works is by creating a 128K buffer at the top end of the memory space and copying the command line and environment for the new process into this space. it then loads the new program into memory, sets its argv and envp pointers, and jumps to its entry point. there's a lot more to it than that, but the point is that there is a 128K buffer which is the only thing "held" from the parent process to the child. the "Argument list too long" error message is actually the kernel's E2BIG error code, returned when execve() is not able to fit the supplied argument list and environment into the 128K buffer.

this came up a while back on my linux user group's mailing list, with somebody who wanted some obscenely huge number of environment variables. i got curious and walked the kernel source code in order to find the answer. my post is here...
http://leap-cf.org/oldarchive/2004-May/038802.html

the mailing list archiver didn't clean up the extra "=20" marks that gpg put at the end of each line when it signed it, but other than that it's pretty clear.
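John's point about spaces is easy to reproduce. This sketch again works in a throwaway mktemp -d directory with made-up file names, and assumes GNU or BSD find and xargs (both support -print0 and -0). Plain xargs splits the spaced name into two bogus names, so rm misses that file; -print0 | xargs -0 then removes it.

```shell
#!/bin/sh
# Scratch directory with one file name containing a space (invented names).
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
: > 'spam-one two.gz'
: > 'spam-plain.gz'

# Plain xargs splits its input on blanks, so "./spam-one two.gz" turns into
# "./spam-one" and "two.gz"; rm removes spam-plain.gz but fails on those two.
find . -name 'spam-*' | xargs rm 2>/dev/null || echo "plain xargs: rm reported errors"
after_plain=$(find . -name 'spam-*' | wc -l)
echo "left after plain xargs: $after_plain"

# -print0 ends each name with a NUL byte and xargs -0 splits only on NUL,
# so every name reaches rm intact.
find . -name 'spam-*' -print0 | xargs -0 rm
left=$(find . -name 'spam-*' | wc -l)
echo "left after -print0 | xargs -0: $left"
```

The NUL byte works as a separator because it is the one character that cannot appear in a Unix file name.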
For a more in-depth technical explanation of the Linux kernel side, read John's original post to the LEAP list. While I use BSD kernels, I imagine that there are similar (if not identical) limits there as well.