how to iterate through a list of files in a directory and compare them to an index. - Perl
how to iterate through a list of files in a directory and compare them to an index.
Hi,
Here is a sample of the problem I am trying to solve.
I have an index.txt file that contains two values separated by a pipe
symbol like this:
junk_file_test1|test1.pdf
junk_file_test2|test2.pdf
I slurp the file in, open a directory handle and try to compare the
value to the right of the pipe to the file names in the directory.
My final goal is to normalize the names so that they are the same
case on the file system as indicated in the index file. I think this
will be a simple rename. However, at this point, when I run through
my while loop I only get one file name output when the script ends,
but I expect to see multiple matches. It seems like all the
contents of the index file are run against one iteration of the
directory contents, and then it quits.
I have changed the list assignment from the split function to an
array and added a foreach loop there but that seems to provide the
same output.
Here is a copy of my current code snippet:
#!/usr/bin/perl
#
use strict;
use warnings;
my $index = "/tmp/www/index.txt";
my $pdf_dir = "/tmp/www";
open (INDEX, "$index") || die "Can't open $index: $!\n";
opendir (PDFDIR, $pdf_dir) || die "Can't open $pdf_dir: $!\n";
while (my $line = <INDEX> ) {
    chomp $line;
    my ($raw_name, $std_name) = (split /\|/, $line);
    if (grep {$std_name} readdir(PDFDIR)) {
        print "I found this: $std_name\n";
    }
}
Thanks,
-Angus
-
Re: how to iterate through a list of files in a directory and compare them to an index.
On Fri, Mar 7, 2008 at 2:00 AM, Angus Glanville <aglanville@comcast.net> wrote:
snip
> It seems like all the
> contents of the file index are run against one iteration of the
> directory contents then it quits.
snip
> opendir (PDFDIR, $pdf_dir) || die "Can't open $pdf_dir: $!\n";
>
> while (my $line = <INDEX> ) {
> chomp $line;
> my ($raw_name, $std_name) = (split /\|/, $line);
> if (grep {$std_name} readdir(PDFDIR)) {
> print "I found this: $std_name\n";
> }
> }
snip
You are opening the PDFDIR directory handle outside of the while loop.
You are then consuming all of the entries with the grep. You either
need to move the opendir to inside the while loop or add a call to
rewinddir*.
* see perldoc -f rewinddir or http://perldoc.perl.org/functions/rewinddir.html
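To make the rewinddir suggestion concrete, here is a minimal sketch, not a drop-in fix. It assumes a scratch directory created with File::Temp in place of /tmp/www, and it also changes the grep block to compare each entry against $std_name, since the handle reset alone would still report every index line as found. The @found array is only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory with two PDFs and an index file,
# standing in for /tmp/www from the original post.
my $pdf_dir = tempdir( CLEANUP => 1 );
for my $name (qw(test1.pdf test2.pdf)) {
    open my $fh, '>', "$pdf_dir/$name" or die "Can't create $name: $!";
    close $fh;
}
my $index = "$pdf_dir/index.txt";
open my $out, '>', $index or die "Can't write $index: $!";
print {$out} "junk_file_test1|test1.pdf\njunk_file_test2|test2.pdf\n";
close $out;

open my $INDEX, '<', $index or die "Can't open $index: $!";
opendir my $PDFDIR, $pdf_dir or die "Can't open $pdf_dir: $!";

my @found;    # illustration only: collects the names that matched
while ( my $line = <$INDEX> ) {
    chomp $line;
    my ( $raw_name, $std_name ) = split /\|/, $line;

    # Reset the handle so readdir starts from the first entry again.
    rewinddir $PDFDIR;

    # Compare each entry to the wanted name; a bare {$std_name} is always true.
    if ( grep { $_ eq $std_name } readdir $PDFDIR ) {
        print "I found this: $std_name\n";
        push @found, $std_name;
    }
}
closedir $PDFDIR;
close $INDEX;
```

This rereads the whole directory once per index line, which is fine for a small directory. Reading the directory once into an array or hash before the loop avoids the repeated scans.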
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
-
Re: how to iterate through a list of files in a directory and compare them to an index.
Angus Glanville wrote:
> I have an index.txt file that contains two values separated by a pipe
> symbol like this:
> junk_file_test1|test1.pdf
> junk_file_test2|test2.pdf
>
> I slurp the file in,
That's not what you do in the code you posted.
> open a directory handle and try to compare the
> value to the right of the pipe to the file names in the directory.
Storing the file names in a hash makes later comparisons easier.
> My
> final goal is to normalize the names so that they are the same case on
> the file system as indicated in the index file. I think this will be a
> simple rename. however, at this point when I run through my while loop
> I only get one file name output when the script ends, but I expect to
> see multiple matches.
<snip>
> while (my $line = <INDEX> ) {
> chomp $line;
> my ($raw_name, $std_name) = (split /\|/, $line);
> if (grep {$std_name} readdir(PDFDIR)) {
That empties PDFDIR at the first iteration of the while loop, which is
why you only get one match.
Please consider this example:
# store file names in hash
open my $INDEX, '<', $index or die "Can't open $index: $!";
my %std_names;
while ( <$INDEX> ) {
    my ($name) = /\|(.+)/ or die 'Parsing failed';
    $std_names{ lc $name } = $name;
}

# process directory
chdir $pdf_dir or die $!;
opendir my $PDFDIR, $pdf_dir or die "Can't open $pdf_dir: $!";
while ( my $file = readdir $PDFDIR ) {
    if ( $std_names{ lc $file } ) {
        rename $file, $std_names{ lc $file } or die $!;
    }
}
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
-
Re: how to iterate through a list of files in a directory and compare them to an index.
Angus Glanville wrote:
> Hi,
Hello,
> Here is a sample of the problem I am trying to solve.
>
> I have an index.txt file that contains two values separated by a pipe
> symbol like this:
> junk_file_test1|test1.pdf
> junk_file_test2|test2.pdf
>
> I slurp the file in, open a directory handle and try to compare the
> value to the right of the pipe to the file names in the directory. My
> final goal is to normalize the names so that they are the same case on
> the file system as indicated in the index file. I think this will be a
> simple rename. However, at this point, when I run through my while loop
> I only get one file name output when the script ends, but I expect to
> see multiple matches. It seems like all the contents of the index
> file are run against one iteration of the directory contents, and then
> it quits.
>
> I have changed the list assignment from the split function to an array
> and added a foreach loop there but that seems to provide the same output.
>
> Here is a copy of my current code snippet:
>
> #!/usr/bin/perl
> #
>
> use strict;
> use warnings;
>
> my $index = "/tmp/www/index.txt";
> my $pdf_dir = "/tmp/www";
>
> open (INDEX, "$index") || die "Can't open $index: $!\n";
> opendir (PDFDIR, $pdf_dir) || die "Can't open $pdf_dir: $!\n";
>
> while (my $line = <INDEX> ) {
> chomp $line;
> my ($raw_name, $std_name) = (split /\|/, $line);
> if (grep {$std_name} readdir(PDFDIR)) {
readdir() in list context returns all of the remaining names in $pdf_dir,
so every subsequent call to readdir() on that handle returns an empty list.
grep() uses a boolean test, so the value of $std_name is tested for true or
false, and since it is always true all values from readdir() are passed through.
> print "I found this: $std_name\n";
> }
> }
You probably need to use the -e (exists) file test operator:
while ( my $line = <INDEX> ) {
    chomp $line;
    my ( $raw_name, $std_name ) = split /\|/, $line;
    if ( -e "$pdf_dir/$std_name" ) {
        print "I found this: $std_name\n";
    }
}
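The point about grep's boolean test can be checked without touching the file system. The following is a small sketch in which @entries is a made-up directory listing (an assumption for illustration, not output from the original script) that deliberately lacks the wanted name.

```perl
use strict;
use warnings;

# Hypothetical directory listing; note that test2.pdf is absent.
my @entries  = qw(. .. index.txt test1.pdf);
my $std_name = 'test2.pdf';

# The block ignores $_ and just returns $std_name, which is a true value,
# so every entry passes the filter.
my @always = grep { $std_name } @entries;

# Comparing against $_ keeps only the entries that really match.
my @exact = grep { $_ eq $std_name } @entries;

printf "truthy block kept %d entries, eq block kept %d\n",
    scalar @always, scalar @exact;
```

With the original block, "I found this" would be printed for test2.pdf even though no such entry exists, as long as the directory listing is not empty.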
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
-
Re: how to iterate through a list of files in a directory and compare them to an index.
John W. Krahn wrote:
> Angus Glanville wrote:
>>
<snip>
>> My final goal is to normalize the names so that they are the same
>> case on the file system as indicated in the index file.
<snip>
> You probably need to use the -e (exists) file test operator:
>
> while ( my $line = <INDEX> ) {
> chomp $line;
> my ( $raw_name, $std_name ) = split /\|/, $line;
> if ( -e "$pdf_dir/$std_name" ) {
> print "I found this: $std_name\n";
> }
> }
Since case may differ, that code might fail to find some of the files
(at least on non-Windows platforms).
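One way to make the existence check independent of case is to read the directory once and key the entries by their lower-cased names. This is only a sketch under assumptions: the scratch directory, the upper-case TEST1.PDF file, the in-memory @index_lines, and the %on_disk and %actual_for hashes are all hypothetical stand-ins for the real index.txt and /tmp/www.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: the file on disk is upper-case, the index wants lower-case.
my $pdf_dir = tempdir( CLEANUP => 1 );
open my $fh, '>', "$pdf_dir/TEST1.PDF" or die "Can't create file: $!";
close $fh;
my @index_lines = ( 'junk_file_test1|test1.pdf', 'junk_file_test2|test2.pdf' );

# Read the directory once, keyed by lower-cased name; skip . and ..
opendir my $PDFDIR, $pdf_dir or die "Can't open $pdf_dir: $!";
my %on_disk = map { ( lc($_) => $_ ) } grep { !/^\.\.?$/ } readdir $PDFDIR;
closedir $PDFDIR;

my %actual_for;    # illustration only: index name => name found on disk
for my $line (@index_lines) {
    my ( $raw_name, $std_name ) = split /\|/, $line;
    my $actual = $on_disk{ lc $std_name };
    if ( defined $actual ) {
        print "I found this: $std_name (on disk as $actual)\n";
        $actual_for{$std_name} = $actual;
    }
}
```

If two files in the directory differ only by case, the later one read wins in %on_disk, so a real script would probably want to detect and report that situation before renaming anything.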
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl