how to iterate through a list of files in a directory and compare them to an index. - Perl

  1. how to iterate through a list of files in a directory and compare them to an index.

    Hi,

    Here is a sample of the problem I am trying to solve.

    I have an index.txt file that contains two values separated by a pipe
    symbol like this:
    junk_file_test1|test1.pdf
    junk_file_test2|test2.pdf

    I slurp the file in, open a directory handle and try to compare the
    value to the right of the pipe to the file names in the directory.
    My final goal is to normalize the names so that they have the same
    case on the file system as in the index file. I think this
    will be a simple rename. However, at this point, when I run through
    my while loop I only get one file name output when the script ends,
    but I expect to see multiple matches. It seems like all the
    contents of the index file are run against one iteration of the
    directory contents and then it quits.

    I have changed the list assignment from the split function to an
    array and added a foreach loop there but that seems to provide the
    same output.

    Here is a copy of my current code snippet:

    #!/usr/bin/perl
    #

    use strict;
    use warnings;

    my $index = "/tmp/www/index.txt";
    my $pdf_dir = "/tmp/www";

    open (INDEX, "$index") || die "Can't open $index: $!\n";
    opendir (PDFDIR, $pdf_dir) || die "Can't open $pdf_dir: $!\n";

    while (my $line = <INDEX> ) {
        chomp $line;
        my ($raw_name, $std_name) = (split /\|/, $line);
        if (grep {$std_name} readdir(PDFDIR)) {
            print "I found this: $std_name\n";
        }
    }

    Thanks,

    -Angus

  2. Re: how to iterate through a list of files in a directory and compare them to an index.

    On Fri, Mar 7, 2008 at 2:00 AM, Angus Glanville <aglanville@comcast.net> wrote:
    snip
    > It seems like all the
    > contents of the file index are run against one iteration of the
    > directory contents then it quits.

    snip
    > opendir (PDFDIR, $pdf_dir) || die "Can't open $pdf_dir: $!\n";
    >
    > while (my $line = <INDEX> ) {
    > chomp $line;
    > my ($raw_name, $std_name) = (split /\|/, $line);
    > if (grep {$std_name} readdir(PDFDIR)) {
    > print "I found this: $std_name\n";
    > }
    > }

    snip

    You are opening the PDFDIR directory handle outside of the while loop.
    The grep in the first iteration then consumes all of the entries, so
    later iterations have nothing left to read. You either need to move
    the opendir inside the while loop or add a call to rewinddir*.

    * see perldoc -f rewinddir or http://perldoc.perl.org/functions/rewinddir.html
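
    A minimal sketch of the rewinddir suggestion, not the original
    poster's script. It assumes a scratch directory created with
    File::Temp in place of /tmp/www and an in-memory list in place of
    index.txt; both are hypothetical stand-ins. It also compares each
    entry with eq, because rewinding alone does not make the original
    grep block test the file name.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    # Hypothetical fixture: a scratch directory standing in for /tmp/www.
    my $pdf_dir = tempdir( CLEANUP => 1 );
    for my $name (qw(test1.pdf test2.pdf)) {
        open my $fh, '>', "$pdf_dir/$name" or die "Can't create $name: $!";
        close $fh;
    }

    # Hypothetical stand-in for the lines of index.txt.
    my @index_lines = ( 'junk_file_test1|test1.pdf', 'junk_file_test2|test2.pdf' );

    opendir my $dh, $pdf_dir or die "Can't open $pdf_dir: $!";
    my @found;
    for my $line (@index_lines) {
        my ( $raw_name, $std_name ) = split /\|/, $line;
        rewinddir $dh;    # restart the listing for every index line
        push @found, $std_name if grep { $_ eq $std_name } readdir $dh;
    }
    closedir $dh;

    print "I found this: $_\n" for @found;
    ```

    Rereading the directory once per index line is simple but does
    repeated work; reading the entries into a hash once avoids that.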
    --
    Chas. Owens
    wonkden.net
    The most important skill a programmer can have is the ability to read.

  3. Re: how to iterate through a list of files in a directory and compare them to an index.

    Angus Glanville wrote:
    > I have an index.txt file that contains two values separated by a pipe
    > symbol like this:
    > junk_file_test1|test1.pdf
    > junk_file_test2|test2.pdf
    >
    > I slurp the file in,


    That's not what you do in the code you posted.

    > open a directory handle and try to compare the
    > value to the right of the pipe to the file names in the directory.


    Storing the file names in a hash makes later comparisons easier.

    > My
    > final goal is to normalize the names so that they are the same case on
    > the file system as indicated in the index file. I think this will be a
    > simple rename. However, at this point when I run through my while loop
    > I only get one file name output when the script ends, but I expect to
    > see multiple matches.


    <snip>

    > while (my $line = <INDEX> ) {
    > chomp $line;
    > my ($raw_name, $std_name) = (split /\|/, $line);
    > if (grep {$std_name} readdir(PDFDIR)) {


    That empties PDFDIR at the first iteration of the while loop, which is
    why you only get one match.

    Please consider this example:

    # store file names in hash, keyed by lowercased name
    open my $INDEX, '<', $index or die "Can't open $index: $!";
    my %std_names;
    while ( <$INDEX> ) {
        my ($name) = /\|(.+)/ or die 'Parsing failed';
        $std_names{ lc $name } = $name;
    }

    # process directory
    chdir $pdf_dir or die $!;
    opendir my $PDFDIR, $pdf_dir or die "Can't open $pdf_dir: $!";
    while ( my $file = readdir $PDFDIR ) {
        if ( $std_names{ lc $file } ) {
            rename $file, $std_names{ lc $file } or die $!;
        }
    }

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl

  4. Re: how to iterate through a list of files in a directory and compare them to an index.

    Angus Glanville wrote:
    > Hi,


    Hello,

    > Here is a sample of the problem I am trying to solve.
    >
    > I have an index.txt file that contains two values separated by a pipe
    > symbol like this:
    > junk_file_test1|test1.pdf
    > junk_file_test2|test2.pdf
    >
    > I slurp the file in, open a directory handle and try to compare the
    > value to the right of the pipe to the file names in the directory. My
    > final goal is to normalize the names so that they are the same case on
    > the file system as indicated in the index file. I think this will be a
    > simple rename. However, at this point when I run through my while loop
    > I only get one file name output when the script ends, but I expect to
    > see multiple matches. It seems like all the contents of the file
    > index are run against one iteration of the directory contents then it
    > quits.
    >
    > I have changed the list assignment from the split function to an array
    > and added a foreach loop there but that seems to provide the same output.
    >
    > Here is a copy of my current code snippet:
    >
    > #!/usr/bin/perl
    > #
    >
    > use strict;
    > use warnings;
    >
    > my $index = "/tmp/www/index.txt";
    > my $pdf_dir = "/tmp/www";
    >
    > open (INDEX, "$index") || die "Can't open $index: $!\n";
    > opendir (PDFDIR, $pdf_dir) || die "Can't open $pdf_dir: $!\n";
    >
    > while (my $line = <INDEX> ) {
    > chomp $line;
    > my ($raw_name, $std_name) = (split /\|/, $line);
    > if (grep {$std_name} readdir(PDFDIR)) {


    In list context, readdir() returns all the remaining names in $pdf_dir
    at once, and every subsequent call on the same handle returns an empty
    list. grep() applies a boolean test to each element, but your block
    tests the value of $std_name rather than $_. Since $std_name is
    always true, all values from readdir() are passed through.


    > print "I found this: $std_name\n";
    > }
    > }


    You probably need to use the -e (exists) file test operator:

    while ( my $line = <INDEX> ) {
        chomp $line;
        my ( $raw_name, $std_name ) = split /\|/, $line;
        if ( -e "$pdf_dir/$std_name" ) {
            print "I found this: $std_name\n";
        }
    }



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall

  5. Re: how to iterate through a list of files in a directory and compare them to an index.

    John W. Krahn wrote:
    > Angus Glanville wrote:
    >>


    <snip>

    >> My final goal is to normalize the names so that they are the same
    >> case on the file system as indicated in the index file.


    <snip>

    > You probably need to use the -e (exists) file test operator:
    >
    > while ( my $line = <INDEX> ) {
    > chomp $line;
    > my ( $raw_name, $std_name ) = split /\|/, $line;
    > if ( -e "$pdf_dir/$std_name" ) {
    > print "I found this: $std_name\n";
    > }
    > }


    Since case may differ, that code might fail to find some of the files
    (at least on non-Windows platforms).
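
    One way to sidestep the case problem is to compare lowercased names
    instead of testing for an exact path. The sketch below is
    illustrative only: @entries and @std_names are hypothetical lists
    standing in for the directory listing and the right-hand column of
    index.txt, and nothing is renamed.

    ```perl
    use strict;
    use warnings;

    # Hypothetical directory listing and index names, for illustration only.
    my @entries   = ( 'TEST1.PDF', 'test2.pdf', 'Other.txt' );
    my @std_names = ( 'test1.pdf', 'test2.pdf', 'test3.pdf' );

    # Map each lowercased on-disk name to its actual spelling.
    my %on_disk = map { ( lc($_) => $_ ) } @entries;

    my ( @exact, @rename, @missing );
    for my $std (@std_names) {
        my $actual = $on_disk{ lc $std };
        if    ( !defined $actual ) { push @missing, $std }               # no file at all
        elsif ( $actual eq $std )  { push @exact,   $std }               # case already right
        else                       { push @rename,  "$actual -> $std" }  # differs only in case
    }

    print "rename: $_\n"  for @rename;
    print "missing: $_\n" for @missing;
    ```

    Collecting the planned renames in a list first makes it easy to
    review them before calling rename on the real directory.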

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
