Extracting strings delimited by other strings - Perl
-
Extracting strings delimited by other strings
Hi,
I need to write some code that will allow embedded, specially formatted
comments to document test cases within a program (SAS code). The code will
process the programs, pulling out the test case information. AFAIK this is
similar to how Javadoc works, embedding documentation alongside code.
The syntax will look like:
/*
<testcase>
TESTID: TEST1
OBJECTIVE: The objective of the test
PROCEDURE: The procedure that the test uses
Continuation line from the above
RESULTS: The expected results of the test
Continuation line
Another "continuation" line
</testcase>
*/
The syntax can also be embedded in title statements:
/* <testcase> */
title3 "TESTID: TEST1";
title4 "OBJECTIVE: The objective of the test";
title5 "PROCEDURE: The procedure that the test uses";
title6 " Continuation line from the above";
title6 "RESULTS: The expected results of the test";
title7 " Continuation line";
title8 ' Another "continuation" line';
/* </testcase> */
After processing the program, the desired output is a tab-delimited string
containing filename, testid, objective, procedure, and results. For those
lines that were continued, I would like an embedded CR/LF. Leading spaces
should be removed, as well as any title statements, "outer" quotation marks
(preserving inner quotation marks), and trailing semicolons.
Are there any modules that I could use as a starting point for this? If you
have any code that does something similar, could you either post it or email
it to me? It will be easier to modify an existing example than to start from
scratch.
Kind Regards,
Scott
-
Re: Extracting strings delimited by other strings
"Scott Bass" <usenet739_yahoo_com_au> wrote in
news:427c11c6$0$32016$5a62ac22@per-qv1-newsreader-01.iinet.net.au:
> I need to write some code
....
> After processing the program, the desired output is a tab-delimited
> string containing filename, testid, objective, procedure, and results.
> For those lines that were continued, I would like an embedded CR/LF.
Surely, you do not expect people here to write a program to your specs.
> Are there any modules that I could use as a starting point for this?
Did you find anything on CPAN? Did you try Google?
The surest way to get quality help here is to post what you have
attempted so far, and specific questions regarding specific issues you
have encountered.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)
comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
-
Re: Extracting strings delimited by other strings
Scott Bass <> wrote:
> /*
><testcase>
> TESTID: TEST1
> OBJECTIVE: The objective of the test
> PROCEDURE: The procedure that the test uses
> Continuation line from the above
> RESULTS: The expected results of the test
> Continuation line
> Another "continuation" line
></testcase>
> */
>
> The syntax can also be embedded in titles statements:
>
> /* <testcase> */
> title3 "TESTID: TEST1";
> title4 "OBJECTIVE: The objective of the test";
> title5 "PROCEDURE: The procedure that the test uses";
> title6 " Continuation line from the above";
> title6 "RESULTS: The expected results of the test";
> title7 " Continuation line";
> title8 ' Another "continuation" line';
> /* </testcase> */
>
> After processing the program, the desired output is a tab-delimited string
I'm going to use commas because they are easier to see.
> containing filename, testid, objective, procedure, and results. For those
> lines that were continued, I would like an embedded CR/LF. Leading spaces
> should be
> removed, as well as any title statements, "outer" quotation marks
> (preserving inner quotation marks), and trailing semi-colons.
>
> Are there any modules that I could use as a starting point for this?
I dunno.
Hardly seems worth modularization when it only takes about
20 lines of regular ol' Perl.
Assuming the whole file is slurped into $_ :
------------------------------
while ( m#<testcase>(.*?)</testcase>#gs ) {
    my $record = normalize($1);
    # split yields an empty leading field before TESTID:, hence the undef
    my(undef, @parts) = split /(?:TESTID|OBJECTIVE|PROCEDURE|RESULTS):\s+/,
                              $record;
    chomp @parts;
    print join( ',', @parts), "\n";
}

sub normalize {
    my($r) = @_;
    $r =~ s#^\s*\*/##;              # snip bits of comment delimiters
    $r =~ s#/\*\s*##;
    $r =~ s#^title\d+ ##gm;         # remove title statement cruft
    $r =~ s#^(['"])(.*?)\1;#$2#gm;  # drop outer quotes and trailing semicolon
    $r =~ s#\n\s\s+#\n#g;           # unindent every continuation line (needs /g)
    $r =~ s#^\s*##;                 # trim spaces
    $r =~ s#\s*$##;
    return $r;
}
------------------------------
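The snippet above prints commas and leaves out the filename. A minimal sketch of the output the original post actually asked for (tab-delimited, filename first, CR/LF between continuation lines) might look like the following. It is not code from this thread: the names extract_testcases, @FIELDS and extract.pl are hypothetical, it parses line by line instead of using split, and it assumes Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Assumed field order, taken from the original post.
my @FIELDS   = qw(TESTID OBJECTIVE PROCEDURE RESULTS);
my $label_re = join '|', @FIELDS;

# Hypothetical helper: return one tab-delimited row per <testcase> block.
sub extract_testcases {
    my ($filename, $text) = @_;
    my @rows;
    while ( $text =~ m{<testcase>(.*?)</testcase>}gs ) {
        my $body = $1;                       # copy before other regexes reset $1
        my (%field, $current);
        for my $line ( split /\r?\n/, $body ) {
            $line =~ s{^\s*\*/\s*}{};        # "*/" left over after "/* <testcase>"
            $line =~ s{\s*/\*\s*$}{};        # "/*" in front of "</testcase> */"
            if ( $line =~ s/^\s*title\d*\s+//i ) {
                $line =~ s/;\s*$//;              # trailing semicolon
                $line =~ s/^(['"])(.*)\1$/$2/;   # outer quotes only
            }
            $line =~ s/^\s+|\s+$//g;         # leading and trailing whitespace
            next unless length $line;
            if ( $line =~ /^($label_re):\s*(.*)$/ ) {
                $current = $1;               # a new field starts here
                $field{$current} = $2;
            }
            elsif ( defined $current ) {
                $field{$current} .= "\r\n$line";   # continuation line
            }
        }
        push @rows, join "\t", $filename, map { $field{$_} // '' } @FIELDS;
    }
    return @rows;
}

# Usage sketch: perl extract.pl prog1.sas prog2.sas > testcases.tsv
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };       # slurp the whole file
    close $fh;
    print "$_\n" for extract_testcases($file, $text);
}
```

Quotes and semicolons are stripped only from lines that began with a title statement, so the plain comment form keeps its text untouched. A missing field comes out as an empty column rather than shifting the ones after it.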
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas