Please help me in writing script

  #1  
Old 07-02-2008, 10:14 AM
Injam
Guest
 
Default Please help me in writing script

Hi,
I have the following two files. Characters 14-18 of each line in input2
match the 2nd field (delimited by ~) of a line in input1.

swadmin@tb142:/rangedoms1/working/
CRST_OVERLAY_ENHANCE_Analysis_RNGCTRL_DEV> cat input1
P00000012~00027
P00000027~00061
P00000270~00417
P00000271~00418
P00000272~00419
P00000273~00420
P00000274~00422
P00000275~00424
P00000276~00428
P00000277~00429
P00000278~00431
P00000279~00432
P00000329~00483
P60000329~00483
P50000329~00483
P40000329~00483
P30000329~00483
P20000329~00483
P10000329~00483
P00000483~00639
P01000079~00178
P11000079~00178
P90000079~00178
P80000079~00178
P70000079~00178
P60000079~00178
P50000079~00178
P40000079~00178
P30000079~00178
P20000079~00178
P10000079~00178
P00000178~00306
swadmin@tb142:/rangedoms1/working/
CRST_OVERLAY_ENHANCE_Analysis_RNGCTRL_DEV> cat input2
000134900017400027B-ABATPNB050062184TPNB050063880
000134900017400483B-ABATPNB050062184TPNB050063880
000134900017400178C-BCBTPNB050562934TPNB050446531
000134900017400178B-ABATPNB050062184TPNB050063880
000134900017400483C-ACATPNB050064199TPNB050064268
000134900017400178C-ACATPNB050064199TPNB050064268
Now I want a file that contains every line of input2, but with
characters 14-18 replaced by the matching field 1 (delimited by ~)
from input1, and no field 1 value should be used more than once.
I have used the following script, which displays:
swadmin@tb142:/rangedoms1/working/
CRST_OVERLAY_ENHANCE_Analysis_RNGCTRL_DEV> awk -F'~' 'NR==FNR{a[$2]=$1;next}{print substr($0,1,13),a[substr($0,14,5)],substr($0,19)}' input1 input2
0001349000174 P00000012 B-ABATPNB050062184TPNB050063880
0001349000174 P10000329 B-ABATPNB050062184TPNB050063880
0001349000174 P10000079 C-BCBTPNB050562934TPNB050446531
0001349000174 P10000079 B-ABATPNB050062184TPNB050063880
0001349000174 P10000329 C-ACATPNB050064199TPNB050064268
0001349000174 P10000079 C-ACATPNB050064199TPNB050064268

But P10000079 and P10000329 are repeated, which we don’t want.

The resulting file should look like the following:

0001349000174 P00000012B-ABATPNB050062184TPNB050063880
0001349000174 P00000329B-ABATPNB050062184TPNB050063880
0001349000174 P01000079C-BCBTPNB050562934TPNB050446531
0001349000174 P11000079B-ABATPNB050062184TPNB050063880
0001349000174 P60000329C-ACATPNB050064199TPNB050064268
0001349000174 P90000079C-ACATPNB050064199TPNB050064268
Many thanks in advance for your help.

Regards
Injam
  #2  
Old 07-02-2008, 10:52 AM
Dave B
Guest
 
Default Re: Please help me in writing script

Injam wrote:

> Hi ,
> I have the following two files.The characters 14-18 in file input2
> will match 2nd field deleimited by ~ in file input1.
>
> Now I want the file that contains every line in input2 but the
> characters 14-18 must be replaced with the matching field1 delimited
> by ~ in file input1 and it should not be repeated.
>[snip]
> resulting file should look like the following
>
> 0001349000174 P00000012B-ABATPNB050062184TPNB050063880
> 0001349000174 P00000329B-ABATPNB050062184TPNB050063880
> 0001349000174 P01000079C-BCBTPNB050562934TPNB050446531
> 0001349000174 P11000079B-ABATPNB050062184TPNB050063880
> 0001349000174 P60000329C-ACATPNB050064199TPNB050064268
> 0001349000174 P90000079C-ACATPNB050064199TPNB050064268


I assume that, for each key that appears in "input2", you want to use the
"next" available value for that key in "input1". Thus, it is implicitly
assumed that each key in "input2" will occur at most a number of times equal
to the number of times it occurs in "input1" (eg, 7 for 00483, 11 for 00178
or 1 for 00306), otherwise there will be no available values for the
replacement.

If I got that right, try this:

awk --posix -F'~' '
NR==FNR {a[$2]=a[$2] $1;next}
{k=substr($0,14,5);v=substr(a[k],1,9);sub(/^.{9}/,"",a[k]);
print substr($0,1,13) v substr($0,19)}' input1 input2

--
awk 'BEGIN{O="~"~"~";o="=="=="==";o+=+o;x=o""o;while(X ++<x-o-O)c=c"%c";
X=O""O;printf c,O+x*o*o+X,(X+x)*(O+o)-o,+X*X-o-O,o+x*o*o+X,x*o*o+X-o-o,
x*(o+o)+X-O,+X*X-X+o+o,x+x+x-o,o+X+O+o+x*o*o,x+O+x*o*o,x*o*o+x+O+o+o+O,
x+o+x*o*o,x+x*o*o+O,o+x+x*o*o,o+X*o*o,X+x*o*o,x*o* o+O+x,x+x*o*o-O,X-O}'
  #3  
Old 07-02-2008, 11:33 AM
Injam
Guest
 
Default Re: Please help me in writing script

On Jul 2, 7:52 pm, Dave B <da...@addr.invalid> wrote:
> If I got that right, try this:
>
> awk --posix -F'~' '
> NR==FNR {a[$2]=a[$2] $1;next}
> {k=substr($0,14,5);v=substr(a[k],1,9);sub(/^.{9}/,"",a[k]);
> print substr($0,1,13) v substr($0,19)}' input1 input2

Hi,
I get the following error when I try to run the code given
above:
swadmin@tb142:/rangedoms1/working/
CRST_OVERLAY_ENHANCE_Analysis_RNGCTRL_DEV> str($0,1,13) v
substr($0,19)}' input1 input2 <
awk: Not a recognized flag: -
Usage: awk [-F Character][-v Variable=Value][-f File|Commands]
[Variable=Value|File ...]
swadmin@tb142:/rangedoms1/working/
CRST_OVERLAY_ENHANCE_Analysis_RNGCTRL_DEV>

Many thanks
Injam
  #4  
Old 07-02-2008, 11:46 AM
Dave B
Guest
 
Default Re: Please help me in writing script

Injam wrote:

> Hi,
> I am facing the following issue when i tried to execute the above
> given code
> swadmin@tb142:/rangedoms1/working/
> CRST_OVERLAY_ENHANCE_Analysis_RNGCTRL_DEV> str($0,1,13) v
> substr($0,19)}' input1 input2 <
> awk: Not a recognized flag: -
> Usage: awk [-F Character][-v Variable=Value][-f File|Commands]
> [Variable=Value|File ...]
> swadmin@tb142:/rangedoms1/working/
> CRST_OVERLAY_ENHANCE_Analysis_RNGCTRL_DEV>


The above error message is a bit confusing.
First, make sure you copied/pasted correctly. Then, if you are not using GNU
awk, omit the "--posix" flag. What system are you using?
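
If the "--posix" flag is dropped, the interval expression /^.{9}/ may no longer be understood by some awks, so here is a rough sketch of the same idea that avoids it by trimming the stored string with substr() instead. The function name replace_keys and its two arguments are only placeholders for illustration, and it still assumes that every value in input1 is exactly 9 characters long.

```shell
#!/bin/sh
# Sketch only: same "queue in a string" idea, without --posix or /^.{9}/.
# replace_keys is a hypothetical helper; $1 = mapping file, $2 = data file.
replace_keys() {
    awk -F'~' '
    # First file: append each 9-character value to the string kept for its key.
    NR == FNR { a[$2] = a[$2] $1; next }
    # Second file: take the first unused value for the key, then drop it.
    {
        k = substr($0, 14, 5)
        v = substr(a[k], 1, 9)
        a[k] = substr(a[k], 10)
        print substr($0, 1, 13) v substr($0, 19)
    }' "$1" "$2"
}
```

It would be called as "replace_keys input1 input2". Like the original, it prints an empty replacement once a key has run out of values.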

  #5  
Old 07-02-2008, 03:04 PM
loki harfagr
Guest
 
Default Re: Please help me in writing script

On Wed, 02 Jul 2008 07:14:55 -0700, Injam wrote:

> Many thanks in advance for help being done.
>
> Regards
> Injam


yup, regards too ;-)

----------
$ gawk --re-interval '
NR==FNR{T[$2]=$1}
NR>FNR{
t=gensub(/^(.{13})(.....).*$/,"\\2","")
a=gensub(/^(.{13})(.....).*$/,"\\1","")
c=gensub(/^(.{13})(.....)(.*$)/,"\\3","")

for(i in T){
if(t==i){ print a" "T[i]c }
}
}
' FS="~" OFS="" input1 input2
----------

given the samples you gave in your posts it *should* be
what the Dr ordered, but...

if there is any problem, please give information about:
which awk was tested, which LANG/LC_ALL, and provide test files
with wanted and used I/O samples ;-)
  #6  
Old 07-02-2008, 11:40 PM
Injam
Guest
 
Default Re: Please help me in writing script

On Jul 3, 12:04 am, loki harfagr <l...@theDarkDesign.free.fr> wrote:
> ----------
> $ gawk --re-interval '
> NR==FNR{T[$2]=$1}
> NR>FNR{
>         t=gensub(/^(.{13})(.....).*$/,"\\2","")
>         a=gensub(/^(.{13})(.....).*$/,"\\1","")
>         c=gensub(/^(.{13})(.....)(.*$)/,"\\3","")
>
>         for(i in T){
>                 if(t==i){ print a" "T[i]c }
>         }}
>
> ' FS="~" OFS="" input1 input2
> ----------

Hi,
sorry for so many questions.
gawk is not installed on my server, so the gensub function is not
available either.
The resulting file should look like the following:

0001349000174 P00000012B-ABATPNB050062184TPNB050063880
0001349000174 P00000329B-ABATPNB050062184TPNB050063880
0001349000174 P01000079C-BCBTPNB050562934TPNB050446531
0001349000174 P11000079B-ABATPNB050062184TPNB050063880
0001349000174 P60000329C-ACATPNB050064199TPNB050064268
0001349000174 P90000079C-ACATPNB050064199TPNB050064268

Many thanks

Regards
Injam
  #7  
Old 07-03-2008, 04:19 AM
Loki Harfagr
Guest
 
Default Re: Please help me in writing script

Wed, 02 Jul 2008 20:40:50 -0700, Injam did cat:

> Hi ,
> sorry for so many questions....
> gawk is not installed in my server....and also gensub function is also
> not there in my server.....


OK, too bad you don't have gawk; 'gensub' is a gawk function
(corrected gawk solution at the end of this post).

If you can't install gawk, your best bet is to follow the hints that
Dave B. gave you. It would be interesting to know which awks
and versions you can use and what your OS is (hint: on Solaris,
beware of the old awk and use nawk and/or the extensions one).
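
One possible way to collect that information is a small shell sketch like the one below. It only checks for the usual names (awk, nawk, gawk, mawk) on PATH and prints the locale variables; the function name report_awk_env is made up for this example, and it does not try to print version numbers because the flag for that differs between awks.

```shell
#!/bin/sh
# Sketch only: list which awk implementations are on PATH, plus locale settings.
# report_awk_env is a hypothetical helper name.
report_awk_env() {
    for name in awk nawk gawk mawk; do
        # command -v prints the resolved path, or fails if the name is unknown.
        found=$(command -v "$name" 2>/dev/null) || found="(not found)"
        printf '%s: %s\n' "$name" "$found"
    done
    printf 'LANG=%s LC_ALL=%s\n' "${LANG:-}" "${LC_ALL:-}"
}
```

Running "report_awk_env" and posting its output alongside the OS name should be enough for others to tell which solution can work.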

> resulting file should look like the following
>
> 0001349000174 P00000012B-ABATPNB050062184TPNB050063880
> 0001349000174 P00000329B-ABATPNB050062184TPNB050063880
> 0001349000174 P01000079C-BCBTPNB050562934TPNB050446531
> 0001349000174 P11000079B-ABATPNB050062184TPNB050063880
> 0001349000174 P60000329C-ACATPNB050064199TPNB050064268
> 0001349000174 P90000079C-ACATPNB050064199TPNB050064268
>
> But P10000079, P10000329 are repeated which we don’t want.


OK, I misread your first post about the wanted output. Now,
just in case you can manage to install gawk (recommended ;-)
the gawk solution would be:
----------------
$ gawk --re-interval '
# first file: collect all values for a key, separated by spaces
NR==FNR{T[$2]=T[$2](T[$2]?" ":"")$1}
NR>FNR{
        t=gensub(/^(.{13})(.....)(.*$)/,"\\2","")
        a=gensub(/^(.{13})(.....)(.*$)/,"\\1","")
        c=gensub(/^(.{13})(.....)(.*$)/,"\\3","")

        for(i in T){
                if(t==i){
                        # g = first remaining value; then drop it from T[i]
                        g=T[i]
                        sub(/ .*$/,"",g)
                        sub(/^[^ ]* /,"",T[i])
                        print a" "g c }
        }
}
' FS="~" OFS="" input1 input2
----------------

which'd give what you want with the given test set.
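
For a server without gawk, roughly the same result can probably be had in plain awk with no gensub at all. The sketch below is not the script above but a different technique: it stores the values for each key in an indexed array and keeps a per-key counter, so no loop over T is needed. The function name replace_keys_indexed is made up for the example, and keeping the original 5-digit key when a key runs out of values is my own assumption, not something asked for in the thread.

```shell
#!/bin/sh
# Sketch only: per-key counters instead of gensub and string trimming.
# replace_keys_indexed is a hypothetical helper; $1 = mapping file, $2 = data file.
replace_keys_indexed() {
    awk -F'~' '
    # First file: store the values for each key as val[key, 1], val[key, 2], ...
    NR == FNR { val[$2, ++count[$2]] = $1; next }
    # Second file: use the next unused value for the key found at columns 14-18.
    {
        k = substr($0, 14, 5)
        n = ++used[k]
        # Assumption: when a key has no unused value left, keep the original key.
        v = ((k, n) in val) ? val[k, n] : k
        print substr($0, 1, 13) " " v substr($0, 19)
    }' "$1" "$2"
}
```

Called as "replace_keys_indexed input1 input2", it prints a space after the first 13 characters, as in the wanted output quoted earlier.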

  #8  
Old 07-03-2008, 07:10 AM
Injam
Guest
 
Default Re: Please help me in writing script

On Jul 3, 1:19 pm, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID> wrote:

>
> *OK, I misread your first post about the wanted output, now
> just in case you can manage to install gawk (recommended ;-)
> the gawk solution would be :
> ----------------
> $ gawk --re-interval '
> NR==FNR{T[$2]=T[$2](T[$2]?" ":"")$1}
> NR>FNR{
> * * * * t=gensub(/^(.{13})(.....)(.*$)/,"\\2","")
> * * * * a=gensub(/^(.{13})(.....)(.*$)/,"\\1","")
> * * * * c=gensub(/^(.{13})(.....)(.*$)/,"\\3","")
>
> * * * * for(i in T){
> * * * * * *if(t==i){
> * * * * * * * * g=T[i]
> * * * * * * * * sub(/ .*$/,"",g)
> * * * * * * * * sub(/^[^ ]* /,"",T[i])
> * * * * * * * * print a" "g c }
> * * * * }}
>
> ' FS="~" OFS=""
> ----------------
>
> *which'd give what you want with the given test set.- Hide quoted text -
>
> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -


Thanks for the reply...

We tried and it is not possible to install gawk as it requires
approval from so many people.

If you can, please give me either an awk script or a shell script.

Many thanks
Injam
  #9  
Old 07-03-2008, 10:35 AM
Loki Harfagr
Guest
 
Default Re: Please help me in writing script


Didn't you try, as I said, the hints given by Dave B.?
I suppose this would work with plain awk (not the b0rken old Solaris awk)
and with nawk:
-------------
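$ awk -F'~' '
# first file (input1): queue every code under its 5-digit key
NR==FNR{
        a[$2]=a[$2] $1
        next
}
# second file (input2): use up the next unused code for this key
{       k=substr($0,14,5)
        v=substr(a[k],1,9)
        sub(/^........./,"",a[k])
        print substr($0,1,13)" "v substr($0,19)
}
' input1 input2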
-------------

That's Dave B.'s solution with only two small adaptations:
1. get rid of the --posix switch, as you're not using gawk
2. replace the interval expression {9} with nine plain dots (.........)
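
To try the adapted script without the real files, something like the following should do. It is only a sketch: the function name remap, the temporary directory and the shortened input1 are made up for the demo, while the data lines themselves are taken from the first post.

```shell
#!/bin/sh
# Sketch only: remap is a hypothetical wrapper name, and the files below are
# a reduced copy of the sample data from the first post.
remap() {
    awk -F'~' '
    NR==FNR { a[$2] = a[$2] $1; next }   # queue each 9-char code under its key
    {
        k = substr($0, 14, 5)            # lookup key: characters 14-18
        v = substr(a[k], 1, 9)           # first code not used yet
        sub(/^........./, "", a[k])      # drop it so it is not repeated
        print substr($0, 1, 13) " " v substr($0, 19)
    }' "$1" "$2"
}

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/input1" <<'EOF'
P00000012~00027
P00000329~00483
P60000329~00483
P01000079~00178
P11000079~00178
P90000079~00178
EOF

cat > "$tmp/input2" <<'EOF'
000134900017400027B-ABATPNB050062184TPNB050063880
000134900017400483B-ABATPNB050062184TPNB050063880
000134900017400178C-BCBTPNB050562934TPNB050446531
000134900017400178B-ABATPNB050062184TPNB050063880
000134900017400483C-ACATPNB050064199TPNB050064268
000134900017400178C-ACATPNB050064199TPNB050064268
EOF

remap "$tmp/input1" "$tmp/input2"
```

With this reduced input1 the six output lines carry P00000012, P00000329, P01000079, P11000079, P60000329 and P90000079, in that order, which is the wanted result from the first post. Note that once the codes for a key are used up, any further line with that key gets an empty code.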

And... I really wonder which awk you have, since you don't want to say;
it has been asked at least three times ;D)
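
Since the awk on that server is still unknown, here is a variant sketch that drops the regex altogether and removes the used code with substr() only, so neither interval expressions nor dot counting matter. It assumes every code is exactly 9 characters wide, as in the samples. The function name remap_substr and the two-line test files are invented for illustration.

```shell
#!/bin/sh
# Sketch: same queue idea, but the used code is cut off with substr()
# instead of sub(). remap_substr is a hypothetical name.
remap_substr() {
    awk -F'~' '
    NR==FNR { a[$2] = a[$2] $1; next }   # append each code to the queue of its key
    {
        k = substr($0, 14, 5)            # key taken from characters 14-18
        v = substr(a[k], 1, 9)           # head of the queue (9 chars assumed)
        a[k] = substr(a[k], 10)          # keep only what follows the head
        print substr($0, 1, 13) " " v substr($0, 19)
    }' "$1" "$2"
}

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Two codes share the key 00483, so two lines with that key get one each.
printf '%s\n' 'P00000329~00483' 'P60000329~00483' > "$tmp/input1"
printf '%s\n' '000134900017400483B-ABATPNB050062184TPNB050063880' \
              '000134900017400483C-ACATPNB050064199TPNB050064268' > "$tmp/input2"

remap_substr "$tmp/input1" "$tmp/input2"
```

The first output line should carry P00000329 and the second P60000329. Whether this is any more portable than the dotted regex depends on the awk in question, so treat it as an alternative to test, not a guarantee.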
  #10  
Old 07-08-2008, 06:51 AM
Injam
Guest
 
Default Re: Please help me in writing script


Thank you... It's working fine.


Many thanks
Injam