awk chokes on nul characters in text files

#1 z.entropic (07-30-2008, 12:12 PM)

Due to an unidentified file format translation problem, many of my
data files contain nonprintable nul characters, and such files cannot
be processed by awk without re-saving the files (from vi, for
example). What I see when reading such files into vi is the info on
the bottom line:

"temp/file0001" 277 lines, 33937 characters (23 null, 0 non-Ascii)

and the offending line is displayed like this:

^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?234 0
1 597569.2300

Is there a simple way to tell awk to disregard such characters when
reading and processing such files?

I realize that text files containing nul characters are not true text
files, but...

z.entropic
#2 Grant (07-30-2008, 02:39 PM)

On Wed, 30 Jul 2008 09:12:47 -0700 (PDT), "z.entropic" <subPlanck@excite.com> wrote:

>Due to an unidentified file format translation problem, many of my
>data files contain nonprintable nul characters, and such files cannot
>be processed by awk without re-saving the files (from vi, for
>example). What I see when reading such files into vi is the info on
>the bottom line:
>
>"temp/file0001" 277 lines, 33937 characters (23 null, 0 non-Ascii)
>
>and the offending line is displayed like this:
>
>^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?2 34 0
>1 597569.2300
>
>Is there a simple way to tell awk to disregard such characters when
>reading and processing such files?


awk doesn't like them. Run the files through tr?
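
A minimal sketch of the tr approach, assuming a POSIX shell and standard tr, awk, and wc. The sample file and its contents are made up for illustration; they only mimic the damaged line described above.

```shell
# Hypothetical sample: one data line preceded by three NUL bytes.
sample=$(mktemp)
printf '\000\000\000234 0 1 597569.2300\n' > "$sample"

# Count the NUL bytes first (tr -cd deletes everything except NUL).
nul_before=$(tr -cd '\000' < "$sample" | wc -c)

# Delete the NULs on the fly and hand the cleaned stream to awk.
last_field=$(tr -d '\000' < "$sample" | awk '{ print $NF }')

echo "NUL bytes found: $((nul_before))"
echo "last field: $last_field"

rm -f "$sample"
```

Because tr works as a filter, the original files stay untouched. If the stray bytes turn out to be DEL (which vi normally displays as ^?) rather than NUL, the octal code to delete would be \177 instead of \000.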

Grant.
--
http://bugsplatter.mine.nu/
#3 Ted Davis (07-30-2008, 04:05 PM)

On Wed, 30 Jul 2008 09:12:47 -0700, z.entropic wrote:

> Due to an unidentified file format translation problem, many of my data
> files contain nonprintable nul characters, and such files cannot be
> processed by awk without re-saving the files (from vi, for example). What
> I see when reading such files into vi is the info on the bottom line:
>
> "temp/file0001" 277 lines, 33937 characters (23 null, 0 non-Ascii)
>
> and the offending line is displayed like this:
>
> ^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?^?234 0 1
> 597569.2300
>
> Is there a simple way to tell awk to disregard such characters when
> reading and processing such files?
>
> I realize that text files containing nul characters are not true text
> files, but...


gawk doesn't seem to be bothered by them - it just passes them.

The test program was "{print $0}".

Without knowing what you were doing, I can only guess it's
version-specific. I tested 3.1.3 and 3.1.6 for Windows.
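
Whether a particular awk passes nul bytes can be probed directly. The following is only a sketch, not the exact test used above: the AWK variable and the three-byte sample input are assumptions for illustration, and the result will depend on which awk is installed.

```shell
# Hypothetical probe: send one line containing a single NUL byte through
# the awk under test and count how many NUL bytes survive.
AWK=${AWK:-awk}
nul_out=$(printf 'a\000b\n' | "$AWK" '{ print $0 }' | tr -cd '\000' | wc -c)

if [ "$((nul_out))" -eq 1 ]; then
    verdict="passes NUL bytes through"
else
    verdict="drops or truncates at NUL bytes"
fi
echo "$AWK $verdict"
```

Setting AWK=gawk (or any other implementation on the PATH) before running the probe compares versions without editing the script.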

--
T.E.D. (tdavis@mst.edu)


#4 z.entropic (07-31-2008, 12:13 PM)

On Jul 30, 4:05 pm, Ted Davis <tda...@mst.edu> wrote:
> gawk doesn't seem to be bothered by them - it just passes them.
>
> test program was "{print $0}"
>
> Without knowing what you were doing, I can only guess it's version
> specific. I tested 3.1.3 and 3.1.6 for Windows.


I've been looking around for a precompiled, latest version of gawk for
DOS/Windows, but somehow couldn't find it... Any suggestions/links?
I do not have a compiler installed.

z.entropic
#5 pop (07-31-2008, 01:47 PM)

z.entropic said the following on 7/31/2008 11:13 AM:
> On Jul 30, 4:05 pm, Ted Davis <tda...@mst.edu> wrote:
>> gawk doesn't seem to be bothered by them - it just passes them.
>>
>> test program was "{print $0}"
>>
>> Without knowing what you were doing, I can only guess it's version
>> specific. I tested 3.1.3 and 3.1.6 for Windows.

>
> I've been looking around for a precompiled, latest version of gawk for
> DOS/Windows, but somehow couldn't find it... Any suggestions/links?
> I do not have a compiler installed.
>
> z.entropic

Latest source and compiled binary:
http://sourceforge.net/project/showf...ckage_id=16431
or use this tiny URL:
http://tinyurl.com/6882x2


--
(^\pop/^)
I'm lost... I've gone to look for myself.
If I should return before I get back, keep me here.
--
#6 Ted Davis (07-31-2008, 04:24 PM)

On Thu, 31 Jul 2008 09:13:13 -0700, z.entropic wrote:

>
> I've been looking around for a precompiled, latest version of gawk for
> DOS/Windows, but somehow couldn't find it... Any suggestions/links? I do
> not have a compiler installed.


The standard place for all things GNU for Windows is
<http://gnuwin32.sourceforge.net/packages.html> (well, most things).

The gawk specific link is
<http://gnuwin32.sourceforge.net/packages/gawk.htm>

--
T.E.D. (tdavis@mst.edu)

