StreamReader and NULL characters : DOTNET
#1
I am trying to read a file that was produced on a mainframe; it is the text output of a mainframe report. I am able to read in part of the file. The problem is that between each "page" of the report there is a NULL character. The StreamReader reads up until it sees this character and then stops, so most of the file does not get read.

Here is the byte pattern at the end of each page: 0D 0A 00 0C 0D 0A

Is there an easy way to read in this file without having to put it into a byte array and futzing with all of that?

I have tried the following:

    StreamReader re = File.OpenText(SourceFile);
    string sTemp = re.ReadToEnd();

And:

    StreamReader re = File.OpenText(SourceFile);
    while ((sTemp = re.ReadLine()) != null)
    {
        sLine = sTemp;
        //etc...
    }

Thanks,
Brian

#2
NULL chars create a problem. One way to tackle this is to use command-line tools to change the NULL char into another char, or even to split the file into pages. This works well if the application in question loads the data once; it is not a good solution for an application that reads the file(s) every time someone uses a specific function, however.

You might be able to use a FileStream and pull the data in as binary. If nothing else, you can look for the byte pattern 13-10-0-12-13-10 and replace it with something like two CRLFs (13-10-13-10). The cleaned file can then be read through the StreamReader.

You can also keep everything binary, if you feel comfortable in that world. :-)

--
Gregory A. Beamer

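A minimal sketch of the byte-level clean-up suggested in this reply, assuming the file is small enough to hold in memory. The class name `PageBreakCleaner`, its methods, and the command-line arguments are hypothetical and only illustrate the idea; they are not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class PageBreakCleaner
{
    // Page separator reported in the thread: CR LF NUL FF CR LF.
    static readonly byte[] Separator = { 0x0D, 0x0A, 0x00, 0x0C, 0x0D, 0x0A };

    // Replacement suggested in the reply: two CRLF pairs.
    static readonly byte[] TwoCrLf = { 0x0D, 0x0A, 0x0D, 0x0A };

    // Returns a copy of data with every occurrence of pattern replaced.
    public static byte[] ReplacePattern(byte[] data, byte[] pattern, byte[] replacement)
    {
        if (pattern.Length == 0)
        {
            throw new ArgumentException("pattern must not be empty", "pattern");
        }

        var output = new List<byte>(data.Length);
        int i = 0;
        while (i < data.Length)
        {
            if (MatchesAt(data, i, pattern))
            {
                output.AddRange(replacement);
                i += pattern.Length;
            }
            else
            {
                output.Add(data[i]);
                i++;
            }
        }
        return output.ToArray();
    }

    // True when pattern occurs in data starting at offset.
    static bool MatchesAt(byte[] data, int offset, byte[] pattern)
    {
        if (offset + pattern.Length > data.Length)
        {
            return false;
        }
        for (int j = 0; j < pattern.Length; j++)
        {
            if (data[offset + j] != pattern[j])
            {
                return false;
            }
        }
        return true;
    }

    // Applies the thread's specific replacement to a whole buffer.
    public static byte[] Clean(byte[] data)
    {
        return ReplacePattern(data, Separator, TwoCrLf);
    }

    static void Main(string[] args)
    {
        // args[0] = source file, args[1] = cleaned output file (both hypothetical).
        if (args.Length < 2)
        {
            Console.WriteLine("usage: PageBreakCleaner <source> <destination>");
            return;
        }
        File.WriteAllBytes(args[1], Clean(File.ReadAllBytes(args[0])));
    }
}
```

The cleaned file can then be opened with File.OpenText as in the original question. For very large reports a streaming version would be preferable to ReadAllBytes.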
#3
From my mainframe days I can see that what you have there is Ebcdic, 0D = CR, 0A = LF, 0C = form feed. I'm wondering if this means that you could use the StreamReader ctor that takes an Encoding for an 8 bit Ebcdic and it would just work. Anyway, knowing that this is Ebcdic CR LF FF might help.

--
Phil Wilson [MVP Windows Installer]

#4
Those are ASCII control codes, copied into ANSI and ISO code pages.

In EBCDIC, 0x0D = CR but no one used it, 0x25 = LF but no one used it, 0x15 = NL (newline) and it was occasionally used, and 0x0C = FF and I don't know if anyone used it.

So the original poster receives data that have already been converted from EBCDIC to ASCII, but that's not the problem. The problem is that StreamReader chokes when it hits a 0x00 (NUL in both ASCII and EBCDIC).

#5
Norman Diamond wrote:
> So the original poster receives data that have already been converted from
> EBCDIC to ASCII, but that's not the problem. The problem is that
> StreamReader chokes when it hits a 0x00 (NUL in both ASCII and EBCDIC).

StreamReader doesn't choke on null characters. Here's an example:

    using System;
    using System.IO;
    using System.Text;

    class Test
    {
        static void Main(string[] args)
        {
            // A NUL B NUL C
            byte[] data = { 65, 0, 66, 0, 67 };

            using (MemoryStream stream = new MemoryStream(data))
            using (StreamReader reader = new StreamReader(stream, Encoding.ASCII))
            {
                string line = reader.ReadLine();
                for (int i = 0; i < line.Length; i++)
                {
                    Console.WriteLine("{0}: {1}", i,
                                      line[i] == '\0' ? "NUL" : line[i].ToString());
                }
            }
        }
    }

--
Jon Skeet

#6
He was choking on the NULL char due to the way he was looping:

    while (null != (line = reader.ReadLine()))
    {
    }

As soon as he hits a null char, he gets an end of read. I use this type of loop as well, as it is rather simple, but it will choke on (char) 0.

A method I employed at one time for EBCDIC is keeping everything binary until it needs to be text. It is much more efficient to stay in the binary world from a perf perspective, but also harder to program, as we do not think in 0s and 1s. :-)

--
Gregory A. Beamer

#7
Cowboy (Gregory A. Beamer) wrote:
> He was choking on the NULL char due to the way he was looping:
>
>     while (null != (line = reader.ReadLine()))
>     {
>     }
>
> As soon as he hits a null char, he gets an end of read.

He shouldn't do - my example shows two null characters being embedded in a line read with a single call to ReadLine.

> I use this type of loop as well, as it is rather simple, but it will
> choke on (char) 0.

Could you give an example of this, bearing in mind my earlier example? Here's another example using the OP's sample data. Again, it doesn't show StreamReader failing with null characters:

    using System;
    using System.IO;
    using System.Text;

    class Test
    {
        static void Main(string[] args)
        {
            // A B CR LF NUL FF CR LF C D
            byte[] data = { 65, 66, 0x0d, 0x0a, 0,
                            0x0c, 0x0d, 0x0a, 67, 68 };

            using (MemoryStream stream = new MemoryStream(data))
            using (StreamReader reader = new StreamReader(stream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    Console.WriteLine("Next line:");
                    for (int i = 0; i < line.Length; i++)
                    {
                        Console.WriteLine("{0}: {1}", i,
                                          line[i] == '\0' ? "NUL" : line[i].ToString());
                    }
                }
            }
        }
    }

> A method I employed at one time for EBCDIC is keeping everything binary
> until it needs to be text. It is much more efficient to stay in the binary
> world from a perf perspective, but also harder to program, as we do not
> think in 0s and 1s. :-)

It also means that *you* need to worry about all the nasty issues of text handling rather than getting the framework to do it.

--
Jon Skeet

#8
Thanks Jon!

After seeing your example I created a much smaller text file with the offending characters and was able to better debug the issue. The issue is not with the StreamReader, but with displaying the results in a text box. A simple Replace removed the offending characters and everything is displaying properly.

Again, thanks for the help.

Brian

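The post does not show the Replace call itself, so the following is only a sketch of what such a fix might look like. It assumes the offending characters are NUL and form feed (the two non-line-break bytes in the page separator); the class and method names are hypothetical. The likely reason a text box cuts the text short is that the underlying Windows control treats NUL as a string terminator, but the thread does not confirm that detail.

```csharp
using System;
using System.IO;

static class DisplayText
{
    // Strips characters that a text box cannot show sensibly.
    // Assumption: only NUL ('\0') and form feed ('\f') need removing.
    public static string ForDisplay(string text)
    {
        return text.Replace("\0", string.Empty)
                   .Replace("\f", string.Empty);
    }

    static void Main(string[] args)
    {
        // Same sample bytes as in the thread: A B CR LF NUL FF CR LF C D.
        byte[] data = { 65, 66, 0x0d, 0x0a, 0, 0x0c, 0x0d, 0x0a, 67, 68 };

        using (MemoryStream stream = new MemoryStream(data))
        using (StreamReader reader = new StreamReader(stream))
        {
            // StreamReader returns the whole content, NUL included.
            string raw = reader.ReadToEnd();
            string cleaned = ForDisplay(raw);

            Console.WriteLine("Raw length: {0}", raw.Length);
            Console.WriteLine("Cleaned length: {0}", cleaned.Length);
        }
    }
}
```

In a Windows Forms application the cleaned string would be assigned to the text box's Text property instead of being written to the console.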
#9
Norman Diamond wrote:
> In EBCDIC, 0x0D = CR but no one used it, 0x25 = LF but no one used it,
> 0x15 = NL (newline) and it was occasionally used, and 0x0C = FF and I
> don't know if anyone used it.

What do you mean by "nobody used it"? I used to see it all the time on printers, some TTY green screens etc. It was (is?) very common in the 8-bit EBCDIC mainframe world.

--
Phil Wilson [MVP Windows Installer]

#10
OK, I never saw the 0x0D 0x25 sequence used in EBCDIC in the TTY green screen world; what I saw there was the single-byte 0x15 newline character. If the TTY firmware (or hardware) used ASCII then the driver would have to translate from EBCDIC to ASCII, but I didn't see anyone do that translation for TTYs at the application level.