Read binary data file - Java
-
Re: Read binary data file
Charles wrote:
> Let's review what the OP stated
>
> A struct is given in C++
>
> Data needs to be read from a file in Java.
>
> You have the following data types
>
> unsigned long
> unsigned short
>
> As previously stated by other posters the Endianness of the operating
> system should affect how the output file is encoded. I assume this to
> be true but have not verified it to be true.
>
> We assume all unsigned longs and unsigned short will ALWAYS have the
> same bytesize.
>
> The complete struct is given as
>
> unsigned long data1;
> unsigned short data2;
> unsigned short data3;
> unsigned long data4;
>
> Can we also assume that the data will always be sequenced as described
> in the STRUCT?
> I don't see any argument why the data will be out of sequence as
> defined in the STRUCT.
But we do not know the padding, and the OP doesn't know what those sizes are,
nor the endianness of their files. They don't even know in what format the
floating-point values are stored: IEEE? We need all that information to craft
a Java equivalent, and we don't have it. The OP doesn't have it, by their
account.
> Does the input file get modified when it is transported from one
> operating system to another?
> I assume NO. This is not verified.
But if endianness and padding matter, the fact that it is not modified will
make it unreadable on the second system.
> Are there equivalents of unsigned long and unsigned short in Java?
No.
> Are they the same byte size?
We do not know. The OP hasn't given us enough information.
> Do they encode the data the same?
We do not know. The OP hasn't given us enough information.
> Try to read in Java and verify with known data. If you don't know any
> of the data values this becomes a harder task.
It's already impossible based on the information given. How much harder can
it get?
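
For illustration only: if the missing facts were pinned down, the read itself is short. The sketch below assumes a 4-byte unsigned long, a 2-byte unsigned short, no padding between fields, and a byte order passed in by the caller. None of that has been confirmed by the OP, and the class name RecordReader is made up for this example. Since Java has no unsigned types, each field is widened to the next larger signed type.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical reader; the layout below is assumed, not known from the thread. */
final class RecordReader {
    // Assumed layout: 4 + 2 + 2 + 4 bytes, no padding.
    static final int RECORD_SIZE = 12;

    /** Decodes one record, widening each field so it holds the unsigned value. */
    static long[] decode(byte[] raw, ByteOrder order) {
        if (raw.length < RECORD_SIZE) {
            throw new IllegalArgumentException("need " + RECORD_SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order);
        long data1 = Integer.toUnsignedLong(buf.getInt());  // unsigned 32-bit -> long
        int data2 = Short.toUnsignedInt(buf.getShort());    // unsigned 16-bit -> int
        int data3 = Short.toUnsignedInt(buf.getShort());
        long data4 = Integer.toUnsignedLong(buf.getInt());
        return new long[] { data1, data2, data3, data4 };
    }

    public static void main(String[] args) {
        byte[] raw = {
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, // data1
            0x01, 0x00,                                         // data2
            (byte) 0xFE, (byte) 0xFF,                           // data3
            0x2A, 0x00, 0x00, 0x00                              // data4
        };
        long[] f = decode(raw, ByteOrder.LITTLE_ENDIAN);
        // prints "4294967295 1 65534 42"
        System.out.println(f[0] + " " + f[1] + " " + f[2] + " " + f[3]);
    }
}
```

Decoding the same bytes as BIG_ENDIAN gives different numbers, which is why a record with known values is needed before any of these guesses can be checked.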
--
Lew
-
Re: Read binary data file
Lew wrote:
>
> It's already impossible based on the information given. How much harder
> can it get?
>
If the OP *MUST* move binary data, at least do it in a platform and
language-independent manner and use ASN.1 encoding.
--
martin@ | Martin Gregorie
gregorie. | Es**** UK
org |
-
Re: Read binary data file
I'm not sure if this is the same issue, but I'm trying to interpret
numeric values out of a chunk of data as follows:
int   toBinary   theValue
124   1111100    3.8
63    111111     4
224   11100000   4.8
63    111111     4
63    111111     4
224   11100000   4.8
64    1000000    3.2
63    111111     4
244   11110100   5
124   1111100    3.8
I can read "int" out of my blob of data, and I ran toBinaryString on
it just to visualize it. I manually typed "theValue" (that is what I
KNOW the test data is). Can someone help me figure out what code to
run in order to get "theValue"?
--Dale--
-
Re: Read binary data file
Martin Gregorie <martin@see.sig.for.address> wrote:
> If the OP *MUST* move binary data, at least do it in a platform and
> language-independent manner and use ASN.1 encoding.
I understand Hunter's comments, and while I don't know much about
ASN.1 encoding, what I am pointing out is that binary files are usually
*not* intended to be used across systems. Every binary data file I have
ever worked with was intended to be used either by the program that wrote
it, or separate applications that used the same utility libraries as the
application which wrote the data. There is nothing wrong with simply writing
the C structure to a file, and reading it in the same way. In this case
the code, and not some specification, drives the format of the data - and there
is *nothing* wrong with this. The lack of a need to share the data outside of
the application is what often drives the decision to use binary data in the
first place (why not take advantage of the efficiency binary files have to
offer).
Of course, every once in a while an outside user decides they want to use this
data. Well, then they have a choice. Either generate it themselves, or
spend a few hours writing something that can read it in - not a big price
to pay.
- Kurt
-
Re: Read binary data file
On Fri, 31 Aug 2007 09:15:55 -0700, "DRS.Usenet@sengsational.com"
<DRS.Usenet@sengsational.com> wrote, quoted or indirectly quoted
someone who said :
>int   toBinary   theValue
>124   1111100    3.8
>63    111111     4
>224   11100000   4.8
>63    111111     4
>63    111111     4
>224   11100000   4.8
>64    1000000    3.2
>63    111111     4
>244   11110100   5
>124   1111100    3.8
>
>I can read "int" out of my blob of data, and I ran toBinaryString on
>it just to visualize it. I manually typed "theValue" (that is what I
>KNOW the test data is). Can someone help me figure out what code to
>run in order to get "theValue"?
If you get enough samples you can create a
private static final double[] translate = new double[256];
to do the translation for you.
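
A minimal sketch of such a table, filled only with the five distinct codes from the samples posted so far. The class name SampleTable and the use of NaN for codes that have no sample yet are choices made for this example, not part of any known format.

```java
import java.util.Arrays;

/** Hypothetical lookup table built from the ten samples posted in the thread. */
final class SampleTable {
    private static final double[] translate = new double[256];

    static {
        Arrays.fill(translate, Double.NaN); // NaN marks codes with no sample yet
        translate[63] = 4.0;
        translate[64] = 3.2;
        translate[124] = 3.8;
        translate[224] = 4.8;
        translate[244] = 5.0;
    }

    /** Returns the known value for a code in 0..255, or NaN if none was sampled. */
    static double lookup(int code) {
        return translate[code & 0xFF];
    }

    public static void main(String[] args) {
        System.out.println(lookup(124)); // prints 3.8
        System.out.println(lookup(0));   // prints NaN
    }
}
```

This only reproduces what has already been observed; it says nothing about the 251 codes not yet seen.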
In what context did you see this code? It looks like it might be some
sort of sound encoding technique. You can read up the specs on the
encoding.
see http://mindprod.com/jgloss/sound.html to help get you started.
It might also be some sort of Huffman encoding. See
http://mindprod.com/jgloss/huffman.html
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
-
Re: Read binary data file
~kurt wrote:
> I understand Hunter's comments, and while I don't know much about
> ASN.1 encoding, what I am pointing out is that binary files are usually
> *not* intended to be used across systems.
Except for all the ones that are, e.g. protocol dumps; databases;
interpretive pseudo-code (e.g. .class files), ...
> Every binary data file I have
> ever worked with was intended to be used either by the program that wrote
> it, or separate applications that used the same utility libraries as the
> application which wrote the data.
Except for the ones that aren't: e.g. protocol dumps; databases;
interpretive pseudo-code (e.g. .class files), ...
> There is nothing wrong with simply writing
> the C structure to a file, and reading it in the same way. In this case
> the code, and not some specification, drives the format of the data - and there
> is *nothing* wrong with this.
There is plenty wrong with this. The format of binary data written
directly from a struct in memory depends on at least the following:
- the host hardware
- the compiler
- the compiler version
- the surrounding #pragmas
- the compiler options that were in effect when the binary that wrote
the file was compiled
This is too many dependencies, on too many things that can't be controlled.
The only time writing a struct from memory to a file or a network can
sanely be justified is when the target application is constructed with
the same version of the same object file that wrote it. And this is not
a guarantee that in general can be met.
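
To make the dependency concrete, here is a rough sketch that computes field offsets for the struct from this thread under one assumed rule: each field is aligned to its own size and the struct is padded to its largest alignment. That rule is common but is not guaranteed, and packing pragmas change it. The class LayoutSketch is hypothetical. The two calls differ only in the size of unsigned long (4 bytes versus 8 bytes).

```java
import java.util.Arrays;

/** Illustrative only: offsets of the thread's struct under natural alignment. */
final class LayoutSketch {
    /** Returns each field's offset followed by the total struct size. */
    static int[] layout(int... sizes) {
        int[] out = new int[sizes.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int i = 0; i < sizes.length; i++) {
            int align = sizes[i];            // assumption: alignment equals field size
            offset = roundUp(offset, align); // insert padding before the field if needed
            out[i] = offset;
            offset += sizes[i];
            maxAlign = Math.max(maxAlign, align);
        }
        out[sizes.length] = roundUp(offset, maxAlign); // tail padding
        return out;
    }

    static int roundUp(int value, int align) {
        return (value + align - 1) / align * align;
    }

    public static void main(String[] args) {
        // unsigned long = 4 bytes, unsigned short = 2 bytes
        System.out.println(Arrays.toString(layout(4, 2, 2, 4))); // prints [0, 4, 6, 8, 12]
        // unsigned long = 8 bytes, unsigned short = 2 bytes
        System.out.println(Arrays.toString(layout(8, 2, 2, 8))); // prints [0, 8, 10, 16, 24]
    }
}
```

Same source, two file formats: 12 bytes per record in one build, 24 in the other, with 4 bytes of padding in the middle of the second.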
-
Re: Read binary data file
~kurt wrote:
>
> I understand Hunter's comments, and while I don't know much about
> ASN.1 encoding, what I am pointing out is that binary files are usually
> *not* intended to be used across systems.
>
I think its use is quite industry-dependent: I've never seen it used in
financial messaging (that's more likely to use SWIFT formats, which are
tagged text) but it's common in the telecommunications industry.
Telcos (both fixed line and mobile) use a lot of binary data for control
and accounting purposes, mainly because this minimizes message size and
there's a LOT of stuff flying around controlling the network in real
time and accounting for its use. Switches from large vendors, e.g.
Ericsson, tend to use proprietary, flat message formats but if the data
will be exchanged between different types of kit (e.g. roaming billing
data) they tend to use ASN.1: CCITT likes it.
ASN.1 has a lot in common with XML in that it's a tagged field protocol,
allows nesting, and uses a tag dictionary to associate meanings with
tags. Compared with XML it's a LOT more compact (tags are one byte, fixed
length fields don't have terminators, variable length fields are
preceded by a one or two byte length) and it has a number of predefined
field types as well as arrays. If you have the dictionary it's easy to
interpret on the fly though, like XML, you can also use the dictionary
to generate code to encode and decode ASN.1 records.
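
The tag/length/value idea can be sketched in a few lines. This is a deliberately simplified stand-in, not an implementation of any real ASN.1 encoding rules: it assumes a one-byte tag and a one-byte length (values up to 127 bytes), and the tag numbers and the class name TlvSketch are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified tag-length-value records; NOT a real ASN.1 codec. */
final class TlvSketch {
    /** Appends one field: 1-byte tag, 1-byte length, then the value bytes. */
    static void put(ByteArrayOutputStream out, int tag, byte[] value) {
        if (value.length > 127) {
            throw new IllegalArgumentException("sketch handles short values only");
        }
        out.write(tag);
        out.write(value.length);
        out.write(value, 0, value.length);
    }

    /** Walks the buffer field by field; no schema is needed to find field boundaries. */
    static Map<Integer, byte[]> parse(byte[] data) {
        Map<Integer, byte[]> fields = new LinkedHashMap<>();
        int pos = 0;
        while (pos < data.length) {
            if (pos + 2 > data.length) {
                throw new IllegalArgumentException("truncated header");
            }
            int tag = data[pos++] & 0xFF;
            int len = data[pos++] & 0xFF;
            if (pos + len > data.length) {
                throw new IllegalArgumentException("truncated value");
            }
            fields.put(tag, Arrays.copyOfRange(data, pos, pos + len));
            pos += len;
        }
        return fields;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        put(out, 1, "abc".getBytes(StandardCharsets.US_ASCII)); // hypothetical tag 1: a name
        put(out, 2, new byte[] { 42 });                         // hypothetical tag 2: a count
        Map<Integer, byte[]> fields = parse(out.toByteArray());
        System.out.println(new String(fields.get(1), StandardCharsets.US_ASCII)); // prints abc
        System.out.println(fields.get(2)[0]);                                     // prints 42
    }
}
```

Because every field carries its own length, a reader can skip tags it does not recognise, which a raw struct dump cannot offer.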
> Every binary data file I have
> ever worked with was intended to be used either by the program that wrote
> it, or separate applications that used the same utility libraries as the
> application which wrote the data.
>
There's also a lot of binary data in large commercial systems. Formerly
it was in large serial files, then flat indexed files, now it's probably
in a database. A really good reason for using an RDBMS is that it not
only hides implementation details (like endian conventions) from the
application, but the interfaces (SQL, JDBC, ODBC, etc) typically provide
field conversion facilities.
> There is nothing wrong with simply writing
> the C structure to a file, and reading it in the same way.
>
I'd probably use a CSV format any place where a database would be
obvious overkill, but ymmv.
Using CSV rather than binary makes debugging easier and (said with his
*NIX hat on) it allows the data to be handled by common scripted
utilities like awk, perl and even shell scripts. Oh yeah, Java too :-)
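
As a rough sketch of the CSV alternative, the four fields of the struct discussed in this thread could be written one record per line and read back with nothing more than split and parseLong. The class name CsvRecord and the field order are assumptions made for the example.

```java
/** Hypothetical text form of the four-field record from this thread. */
final class CsvRecord {
    /** Formats one record as a single comma-separated line. */
    static String toCsv(long data1, int data2, int data3, long data4) {
        return data1 + "," + data2 + "," + data3 + "," + data4;
    }

    /** Parses a line produced by toCsv back into its four numeric fields. */
    static long[] fromCsv(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 fields, got " + parts.length);
        }
        long[] out = new long[4];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Long.parseLong(parts[i].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = toCsv(4294967295L, 1, 65534, 42L);
        System.out.println(line); // prints 4294967295,1,65534,42
        System.out.println(fromCsv(line)[0]); // prints 4294967295
    }
}
```

No endianness, padding, or type-size questions arise, at the cost of a larger file and a parse step.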
--
martin@ | Martin Gregorie
gregorie. | Es**** UK
org |
-
Re: Read binary data file
Esmond Pitt wrote:
> ~kurt wrote:
>> I understand Hunter's comments, and while I don't know much about
>> ASN.1 encoding, what I am pointing out is that binary files are
>> usually *not* intended to be used across systems.
>
> Except for all the ones that are, e.g. protocol dumps; databases;
> interpretive pseudo-code (e.g. .class files), ...
How often do database *files* get moved from one system to another? In my
experience, they stay on the server where the DBMS engine is running.
-
Re: Read binary data file
Esmond Pitt <esmond.pitt@nospam.bigpond.com> wrote:
> The only time writing a struct from memory to a file or a network can
Who is talking about writing data to a network?
> sanely be justified is when the target application is constructed with
> the same version of the same object file that wrote it. And this is not
> a guarantee that in general can be met.
Uh, this is pretty much what I just said other than I see no need for
the "guarantee" part - it is not necessary unless the *intent* is to
distribute the data externally.
As I said, my gripe is in calling the originator of the OP's data clueless.
That statement is simply clueless itself. Yes, if the original program had
been written in Java, then maybe that statement would be true. But this
is a C++ program. The data files are most likely "private", only to be
used internally. Sure, if you port the code to another platform, the
binary files between the two versions may not be compatible, but so what -
that usually isn't a problem. The new code will create binary files that
are compatible with itself. Creating some external specification that this
binary data must meet would be stupid because then, if you did port the
code, now you may have to modify it to be compatible with the original
specification, and this may require more processing of the data. Suddenly,
some specification is driving internal data, and robbing some degree of
performance from the application.
Just because a bureaucrat comes along some time down the road and says
"thou shalt write a Java program (not that Java is the best solution in
this case, but because it is the 'in' thing to do) that will use Program X's
internal data files" does not mean Program X was poorly designed.
- Kurt
-
Re: Read binary data file
~kurt wrote:
> Esmond Pitt <esmond.pitt@nospam.bigpond.com> wrote:
>
>> The only time writing a struct from memory to a file or a network can
>
> Who is talking about writing data to a network?
>
>> sanely be justified is when the target application is constructed
>> with the same version of the same object file that wrote it. And
>> this is not a guarantee that in general can be met.
>
> Uh, this is pretty much what I just said other than I see no need for
> the "guarantee" part - it is not necessary unless the *intent* is to
> distribute the data externally.
>
> As I said, my gripe is in calling the originator of the OP's data
> clueless. That statement is simply clueless itself. Yes, if the
> original program had been written in Java, then maybe that statement
> would be true. But this
> is a C++ program. The data files are most likely "private", only to
> be used internally. Sure, if you port the code to another platform,
> the binary files between the two versions may not be compatible, but
> so what - that usually isn't a problem. The new code will create
> binary files that are compatible with itself. Creating some external
> specification that this binary data must meet would be stupid because
> then, if you did port the code, now you may have to modify it to be
> compatible with the original specification, and this may require more
> processing of the data. Suddenly, some specification is driving
> internal data, and robbing some degree of performance from the
> application.
The danger is that a different compiler (or different version of the same
compiler) would cause an incompatibility. The good news is that compiler
vendors tend not to change struct layouts for that very reason. Still, this
needs to be kept in mind and tested for whenever that sort of change is
made.
Another point, not yet mentioned (or if it has been, I missed that post.)
Any structured data that's saved persistently should contain a version
number. If it never changes, you've added a small amount of overhead. When
it does change, it's now straightforward to convert older versions and
recognize new ones, which, without the explicit versioning, can be difficult
or impossible.
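
A minimal sketch of the versioning idea, under invented assumptions: a 4-byte magic number, a 2-byte version, and a payload that is a 16-bit count in version 1 and a 32-bit count in version 2. The magic value, the payload, and the class name VersionedRecord are all hypothetical. The byte order is fixed by the code (ByteBuffer's default is big-endian) rather than by whatever machine wrote the file.

```java
import java.nio.ByteBuffer;

/** Hypothetical versioned record: magic, version, then a version-dependent payload. */
final class VersionedRecord {
    static final int MAGIC = 0x44415431; // arbitrary marker chosen for this sketch

    static byte[] write(int version, int count) {
        int payloadSize = (version == 1) ? 2 : 4;
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 + payloadSize);
        buf.putInt(MAGIC);
        buf.putShort((short) version);
        if (version == 1) {
            buf.putShort((short) count); // v1: 16-bit count
        } else {
            buf.putInt(count);           // v2: 32-bit count
        }
        return buf.array();
    }

    static int read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a recognised file");
        }
        int version = Short.toUnsignedInt(buf.getShort());
        switch (version) {
            case 1:
                return Short.toUnsignedInt(buf.getShort());
            case 2:
                return buf.getInt();
            default:
                // A newer, unknown version is detected instead of being misread.
                throw new IllegalArgumentException("unsupported version " + version);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(write(1, 7)));      // prints 7
        System.out.println(read(write(2, 100000))); // prints 100000
    }
}
```

The overhead is six bytes per file, and a reader built for version 1 fails loudly on version 2 data instead of silently producing garbage.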