fseek and unformatted sequential files - Fortran
-
fseek and unformatted sequential files
For Fortran 95, is there a way to randomly access an unformatted
sequential file?
My original file is written as unformatted and the records are of
variable length, which I can find out easily. The files are very big,
generally 50 gigabytes each. Now I need to search for specific
information in the file. Though I know the exact position in the file,
I don't know how to access that position. What kind of options do I have?
Thank you.
William Hu
University of Memphis
-
Re: fseek and unformatted sequential files
<huxiankui@gmail.com> wrote in message
news:9a5cfe4b-bd6f-498e-8a74-5e5a216804b3@18g2000hsf.googlegroups.com...
> For Fortran 95, is there a way to randomly access an unformatted
> sequential file?
Not in standard Fortran 95. Variable length records can only be accessed
sequentially. In order to do random access, you need fixed-length records.
However, these restrictions are for standard F95. For another option, see
below.
> My original file is written as unformatted and the records are of
> variable length, which I can find out easily. The files are very big,
> generally 50 gigabytes each. Now I need to search for specific
> information in the file. Though I know the exact position in the file,
> I don't know how to access that position. What kind of options do I have?
>
> Thank you.
> William Hu
> University of Memphis
Fortran 2003 introduces the idea of stream I/O. You can do this using
ACCESS="STREAM" on the OPEN statement. Among other properties, stream I/O
allows access by character position (for formatted files) or by file storage
unit (for unformatted files). The standard recommends, but does not require,
that file storage units be 8-bit octets in situations where this is practical.
Stream I/O is proving to be a popular Fortran 2003 feature, so many Fortran
95 compilers are implementing it early in their development schedules. Several
Fortran compilers have already implemented it in their current commercial
releases.
If your files are around 50.0 GB, then you will need 8 byte integers in
order to position your file by file storage unit. The Fortran 2003 standard
allows the use of any kind of integer in specifying positioning by file storage
unit.
There are other options, but they are non-standard and may be available only
on a very limited set of platforms. If you want advice on these other options,
you will need to let us know what compilers and operating systems you are using.
--
Craig Dedo
17130 W. Burleigh Place
P. O. Box 423
Brookfield, WI 53008-0423
Voice: (262) 783-5869
Fax: (262) 783-5928
Mobile: (414) 412-5869
E-mail: <cdedo@wi.rr.com> or <craig@ctdedo.com>
-
Re: fseek and unformatted sequential files
<huxiankui@gmail.com> wrote:
> For Fortran 95, is there a way to randomly access an unformatted
> sequential file?
Not in anything standard or portable. The "sequential" means that the
records can only be accessed in sequence.
> My original file is written as unformatted and the records are of
> variable length, which I can find out easily. The files are very big,
> generally 50 gigabytes each. Now I need to search for specific
> information in the file. Though I know the exact position in the file,
> I don't know how to access that position. What kind of options do I have?
If you can revise how they are written in the first place, that would
probably be best. It sounds to me like f2003 stream access might be
most suitable for the need. Although that isn't standardized until
f2003, almost all f95 compilers have similar capabilities, although the
exact syntax details vary.
Alternatively, you can write the files as direct access files. That
requires fixed-length records. You say that you have variable length
ones. You can manage that by essentially doing your own record
management, organizing your variable-length records into fixed-length
blocks. I've done that kind of thing. It works and can be made quite
portable. It is, however, a bit of a fuss to set up. My guess is that it
would be more fuss than you are looking for.
If you can't change how the files are written, then you are stuck with
nonstandard approaches. You can probably use either of the above
approaches - stream access or fixed-length blocks read as direct
access. You will, however, have the extra complication of having to
manage the compiler-dependent record structure of sequential access
unformatted files. You have to understand what the underlying structure
of the records looks like (usually each record has a 4-byte header and a
4-byte trailer, but there are variations; for example, sometimes they
are 8 bytes each). You will need to find the actual data part of the
record of interest to you.
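A minimal sketch of finding the data part of each record by walking the record markers through stream access. It assumes a 4-byte length marker before and after every record, no records split into subrecords, and a compiler that already supports F2003-style stream access; the module and procedure names are made up for illustration.

```fortran
module seq_index_sketch
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
contains
  ! Scan an unformatted sequential file through stream access and return,
  ! for each record, the 1-based byte position of its first data byte
  ! (the value POS= expects) and its data length in bytes.
  ! ASSUMPTION: each record is framed by identical 4-byte length markers.
  subroutine index_records(filename, max_records, nfound, data_pos, data_len)
    character(len=*), intent(in)  :: filename
    integer,          intent(in)  :: max_records
    integer,          intent(out) :: nfound
    integer(int64),   intent(out) :: data_pos(max_records)
    integer(int32),   intent(out) :: data_len(max_records)
    integer        :: lun, ios
    integer(int32) :: head, tail
    integer(int64) :: pos

    nfound = 0
    pos = 1_int64
    open(newunit=lun, file=filename, access='stream', form='unformatted', &
         status='old', action='read')
    do while (nfound < max_records)
      read(lun, pos=pos, iostat=ios) head               ! leading marker
      if (ios /= 0) exit                                ! end of file or error
      read(lun, pos=pos + 4 + head, iostat=ios) tail    ! trailing marker
      if (ios /= 0 .or. tail /= head) exit              ! layout assumption broken
      nfound = nfound + 1
      data_pos(nfound) = pos + 4
      data_len(nfound) = head
      pos = pos + 8 + head                              ! start of next record
    end do
    close(lun)
  end subroutine index_records
end module seq_index_sketch
```

Only the markers are read, so building the index touches 8 bytes per record. If the two markers of a record disagree, the loop stops, which is a hint that the compiler that wrote the file uses a different record layout.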
Using stream access like this is actually pretty simple. It is simpler
than it sounds. The big problem is that stream access is not
standardized until f2003 and the exact syntax used for pre-f2003
compilers (namely all of today's compilers) varies.
If you can't change how the files are written, then the
compiler-dependence of the exact form of unformatted sequential files is
at least a minor issue. If you can change to writing with stream access,
that problem goes away.
Oh. I see you mentioned fseek. That is another option - to use C code
called from Fortran. (There are some compilers that have an fseek in
Fortran, but that is nonstandard and nonportable). But calling C from
f95 isn't any more standard or portable than using stream access. It can
work, but I don't see any particular advantage to doing it in C. F2003
stream access is intentionally modeled after C, so doing it in C or with
stream access will be quite similar structurally.
--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
-
Re: fseek and unformatted sequential files
Craig Dedo <cdedo@wi.rr.com> wrote:
> If your files are around 50.0 GB, then you will need 8 byte integers in
> order to position your file by file storage unit.
Ah. Good catch. I overlooked that part. Depending on the exact compiler,
it might not support files that large at all. The standard allows
compilers to have such limitations, and quite a few compilers have such
a limit, though I think that newer compilers are more often tending to
support larger files. Of course, if you managed to write the files in
the first place, that is evidence that you have a compiler around that
does support files that large, but watch out for the possibility that
other compilers might not.
--
Richard Maine
-
Re: fseek and unformatted sequential files
Thank you for the suggestions. When will f2003 be available?
Also if the original file is written as unformatted sequential, can I
access the file using direct access? I know the exact structure of the
unformatted sequential file and the length of every record.
All I want to do is to speed up the disk i/o process. So is there a
way to load part of the file into memory/cache and then search from
memory instead of doing direct disk i/o? As I mentioned, most of the
time, I have to do empty reading to skip through records to the one I
want to get. Is there a way to speed up the empty reading?
The compiler I'm using is Lahey/Fujitsu Fortran 95 Express Release
5.70d.
William Hu
-
Re: fseek and unformatted sequential files
william <huxiankui@gmail.com> wrote:
> Thank you for the suggestions. When will f2003 be available?
There is at least one "almost complete" 2003 compiler out now (IBM's
xlf). As to when particular vendors will have compilers out, that will
no doubt vary a lot.
But many current f95 compilers implement the f2003 stream I/O, and
pretty much all of them implement something like it.
> Also if the original file is written as unformatted sequential, can I
> access the file using direct access? I know the exact structure of the
> unformatted sequential file and the length of every record.
The standard says not. Well, the full details of what it says get long,
but it says that you can't count on it, and in fact, you won't be able
to do so in a standard-conforming way.
In practice, yes, you usually can, but as I mentioned in my previous
post, you have to do all the record management yourself. The direct
access will just get you raw fixed-size blocks of data - not the
original sequential records. You have to do all the record management.
There are lots of caveats. It can be quite a mess, depending on more
details than I could reasonably list. For one, if the file size is not
an exact multiple of whatever size you choose for your blocks, there
will be a "partial block" at the end and you might not be able to read
that.
For another, unless you get particularly lucky, you'll have to use
equivalence or TRANSFER tricks to get the data from the block buffers
into variables of the appropriate type.
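A minimal sketch of the TRANSFER trick, assuming the block has been read into an array of 1-byte integers and that the integers and reals in the file are 4 bytes in the machine's native byte order. The module and function names are invented for this example.

```fortran
module transfer_sketch
  use, intrinsic :: iso_fortran_env, only: int8, int32, real32
  implicit none
contains
  ! Reinterpret the 4 bytes starting at index `first` of a raw block buffer
  ! as a 32-bit integer. The bits are copied as-is; nothing is converted.
  function bytes_to_int32(buffer, first) result(value)
    integer(int8), intent(in) :: buffer(:)
    integer,       intent(in) :: first
    integer(int32) :: value
    value = transfer(buffer(first:first+3), value)
  end function bytes_to_int32

  ! Same idea for a 32-bit real.
  function bytes_to_real32(buffer, first) result(value)
    integer(int8), intent(in) :: buffer(:)
    integer,       intent(in) :: first
    real(real32) :: value
    value = transfer(buffer(first:first+3), value)
  end function bytes_to_real32
end module transfer_sketch
```

A value that straddles two blocks has to be reassembled into one contiguous buffer before it is decoded, which is part of the record management mentioned above.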
Stream access makes it all much simpler.
> All I want to do is to speed up the disk i/o process. So is there a
> way to load part of the file into memory/cache and then search from
> memory instead of doing direct disk i/o?
Not without getting *VERY* system specific. Some systems have ways of
mapping files to memory. But even if you do that, don't think it will do
magic. The data still has to get physically read from the disk.
> As I mentioned, most of the
> time, I have to do empty reading to skip through records to the one I
> want to get. Is there a way to speed up the empty reading?
Not really. If you can otherwise compute where you need to end up, then
the tricks mentioned above can help. But if you need to read through the
file, then you need to read through the file and nothing much is going
to help. Do make sure that your I/O list is empty so that you don't
waste time copying data to variables that aren't going to get used as
you skip records. But it sounds like you are doing that anyway.
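A minimal sketch of skipping records with an empty I/O list, assuming the unit is already connected for unformatted sequential access. The module and subroutine names are illustrative only.

```fortran
module skip_sketch
  implicit none
contains
  ! Advance an open unformatted sequential unit past `nskip` records.
  ! A READ with no I/O list moves to the next record without copying
  ! any data into program variables.
  subroutine skip_records(lun, nskip, ios)
    integer, intent(in)  :: lun, nskip
    integer, intent(out) :: ios
    integer :: i
    ios = 0
    do i = 1, nskip
      read(lun, iostat=ios)
      if (ios /= 0) return      ! end of file or read error
    end do
  end subroutine skip_records
end module skip_sketch
```

After `call skip_records(lun, k, ios)` returns with `ios == 0`, the next READ on the unit picks up record k+1 counted from where the unit was positioned.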
> The compiler I'm using is Lahey/Fujitsu Fortran 95 Express Release
> 5.70d.
--
Richard Maine
-
Re: fseek and unformatted sequential files
"william" <huxiankui@gmail.com> wrote in message
news:2a391aeb-0d02-43c8-ab76-5217f8f43cd5@s8g2000prg.googlegroups.com...
> Thank you for the suggestions. When will f2003 be available?
>
> Also if the original file is written as unformatted sequential, can I
> access the file using direct access? I know the exact structure of the
> unformatted sequential file and the length of every record.
>
> All I want to do is to speed up the disk i/o process. So is there a
> way to load part of the file into memory/cache and then search from
> memory instead of doing direct disk i/o? As I mentioned, most of the
> time, I have to do empty reading to skip through records to the one I
> want to get. Is there a way to speed up the empty reading?
>
> The compiler I'm using is Lahey/Fujitsu Fortran 95 Express Release
> 5.70d.
>
> William Hu
Unfortunately, you will need to use another compiler. I checked the Lahey
web site and the current versions of their compilers do not support Fortran 2003
Stream I/O.
You do not say whether you are using Windows or Linux. Most of my
experience is on Windows.
I know of two Windows-based compilers that implement Fortran 2003 Stream I/O
in their latest versions. They are:
* Intel Visual Fortran (IVF) 10.1 - Stream I/O is a new feature in this
version.
* Gnu gfortran 4.3.0 - I tested it using a small test program and it works
as specified in Fortran 2003.
Since you are an academic, you can get a substantial academic discount for
IVF 10.1. You can get gfortran for free. However, you need to be aware that
version 4.3.0 is still in the beta test stage.
I believe that both compilers support 8 byte integers. I have no idea if
either will support 50.0 GB files.
Using Fortran 2003 Stream I/O, you specify the position within the file
using the new POS= specifier. It takes one integer value of any kind. The
first file storage unit (FSU) has a position value of 1, not 0. E.g., to read
in a value starting at position 25, you would write something like this:
Read (Unit=LUN_Huge_File, POS=25 ) I
This assumes that LUN_Huge_File is the unit number of your huge data file and is
connected with ACCESS="STREAM".
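For a 50 GB file the position will not fit in a default 4-byte integer, so a fuller sketch might carry it in a 64-bit variable. This assumes the file storage unit is one byte and that the compiler provides the `int64` kind from `iso_fortran_env` (`selected_int_kind(18)` is an alternative); the module and subroutine names are hypothetical.

```fortran
module stream_pos_sketch
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
contains
  ! Read one 32-bit integer that starts at byte position `pos` (1-based).
  subroutine read_int_at(filename, pos, value, ios)
    character(len=*), intent(in)  :: filename
    integer(int64),   intent(in)  :: pos
    integer(int32),   intent(out) :: value
    integer,          intent(out) :: ios
    integer :: lun
    value = 0
    open(newunit=lun, file=filename, access='stream', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) return
    read(lun, pos=pos, iostat=ios) value   ! jump straight to pos and read
    close(lun)
  end subroutine read_int_at
end module stream_pos_sketch
```

A literal position needs the kind suffix as well, for example `50000000001_int64`, because a plain literal of that size overflows the default integer kind on most compilers.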
Hope this helps. Please ask if you have any more questions.
--
Craig Dedo
-
Re: fseek and unformatted sequential files
Craig Dedo <cdedo@wi.rr.com> wrote:
> Unfortunately, you will need to use another compiler. I checked the
> Lahey web site and the current versions of their compilers do not support
> Fortran 2003 Stream I/O.
But they do support comparable functionality with some of the usual
alternate spellings. See the access='transparent' and form='binary'
specifiers of the OPEN statement in the Lahey docs.
I'm not sure, however, whether they support the random positioning
feature. A quick skim through the documentation doesn't make it obvious.
There is reference to the record length being treated as 1, so there is
at least a possibility that specifying rec=n might do the trick, where n
is the desired position measured in bytes.
--
Richard Maine
-
Re: fseek and unformatted sequential files
Thank you for the suggestions. I'm getting closer to solving the
problem. But I still need your advice.
I've divided the problem into 3 parts and finished two parts as below.
First, index the tickers in the file. By using 'transparent' option, I
can know exactly where the tickers are located in the file.
Second, using the position of the ticker, fseek can locate the ticker
without any problem.
Code:
open( 12, infile, STATUS="OLD", ACTION="READ",err=80 )
J=fseek(12,1847529928,0)
read(12,"(a10)")ticker;
write(*,*)ticker
Third, in this 64 byte record, I have the following variables:
ticker,n,nq,ndays,ncon,nterr,nqerr,avgpri,type,pexch,cusip,its,indcode,errmult
character ticker*10,pexch*1,cusip*9,indcode*4
integer n,nq,ndays,ncon,nterr,nqerr,type,its
real avgpri,errmult
The total is 64 bytes.
Now with the unformatted sequential file "infile" open like the one
above, I can read the ticker without problem. But how do I read the
integer variables n, nq, etc. and the floating-point numbers avgpri and
errmult? If you can help me solve this step, we have basically solved the
entire problem.
Thank you.
Bill Hu
-
Re: fseek and unformatted sequential files
william <huxiankui@gmail.com> wrote:
> Thank you for the suggestions. I'm getting closer to solving the
> problem. But I still need your advice.
>
> I've divided the problem into 3 parts and finished two parts as below.
> First, index the tickers in the file. By using 'transparent' option, I
> can know exactly where the tickers are located in the file.
> Second, using the position of the ticker, fseek can locate the ticker
> without any problem.
> Code:
> open( 12, infile, STATUS="OLD", ACTION="READ",err=80 )
I don't see the transparent option or any common synonym of it above.
That doesn't look right. You said you were using the transparent option,
but I don't see it. It isn't *THAT* transparent; you do have to write it
in the code. :-)
> J=fseek(12,1847529928,0)
I don't know the various compiler-dependent forms of that, but you say
it worked, so I'll assume you got it right.
> read(12,"(a10)")ticker;
Why are you using a formatted read here? You said the file was
unformatted, so you shouldn't be doing a formatted read. It turns out
that for character data, there isn't much "formatting" to do in
formatted I/O and sometimes you can "fake" strange effects based on
that, but the straightforward thing to do is to use an unformatted read.
I wonder if perhaps your failure to open the file with the transparent
option is messing things up here. I could well imagine... well lots of
things, but that gets on a side track.
> Now with the unformatted sequential file "infile" open like the one
> above, I can read the ticker without problem.
The above is *NOT* an unformatted read. You have a format in it. That's
what the "(a10)" is. I suggest actually doing it as an unformatted read.
And you'll need the OPEN to use transparent mode to match. I would
assume from prior context that you know how to do that, but just in
case, it should look more like
read(12) ticker,n,nq,...
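Spelled out for the 64-byte record described in the question, a sketch using F2003 stream access might look like the following. It assumes the unit is connected with access='stream' (or the compiler's equivalent), that the file was written with 4-byte default integers and reals in the same byte order as the reading machine, and that the fields are stored in the order listed. The derived type and procedure names are invented here, and `type` is renamed `rtype` for readability.

```fortran
module ticker_record_sketch
  use, intrinsic :: iso_fortran_env, only: int32, int64, real32
  implicit none

  ! Container for one record; 10 + 6*4 + 4 + 4 + 1 + 9 + 4 + 4 + 4 = 64 bytes
  ! of file data when the components are transferred one by one.
  type :: ticker_record
    character(len=10) :: ticker
    integer(int32)    :: n, nq, ndays, ncon, nterr, nqerr
    real(real32)      :: avgpri
    integer(int32)    :: rtype          ! called "type" in the original post
    character(len=1)  :: pexch
    character(len=9)  :: cusip
    integer(int32)    :: its
    character(len=4)  :: indcode
    real(real32)      :: errmult
  end type ticker_record
contains
  ! Unformatted read of one record whose first data byte is at `pos` (1-based).
  ! Components are listed individually so that any padding inside the derived
  ! type cannot affect how many bytes are read.
  subroutine read_ticker_record(lun, pos, rec, ios)
    integer,             intent(in)  :: lun
    integer(int64),      intent(in)  :: pos
    type(ticker_record), intent(out) :: rec
    integer,             intent(out) :: ios
    read(lun, pos=pos, iostat=ios) rec%ticker, rec%n, rec%nq, rec%ndays, &
         rec%ncon, rec%nterr, rec%nqerr, rec%avgpri, rec%rtype, rec%pexch, &
         rec%cusip, rec%its, rec%indcode, rec%errmult
  end subroutine read_ticker_record
end module ticker_record_sketch
```

Note that fseek-style offsets normally count from 0 while POS= counts from 1, so an fseek offset k would correspond to POS=k+1.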
--
Richard Maine