fseek and unformatted sequential files - Fortran

  1. fseek and unformatted sequential files

    For Fortran 95, is there a way to randomly access an unformatted
    sequential file?

    My original file is written as unformatted and the records are of
    variable length which I can find out easily. The files are very big,
    generally 50 gigabytes each. Now I need to search for specific
    information in the file. Though I know the exact position in the file,
    I don't know how to access that position. What kind of options do I have?

    Thank you.

    William Hu
    University of Memphis

  2. Re: fseek and unformatted sequential files

    <huxiankui@gmail.com> wrote in message
    news:9a5cfe4b-bd6f-498e-8a74-5e5a216804b3@18g2000hsf.googlegroups.com...
    > For Fortran 95, is there a way to randomly access an unformatted
    > sequential file?


    Not in standard Fortran 95. Variable length records can only be accessed
    sequentially. In order to do random access, you need fixed-length records.

    However, these restrictions are for standard F95. For another option, see
    below.

    > My original file is written as unformatted and the records are of
    > variable length which I can find out easily. The files are very big,
    > generally 50 gigabytes each. Now I need to search for specific
    > information in the file. Though I know the exact position in the file,
    > I don't know how to access that position. What kind of options do I have?
    >
    > Thank you.
    > William Hu
    > University of Memphis


    Fortran 2003 introduces the idea of stream I/O. You can do this using
    ACCESS="STREAM" on the OPEN statement. Among other properties, stream I/O
    allows access by character position (for formatted files) or by file storage
    unit (for unformatted files). The standard recommends, but does not require,
    that file storage units be 8-bit octets in situations where this is practical.

    Stream I/O is proving to be a popular Fortran 2003 feature, so many Fortran
    95 compilers are implementing it early in their development schedules. Several
    Fortran compilers already have implemented it in their current commercial
    releases.

    If your files are around 50.0 GB, then you will need 8 byte integers in
    order to position your file by file storage unit. The Fortran 2003 standard
    allows the use of any kind of integer in specifying positioning by file storage
    unit.
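    The need for 8-byte integers can be checked with a little arithmetic. The
    sketch below is only an illustration: the program name and the choice of
    50 GiB (rather than exactly 50.0 GB) are assumptions, and the kind names
    come from the Fortran 2008 ISO_FORTRAN_ENV module, which most current
    compilers provide.

```fortran
program pos_kind
  use iso_fortran_env, only: int32, int64
  implicit none
  ! Assumed file size for illustration: 50 GiB expressed in bytes.
  integer(int64), parameter :: file_bytes = 50_int64 * 1024_int64**3

  print *, "largest 4-byte integer:", huge(0_int32)
  print *, "bytes in a 50 GiB file:", file_bytes

  ! A 4-byte integer cannot hold a position near the end of such a file.
  if (file_bytes <= int(huge(0_int32), int64)) error stop "unexpected"
end program pos_kind
```

    The largest 4-byte integer is 2147483647, roughly 2 GB, so any position
    beyond that point has to be held in an 8-byte integer.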

    There are other options, but they are non-standard and may be available only
    on a very limited set of platforms. If you want advice on these other options,
    you will need to let us know what compilers and operating systems you are using.

    --
    Craig Dedo
    17130 W. Burleigh Place
    P. O. Box 423
    Brookfield, WI 53008-0423
    Voice: (262) 783-5869
    Fax: (262) 783-5928
    Mobile: (414) 412-5869
    E-mail: <cdedo@wi.rr.com> or <craig@ctdedo.com>


  3. Re: fseek and unformatted sequential files

    <huxiankui@gmail.com> wrote:

    > For Fortran 95, is there a way to randomly access an unformatted
    > sequential file?


    Not in anything standard or portable. The "sequential" means that the
    records can only be accessed in sequence.

    > My original file is written as unformatted and the records are of
    > variable length which I can find out easily. The files are very big,
    > generally 50 gigabytes each. Now I need to search for specific
    > information in the file. Though I know the exact position in the file,
    > I don't know how to access that position. What kind of options do I have?


    If you can revise how they are written in the first place, that would
    probably be best. It sounds to me like f2003 stream access might be
    most suitable for the need. Although that isn't standardized until
    f2003, almost all f95 compilers have similar capabilities, although the
    exact syntax details vary.

    Alternatively, you can write the files as direct access files. That
    requires fixed-length records. You say that you have variable length
    ones. You can manage that by essentially doing your own record
    management, organizing your variable-length records into fixed-length
    blocks. I've done that kind of thing. It works and can be made quite
    portable. It is, however, a bit of a fuss to set up. My guess is that it
    would be more fuss than you are looking for.

    If you can't change how the files are written, then you are stuck with
    nonstandard approaches. You can probably use either of the above
    approaches - stream access or fixed-length blocks read as direct
    access. You will, however, have the extra complication of having to
    manage the compiler-dependent record structure of sequential access
    unformatted files. You have to understand what the underlying structure
    of the records looks like (usually each record has a 4-byte header and a
    4-byte trailer, but there are variations; for example, sometimes they
    are 8 bytes each). You will need to find the actual data part of the
    record of interest to you.
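    A minimal sketch of walking that record structure with stream access. It
    assumes 4-byte length markers before and after each record (to my
    knowledge the gfortran default for ordinary-sized records; other compilers
    or options may differ), and it stops if the header and trailer disagree.
    The file name, unit number, and program name are made up for illustration,
    and the program writes its own small test file first.

```fortran
program walk_records
  use iso_fortran_env, only: int32, int64
  implicit none
  integer, parameter :: lun = 20
  character(len=*), parameter :: fname = "walk_records_demo.dat"
  integer(int32) :: head, tail
  integer(int64) :: here, fsize
  integer :: nrec, ios

  ! Write three variable-length records the ordinary sequential way.
  open(lun, file=fname, form="unformatted", access="sequential", &
       status="replace", action="write")
  write(lun) 1_int32
  write(lun) 1.0, 2.0, 3.0
  write(lun) "ABCDEFGHIJ"
  close(lun)

  ! Reopen the same file as a raw byte stream.
  open(lun, file=fname, form="unformatted", access="stream", &
       status="old", action="read")
  inquire(unit=lun, size=fsize)

  here = 1        ! stream positions start at 1, not 0
  nrec = 0
  do while (here < fsize)
     read(lun, pos=here) head                         ! assumed 4-byte header
     read(lun, pos=here + 4 + head, iostat=ios) tail  ! assumed 4-byte trailer
     if (ios /= 0 .or. head /= tail) then
        print *, "marker mismatch; this file uses a different layout"
        exit
     end if
     nrec = nrec + 1
     print *, "record", nrec, "data starts at byte", here + 4, "length", head
     here = here + 4 + head + 4     ! skip header, data, and trailer
  end do
  close(lun, status="delete")

  if (nrec /= 3) error stop "did not find the three records written"
end program walk_records
```

    Once the start of a record's data is known, a single READ with POS= at
    that byte fetches the values without touching the records before it.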

    Using stream access like this is actually pretty simple. It is simpler
    than it sounds. The big problem is that stream access is not
    standardized until f2003 and the exact syntax used for pre-f2003
    compilers (namely all of today's compilers) varies.

    If you can't change how the files are written, then the
    compiler-dependence of the exact form of unformatted sequential files is
    at least a minor issue. If you can change to writing with stream access,
    that problem goes away.

    Oh. I see you mentioned fseek. That is another option - to use C code
    called from Fortran. (There are some compilers that have an fseek in
    Fortran, but that is nonstandard and nonportable). But calling C from
    f95 isn't any more standard or portable than using stream access. It can
    work, but I don't see any particular advantage to doing it in C. F2003
    stream access is intentionally modeled after C, so doing it in C or with
    stream access will be quite similar structurally.

    --
    Richard Maine | Good judgement comes from experience;
    email: last name at domain . net | experience comes from bad judgement.
    domain: summertriangle | -- Mark Twain

  4. Re: fseek and unformatted sequential files

    Craig Dedo <cdedo@wi.rr.com> wrote:

    > If your files are around 50.0 GB, then you will need 8 byte integers in
    > order to position your file by file storage unit.


    Ah. Good catch. I overlooked that part. Depending on the exact compiler,
    it might not support files that large at all. The standard allows
    compilers to have such limitations, and quite a few compilers have such
    a limit, though I think that newer compilers are more often tending to
    support larger files. Of course, if you managed to write the files in
    the first place, that is evidence that you have a compiler around that
    does support files that large, but watch out for the possibility that
    other compilers might not.

    --
    Richard Maine

  5. Re: fseek and unformatted sequential files

    Thank you for the suggestions. When will f2003 be available?

    Also if the original file is written as unformatted sequential, can I
    access the file using direct access? I know the exact structure of the
    unformatted sequential file and the length of every record.

    All I want to do is to speed up the disk i/o process. So is there a
    way to load part of the file into memory/cache and then search from
    memory instead of doing direct disk i/o? As I mentioned, most of the
    time, I have to do empty reading to skip through records to the one I
    want to get. Is there a way to speed up the empty reading?

    The compiler I'm using is Lahey/Fujitsu Fortran 95 Express Release
    5.70d.

    William Hu

  6. Re: fseek and unformatted sequential files

    william <huxiankui@gmail.com> wrote:

    > Thank you for the suggestions. When will f2003 be available?


    There is at least one "almost complete" 2003 compiler out now (IBM's
    xlf). As to when particular vendors will have compilers out, that will
    no doubt vary a lot.

    But many current f95 compilers implement the f2003 stream I/O, and
    pretty much all of them implement something like it.

    > Also if the original file is written as unformatted sequential, can I
    > access the file using direct access? I know the exact structure of the
    > unformatted sequential file and the length of every record.


    The standard says not. Well, the full details of what it says get long,
    but it says that you can't count on it, and in fact, you won't be able
    to do so in a standard-conforming way.

    In practice, yes, you usually can, but as I mentioned in my previous
    post, you have to do all the record management yourself. The direct
    access will just get you raw fixed-size blocks of data - not the
    original sequential records. You have to do all the record management.

    There are lots of caveats. It can be quite a mess, depending on more
    details than I could reasonably list. For one, if the file size is not
    an exact multiple of whatever size you choose for your blocks, there
    will be a "partial block" at the end and you might not be able to read
    that.

    For another, unless you get particularly lucky, you'll have to use
    equivalence or TRANSFER tricks to get the data from the block buffers
    into variables of the appropriate type.
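    A small sketch of the block-buffer approach, under stated assumptions:
    4-byte record markers, a default character that occupies one byte, and a
    file whose size happens to be an exact multiple of the block size. The
    16-byte block size, file name, and program name are chosen only for this
    demonstration. INQUIRE(IOLENGTH=) is used because the unit of RECL for
    unformatted files differs between compilers.

```fortran
program block_transfer
  use iso_fortran_env, only: int32, real32
  implicit none
  integer, parameter :: lun = 21, blk = 16
  character(len=*), parameter :: fname = "block_transfer_demo.dat"
  character(len=blk) :: buf
  integer :: rl
  integer(int32) :: n
  real(real32) :: x

  ! One sequential record: 4 (marker) + 4 + 4 (data) + 4 (marker) = 16 bytes,
  ! assuming 4-byte record markers.
  open(lun, file=fname, form="unformatted", access="sequential", &
       status="replace", action="write")
  write(lun) 42_int32, 2.5_real32
  close(lun)

  ! Ask the compiler what RECL value corresponds to one 16-character block.
  inquire(iolength=rl) buf

  ! Read the same file back as raw fixed-size blocks.
  open(lun, file=fname, form="unformatted", access="direct", recl=rl, &
       status="old", action="read")
  read(lun, rec=1) buf
  close(lun, status="delete")

  ! Do the record management by hand: skip the marker, reinterpret the bytes.
  n = transfer(buf(5:8), n)
  x = transfer(buf(9:12), x)
  print *, n, x

  if (n /= 42 .or. x /= 2.5_real32) error stop "layout differs from the assumption"
end program block_transfer
```

    With real data a record will often straddle two blocks, so the buffer
    handling gets more involved than this single-block case suggests.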

    Stream access makes it all much simpler.

    > All I want to do is to speed up the disk i/o process. So is there a
    > way to load part of the file into memory/cache and then search from
    > memory instead of doing direct disk i/o?


    Not without getting *VERY* system specific. Some systems have ways of
    mapping files to memory. But even if you do that, don't think it will do
    magic. The data still has to get physically read from the disk.

    > As I mentioned, most of the
    > time, I have to do empty reading to skip through records to the one I
    > want to get. Is there a way to speed up the empty reading?


    Not really. If you can otherwise compute where you need to end up, then
    the tricks mentioned above can help. But if you need to read through the
    file, then you need to read through the file and nothing much is going
    to help. Do make sure that your I/O list is empty so that you don't
    waste time copying data to variables that aren't going to get used as
    you skip records. But it sounds like you are doing that anyway.
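    For reference, skipping with an empty I/O list looks like the sketch
    below. The file name, unit number, and record contents are invented for
    the demonstration; the program writes five variable-length records, skips
    three, and reads the first item of the fourth.

```fortran
program skip_records
  implicit none
  integer, parameter :: lun = 22
  character(len=*), parameter :: fname = "skip_records_demo.dat"
  integer :: i, val

  ! Write five records of different lengths.
  open(lun, file=fname, form="unformatted", access="sequential", &
       status="replace", action="write")
  do i = 1, 5
     write(lun) i * 10, repeat("x", i)
  end do
  close(lun)

  open(lun, file=fname, form="unformatted", access="sequential", &
       status="old", action="read")
  do i = 1, 3
     read(lun)      ! empty I/O list: advance one record, copy nothing
  end do
  read(lun) val     ! first item of record 4; the rest of the record is ignored
  close(lun, status="delete")

  print *, val
  if (val /= 40) error stop "unexpected value"
end program skip_records
```

    The runtime still has to step over every record, so this saves the copy
    into variables but not the pass through the file.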

    > The compiler I'm using is Lahey/Fujitsu Fortran 95 Express Release
    > 5.70d.


    --
    Richard Maine

  7. Re: fseek and unformatted sequential files

    "william" <huxiankui@gmail.com> wrote in message
    news:2a391aeb-0d02-43c8-ab76-5217f8f43cd5@s8g2000prg.googlegroups.com...
    > Thank you for the suggestions. When will f2003 be available?
    >
    > Also if the original file is written as unformatted sequential, can I
    > access the file using direct access? I know the exact structure of the
    > unformatted sequential file and the length of every record.
    >
    > All I want to do is to speed up the disk i/o process. So is there a
    > way to load part of the file into memory/cache and then search from
    > memory instead of doing direct disk i/o? As I mentioned, most of the
    > time, I have to do empty reading to skip through records to the one I
    > want to get. Is there a way to speed up the empty reading?
    >
    > The compiler I'm using is Lahey/Fujitsu Fortran 95 Express Release
    > 5.70d.
    >
    > William Hu


    Unfortunately, you will need to use another compiler. I checked the Lahey
    web site and the current versions of their compilers do not support Fortran 2003
    Stream I/O.

    You do not say whether you are using Windows or Linux. Most of my
    experience is on Windows.

    I know of two Windows-based compilers that implement Fortran 2003 Stream I/O
    in their latest versions. They are:
    * Intel Visual Fortran (IVF) 10.1 - Stream I/O is a new feature in this
    version.
    * Gnu gfortran 4.3.0 - I tested it using a small test program and it works
    as specified in Fortran 2003.

    Since you are an academic, you can get a substantial academic discount for
    IVF 10.1. You can get gfortran for free. However, you need to be aware that
    version 4.3.0 is still in the beta test stage.

    I believe that both compilers support 8 byte integers. I have no idea if
    either will support 50.0 GB files.

    Using Fortran 2003 Stream I/O, you specify the position within the file
    using the new POS= specifier. It takes one integer value of any kind. The
    first file storage unit (FSU) has a position value of 1, not 0. E.g., to read
    in a value starting at position 25, you would write something like this:
    Read (Unit=LUN_Huge_File, POS=25 ) I
    This assumes that LUN_Huge_File is the unit number of your huge data file and is
    connected with ACCESS="STREAM".
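    A complete, self-contained version of the same idea, as a sketch only. The
    unit number, file name, and test data are assumptions made for the
    demonstration; the position is held in an 8-byte integer, as would be
    needed for a very large file.

```fortran
program stream_pos
  use iso_fortran_env, only: int32, int64
  implicit none
  integer, parameter :: LUN_Huge_File = 23
  character(len=*), parameter :: fname = "stream_pos_demo.dat"
  integer(int64) :: offset
  integer(int32) :: i, k

  open(LUN_Huge_File, file=fname, form="unformatted", access="stream", &
       status="replace", action="readwrite")

  ! Ten 4-byte integers occupy bytes 1..40; there are no record markers.
  write(LUN_Huge_File) (k, k = 1, 10)

  ! Byte 25 is the first byte of the seventh integer (bytes 25..28).
  offset = 25_int64
  read(LUN_Huge_File, pos=offset) I
  close(LUN_Huge_File, status="delete")

  print *, I
  if (I /= 7) error stop "unexpected value"
end program stream_pos
```

    Because stream positions start at 1, a zero-based byte offset such as one
    passed to a C-style fseek corresponds to POS = offset + 1.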

    Hope this helps. Please ask if you have any more questions.

    --
    Craig Dedo


  8. Re: fseek and unformatted sequential files

    Craig Dedo <cdedo@wi.rr.com> wrote:

    > Unfortunately, you will need to use another compiler. I checked the
    > Lahey web site and the current versions of their compilers do not support
    > Fortran 2003 Stream I/O.


    But they do support comparable functionality with some of the usual
    alternate spellings. See the access='transparent' and form='binary'
    specifiers of the OPEN statement in the Lahey docs.

    I'm not sure, however, whether they support the random positioning
    feature. A quick skim through the documentation doesn't make it obvious.
    There is reference to the record length being treated as 1, so there is
    at least a possibility that specifying rec=n might do the trick, where n
    is the desired position measured in bytes.

    --
    Richard Maine | Good judgement comes from experience;
    email: last name at domain . net | experience comes from bad judgement.
    domain: summertriangle | -- Mark Twain

  9. Re: fseek and unformatted sequential files

    Thank you for the suggestions. I'm getting closer to solving the
    problem. But I still need your advice.

    I've divided the problem into 3 parts and finished two parts as below.
    First, index the tickers in the file. By using 'transparent' option, I
    can know exactly where the tickers are located in the file.
    Second, using the position of the ticker, fseek can locate the ticker
    without any problem.
    Code:
    open( 12, infile, STATUS="OLD", ACTION="READ",err=80 )
    J=fseek(12,1847529928,0)
    read(12,"(a10)")ticker;
    write(*,*)ticker
    Third, in this 64 byte record, I have the following variables:
    ticker,n,nq,ndays,ncon,nterr,nqerr,avgpri,type,pexch,cusip,its,indcode,errmult

    character ticker*10,pexch*1,cusip*9,indcode*4
    integer n,nq,ndays,ncon,nterr,nqerr,type,its
    real avgpri,errmult

    The total is 64 bytes.

    Now with the unformatted sequential file "infile" open like the one
    above, I can read the ticker without problem. But how do I read the
    integer variables n, nq, etc. and the floating number avgpri and
    errmult? If you can help me solve this step, we basically solved the
    entire problem.

    Thank you.

    Bill Hu

  10. Re: fseek and unformatted sequential files

    william <huxiankui@gmail.com> wrote:

    > Thank you for the suggestions. I'm getting closer to solving the
    > problem. But I still need your advice.
    >
    > I've divided the problem into 3 parts and finished two parts as below.
    > First, index the tickers in the file. By using 'transparent' option, I
    > can know exactly where the tickers are located in the file.
    > Second, using the position of the ticker, fseek can locate the ticker
    > without any problem.
    > Code:
    > open( 12, infile, STATUS="OLD", ACTION="READ",err=80 )


    I don't see the transparent option or any common synonym of it above.
    That doesn't look right. You said you were using the transparent option,
    but I don't see it. It isn't *THAT* transparent; you do have to write it
    in the code. :-)

    > J=fseek(12,1847529928,0)


    I don't know the various compiler-dependent forms of that, but you say
    it worked, so I'll assume you got it right.

    > read(12,"(a10)")ticker;


    Why are you using a formatted read here? You said the file was
    unformatted, so you shouldn't be doing a formatted read. It turns out
    that for character data, there isn't much "formatting" to do in
    formatted I/O and sometimes you can "fake" strange effects based on
    that, but the straightforward thing to do is to use an unformatted read.

    I wonder if perhaps your failure to open the file with the transparent
    option is messing things up here. I could well imagine... well lots of
    things, but that gets on a side track.

    > Now with the unformatted sequential file "infile" open like the one
    > above, I can read the ticker without problem.


    The above is *NOT* an unformatted read. You have a format in it. That's
    what the "(a10)" is. I suggest actually doing it as an unformatted read.
    And you'll need the OPEN to use transparent mode to match. I would
    assume from prior context that you know how to do that, but just in
    case, it should look more like

    read(12) ticker,n,nq,...
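    Spelled out for the 64-byte record described in the question, an
    unformatted read could look like the sketch below. It uses Fortran 2003
    ACCESS="STREAM" rather than the Lahey 'transparent' spelling, so treat it
    as an illustration of the idea and not as tested Lahey code. The file
    name, the 100 bytes of filler, and all the data values are invented; the
    variable names and types are the ones listed in the question, and default
    integers and reals are assumed to be 4 bytes.

```fortran
program read_ticker_record
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: lun = 12
  character(len=*), parameter :: fname = "ticker_demo.dat"
  character(len=10) :: ticker
  character(len=1)  :: pexch
  character(len=9)  :: cusip
  character(len=4)  :: indcode
  integer :: n, nq, ndays, ncon, nterr, nqerr, type, its
  real    :: avgpri, errmult
  integer(int64) :: offset

  open(lun, file=fname, form="unformatted", access="stream", &
       status="replace", action="readwrite")

  ! Made-up test data: 100 filler bytes, then one 64-byte record
  ! (24 bytes of characters + 8 integers + 2 reals, no padding).
  write(lun) repeat(" ", 100)
  write(lun) "TESTTICKER", 1, 2, 3, 4, 5, 6, 101.5, 7, &
             "X", "123456789", 8, "ABCD", 0.25

  ! The record starts right after the filler. POS= is 1-based.
  offset = 101_int64
  read(lun, pos=offset) ticker, n, nq, ndays, ncon, nterr, nqerr, &
                        avgpri, type, pexch, cusip, its, indcode, errmult
  close(lun, status="delete")

  print *, ticker, n, avgpri, indcode, errmult
  if (ticker /= "TESTTICKER" .or. n /= 1 .or. type /= 7 .or. &
      its /= 8 .or. errmult /= 0.25) error stop "record did not round-trip"
end program read_ticker_record
```

    The one unformatted READ transfers the character, integer, and real items
    in the order they were written, with no format involved. A zero-based
    fseek offset such as 1847529928 would correspond to POS=1847529929.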

    --
    Richard Maine
