#1
| I was re-reading my huge Fortran 77/90/95 language specification manual and noted that the open file option ACCESS='BINARY' was accepted as an extension only for Windows up to NT, but not other platforms.

I can solve some potential problems of upgrading Fortran programs to a more portable standard by using DIRECT UNFORMATTED access, but I am left with the cases where programs read somewhat unknown input data files of words of bits, and proceed to determine the coding structure used (e.g. IBM 12-bit binary, IBM 16-bit binary, Qantum 12-bit card code, common 10, 12, 16 and 36 character ASCII code and so on) before reopening the file in a more suitable way for reading that particular structure.

To date I have used the BINARY option and read chunks of data to search for clues (e.g. searching for CR, LF, CR-LF, LF-CR, and DEL characters and the character interval counts between each, the presence or absence of the top one or two bits in each character, and whether any hex zero bytes occur).

What is the simple definition of the expected structure of files declared as UNFORMATTED SEQUENTIAL? I had always thought these are expected to contain non-data markers.

This seems to be the only chance of achieving portability, since FORMATTED is obviously out and direct access only applies to fixed-length records, other than treating the file as blocks of unknown data and working out what happened on the last "record", which I will use if no other idea presents itself. |
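The clue-gathering described in this post can be sketched in a few lines. The example below uses Python rather than Fortran purely for illustration, and the function name `sniff_chunk` and the choice of statistics are assumptions made for this sketch, not anything taken from the poster's programs.

```python
from collections import Counter


def sniff_chunk(chunk: bytes) -> dict:
    """Collect simple clues about the coding of a raw chunk of bytes."""
    counts = Counter(chunk)
    # Positions of LF bytes; the gaps between them hint at a record length.
    lf_positions = [i for i, b in enumerate(chunk) if b == 0x0A]
    gaps = Counter(b - a for a, b in zip(lf_positions, lf_positions[1:]))
    return {
        "cr": counts[0x0D],
        "lf": counts[0x0A],
        "crlf": chunk.count(b"\r\n"),
        "lfcr": chunk.count(b"\n\r"),
        "del": counts[0x7F],
        "nul": counts[0x00],
        # Bytes with the top bit set, and with both top bits set.
        "high_bit_set": sum(1 for b in chunk if (b & 0x80) != 0),
        "top_two_bits_set": sum(1 for b in chunk if (b & 0xC0) == 0xC0),
        "lf_gaps": dict(gaps),
    }
```

A caller would read the first few kilobytes of the unknown file in binary mode, pass them to `sniff_chunk`, and decide from the counts how to reopen the file.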
|
#2
| Terence <tbwright@cantv.net> wrote:

> I was re-reading my huge Fortran 77/90/95 language specification
> manual and noted that the open file option ACCESS='BINARY' was
> accepted as an extension only for Windows up to NT, but not other
> platforms.

Presumably you are talking about some specific, but unmentioned compiler, as the actual Fortran language specifications certainly don't say anything about Windows versions. Anyway...

> What is the simple definition of the expected structure of files
> declared as UNFORMATTED SEQUENTIAL? I had always thought these are
> expected to contain non-data markers.

The standard doesn't say. But containing no non-data markers is not a realistic expectation (except for file systems where the record structure is maintained out-of-band, but those aren't common these days). The most common structure involves adding record-size fields before (and usually after) the data of the record. The details *DO* vary. This is essentially not an option for reading non-Fortran files; it is even a bit problematic for reading Fortran files created by other compilers.

This is an issue I've been very much working with for... well... about 2 and a half decades now, I guess. It is a large part of why I wrote the initial proposal for access='stream' in f2003. The problem you are looking at is exactly why access='stream' was added in f2003. You can think of access='stream' as the standardized, and slightly cleaned up, version of the nonstandard 'binary' thing you were using.

Your main options are

1. The access='stream', which is supported by many current compilers, although not yet all.

2. Continue using the nonstandard binary options. Pretty much all compilers today support at least some variant of them. I have no idea what manual this is you are looking at or exactly what it actually says, but the options are widely available. Sometimes the spellings do vary slightly, which is part of why standardization was needed.

3. Interface to C and do it there. That involves some portability issues as well, but it is doable. It wouldn't be my first choice, but I know some people have done it that way.

4. Use direct access unformatted. There are lots of complications, but it is doable. I know, as I've done it. You have to do your own record management in order to give Fortran the fixed-size chunks of data that it wants. It would take quite a while to elaborate on all the various gotchas. (Yes, the last block is one of them). The fact that it can get pretty messy is why I pushed for access='stream', which makes it all so much easier.

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain |
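The "record-size fields before (and usually after) the data" layout mentioned in this post can be illustrated with a small reader. This is only a sketch under stated assumptions: it takes 4-byte little-endian signed length markers on both sides of each record, which is one common layout but, as the post stresses, not something the standard guarantees. The name `read_records` and the `marker_fmt` parameter are hypothetical, and variants such as split sub-records are not handled.

```python
import io
import struct


def read_records(stream, marker_fmt="<i"):
    """Yield the payload of each record framed as [length][data][length].

    marker_fmt is an assumption: a 4-byte little-endian signed integer.
    Other compilers may use a different size, byte order, or scheme.
    """
    size = struct.calcsize(marker_fmt)
    while True:
        head = stream.read(size)
        if not head:
            return  # clean end of file between records
        if len(head) < size:
            raise ValueError("truncated leading record marker")
        (length,) = struct.unpack(marker_fmt, head)
        if length < 0:
            raise ValueError("negative length marker (variant not handled)")
        data = stream.read(length)
        tail = stream.read(size)
        if len(data) < length or len(tail) < size:
            raise ValueError("truncated record")
        if struct.unpack(marker_fmt, tail)[0] != length:
            raise ValueError("leading and trailing markers disagree")
        yield data
```

If the leading and trailing markers disagree on the first record, that is a strong hint that the file was not written with this framing, or that the marker size or byte order assumed here is wrong.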
|
#3
| On 22 aug, 08:15, nos...@see.signature (Richard Maine) wrote:

> 4. Use direct access unformatted. There are lots of complications, but
> it is doable. I know, as I've done it. You have to do your own record
> management in order to give Fortran the fixed-size chunks of data that
> it wants. It would take quite a while to elaborate on all the various
> gotchas. (Yes, the last block is one of them). The fact that it can get
> pretty messy is why I pushed for access='stream', which makes it all so
> much easier.

Have a look at http://flibs.sf.net for that particular approach. It is, unfortunately, not quite without problems, as Richard also indicates, but it could be useful nonetheless.

Regards,

Arjen |
|
#4
| On Aug 21, 10:10 pm, Terence <tbwri...@cantv.net> wrote:

> I was re-reading my huge Fortran 77/90/95 language specification
> manual and noted that the open file option ACCESS='BINARY' was
> accepted as an extension only for Windows up to NT, but not other
> platforms.
>
> I can solve some potential problems of upgrading Fortran programs to a
> more portable standard, by using DIRECT UNFORMATTED access, but I am
> left with the cases where programs read somewhat unknown input data
> files of words of bits, and proceed to determine the coding structure
> used (e.g. IBM 12 bit binary, IBM 16-bit binary, Qantum 12 bit card
> code, common 10,12,16 and 36 character ascii code and so on) before
> reopening the file in a more suitable way for reading that particular
> structure.
>
> To date I have used the BINARY option and read chunks of data to
> search for clues (e.g. searching for CR, LF, CR-LF, LF-CR, and DEL
> characters and the character interval counts between each, and the
> presence or absence of the top one or two bits in each character and
> whether any hex zero bytes occur).
>
> What is the simple definition of the expected structure of files
> declared as UNFORMATTED SEQUENTIAL? I had always thought these are
> expected to contain non-data markers.

The file formats used by different Fortran implementations are all over the map. There is no reason to expect direct-access unformatted files to be more portable than sequential-access unformatted files, even among implementations on similar operating systems.

I know of a Fortran implementation that allocates an extra byte at the start of each direct-access record. The byte is used, among other things, to indicate whether the record has been written.

I know of another Fortran implementation that builds its support for direct-access unformatted I/O on top of an indexed file system. It is nice in that it is possible to write records whose record numbers are huge without using much space. On the other hand, I/O performance tends to be lower than with the more common implementations.

The reason unformatted records are called unformatted records is that at one time they were. Early computer systems tended to use record-based I/O hardware. When all I/O is physically record-based, the record format is simply assumed. When reading or punching cards, there is no need to guess what the record structure might be. When reading or writing unblocked open reel tapes, the record structure is indicated by the physical inter-record gaps. No data needs to be provided to indicate the record structure.

Bob Corbett |
|
#5
| <robert.corbett@sun.com> wrote:

> There is no reason to expect direct-access
> unformatted files to be more portable than sequential-access
> unformatted files, even among implementations on similar
> operating systems.

While there may be no "reason", my observation shows it to be so. By "so" I am taking your "more portable" literally, in that "more portable" is not the same thing as "100% portable."

Yes, I also know of exceptions. But they are distinctly exceptions. You can work with an awful lot of compilers and never run into one of the exceptions. Some, though not all, of the exceptions can be addressed by compiler switches such as Lahey's /nohed. That's unlike sequential unformatted, where you find differences every time you turn around.

Since I have actually used direct access unformatted for this myself on a large variety of systems, I'm going to be pretty adamant about claiming that it can be done. I acknowledge that the portability is not 100%. I recall telling some people for whom it might have been relevant at the time that if they were going to try to port the code in question to a 60-bit CDC machine, they were just going to be out of luck and I wasn't going to be willing to try to support that. I still observe that its portability is significantly better than that of sequential unformatted in practice.

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain |
|
#6
| robert.corbett@sun.com wrote:

(snip)

> The reason unformatted records are called unformatted records
> is that at one time they were. Early computer systems tended
> to use record-based I/O hardware. When all I/O is physically
> record-based, the record format is simply assumed. When
> reading or punching cards, there is no need to guess what the
> record structure might be. When reading or writing unblocked
> open reel tapes, the record structure is indicated by the
> physical inter-record gaps. No data needs to be provided to
> indicate the record structure.

This is still done for many tape systems. For disks, most now use a fixed hardware block size and buffer it in memory to give the impression of a uniform byte stream to the user.

IBM mainframes use a record-oriented I/O system for disks. For direct access files, each record maps to a physical disk block allocated to the appropriate size. (It may be remapped inside the disk/controller hardware, but the record structure is visible to the OS.)

-- glen |
|
#7
| Yes, thanks to all responding. I should have said I worked on early Fortran (post Fortran II) for IBM from 1960 on, and know what was then "the standard" with respect to the two FORM formats and two ACCESS methods, right through 370 days. However, I have found weird changes to what I thought was sacrosanct, when working (for my next company for 28 years) all over the world on mainframes and process control and message switching computers, and finding count bytes stuck into UNFORMATTED SEQUENTIAL input files.

To respond to doubts as to what compiler I am referring to, I use CVF/DVF 6.6c for Windows work, and MS F77 v3.31 for all DOS work; both of which accept "BINARY" as a sequential access option. And Lahey accepts "Transparent" for the same useful purpose.

Given the above comments, I am now certain I will stick with DIRECT UNFORMATTED and process the data myself (as usual for variable-length records). I do wish the standard default for RECL on this mode was still bytes (and not 4-byte words) and not a compiler option. After all, the default (if not specified) in SEQUENTIAL is bytes, not 4-byte words - just the opposite!

I think I learned how to deal with the last, possibly incomplete, record for all possibilities (and there are quite a few). |
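The "process the data myself" approach, reading fixed-size blocks and rebuilding variable-length records across block boundaries, can be sketched as follows. This is an illustration in Python, not the poster's code: the names `iter_blocks` and `iter_records` are hypothetical, and the records are assumed to be delimiter-terminated (LF by default). Python reports a short final read directly; a Fortran direct-access program typically has to find the valid length of the last block some other way, which is the "last block" gotcha raised in the thread.

```python
import io


def iter_blocks(stream, block_size):
    """Yield (block, valid_length); a short final block is zero-padded."""
    while True:
        block = stream.read(block_size)
        if not block:
            return
        valid = len(block)
        # Pad so every block has the fixed size a direct-access read returns.
        yield block.ljust(block_size, b"\x00"), valid


def iter_records(stream, block_size, delimiter=b"\n"):
    """Reassemble delimiter-terminated records that may span blocks."""
    pending = b""
    for block, valid in iter_blocks(stream, block_size):
        pending += block[:valid]  # ignore padding in the last block
        *complete, pending = pending.split(delimiter)
        yield from complete
    if pending:
        yield pending  # final record with no trailing delimiter
```

Keeping the valid length separate from the padded block is the design choice that matters here: without it, padding bytes in the last block would be mistaken for data.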
|
#8
| Terence wrote:

> I do wish the standard default for RECL on this mode was still bytes
> (and not 4-byte words) and not a compiler option. After all the
> default (if not specified) in SEQUENTIAL, is bytes, not 4-byte words -
> just the opposite!

Since you're using CVF, the default for UNFORMATTED access is 4-byte units of RECL=. This has been the mode of DEC compilers for more than 30 years and comes from the F77 standard's use of the term "storage units" being interpreted as "numerical storage units" - that is, the size of an INTEGER or REAL.

In Fortran 2003, the standard still allows this but recommends the use of bytes (I forget the exact wording).

--
Steve Lionel
Developer Products Division
Intel Corporation
Nashua, NH

For email address, replace "invalid" with "com"

User communities for Intel Software Development Products
http://softwareforums.intel.com/
Intel Fortran Support
http://support.intel.com/support/per...etools/fortran
My Fortran blog
http://www.intel.com/software/drfortran |
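The practical effect of the two RECL conventions is simple arithmetic, shown here as a small Python sketch. The helper name `recl_for_bytes` is hypothetical; `unit_bytes` stands for whatever unit a given compiler uses for RECL= on unformatted files (4 under the convention described above, 1 where bytes are used).

```python
def recl_for_bytes(record_bytes: int, unit_bytes: int) -> int:
    """Return the RECL value needed to hold record_bytes of data."""
    if record_bytes < 0 or unit_bytes <= 0:
        raise ValueError("record_bytes must be >= 0 and unit_bytes > 0")
    # Round up: a partial unit still needs a whole unit of record length.
    return -(-record_bytes // unit_bytes)


# A 512-byte record under each convention.
recl_words = recl_for_bytes(512, 4)   # 128 when RECL counts 4-byte units
recl_bytes = recl_for_bytes(512, 1)   # 512 when RECL counts bytes
```

Passing a byte count where the compiler expects 4-byte units makes every record four times longer than intended, which is why the mismatch is so puzzling the first time it is met.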
|
#9
| Steve Lionel wrote:

> Terence wrote:
>
> > I do wish the standard default for RECL on this mode was still bytes
> > (and not 4-byte words) and not a compiler option. After all the
> > default (if not specified) in SEQUENTIAL, is bytes, not 4-byte words -
> > just the opposite!
>
> Since you're using CVF, the default for UNFORMATTED access is 4-byte
> units of RECL=. This has been the mode of DEC compilers for more than
> 30 years and comes from the F77 standard's use of the term "storage
> units" being interpreted as "numerical storage units" - that is, the
> size of an INTEGER or REAL.
>
> In Fortran 2003, the standard still allows this but recommends the use
> of bytes (I forget the exact wording).

Sensible! But the default is the opposite of IBM's 704/7044 etc. and Microsoft's AT default of one-byte counts, which causes puzzlement to a new non-DEC user of the CVF compiler. IBM MAY have had a different default for the mainframe in the later /370 days because of disk drives, but I only used these for the UCLA BMD Fortran statistical packages.

And the above words almost duplicate Steve's original help to me with precisely this problem when I installed and started to use the CVF 6.6 compiler "back when" this came out. A demonstration of why I would NEVER imply his refusal to help, Richard! (You have to set an override switch which is well hidden in Visual Studio). |
|
#10
| On Aug 23, 3:22 pm, Steve Lionel <steve.lio...@intel.invalid> wrote:

> Terence wrote:
> > I do wish the standard default for RECL on this mode was still bytes
> > (and not 4-byte words) and not a compiler option. After all the
> > default (if not specified) in SEQUENTIAL, is bytes, not 4-byte words -
> > just the opposite!
>
> Since you're using CVF, the default for UNFORMATTED access is 4-byte
> units of RECL=. This has been the mode of DEC compilers for more than
> 30 years and comes from the F77 standard's use of the term "storage
> units" being interpreted as "numerical storage units" - that is, the
> size of an INTEGER or REAL.
>
> In Fortran 2003, the standard still allows this but recommends the use
> of bytes (I forget the exact wording).

F2003 also has ISO_FORTRAN_ENV to supply the actual file storage unit size in bits. Hopefully, this will be commonly available in the near future.

P.S. My ISP recently cut off usenet access with little warning, and without very good reason (they could just exclude binary groups). If other people find themselves without usenet access, you can get c.l.f and others via proxy web sites like "Google Groups". |