C Interoperability - Strings

  #11  
Old 09-03-2008, 09:06 PM
Gary Scott
Guest
 
Default Re: C Interoperability - Strings

LR wrote:

> glen herrmannsfeldt wrote:
>
>
>>Note that null terminated strings aren't part of the C language,

>
>
> This is from an old copy of a C99 draft I have, sorry, no date.
>
> 5.2.1 Character sets
> [#2] In a character constant or string literal, members of
> the execution character set shall be represented by
> corresponding members of the source character set or by
> escape sequences consisting of the backslash \ followed by
> one or more characters. A byte with all bits set to 0,
> called the null character, shall exist in the basic
> execution character set; it is used to terminate a character
> string.


I think that I would have done it differently. I would have (if I were
sticking with this method of delimiting) required a leading character in
addition to a trailing character. The leading and trailing character
must match and must be a character not present in the literal string
itself. This allows you to change the delimiter, sometimes useful for
hardware devices that expect null characters for other purposes (e.g.
timing, noops). You could also add a function to set the string
delimiter I guess, but it might be messy to make that globally available.
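
To make the idea concrete, here is a rough sketch in C of reading such a string. The layout (first byte is the delimiter, same byte closes the string) and the name delimited_payload are my own assumptions for illustration, not an existing API. The caller passes the buffer size so the scan cannot run past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: buf[0] is the delimiter chosen by the writer,
   the payload follows, and the same delimiter byte closes the string.
   On success stores the payload start and length and returns 0;
   returns -1 if no closing delimiter is found within buflen bytes. */
int delimited_payload(const unsigned char *buf, size_t buflen,
                      const unsigned char **payload, size_t *len)
{
    assert(buf != NULL && payload != NULL && len != NULL);
    if (buflen < 2)
        return -1;                       /* need at least open + close */

    /* Look for the next occurrence of the opening byte. */
    const unsigned char *end = memchr(buf + 1, buf[0], buflen - 1);
    if (end == NULL)
        return -1;                       /* closing delimiter missing */

    *payload = buf + 1;
    *len = (size_t)(end - (buf + 1));
    return 0;
}
```

With '|' as the delimiter, the payload may contain null bytes, which is the point of the scheme for devices that use nulls for other purposes.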

>
> LR



--

Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford
  #12  
Old 09-03-2008, 09:42 PM
Gary Scott
Guest
 
Default Re: C Interoperability - Strings

glen herrmannsfeldt wrote:
> Gary Scott wrote:
> (snip)
>
>> I think that I would have done it differently. I would have (if I
>> were sticking with this method of delimiting) required a leading
>> character in addition to a trailing character. The leading and
>> trailing character must match and must be a character not present in
>> the literal string itself. This allows you to change the delimiter,
>> sometimes useful for hardware devices that expect null characters for
>> other purposes (e.g. timing, noops). You could also add a function to
>> set the string delimiter I guess, but it might be messy to make that
>> globally available.

>
>
> That method is used by some systems for delimiting strings as
> command input. I believe some DEC editors used it for string
> search commands, and maybe some unix editors, too. I don't
> know any that use it for strings in storage, though, but it
> does seem a good idea, and is only slightly harder to process.
>
> There are some algorithms that use the constant terminator
> to advantage, with pointers to the middle of a string,
> running until the next null terminator. One that I
> know about is the suffix array algorithm.
>
> Though I believe my choice is still to store the current
> length at the beginning.


That would be my preference. I'd probably also include a single byte at
the beginning to identify the size/kind of the following integer. That
way, you could handle virtually any string length, while being more
space efficient for shorter strings (you could use a one-byte length
(plus the 1-byte type byte)). You could specify a type (maybe in bits)
of a 64 or 128 bit integer for very long strings (128-bit seems unlikely
to be necessary, but flexible anyway). This scheme also allows you to
make a round trip write/read operation for variable length strings or DT
components without foreknowledge of the string length. You could also
include a rudimentary checksum for the length integer within the type
byte (unless you wanted to be able to specify in bits and use the full
range available).
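
A minimal sketch of that layout in C, under assumptions of my own: the type byte holds the width of the length field in bytes (1, 2, 4 or 8), the length is stored little-endian, and the checksum idea is left out. The names lp_encode and lp_decode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Smallest length-field width (in bytes) that can hold n. */
static unsigned width_for(uint64_t n)
{
    if (n <= 0xFFu) return 1;
    if (n <= 0xFFFFu) return 2;
    if (n <= 0xFFFFFFFFu) return 4;
    return 8;
}

/* Writes [width byte][length, little-endian][payload] into out.
   Returns the number of bytes written, or 0 if out is too small. */
size_t lp_encode(unsigned char *out, size_t outcap,
                 const void *data, size_t n)
{
    assert(out != NULL && data != NULL);
    unsigned w = width_for(n);
    if (outcap < 1 + w || outcap - 1 - w < n)
        return 0;
    out[0] = (unsigned char)w;
    for (unsigned i = 0; i < w; i++)
        out[1 + i] = (unsigned char)((uint64_t)n >> (8 * i));
    memcpy(out + 1 + w, data, n);
    return 1 + w + n;
}

/* Reads the header without knowing the string length in advance.
   On success stores the payload start and length and returns 0. */
int lp_decode(const unsigned char *in, size_t inlen,
              const unsigned char **payload, size_t *n)
{
    assert(in != NULL && payload != NULL && n != NULL);
    if (inlen < 1)
        return -1;
    unsigned w = in[0];
    if ((w != 1 && w != 2 && w != 4 && w != 8) || inlen - 1 < w)
        return -1;
    uint64_t len = 0;
    for (unsigned i = 0; i < w; i++)
        len |= (uint64_t)in[1 + i] << (8 * i);
    if (len > inlen - 1 - w)
        return -1;                       /* header claims more than we have */
    *payload = in + 1 + w;
    *n = (size_t)len;
    return 0;
}
```

A 3-character string costs 2 bytes of header here, a 300-character one costs 3, and the reader never needs the length ahead of time.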

>
> -- glen
>



  #13  
Old 09-03-2008, 10:16 PM
glen herrmannsfeldt
Guest
 
Default Re: C Interoperability - Strings

Gary Scott wrote:
(snip)

> I think that I would have done it differently. I would have (if I were
> sticking with this method of delimiting) required a leading character in
> addition to a trailing character. The leading and trailing character
> must match and must be a character not present in the literal string
> itself. This allows you to change the delimiter, sometimes useful for
> hardware devices that expect null characters for other purposes (e.g.
> timing, noops). You could also add a function to set the string
> delimiter I guess, but it might be messy to make that globally available.


That method is used by some systems for delimiting strings as
command input. I believe some DEC editors used it for string
search commands, and maybe some unix editors, too. I don't
know any that use it for strings in storage, though, but it
does seem a good idea, and is only slightly harder to process.

There are some algorithms that use the constant terminator
to advantage, with pointers to the middle of a string,
running until the next null terminator. One that I
know about is the suffix array algorithm.

Though I believe my choice is still to store the current
length at the beginning.
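
As a rough illustration of the suffix array point, here is a naive C sketch (sorting with qsort and strcmp, not one of the efficient construction algorithms). Each entry is just a pointer into the middle of the one text, and strcmp runs from there to the shared null terminator. The name build_suffix_array is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare two suffixes; each runs to the text's single terminator. */
static int cmp_suffix(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Fills sa[0..n-1] with pointers to the suffixes of text in sorted
   order, where n is strlen(text). No suffix is copied. */
void build_suffix_array(const char *text, const char **sa, size_t n)
{
    assert(text != NULL && sa != NULL);
    for (size_t i = 0; i < n; i++)
        sa[i] = text + i;               /* pointer into the middle */
    qsort(sa, n, sizeof *sa, cmp_suffix);
}
```

With a length-prefixed representation each suffix would need its own (pointer, length) pair instead of a bare pointer.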

-- glen

  #14  
Old 09-04-2008, 12:17 AM
LR
Guest
 
Default Re: C Interoperability - Strings

Gary Scott wrote:
> glen herrmannsfeldt wrote:


>> Though I believe my choice is still to store the current
>> length at the beginning.

>
> That would be my preference. [snip] This scheme also allows you to
> make a round trip write/read operation for variable length strings or DT
> components without foreknowledge of the string length. You could also
> include a rudimentary checksum for the length integer within the type
> byte (unless you wanted to be able to specify in bits and use the full
> range available).



At the risk of being very OT, you might find it interesting to look at
the design of C++'s std::string and std::stringstream.

I think I've seen some implementations of things similar to std::string
in some version of Fortran, with what might be some interoperability
with C strings, but sorry, I don't remember which version or where I saw it.

LR
  #15  
Old 09-04-2008, 12:50 AM
JayBee
Guest
 
Default Re: C Interoperability - Strings

On 2008-09-04, glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Gary Scott wrote:
> (snip)
>
>> I've yet to run across any C APIs that do not expect/return null
>> terminators, but if I ever do, it won't surprise me as much.

>
> See strncpy() in the standard C library.


The way I see it, C strings are supposed to be null terminated. The point
of strncpy and similar length-limited functions is to be less prone to
buffer overflows. strncpy(), for example, will stop copying when hitting
null, just like the original strcpy(); the extra size argument is just a
safety precaution in case the null hasn't been encountered within a
specified limit. And yes, the fact that strncpy() does not null
terminate the target string in case it hits the limit is a design
mistake, not a feature. See strlcpy() for an as yet unstandardized
function that corrects that particular error.
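
A small sketch of the usual workaround in plain standard C, for systems without strlcpy(). The helper name copy_terminated is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Copies at most cap-1 characters of src into dst and always
   terminates dst. Returns 1 if src fit completely, 0 if truncated. */
int copy_terminated(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    strncpy(dst, src, cap - 1);  /* pads with '\0' if src is shorter */
    dst[cap - 1] = '\0';         /* strncpy alone leaves this unset on overflow */
    return strlen(src) < cap;
}
```

Copying "abcdef" into a 4-byte buffer this way yields "abc" and a return value of 0, so the caller can at least detect the truncation.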

Most other languages (Fortran included) have string types that are
easier to use and less error prone than the frankly retarded way in
which C does it.

--
JayBee
  #16  
Old 09-04-2008, 08:12 AM
Gary Scott
Guest
 
Default Re: C Interoperability - Strings

Gary Scott wrote:
> glen herrmannsfeldt wrote:
>
>> Gary Scott wrote:
>> (snip)
>>
>>> I think that I would have done it differently. I would have (if I
>>> were sticking with this method of delimiting) required a leading
>>> character in addition to a trailing character. The leading and
>>> trailing character must match and must be a character not present in
>>> the literal string itself. This allows you to change the delimiter,
>>> sometimes useful for hardware devices that expect null characters for
>>> other purposes (e.g. timing, noops). You could also add a function
>>> to set the string delimiter I guess, but it might be messy to make
>>> that globally available.

>>
>>
>>
>> That method is used by some systems for delimiting strings as
>> command input. I believe some DEC editors used it for string
>> search commands, and maybe some unix editors, too. I don't
>> know any that use it for strings in storage, though, but it
>> does seem a good idea, and is only slightly harder to process.
>>
>> There are some algorithms that use the constant terminator
>> to advantage, with pointers to the middle of a string,
>> running until the next null terminator. One that I
>> know about is the suffix array algorithm.
>>
>> Though I believe my choice is still to store the current
>> length at the beginning.

>
>
> That would be my preference. I'd probably also include a single byte at
> the beginning to identify the size/kind of the following integer. That
> way, you could handle virtually any string length, while being more
> space efficient for shorter strings (you could use a one-byte length
> (plus the 1-byte type byte)). You could specify a type (maybe in bits)
> of a 64 or 128 bit integer for very long strings (128-bit seems unlikely
> to be necessary, but flexible anyway). This scheme also allows you to
> make a round trip write/read operation for variable length strings or DT
> components without foreknowledge of the string length. You could also
> include a rudimentary checksum for the length integer within the type
> byte (unless you wanted to be able to specify in bits and use the full
> range available).


Hmm, we could get even fancier: Include a definition of the number of
bits in a "byte", oh the possibilities are endless.

>
>>
>> -- glen
>>

>
>



  #17  
Old 09-15-2008, 02:44 AM
David Thompson
Guest
 
Default Re: C Interoperability - Strings

On Wed, 03 Sep 2008 18:20:12 -0500, Gary Scott
<garylscott@sbcglobal.net> wrote:

> glen herrmannsfeldt wrote:
> > Gary Scott wrote:
> > (snip)
> >
> >> No, I only use F95 compilers at present (CVF, LF95, Absoft F90) that
> >> interoperate with normal character buffers of whatever length as long
> >> as you insert a null character at the end of the trimmed length. Are
> >> you saying that I must use an array to designate a string? Boy that
> >> would be inconvenient.

> >
> >
> > Note that null terminated strings aren't part of the C language,


(literals for them are, as already corrected)

> > but are used by some C library routines. There are many library
> > routines that will work just fine with strings without null
> > terminators. As in a previous post on sorting, there is
> > strncmp which will compare strings up to a specified length
> > (or to a null character if that comes first.) There is
> > also memcmp() to compare strings containing null characters.

>

All mem* treat nulls as ordinary data; strn* don't _require_ null
terminators but do honor them on input (and strncat always produces
them on output, but as noted strncpy doesn't get that one right).
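
To illustrate the difference on input, a tiny sketch with two wrapper functions (the names same_bytes and same_prefix are only for this example):

```c
#include <assert.h>
#include <string.h>

/* 1 if the first n bytes are identical; nulls are ordinary data. */
int same_bytes(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0;
}

/* 1 if equal as strings within n characters; comparison stops
   at the first null even when n would allow more. */
int same_prefix(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, n) == 0;
}
```

For the 5-byte buffers "ab\0cd" and "ab\0xy", same_prefix reports equal because it stops at the null, while same_bytes reports different.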

> I've yet to run across any C APIs that do not expect/return null
> terminators, but if I ever do, it won't surprise me as much.
>

In standard C, in addition to mem*, fread/fwrite (when applied to
strings; they can also apply to other data).
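
A minimal sketch of that, assuming tmpfile() is usable on the system: the byte count is passed explicitly, so embedded nulls survive the round trip. The function name roundtrip is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Writes n bytes to a temporary file and reads them back into out.
   Returns the number of bytes read back, or 0 on failure. */
size_t roundtrip(const void *data, size_t n, void *out)
{
    assert(data != NULL && out != NULL);
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;
    size_t got = 0;
    if (fwrite(data, 1, n, f) == n) {   /* length given, no terminator needed */
        rewind(f);
        got = fread(out, 1, n, f);
    }
    fclose(f);
    return got;
}
```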

mbstowcs and its reverse wcstombs similarly don't require but do honor
terminators, and in theory so do mbtowc and wctomb, but the way the
latter are used they will rarely and perhaps never actually encounter
terminators.

In Unix/POSIX, which historically is closely associated with C
although not strictly required, all the I/O: read/write, send/recv,
msgsnd/msgrcv, etc.; again this could be applied to string or other data.
Win32 I/O is the same, although it's arguable if it's really a 'C API'.
OTOH both of those treat filenames (including programnames) as null
terminated; Tandem^WCompaq^WHP NonStop Guardian (mode) handles even
filenames as pointer-and-length (or pointer&maxlen&actlen for output).

- formerly david.thompson1 || achar(64) || worldnet.att.net