|
#11
| LR wrote:
> glen herrmannsfeldt wrote:
>
>> Note that null terminated strings aren't part of the C language,
>
> This is from an old copy of a C99 draft I have, sorry, no date.
>
> 5.2.1 Character sets
> [#2] In a character constant or string literal, members of
> the execution character set shall be represented by
> corresponding members of the source character set or by
> escape sequences consisting of the backslash \ followed by
> one or more characters. A byte with all bits set to 0,
> called the null character, shall exist in the basic
> execution character set; it is used to terminate a character
> string.

I think that I would have done it differently. I would have (if I were sticking with this method of delimiting) required a leading character in addition to a trailing character. The leading and trailing character must match and must be a character not present in the literal string itself. This allows you to change the delimiter, sometimes useful for hardware devices that expect null characters for other purposes (e.g. timing, noops). You could also add a function to set the string delimiter I guess, but it might be messy to make that globally available.

>
> LR

--
Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows it can't be done.

-- Henry Ford |
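As a rough illustration of the matching-delimiter idea in the post above, here is a minimal C sketch. It assumes the first byte of the buffer is the delimiter and that the payload runs up to the next occurrence of that same byte. The function name `delimited_payload` is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: buf[0] is taken as the delimiter, and the payload is
 * everything between it and the next occurrence of the same byte.
 * Returns a pointer to the payload and stores its length in *len, or
 * returns NULL if no closing delimiter is found within buf[0..n). */
const char *delimited_payload(const char *buf, size_t n, size_t *len)
{
    assert(buf != NULL && len != NULL);
    if (n < 2)
        return NULL;                       /* need opening and closing byte */

    /* Search only the bytes after the opening delimiter. */
    const char *end = memchr(buf + 1, buf[0], n - 1);
    if (end == NULL)
        return NULL;                       /* unterminated */

    *len = (size_t)(end - (buf + 1));
    return buf + 1;
}
```

Because the delimiter is chosen per string, the payload may contain null bytes, as long as it does not contain the chosen delimiter itself.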
|
#12
| glen herrmannsfeldt wrote:
> Gary Scott wrote:
> (snip)
>
>> I think that I would have done it differently. I would have (if I
>> were sticking with this method of delimiting) required a leading
>> character in addition to a trailing character. The leading and
>> trailing character must match and must be a character not present in
>> the literal string itself. This allows you to change the delimiter,
>> sometimes useful for hardware devices that expect null characters for
>> other purposes (e.g. timing, noops). You could also add a function to
>> set the string delimiter I guess, but it might be messy to make that
>> globally available.
>
> That method is used by some systems for delimiting strings as
> command input. I believe some DEC editors used it for string
> search commands, and maybe some unix editors, too. I don't
> know any that use it for strings in storage, though, but it
> does seem a good idea, and is only slightly harder to process.
>
> There are some algorithms that use the constant terminator
> to advantage, with pointers to the middle of a string,
> running until the next null terminator. One that I
> know about is the suffix array algorithm.
>
> Though I believe my choice is still to store the current
> length at the beginning.

That would be my preference. I'd probably also include a single byte at the beginning to identify the size/kind of the following integer. That way, you could handle virtually any string length, while being more space efficient for shorter strings (you could use a one-byte length (plus the 1-byte type byte)). You could specify a type (maybe in bits) of a 64 or 128 bit integer for very long strings (128-bit seems unlikely to be necessary, but flexible anyway). This scheme also allows you to make a round trip write/read operation for variable length strings or DT components without foreknowledge of the string length. You could also include a rudimentary checksum for the length integer within the type byte (unless you wanted to be able to specify in bits and use the full range available).

>
> -- glen
>

--
Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows it can't be done.

-- Henry Ford |
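The kind-byte-plus-length layout described above can be sketched in C roughly as follows. Everything specific here is an assumption made for illustration: the kind byte holds the width of the length field in bytes (1, 2, 4 or 8) rather than in bits, the length is stored little-endian, no checksum is included, and the names `lp_encode` and `lp_decode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (not specified in the thread):
 *   byte 0     : kind = number of bytes in the length field (1, 2, 4 or 8)
 *   bytes 1..k : length, little-endian
 *   remainder  : character data, which may contain '\0' bytes
 */

/* Pick the smallest length field that can hold len. */
static unsigned char lp_kind_for(uint64_t len)
{
    if (len <= 0xFFu)       return 1;
    if (len <= 0xFFFFu)     return 2;
    if (len <= 0xFFFFFFFFu) return 4;
    return 8;
}

/* Encode data[0..len) into out[0..cap).
 * Returns the number of bytes written, or 0 if out is too small. */
size_t lp_encode(unsigned char *out, size_t cap, const void *data, size_t len)
{
    assert(out != NULL && data != NULL);
    unsigned char kind = lp_kind_for(len);
    size_t need = 1u + kind + len;
    if (need < len || cap < need)          /* size_t overflow or no room */
        return 0;

    out[0] = kind;
    for (unsigned i = 0; i < kind; i++)
        out[1 + i] = (unsigned char)((uint64_t)len >> (8 * i));
    memcpy(out + 1 + kind, data, len);
    return need;
}

/* Decode one string from in[0..avail). On success stores the payload
 * pointer and length and returns the bytes consumed; returns 0 if the
 * input is malformed or truncated. */
size_t lp_decode(const unsigned char *in, size_t avail,
                 const unsigned char **data, size_t *len)
{
    assert(in != NULL && data != NULL && len != NULL);
    if (avail < 1)
        return 0;
    unsigned char kind = in[0];
    if (kind != 1 && kind != 2 && kind != 4 && kind != 8)
        return 0;
    if (avail < 1u + kind)
        return 0;

    uint64_t n = 0;
    for (unsigned i = 0; i < kind; i++)
        n |= (uint64_t)in[1 + i] << (8 * i);
    if (n > avail - 1u - kind)             /* payload would run past input */
        return 0;

    *data = in + 1 + kind;
    *len = (size_t)n;
    return 1u + kind + (size_t)n;
}
```

A reader only needs the first byte to learn how wide the length field is, which is what makes the round trip possible without knowing the string length in advance. Short strings cost two bytes of overhead under these assumptions.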
|
#13
| Gary Scott wrote:
(snip)

> I think that I would have done it differently. I would have (if I were
> sticking with this method of delimiting) required a leading character in
> addition to a trailing character. The leading and trailing character
> must match and must be a character not present in the literal string
> itself. This allows you to change the delimiter, sometimes useful for
> hardware devices that expect null characters for other purposes (e.g.
> timing, noops). You could also add a function to set the string
> delimiter I guess, but it might be messy to make that globally available.

That method is used by some systems for delimiting strings as command input. I believe some DEC editors used it for string search commands, and maybe some unix editors, too. I don't know any that use it for strings in storage, though, but it does seem a good idea, and is only slightly harder to process.

There are some algorithms that use the constant terminator to advantage, with pointers to the middle of a string, running until the next null terminator. One that I know about is the suffix array algorithm.

Though I believe my choice is still to store the current length at the beginning.

-- glen |
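To make the suffix array remark concrete, here is a deliberately naive C sketch. Each entry is simply a pointer into the middle of one null-terminated buffer, and `strcmp` runs from that pointer to the shared terminator. It sorts with `qsort`, so it is far slower than the specialized construction algorithms. The name `build_suffix_array` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare two suffixes; each one is just a pointer into the same buffer,
 * and strcmp stops at the single terminator they all share. */
static int cmp_suffix(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Hypothetical helper: returns a malloc'd array of strlen(text) pointers,
 * one per suffix, in lexicographic order. Stores the count in *count.
 * Returns NULL on allocation failure. The caller frees the array. */
const char **build_suffix_array(const char *text, size_t *count)
{
    assert(text != NULL && count != NULL);
    size_t n = strlen(text);

    const char **sa = malloc((n ? n : 1) * sizeof *sa);
    if (sa == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        sa[i] = text + i;                  /* pointer into the middle */

    qsort(sa, n, sizeof *sa, cmp_suffix);
    *count = n;
    return sa;
}
```

No suffix is ever copied: with a length-prefixed representation each suffix would need its own length, whereas here one terminator serves all of them.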
|
#14
| Gary Scott wrote:
> glen herrmannsfeldt wrote:
>> Though I believe my choice is still to store the current
>> length at the beginning.
>
> That would be my preference. [snip] This scheme also allows you to
> make a round trip write/read operation for variable length strings or DT
> components without foreknowledge of the string length. You could also
> include a rudimentary checksum for the length integer within the type
> byte (unless you wanted to be able to specify in bits and use the full
> range available).

At the risk of being very OT, you might find it interesting to look at the design of C++'s std::string and std::stringstream.

I think I've seen some implementations of things similar to std::string in some version of Fortran, with what might be some interoperability with C strings, but sorry, I don't remember which version or where I saw it.

LR |
|
#15
| On 2008-09-04, glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Gary Scott wrote:
> (snip)
>
>> I've yet to run across any C APIs that do not expect/return null
>> terminators, but if I ever do, it won't surprise me as much.
>
> See strncpy() in the standard C library.

The way I see it, C strings are supposed to be null terminated. The point of strncpy and similar bounded functions (some of them, such as snprintf, added in C99) is to be less prone to buffer overflows. strncpy(), for example, will stop copying when hitting null, just like the original strcpy(); the extra size argument is just a safety precaution in case the null hasn't been encountered within a specified limit.

And yes, the fact that strncpy() does not null terminate the target string in case it hits the limit is a design mistake, not a feature. See strlcpy() for an as yet unstandardized function that corrects that particular error.

Most other languages (Fortran included) have string types that are easier to use and less error prone than the frankly clumsy way in which C does it.

--
JayBee |
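The strncpy() behaviour under discussion can be checked with a small C sketch. The helpers `has_terminator` and `copy_terminated` are hypothetical names used only for illustration. The second one forces a terminator in the style the post attributes to strlcpy(), but it is not that function and has a different return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..cap) contains a '\0' byte, 0 otherwise. */
int has_terminator(const char *buf, size_t cap)
{
    assert(buf != NULL);
    return memchr(buf, '\0', cap) != NULL;
}

/* Hypothetical wrapper: copy src into dst[0..cap) and always terminate,
 * truncating if necessary. Returns 1 if the whole string fit, 0 if it
 * was truncated or cap is 0. */
int copy_terminated(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (cap == 0)
        return 0;

    /* strncpy pads with '\0' when src is shorter than the limit, but
     * writes no terminator at all when src is cap - 1 bytes or longer. */
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';                   /* so add one unconditionally */
    return strlen(src) < cap;
}
```

Calling `strncpy(buf, "abcdef", 4)` on a four-byte buffer fills all four bytes with `abcd` and leaves no terminator, which is the case both posters are pointing at.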
|
#16
| Gary Scott wrote:
> glen herrmannsfeldt wrote:
>
>> Gary Scott wrote:
>> (snip)
>>
>>> I think that I would have done it differently. I would have (if I
>>> were sticking with this method of delimiting) required a leading
>>> character in addition to a trailing character. The leading and
>>> trailing character must match and must be a character not present in
>>> the literal string itself. This allows you to change the delimiter,
>>> sometimes useful for hardware devices that expect null characters for
>>> other purposes (e.g. timing, noops). You could also add a function
>>> to set the string delimiter I guess, but it might be messy to make
>>> that globally available.
>>
>> That method is used by some systems for delimiting strings as
>> command input. I believe some DEC editors used it for string
>> search commands, and maybe some unix editors, too. I don't
>> know any that use it for strings in storage, though, but it
>> does seem a good idea, and is only slightly harder to process.
>>
>> There are some algorithms that use the constant terminator
>> to advantage, with pointers to the middle of a string,
>> running until the next null terminator. One that I
>> know about is the suffix array algorithm.
>>
>> Though I believe my choice is still to store the current
>> length at the beginning.
>
> That would be my preference. I'd probably also include a single byte at
> the beginning to identify the size/kind of the following integer. That
> way, you could handle virtually any string length, while being more
> space efficient for shorter strings (you could use a one-byte length
> (plus the 1-byte type byte)). You could specify a type (maybe in bits)
> of a 64 or 128 bit integer for very long strings (128-bit seems unlikely
> to be necessary, but flexible anyway). This scheme also allows you to
> make a round trip write/read operation for variable length strings or DT
> components without foreknowledge of the string length. You could also
> include a rudimentary checksum for the length integer within the type
> byte (unless you wanted to be able to specify in bits and use the full
> range available).

Hmm, we could get even fancier: Include a definition of the number of bits in a "byte", oh the possibilities are endless.

>> -- glen

--
Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows it can't be done.

-- Henry Ford |
|
#17
| On Wed, 03 Sep 2008 18:20:12 -0500, Gary Scott <garylscott@sbcglobal.net> wrote:

> glen herrmannsfeldt wrote:
> > Gary Scott wrote:
> > (snip)
> >
> >> No, I only use F95 compilers at present (CVF, LF95, Absoft F90) that
> >> interoperate with normal character buffers of whatever length as long
> >> as you insert a null character at the end of the trimmed length. Are
> >> you saying that I must use an array to designate a string? Boy that
> >> would be inconvenient.
> >
> > Note that null terminated strings aren't part of the C language,

(literals for them are, as already corrected)

> > but are used by some C library routines. There are many library
> > routines that will work just fine with strings without null
> > terminators. As in a previous post on sorting, there is
> > strncmp which will compare strings up to a specified length
> > (or to a null character if that comes first.) There is
> > also memcmp() to compare strings containing null characters.

All mem* treat nulls as ordinary data; strn* don't _require_ null terminators but do honor them on input (and strncat always produces them on output, but as noted strncpy doesn't get that one right).

> I've yet to run across any C APIs that do not expect/return null
> terminators, but if I ever do, it won't surprise me as much.

In standard C, in addition to mem*, fread/fwrite (when applied to strings; they can also apply to other data). mbstowcs/reverse similarly don't require but do honor terminators, and in theory so do mbtowc/reverse, but the way the latter are used they will rarely and perhaps never actually encounter terminators.

In Unix/POSIX, which historically is closely associated with C although not strictly required, all the I/O: read/write, send/recv, msgput/get, etc.; again this could be applied to string or other data. Win32 I/O the same, although it's arguable if it's really a 'C API'.

OTOH both of those treat filenames (including programnames) as null terminated; Tandem^WCompaq^WHP NonStop Guardian (mode) handles even filenames as pointer-and-length (or pointer&maxlen&actlen for output).

- formerly david.thompson1 || achar(64) || worldnet.att.net |
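The difference between the mem* and strn* families noted above shows up as soon as the data contains an embedded null. A minimal C sketch, with hypothetical wrapper names around the two standard calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Equal when viewed as C strings: strncmp compares at most n characters
 * and stops early at the first '\0' it meets in both inputs. */
int same_as_c_strings(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, n) == 0;
}

/* Equal when viewed as raw bytes: memcmp always compares all n bytes
 * and treats '\0' like any other value. */
int same_as_bytes(const void *a, const void *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0;
}
```

For the five-byte buffers `{'a','b','\0','c','d'}` and `{'a','b','\0','x','y'}`, the first helper reports a match because the comparison ends at the null, while the second reports a difference.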