Reading long lines from a file - C

  1. Reading long lines from a file

    Hello,

    I suspect this comes up quite often, but I haven't found an exact
    solution in the FAQ. I have to read and parse a file with arbitrarily
    long lines and have come up with the following plan:

    1. start with a statically allocated buffer and a pointer of equal size
    2. read into the buffer using fgets and append to the pointer
    3. if buffer does not contain '\n', reallocate buffer and jump to 2
    4. return the pointer

    Do you see anything wrong with this? If so, how can I improve it?

    Thanks in advance,
    Vlad Dogaru

    --
    Number one reason to date an engineer:
    The world does revolve around us; we pick the coordinate system.

  2. Re: Reading long lines from a file

    Vlad Dogaru said:

    > Hello,
    >
    > I suspect this comes up quite often, but I haven't found an exact
    > solution in the FAQ. I have to read and parse a file with arbitrarily
    > long lines and have come up with the following plan:
    >
    > 1. start with a statically allocated buffer and a pointer of equal size
    > 2. read into the buffer using fgets and append to the pointer
    > 3. if buffer does not contain '\n', reallocate buffer and jump to 2
    > 4. return the pointer
    >
    > Do you see anything wrong with this? If so, how can I improve it?


    To start with, you can't reallocate a statically allocated buffer! Nor
    can you have a pointer of equal size to a buffer except by sizing the
    buffer to be the same size as a pointer. Nor can you append to a
    pointer.

    Once we get those impossibilities out of the way, we can dispense with
    the unnecessary fgets call - your input is already buffered, so why
    buffer it again through fgets?

    Here's the plan:

    Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
    at this allocation with P. Set U to 0. Have a temporary pointer T
    kicking about the place.

    While you can read a character successfully that isn't a newline:
        If U == C - 1
            You're about to run out of space, so get some more
            T = realloc(P, C * 2)
            If that didn't work, you might want to try lower multipliers
            (1.5, 1.25 maybe) or even use add instead of multiply - and
            warn the caller that you're running low on RAM.
            Eventually, either you give up (in which case tell the user
            you failed), or you succeed, in which case set P = T
            Increase C to describe the new allocation amount accurately
        Endif

        If all is well
            P[U++] = the character you read
        Endif
    Endwhile
    If all is well
        P[U] = '\0'
    End if
    P now contains the line.
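    A minimal C sketch of this plan follows. The names `read_line` and `grow_buffer`, the initial 64-byte allocation and the exact fallback steps (+100%, +50%, +25%, then a fixed +16 bytes) are assumptions chosen for illustration, not part of the plan itself. The function returns NULL both at end of file and when memory runs out; a caller can tell those apart with feof()/ferror() on the stream.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: grow the allocation *p of *cap bytes. Doubling is
   tried first, then smaller steps, as the plan suggests.
   Returns 1 on success, 0 if every attempt failed (*p is still valid). */
static int grow_buffer(char **p, size_t *cap)
{
    size_t steps[4] = { *cap, *cap / 2, *cap / 4, 16 }; /* +100%, +50%, +25%, +16 */

    for (size_t i = 0; i < 4; i++) {
        char *t;
        if (steps[i] == 0 || *cap > SIZE_MAX - steps[i])
            continue;                     /* skip empty steps and size_t overflow */
        t = realloc(*p, *cap + steps[i]); /* T = realloc(P, new size) */
        if (t != NULL) {
            *p = t;                       /* P = T */
            *cap += steps[i];             /* C describes the new allocation */
            return 1;
        }
    }
    return 0;
}

/* Read one line from fp a character at a time; the newline is not stored.
   Returns a malloc'd string the caller must free(), or NULL if nothing was
   read (end of file or read error) or if memory ran out. */
char *read_line(FILE *fp)
{
    size_t cap = 64;   /* C: bytes allocated (must be greater than 1) */
    size_t used = 0;   /* U: characters stored so far */
    char *p;           /* P */
    int ch;

    assert(fp != NULL);
    p = malloc(cap);
    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == cap - 1 && !grow_buffer(&p, &cap)) {
            free(p);                      /* give up and tell the caller */
            return NULL;
        }
        p[used++] = (char)ch;
    }
    if (ch == EOF && used == 0) {
        free(p);                          /* nothing was read */
        return NULL;
    }
    p[used] = '\0';
    return p;
}
```

    Each call returns the next line as a fresh allocation; loop until it returns NULL and free() each result when you are done with it.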

    For a discussion of long-line issues, an implementation of a full line
    capture function, and links to other such implementations, see
    http://www.cpax.org.uk/prg/writings/fgetdata.php

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999

  3. Re: Reading long lines from a file

    Vlad Dogaru wrote:
    >
    > Hello,
    >
    > I suspect this comes up quite often, but I haven't found an exact
    > solution in the FAQ. I have to read and parse a file with arbitrarily
    > long lines and have come up with the following plan:
    >
    > 1. start with a statically allocated buffer and a pointer of equal size
    > 2. read into the buffer using fgets and append to the pointer
    > 3. if buffer does not contain '\n', reallocate buffer and jump to 2
    > 4. return the pointer
    >
    > Do you see anything wrong with this?


    Possibly with the phrase "statically allocated".
    There are three kinds of storage duration:
    1 automatic
    2 static
    3 allocated

    Only allocated memory can be reallocated.
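    To make the distinction concrete, here is a small C sketch; the names `static_buf` and `copy_to_allocated` are hypothetical. If the text you want to grow currently lives in a static or automatic array, copy it into allocated storage first; only that copy may be handed to realloc or free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Static storage duration: exists for the whole run of the program.
   Passing it to realloc or free is undefined behaviour. */
static char static_buf[16] = "static";

/* Hypothetical helper: copy the string src (which may live in static,
   automatic or allocated storage) into a new allocation of new_size
   bytes. Only the returned pointer may later be given to realloc/free. */
char *copy_to_allocated(const char *src, size_t new_size)
{
    size_t len;
    char *p;

    assert(src != NULL);
    len = strlen(src);
    if (new_size <= len)
        return NULL;               /* no room for src plus its '\0' */
    p = malloc(new_size);          /* allocated storage duration */
    if (p != NULL)
        memcpy(p, src, len + 1);
    return p;
}
```

    Calling realloc directly on a static or automatic array (for example `realloc(static_buf, 64)`) would be undefined behaviour; calling it on the pointer returned above is fine.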

    > If so, how can I improve it?


    A few of the regulars here
    have written their own getline functions:
    http://www.cpax.org.uk/prg/writings/...ta.php#related

    --
    pete

  4. Re: Reading long lines from a file

    Richard Heathfield wrote:
    > Vlad Dogaru said:
    >
    >> Hello,
    >>
    >> I suspect this comes up quite often, but I haven't found an exact
    >> solution in the FAQ. I have to read and parse a file with arbitrarily
    >> long lines and have come up with the following plan:
    >>
    >> 1. start with a statically allocated buffer and a pointer of equal size
    >> 2. read into the buffer using fgets and append to the pointer
    >> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
    >> 4. return the pointer
    >>
    >> Do you see anything wrong with this? If so, how can I improve it?

    >
    > To start with, you can't reallocate a statically allocated buffer! Nor
    > can you have a pointer of equal size to a buffer except by sizing the
    > buffer to be the same size as a pointer. Nor can you append to a
    > pointer.
    >
    > Once we get those impossibilities out of the way, we can dispense with
    > the unnecessary fgets call - your input is already buffered, so why
    > buffer it again through fgets?



    If anything, my lack of English skills has contributed to the
    misunderstanding. I was talking about:
    char b[100], *p;
    Reading into b with fgets, then reallocating p as necessary to do a
    strcat(p, b).
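    A sketch of this fgets-based plan, under stated assumptions: the name `read_line_fgets` is hypothetical, the trailing newline is kept, and p starts out as a null pointer (realloc(NULL, n) behaves like malloc(n)), so it is always something realloc may legally receive. One deliberate change from the description: the length of p is tracked and each chunk is copied straight to its end, instead of using strcat, which rescans p from the start every time. The buffer grows by one chunk per realloc for simplicity; a geometric growth factor would reduce the number of realloc calls.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fgets-based reader. Returns a malloc'd line (newline kept
   if present) that the caller must free(), or NULL at end of file or if
   memory runs out. */
char *read_line_fgets(FILE *fp)
{
    char b[100];      /* fixed-size chunk buffer (automatic storage) */
    char *p = NULL;   /* allocated result; NULL is a valid realloc argument */
    size_t len = 0;   /* length of the string in p, tracked instead of strcat */

    assert(fp != NULL);
    while (fgets(b, sizeof b, fp) != NULL) {
        size_t n = strlen(b);
        char *t = realloc(p, len + n + 1);
        if (t == NULL) {
            free(p);
            return NULL;
        }
        p = t;
        memcpy(p + len, b, n + 1);   /* append at the known end, with '\0' */
        len += n;
        if (n > 0 && b[n - 1] == '\n')
            break;                   /* the whole line has been read */
    }
    return p;   /* still NULL if fgets read nothing at all */
}
```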

    But your solution is much more elegant and now I see why fgets is
    unnecessary.

    >
    > Here's the plan:
    >
    > Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
    > at this allocation with P. Set U to 0. Have a temporary pointer T
    > kicking about the place.
    >
    > While you can read a character successfully that isn't a newline:
    >     If U == C - 1
    >         You're about to run out of space, so get some more
    >         T = realloc(P, C * 2)
    >         If that didn't work, you might want to try lower multipliers
    >         (1.5, 1.25 maybe) or even use add instead of multiply - and
    >         warn the caller that you're running low on RAM.
    >         Eventually, either you give up (in which case tell the user
    >         you failed), or you succeed, in which case set P = T
    >         Increase C to describe the new allocation amount accurately
    >     Endif
    >
    >     If all is well
    >         P[U++] = the character you read
    >     Endif
    > Endwhile
    > If all is well
    >     P[U] = '\0'
    > End if
    > P now contains the line.
    >
    > For a discussion of long-line issues, an implementation of a full line
    > capture function, and links to other such implementations, see
    > http://www.cpax.org.uk/prg/writings/fgetdata.php


    Thank you for the clarification and the link. I will look into it and I
    am confident that I can write a similar function.

    Vlad
    --
    Number one reason to date an engineer:
    The world does revolve around us; we pick the coordinate system.

  5. Re: Reading long lines from a file

    Vlad Dogaru wrote:
    > Hello,
    >
    > I suspect this comes up quite often, but I haven't found an exact
    > solution in the FAQ. I have to read and parse a file with arbitrarily
    > long lines and have come up with the following plan:
    >
    > 1. start with a statically allocated buffer and a pointer of equal size
    > 2. read into the buffer using fgets and append to the pointer
    > 3. if buffer does not contain '\n', reallocate buffer and jump to 2
    > 4. return the pointer
    >
    > Do you see anything wrong with this? If so, how can I improve it?


    This may not apply to your particular case, but in some of the cases I have
    encountered involving "arbitrarily long lines", one can just read a character
    at a time, examine it, perform some action, and then continue. This
    removes the need for a huge buffer, which in the worst case, might not
    even fit into the computer's memory. Obviously this won't work if any
    modification to the front of the line depends on a value near the end of
    the line.
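    As a sketch of this approach, assuming the per-line work is simply counting, the hypothetical `scan_lines` below reports how many lines a stream has and how long the longest one is. It uses a constant amount of memory no matter how long the lines are, because no line is ever stored.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical summary of a stream, gathered one character at a time. */
struct line_stats {
    unsigned long lines;     /* number of lines seen */
    unsigned long longest;   /* length of the longest line, newline excluded */
};

struct line_stats scan_lines(FILE *fp)
{
    struct line_stats st = { 0, 0 };
    unsigned long current = 0;   /* length of the line being scanned */
    int ch;

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {        /* end of a line: act on it, then reset */
            st.lines++;
            if (current > st.longest)
                st.longest = current;
            current = 0;
        } else {
            current++;           /* examine the character, keep nothing */
        }
    }
    if (current > 0) {           /* final line without a trailing newline */
        st.lines++;
        if (current > st.longest)
            st.longest = current;
    }
    return st;
}
```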

    If you do go with the expanding buffer method, be sure that you do
    NOT use strcat() to append each new chunk of text. Doing so will result
    in each such addition scanning from the front of the buffer for the
    terminal '\0' in the string. I've seen this bug many, many times.
    It can cause a huge performance hit. Instead, keep track of the
    length of the string in the buffer and just copy the new string directly
    to the appropriate position, then adjust the length variable, and repeat.
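    A minimal sketch of that advice, assuming a small hypothetical `struct strbuf` type: the buffer records its own length and capacity, so each append is a single memcpy to the known end, and the capacity doubles whenever it runs out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string that remembers its own length, so an
   append never has to search for the terminating '\0' as strcat does. */
struct strbuf {
    char *data;   /* allocated; '\0'-terminated once anything is appended */
    size_t len;   /* length of the string in data */
    size_t cap;   /* bytes allocated for data */
};

/* Append n bytes from s. Returns 0 on success, -1 if the size would
   overflow or memory runs out; existing contents stay valid either way. */
int strbuf_append(struct strbuf *sb, const char *s, size_t n)
{
    size_t need;

    assert(sb != NULL && s != NULL);
    if (n >= SIZE_MAX - sb->len)
        return -1;                         /* len + n + 1 would overflow */
    need = sb->len + n + 1;
    if (need > sb->cap) {
        size_t newcap = sb->cap ? sb->cap : 16;
        char *t;
        while (newcap < need)
            newcap = (newcap > SIZE_MAX / 2) ? need : newcap * 2;
        t = realloc(sb->data, newcap);     /* realloc(NULL, n) acts like malloc(n) */
        if (t == NULL)
            return -1;
        sb->data = t;
        sb->cap = newcap;
    }
    memcpy(sb->data + sb->len, s, n);      /* copy straight to the known end */
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}
```

    Start with `struct strbuf sb = { NULL, 0, 0 };`, append each chunk with `strbuf_append(&sb, chunk, strlen(chunk))`, and free(sb.data) when finished.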

    Regards,

    David Mathog


  6. Re: Reading long lines from a file

    Vlad Dogaru wrote, On 14/08/07 11:46:
    > Richard Heathfield wrote:


    <snip>

    >> To start with, you can't reallocate a statically allocated buffer! Nor
    >> can you have a pointer of equal size to a buffer except by sizing the
    >> buffer to be the same size as a pointer. Nor can you append to a pointer.
    >>
    >> Once we get those impossibilities out of the way, we can dispense with
    >> the unnecessary fgets call - your input is already buffered, so why
    >> buffer it again through fgets?

    >
    > If anything, my lack of English skills has contributed to the
    > misunderstanding. I was talking about:
    > char b[100], *p;
    > Reading into b with fgets, then reallocating p as necessary to do a
    > strcat(p, b).


    Since we do not know what p points to, we cannot say whether you are
    allowed to realloc what it points to or not. You can only pass realloc
    a null pointer or a pointer returned by malloc, calloc or realloc.

    Also beware of denial-of-service attacks where a user deliberately
    creates a file with a line 5GB long.

    <snip>
    --
    Flash Gordon

  7. Re: Reading long lines from a file

    On 2007-08-14 17:43, Flash Gordon <spam@flash-gordon.me.uk> wrote:
    > Vlad Dogaru wrote, On 14/08/07 11:46:
    >> Richard Heathfield wrote:
    >>> To start with, you can't reallocate a statically allocated buffer! Nor
    >>> can you have a pointer of equal size to a buffer except by sizing the
    >>> buffer to be the same size as a pointer. Nor can you append to a pointer.

    [...]
    >> If anything, my lack of English skills has contributed to the
    >> misunderstanding. I was talking about:
    >> char b[100], *p;
    >> Reading into b with fgets, then reallocating p as necessary to do a
    >> strcat(p, b).

    >
    > Since we do not know what p points to we cannot say whether you are
    > allowed to realloc what it points to or not.


    We cannot *know*, but I think it is reasonable to assume from the
    description that he uses malloc to get the initial value for
    p. You don't always have to assume the stupidest possible version if
    something isn't specified exactly ;-).

    > Also beware of denial-of-service attacks where a user deliberately
    > creates a file with a line 5GB long.


    ACK. But that's probably not something which should be hard-coded into
    the application. After all, the program might run on a machine with 64
    GB RAM where 5 GB of memory usage is quite acceptable. You could use a
    configurable limit or rely on OS features to limit memory consumption
    (e.g. ulimit on unixoid systems).
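    One way to make the limit configurable is to pass it in, as in the hypothetical `read_line_limited` sketch below; `max_len` would come from a command-line option or configuration file rather than being hard-coded. The status codes and the choice to skip the rest of an over-long line (so the caller can carry on with the next one) are assumptions, not anything prescribed above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical result codes for a size-limited line reader. */
enum read_status { LINE_OK, LINE_EOF, LINE_TOO_LONG, LINE_NOMEM };

/* Read one line (newline not stored) into a new allocation at *out,
   storing at most max_len characters. On LINE_TOO_LONG the rest of the
   over-long line is consumed so the caller can continue with the next. */
enum read_status read_line_limited(FILE *fp, size_t max_len, char **out)
{
    size_t cap = 64, used = 0;
    char *p;
    int ch;

    assert(fp != NULL && out != NULL && max_len < SIZE_MAX);
    *out = NULL;
    p = malloc(cap);
    if (p == NULL)
        return LINE_NOMEM;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == max_len) {
            while ((ch = getc(fp)) != EOF && ch != '\n')
                ;                          /* skip the rest of this line */
            free(p);
            return LINE_TOO_LONG;
        }
        if (used == cap - 1) {
            /* Grow by doubling, capped at max_len + 1 bytes. */
            size_t newcap = (cap > (max_len + 1) / 2) ? max_len + 1 : cap * 2;
            char *t = realloc(p, newcap);
            if (t == NULL) {
                free(p);
                return LINE_NOMEM;
            }
            p = t;
            cap = newcap;
        }
        p[used++] = (char)ch;
    }
    if (ch == EOF && used == 0) {
        free(p);
        return LINE_EOF;
    }
    p[used] = '\0';
    *out = p;
    return LINE_OK;
}
```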

    hp

    --
    _ | Peter J. Holzer | I know I'd be respectful of a pirate
    |_|_) | Sysadmin WSR | with an emu on his shoulder.
    | | | hjp@hjp.at |
    __/ | http://www.hjp.at/ | -- Sam in "Freefall"

  8. Re: Reading long lines from a file

    On Aug 20, 1:57 pm, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
    > On 2007-08-14 17:43, Flash Gordon <s...@flash-gordon.me.uk> wrote:
    >
    > > Vlad Dogaru wrote, On 14/08/07 11:46:
    > >> Richard Heathfield wrote:
    > >>> To start with, you can't reallocate a statically allocated buffer! Nor
    > >>> can you have a pointer of equal size to a buffer except by sizing the
    > >>> buffer to be the same size as a pointer. Nor can you append to a pointer.

    > [...]
    > >> If anything, my lack of English skills has contributed to the
    > >> misunderstanding. I was talking about:
    > >> char b[100], *p;
    > >> Reading into b with fgets, then reallocating p as necessary to do a
    > >> strcat(p, b).

    >
    > > Since we do not know what p points to we cannot say whether you are
    > > allowed to realloc what it points to or not.

    >
    > We cannot *know*, but I think it is reasonable to assume from the
    > description that he uses malloc to get the initial value for
    > p. You don't always have to assume the stupidest possible version if
    > something isn't specified exactly ;-).


    Reading Flash Gordon's post I don't see him assuming anything.
    He was simply aiming to cover all possibilities and I'm all for
    that; we do aim to be accurate around here.



Similar Threads

  1. Reading a file containing blank lines
    By Application Development in forum Fortran
    Replies: 10
    Last Post: 10-10-2007, 10:06 AM
  2. skip first N lines when reading file
    By Application Development in forum Perl
    Replies: 7
    Last Post: 06-28-2007, 04:08 AM
  3. Vim really hates long lines. Anything I can do?
    By Application Development in forum Editors
    Replies: 15
    Last Post: 05-10-2007, 03:34 PM
  4. Replies: 3
    Last Post: 05-03-2007, 01:02 PM
  5. how to wrap long lines
    By Application Development in forum Mutt
    Replies: 10
    Last Post: 10-07-2006, 03:47 PM