Reading files, splitting on a delimiter and newlines. - Python


  1. Re: Reading files, splitting on a delimiter and newlines.

    chrispwd wrote:
    > [original question snipped]


    data = {}
    key = None
    with open('yourfile.txt') as f:
        for line in f:
            line = line.strip()
            if not line:
                # skip empty lines
                continue
            if '=' in line:
                # new entry: split on the first '=' only
                key, value = map(str.strip, line.split('=', 1))
                data[key] = value
            elif key is None:
                # first line without a '='
                raise ValueError("invalid format")
            else:
                # multi-line value: continuation of the previous key
                data[key] += "\n" + line


    print(data)
    => {'myValue1': 'contents of value1', 'myValue2': 'contents of value2 but\nwith a new line here', 'myValue3': 'contents of value3'}

    HTH

  2. Re: Reading files, splitting on a delimiter and newlines.

    attn.steven.kuo wrote:
    > On Jul 25, 8:46 am, chris... wrote:
    >> [original question snipped]
    >
    > Check the length of the list returned from split; this allows
    > you to append to the previously extracted value if need be.
    >
    > import StringIO
    > import pprint
    >
    > buf = """\
    > myValue1 = contents of value1
    > myValue2 = contents of value2 but
    > with a new line here
    > myValue3 = contents of value3
    > """
    >
    > mockfile = StringIO.StringIO(buf)
    >
    > record=dict()
    >
    > for line in mockfile:
    >     kvpair = line.split('=', 2)


    You want:
    kvpair = line.split('=', 1)

    >>> toto = "x = 42 = 33"
    >>> toto.split('=', 2)
    ['x ', ' 42 ', ' 33']
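A quick Python 3 sketch, reusing the same sample string `"x = 42 = 33"`, makes the difference between the two `maxsplit` values concrete: with `1`, everything after the first `=` stays together as the value, so unpacking into a key and a value always sees exactly two items.

```python
# The sample line from above: its value itself contains an '='.
line = "x = 42 = 33"

# maxsplit=2 allows up to two splits, so the value is broken apart.
print(line.split('=', 2))   # ['x ', ' 42 ', ' 33']

# maxsplit=1 splits only at the first '=', keeping the rest intact.
print(line.split('=', 1))   # ['x ', ' 42 = 33']

# With maxsplit=1, unpacking into two names works for any line with an '='.
key, value = (part.strip() for part in line.split('=', 1))
print(key, '->', value)     # x -> 42 = 33
```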


    >     if len(kvpair) == 2:
    >         key, value = kvpair
    >         record[key] = value
    >     else:
    >         record[key] += line


    Also, this won't handle the case where the first line doesn't contain an
    '=' (NameError: name 'key' is not defined).
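Putting both corrections together, the length-check approach might look like the minimal Python 3 sketch below. It assumes `io.StringIO` as the Python 3 stand-in for `StringIO.StringIO`, and the function name `parse_records` is hypothetical. Keys and values are stripped of surrounding whitespace, while a continuation line is joined to the previous value with an embedded newline.

```python
import io
import pprint


def parse_records(lines):
    """Parse 'key = value' lines into a dict (hypothetical helper).

    A line without '=' is appended to the previous value, keeping the line
    break between them. A ValueError (rather than a NameError) is raised if
    such a line appears before any key has been seen.
    """
    record = {}
    key = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=1: only the first '=' separates the key from the value
        kvpair = line.split('=', 1)
        if len(kvpair) == 2:
            key = kvpair[0].strip()
            record[key] = kvpair[1].strip()
        elif key is None:
            raise ValueError("continuation line before any key: %r" % line)
        else:
            # continuation line: re-insert the newline, drop the trailing one
            record[key] += "\n" + line.rstrip("\n")
    return record


buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

pprint.pprint(parse_records(io.StringIO(buf)))
```

A continuation line that itself contains an `=` is still treated as a new entry; that ambiguity is what the later replies in this thread discuss.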

  3. Reading files, splitting on a delimiter and newlines.

    Hello,

    I have a situation where I have a file that contains text similar to:

    myValue1 = contents of value1
    myValue2 = contents of value2 but
    with a new line here
    myValue3 = contents of value3

    My first approach was to open the file, use readlines to split the
    lines on the "=" delimiter into a key/value pair (to be stored in a
    dict).

    After processing a couple of files I noticed it's possible that a newline
    can be present in the value, as shown in myValue2.

    In this case it's not an option to, say, remove the newlines if it's a
    "multi line" value, as the value data needs to stay intact.

    I'm a bit confused as to how to go about getting this to work.

    Any suggestions on an approach would be greatly appreciated!


  4. Re: Reading files, splitting on a delimiter and newlines.

    On Jul 25, 10:46 am, chris... wrote:
    > [original question snipped]

    I'm confused. You don't want the newline to be present, but you can't
    remove it because the data has to stay intact? If you don't want to
    change it, then what's the problem?

    Mike


  5. Re: Reading files, splitting on a delimiter and newlines.

    On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:

    > On Jul 25, 10:46 am, chris... wrote:
    >> [original question snipped]
    >
    > I'm confused. You don't want the newline to be present, but you can't
    > remove it because the data has to stay intact? If you don't want to
    > change it, then what's the problem?
    >
    > Mike


    It's obvious that simple line-by-line filtering won't handle multi-line
    statements.

    You could solve that by saving the last item you added something to and,
    if the line currently being handled doesn't look like an assignment,
    appending it to that item. You might run into problems with data such as:

    foo = modern maths
    proved that 1 = 1
    bar = single

    If your dataset always has indentation on subsequent lines, you might use
    this. Or if the key's name is always just one word.
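Both heuristics can be expressed as a predicate that decides whether a line starts a new entry. The minimal Python 3 sketch below assumes that a "one-word" key is an identifier-like word (letters, digits, underscore); the helper names `starts_entry_one_word`, `starts_entry_unindented` and `parse` are hypothetical.

```python
import io
import re

# Assumption for heuristic A: a key is one identifier-like word
# (a letter or '_', then letters, digits or '_'), optional spaces, then '='.
ONE_WORD_KEY = re.compile(r'[A-Za-z_]\w*\s*=')


def starts_entry_one_word(line):
    """Heuristic A: a new entry starts with 'word =' at the start of the line."""
    return ONE_WORD_KEY.match(line) is not None


def starts_entry_unindented(line):
    """Heuristic B: continuation lines are indented; entries are not."""
    return not line[:1].isspace() and '=' in line


def parse(lines, starts_entry):
    """Group lines into key/value pairs using the given 'starts_entry' test."""
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip('\n')
        if not line.strip():
            continue  # skip blank lines
        if starts_entry(line):
            # split on the first '=' only; the key is everything before it
            key, value = (part.strip() for part in line.split('=', 1))
            data[key] = value
        elif key is None:
            raise ValueError('continuation line before any key: %r' % line)
        else:
            # continuation: keep the line break, drop the indentation
            data[key] += '\n' + line.strip()
    return data


plain = "foo = modern maths\nproved that 1 = 1\nbar = single\n"
indented = "foo = modern maths\n    proved that 1 = 1\nbar = single\n"

print(parse(io.StringIO(plain), starts_entry_one_word))
print(parse(io.StringIO(indented), starts_entry_unindented))
# both print {'foo': 'modern maths\nproved that 1 = 1', 'bar': 'single'}
```

Heuristic B alone fails on the unindented sample: it would create a key named `proved that 1`, which is exactly the ambiguity described above.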

    HTH,
    Stargaming

  6. Re: Reading files, splitting on a delimiter and newlines.

    On Jul 26, 3:08 am, Stargaming <stargam...> wrote:
    > On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:
    > > On Jul 25, 10:46 am, chris... wrote:
    > >> [original question snipped]
    >
    > > I'm confused. You don't want the newline to be present, but you can't
    > > remove it because the data has to stay intact? If you don't want to
    > > change it, then what's the problem?
    >
    > > Mike
    >
    > It's obvious that simple line-by-line filtering won't handle multi-line
    > statements.
    >
    > You could solve that by saving the last item you added something to and,
    > if the line currently being handled doesn't look like an assignment,
    > appending it to that item. You might run into problems with data such as:
    >
    > foo = modern maths
    > proved that 1 = 1
    > bar = single
    >
    > If your dataset always has indentation on subsequent lines, you might use
    > this. Or if the key's name is always just one word.
    >


    My take: all of the above, plus: given that you want to extract stuff
    of the form <LHS> = <RHS>, I'd suggest developing a fairly precise
    regular expression for LHS, maybe even for RHS, and trying it on as
    many of these files as you can.

    Why an RE for RHS? Consider:

    foo = somebody said "I think that
    REs = trouble
    maybe_better = pyparsing"

    :-)
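One way to act on the RHS point is to keep appending lines while a value still has an unclosed double quote, so that an `=` inside a quoted value cannot start a new entry. This is only a sketch under stated assumptions: values are quoted with `"` and never contain escaped quotes, and the names `LHS_RE` and `parse_quoted` are hypothetical. Anything more irregular probably calls for a real parser.

```python
import io
import re

# Assumed LHS grammar: an identifier-like key, optional spaces, '=', the value.
LHS_RE = re.compile(r'([A-Za-z_]\w*)\s*=\s*(.*)')


def parse_quoted(lines):
    """Parse 'key = value' entries.

    Assumption: values may be wrapped in double quotes, with no escaped
    quotes inside. While the current value holds an odd number of '"'
    characters it is still open, and every following line belongs to it.
    """
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip('\n')
        value_open = key is not None and data[key].count('"') % 2 == 1
        if value_open:
            data[key] += '\n' + line  # inside quotes: keep the line verbatim
            continue
        if not line.strip():
            continue  # blank line outside quotes
        match = LHS_RE.match(line)
        if match:
            key, value = match.groups()
            data[key] = value
        elif key is not None:
            data[key] += '\n' + line  # unquoted continuation line
        else:
            raise ValueError('text before the first key: %r' % line)
    return data


text = '''\
foo = somebody said "I think that
REs = trouble
maybe_better = pyparsing"
bar = plain value
'''

result = parse_quoted(io.StringIO(text))
print(sorted(result))  # ['bar', 'foo']: "REs" and "maybe_better" are not keys
print(result['foo'])
```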


  7. Re: Reading files, splitting on a delimiter and newlines.

    On Jul 25, 8:46 am, chris... wrote:
    > [original question snipped]

    Check the length of the list returned from split; this allows
    you to append to the previously extracted value if need be.

    import StringIO
    import pprint

    buf = """\
    myValue1 = contents of value1
    myValue2 = contents of value2 but
    with a new line here
    myValue3 = contents of value3
    """

    mockfile = StringIO.StringIO(buf)

    record=dict()

    for line in mockfile:
        kvpair = line.split('=', 2)
        if len(kvpair) == 2:
            key, value = kvpair
            record[key] = value
        else:
            record[key] += line

    pprint.pprint(record)

    # strip() keys and values to remove surrounding spaces and trailing newlines if needed ...

    --
    Hope this helps,
    Steven


  8. Re: Reading files, splitting on a delimiter and newlines.

    On Jul 25, 7:56 pm, "attn.steven...."
    <attn.steven....> wrote:
    > On Jul 25, 8:46 am, chris... wrote:
    > > [original question snipped]
    >
    > Check the length of the list returned from split; this allows
    > you to append to the previously extracted value if need be.
    >
    > [code snipped]


    Great, thank you! That was the logic I was looking for.


  9. Re: Reading files, splitting on a delimiter and newlines.

    <kyo...ma> wrote:

    > On Jul 25, 10:46 am, chris... wrote:
    > > [original question snipped]
    >
    > I'm confused. You don't want the newline to be present, but you can't
    > remove it because the data has to stay intact? If you don't want to
    > change it, then what's the problem?


    I think the OP's trouble is that the value he wants gets split up by the
    newline at the end of the line when he uses readline().

    One can try adding the single value to the previous value in the previous
    key/value pair when the split does not yield two values - a bit hackish,
    but given structured input data it might work.
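The diagnosis above is easy to reproduce. This minimal Python 3 sketch, using `io.StringIO` as a stand-in for the real file, shows that `readlines()` yields the continuation as an element of its own, which is what breaks a naive `key, value = line.split('=')`.

```python
import io

# Same sample data as in the thread, wrapped in an in-memory file.
buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

lines = io.StringIO(buf).readlines()
print(len(lines))  # 4: one element per physical line, not per value

# Elements without '=' are fragments of a multi-line value; a naive
# "key, value = line.split('=')" raises ValueError on them.
orphans = [line for line in lines if '=' not in line]
print(orphans)  # ['with a new line here\n']

try:
    key, value = orphans[0].split('=')
except ValueError as exc:
    print('naive unpacking fails:', exc)
```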

    - Hendrik

