[VW]: read an Ascii File line by line

#1  Snorik, 08-09-2008, 01:26 PM

Hello everyone,

during my exploratory home project implementation using VisualWorks
7.6 I realized that I need to read a couple of CSV files and change
them (should serve as Seaside application database). As I did not find
any package supporting csv, I figured I should write my own. It all
works pretty well, however I have a nagging feeling that I overlooked
something:

How do you read a file line-wise in VW?
I extended one of the ReadStream Classes in my own package and
implemented a getNextLine which simply gets every char up to a
LineDeliminator, chomps the LineDeliminator and returns the whole as a
String. This is pretty dumb (it works, though); surely there has to be
a better way (either a better implementation or, as I would expect, a
VW-own implementation which I simply missed)?

What is more, how is writing back to a file handled best in VW? The
simple approach would be to open the file, clear it and stream the
internal representation back into the file, but that seems like nonsense.
I know that other dialects write back changes you made to a file
automatically - how is this handled in VW?

I know, pretty basic questions for this crowd, but I am still trying
to get a grip on VisualWorks.

Would someone be so kind as to help a confused hobbyist on this matter?

#2  Stefan Schmiedl, 08-09-2008, 02:23 PM

On Sat, 9 Aug 2008 10:26:23 -0700 (PDT)
Snorik <clauskick@hotmail.com> wrote:

> works pretty well, however I have a nagging feeling that I overlooked
> something:


Your approach will probably work in most cases, but be aware
that you'll end up with a restricted CSV capability only. Apart from the
lack of an "official" specification of what CSV should be, you won't be
able to handle "special" characters inside the delimited records.

Assuming that you use $, as field delimiter, how do you handle
a text field containing a $,? Assuming you use $" to delimit text
fields, how do you handle fields containing $"?
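
One common answer to both questions is to let $" switch a "quoted" mode on and off and to treat a doubled $" inside a quoted field as one literal quote. The following is only a minimal workspace sketch under that assumption; the sample line and all variable names are made up for illustration, and it does not handle quoted fields that span several lines.

```smalltalk
"Split one CSV line into fields.
 Assumptions: $, separates fields, $"" quotes a field,
 and a doubled quote inside a quoted field means one quote."
| line in fields field inQuotes ch |
line := 'a,"b,c","say ""hi"""'.
in := line readStream.
fields := OrderedCollection new.
field := WriteStream on: String new.
inQuotes := false.
[in atEnd] whileFalse: [
    ch := in next.
    inQuotes
        ifTrue: [
            ch = $"
                ifTrue: [
                    "Either an escaped quote or the end of the quoted part."
                    in peek = $"
                        ifTrue: [in next. field nextPut: $"]
                        ifFalse: [inQuotes := false]]
                ifFalse: [field nextPut: ch]]
        ifFalse: [
            ch = $"
                ifTrue: [inQuotes := true]
                ifFalse: [
                    ch = $,
                        ifTrue: [
                            "Field finished: store it and start a new one."
                            fields add: field contents.
                            field := WriteStream on: String new]
                        ifFalse: [field nextPut: ch]]]].
fields add: field contents.
fields
```

For the sample line this collects three fields: a, then b,c, then say "hi".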

How many fields do you see in the following snippet?
==========
a,,,",","""",",","
","""
,"""
==========
... and I'm not even sure that I did the quoting correctly.

My point is that CSV is a bad decision to store data in unless you know
exactly what you are getting.

If you need to persist object state and *not* interface to external
programs, you can use BOSS (see docs), the XML-based SIXX
(cross-platform, public store), or the more compact
StateReplicationProtocol http://sourceforge.net/projects/srp
or, if you're feeling adventurous, some package from the Xtreams bundle
(also public store) which I can't recall right now. With them it's
mostly an "open stream, write this collection/read contents, close
stream" approach, which is very easy to handle.

>
> What is more, how is writing back to a file handled best in VW? The
> simple approach would be open the file, clear it and stream the
> internal representation back into the file, that is nonsense, however.
> I know that other dialects write back changes you made to a file
> automatically - how is this handled in VW?


Depends on the solution you chose above. With SRP it's:

SRP.SrpConfiguration default saveObject: aCollection
toBinaryFilename: aPath asFilename

If you want to use "basic stuff", try something like

"Write a string to a file; ensure: closes the stream even if writing fails."
| s |
s := 'hello.txt' asFilename writeStream.
[ s nextPutAll: 'hello world' ] ensure: [ s close ]

> I know, pretty basic questions for this crowd, but I am still trying
> to get a grip on VisualWorks.


"pretty basic questions" only implies that there are not enough "HowTo"
docs in places a novice would look.

HTH,
s.

#3  Snorik, 08-10-2008, 05:56 AM

On 9 Aug., 20:23, Stefan Schmiedl <s...@xss.de> wrote:
> On Sat, 9 Aug 2008 10:26:23 -0700 (PDT)
>
> Snorik <clausk...@hotmail.com> wrote:
> > works pretty well, however I have a nagging feeling that I overlooked
> > something:

>
> Your approach will probably work in most cases, but be aware
> that you'll end up with a restricted CSV capability only. Apart from the
> lack of an "official" specification what CSV should be, you won't be
> able to handle "special" characters inside the delimited records.


The restriction is fine, I simply want an easy way to store data and
still to be able to edit it manually, i.e. with a text editor.
And I do not consider XML to be overly "human readable" (depending on
the DTD though), I have to deal enough with that nonsense in my day
job.

> Assuming that you use $, as field delimiter, how do you handle
> a text field containing a $,? Assuming you use $" to delimit text
> fields, how do you handle fields containing $"?


If you specify $whatever as delimiter, then every $whatever will be
treated as that.

*snip*

> My point is that CSV is a bad decision to store data in unless you know
> exactly what you are getting.


Well, for now, I define how the format looks so I should be alright,
right?

> If you need to persist object state and *not* interface to external
> programs, you can use BOSS (see docs), the XML-based SIXX
> (cross-platform, public store), or the more compact
> StateReplicationProtocol http://sourceforge.net/projects/srp
> or, if you're feeling adventurous, some package from the Xtreams bundle
> (also public store) which I can't recall right now. With them it's
> mostly a "open stream, write this collection/read contents, close
> stream" approach, which is very easy to handle.


What about external editing? Well, works for SIXX but not for the
rest.
The others sound a bit like the good old ObjectFiler.

> > What is more, how is writing back to a file handled best in VW? The
> > simple approach would be open the file, clear it and stream the
> > internal representation back into the file, that is nonsense, however.
> > I know that other dialects write back changes you made to a file
> > automatically - how is this handled in VW?

>
> Depends on the solution you chose above. With SRP it's:
>
> SRP.SrpConfiguration default saveObject: aCollection
> toBinaryFilename: aPath asFilename
>


OK, thanks for the pointers.
The Stream classes still seem to be weird.


#4  Stefan Schmiedl, 08-10-2008, 03:38 PM

On Sun, 10 Aug 2008 02:56:12 -0700 (PDT)
Snorik <clauskick@hotmail.com> wrote:

> > Assuming that you use $, as field delimiter, how do you handle
> > a text field containing a $,? Assuming you use $" to delimit text
> > fields, how do you handle fields containing $"?

>
> If you specify $whatever as delimiter, then every $whatever will be
> treated as that.


I know this approach ... Usually I end up with some character as
separator that cannot be entered by the user :-)

> > My point is that CSV is a bad decision to store data in unless you know
> > exactly what you are getting.

>
> Well, for now, I define how the format looks so I should be alright,
> right?


If the content "fits" the format, yes. Otherwise you need some
escape route, be it doubling the separator or going the HTML-entity way.

> What about external editing? Well, works for SIXX but not for the
> rest.
> The others sound a bit like the good old ObjectFiler.


If you need text-editability (is that a word?), you're stuck with
text-based formats. SRP has one, IIRC, but I don't know how
human-readable that is. Probably "not".

> The Stream classes still seem to be weird


Not as weird as in Java :-)

s.

#5  Reinout Heeck, 08-11-2008, 06:21 AM

Snorik wrote:
>
> How do you read a file line-wise in VW?
> I extended one of the ReadStream Classes in my own package and
> implemented a getNextLine which simply gets every char up to a
> LineDeliminator, chomps the LineDeliminator and returns the whole as a
> String.


Pretty good! That's how the system does it too :-)

In my image the class Stream implements #nextLine. The odd thing is that
it is not in the base image; you need to load the package
NetClientBase to have it :-(

Here's its implementation:

nextLine
    ^self upTo: Character cr

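
Since #upTo: is available on streams in the base image, a file can be read line by line without loading anything extra. A minimal workspace sketch, assuming a text file named 'data.csv' exists in the current directory (the file name and variable names are only placeholders):

```smalltalk
"Collect every line of a text file into an OrderedCollection."
| stream lines |
stream := 'data.csv' asFilename readStream.
lines := OrderedCollection new.
"ensure: closes the file even if reading fails part-way."
[[stream atEnd] whileFalse: [lines add: (stream upTo: Character cr)]]
    ensure: [stream close].
lines
```

#upTo: consumes the delimiter and leaves it out of the answered string, so each element of lines is one line without its line end.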


> This is pretty dumb (it works, though),


Rather than 'dumb' I tend to call such code 'naive'. Wherever possible
(acceptable) I prefer naive code above 'smart' stuff because naive code
tends to be easier to maintain.


> surely there has to be
> a better way (either better implementation or what I would expect a
> VW-own implementation which I simply missed)?



Some programmers might want to get rid of the 'needless' creation of
strings that represent the lines (since they will be broken up further
anyway). This has been explored in the package ComputingStreams (in the
public repository), see here for some background:
http://www.cincomsmalltalk.com/userb...try=3370092213


>
> What is more, how is writing back to a file handled best in VW? The
> simple approach would be open the file, clear it and stream the
> internal representation back into the file, that is nonsense, however.
> I know that other dialects write back changes you made to a file
> automatically - how is this handled in VW?


Yes, this is the typical approach.

If you open a WriteStream on an existing file it will be automatically
truncated to 0 bytes before writing.

If the file held records of a known fixed length you could
do tricks such that only new records are written to disk. But since you
don't have fixed-length records I don't see any other option than
writing out the whole file after a change.
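
Writing out the whole file can be sketched as follows. This is only an illustration: the file name 'data.csv' and the sample lines are made up, and the in-memory representation is assumed to be a collection of strings, one per line.

```smalltalk
"Rewrite the complete file from an in-memory collection of lines."
| lines stream |
lines := OrderedCollection new.
lines add: 'id,name'; add: '1,Snorik'.
"Opening a write stream on an existing file truncates it first."
stream := 'data.csv' asFilename writeStream.
[lines do: [:each | stream nextPutAll: each; cr]]
    ensure: [stream close].
```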

>
> I know, pretty basic questions for this crowd, but I am still trying
> to get a grip on VisualWorks.
>
> Would someone be so kind as to help a confused hobbyist on this matter?


This newsgroup will do, although you may find a larger and more
specialized crowd over at the VWNC list:
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Cheers,

Reinout
-------

#6  Stefan Schmiedl, 08-11-2008, 07:02 AM

On Mon, 11 Aug 2008 12:21:42 +0200
Reinout Heeck <reinout@soops.nl> wrote:

> In my image the class Stream implements #nextLine, the odd thing is that
> it is not in the base image but that you need to load the package
> NetClientBase to have it :-(
>
> Here's its implementation:
>
> nextLine
> ^self upTo: Character cr


The reason might be that the line delimiter is a tad dependent on
where you use it. IIRC, a line as used in SMTP or HTTP headers
is delimited by CRLF, so depending on what you're aiming at the
above implementation might work or not.
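
A small illustration of that concern, using an in-memory stream on a String, where no line-end translation is applied. The header text is invented for the example.

```smalltalk
"Reading CRLF-delimited text with upTo: Character cr."
| crlf stream first second |
crlf := String with: Character cr with: Character lf.
stream := ReadStream on: 'To: a', crlf, 'From: b', crlf.
first := stream upTo: Character cr.
second := stream upTo: Character cr.
"first is 'To: a'; second begins with the lf left over from the first CRLF."
second first = Character lf
```

The last expression answers true, so code reading such data this way would have to strip the leftover lf itself.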

> Some programmers might want to get rid of the 'needless' creation of
> strings that represent the lines (since they will be broken up further
> anyway). This has been explored in the package ComputingStreams (in the
> public repository), see here for some background:
> http://www.cincomsmalltalk.com/userb...try=3370092213


"premature deoptimization" ... I like this idea :-)

s.



#7  Reinout Heeck, 08-11-2008, 09:15 AM

Stefan Schmiedl wrote:
> On Mon, 11 Aug 2008 12:21:42 +0200
> Reinout Heeck <reinout@soops.nl> wrote:
>
>> In my image the class Stream implements #nextLine, the odd thing is that
>> it is not in the base image but that you need to load the package
>> NetClientBase to have it :-(
>>
>> Here's its implementation:
>>
>> nextLine
>> ^self upTo: Character cr

>
> The reason might be that the line delimiter is a tad dependent on
> where you use it. IIRC, a line as used in SMTP or HTTP headers
> is delimited by CRLF, so depending on what you're aiming at the
> above implementation might work or not.


As far as I know all streams should model a line delimiter as Character
cr in the image. Any stream that does not should be viewed as a bug I
guess...


R
-

#8  Snorik, 08-11-2008, 01:01 PM

On 11 Aug., 12:21, Reinout Heeck <rein...@soops.nl> wrote:
> Snorik wrote:
>
> > How do you read a file line-wise in VW?
> > I extended one of the ReadStream Classes in my own package and
> > implemented a getNextLine which simply gets every char up to a
> > LineDeliminator, chomps the LineDeliminator and returns the whole as a
> > String.

>
> Pretty good! That's how the system does it too :-)


Well, my implementation was a bit more eloquent, though.

> In my image the class Stream implements #nextLine, the odd thing is that
> it is not in the base image but that you need to load the package
> NetClientBase to have it :-(


Ok, loaded the package now and swore some more.
Why is it that there are so many things missing in the base image?
I am thinking of the Environment Enhancements especially.

Same with nextLine, I was actually at least expecting it to be in
ExternalReadStream in the base image.
(At least, I implemented it in ExternalReadStream, figuring it would
be best situated there.)

> Here's its implementation:
>
> nextLine
> ^self upTo: Character cr
>
> > This is pretty dumb (it works, though),

>
> Rather than 'dumb' I tend to call such code 'naive'. Wherever possible
> (acceptable) I prefer naive code above 'smart' stuff because naive code
> tends to be easier to maintain.


Granted, yes. But I was thinking about the line delimiters and
wondering how many different implementations you actually would have to
have, depending on what stream and what OS you are using.

> > surely there has to be
> > a better way (either better implementation or what I would expect a
> > VW-own implementation which I simply missed)?

>
> Some programmers might want to get rid of the 'needless' creation of
> strings that represent the lines (since they will be broken up further
> anyway). This has been explored in the package ComputingStreams (in the
> public repository), see here for some background: http://www.cincomsmalltalk.com/userb...howComments=tr...


Ok, another thing to learn.

> > What is more, how is writing back to a file handled best in VW? The
> > simple approach would be open the file, clear it and stream the
> > internal representation back into the file, that is nonsense, however.
> > I know that other dialects write back changes you made to a file
> > automatically - how is this handled in VW?

>
> Yes, this is the typical approach.
>
> If you open a WriteStream on an existing file it will be automatically
> truncated to 0 bytes before writing.


OK, that's basically just a Unix ">".

> If the file would be holding records of a known fixed length you could
> do tricks such that only new records are written to disc. But since you
> don't have fixed length records I don't see any other option than
> writing out the whole file after a change.


I am wondering whether I could just keep changes in a separate variable
and, if there are new lines, just append them to the file.
Well, perhaps, in time, I will have a public package that people can
take apart and make fun of :>
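
Roughly what I have in mind for the append case, as an untested sketch: it assumes the new lines are kept in their own collection and that the file is called 'data.csv' (both are placeholders).

```smalltalk
"Append only the new lines to the end of an existing file."
| newLines stream |
newLines := OrderedCollection with: '2,Stefan'.
"appendStream positions the write stream at the end instead of truncating."
stream := 'data.csv' asFilename appendStream.
[newLines do: [:each | stream nextPutAll: each; cr]]
    ensure: [stream close].
```

This only helps for added lines; a changed or deleted line would still mean rewriting the whole file.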

> > I know, pretty basic questions for this crowd, but I am still trying
> > to get a grip on VisualWorks.

>
> > Would someone be so kind as to help a confused hobbyist on this matter?

>
> This newsgroup will do, although you may find a larger and more
> specialized crowd over at the VWNC list: http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


Ok, I was not aware of that list. Thanks for the hint (*registered*).

#9  Reinout Heeck, 08-12-2008, 04:30 AM

Snorik wrote:

>
>> In my image the class Stream implements #nextLine, the odd thing is that
>> it is not in the base image but that you need to load the package
>> NetClientBase to have it :-(

>
> Ok, loaded the package now and swore some more.
> Why is it that there are so many things missing in the base image?
> I am thinking of the Environment Enhancements especially.
>
> Same with nextLine, I was actually at least expecting it to be in
> ExternalReadStream in the base image.


Do me a favor and ask that question again on the vwnc list -- Cincom
staff participates in that list.
Your question is a long-standing complaint, I know Cincom wants to do
something about this but it seems to stay relatively low on their
priority list. On the other hand Cincom is at the moment focusing on the
experience for newbies so complaints by newcomers like you might get
more attention ;-)

More generally I guess Cincom (and the St community) would be helped if
you report your experiences with VisualWorks - particularly how it is to
use it as a newbie.




Cheers,

R
-






#10  Terry Raymond, 08-12-2008, 07:38 PM

Stefan Schmiedl <s@xss.de> wrote in
news:20080811130249.2cffc0cc@g64.xss.de:

> On Mon, 11 Aug 2008 12:21:42 +0200
> Reinout Heeck <reinout@soops.nl> wrote:
>
>> In my image the class Stream implements #nextLine, the odd thing is
>> that it is not in the base image but that you need to load the
>> package NetClientBase to have it :-(
>>
>> Here's its implementation:
>>
>> nextLine
>> ^self upTo: Character cr

>
> The reason might be that the line delimiter is a tad dependent on
> where you use it. IIRC, a line as used in SMTP or HTTP headers
> is delimited by CRLF, so depending on what you're aiming at the
> above implementation might work or not.


If you are not using binary, then a CR is the delimiter.
Take a closer look at the streams, particularly code related
to line end detection. You will find that it translates a
line end to CR. Basically, the system creates an abstraction
for you so you don't have to worry about it.


--
Terry
================================================== =========
Terry Raymond
Crafted Smalltalk
80 Lazywood Ln.
Tiverton, RI 02878
(401) 624-4517 traymond at craftedsmalltalk nospam dot com
<http://www.craftedsmalltalk.com>
================================================== =========