File output and buffering

  #11  
Old 08-21-2008, 05:24 AM
Dmitry A. Kazakov
Guest
 
Default Re: File output and buffering

On Thu, 21 Aug 2008 00:10:52 -0700 (PDT), Maciej Sobczak wrote:

> On 20 Sie, 17:39, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>
>> Buffering is used to make I/O in an asynchronous and/or conveyered way.

>
> No, it is not asynchronous. Nothing happens in the background, the
> operations are only grouped. The group is (usually) transmitted in the
> synchronous fashion.
>
> I do not know what is "conveyered".


Pipelined processing. When you refer to throughput, then it is increased
only because of existence of hidden conveyers, which ultimately always
boils down to some asynchronously working elements.

>> That does not make I/O faster in terms of latencies.

>
> It does make it faster in terms of throughput.
>
> Note: I do not imply that throughput is more valuable for optimization
> than latency - these can be different goals and usually are.
>
>> Any language buffer on top of numerous layered buffers, typical for an OS,
>> adds nothing, but overhead.

>
> It can reduce the overhead that is associated with the number of
> requests. System calls are not free and there is also a significant
> latency of the medium that is better to be avoided (like network
> roundtrips or disk seek times).


Well, here we need to clarify what the I/O end point is. When you say
"system call" it presumes that the end point is the driver. Let us fix it.
Now, the next question is where coalescing/pipelining is to happen. See
where it goes? Is the driver's interface a stream of units only, or does it
also accept blocks of units?

Case A. There is no back door to the driver, you have only a stream. What
can buffering add? Nothing, but overhead.

Case B. There is a back door for pushing bigger chunks of units. Then use
it in your application and it will go *faster* than whatever buffered
interface on top of the same thing!
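
A minimal sketch of the two cases, in Python for brevity. The driver classes, `put_unit`, `put_block` and the block size of 128 are hypothetical names and values invented for illustration; they model only the number of requests reaching the end point, not real timing.

```python
class StreamOnlyDriver:
    """Case A: hypothetical end point that accepts one unit per call."""

    def __init__(self):
        self.calls = 0
        self.received = bytearray()

    def put_unit(self, unit: int) -> None:
        self.calls += 1
        self.received.append(unit)


class BlockDriver(StreamOnlyDriver):
    """Case B: the same end point with an extra entry for whole blocks."""

    def put_block(self, block: bytes) -> None:
        self.calls += 1
        self.received.extend(block)


def send_buffered_over_stream(driver, data: bytes, size: int = 128) -> None:
    # The buffer groups units, but it can only drain through put_unit,
    # so the number of driver calls does not change.
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        for unit in chunk:
            driver.put_unit(unit)


def send_blocks(driver: BlockDriver, data: bytes, size: int = 128) -> None:
    # The application hands blocks straight to the block interface.
    for start in range(0, len(data), size):
        driver.put_block(data[start:start + size])
```

In this model a buffer on top of Case A saves no driver calls, while Case B cuts them by roughly the block size. Whether a library buffer in front of Case B costs or gains anything is exactly what the rest of the thread argues about.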

Note also that A and B usually refer to different protocol layers. It is
common to put a stream layer onto something block-oriented beneath, and the
reverse. That stream is buffering and necessarily an overhead. Buffering is
always overhead. We buy it only because the alternative is inaccessible,
like doing DMA transfers from the application. But a language library is in
the *same* position as the application, so buffering there would gain
nothing *from* a performance perspective.

Ada.Text_IO is slow because of the buffering it does in order to implement
a protocol (pages) which you do not need. Classic abstraction inversion
case.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
  #12  
Old 08-21-2008, 04:54 PM
Maciej Sobczak
Guest
 
Default Re: File output and buffering

On 21 Sie, 11:24, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> > I do not know what is "conveyered".

>
> Pipelined processing. When you refer to throughput, then it is increased
> only because of existence of hidden conveyers, which ultimately always
> boils down to some asynchronously working elements.


No, there is no asynchronous processing there (usually). There is
grouping that leads to smaller number of still synchronous operations.

> Well, here we need to clarify what is the I/O end point.


No, we do not need to, especially when it is already clear that we
would spiral down in an endless philosophy discussion about
definitions.

It is enough to get a clock and measure two simple test programs.
I can offer the test programs if needed.
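
A minimal sketch of what such a pair of test programs might look like. These are not the programs offered above; they are written in Python rather than Ada, and the function names and the 4096-byte block size are assumptions made for illustration. One writer issues one `os.write` request per byte, the other one request per block.

```python
import os
import tempfile
import time


def write_per_byte(fd: int, data: bytes) -> int:
    """One os.write call (one request to the OS) per byte."""
    calls = 0
    for i in range(len(data)):
        os.write(fd, data[i:i + 1])
        calls += 1
    return calls


def write_grouped(fd: int, data: bytes, size: int = 4096) -> int:
    """One os.write call per block of up to `size` bytes."""
    calls = 0
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        # os.write may accept fewer bytes than offered, so loop until done.
        while chunk:
            written = os.write(fd, chunk)
            calls += 1
            chunk = chunk[written:]
    return calls


def timed(writer, data: bytes):
    """Return (seconds, calls, bytes_written) for one writer."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        calls = writer(fd, data)
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed, calls, size


if __name__ == "__main__":
    payload = bytes(200_000)
    for writer in (write_per_byte, write_grouped):
        seconds, calls, size = timed(writer, payload)
        print(f"{writer.__name__}: {calls} calls, {size} bytes, {seconds:.4f} s")
```

The absolute times depend on the OS, the file system and the interpreter, so only the ratio measured on one's own machine means anything.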

> Ada.Text_IO is slow because of the buffering it does in order to implement
> a protocol (pages) which you do not need.


I do not see how paging could be related here.
Or at least I can imagine an implementation where the overhead of
bookkeeping pages is less than 15-20x.

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada
  #13  
Old 08-21-2008, 05:27 PM
Dmitry A. Kazakov
Guest
 
Default Re: File output and buffering

On Thu, 21 Aug 2008 13:54:25 -0700 (PDT), Maciej Sobczak wrote:

> On 21 Sie, 11:24, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>
>>> I do not know what is "conveyered".

>>
>> Pipelined processing. When you refer to throughput, then it is increased
>> only because of existence of hidden conveyers, which ultimately always
>> boils down to some asynchronously working elements.

>
> No, there is no asynchronous processing there (usually). There is
> grouping that leads to smaller number of still synchronous operations.


"Still synchronous operations" of items in the group? Come on, grouping
brings nothing if items are output synchronously to the caller. Coalescing
helps if and only if individual items in the group are output
asynchronously to the caller and to the receiver. In other words when the
interested parties re-synchronize only at the ends of a group. In which
state relatively to the output is the caller between the ends of a group?

> It is enough to get a clock and measure two simple test programs.
> I can offer the test programs if needed.


No thanks. We are actually paid for designing such tests, so we have plenty
of them.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
  #14  
Old 08-22-2008, 07:53 AM
Maciej Sobczak
Guest
 
Default Re: File output and buffering

On 21 Sie, 23:27, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> "Still synchronous operations" of items in the group? Come on, grouping
> brings nothing if items are output synchronously to the caller.


Of course it brings a lot - it minimizes the total overhead due to
smaller number of requests.

Ever tried to send each character in a separate mail instead of
sending one mail containing many characters?

> In which
> state relatively to the output is the caller between the ends of a group?


Why should I care? Sometimes I care only about throughput.

> > It is enough to get a clock and measure two simple test programs.
> > I can offer the test programs if needed.

>
> No thanks. We are actually paid for designing such tests, so we have plenty
> of them.


Then why do you try so hard to distort this discussion?

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada
  #15  
Old 08-22-2008, 09:22 AM
Dmitry A. Kazakov
Guest
 
Default Re: File output and buffering

On Fri, 22 Aug 2008 04:53:56 -0700 (PDT), Maciej Sobczak wrote:

> On 21 Sie, 23:27, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>
>> "Still synchronous operations" of items in the group? Come on, grouping
>> brings nothing if items are output synchronously to the caller.

>
> Of course it brings a lot - it minimizes the total overhead due to
> smaller number of requests.
>
> Ever tried to send each character in a separate mail instead of
> sending one mail containing many characters?


It seems that you didn't read my posts. One last try. In your example, when
characters of a message are sent *synchronously* (assuming E-mail as the
transport layer, no back doors, etc) then each single character has to be
sent as a reply to the answer to the earlier mail. The very ability to send
multiple characters in one mail means that they are sent in parallel =
asynchronously. Compare it to parallel vs. serial communication. For the
rest see

http://en.wikipedia.org/wiki/Buffer_...mmunication%29

Note the category of the article, read the purposes of buffering.

>> In which
>> state relatively to the output is the caller between the ends of a group?

>
> Why should I care?


Because it debunks your claim that the transfer of individual items is
synchronous. It is asynchronous, when it makes sense.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
  #16  
Old 08-22-2008, 05:41 PM
Maciej Sobczak
Guest
 
Default Re: File output and buffering

On 22 Sie, 15:22, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> It seems that you didn't read my posts.


I've read them, but did not understand.

> One last try. In your example, when
> characters of a message are sent *synchronously* (assuming E-mail as the
> transport layer, no back doors, etc) then each single character has to be
> sent as a reply to the answer to the earlier mail.


Then we have a different notion of "synchronously".
When I write something to the file, the operation is synchronous when
the program *waits* for the transfer to complete.

> The very ability to send
> multiple characters in one mail means that they are sent in parallel =
> asynchronously.


Then we have a different notion of "asynchronously".
When I write something to the file, the operation is asynchronous when
the program can continue while the transfer is being handled.
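
A minimal sketch of these two definitions, in Python. `slow_transfer` and its 50 ms sleep are hypothetical stand-ins for a real transfer; the log only records the order in which the caller and the transfer finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_transfer(data: bytes, log: list) -> int:
    time.sleep(0.05)  # stands in for the latency of the medium
    log.append("transfer done")
    return len(data)


def synchronous(data: bytes, log: list) -> None:
    slow_transfer(data, log)          # the caller waits here
    log.append("caller continues")


def asynchronous(data: bytes, log: list) -> None:
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_transfer, data, log)
        log.append("caller continues")  # runs while the transfer sleeps
        future.result()                 # re-synchronize at the end
```

In the synchronous version the transfer is logged before the caller moves on; in the asynchronous one the caller moves on first and re-synchronizes only when it asks for the result.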

And we also have a different notion of "parallel".
When I send a mail, it is transferred serially over a network cable.
The longer the mail, the longer it takes (hint: with parallel
communication the time of transmission would not depend on the number
of characters in the mail, since they would be sent, well, in
parallel).

> Compare it to parallel vs. serial communication.


Nothing to compare.

> For the
> rest see
>
> http://en.wikipedia.org/wiki/Buffer_...mmunication%29


Short, but nice. Especially point d).

> Note the category of the article, read the purposes of buffering.


Yes, the purpose d) is what I'm talking about. I use buffers to group
data into smaller number of bigger units. This is where the
performance gain comes from.

> Because it debunks your claim that the transfer of individual items is
> synchronous. It is asynchronous, when it makes sense.


No, it is synchronous, since the program has to wait until the
transfer completes (if the transfer is triggered at all - the buffer
makes that happen less frequently).

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada
  #17  
Old 08-23-2008, 06:25 AM
Dmitry A. Kazakov
Guest
 
Default Re: File output and buffering

On Fri, 22 Aug 2008 14:41:18 -0700 (PDT), Maciej Sobczak wrote:

> On 22 Sie, 15:22, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>
>> One last try. In your example, when
>> characters of a message are sent *synchronously* (assuming E-mail as the
>> transport layer, no back doors, etc) then each single character has to be
>> sent as a reply to the answer to the earlier mail.

>
> Then we have a different notion of "synchronously".
> When I write something to the file, the operation is synchronous when
> the program *waits* for the transfer to complete.


The transfer of the group, not the transfers of the individual items in it.

>> The very ability to send
>> multiple characters in one mail means that they are sent in parallel =
>> asynchronously.

>
> Then we have a different notion of "asynchronously".
> When I write something to the file, the operation is asynchronous when
> the program can continue while the transfer is being handled.


That is the same notion. Asynchronous = not synchronous. The semantics of a
transfer of a group of items does not depend on the order and exact timing
of the transfers of individual items, if there are any, because they might
not be transferred at all. Consider protocols which recode the group, digital
fountains, etc.

> And we have also a different notion of "parallel".
> When I send a mail, it is transferred serially over a network cable.


Wrong, they are printed and then sent per pigeon post.

You have defined the transport layer as E-mail. That's it. Don't make
suggestions about how E-mail might work, there are lots of ways.

> The longer is the mail the longer it takes (hint: with parallel
> communication the time of transmission would not depend on the number
> of characters in the mail, since they would be sent, well, in
> parallel).


Nope I have a huge rack of multiplexed modems installed in the cellar.

You again make assumptions about possible implementations of the transport
layer, which weren't there when you presented the example. If the transport
were instead a synchronous byte stream, then buffering obviously would
bring *nothing* to the throughput.

>> http://en.wikipedia.org/wiki/Buffer_...mmunication%29

>
> Short, but nice. Especially point d).


Right, it says "operated on as a unit", read my previous posts. Who
operates on them as "a unit"? You need an independent asynchronous agent
capable of doing so, otherwise it is not a unit. If you have such an agent,
and you can talk to it in terms of such units, then that is *without*
buffering, and it is faster than anything else. The purpose of d) is to
collect, it is merely an adapter between two protocols. Layered protocols
are always slower.

>> Note the category of the article, read the purposes of buffering.

>
> Yes, the purpose d) is what I'm talking about. I use buffers to group
> data into smaller number of bigger units. This is where the
> performance gain comes from.


No, it is where you lose performance, because I just send bigger units
directly.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
  #18  
Old 08-23-2008, 09:41 AM
Steve
Guest
 
Default Re: File output and buffering

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
news:yiw2f938342v.xzb47swyx5h4$.dlg@40tude.net...
> On Fri, 22 Aug 2008 14:41:18 -0700 (PDT), Maciej Sobczak wrote:
>
>> On 22 Sie, 15:22, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>> wrote:
>>
>>> One last try. In your example, when
>>> characters of a message are sent *synchronously* (assuming E-mail as the
>>> transport layer, no back doors, etc) then each single character has to
>>> be
>>> sent as a reply to the answer to the earlier mail.

>>
>> Then we have a different notion of "synchronously".
>> When I write something to the file, the operation is synchronous when
>> the program *waits* for the transfer to complete.

>
> The transfer of the group, not the transfers of the individual items in it.
>


Dmitry,

I have read enough of your posts on this newsgroup to know you're not a
troll, but it is sure hard to tell from reading this thread.

In my experience (theory aside) sending one character at a time to an OS is
considerably slower than buffering the data and sending blocks of data.

Several years ago I rewrote a driver on one of our systems that we used to
communicate serially (using RS232) with a PLC (Programmable Logic
Controller). The driver was originally written to make separate calls to
the OS for each character sent to the PLC. The original implementation
utilized approximately 15% of the CPU. When I rewrote the driver to buffer
the characters into blocks of up to 128 characters (defined by the PLC
protocol) and make one OS call for the buffered data, the CPU utilization
dropped to less than 1%.

This behavior makes perfect sense to me because for each call to the OS a
buffer is allocated containing the data to be transmitted and placed in a
queue for the OS. The buffer itself contains more than just the data to be
sent, it includes some overhead, sometimes significant in size. The
addition of the buffer to the OS queue often includes considerable overhead,
context switches, mutexes, etc. When the number of characters in the buffer
is increased the overhead is not significantly increased.
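
A minimal sketch of that kind of grouping, in Python. It is not the PLC driver described above: `BlockBuffer` and its `send` callback are hypothetical names, with `send` standing in for the single OS call made per block, and only the 128-character limit is taken from the description.

```python
class BlockBuffer:
    """Collects bytes and hands them to `send` in blocks of up to `limit`."""

    def __init__(self, send, limit: int = 128):
        self._send = send            # stands in for one call to the OS
        self._limit = limit
        self._pending = bytearray()

    def put(self, byte: int) -> None:
        self._pending.append(byte)
        if len(self._pending) >= self._limit:
            self.flush()

    def flush(self) -> None:
        # Send whatever has accumulated, if anything, as a single block.
        if self._pending:
            self._send(bytes(self._pending))
            self._pending.clear()


if __name__ == "__main__":
    blocks = []
    buffer = BlockBuffer(blocks.append)
    for value in b"x" * 300:
        buffer.put(value)
    buffer.flush()
    print([len(block) for block in blocks])
```

If the per-call overhead is roughly constant, as argued above, the number of blocks handed to `send` is what drives the cost, not the number of characters.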

Sure, if you're talking directly to hardware that only handles one
character at a time then buffering and unbuffering data adds overhead. But
it is rare these days to talk directly with the hardware. Even the
simpler systems often use a kernel or OS that makes buffering worthwhile.

If you're using TCP/IP to send data, if you're going to send a bunch of data
at a time it would be silly to send one byte at a time. IP has considerable
overhead for each block. You should try to include as much data as possible
to minimize the number of packets sent and minimize the overhead.
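
A rough worked example of that overhead, in Python. It assumes minimum-size IPv4 and TCP headers (20 bytes each, no options), ignores link-layer framing and acknowledgements, and assumes every write becomes its own segment, which a real stack may avoid by coalescing small writes.

```python
IPV4_HEADER = 20  # minimum IPv4 header in bytes, no options
TCP_HEADER = 20   # minimum TCP header in bytes, no options


def payload_efficiency(payload: int) -> float:
    """Fraction of each segment that is application data."""
    return payload / (payload + IPV4_HEADER + TCP_HEADER)


def bytes_on_wire(total_payload: int, per_segment: int) -> int:
    """Total bytes sent when the payload is split into equal segments."""
    segments = -(-total_payload // per_segment)  # ceiling division
    return total_payload + segments * (IPV4_HEADER + TCP_HEADER)


if __name__ == "__main__":
    for size in (1, 128, 1460):
        print(size, round(payload_efficiency(size), 3), bytes_on_wire(1460, size))
```

Under these assumptions a one-byte payload travels in a 41-byte segment, so 1460 bytes sent one at a time cost 59860 bytes, against 1500 bytes when sent as a single segment.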

I find it interesting that you seem to be arguing that it is always better
not to buffer, when the original poster has indicated that he has tried
both buffered and unbuffered approaches and observed that unbuffered was
considerably slower on his system. Either you are miscommunicating or you
are just plain wrong.

Regards,
Steve

  #19  
Old 08-23-2008, 10:33 AM
Dmitry A. Kazakov
Guest
 
Default Re: File output and buffering

I have repeated the argument, provided all possible explanations and
examples more than three times.

Since it goes in circles, let's put an end to this.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
  #20  
Old 08-23-2008, 04:34 PM
Dennis Lee Bieber
Guest
 
Default Re: File output and buffering

On Fri, 22 Aug 2008 14:41:18 -0700 (PDT), Maciej Sobczak
<see.my.homepage@gmail.com> declaimed the following in comp.lang.ada:


> Then we have a different notion of "synchronously".
> When I write something to the file, the operation is synchronous when
> the program *waits* for the transfer to complete.
>

If I may slip in, since this thread has wandered into comparisons
that even I can't follow...

Define "complete"

Most I/O systems I've encountered are buffered by the OS... As far
as an application is concerned, an I/O "write" operation is "complete"
when the OS accepts the packet for buffering. The actual I/O, in terms
of writing to a disk block, say, is performed by the OS at some
indeterminate time after control has been returned to the application
(exceptions: explicit flush operations, buffer full condition during a
"write" -- which will flush that buffer and then package the rest of the
application packet but not flush it...)
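
A minimal sketch of those stages, in Python. It adds one layer to the description above: the language library's own buffer sits in front of the OS buffer, so `write` returns first, `flush` hands the bytes to the OS, and `os.fsync` asks the OS to push them to the device. The function name and the 8192-byte buffer size are assumptions for illustration.

```python
import os
import tempfile


def observe_write_stages(payload: bytes = b"hello") -> dict:
    """Record the file size the OS reports after each stage of a write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    sizes = {}
    try:
        with open(path, "wb", buffering=8192) as f:
            f.write(payload)        # returns once the bytes sit in the library buffer
            sizes["after write"] = os.path.getsize(path)
            f.flush()               # hands the bytes to the OS
            sizes["after flush"] = os.path.getsize(path)
            os.fsync(f.fileno())    # asks the OS to push them to the device
            sizes["after fsync"] = os.path.getsize(path)
    finally:
        os.remove(path)
    return sizes


if __name__ == "__main__":
    print(observe_write_stages())
```

The size reported after `flush` only shows that the OS has accepted the data. When the data actually reaches the disk is not observable this way, which is the point made above.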
--
Wulfraed Dennis Lee Bieber KD6MOG
wlfraed@ix.netcom.com wulfraed@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: web-asst@bestiaria.com)
HTTP://www.bestiaria.com/

All times are GMT -5.