Most efficient way to write a large matrix to disk

This is a discussion on Most efficient way to write a large matrix to disk within the Fortran forums in Programming Languages category; I have to write to disk some rather large matrices (N=10000 or more) of double precision numbers. The output file is opened with FORM='UNFORMATTED'. At the moment the writing is done passing the matrix to a subroutine which treats the matrix as large vector and writes it down as such. In its minimal form it's something like: !========================================= program myprogram real(dp), dimension(Nmax,Nmax) :: matrix .... call write_vector( matrix , size(matrix), iunit) .... end program myprogram !=========================== subroutine write_vector( vector , ilength, iunit) real(dp), dimension(ilength) :: vector write(iunit) vector end subroutine write_vector !========================================= This way of doing things has two advantages ...

Go Back   Application Development Forum > Programming Languages > Fortran

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-25-2008, 10:15 AM
Lorents
Guest
 
Default Most efficient way to write a large matrix to disk

I have to write to disk some rather large matrices (N=10000 or more) of
double precision numbers.
The output file is opened with FORM='UNFORMATTED'.
At the moment the writing is done passing the matrix to a subroutine
which treats the matrix as large vector and writes it down as such.
In its minimal form it's something like:
!=========================================
program myprogram
real(dp), dimension(Nmax,Nmax) :: matrix
....
call write_vector( matrix , size(matrix), iunit)
....
end program myprogram
!===========================
subroutine write_vector( vector , ilength, iunit)
real(dp), dimension(ilength) :: vector

write(iunit) vector

end subroutine write_vector
!=========================================
This way of doing things has two advantages (as far as I understand):
1) it's quite fast
2) it minimizes the number of record-length descriptor written down (and
hence, the file size)

My problem is that, if the total size of the matrix as given by
size(matrix) is greater than 2**31 I get several kinds of problems, one
of which being that the record-length descriptor overflows.

I think I have two ways of dealing with the situation.

The first way is using the STREAM (i.e., pure binary) format. This would
solve the record-length overflow problem (NB: I am using the Intel
compiler v9.1 under Linux 32bit or Linux 64bit).
Note that on Linux64 I can specify the use of large integers but the
record-length descriptor always is a 32-bit integer.

The second way is keeping the UNFORMATTED format but splitting up the
writing process. For several reasons (going to STREAM would imply
changing too many other things), I'm trying to pursue this strategy.
The first thing I tried was to write down the matrix column-by-column
using a piece of code of the kind
!=========================================
ilength = size(matrix, 1)
do i=1, Nmax
call write_vector( matrix(:,i) , ilength, iunit)
enddo
!=========================================

This implies writing an overhead of Nmax record-descriptors, which
however is negligible for big runs.
Now, from some benchmarks I ran (see later) ***it seems*** that the
"column-by-column" (CBC) way is always slower, and sometimes very much
so, than the "all at once" (AAO) one.
Now my questions are:
Is there any reason to expect CBC-writing to be much slower than AAO?
Is this a too much system-dependent thing to say anything at all?
Are there better ways of doing what I need? Perhaps using pointers?

Any suggestion would be welcome and thank you for your attention.

Lorenzo


DETAILS ON THE BENCHMARKS:
I did some tests writing to disk a matrix of doubleprecision random
numbers using the two CBC and AAO formats. Tests were done under Linux32
on a Pentium IV 3GHz with 3GiB of ram and a 50GB disk (I don't know if
IDE or SATA). The seed for the random number generator was taken from
/dev/random and so was different at each run.
The benchmark program was structured as follows: 1)build the matrix, 2)
write it to disk with AAO 3) write it to disk with CBC
The program was compiled with no optimizations(-O0 ) and intel compiler
v9.1. Times were clocked using the function SYSTEM_CLOCK
I repeated each run for three times to see if the results were similar.
Here are some representative results (times in seconds, N is the matrix
size).

N , AAO , CBC , FileSize
5000, 0.743s , 0.885s , 200MB
8000, 2.110s , 15.870s , 512MB
10000, 3.245s , 41.230s , 800MB
15000, 97.723s ,113.670s , 1800MB

Unfortunately, I'm afraid that various kinds of hardware or software
disk cache mechanisms mask the real performances.
For example, I can hardly believe in the N=10000 case that my disk can
really write 800MB in 3.245 seconds (it must be some kind of
cache/buffer effect...).
Until about N=7000 AAO and CBC are about equally fast, then there is an
interval where AAO is much faster, and then again their performance
seems to even out.

I also did some tests at N=10000 using full compiler optimization (-O3)
but results didn't seem to change appreciably from the no-optimization case.
Reply With Quote
  #2  
Old 08-25-2008, 12:55 PM
glen herrmannsfeldt
Guest
 
Default Re: Most efficient way to write a large matrix to disk

Lorents wrote:
> I have to write to disk some rather large matrices (N=10000 or more) of
> double precision numbers.
> The output file is opened with FORM='UNFORMATTED'.

(snip)

> Now, from some benchmarks I ran (see later) ***it seems*** that the
> "column-by-column" (CBC) way is always slower, and sometimes very much
> so, than the "all at once" (AAO) one.


> Now my questions are:
> Is there any reason to expect CBC-writing to be much slower than AAO?
> Is this a too much system-dependent thing to say anything at all?
> Are there better ways of doing what I need? Perhaps using pointers?


For reasonable N, I would not expect a large difference.
Also, some systems might want a buffer big enough for a single
WRITE, which would give CBC a big advantage...
(snip)

> Unfortunately, I'm afraid that various kinds of hardware or software
> disk cache mechanisms mask the real performances.
> For example, I can hardly believe in the N=10000 case that my disk can
> really write 800MB in 3.245 seconds (it must be some kind of
> cache/buffer effect...).
> Until about N=7000 AAO and CBC are about equally fast, then there is an
> interval where AAO is much faster, and then again their performance
> seems to even out.


Yes, it sounds like you are filling the disk cache with
the first case, and then for the second it actually waits
until it is written to disk.

Try exchanging the order of AAO and CBC. I think you will then
see that CBC is faster when it fits in cache and AAO doesn't.

> I also did some tests at N=10000 using full compiler optimization (-O3)
> but results didn't seem to change appreciably from the no-optimization
> case.


Most of the work is in library routines that don't change
with optimization level. Disk drive speeds don't change,
either.

-- glen

Reply With Quote
  #3  
Old 08-25-2008, 04:02 PM
Joe
Guest
 
Default Re: Most efficient way to write a large matrix to disk

Writing unformatted data in Fortran includes binary record-size marks
at each end. Depending on the compiler, this can cause a large
overhead for each write. The last time I used g77, it would write
unformatted records by doing a seek() to step over the first size
mark, write data, seek backwards to fill in the missing size mark,
then seek forward again. When writing to an NFS-mounted file,
performance could easily be about 100X slower when writing multiple
small records.

Even though the total record size should be easy to calculate to avoid
g77's method, there may be other issues where inserting the file-size
marks are reducing the buffering efficiency, depending on how smart
the compiler does I/O. If your compiler has a raw binary I/O extension
(some have access='BINARY'), you could try raw binary output, and see
if these issues go away.
Reply With Quote
  #4  
Old 08-26-2008, 04:08 AM
Arjan
Guest
 
Default Re: Most efficient way to write a large matrix to disk

Hi!

Without any intent of being rude (this is a serious consideration):
when you're handling such large matrices,
isn't it time to ask yourself the question: "Are all the data that I
want to store still meaningful?"
Nobody is able to grasp the significance of 10000*10000 numbers up to
the last double precision digit
behind the comma. Maybe this is the time to cut out the relevant part
of your matrix and just save that?
Is your matrix sparse? Is the condition number very large? In these
cases (and probably many others)
you can significantly reduce the size of the data to store!

Regards,

Arjan
Reply With Quote
  #5  
Old 08-26-2008, 02:53 PM
glen herrmannsfeldt
Guest
 
Default Re: Most efficient way to write a large matrix to disk

Arjan wrote:

> Without any intent of being rude (this is a serious
> consideration): when you're handling such large matrices,
> isn't it time to ask yourself the question: "Are all the
> data that I want to store still meaningful?"
> Nobody is able to grasp the significance of 10000*10000
> numbers up to the last double precision digit behind the comma.


That would be true for printing them out, but not always
for intermediate storage. For most large matrix calculations
you need double precision to get any significance at all
in the result. (and likely more than double)

> Maybe this is the time to cut out the relevant part
> of your matrix and just save that?
> Is your matrix sparse? Is the condition number very large?
> In these cases (and probably many others) you can
> significantly reduce the size of the data to store!


Certainly worth thinking about, but often you want
to write it out so you can decide later how much is
relevant.

-- glen

Reply With Quote
  #6  
Old 08-26-2008, 03:18 PM
Lorents
Guest
 
Default Re: Most efficient way to write a large matrix to disk

glen herrmannsfeldt wrote:
[...]
> For reasonable N, I would not expect a large difference.
> Also, some systems might want a buffer big enough for a single
> WRITE, which would give CBC a big advantage...
> (snip)


Ok, good to know :-)


>
> Yes, it sounds like you are filling the disk cache with
> the first case, and then for the second it actually waits
> until it is written to disk.


I did a few more tests and yes, the (software, I presume) disk
cache / write behind makes my rudimentary benchmark strategy
almost useless...
For very big runs clocked times increase a lot probably because
all the memory is used to store the matrix and there is no memory
left for the disk cache.

>
> Most of the work is in library routines that don't change
> with optimization level. Disk drive speeds don't change,
> either.
>


Yes, I thought as much too.

Thank you,

Lorenzo


Reply With Quote
  #7  
Old 08-26-2008, 03:23 PM
Lorents
Guest
 
Default Re: Most efficient way to write a large matrix to disk

Arjan wrote:
> Hi!
>
> Without any intent of being rude (this is a serious consideration):
> when you're handling such large matrices,
> isn't it time to ask yourself the question: "Are all the data that I
> want to store still meaningful?"


Hi,
unfortunately the answer is "yes" as these data
are intermediate for computing other quantities.
In my particular case, they are the nuclear-motion
wave functions for a triatomic molecule.
What is stored are the values of
each wave function on a 3D grid.
A typical grid is about 100x100x50 (in some coordinate
system) while the number of wave functions (states)
is of the order of 500-1000, so it's not too difficult
to get a huge number of values.
The wave functions are needed later
to calculate transition intensities and other quantities.

Lorenzo


Reply With Quote
  #8  
Old 08-28-2008, 05:55 PM
Arjan
Guest
 
Default Re: Most efficient way to write a large matrix to disk

> In my particular case, they are the nuclear-motion
> wave functions for a triatomic molecule.
> What is stored are the values of
> each wave function on a 3D grid.
> A typical grid is about 100x100x50 (in some coordinate
> system) while the number of wave functions (states)
> is of the order of 500-1000, so it's not too difficult
> to get a huge number of values.
> The wave functions are needed later
> to calculate transition intensities and other quantities.


This sounds like a PhD-project, am I right?

I'll continue to play the devil's advocate (still with all due
respect!):

> (in some coordinate system)


What's the grid? Which base do you take? It must be a very inefficient
base if you need
that many nodes! In e.g. a Proper Orthogonal Decomposition, you'll get
the relevant eigenstates.
The ones with relatively very small eigenvalues are insignificant.
Therefore, their transitions are irrelevant.
Throw them away and reduce the space required for your problem!
Storing 100*100*50 nodes in double precision cannot be meaningful.
This is not a programming/computer issue, but a lack of insight in the
physics!
Talk to someone who knows about the physics, not to a programmer!

Sorry if the above sounds blunt (which it does now that I read it
back),
but I strongly feel that you'll gain time if you take a totally
different approach.
Sorry for the inconvenience.

Best regards and good luck with your project,


Arjan
Reply With Quote
  #9  
Old 08-29-2008, 06:42 AM
Lorents
Guest
 
Default Re: Most efficient way to write a large matrix to disk

Arjan wrote:
>> In my particular case, they are the nuclear-motion
>> wave functions for a triatomic molecule.

[...]
>
> This sounds like a PhD-project, am I right?


Well, you are right in thinking I'm a PhD student (fortunately,
towards the end). However, it's not I who wrote this program
(actually, it's a suite of programs). What I am doing is
maintenance, fixing bugs and adding a few functionalities.
The original program suite is called DVR3D and was
written over a period of 20 years in my research group,
firstly but my supervisor and then by many others.
For the record, the full program suite can be found here:
http://www.tampa.phys.ucl.ac.uk/ftp/vr/cpc03/
The file cpc.pdf contains the paper describing the program
(it was later published in 2004 in Computer Physics
Communications). If you have a look at this file you'll perhaps
get an idea of the level of complexity of this whole thing...

>
> I'll continue to play the devil's advocate (still with all due
> respect!):
>


Please, do :-)

>> (in some coordinate system)

>
> What's the grid? Which base do you take? It must be a very inefficient
> base if you need
> that many nodes!


Ok, in fact I simplified a little. The program is not "grid based" in
the "naive" way, it used a numerical technique know as DVR,
discrete variable representation, which also involves generating
a grid. The grid points are the Gaussian quadrature points associated
with the basis functions chosen for the three coordinates. The coordinate
system used in my case are Radau coordinates... If you're really interested
you can have a look at the paper, unfortunately I'm afraid
the details are far too involved to be explained here on the newsgroup...
I can only say that for "ordinary" runs the grid size is much smaller,
about 45x45x35 but in the particular calculations at hand
we have to use a much larger one (we are investigated very
high energy states close to dissociation which are very delocalised).

Also, even if a cleverer way of dealing with this particular situation
existed, writing an ad hoc modification for this is not an option,
it'd be way to complicated and time consusiming for me to do now.

> In e.g. a Proper Orthogonal Decomposition, you'll get
> the relevant eigenstates.
> The ones with relatively very small eigenvalues are insignificant.
> Therefore, their transitions are irrelevant.
> Throw them away and reduce the space required for your problem!
> Storing 100*100*50 nodes in double precision cannot be meaningful.
> This is not a programming/computer issue, but a lack of insight in the
> physics!
> Talk to someone who knows about the physics, not to a programmer!


I think you're underestimating the difficulty of the problem at hand.
A three-body system may look small but it can still be very, very difficult
in some cases. Many people on my group work on the molecular
ion H3+ and even with present-day computers/ computer clusters
computations are heavily limited by computational limitations...

Lorenzo


Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 02:30 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.