| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| Dear all, Some weeks ago I've noticed that IDL is rather slow with FOR loops the hard way...after reading about it and not believing it, offcourse. Last week I started removing some FOR loops in my code, apart from some embarasing ones like: sum = 0 FOR i=0,max_y-1 DO BEGIN FOR j=0,max-1 DO BEGIN sum_int = sum_int + data[i,j] ENDFOR ENDFOR which has a straightforward solution. However I have some other FOR loops which aren't that obvious at all. Like for instance this one: FOR l = 0, n*2 DO BEGIN temp = 0 FOR i =0,max_y-1 DO BEGIN FOR j=0,max_x-1 DO BEGIN jtemp = j + l jtemp2 = j + n temp = temp + (arr[i,jtemp] * arr [i,jtemp2]) ENDFOR ENDFOR output[l] = temp/(max_x*max_y) ENDFOR which I could alter, not that straigthforwardly into: z = size(arr) arr = reform(in_arr, z[1]*z[2], /overwrite) endt = (max_y*max_x)-1 FOR l = 0, n*2 DO BEGIN temp = 0 FOR i=0,endt DO temp = temp + (arr[i+l*max_y] * in_arr [i +n*max_y]) output[l] = temp/(max_x*max_y) ENDFOR in_arr = reform(in_arr, z[1],z[2], /overwrite) where 1 FOR loop is removed. However there is hardly any time profit at all. It is even so that the following code is faster than both, which is a very straightforward alteration of the first loop: FOR l = 0, n*2 DO BEGIN temp = 0 FOR i =0,max_y-1 DO FOR j=0,max_x-1 DO temp = temp + (arr[i,j j + l] * arr [i, j + n]) output[l] = temp/(max_x*max_y) ENDFOR. With the following variables set and the for loops repeated with another FOR loop of i= 0,10000 (to see a time difference, and in the full program it is repeated about that many times too, but with even larger arrays): n = 8 max_x = 5 max_y = 5 output = fltarr(2*n+1) arr = findgen(interval_y, region) +1 I have for the first for loop : 1.6279998 s the second : 1.6060002 s the third : 1.2749999 s I measured the times with SYSTIME, /SECONDS command. (the full program takes 22,5 h for a standard image, with the alterations I have allready done, I've came up with 18.1 h, which is still 17 h to go to make it workable, i've used the last loop in the above example so far...) Can anybody tell me why removing one loop doesn't help in this case or what i'm doing wrong? thnx Stijn Van der Linden |
|
#2
| |||
| |||
| On Tue, 19 Aug 2008 05:38:50 -0700 (PDT), loebasboy <stijn.vdl@gmail.com> wrote: > FOR l = 0, n*2 DO BEGIN > temp = 0 > FOR i =0,max_y-1 DO BEGIN > FOR j=0,max_x-1 DO BEGIN > jtemp = j + l > jtemp2 = j + n > temp = temp + (arr[i,jtemp] * arr [i,jtemp2]) > ENDFOR > ENDFOR > output[l] = temp/(max_x*max_y) > ENDFOR The code below is a start. Does this processing have a name? It feels familiar somehow. Btw, in IDL the first index of an array is the column and the second is the row. So in your case y are the columns and x are the rows. No problem with that off course, just check whether this is how you intended it. n = 8 max_x = 5 max_y = 5 output = fltarr(2*n+1) arr = findgen(max_y, 2*n+max_x) +1 arr2=arr[0:max_y-1,n:max_x-1+n] FOR l = 0, 2*n DO $ output[l] = total(arr[0:max_y-1,l:max_x-1+l]*arr2) output/=max_x*max_y |
|
#3
| |||
| |||
| Wox writes: > The code below is a start. Does this processing have a name? It feels > familiar somehow. Btw, in IDL the first index of an array is the > column and the second is the row. So in your case y are the columns > and x are the rows. No problem with that off course, just check > whether this is how you intended it. And, of course, it *will* matter (especially in loops) how you access the data. Always keep in mind that internally data is stored in memory in row order (column indices vary faster than row indices). Loops that keep this in mind are fast. Loops that don't keep this in mind are what IDL users complain about. :-) Cheers, David -- David Fanning, Ph.D. Fanning Software Consulting, Inc. Coyote's Guide to IDL Programming: http://www.dfanning.com/ Sepore ma de ni thui. ("Perhaps thou speakest truth.") |
|
#4
| |||
| |||
| On Aug 19, 1:38 pm, loebasboy <stijn....@gmail.com> wrote: > Like for instance this one: > > FOR l = 0, n*2 DO BEGIN > temp = 0 > FOR i =0,max_y-1 DO BEGIN > FOR j=0,max_x-1 DO BEGIN > jtemp = j + l > jtemp2 = j + n > temp = temp + (arr[i,jtemp] * arr [i,jtemp2]) > ENDFOR > ENDFOR > output[l] = temp/(max_x*max_y) > ENDFOR Hi Stijn, it's been a while since I've had my IDL hat on, but you may be able to make use of IDL's nifty array multiplication here - if two arrays A and B have the same number of elements, then C = A * B will also have that number of elements, with each individual element the product of that element in the initial two arrays. IDL does this VERY efficiently. Given this, I think the inner two FOR loops in the above example can be replaced by: Output[l] = TOTAL(arr[0:max_y-1, l:l+max_x-1] * $ arr[0:max_y-1, n:n+max_x-1]) / (max_x*max_y) Like I say, I'm out of practice, so this might not work. But I think it should. Regards, Chris > > thnx > Stijn Van der Linden |
|
#5
| |||
| |||
| > Can anybody tell me why removing one loop doesn't help in this case or > what i'm doing wrong? I think the main reason why your second code snippet isn't much faster than the first is because it's not a very good vectorization (the term to describe the elimination of loops in favor of array based operations). Not being a computer scientist, I don't actually 'understand' what IDL does when it compiles and runs code. But the image in my mind is akin this old man I saw in a post office one time. He couldn't really hear that well, and kept (loudly) asking the post office clerk 'when the hell those Rat-a-ville stamps are coming in' (after eavesdropping for a while, I realized that he was actually sent by his wife to buy stamps from the movie 'Ratatouille'). After the clerk (repeatedly) told him that a) it was pronounced 'rat-uh-too-eee' and b) they would get them next week, the old man was on his way. The impressive thing was that this 90 year old man FLEW out of the post office when he was done. He was fast - like Lolo Jones fast. How does this connect? IDL for loops are slow because the part of IDL that interprets your file a fast but crotchety old man who can't hear you very well and may not even really be listening. Any time you tell him to do something, it takes him a while to interpret what you just said - much longer than other, less crotchety men. Once he figures out what's going on, however, he's plenty fast (especially if you tell him to do something that he was already designed to do, for which he has been well optimized). Good vectorization, then, minimizes the number of instructions (e.g. iterations in a loop) while maximizing the amount of work to do with each instruction. Your second loop doesn't have any fewer iterations than the first loop - it just gets rid of one nested for loop and increases the size of the previous loop. Un-nesting the loops helps a bit (looping the loop is two layers of interpretation. IDL has no patience for such tasks. He lived through the depression and fought the Germans), but you really aren't following the principle of 'loop less with bigger processing chunks in each step.' Wox's code is the right way to vectorize your loop. It truly iterates fewer times, and gives IDL more to chew with each line of instruction. I wouldn't bother eliminating the L loop. As soon as you do some hefty processing in each iteration, the looping penalty goes away, and you don't need to worry about your vectorization creating huge temporary arrays and paying penalties in memory allocation. As long as you don't have any loops where, at each iteration, you are simply accessing an element of an array, IDL should be pretty fast. chris |
|
#6
| |||
| |||
| Chris writes: > IDL for loops are slow because the part of IDL > that interprets your file a fast but crotchety old man who can't hear > you very well and may not even really be listening. Any time you tell > him to do something, it takes him a while to interpret what you just > said - much longer than other, less crotchety men. Once he figures out > what's going on, however, he's plenty fast (especially if you tell him > to do something that he was already designed to do, for which he has > been well optimized). Good vectorization, then, minimizes the number > of instructions (e.g. iterations in a loop) while maximizing the > amount of work to do with each instruction. I think this is a transparent attempt to characterize certain persons on this newsgroup and I deeply resent it. :-( Cheers, David P.S. Let's just say I would type a smiley face, but my feelings have been cut too close to the bone. :-) -- David Fanning, Ph.D. Fanning Software Consulting, Inc. Coyote's Guide to IDL Programming: http://www.dfanning.com/ Sepore ma de ni thui. ("Perhaps thou speakest truth.") |
|
#7
| |||
| |||
| On Aug 19, 3:43*pm, Wox <nom...@hotmail.com> wrote: > On Tue, 19 Aug 2008 05:38:50 -0700 (PDT), loebasboy > > <stijn....@gmail.com> wrote: > > * * *FOR l = 0, n*2 DO BEGIN > > * * * *temp *= 0 > > * * * *FOR i =0,max_y-1 DO BEGIN > > * * * * *FOR j=0,max_x-1 DO BEGIN > > * * * * * *jtemp = j + l > > * * * * * *jtemp2 = j + n > > * * * * * *temp = temp + (arr[i,jtemp] * arr [i,jtemp2]) > > * * * * *ENDFOR > > * * * *ENDFOR > > * * * *output[l] = temp/(max_x*max_y) > > * * *ENDFOR > > The code below is a start. Does this processing have a name? It feels > familiar somehow. Btw, in IDL the first index of an array is the > column and the second is the row. So in your case y are the columns > and x are the rows. No problem with that off course, just check > whether this is how you intended it. > > n = 8 > max_x = 5 > max_y = 5 > output = fltarr(2*n+1) > arr = findgen(max_y, 2*n+max_x) +1 > > arr2=arr[0:max_y-1,n:max_x-1+n] > FOR l = 0, 2*n DO $ > * * * * output[l] = total(arr[0:max_y-1,l:max_x-1+l]*arr2) > output/=max_x*max_y Thank you for your code, it works rather well, maybe it seems familiar because it's a kind of autocorrelation that I'm calculating... . I think I still need some vectorisation training to get IDL much faster, I've calculated a time profit of 14 h (that makes 8.5 h instead of 22.5 h), so I still have some FOR loops I can train on .Thanks for helping finding my way and the fast answers, I think I will definitely post again when I'm really stuck again . |
|
#8
| |||
| |||
| On Aug 19, 9:43*am, Wox <nom...@hotmail.com> wrote: > On Tue, 19 Aug 2008 05:38:50 -0700 (PDT), loebasboy > > <stijn....@gmail.com> wrote: > > * * *FOR l = 0, n*2 DO BEGIN > > * * * *temp *= 0 > > * * * *FOR i =0,max_y-1 DO BEGIN > > * * * * *FOR j=0,max_x-1 DO BEGIN > > * * * * * *jtemp = j + l > > * * * * * *jtemp2 = j + n > > * * * * * *temp = temp + (arr[i,jtemp] * arr [i,jtemp2]) > > * * * * *ENDFOR > > * * * *ENDFOR > > * * * *output[l] = temp/(max_x*max_y) > > * * *ENDFOR > > The code below is a start. Does this processing have a name? It feels > familiar somehow. Btw, in IDL the first index of an array is the > column and the second is the row. So in your case y are the columns > and x are the rows. No problem with that off course, just check > whether this is how you intended it. > > n = 8 > max_x = 5 > max_y = 5 > output = fltarr(2*n+1) > arr = findgen(max_y, 2*n+max_x) +1 > > arr2=arr[0:max_y-1,n:max_x-1+n] > FOR l = 0, 2*n DO $ > * * * * output[l] = total(arr[0:max_y-1,l:max_x-1+l]*arr2) > output/=max_x*max_y Following on that last version, I think we can *completely* get rid of the loop... though at the expense (as usual) of memory: n = 8 max_x = 5 max_y = 5 arr = findgen(max_y, 2*n+max_x) +1 max_area = max_x*max_y output = total( arr[rebin(lindgen(max_area),max_area,2*n+1) + max_y*rebin(reform(lindgen(2*n+1),1,2*n+1),max_are a,2*n+1)] * rebin( (arr[*,n:max_x-1+n])[*], max_area,2*n+1), 1) / max_area Whether that's actually faster will depend on how big max_x, max_y and n are, of course... it ends up internally storing a couple of max_x*max_y*(2*n+1) arrays, so if that is going to take you into swap then you're best off sticking with Wox's version. If that stays in physical memory, though, I bet this will win. -Jeremy. |
|
#9
| |||
| |||
| On 20 aug, 13:50, Jeremy Bailin <astroco...@gmail.com> wrote: > On Aug 19, 9:43*am, Wox <nom...@hotmail.com> wrote: > > > > > > > On Tue, 19 Aug 2008 05:38:50 -0700 (PDT), loebasboy > > > <stijn....@gmail.com> wrote: > > > * * *FOR l = 0, n*2 DO BEGIN > > > * * * *temp *= 0 > > > * * * *FOR i =0,max_y-1 DO BEGIN > > > * * * * *FOR j=0,max_x-1 DO BEGIN > > > * * * * * *jtemp = j + l > > > * * * * * *jtemp2 = j + n > > > * * * * * *temp = temp + (arr[i,jtemp] * arr [i,jtemp2]) > > > * * * * *ENDFOR > > > * * * *ENDFOR > > > * * * *output[l] = temp/(max_x*max_y) > > > * * *ENDFOR > > > The code below is a start. Does this processing have a name? It feels > > familiar somehow. Btw, in IDL the first index of an array is the > > column and the second is the row. So in your case y are the columns > > and x are the rows. No problem with that off course, just check > > whether this is how you intended it. > > > n = 8 > > max_x = 5 > > max_y = 5 > > output = fltarr(2*n+1) > > arr = findgen(max_y, 2*n+max_x) +1 > > > arr2=arr[0:max_y-1,n:max_x-1+n] > > FOR l = 0, 2*n DO $ > > * * * * output[l] = total(arr[0:max_y-1,l:max_x-1+l]*arr2) > > output/=max_x*max_y > > Following on that last version, I think we can *completely* get rid of > the loop... though at the expense (as usual) of memory: > > n = 8 > max_x = 5 > max_y = 5 > arr = findgen(max_y, 2*n+max_x) +1 > > max_area = max_x*max_y > output = total( arr[rebin(lindgen(max_area),max_area,2*n+1) + > * max_y*rebin(reform(lindgen(2*n+1),1,2*n+1),max_are a,2*n+1)] * > * rebin( (arr[*,n:max_x-1+n])[*], max_area,2*n+1), 1) / max_area > > Whether that's actually faster will depend on how big max_x, max_y and > n are, of course... it ends up internally storing a couple of > max_x*max_y*(2*n+1) arrays, so if that is going to take you into swap > then you're best off sticking with Wox's version. If that stays in > physical memory, though, I bet this will win. > > -Jeremy.- Tekst uit oorspronkelijk bericht niet weergeven - > > - Tekst uit oorspronkelijk bericht weergeven - Great code, that is another hour of time profit again, and it works. So vectorization comes down to instead of repeating an action per element in matrix, putting all elements on the right spot in a matrix and doing the action on the matrix, right? The only problem is then, finding out when this is possible, or when it isn't .thank you all, for all the great info allready! |
|
#10
| |||
| |||
| > So vectorization comes down to instead of repeating an action per > element in matrix, putting all elements on the right spot in a matrix > and doing the action on the matrix, right? Yup, that sums it up pretty well! -Jeremy. |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.