| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| What is the best sorting algorithm if the file consists of many chunks of various sizes, each chunk is already ordered, but they are out of order? Example: 38013 38014 38015 33016 33017 33018 44034 44035 44036 44037 44038 44039 44040 44041 44042 44043 44044 44045 44046 44047 34892 34893 34894 34895 34896 34897 34898 34899 34900 34901 34902 34903 34904 34905 34906 38016 38017 38018 38019 38020 Here we have five already-ordered chunks, length 3, 3, 14, 15 and 5. I would think QuickSort is the best in such situation, but I am not sure. |
|
#2
| |||
| |||
| ilya2@rcn.com said: > What is the best sorting algorithm if the file consists of many chunks > of various sizes, each chunk is already ordered, but they are out of > order? Expect improvements from others on this idea: start a new group while the read of the next datum succeeds compare it to the previous datum (if there is one) in this group if it's greater, add it to this group otherwise, start a new group, and add this datum to the new group, which becomes the "current" group, the "this" group wend merge the groups in the canonical way -- Richard Heathfield <http://www.cpax.org.uk> Email: -http://www. +rjh@ Google users: <http://www.cpax.org.uk/prg/writings/googly.php> "Usenet is a strange place" - dmr 29 July 1999 |
|
#3
| |||
| |||
| ilya2@rcn.com wrote: > What is the best sorting algorithm if the file consists of many chunks > of various sizes, each chunk is already ordered, but they are out of > order? Read all the sorted chunks (detecting where a chunk ends and a new starts is a trivial conditional) and then merge-sort all the chunks together. (In other words, merge pairs of chunks, then merge pairs of the resulting chunks, etc. until you have only one left.) |
|
#4
| |||
| |||
| Richard Heathfield wrote: > ilya2@rcn.com said: > >> What is the best sorting algorithm if the file consists of many chunks >> of various sizes, each chunk is already ordered, but they are out of >> order? > > Expect improvements from others on this idea: > > start a new group > while the read of the next datum succeeds > compare it to the previous datum (if there is one) in this group > if it's greater, add it to this group > otherwise, start a new group, and add this datum to the new group, > which becomes the "current" group, the "this" group > wend > merge the groups in the canonical way I think this application would require great care. The scan to find where a new sorted group starts and ends will most likely cost more than a simple straightforward sort. There are too many unknowns to give much advice. How large is the file? How large and how many sorted groups are there? Are any elements repeated? This sure sounds like one of those cases where it could pay to make sure you are solving the right problem before jumping on what appears to be an obvious solution. On top of that this really seems like one of those times where it will be real easy to generate an O(n^2) solution when a standard library qsort or a standard mergesort would probably win without even considering any partial sort of the input. How about more information about the input? |
|
#5
| |||
| |||
| On Aug 21, 12:13*pm, Juha Nieminen <nos...@thanks.invalid> wrote: > il...@rcn.com wrote: > > What is the best sorting algorithm if the file consists of many chunks > > of various sizes, each chunk is already ordered, but they are out of > > order? > > * Read all the sorted chunks (detecting where a chunk ends and a new > starts is a trivial conditional) and then merge-sort all the chunks > together. (In other words, merge pairs of chunks, then merge pairs of > the resulting chunks, etc. until you have only one left.) Thanks -- it worked quite well. |
|
#6
| |||
| |||
| Juha Nieminen <nospam@thanks.invalid> writes: > ilya2@rcn.com wrote: >> What is the best sorting algorithm if the file consists of many chunks >> of various sizes, each chunk is already ordered, but they are out of >> order? > > Read all the sorted chunks (detecting where a chunk ends and a new > starts is a trivial conditional) and then merge-sort all the chunks > together. (In other words, merge pairs of chunks, then merge pairs of > the resulting chunks, etc. until you have only one left.) That's O(N^2). You must do the merge of all chunks at once to reduce it to O(N). As long as the number of chunks is not so big that at least one element from each chunk can be held in memory, it can be done in O(N). Read the file once, keeping in memory the start and end position of each chunk, and the first element of each chunk. Then scan all the chunks for the smallest element, and output it, advancing the chunk (read the next element, incrementing the current position for the chunk). Repeat until all chunks are empty. -- __Pascal Bourguignon__ http://www.informatimago.com/ HANDLE WITH EXTREME CARE: This product contains minute electrically charged particles moving at velocities in excess of five hundred million miles per hour. |
|
#7
| |||
| |||
| Pascal J. Bourguignon wrote: ) Juha Nieminen <nospam@thanks.invalid> writes: )> Read all the sorted chunks (detecting where a chunk ends and a new )> starts is a trivial conditional) and then merge-sort all the chunks )> together. (In other words, merge pairs of chunks, then merge pairs of )> the resulting chunks, etc. until you have only one left.) ) ) That's O(N^2). Nope, it's O(N log M) where M is the number of chunks. ) You must do the merge of all chunks at once to reduce it to O(N). That would imply that it is O(1) to find the smallest element each time. But it is actually O(log M) where M is the number of elements. ) As long as the number of chunks is not so big that at least one ) element from each chunk can be held in memory, it can be done in O(N). Because you're using 'O(N)' unqualified, I assume you're talking about time complexity. In which case, you're wrong, as shown above. SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT |
|
#8
| |||
| |||
| stan wrote: ) Richard Heathfield wrote: )> start a new group )> while the read of the next datum succeeds )> compare it to the previous datum (if there is one) in this group )> if it's greater, add it to this group )> otherwise, start a new group, and add this datum to the new group, )> which becomes the "current" group, the "this" group )> wend )> merge the groups in the canonical way ) ) I think this application would require great care. The scan to ) find where a new sorted group starts and ends will most likely cost more ) than a simple straightforward sort. Don't be silly. Just above, Richard explains very simply how to fin d the start of each of the groups in O(N) time. SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT |
|
#9
| |||
| |||
| On Thu, 21 Aug 2008 20:32:10 +0200, pjb@informatimago.com (Pascal J. Bourguignon) wrote: >Juha Nieminen <nospam@thanks.invalid> writes: > >> ilya2@rcn.com wrote: >>> What is the best sorting algorithm if the file consists of many chunks >>> of various sizes, each chunk is already ordered, but they are out of >>> order? >> >> Read all the sorted chunks (detecting where a chunk ends and a new >> starts is a trivial conditional) and then merge-sort all the chunks >> together. (In other words, merge pairs of chunks, then merge pairs of >> the resulting chunks, etc. until you have only one left.) > >That's O(N^2). No, its O(N*log2(M)) where M is the number of chunks. It's log2(M) passes with each pass being O(N). > >You must do the merge of all chunks at once to reduce it to O(N). It's still O(N*log2(M)). You need a data structure to hold headers for M chunks. Searching that data structure for the next item to merge in is O(log2(M)). > >As long as the number of chunks is not so big that at least one >element from each chunk can be held in memory, it can be done in O(N). > >Read the file once, keeping in memory the start and end position of >each chunk, and the first element of each chunk. > >Then scan all the chunks for the smallest element, and output it, >advancing the chunk (read the next element, incrementing the current >position for the chunk). Repeat until all chunks are empty. Richard Harter, cri@tiac.net http://home.tiac.net/~cri, http://www.varinoma.com Save the Earth now!! It's the only planet with chocolate. |
|
#10
| |||
| |||
| On Thu, 21 Aug 2008 16:13:56 GMT, Juha Nieminen <nospam@thanks.invalid> wrote: >ilya2@rcn.com wrote: >> What is the best sorting algorithm if the file consists of many chunks >> of various sizes, each chunk is already ordered, but they are out of >> order? > > Read all the sorted chunks (detecting where a chunk ends and a new >starts is a trivial conditional) and then merge-sort all the chunks >together. (In other words, merge pairs of chunks, then merge pairs of >the resulting chunks, etc. until you have only one left.) If the chunks are of significantly unequal size it is more efficient to sort the chunks by size and then successively merge the two smallest. The principle is the same as in Huffman encoding. Richard Harter, cri@tiac.net http://home.tiac.net/~cri, http://www.varinoma.com Save the Earth now!! It's the only planet with chocolate. |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.