| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| I am trying to research potential data structures for a feature in a program I am writing. I would like to build and maintain a numbered list of elements (like players in a game), ordered by an attribute (like points earned). Some pseudo code examples: > a=OrderedNumberedList() > a.set('joe', 500) > a.dump() (1, 'joe', 500) > a.set('fred', 200) > a.dump() (1, 'fred', 200) (2, 'fred', 500) > a.set('sally', 3) > a.dump() (1, 'fred', 200) (2, 'sally', 300) (3, 'joe', 500) > a.get('fred') (1, 'fred', 200) > a.get_value_range(200,400) (1, 'fred', 200) (2, 'sally', 300) > a.get_position_range(2,4) (2, 'sally', 300) (3, 'joe', 500) The challenge is that this must be a persistent data structure that needs to support millions of records and in-memory caching is not an available technique because the system is handling requests in a REST- like pattern, such that no memory state can be assumed to exist from the prior request. The system provides data indexes that can find records by value and walk ranges simply based on one attribute, but the system does not provide row numbers of any kind. This is not a SQL system, but more like an object store with simple indexes. It seems to me that even though keeping an accurate ordered numbering of records in a large, dynamic, disk based list is a known problem, a common solution does not appear evident to me. Please correct me if I am wrong. If this is a common problem with a common set of solutions please point me to some resources on this that I can research, because I haven't found much. I have seen it solved by periodically querying all records to a temporary table and numbering the records as you go. This solution is lacking in many ways, mainly accuracy, because it is just a snapshot. Believe me, I am considering whether to simplify the feature like this, but suggestions for how to implement it correctly would be more appreciated. Simply recording a ordered index number in each row and then attempting to keep the numbers up to date after every change would be 100% accurate but, in the worse case, it can result in updating every row in the table when one value changes. For example, if a change to a row's controlling value has moved it's position from the top of the list to the bottom, then every row in the table will have to be renumbered to move up one position. There are interesting possibilities for improving this. One possibility is to keep a index/numbered list, and maintain a separate modification list and, when querying, consult the ordered original list and the modification list to produce a final list. If I've inserted a record at the top using the modification list, I should be able to just adjust all index numbers by +1 and still be 100% accurate without renumbering. The reasoning here is that it would be acceptable to add processing to reads in order to eliminate the O(n) writes. This modification list could by incorporated into the main list by background or in-line processing that progressively eliminates the changes. Also, by building up a modification list, you can more efficiently aggregate work against the main list. As the time to adjust reads increases, this is balanced additional processing of the modification list to keep it small (hopefully achievable if the read/ write ratio is reasonable). I imagine heuristics could be employed to maximize efficiency since a general solution that is efficient for all list types seems unlikely. For example, in my situation I know that a rows controlling value should not change drastically, and so it should move up or down the list only a small amount. That may not be the case for all types of lists. So, before I go spending much more time on this problem I though I'd check with other folks to see if I have missed something. How does one go about keeping an accurate and up-to-date numbering or ranking of ordered records in a large, dynamically changing, disk based list? |
|
#2
| |||
| |||
| On Aug 30, 2:25*pm, gric...@gmail.com wrote: > I would like to build and maintain a numbered > list of elements (like players in a game), ordered by an attribute > (like points earned). Suggestion: keep the data as an ordered binary tree, and have each node maintain the size of its subtree. Node 0: [name: Sally; score: 300; left: node 1; right: node 2; size: 3] Node 1: [name: Fred; score: 200; left: null; right: null; size: 1] Node 2: [name: Joe; score: 500; left: null; right: null; size: 1] To find the N'th in rank in O(log N), start at the top and work recursively. If N <= left.size, find the N'th in rank in the left subtree. If N == 1 + left.size, it's the node you're looking at. If N > 1 + left.size, find the (N - 1 - left.size)'th in rank in the right subtree. To add a new element, use standard tree insertion but add one to each node on the path upwards to the root. Pete. |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.