Managing an ordered, numbered, large, disk-based ranking list

This is a discussion on Managing an ordered, numbered, large, disk-based ranking list within the Programming Languages forums in category; I am trying to research potential data structures for a feature in a program I am writing. I would like to build and maintain a numbered list of elements (like players in a game), ordered by an attribute (like points earned). Some pseudo code examples: > a=OrderedNumberedList() > a.set('joe', 500) > a.dump() (1, 'joe', 500) > a.set('fred', 200) > a.dump() (1, 'fred', 200) (2, 'fred', 500) > a.set('sally', 3) > a.dump() (1, 'fred', 200) (2, 'sally', 300) (3, 'joe', 500) > a.get('fred') (1, 'fred', 200) > a.get_value_range(200,400) (1, 'fred', 200) (2, 'sally', 300) > a.get_position_range(2,4) (2, 'sally', 300) (3, 'joe', ...

Go Back   Application Development Forum > Programming Languages

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-30-2008, 02:25 PM
grick23@gmail.com
Guest
 
Default Managing an ordered, numbered, large, disk-based ranking list

I am trying to research potential data structures for a feature in a
program I am writing. I would like to build and maintain a numbered
list of elements (like players in a game), ordered by an attribute
(like points earned). Some pseudo code examples:

> a=OrderedNumberedList()
> a.set('joe', 500)
> a.dump()

(1, 'joe', 500)

> a.set('fred', 200)
> a.dump()

(1, 'fred', 200)
(2, 'fred', 500)

> a.set('sally', 3)
> a.dump()

(1, 'fred', 200)
(2, 'sally', 300)
(3, 'joe', 500)

> a.get('fred')

(1, 'fred', 200)

> a.get_value_range(200,400)

(1, 'fred', 200)
(2, 'sally', 300)

> a.get_position_range(2,4)

(2, 'sally', 300)
(3, 'joe', 500)


The challenge is that this must be a persistent data structure that
needs to support millions of records and in-memory caching is not an
available technique because the system is handling requests in a REST-
like pattern, such that no memory state can be assumed to exist from
the prior request. The system provides data indexes that can find
records by value and walk ranges simply based on one attribute, but
the system does not provide row numbers of any kind. This is not a SQL
system, but more like an object store with simple indexes.

It seems to me that even though keeping an accurate ordered numbering
of records in a large, dynamic, disk based list is a known problem, a
common solution does not appear evident to me. Please correct me if I
am wrong. If this is a common problem with a common set of solutions
please point me to some resources on this that I can research, because
I haven't found much. I have seen it solved by periodically querying
all records to a temporary table and numbering the records as you go.
This solution is lacking in many ways, mainly accuracy, because it is
just a snapshot. Believe me, I am considering whether to simplify the
feature like this, but suggestions for how to implement it correctly
would be more appreciated.

Simply recording a ordered index number in each row and then
attempting to keep the numbers up to date after every change would be
100% accurate but, in the worse case, it can result in updating every
row in the table when one value changes. For example, if a change to a
row's controlling value has moved it's position from the top of the
list to the bottom, then every row in the table will have to be
renumbered to move up one position.

There are interesting possibilities for improving this. One
possibility is to keep a index/numbered list, and maintain a separate
modification list and, when querying, consult the ordered original
list and the modification list to produce a final list. If I've
inserted a record at the top using the modification list, I should be
able to just adjust all index numbers by +1 and still be 100% accurate
without renumbering. The reasoning here is that it would be acceptable
to add processing to reads in order to eliminate the O(n) writes. This
modification list could by incorporated into the main list by
background or in-line processing that progressively eliminates the
changes. Also, by building up a modification list, you can more
efficiently aggregate work against the main list. As the time to
adjust reads increases, this is balanced additional processing of the
modification list to keep it small (hopefully achievable if the read/
write ratio is reasonable). I imagine heuristics could be employed to
maximize efficiency since a general solution that is efficient for all
list types seems unlikely. For example, in my situation I know that a
rows controlling value should not change drastically, and so it should
move up or down the list only a small amount. That may not be the case
for all types of lists.


So, before I go spending much more time on this problem I though I'd
check with other folks to see if I have missed something. How does one
go about keeping an accurate and up-to-date numbering or ranking of
ordered records in a large, dynamically changing, disk based list?
Reply With Quote
  #2  
Old 09-01-2008, 10:16 AM
Pete Bevin
Guest
 
Default Re: Managing an ordered, numbered, large, disk-based ranking list

On Aug 30, 2:25*pm, gric...@gmail.com wrote:
> I would like to build and maintain a numbered
> list of elements (like players in a game), ordered by an attribute
> (like points earned).


Suggestion: keep the data as an ordered binary tree, and have each
node maintain the size of its subtree.

Node 0: [name: Sally; score: 300; left: node 1; right: node 2; size:
3]
Node 1: [name: Fred; score: 200; left: null; right: null; size: 1]
Node 2: [name: Joe; score: 500; left: null; right: null; size: 1]

To find the N'th in rank in O(log N), start at the top and work
recursively. If N <= left.size, find the N'th in rank in the left
subtree. If N == 1 + left.size, it's the node you're looking at. If
N > 1 + left.size, find the (N - 1 - left.size)'th in rank in the
right subtree.

To add a new element, use standard tree insertion but add one to each
node on the path upwards to the root.

Pete.
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 05:17 PM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.