| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| I have developed some simple tools for working with hierarchical data that basically operate on nested association lists. For example, rs-ref is function that operates on a record-set*, returns a list of record sets, and accepts :key and :test arguments for filtering. A sample query might look like: (rs-ref (as-record-set (root (record (sub-type1 (tagged-val 1)) (sub-type2 (tagged-val 2))) (record (sub-type1 (tagged-val 3)) (sub-type2 (tagged-val 4))))) '(record sub-type1) :key (lambda (rs) (rs-member rs 'tagged-val)) :test (lambda (val) (> val 2))) => (#<[TREE-NODE| :TYPE SUB-TYPE1 :RECORDS (#<[TREE-LEAF|:TYPE TAGGED-VAL :VALUE 3]>)]>) The implementation is not terribly optimized. Usually I read some semi- structured file into a record-set, then write out some appropriately massaged query data. It is usually I/O bound, and if I'm consing a lot I can provide a context for memoization: (with-rs-cache <lots of queries and output>) However, I am starting to do more ad hoc joins on data from different sources, and it would be nice if I could make this more general and efficient. Any ideas? Matt *A record-set is anything that implements the record-set protocol, but currently I have implemented only one set of concrete types. |
|
#2
| |||
| |||
| * Matthew D Swank <43_Qk.5154$J21.2343@newsfe01.iad> : Wrote on Fri, 07 Nov 2008 16:30:56 GMT: | I have developed some simple tools for working with hierarchical data | that basically operate on nested association lists. | | For example, rs-ref is function that operates on a record-set*, returns a | list of record sets, and accepts :key and :test arguments for filtering. | | A sample query might look like: | (rs-ref | (as-record-set (root | (record (sub-type1 (tagged-val 1)) | (sub-type2 (tagged-val 2))) | (record (sub-type1 (tagged-val 3)) | (sub-type2 (tagged-val 4))))) | '(record sub-type1) | :key (lambda (rs) (rs-member rs 'tagged-val)) | :test (lambda (val) (> val 2))) | | => | (#<[TREE-NODE| | :TYPE SUB-TYPE1 :RECORDS (#<[TREE-LEAF|:TYPE TAGGED-VAL :VALUE 3]>)]>) | | The implementation is not terribly optimized. Usually I read some semi- | structured file into a record-set, then write out some appropriately | massaged query data. It is usually I/O bound, and if I'm consing a lot I | can provide a context for memoization: | | However, I am starting to do more ad hoc joins on data from different | sources, and it would be nice if I could make this more general and | efficient. | | Any ideas? Perhaps an existing optimized pattern matcher would work. Here's how the above example might look in Sunil Mishra's Meta-Circular Pattern Matcher, MCPat <URL:http://www.sfmishras.com/smishra/mcpat/> ;; Without using :constraints matching, (defpackage "MCPAT-USER" (:use "CL" "MCPAT")) (in-package "MCPAT-USER") (defvar *sexp* '(root (record (sub-type1 (tagged-val 1)) (sub-type2 (tagged-val 2))) (record (sub-type1 (tagged-val 3)) (sub-type2 (tagged-val 4))))) (in-ruleset :MSWANK-RS encoded-ruleset) (clear-ruleset :MSWANK-RS) (defrule root-matcher-42 ((root . ?records)) `(:tree-node :type sub-type1 :records ,(mapcan 'solve ?records))) (defrule record-matcher-42 ((record . ?subtypes)) (mapcan (lambda (subtype) (solve `(subtype ,subtype))) ?subtypes))) (defrule subtype-matcher-42 ((subtype (?subtype (tagged-val ?val)))) (when (and (eq ?subtype 'sub-type1) (> ?val 2)) `((:tree-leaf tagged-val ,?val)))) (solve *sexp*) => (:TREE-NODE :TYPE SUB-TYPE1 :RECORDS ((:TREE-LEAF TAGGED-VAL 3))) -- Madhu |
|
#3
| |||
| |||
| On Sun, 09 Nov 2008 00:18:39 +0530, Madhu wrote: > * Matthew D Swank <43_Qk.5154$J21.2343@newsfe01.iad> : Wrote on Fri, 07 > Nov 2008 16:30:56 GMT: > > | I have developed some simple tools for working with hierarchical data > | that basically operate on nested association lists. .... > | However, I am starting to do more ad hoc joins on data from different > | sources, and it would be nice if I could make this more general and | > efficient. > | > | Any ideas? > > Perhaps an existing optimized pattern matcher would work. Here's how the > above example might look in Sunil Mishra's Meta-Circular Pattern > Matcher, MCPat <URL:http://www.sfmishras.com/smishra/mcpat/> > > ;; Without using :constraints matching, > > (defpackage "MCPAT-USER" > (:use "CL" "MCPAT")) > (in-package "MCPAT-USER") > > (defvar *sexp* '(root > (record > (sub-type1 (tagged-val 1)) > (sub-type2 (tagged-val 2))) > (record > (sub-type1 (tagged-val 3)) > (sub-type2 (tagged-val 4))))) > > > (in-ruleset :MSWANK-RS encoded-ruleset) (clear-ruleset :MSWANK-RS) > > (defrule root-matcher-42 > ((root . ?records)) > `(:tree-node :type sub-type1 :records ,(mapcan 'solve ?records))) > > (defrule record-matcher-42 > ((record . ?subtypes)) > (mapcan (lambda (subtype) (solve `(subtype ,subtype))) ?subtypes))) > > (defrule subtype-matcher-42 > ((subtype (?subtype (tagged-val ?val)))) > (when (and (eq ?subtype 'sub-type1) > (> ?val 2)) > `((:tree-leaf tagged-val ,?val)))) > > (solve *sexp*) > > => (:TREE-NODE :TYPE SUB-TYPE1 :RECORDS ((:TREE-LEAF TAGGED-VAL 3))) Interesting, if a bit verbose. It's not immediately evident to me how solve based one more than one tree. I suppose we're getting in to logic- programming/db-query-engine land. If look at it as a representation problem, there are things I like about nested lists. 1. functional construction 2. preservation of order 3. preservation of multiplicity However, naive construction of result sets requires lots of intermediate consing, and joins require linear search. Using the complete cached representation as _the_ representation would speed queries up, but not joins. And it would be very expensive (I suppose this is like the power set?) Ex. (root (record (sub-type1 (tagged-val 1)) (sub-type2 (tagged-val 2))) (record (sub-type1 (tagged-val 3)) (sub-type2 (tagged-val 4)))) becomes (as a map): { nil, ((root (record (sub-type1 (tagged-val 1)) (sub-type2 (tagged-val 2))) (record (sub-type1 (tagged-val 3)) (sub-type2 (tagged-val 4))))) (record), ((record (sub-type1 (tagged-val 1)) (sub-type2 (tagged-val 2))) (record (sub-type1 (tagged-val 3)) (sub-type2 (tagged-val 4)))) (record sub-type1), ((sub-type1 (tagged-val 1)) (sub-type1 (tagged-val 3))) .... etc } I know a lot of this is obvious, but I am thinking out loud and it's being caught by a net. I want to be fast lazy, and functional; is that so hard? ![]() Matt -- question = ( to ) ? be : ! be; -- Wm. Shakespeare |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.