splitting strings (swi-prolog) : PROLOG
#11
Dustin Kick wrote:
> I'm trying to write a predicate that takes a string as input, and outputs
> a list of strings delimited by spaces (for now).
>

How about:

A list is a tokenization of a character sequence separated by a separator
string if every token except the last appears, in order, within the
sequence followed by the separator.

    %tokenized(string, token list, separator).
    tokenized([], [[]], _).
    tokenized([C|Cs], [[C|TCs]|Ts], [S|Ss]) :-
        C \= S,
        tokenized(Cs, [TCs|Ts], [S|Ss]), !.
    tokenized([C|Cs], [[]|Ts], [C|Ss]) :-
        separated([C|Cs], Ts, [C|Ss], [C|Ss]).

    %separated(string, token list, separator, separator).
    separated([C|Cs], Ts, [], TempSs) :-
        tokenized([C|Cs], Ts, TempSs).
    separated([S|Cs], Ts, [S|Ss], TempSs) :-
        separated(Cs, Ts, Ss, TempSs).

    ?- tokenized("test ; string", TokenList, " ; "),
       maplist(name, TextList, TokenList).
    TokenList = [[116, 101, 115, 116], [115, 116, 114, 105, 110, 103]],
    TextList = [test, string]

    ?- tokenized(String, ["test", "string"], " ; "), name(Text, String).
    String = [116, 101, 115, 116, 32, 59, 32, 115, 116|...],
    Text = 'test ; string'

Regards
Stephan

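For comparison with the hand-written tokenizer, here is a minimal sketch of separator-based splitting with a built-in. It assumes a reasonably recent SWI-Prolog, where atomic_list_concat/3 runs in split mode when the separator is a non-empty atom and the full text is instantiated. The wrapper name split_on/3 is hypothetical and does not come from the thread.

```prolog
% Hypothetical helper (not from the thread): split Text on Separator.
% Relies on atomic_list_concat/3 running in "split" mode, which needs
% both Separator (non-empty) and Text to be instantiated.
split_on(Separator, Text, Parts) :-
    atomic_list_concat(Parts, Separator, Text).

% ?- split_on(' ; ', 'test ; string', Parts).
% Parts = [test, string].
```

Unlike tokenized/3, this sketch works on atoms rather than code lists and only runs in the splitting direction shown.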
#12
On Tue, 04 Mar 2008 23:05:47 +0100
Markus Triska <triska@logic.at> wrote:

> Dustin Kick<mac_vieuxnez@mac.com> writes:
>
> > I had to change one thing to make the munching work, this is
> > functional just the way I wanted
>
> Consider DCGs for convenience - for example:
>
>    string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
>    tokens([], Ts) --> token(Ts).
>    tokens([C|Cs], Ts) -->
>            ( { C == 0' } -> token(Ts), tokens(Cs, [])
>            ; tokens(Cs, [C|Ts])
>            ).
>
>    token([]) --> [].
>    token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
>
> Yielding:
>
>    ?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
>    %@ true.

    string_tokens(Cs, StpS, Ts) :- phrase(tokens(Cs, StpS, []), Ts).

    tokens([], _, Ts) --> token(Ts).
    tokens([C|Cs], StpS, Ts) -->
    %       ( { C == 0' } -> token(Ts), tokens(Cs, StpS, [])
            ( { memberchk(C,StpS) } -> token(Ts), tokens(Cs, StpS, [])
            ; tokens(Cs, StpS, [C|Ts])
            ).

    token([]) --> [].
    token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].

Slight mods ...

Dhu

>
> --
> comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/

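A minimal usage sketch for the stop-set variant of string_tokens/3 posted above. The grammar is repeated so the block is self-contained. The wrapper split_atom/3 is an assumed name, not part of the thread; it converts through atom_codes/2 so that the example does not depend on how the double_quotes flag is set.

```prolog
:- use_module(library(lists)).   % reverse/2, memberchk/2
:- use_module(library(apply)).   % maplist/3

% Grammar as posted: Cs is the input code list, StpS the list of
% separator codes, and the DCG describes the resulting token list.
string_tokens(Cs, StpS, Ts) :- phrase(tokens(Cs, StpS, []), Ts).

tokens([], _, Ts) --> token(Ts).
tokens([C|Cs], StpS, Ts) -->
    (   { memberchk(C, StpS) } -> token(Ts), tokens(Cs, StpS, [])
    ;   tokens(Cs, StpS, [C|Ts])
    ).

% The accumulator is built in reverse, so flip it before emitting.
% An empty accumulator emits nothing, which skips repeated separators.
token([]) --> [].
token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].

% Hypothetical convenience wrapper: atoms in, atoms out.
split_atom(Atom, Stops, Tokens) :-
    atom_codes(Atom, Cs),
    atom_codes(Stops, StpS),
    string_tokens(Cs, StpS, CodeLists),
    maplist(atom_codes, Tokens, CodeLists).

% ?- split_atom('a,b  c', ', ', Ts).
% Ts = [a, b, c].
```

Each character of the second argument acts as a separator on its own, so ', ' here means "comma or space", not the two-character sequence.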
#13
Thanks to those who have recommended definite clause grammars. I need to
read about them, but it sounds like they may be what I had been looking
for. I didn't find anything when searching for tokenization, splitting
strings, mapping characters, or anything else I would have thought of.
Definite Clause Grammar, of course, it just makes sense.

If anyone has any ideas how to work difference lists into this, which I'm
hoping will make it more efficient and give me a chance to put difference
lists into practice, I'd appreciate them.

--
Dustin Kick
http://homepage.mac.com/mac_vieuxnez

#14
On Tue, 04 Mar 2008 23:05:47 +0100
Markus Triska <triska@logic.at> wrote:

> Dustin Kick<mac_vieuxnez@mac.com> writes:
>
> > I had to change one thing to make the munching work, this is
> > functional just the way I wanted
>
> Consider DCGs for convenience - for example:
>
>    string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
>    tokens([], Ts) --> token(Ts).
>    tokens([C|Cs], Ts) -->

Just as a matter of interest, what's this C == 0' notation?
Why does 0' evaluate to 32 (space)?

Dhu

>            ( { C == 0' } -> token(Ts), tokens(Cs, [])
>            ; tokens(Cs, [C|Ts])
>            ).
>
>    token([]) --> [].
>    token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
>
> Yielding:
>
>    ?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
>    %@ true.
>
> --
> comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/

#15
On Mar 6, 8:00 pm, Dustin Kick<mac_vieux...@mac.com> wrote:
> Thanks to those who have recommended definite clause grammars, I need to
> read about them, but it sound like it may have been what I had been
> looking for. I didn't find anything searching for tokenization, splitting
> strings, mapping characters or anything else I would have thought of.
> Definite Clause Grammar, of course, it just makes sense.
> If anyone has any ideas how to work difference lists into this, which

I think that in most Prolog systems, DCGs get translated into Prolog code
with difference lists, see:
http://xsb.sourceforge.net/manual1/node155.html

A DCG rule such as:

    p(X) --> q(X).

will be translated (expanded) into:

    p(X, Li, Lo) :- q(X, Li, Lo).

> I'm hoping will make it more efficient, and give me a chance to put
> difference lists into practice, I'd appreciate them.
> --
>
> Dustin Kick http://homepage.mac.com/mac_vieuxnez

DCG

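To make the difference-list reading concrete, here is a small sketch that puts a DCG next to a hand-written equivalent with the two extra list arguments spelled out. The grammar digits//1 and the `_x` predicate names are illustrative only and do not appear in the thread; an actual translator may place unifications slightly differently while describing the same relation.

```prolog
% DCG version: one or more digit codes at the front of the input.
digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.

% Hand-expanded version: S0 is the input list, S is what remains.
% Each pair (S0, S) is a difference list describing the consumed part.
digits_x([D|Ds], S0, S) :- digit_x(D, S0, S1), digits_x(Ds, S1, S).
digits_x([D],    S0, S) :- digit_x(D, S0, S).

digit_x(D, [D|S], S) :- code_type(D, digit).

% ?- atom_codes('42x', Cs), phrase(digits(Ds), Cs, Rest).
% first solution: Ds = [52, 50], Rest = [120]
% ?- atom_codes('42x', Cs), digits_x(Ds, Cs, Rest).
% first solution: Ds = [52, 50], Rest = [120]
```

Because the threading is already done by the translation, a DCG-based tokenizer is in effect already a difference-list program.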
#16
"Dustin Kick" <mac_vieuxnez@mac.com> wrote in message
news:2t0Aj.110$zE5.34@newsfe02.lga...
>
> Thanks to those who have recommended definite clause grammars, I need to
> read about them, but it sound like it may have been what I had been
> looking for. I didn't find anything searching for tokenization, splitting
> strings, mapping characters or anything else I would have thought of.
> Definite Clause Grammar, of course, it just makes sense.
> If anyone has any ideas how to work difference lists into this, which
> I'm hoping will make it more efficient, and give me a chance to put
> difference lists into practice, I'd appreciate them.
> --

Many years ago, I wrote an interpreter, and implemented DCG translation
with this code:

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Definite Clause Grammar translator
    % from Clocksin, Mellish
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    :- op(251, fx, { ).
    :- op(250, xf, } ).
    :- op(255, xfx, -->).

    translate((P0 --> Q0), (P :- Q)) :-
        left_hand_side(P0, S0, S, P),
        right_hand_side(Q0, S0, S, Q1),
        flatten(Q1, Q), !.

    left_hand_side((NT, Ts), S0, _S, P) :- !,
        nonvar(NT),
        islist(Ts),
        tag(NT, S0, S1, P),
        append(Ts, S0, S1).
    left_hand_side(NT, S0, S, P) :-
        nonvar(NT),
        tag(NT, S0, S, P).

    right_hand_side((X1, X2), S0, S, P) :- !,
        right_hand_side(X1, S0, S1, P1),
        right_hand_side(X2, S1, S, P2),
        and(P1, P2, P).
    right_hand_side((X1 ; X2), S0, S, (P1 ; P2)) :-
        or(X1, S0, S, P1),
        or(X2, S0, S, P2).
    right_hand_side({P}, S, S, P) :- !.
    right_hand_side(!, S, S, !) :- !.
    right_hand_side(Ts, S0, S, true) :-
        islist(Ts), !,
        append(Ts, S, S0).
    right_hand_side(X, S0, S, P) :-
        tag(X, S0, S, P).

    or(X, S0, S, P) :-
        right_hand_side(X, S0a, S, Pa),
        (   var(S0a), S0a = S, !, S0 = S0a, ! = Pa
        ;   P = (S0 = S0a, Pa)
        ).

    tag(X, S0, S, P) :-
        X =.. [F | A],
        append(A, [S0, S], AX),
        P =.. [F | AX].

    and(true, P, P) :- !.
    and(P, true, P) :- !.
    and(P, Q, (P, Q)).

    flatten(A, A) :- var(A), !.
    flatten((A, B), C) :- !,
        flatten1(A, C, R),
        flatten(B, R).
    flatten(A, A).

    flatten1(A, (A, R), R) :- var(A), !.
    flatten1((A, B), C, R) :- !,
        flatten1(A, C, R1),
        flatten1(B, R1, R).
    flatten1(A, (A, R), R).

    islist([]) :- !.
    islist([_|_]).

    append([A|B], C, [A|D]) :- append(B, C, D).
    append([], X, X).

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % convert DCG rules to clauses
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    d2c :-
        clause((H-->T),true),
        translate((H-->T), Clause),
        assert(Clause),
        display(Clause), nl,
        fail.
    d2c.

It's not so simple... and indeed, some time later I read in Sterling and
Shapiro's 'The Art of Prolog' a simpler approach, perhaps the one used in
the SICStus implementation.

Bye Carlo

>
> Dustin Kick
> http://homepage.mac.com/mac_vieuxnez
>

#17
I just got around to testing your solution, and it works nicely, just as
you said it would, not that I doubted, but I don't understand the code,
yet. Is there a goal I can run DCGs through to see the expanded code?

Markus Triska <triska@logic.at> wrote:
>
>
>Dustin Kick<mac_vieuxnez@mac.com> writes:
>
>> I had to change one thing to make the munching work, this is
>> functional just the way I wanted
>
>Consider DCGs for convenience - for example:
>
>   string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
>   tokens([], Ts) --> token(Ts).
>   tokens([C|Cs], Ts) -->
>           ( { C == 0' } -> token(Ts), tokens(Cs, [])
>           ; tokens(Cs, [C|Ts])
>           ).
>
>   token([]) --> [].
>   token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
>
>Yielding:
>
>   ?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
>   %@ true.
>
>--
>comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/

--
Dustin Kick
http://homepage.mac.com/mac_vieuxnez

#18
Dustin Kick<mac_vieuxnez@mac.com> writes:

> Is there a goal I can run DCGs through to see the expanded code?

Use clause/2 to access its term representation. Also try listing/[01]:

   ?- listing(tokens).
   %@ tokens([], A, B, C) :-
   %@         token(A, B, C).
   %@ tokens([A|E], C, B, G) :-
   %@         (   A==32,
   %@             D=B
   %@         ->  token(C, D, F),
   %@             tokens(E, [], F, G)
   %@         ;   tokens(E, [A|C], B, G)
   %@         ).
   %@ true.

--
comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/

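Besides listing an already loaded predicate, a single rule can be expanded on its own without loading it. The sketch below assumes a system where expand_term/2 performs DCG translation, as SWI-Prolog does; show_expansion/1 is a hypothetical helper name, not something from the thread.

```prolog
% Hypothetical helper: translate one DCG rule and print the resulting
% clause. expand_term/2 applies the same translation used at load time.
show_expansion(Rule) :-
    expand_term(Rule, Clause),
    portray_clause(Clause).

% ?- show_expansion((greeting --> hello, name)).
% prints a clause of the shape
%   greeting(S0, S) :- hello(S0, S1), name(S1, S).
% (the variable names in the actual output will differ)
```

Note the extra parentheses around the rule: `-->` has a higher operator priority than an argument position allows, so the rule has to be wrapped to be passed as a single term.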
#19
On Mon, 10 Mar 2008 21:30:07 +0100, Markus Triska wrote:

> Dustin Kick<mac_vieuxnez@mac.com> writes:
>
>> Is there a goal I can run DCGs through to see the expanded code?
>
> Use clause/2 to access its term representation. Also try listing/[01]:
>
>    ?- listing(tokens).
>    %@ tokens([], A, B, C) :-
>    %@         token(A, B, C).
>    %@ tokens([A|E], C, B, G) :-
>    %@         (   A==32,
>    %@             D=B
>    %@         ->  token(C, D, F),
>    %@             tokens(E, [], F, G)
>    %@         ;   tokens(E, [A|C], B, G)
>    %@         ).
>    %@ true.

There are two things I can't grok:

1) the %@ : when I do ?- listing(tokens). those weird symbols don't show
   up. We are using the same SWI, or not ?

2) why is there 32 in the output, while the original program had 0' ?
   is this unavoidable, an SWI bug or an ISO Prolog inconsistency ?

Cheers

Bart Demoen

#20
On 2008-03-10, bart demoen <bmd@cs.kuleuven.be> wrote:
> On Mon, 10 Mar 2008 21:30:07 +0100, Markus Triska wrote:
>
>> Dustin Kick<mac_vieuxnez@mac.com> writes:
>>
>>> Is there a goal I can run DCGs through to see the expanded code?
>>
>> Use clause/2 to access its term representation. Also try listing/[01]:
>>
>>    ?- listing(tokens).
>>    %@ tokens([], A, B, C) :-
>>    %@         token(A, B, C).
>>    %@ tokens([A|E], C, B, G) :-
>>    %@         (   A==32,
>>    %@             D=B
>>    %@         ->  token(C, D, F),
>>    %@             tokens(E, [], F, G)
>>    %@         ;   tokens(E, [A|C], B, G)
>>    %@         ).
>>    %@ true.
>
> There are two things I can't grok:
>
> 1) the %@ : when I do ?- listing(tokens). those weird symbols don't show
>    up. We are using the same SWI, or not ?

I leave that to Markus.

> 2) why is there 32 in the output, while the original program had 0' ?
>    is this unavoidable, an SWI bug or an ISO Prolog inconsistency ?

You know the answer: as it stands in ISO, it is unavoidable. The tokeniser
must translate 0' into the character code of the space. In general that
is even undefined, but SWI-Prolog is internally Unicode, so it is defined
as 32, regardless of the locale. Character codes, however, are not a
special type and therefore cannot be distinguished from integers. I'm not
sure whether ISO would allow for a subtype of integer that represents
character codes. Possibly. The same goes for [32] and " ", etc. To a
certain extent this can be remedied using

    ?- set_prolog_flag(double_quotes, chars).

It doesn't fix all issues though, and a global flag that introduces such
big incompatibilities causes more trouble than it solves. I never touch
that flag for any real programming task.

I once raised a similar issue about

    [] == [ ] == [/*empty list*/] == '[]'

It is fine for the first three to be equal, but I still have doubts about
the last. The same goes for {}, though this causes less confusion in
practice. I don't think there is an easy fix to these issues without
introducing serious compatibility issues.

    Cheers --- Jan
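The point about character codes being ordinary integers can be seen in a small sketch. The predicate names below are made up for illustration. The example uses 0'a rather than the space character so that no significant whitespace hides in the source, and it obtains the space code with char_code/2 instead.

```prolog
% 0'a is read by the tokeniser as the integer 97, so once the clause
% is loaded nothing remembers that it was written as a character.
code_of_a(0'a).

% The space character's code, obtained at run time.
code_of_space(Code) :- char_code(' ', Code).

% With double_quotes set to chars, a double-quoted text read after this
% directive becomes a list of one-character atoms instead of codes.
:- set_prolog_flag(double_quotes, chars).

chars_example("a b").

% ?- code_of_a(X), integer(X).
% X = 97.
% ?- code_of_space(X).
% X = 32.
% ?- chars_example(X).
% X = [a, ' ', b].
```

This is why listing/1 shows 32 where the source had the 0' notation: the listing is produced from the stored term, and the stored term only contains the integer.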


