
splitting strings (swi-prolog) : PROLOG

  #11  
03-05-2008, 06:54 AM
Re: splitting strings (swi-prolog)

Dustin Kick wrote:
> I'm trying to write a predicate that takes a string as input, and outputs
> a list of strings delimited by spaces (for now).
>


How about this:
A list is a tokenization of a character sequence by a separator
string if the tokens occur in order within the sequence, each
followed by the separator, except for the last token.

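% tokenized(String, TokenList, Separator).

tokenized([], [[]], _).

% C does not start the separator: it belongs to the current token.
tokenized([C|Cs], [[C|TCs]|Ts], [S|Ss]) :-
    C \= S,
    tokenized(Cs, [TCs|Ts], [S|Ss]), !.

% C starts the separator: close the current token and consume the separator.
tokenized([C|Cs], [[]|Ts], [C|Ss]) :-
    separated([C|Cs], Ts, [C|Ss], [C|Ss]).

% separated(String, TokenList, RemainingSeparator, FullSeparator).

% Separator fully consumed: continue tokenizing the rest.
separated([C|Cs], Ts, [], TempSs) :-
    tokenized([C|Cs], Ts, TempSs).

separated([S|Cs], Ts, [S|Ss], TempSs) :-
    separated(Cs, Ts, Ss, TempSs).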



?- tokenized("test ; string", TokenList, " ; "),
maplist(name, TextList, TokenList).

TokenList = [[116, 101, 115, 116], [115, 116, 114, 105, 110, 103]],
TextList = [test, string]

?- tokenized(String, ["test", "string"], " ; "), name(Text, String).

String = [116, 101, 115, 116, 32, 59, 32, 115, 116|...],
Text = 'test ; string'
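
For readers who only need the result rather than the exercise, reasonably recent SWI-Prolog versions can split an atom with the built-in atomic_list_concat/3 when the separator and the full atom are both bound. The sketch below is not part of the solution above; split_atom/3 is a hypothetical wrapper name, and it works on atoms rather than on code lists.

```prolog
% split_atom(+Atom, +Separator, -Parts): hypothetical wrapper name.
% Relies on SWI-Prolog's atomic_list_concat/3 in "split" mode:
% first argument unbound, separator non-empty, third argument bound.
split_atom(Atom, Separator, Parts) :-
    atomic_list_concat(Parts, Separator, Atom).

% ?- split_atom('test ; string', ' ; ', Parts).
% Parts = [test, string].
```

The hand-written tokenized/3 is still the more instructive version, since it also runs in the other direction (joining tokens into a string).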


Regards
Stephan
  #12  
03-06-2008, 08:48 AM
Re: splitting strings (swi-prolog) (got it (fix))

On Tue, 04 Mar 2008 23:05:47 +0100
Markus Triska <triska@logic.at> wrote:

> Dustin Kick<mac_vieuxnez@mac.com> writes:
>
> > I had to change one thing to make the munching work, this is
> > functional just the way I wanted

>
> Consider DCGs for convenience - for example:
>
> string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
> tokens([], Ts) --> token(Ts).
> tokens([C|Cs], Ts) -->
> ( { C == 0' } -> token(Ts), tokens(Cs, [])
> ; tokens(Cs, [C|Ts])
> ).
>
> token([]) --> [].
> token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
>
> Yielding:
>
> ?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
> %@ true.


string_tokens(Cs, StpS, Ts) :- phrase(tokens(Cs, StpS, []), Ts).

tokens([], _, Ts) --> token(Ts).
tokens([C|Cs], StpS, Ts) -->
    % ( { C == 0' } -> token(Ts), tokens(Cs, StpS, [])
    (   { memberchk(C, StpS) } -> token(Ts), tokens(Cs, StpS, [])
    ;   tokens(Cs, StpS, [C|Ts])
    ).

token([]) --> [].
token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].

Slight mods ...
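
A self-contained way to try the three-argument version might look like the sketch below. The demo/1 predicate and the double_quotes directive are assumptions added for illustration; the directive is there because newer SWI-Prolog releases do not read "..." as a code list by default.

```prolog
:- use_module(library(lists)).               % reverse/2
:- set_prolog_flag(double_quotes, codes).    % "..." is read as a list of codes

string_tokens(Cs, StpS, Ts) :- phrase(tokens(Cs, StpS, []), Ts).

tokens([], _, Ts) --> token(Ts).
tokens([C|Cs], StpS, Ts) -->
    (   { memberchk(C, StpS) }               % C is one of the stop characters
    ->  token(Ts), tokens(Cs, StpS, [])
    ;   tokens(Cs, StpS, [C|Ts])             % accumulate C in reverse order
    ).

token([]) --> [].                            % empty accumulator: emit nothing
token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].

% demo(-Atoms): hypothetical helper. Splits on comma or space and
% converts each token to an atom for readability.
demo(Atoms) :-
    string_tokens("a, b,c", ", ", Tokens),
    maplist(atom_codes, Atoms, Tokens).

% ?- demo(Atoms).
% Atoms = [a, b, c].
```

Because token([]) emits nothing, a run of adjacent stop characters does not produce empty tokens.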

Dhu

>
> --
> comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/



  #13  
03-06-2008, 08:00 PM
Re: splitting strings (swi-prolog)


Thanks to those who have recommended definite clause grammars. I need to
read about them, but it sounds like they may be what I had been
looking for. I didn't find anything searching for tokenization, splitting
strings, mapping characters or anything else I could think of.
Definite clause grammars, of course; it just makes sense.
If anyone has any ideas on how to work difference lists into this, which
I'm hoping will make it more efficient and give me a chance to put
difference lists into practice, I'd appreciate them.
--

Dustin Kick
http://homepage.mac.com/mac_vieuxnez

  #14  
03-06-2008, 09:56 PM
Re: splitting strings (swi-prolog) (got it (fix))

On Tue, 04 Mar 2008 23:05:47 +0100
Markus Triska <triska@logic.at> wrote:

> Dustin Kick<mac_vieuxnez@mac.com> writes:
>
> > I had to change one thing to make the munching work, this is
> > functional just the way I wanted

>
> Consider DCGs for convenience - for example:
>
> string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
> tokens([], Ts) --> token(Ts).
> tokens([C|Cs], Ts) -->


Just as a matter of interest, what's this C == 0' notation?
Why does 0' evaluate to 32 (space)?

Dhu


> ( { C == 0' } -> token(Ts), tokens(Cs, [])
> ; tokens(Cs, [C|Ts])
> ).
>
> token([]) --> [].
> token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
>
> Yielding:
>
> ?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
> %@ true.
>
> --
> comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/

  #15  
03-07-2008, 09:36 AM
Re: splitting strings (swi-prolog)

On Mar 6, 8:00 pm, Dustin Kick<mac_vieux...@mac.com> wrote:
> Thanks to those who have recommended definite clause grammars, I need to
> read about them, but it sounds like it may have been what I had been
> looking for. I didn't find anything searching for tokenization, splitting
> strings, mapping characters or anything else I would have thought of.


> Definite Clause Grammar, of course, it just makes sense.
> If anyone has any ideas how to work difference lists into this, which


I think that in most Prolog systems, DCGs get translated into Prolog
code that uses difference lists; see:
http://xsb.sourceforge.net/manual1/node155.html

A DCG rule such as:
p(X) --> q(X).
will be translated (expanded) into:
p(X, Li, Lo) :- q(X, Li, Lo).
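
To make the difference-list reading concrete, here is a minimal sketch that puts a tiny DCG next to a hand-written equivalent. The names a_run//0 and a_run_dl/2 are made up for illustration, and the clauses a real system generates may differ in detail (for example in how terminals are unified).

```prolog
% DCG version: accepts a run of zero or more 'a' character codes.
a_run --> [].
a_run --> [0'a], a_run.

% Hand-written equivalent. The two extra arguments form a difference
% list: the first is the input, the second is what remains after the
% match, so no append/3 is ever needed.
a_run_dl(S, S).
a_run_dl([0'a|S1], S) :- a_run_dl(S1, S).

% ?- atom_codes(aaa, Cs), phrase(a_run, Cs), a_run_dl(Cs, []).
```

So a DCG already is the difference-list version; there is usually nothing further to gain by threading the lists manually.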

> I'm hoping will make it more efficient, and give me a chance to put
> difference lists into practice, I'd appreciate them.
> --
>
> Dustin Kick
> http://homepage.mac.com/mac_vieuxnez


DCG
  #16  
03-07-2008, 10:46 AM
Re: splitting strings (swi-prolog)


"Dustin Kick" <mac_vieuxnez@mac.com> wrote in message
news:2t0Aj.110$zE5.34@newsfe02.lga...
>
> Thanks to those who have recommended definite clause grammars, I need to
> read about them, but it sounds like it may have been what I had been
> looking for. I didn't find anything searching for tokenization, splitting
> strings, mapping characters or anything else I would have thought of.
> Definite Clause Grammar, of course, it just makes sense.
> If anyone has any ideas how to work difference lists into this, which
> I'm hoping will make it more efficient, and give me a chance to put
> difference lists into practice, I'd appreciate them.
> --


Many years ago I wrote an interpreter, and implemented DCG translation with this code:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Definite Clause Grammar translator
% from Clocksin, Mellish
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:- op(251, fx, { ).
:- op(250, xf, } ).
:- op(255, xfx, -->).

translate((P0 --> Q0), (P :- Q)) :-
    left_hand_side(P0, S0, S, P),
    right_hand_side(Q0, S0, S, Q1),
    flatten(Q1, Q), !.

left_hand_side((NT, Ts), S0, _S, P) :- !,
    nonvar(NT),
    islist(Ts),
    tag(NT, S0, S1, P),
    append(Ts, S0, S1).
left_hand_side(NT, S0, S, P) :-
    nonvar(NT),
    tag(NT, S0, S, P).

right_hand_side((X1, X2), S0, S, P) :- !,
    right_hand_side(X1, S0, S1, P1),
    right_hand_side(X2, S1, S, P2),
    and(P1, P2, P).
right_hand_side((X1 ; X2), S0, S, (P1 ; P2)) :-
    or(X1, S0, S, P1),
    or(X2, S0, S, P2).
right_hand_side({P}, S, S, P) :- !.
right_hand_side(!, S, S, !) :- !.
right_hand_side(Ts, S0, S, true) :-
    islist(Ts),
    !, append(Ts, S, S0).
right_hand_side(X, S0, S, P) :-
    tag(X, S0, S, P).

% Translate one branch of a disjunction. If the branch left its input
% variable unbound and distinct from its output, reuse it directly;
% otherwise emit an explicit unification.
or(X, S0, S, P) :-
    right_hand_side(X, S0a, S, Pa),
    (   var(S0a), S0a \== S, !, S0 = S0a, P = Pa
    ;   P = (S0 = S0a, Pa)
    ).

% Add the two list arguments to a nonterminal.
tag(X, S0, S, P) :-
    X =.. [F | A],
    append(A, [S0, S], AX),
    P =.. [F | AX].

and(true, P, P) :- !.
and(P, true, P) :- !.
and(P, Q, (P, Q)).

flatten(A, A) :-
    var(A), !.
flatten((A, B), C) :- !,
    flatten1(A, C, R),
    flatten(B, R).
flatten(A, A).

flatten1(A, (A, R), R) :-
    var(A), !.
flatten1((A, B), C, R) :- !,
    flatten1(A, C, R1),
    flatten1(B, R1, R).
flatten1(A, (A, R), R).

islist([]) :- !.
islist([_|_]).

append([A|B], C, [A|D]) :- append(B, C, D).
append([], X, X).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% convert DCG rules to clauses
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
d2c :-
    clause((H-->T), true),
    translate((H-->T), Clause),
    assert(Clause),
    display(Clause), nl,
    fail.
d2c.

It's not so simple... Indeed, some time later I read a simpler approach
in Sterling and Shapiro's 'The Art of Prolog', which may match the SICStus
implementation.

Bye Carlo

>
> Dustin Kick
> http://homepage.mac.com/mac_vieuxnez
>



  #17  
03-07-2008, 12:08 PM
Re: splitting strings (swi-prolog) (got it (fix))


I just got around to testing your solution, and it works nicely, just as
you said it would, not that I doubted, but I don't understand the code,
yet. Is there a goal I can run DCGs through to see the expanded code?

Markus Triska <triska@logic.at> wrote:
>
>
>Dustin Kick<mac_vieuxnez@mac.com> writes:
>
>> I had to change one thing to make the munching work, this is
>> functional just the way I wanted

>
>Consider DCGs for convenience - for example:
>
> string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
> tokens([], Ts) --> token(Ts).
> tokens([C|Cs], Ts) -->
> ( { C == 0' } -> token(Ts), tokens(Cs, [])
> ; tokens(Cs, [C|Ts])
> ).
>
> token([]) --> [].
> token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
>
>Yielding:
>
> ?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
> %@ true.
>
>--
>comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/




--

Dustin Kick
http://homepage.mac.com/mac_vieuxnez

  #18  
03-10-2008, 03:30 PM
Re: splitting strings (swi-prolog) (got it (fix))

Dustin Kick<mac_vieuxnez@mac.com> writes:

> Is there a goal I can run DCGs through to see the expanded code?


Use clause/2 to access its term representation. Also try listing/[01]:

?- listing(tokens).
%@ tokens([], A, B, C) :-
%@ token(A, B, C).
%@ tokens([A|E], C, B, G) :-
%@ ( A==32,
%@ D=B
%@ -> token(C, D, F),
%@ tokens(E, [], F, G)
%@ ; tokens(E, [A|C], B, G)
%@ ).
%@ true.
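
Another option, sketched here on the assumption that the system provides expand_term/2 and portray_clause/1 (SWI-Prolog does), is to expand a single rule without loading it first. show_expansion/1 is a hypothetical helper name; the variable names and the exact shape of the printed clause depend on the Prolog version.

```prolog
% show_expansion(+Rule): hypothetical helper that prints the plain
% clause a DCG rule is translated to, without asserting anything.
show_expansion(Rule) :-
    expand_term(Rule, Clause),
    portray_clause(Clause).

% ?- show_expansion((greeting --> [hello], name)).
```

This is handy for experimenting with individual rules, while listing/1 shows what was actually stored for a loaded program.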

--
comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/
  #19  
03-10-2008, 04:26 PM
Re: splitting strings (swi-prolog) (got it (fix))

On Mon, 10 Mar 2008 21:30:07 +0100, Markus Triska wrote:

> Dustin Kick<mac_vieuxnez@mac.com> writes:
>
>> Is there a goal I can run DCGs through to see the expanded code?

>
> Use clause/2 to access its term representation. Also try listing/[01]:
>
> ?- listing(tokens).
> %@ tokens([], A, B, C) :-
> %@ token(A, B, C).
> %@ tokens([A|E], C, B, G) :-
> %@ ( A==32,
> %@ D=B
> %@ -> token(C, D, F),
> %@ tokens(E, [], F, G)
> %@ ; tokens(E, [A|C], B, G)
> %@ ).
> %@ true.


There are two things I can't grok:

1) The %@: when I do ?- listing(tokens). those weird symbols don't show
up. Are we using the same SWI or not?

2) Why is there 32 in the output, while the original program had 0' ?
Is this unavoidable, an SWI bug or an ISO Prolog inconsistency?

Cheers

Bart Demoen
  #20  
03-11-2008, 04:39 AM
Re: splitting strings (swi-prolog) (got it (fix))

On 2008-03-10, bart demoen <bmd@cs.kuleuven.be> wrote:
> On Mon, 10 Mar 2008 21:30:07 +0100, Markus Triska wrote:
>
>> Dustin Kick<mac_vieuxnez@mac.com> writes:
>>
>>> Is there a goal I can run DCGs through to see the expanded code?

>>
>> Use clause/2 to access its term representation. Also try listing/[01]:
>>
>> ?- listing(tokens).
>> %@ tokens([], A, B, C) :-
>> %@ token(A, B, C).
>> %@ tokens([A|E], C, B, G) :-
>> %@ ( A==32,
>> %@ D=B
>> %@ -> token(C, D, F),
>> %@ tokens(E, [], F, G)
>> %@ ; tokens(E, [A|C], B, G)
>> %@ ).
>> %@ true.

>
> There are two things I can't grok:
>
> 1) the %@ : when I do ?- listing(tokens). those weird symbols don't show
> up. We are using the same SWI, or not ?


I leave that to Markus

> 2) why is there 32 in the output, while the original program had 0' ?
> is this unavoidable, an SWI bug or an ISO Prolog inconsistency ?


You know the answer: as it stands in ISO, it is unavoidable. The
tokeniser must translate 0' into the character code of the space. In
general that code is even undefined, but SWI-Prolog is internally Unicode,
so it is defined as 32, regardless of the locale. Character codes,
however, are not a special type and therefore cannot be distinguished from
integers. I'm not sure whether ISO would allow for a subtype of
integer that represents character codes. Possibly.
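
A small sketch of what this means in practice; code_demo/3 is a made-up name used only for illustration. The 0'c notation is resolved when the term is read, so only a plain integer is ever stored in the clause.

```prolog
% code_demo(-Code, -Char, -IsInt): hypothetical demo predicate.
code_demo(Code, Char, IsInt) :-
    Code = 0'a,                  % read as the integer 97
    char_code(Char, 32),         % 32 maps back to the space character
    (   integer(Code)            % a character code is just an integer
    ->  IsInt = true
    ;   IsInt = false
    ).

% ?- code_demo(Code, Char, IsInt).
% Code = 97, Char = ' ', IsInt = true.
```

This is why listing/1 has no way to print the original 0' form back: nothing in the stored term distinguishes it from 32 typed directly.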

The same goes for [32] and " ", etc. To a certain extent this can be remedied
using ?- set_prolog_flag(double_quotes, chars). It doesn't fix all
issues though, and a global flag that introduces such big
incompatibilities causes more trouble than it solves. I never touch
that flag for any real programming task.
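
For reference, a minimal sketch of what the flag changes; codes_text/1 and chars_text/1 are hypothetical fact names. Used as a directive in a source file, the flag affects how double-quoted text that follows it is read.

```prolog
:- set_prolog_flag(double_quotes, codes).
codes_text("hi").        % stored as codes_text([104, 105])

:- set_prolog_flag(double_quotes, chars).
chars_text("hi").        % stored as chars_text([h, i])

% ?- codes_text(X), chars_text(Y).
% X = [104, 105], Y = [h, i].
```

Note that SWI-Prolog 7 and later read "..." as a string object by default, so the code-list examples in this thread need the codes setting (or an explicit conversion such as atom_codes/2) to run there unchanged.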

I once raised a similar issue about [] == [ ] == [/*empty list*/] == '[]'.
It is fine for the first three to be equal, but I still have doubts about the
last. The same goes for {}, though this causes less confusion in practice.

I don't think there is an easy fix to these issues without introducing
serious compatibility issues.

Cheers --- Jan
All times are GMT -5.