jcr:contains -- can I make use of stemming?


#1
10-30-2008, 04:35 PM
Funkyjam Master

I want to allow stemming in my queries but can't figure out how, and I'm using
this list as a last-ditch effort. I find lots of examples of wildcard
searches using jcr:like, but I don't want to restrict my search to certain
field names, and I hear the performance of such queries is awful (Ard
Schrijvers). I've found nothing like what I want. Can someone please help?
Let's say I have two nodes, A and B. A has a property "mytext" with the
value "flash software". B has a property "mystring" with the value "flash
powder". How do I get both nodes to show up in the result of a query
using jcr:contains?



Thanks!

#2
10-31-2008, 11:04 AM
Funkyjam Master


Sorry, I meant to ask how I get both nodes to show up
as the result of a query using jcr:contains(., "ash"). I understand I can
match exactly on the word "flash", but what if I want to match a substring
such as "ash"? If this gets asked a lot, point
me towards the last time it was answered; I'll write it up and someone can
put it on the website, because this is day 3 and the best solution I've found
is jcr:like with wildcards.


#3
10-31-2008, 01:12 PM
Sébastien Launay

Hi,

In Jackrabbit, jcr:contains provides full-text search backed by an inverted
index (Lucene). Queries with a leading wildcard, such as '*pattern', therefore
have to scan the whole index. That's why they should be avoided: they do not
scale as the number of nodes / properties grows.
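
To illustrate why a leading wildcard is expensive, here is a minimal Python sketch. It is not Jackrabbit or Lucene code: the `TinyIndex` class and its method names are made up for illustration, and it assumes only that the index keeps its terms in sorted order. Under that assumption a prefix pattern like 'fla*' can jump straight to the matching range, while a pattern like '*ash' has to test every term.

```python
import bisect


class TinyIndex:
    """Toy inverted index (hypothetical): sorted terms plus term -> node ids."""

    def __init__(self):
        self.postings = {}  # term -> set of node ids containing that term
        self.terms = []     # sorted list of distinct terms

    def add(self, node_id, text):
        # Assumption: a trivial analyzer that lowercases and splits on whitespace.
        for term in text.lower().split():
            if term not in self.postings:
                self.postings[term] = set()
                bisect.insort(self.terms, term)
            self.postings[term].add(node_id)

    def prefix(self, prefix):
        """'fla*': binary-search to the first candidate, stop at the first miss."""
        hits, visited = set(), 0
        i = bisect.bisect_left(self.terms, prefix)
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            hits |= self.postings[self.terms[i]]
            visited += 1  # counts only the matching terms that were read
            i += 1
        return hits, visited

    def suffix(self, suffix):
        """'*ash': sort order does not help, so every term is examined."""
        hits, visited = set(), 0
        for term in self.terms:
            visited += 1
            if term.endswith(suffix):
                hits |= self.postings[term]
        return hits, visited


index = TinyIndex()
index.add("A", "flash software")
index.add("B", "flash powder")

prefix_hits, prefix_visited = index.prefix("fla")
suffix_hits, suffix_visited = index.suffix("ash")
print(sorted(prefix_hits), prefix_visited)
print(sorted(suffix_hits), suffix_visited)
```

Both lookups find the same nodes, but the suffix lookup touches every distinct term, and that cost grows with the size of the repository.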

The second parameter of jcr:contains is quite similar to the Lucene query
syntax (except that it allows wildcards anywhere in a term, and I think it
applies to only one "field", the one given by the first parameter), so you
can use the following queries:
- jcr:contains (., '*ash') will match A & B
- jcr:contains (., 'flas?') will match A & B
- jcr:contains (., 'fla*') will match A & B
- jcr:contains (., 'fl*sh') will match A & B
- jcr:contains (., 'flash~') will additionally match similar terms such as "trash" or "clash"
- jcr:contains (., '*ash -software') will match B
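
The wildcard queries in the list above can be checked against the two example nodes with a small Python emulation. This is only a sketch, not how Jackrabbit evaluates jcr:contains: it assumes a simple analyzer that lowercases and splits on whitespace, treats space-separated terms as all required, and treats '-term' as "must not occur". The names `NODES`, `contains`, and `search` are hypothetical.

```python
from fnmatch import fnmatchcase

# The two nodes from the question; property names are ignored here because
# the queries use '.' (any property of the node).
NODES = {
    "A": "flash software",  # property "mytext"
    "B": "flash powder",    # property "mystring"
}


def tokens(text):
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    return text.lower().split()


def contains(text, expression):
    """Emulate a tiny subset of the search expression syntax:
    '*' and '?' wildcards inside a term, and '-term' for exclusion."""
    words = tokens(text)
    for part in expression.lower().split():
        excluded = part.startswith("-")
        pattern = part[1:] if excluded else part
        found = any(fnmatchcase(word, pattern) for word in words)
        # A required term must be found; an excluded term must not be.
        if found == excluded:
            return False
    return True


def search(expression):
    return sorted(n for n, text in NODES.items() if contains(text, expression))


for expr in ["*ash", "flas?", "fla*", "fl*sh", "*ash -software"]:
    print(expr, "->", search(expr))
```

The first four expressions return both A and B, and the last returns only B, which agrees with the list above.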

jcr:like and jcr:contains queries with a leading-wildcard pattern like '*ash'
behave much the same, except that jcr:like searches the untokenized property
values (case sensitive) and jcr:contains searches the tokenized ones (which
depend on the Lucene analyzer used / index configuration).
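
The difference between the two can be sketched in Python. This is an illustration under assumptions, not Jackrabbit's implementation: `like` emulates a LIKE-style match (with '%' for any run of characters and '_' for a single one) over the whole, untokenized value, and `contains_term` emulates a whole-term match after a simple lowercase-and-split analyzer. The capitalized value "Flash Software" is a hypothetical variant of node A, chosen to make the case sensitivity visible.

```python
import re

VALUE = "Flash Software"  # hypothetical variant of node A's property value


def like(value, pattern):
    """LIKE-style match on the whole value: '%' = any run, '_' = one character.
    Case sensitive, because the value is compared as stored."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def contains_term(value, term):
    """Whole-term match on tokens. Assumption: the analyzer lowercases both
    the stored text and the query term, and splits on whitespace."""
    return term.lower() in value.lower().split()


print(like(VALUE, "%ash%"))           # substring match on the raw value
print(like(VALUE, "flash%"))          # wrong case, raw value starts with "Flash"
print(contains_term(VALUE, "flash"))  # matches the token "flash"
print(contains_term(VALUE, "ash"))    # "ash" is not a whole token
```

Under these assumptions, the substring "ash" is found by the LIKE-style match but not by the plain term match, which is why a wildcard is needed in the jcr:contains case.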

See the following links for more information on the jcr:contains query syntax:
http://svn.apache.org/repos/asf/jack...QueryTest.java
http://lucene.apache.org/java/2_3_2/...sersyntax.html

Regards,

--
Sébastien Launay

