MOSS Search: Crawl Rules: Crawler not crawling complex Urls : Sharepoint
#1
I have a web part on one SharePoint Publishing page that displays data which includes a link to another SharePoint Publishing web part page, but each row varies the query string parameter.

So, pageA has:
<a href="PageB.aspx?item=1">Item 1</a>
<a href="PageB.aspx?item=2">Item 2</a>

and page B displays details about the item. I would like these details crawled. There are not many and the list doesn't change much.

I've set search visibility to "Yes" and "Always index" because I have fine-grained permissions on the site. Otherwise the crawler didn't see the pagea web part contents.

I've also created a crawl rule
Include http://server/site/pagea.aspx; http://server/site/pageb.aspx
with Crawl Complex URLs turned on.

I've tried having "Crawl SharePoint content as Http pages" both on and off.

I always do a Full crawl after changing anything.

Looking at the crawl logs, the crawler only crawls pageb.aspx once, with no query string parameter, and searches for content that would appear on, say, pageb.aspx?item=1 find nothing. The only place pageb.aspx appears in a hyperlink without a query string is the pages/forms/allitems.aspx list, so I guess that's why it was crawled that way. I really didn't want that, but turning off search on the pages list, of course, makes the pages unsearchable (even if they are referenced by hyperlinks elsewhere).

Content on pagea is found, and content that would be on pageb if called without a query string parameter is found.

I have verified that the links work from a browser, and that the query string data is URL-encoded (so spaces are replaced with %20).

I even placed a couple of static hyperlinks to pageb with query strings in the content, and those aren't crawled either.

How would I go about troubleshooting this problem?

-LZ
#2
Solved. The issue is that if the content source type is SharePoint site, the crawl complex switch is a no-op. This is mentioned in the document "Administering Enterprise Search in Office SharePoint Server" (http://go.microsoft.com/fwlink/?LinkId=100254, page 58): "This option has no effect when crawling SharePoint sites, because Office SharePoint Server 2007 enumerates all content when crawling SharePoint sites". What isn't clearly mentioned is that this still applies to SharePoint content even if you turn on the "Crawl as HTTP pages" switch.

I created a different content source of type "Web Sites" for this case, set the crawl rules to crawl complex URLs, added it to the scope, and all is good.

-LZ
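To illustrate the difference described above: a "Web Sites" content source discovers pages by following hyperlinks rather than by enumerating SharePoint content, so the query string links on pageA become crawl candidates. The following Python sketch is only an illustration of that link-following idea, not SharePoint's crawler. The class `LinkCollector` and the function `is_complex` are hypothetical names, and "complex" is assumed to mean "URL that contains a query string".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links such as "PageB.aspx?item=1" become absolute.
                self.links.append(urljoin(self.base_url, value))


def is_complex(url):
    """Assumption: a 'complex' URL is one that carries a query string."""
    return bool(urlsplit(url).query)


# The page URL and markup are taken from the example in the first post.
PAGE_A_URL = "http://server/site/pagea.aspx"
PAGE_A_HTML = (
    '<a href="PageB.aspx?item=1">Item 1</a> '
    '<a href="PageB.aspx?item=2">Item 2</a>'
)

collector = LinkCollector(PAGE_A_URL)
collector.feed(PAGE_A_HTML)
discovered = collector.links
complex_links = [url for url in discovered if is_complex(url)]

for url in discovered:
    print(url, "complex" if is_complex(url) else "simple")
```

Both links found on pageA carry a query string, so a link-following crawl only reaches the item detail pages if complex URLs are allowed.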
#3
Hi!

I'm struggling with the same problem. Can you please give me some more information about what you did?

What did the new crawl rule look like?
Ex: http://server/site/pageb.aspx* - Include - Crawl as HTTP - Complex URLs
Or: http://server/site/pageb.aspx - Include - Crawl as HTTP - Complex URLs
Or: http://server/site/* - Include - Crawl as HTTP - Complex URLs
...or something else?

I would really appreciate any help. I have struggled with this for a long time :/

Best regards,
Erik
#4
Erik,

Create a "New Content Source" and select "Web Sites" under "Content Source Type". Note that crawling complex URLs and crawling as HTTP don't really help unless the content source is set to "Web Sites", and you cannot change the content source type after you create it.

I then created a crawl rule: http://server/site/pageb.aspx*, include, crawl complex, crawl as http

-LZ
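To see why the trailing `*` in the rule path matters, here is a rough Python sketch that compares the three candidate paths from Erik's question against a URL with and without a query string. It approximates wildcard matching with the standard `fnmatch` module and a case-insensitive comparison; both are assumptions, and SharePoint's own rule matching is not reproduced here. `rule_matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def rule_matches(pattern, url):
    """Approximate a wildcard crawl rule path with shell-style matching.

    Assumption: '*' matches any run of characters and case is ignored.
    """
    return fnmatchcase(url.lower(), pattern.lower())


# The three candidate rule paths from the question above.
CANDIDATE_RULES = [
    "http://server/site/pageb.aspx*",
    "http://server/site/pageb.aspx",
    "http://server/site/*",
]

URLS = [
    "http://server/site/pageb.aspx",
    "http://server/site/PageB.aspx?item=1",
]

# For each rule, record whether it matches each URL, in the order of URLS.
results = {
    pattern: [rule_matches(pattern, url) for url in URLS]
    for pattern in CANDIDATE_RULES
}

for pattern, matched in results.items():
    print(pattern, matched)
```

Under these assumptions, the path without a wildcard matches only the bare page, while the `pageb.aspx*` form used in the reply above also covers the item URLs without opening up the rest of the site.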
#5
| Thank you very much for your help! Regards, Erik |
#6
The tip below describes the programmatic approach for enabling and disabling Search Visibility of a website in Office SharePoint Server. The code sample is provided in C#, and the variable "web" refers to the SPWeb object retrieved from a site collection.

    /* -- Enable Search -- */
    web.AllowAutomaticASPXPageIndexing = true;
    web.ASPXPageIndexMode = WebASPXPageIndexMode.Always;
    web.NoCrawl = false;
    web.Update();   // persist the changed settings

    /* -- Disable Search -- */
    web.AllowAutomaticASPXPageIndexing = false;
    web.ASPXPageIndexMode = WebASPXPageIndexMode.Never;
    web.NoCrawl = true;
    web.Update();   // persist the changed settings
| Thread | Thread Starter | Forum | Replies | Last Post |
| MOSS 2007 Search - Excluding Complex URL's | usenet | Sharepoint | 1 | 01-09-2008 12:52 PM |
| MOSS 2007 search relative URLs | usenet | Sharepoint | 1 | 11-16-2007 09:04 AM |
| Crawl rules | usenet | Sharepoint | 2 | 08-27-2007 11:41 PM |
| MOSS: Add Crawler Impact Rule for a BDC source? | usenet | Sharepoint | 0 | 07-18-2007 11:19 AM |
| MOSS Search - crawling specific subsites as HTTP | usenet | Sharepoint | 1 | 05-22-2007 01:00 PM |




