Sitefinity 5.1 and Document Library Full Text Search

Posted by Community Admin on 04-Aug-2018 16:43

All Replies

Posted by Community Admin on 14-Jun-2012 00:00

I know that Sitefinity does not support searching document libraries out of the box, let alone full text search of their contents. So I am writing my own module (hopefully) to do this.

Looking at some of the examples and documentation, it looks like I would want to use an inbound pipe to tie into the publishing system and generate the index information for each document as it is saved (using various libraries to parse the document and return its text contents). I think I have that mostly accomplished, but I'm missing something.

I have created the module, and registered it. My initialize code looks like this:

// Register the custom inbound pipe type under its pipe name.
PublishingSystemFactory.RegisterPipe(DocumentSearchInboundPipe.PipeName, typeof(DocumentSearchInboundPipe));
// Reuse the default inbound content mappings for this pipe.
var mappingsList = PublishingSystemFactory.GetDefaultInboundMappingForContent();
PublishingSystemFactory.RegisterPipeMappings(DocumentSearchInboundPipe.PipeName, true, mappingsList);

// Start from the default content inbound settings and point them at the Document type.
var pipeSettings = (SitefinityContentPipeSettings)PublishingSystemFactory.CreateDefaultContentInboundPipeSettings(DocumentSearchInboundPipe.PipeName);
pipeSettings.ContentTypeName = typeof(Telerik.Sitefinity.Libraries.Model.Document).FullName;
pipeSettings.UIName = "DocumentSearchInboundPipe";
pipeSettings.PipeName = DocumentSearchInboundPipe.PipeName;
pipeSettings.ResourceClassId = typeof(DocumentLibraryFullTextSearchModuleResources).Name;
PublishingSystemFactory.RegisterPipeSettings(DocumentSearchInboundPipe.PipeName, pipeSettings);

// Register the default content pipe definitions for this pipe.
var definitions = PublishingSystemFactory.CreateDefaultContentPipeDefinitions();
PublishingSystemFactory.RegisterPipeDefinitions(DocumentSearchInboundPipe.PipeName, definitions);

// Add the pipe to the search index template; per the examples, this should list it when creating a search index.
var contentPipeSettings = PublishingSystemFactory.GetPipeSettings(DocumentSearchInboundPipe.PipeName);
contentPipeSettings.MaxItems = 0;
PublishingSystemFactory.RegisterTemplatePipe("SearchItemTemplate", contentPipeSettings, ps => ps.PipeName == DocumentSearchInboundPipe.PipeName);

The examples showed that last line as the one that adds the inbound pipe to the list of options when creating a search index, but I do not get any new options there. Can anyone shed some light on how I get this in there?

Note: In its GetConvertedItemsForMapping method, the InboundPipe I wrote essentially just returns each file's contents in the Content field of the corresponding WrapperObject.

If anyone has any comments about my methodology here (this is the first thing I've written for Sitefinity), especially if I have completely misunderstood how I should be doing this, please let me know! For instance, if this would actually parse every file in the documents library on EVERY search, that would be kind of bad. Ideally I want the parsed text cached in the search index, and the index updated on publishing events (which is what I THINK I'm doing here).

Posted by Community Admin on 15-Jun-2012 00:00

Ok, I think I got it now. I wasn't registering the Resources file, and I think it wasn't being pulled in right because of that.

Curiously, it shows up as "Documents" when creating an index, but as "Text documents" when editing an index...

But otherwise, I now have a working module that handles indexing the full text of documents uploaded.

The only thing I might want to add is some configuration of which libraries to watch. Can anyone suggest how one might do that? Ideally through the admin interface instead of config files?

Posted by Community Admin on 15-Jun-2012 00:00

This should be coming out of the box with 5.1 in July though?

Posted by Community Admin on 15-Jun-2012 00:00

Oh. That's cool. Wish I'd seen that before, but that's ok, I needed this right now anyway.

Posted by Community Admin on 15-Jun-2012 00:00

I know, I'm always dancing that line of...do it now even though it'll be here later, or just wait on it...

Posted by Community Admin on 15-Jun-2012 00:00

Hmm, it doesn't look like they are doing Word docs though? I'm only scanning the OpenXML ones (docx) myself, but still, I figured they would include that. I also wonder what library they are using for PDF reading. I'm using an old copy of iTextSharp (from before the fork to the new licensing scheme).
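For anyone curious what scanning the OpenXML files can look like, here is a minimal sketch, not the exact code from my module. A .docx file is a zip package whose main body lives in word/document.xml, so the text can be pulled out with the framework's own zip and XML classes. The DocxTextExtractor class and ExtractText method are names made up for this example, and it assumes .NET 4.5 or later for System.IO.Compression.ZipArchive (on .NET 4.0 you would need System.IO.Packaging or the Open XML SDK instead). It only reads the main body and ignores headers, footers and footnotes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of Sitefinity.
public static class DocxTextExtractor
{
    // WordprocessingML main namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of a .docx stream, one line per paragraph.
    public static string ExtractText(Stream docxStream)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, true))
        {
            var entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                // Not a Word OpenXML package (or the main part is missing).
                return string.Empty;
            }

            using (var entryStream = entry.Open())
            {
                var xml = XDocument.Load(entryStream, LoadOptions.PreserveWhitespace);

                // Each w:p is a paragraph; its visible text is split across w:t runs.
                var paragraphs = xml.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

The returned string is the kind of thing that would go into the Content field the pipe hands to the index. Old binary .doc files are not zip packages, so they need a different parser.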

Anyway, this has been a good exercise for me. I'll need it still for indexing Word docs probably, and I will also need to write something similar to index Video libraries (after I figure out how to add a custom field for choosing a document for captions).

I'm still interested in whether there is a way to change the interface on the Search Index screens to allow selecting a library or a collection of library folders, but perhaps that isn't possible...

Posted by Community Admin on 05-Aug-2013 00:00

Steve, I have version 6.1 and the "Documents" check box is checked in the index configuration. Yet none of my PDFs are being searched. What do they mean by "documents" then? Not the documents that are stored in the libraries?

This thread is closed