Document indexing for Search

Posted by Community Admin on 03-Aug-2018 01:46

All Replies

Posted by Community Admin on 28-Jul-2010 00:00

Will version 4.0 be able to index the content of documents such as PDF or Word files so that they become searchable? If someone runs a search from the live site and there is a match in a PDF file linked from a page within the site, will it show in the results?

Posted by Community Admin on 28-Jul-2010 00:00

Hello apollo,

There will be a search engine for all content items. The content of files will not be indexed in 4.0; we will try to provide this functionality out of the box in 4.1.

Greetings,
Ivan Dimitrov
the Telerik team

Do you want to have your say when we set our development plans? Do you want to know when a feature you care about is added or when a bug is fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items.

Posted by Community Admin on 29-Jul-2010 00:00

Thanks for the response Ivan.

Do you know of a solution that we could integrate in the meantime that will provide this functionality? Or do you have an idea when 4.1 will be available?

Posted by Community Admin on 29-Jul-2010 00:00

Hi apollo,

The search engine for Sitefinity 4 has not been implemented yet; we will do this after the BETA. We will use the Lucene engine. To search inside document content, you have to use a third-party framework to extract the content of the files. Most probably we will provide an API that allows you to extract the content from DOC files. For other file types you could use open source libraries such as Apache PDFBox or iTextSharp.

If you use PDFBox, you can load the document with PDDocument.load(stream) and then call getText on a PDFTextStripper instance.
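To show the overall shape of this approach, here is a minimal sketch in Java (PDFBox's native language) of how extracted text could be fed into a search index. It is only an illustration under stated assumptions: TextExtractor, PlainTextExtractor, TinyIndex and the sample file names are hypothetical and are not part of Sitefinity, Lucene or PDFBox. The PDFBox calls named above appear only in a comment, because PDFBox is not on the classpath here, and a small in-memory map stands in for the Lucene index.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical abstraction: turns a file stream into plain text. */
interface TextExtractor {
    String extract(InputStream in);
}

/**
 * Stand-in extractor that simply reads the stream as UTF-8 text.
 * A PDF extractor would instead wrap the PDFBox calls named above, roughly:
 *   PDDocument doc = PDDocument.load(in);
 *   String text = new PDFTextStripper().getText(doc);
 *   doc.close();
 */
class PlainTextExtractor implements TextExtractor {
    @Override
    public String extract(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical in-memory inverted index: lower-cased word -> document names. */
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits the text on non-word characters and records each word for the document. */
    void add(String docName, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    /** Returns the names of all documents containing the word (case-insensitive). */
    Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}

class DocumentIndexSketch {
    private static InputStream streamOf(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        TextExtractor extractor = new PlainTextExtractor();
        TinyIndex index = new TinyIndex();

        // Sample content; real input would be the stream of an uploaded file.
        index.add("brochure.pdf", extractor.extract(streamOf("Annual report for 2010")));
        index.add("notes.doc", extractor.extract(streamOf("Meeting notes")));

        System.out.println(index.search("report"));
    }
}
```

The point of the separate extractor interface is that the indexing side never needs to know the file type: swapping in a PDFBox-based or iTextSharp-style extractor changes only how the text is obtained, not how it is indexed.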

You can use Apache PDFBox or iTextSharp with a custom provider in the 3.x editions as well.

Greetings,
Ivan Dimitrov
the Telerik team

Posted by Community Admin on 10-Jan-2011 00:00

Hello,
Can you please update me on this? Will I be able to search inside PDFs with the release of 14/01/2011?
Thanks

Posted by Community Admin on 10-Jan-2011 00:00

Hello,

The official version of Sitefinity 4.0 will not have PDF indexing. This will be implemented at a later stage, but we have not scheduled a time frame for the implementation yet.

Kind regards,
Ivan Dimitrov
the Telerik team

Posted by Community Admin on 10-Jan-2011 00:00

And what about DOC/DOCX files, will indexing be available for them? Could I implement my own PDF search service and integrate it with Sitefinity?
Thanks

Posted by Community Admin on 10-Jan-2011 00:00

Hi,

Can you provide us with a sample of implementing a custom search provider in SF 4.0? We went through extensive development for document indexing in SF 3.7.

Regards,
Jean

Posted by Community Admin on 11-Jan-2011 00:00

Hi apollo,

We do not have a sample that shows how to create a custom index. The index is based on pipes, so you can check this post.

Regards,
Ivan Dimitrov
the Telerik team

Posted by Community Admin on 30-Mar-2011 00:00

Hello Ivan,
When will 4.1 be released? Thanks

Posted by Community Admin on 01-Apr-2011 00:00

Ivan, where can I find a sample that uses PagePipe to develop my custom index?
Thanks

Posted by Community Admin on 04-Apr-2011 00:00

Hello Paolo,

We made a sample that shows how to create custom pipes, and it will be included in the Q1 release scheduled for the middle of April.

Best wishes,
Ivan Dimitrov
the Telerik team


Posted by Community Admin on 04-Apr-2011 00:00

Hello Ivan,
Since I need to finish the first part of the project by the end of April, is it possible to have it earlier? I don't want to miss the deadline for the search index. Thanks

Posted by Community Admin on 07-Apr-2011 00:00

Hi Paolo,
 
The sample will be released next week. We will post a link in this forum thread when we are done.
We are sorry for not being able to speed up our delivery. We are currently focused on the upcoming Q1 release next week and all our efforts go in this direction.

I hope the suggested timing can work for you.

All the best,
Kalina
the Telerik team


Posted by Community Admin on 20-Apr-2011 00:00

Hello Telerik,
Can you please provide me with a working sample of pipes for search?
Thanks
Paolo

Posted by Community Admin on 20-Apr-2011 00:00

Hi Paolo,

We will have a sample with the SDK release that will be available next week.


All the best,
Ivan Dimitrov
the Telerik team


Posted by Community Admin on 20-Apr-2011 00:00

Hello Telerik....
Can you just give me some pointers to work on? Waiting until next week means I will have barely 4 days to develop my part of the solution with search.
Thanks

Posted by Community Admin on 22-Apr-2011 00:00

Hi Paolo,

The implementation consists of about 5 specific classes. I suggest that you wait for the SDK release.

Kind regards,
Ivan Dimitrov
the Telerik team


Posted by Community Admin on 29-Apr-2011 00:00

Hello Ivan,
I've downloaded the SDK, can you please tell me which example I should look at?
Thanks
Paolo

Posted by Community Admin on 29-Apr-2011 00:00

Hello Paolo,

We removed the index sample from the SDK, because we are going to change the publishing API this quarter and the entire code of the pipe will have to be rewritten. If you want, you can open a support request and I will send you the current implementation, but you should know that this code will not work once the Q2 release is out and you will have to create your custom pipe again from scratch. There are currently known issues related to the pipes, and customizing the index at this stage is not an easy task.

Kind regards,
Ivan Dimitrov
the Telerik team


Posted by Community Admin on 29-Apr-2011 00:00

hello Ivan,
I'm helping our customer, who holds the license, and I don't have one myself, so how can I open a ticket with you? Can I send one under the .NET suite, specifying that it's for you?
Thanks

Posted by Community Admin on 03-May-2011 00:00

Hello Paolo,

I will send a sample in the general feedback request you opened, but please keep in mind that it will not work in the 4.2 release, since we are going to change the indexing API in order to improve it.

Kind regards,
Ivan Dimitrov
the Telerik team

Do you want to have your say in the Sitefinity development roadmap? Do you want to know when a feature you requested is added or when a bug is fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items.

Posted by Community Admin on 04-May-2011 00:00

Hello Ivan,
I've tried for the last two days to implement a sample, but with no luck. Can you please tell me where I should start? I have no idea what those pipes are used for. Thanks
Paolo

Posted by Community Admin on 09-May-2011 00:00

Hi Paolo,

Please check this article

www.sitefinity.com/.../t_telerik_sitefinity_publishing_pipes_publishingpipebase.html

Best wishes,
Ivan Dimitrov
the Telerik team

Posted by Community Admin on 16-May-2011 00:00

Hello Ivan,
I've tried, but with no success. Here are my questions: do I need to develop a custom search provider that inherits from LuceneSearchProvider? If so, where do I tell the search index to use this provider?
I really need to develop a solution by the end of the week that looks inside PDFs. Can you please tell me how I can do this, being aware that things will change with 4.2?

Thanks
Paolo

Posted by Community Admin on 17-May-2011 00:00

Hello Ivan,
Your link doesn't lead me anywhere. Where should I register a pipe?

Posted by Community Admin on 17-May-2011 00:00

Hello Paolo,

You can try to extend the index pipe that I sent you. You need to include an external library that will get the item content. Another option is not to use a custom pipe, but to hack into the publishing point. Below is sample code. You need an instance of the SearchIndex pipe, and on it you call HandleItemAction, passing an IEnumerable of your content objects.

// Find the SearchIndex pipe settings on the search publishing point.
var pipesettings = PublishingManager.GetManager("Search")
    .GetPublishingPoints()
    .Where(pp => pp.Name == "MySearchPublishingPoint")
    .ToList()
    .First()
    .PipeSettings
    .Where(ps => ps.PipeName == "SearchIndex")
    .First();

// Resolve the pipe and pass it the content items to index.
var pipe = PipeFactory.ResolvePipe2("SearchIndex").Initialize(pipesettings);
pipe.HandleItemAction(new List<HandleActionArgs>() { new HandleActionArgs() { Item = new Content() } });

Regards,
Ivan Dimitrov
the Telerik team

Posted by Community Admin on 17-May-2011 00:00

Hello Ivan,
Thanks for your reply. I've tried extending your example, but when I add the Products file to get a working sample, I get this error:

Could not find the specified key "ProductsLandingPageTitle" or class id "ProductsResources".

In the alternative example you gave me, where and how do I hook into the publishing? For you who developed SF it's easy, but for me it's not! And how does hooking into the publishing of an item let me search for it?

Posted by Community Admin on 17-May-2011 00:00

Please, Ivan,
tell me how I can achieve this. I'm really in a hurry. Thanks

Posted by Community Admin on 20-May-2011 00:00

Hello Paolo,

I have replied to you in the support ticket you have opened - can you please verify if the Products module runs fine before implementing the custom pipe in it, so that we can be sure what might be the cause of this issue. It would also help if you could send over your implementation, so that we can give you a more focused response.

Best wishes,
Boyan Barnev
the Telerik team

This thread is closed