Filter over multiple relation-steps leads to performance iss

Posted by mjoosen on 25-Oct-2017 04:34

Dear Community

We see problems when not all steps in a relationship path are bound. This seems counterintuitive.

Example

  • We have the entity Contract (5 instances) with a relation (cardinality 1->1) to an entity Supplier (one Supplier for each Contract).
  • Contract has an id: ContractId, Supplier has an id: SupplierId
  • We have the entity Article (120.000 instances).
  • Article has an id: SupplierId
  • We use a loop construct to loop over contracts (not directly relevant for the issue at hand). Therefore we have an entity Filter (1 instance) with a relation to Contract (cardinality 1->1)
  • Filter has an id: ContractId

Situation 1 (run is killed after 1 hour because it is stuck)

Scope:

Article (ALIAS:ART)

Filter (ALIAS:FLTR)

-> Contract (ALIAS: FLTR_CONTRACT)

---->Supplier (ALIAS: FLTR_SUPPLIER)

Filter:

ART.SupplierId = FLTR_SUPPLIER.SupplierId


Situation 2 (run is finished in less than 10 seconds)

Scope

Article (ALIAS:ART)

Filter (ALIAS:FLTR)

-> Contract (ALIAS: FLTR_CONTRACT)

---->Supplier (ALIAS: FLTR_SUPPLIER)

Filter:

FLTR.ContractId = FLTR_CONTRACT.ContractId

ART.SupplierId = FLTR_SUPPLIER.SupplierId
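
To make the functional expectation concrete, here is a minimal Python sketch of the two situations as plain in-memory filtering. It is only a model of the scope described above, not of how the rule engine evaluates filters; the class names, field names and ids are hypothetical. It shows that, when Filter reaches Contract through the relation anyway, the extra ContractId binding should not change which Articles survive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supplier:
    supplier_id: int


@dataclass(frozen=True)
class Contract:
    contract_id: int
    supplier: Supplier  # relation Contract -> Supplier (1->1)


@dataclass(frozen=True)
class Filter:
    contract_id: int
    contract: Contract  # relation Filter -> Contract (1->1)


@dataclass(frozen=True)
class Article:
    article_id: int
    supplier_id: int


def situation_1(articles, fltr):
    """Bind only the last step: ART.SupplierId = FLTR_SUPPLIER.SupplierId."""
    fltr_supplier = fltr.contract.supplier
    return [a for a in articles if a.supplier_id == fltr_supplier.supplier_id]


def situation_2(articles, fltr):
    """Also bind FLTR.ContractId = FLTR_CONTRACT.ContractId."""
    fltr_contract = fltr.contract
    if fltr.contract_id != fltr_contract.contract_id:
        return []  # the extra binding removes everything if the ids disagree
    fltr_supplier = fltr_contract.supplier
    return [a for a in articles if a.supplier_id == fltr_supplier.supplier_id]


# Hypothetical data: 5 contracts with one supplier each, 120,000 articles.
contracts = [Contract(c, Supplier(100 + c)) for c in range(5)]
articles = [Article(i, 100 + i % 5) for i in range(120_000)]
fltr = Filter(contract_id=2, contract=contracts[2])

result_1 = situation_1(articles, fltr)
result_2 = situation_2(articles, fltr)
```

In this model both functions return the same Articles, so any difference between the two situations would be one of evaluation cost, not of results.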

All Replies

Posted by Chris S. Hogan on 25-Oct-2017 14:39

Can you upload a sample vocab, rulesheet and ruletest so I can see the exact scenario that is causing an issue?

Posted by mjoosen on 26-Oct-2017 03:26

Nope, unfortunately not. We have already spent 5 man-days or more on this issue. The information I provided should be enough to construct a test case (that is free of company information).

The set of 100.000 articles is an important one: using 10-20 test cases is not enough to recreate this behavior.

Posted by mjoosen on 26-Oct-2017 03:52

My reaction above was perhaps a bit short, but my initial post was meant to verify my assumption that one should not have to bind each part of the relationship path to get results. As explained before, we have already spent a lot of time tackling the problem and I don't have the time to build a clean example.

If nobody can explain to me why my assumptions are wrong, I will file a bug report. For now, I'm just wondering why we see this behavior.

Posted by Harold-Jan Verlee on 26-Oct-2017 04:05

I'm looking at this particular situation and trying to understand it fully. Can you let me know whether your entities are datastore persistent (are you using EDC)? Or is all filtering done in working memory? Please see the attached screenshots of what I presume is your vocab.

Posted by mjoosen on 26-Oct-2017 04:43

Dear Harold-Jan,

Thanks for already building an example.

Contract/Supplier/Filter are created using rulesheets. Article is created using an SCO (ICcDataObjectManager.createEntity(<name>, false), so no output entities are created). No connection to EDC is made.

Your example is exactly right (one minor point: in our case the order of the two filters is different, so first the contract binding and then the supplier binding to article).

Posted by mjoosen on 26-Oct-2017 04:45

@harold-jan

One addition: the filters are set on pre-condition, but that should not make a difference IMHO.

Posted by Harold-Jan Verlee on 26-Oct-2017 05:23

I can confirm that a small test set behaves identically in situations 1 and 2. That said, I feel the filter FLTR.ContractID = FLTR_CONTRACT.ContractID in situation 2 shouldn't be necessary (I know I have it reversed, but that shouldn't matter). The filter on ARTICLE is identical in both situations. The filter on FLTR is executed first, but its impact is minimal because it acts on a really small dataset (you're just defining a filter for the ARTICLE). I'll try to replicate with 100K ARTICLES in working memory to confirm this behavior.
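
For anyone who wants to try the same replication without company data, a synthetic Article set of the required size is easy to generate. The sketch below is only an assumption about what such a set could look like: the attribute name ArticleId, the supplier ids and the seed are made up, and the records still have to be loaded into the rule test in whatever form your setup expects.

```python
import random
from collections import Counter


def make_articles(n_articles, supplier_ids, seed=42):
    """Build n_articles synthetic Article records with a random SupplierId."""
    rng = random.Random(seed)  # fixed seed so every run yields the same set
    return [
        {"ArticleId": i, "SupplierId": rng.choice(supplier_ids)}
        for i in range(n_articles)
    ]


# Hypothetical ids for the 5 suppliers (one per contract).
supplier_ids = [101, 102, 103, 104, 105]
articles = make_articles(100_000, supplier_ids)

# Number of Articles per supplier: the count the filter should let through
# when the Filter entity points at the contract of that supplier.
expected_per_supplier = Counter(a["SupplierId"] for a in articles)
```

Comparing the number of Articles that pass the filter against expected_per_supplier gives a quick correctness check next to the run time measurement.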

Posted by Harold-Jan Verlee on 26-Oct-2017 07:40

I have rebuilt your situation, albeit with a smaller dataset of 10K items, but I am not noticing a difference in run time between the two situations. I suspect that something regarding filters has been fixed between the version you are using and the current version (I'm using 5.6.7). Please open a bug with Support to get advice on which HF fixed this issue and whether it can be applied to your version. See the details in the screenshot.

Posted by mjoosen on 26-Oct-2017 10:29

Dear Harold-Jan,

Thanks for the feedback. So I conclude that I was right to assume that binding each step should not be necessary from a functional standpoint. The version is a good point, since we are using 5.6.0. I will report a bug (I first wanted to verify that the behavior we reported should be considered a bug).

Posted by Harold-Jan Verlee on 26-Oct-2017 10:59

Hi Maarten, I confirm that I can't reproduce the behavior you observed (with 5.6.1.7), which suggests it is a bug in 5.6.0. As I suggested earlier, please report this to Support (you can refer to me regarding my reproduction of your case). The extra filter that alleviates your issue shouldn't be required.

This thread is closed