Corticon 5.7.2.8 Batch Write

Posted by Anverion on 17-Jun-2019 22:25

Hello--

I have noticed that when a batch runs and I'm doing anything OTHER than read from the database, the IN indicator doesn't seem to work. now, let me explain.

1. Created a basic ruleflow with a single service connector that updates a detail's status.

2. Created a BATCH_READ query that pulls all details based on an earnings file id.

EXAMPLE

CORTICON_ADC_BATCH_READ

1 DetailsByFileId select detail_record_id from cms.cms_efd WHERE earnings_file_id = {CmsEfd.earningsFileId} CmsEfd true

This works and returns all of the distinct detail_record_ids based on the earnings_file_id.

Next, the service connector is called and the WRITE_DEF is the following:

UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ({CmsEfd.detailRecordId})

I would expect the following:

UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P1','613P2','613P3','613P4','613P5','613P6')

Instead, I get six UPDATE statements like the following:

UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P1')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P2')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P3')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P4')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P5')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P6')

What's the point of using an IN indicator if it's going to add an update statement for each detail that's processed? Is this how batch works? For a few details, this isn't a big deal, however, we are processing thousands of details. I've seen as many as 500k in our production environment so, 500k individual update statements isn't going to work.

Thanks for your time,

Branden

Posted by John Miller on 19-Jun-2019 21:10

I don’t believe it would make a difference in performance if the values were part of the Entity or if they are hard coded.  I was just trying to hit home the idea of the one-to-one mapping of an Entity to a database record.
 
To your last question, you will always see a SQL statement for each instance of an Entity in memory.  But as stated before, the updates are added to a PreparedStatement Batch, which is optimal.
 
Example:  3 Person Entities in memory
Person
   Id = 1
   Name = Tom
   Validated = true
 
Person
   Id = 2
   Name = ***
   Validated = false
 
Person
   Id = 3
   Name = Harry
   Validated = true
 
ADC_WRITE_DEF -> SQL
   UPDATE Person SET validated={Person.validated}, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN ({Person.id})
 
This produces 3 SQL statements:
   UPDATE Person SET validated=true, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (1)
   UPDATE Person SET validated=false, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (2)
   UPDATE Person SET validated=true, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (3)
 
Because Person.id = 2 has a Validated = false, which is different than Person.id = 1 and Person.id = 3, these SQL statements can’t be combined.
   UPDATE Person SET validated=true, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (1, 2, 3) is incorrect.
 
For me, it is easier to understand that there is a discrete SQL statement for each Entity.  This reduces complexity and hopefully it reduces confusion in creating a valid ADC_WRITE_DEF -> SQL statement.
 
I hope this helps.
 
-John
 
 

All Replies

Posted by Harold-Jan Verlee on 18-Jun-2019 07:53

Hi Branden,

Your batch read query fetches records one by one and seeds it to the decision service. You'd better use an IN statement there too: select detail_record_id from cms.cms_efd WHERE earnings_file_id IN ({CmsEfd.earningsFileId})

That way, the batch will process a chunked number of records  and will allow to write back to the DB with all its records in memory in one single pass (defined in your batch properties).

Hope this helps.

Regards,

Harold

Posted by Anverion on 18-Jun-2019 18:01

[View:/cfs-file/__key/communityserver-discussions-components-files/14/issue_5F00_with_5F00_batch_5F00_write.docx:320:240]

Thank you for your response, Harold. I made the change you suggested, however, since only a single file id will ever be submitted via the json request payload, this shouldn't have an impact. I have attached a word document with my whole process including the output. The issue is still present.

Thanks,

Branden

Posted by Harold-Jan Verlee on 19-Jun-2019 18:40

Thanks Branden for the very helpful clarifications. I see you're not doing any rules processing in your flow so Corticon may ignore updating your records in your table as it will see there is no difference with what is seeded by the batch processor to the flow (just the IDs), and the write step in your flow. Anyhow, I'm going on a 2 weeks break, so I'll hand over your request to one of my colleagues. Hope he'll get things sorted with you. Regards Harold

Posted by Anverion on 19-Jun-2019 18:48

Thanks Harold. I guess I'm just making sure that I'm not doing anything wrong with the SQL or how the UPDATE/INSERTS/DELETES work. READ using the IN clause perfectly, however, the UPDATE and DELETES do not follow the same convention by using a single statement and expanding the IN clause with the list of details. If this is how batch works, then great, I can try to make tweaks elsewhere. I'm just surprised if this is the case.

Enjoy your break!

-Branden

Posted by John Miller on 19-Jun-2019 20:15

Hi Branden,

You are doing everything correctly.  

ADC.write is designed to do a discrete SQL statement for each Entity that is in the Decision Service’s working memory.  The idea is that you want to update/insert/delete a discrete database record with the values from each discrete Entity instance…maintain this one-to-one mapping.  This allows us to keep database data in sync with the values in the in-memory Entities.  

In your example, you are updating values in the database with hard coded values instead of what is inside each Entity.  Even though you are allowed to update values with hard-coded values in this way, this is the exception and not the rule.

I hope this makes sense.

Regards,

John

Posted by Anverion on 19-Jun-2019 20:23

Thank you John. Yes, I had to create something simple instead of walking you through a 100+ rulesheet project :) So would it make a difference if all the values were coming from the entity? meaning, nothing hard-coded? Would I then, expect to see an update statement with multiple detail ids in in the IN clause?

Thank you for your prompt reply to the original. I'm a little under the gun right now to deploy my project but the I/O in the database is concerning.

Thanks,

Branden

Posted by Anverion on 19-Jun-2019 21:05

Hi John, just to see what would happen, I created a single rulesheet before my service connector that sets CmsEfd transient values for my UPDATE SQL. New UPDATE SQL:

UPDATE cms.cms_efd

   SET cms_efd.detail_status_id={CmsEfd.detailStatusId},

       cms_efd.updated_by={CmsEfd.tempUpdatedBy},

       cms_efd.updated_at={CmsEfd.tempUpdatedAt}

 where cms_efd.detail_record_id IN ({CmsEfd.detailRecordId})

The same behavior was the result:

SQL 1: select detail_record_id from cms.cms_efd WHERE earnings_file_id IN ('370')

SQL 2: UPDATE cms.cms_efd

  SET cms_efd.detail_status_id=1,

      cms_efd.updated_by='John Wick',

      cms_efd.updated_at='2019-06-19 13:55:59'

where cms_efd.detail_record_id IN ('370P1')

SQL 3: UPDATE cms.cms_efd

  SET cms_efd.detail_status_id=1,

      cms_efd.updated_by='John Wick',

      cms_efd.updated_at='2019-06-19 13:55:59'

where cms_efd.detail_record_id IN ('370P2')

SQL 4: UPDATE cms.cms_efd

  SET cms_efd.detail_status_id=1,

      cms_efd.updated_by='John Wick',

      cms_efd.updated_at='2019-06-19 13:55:59'

where cms_efd.detail_record_id IN ('370P3')

SQL 5:

UPDATE cms.cms_efd

  SET cms_efd.detail_status_id=1,

      cms_efd.updated_by='John Wick',

      cms_efd.updated_at='2019-06-19 13:55:59'

where cms_efd.detail_record_id IN ('370P4')

I'm guessing that this is just the behavior for a WRITE statement using batch. Like I said before, I would prefer to see 2 SQL statements:

SQL 1: select detail_record_id from cms.cms_efd WHERE earnings_file_id IN ('370')

SQL 2: UPDATE cms.cms_efd

  SET cms_efd.detail_status_id=1,

      cms_efd.updated_by='John Wick',

      cms_efd.updated_at='2019-06-19 13:55:59'

where cms_efd.detail_record_id IN ('370P1','370P2','370P3','370P4')

But maybe that'll be something they enhance down the road.

Thanks for your time and effort on this.

-Branden

Posted by John Miller on 19-Jun-2019 21:10

I don’t believe it would make a difference in performance if the values were part of the Entity or if they are hard coded.  I was just trying to hit home the idea of the one-to-one mapping of an Entity to a database record.
 
To your last question, you will always see a SQL statement for each instance of an Entity in memory.  But as stated before, the updates are added to a PreparedStatement Batch, which is optimal.
 
Example:  3 Person Entities in memory
Person
   Id = 1
   Name = Tom
   Validated = true
 
Person
   Id = 2
   Name = ***
   Validated = false
 
Person
   Id = 3
   Name = Harry
   Validated = true
 
ADC_WRITE_DEF -> SQL
   UPDATE Person SET validated={Person.validated}, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN ({Person.id})
 
This produces 3 SQL statements:
   UPDATE Person SET validated=true, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (1)
   UPDATE Person SET validated=false, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (2)
   UPDATE Person SET validated=true, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (3)
 
Because Person.id = 2 has a Validated = false, which is different than Person.id = 1 and Person.id = 3, these SQL statements can’t be combined.
   UPDATE Person SET validated=true, Person.updated_by='John Wick', Person.updated_at=now() where Person.in IN (1, 2, 3) is incorrect.
 
For me, it is easier to understand that there is a discrete SQL statement for each Entity.  This reduces complexity and hopefully it reduces confusion in creating a valid ADC_WRITE_DEF -> SQL statement.
 
I hope this helps.
 
-John
 
 

Posted by John Miller on 19-Jun-2019 21:16

In your example, all of your CmsEfd Entities have the same detailStatusId (1), tempUpdateBy (‘John Wick’), and tempUpdatedAt ('2019-06-19 13:55:59'). 
 
What would you expect the SQL to look like if these values were different in each instance of the CmsEfd Entity?  There is no way to combine all 3 SQL statements into 1.
 

Posted by Anverion on 19-Jun-2019 21:16

John, it does help, thanks. I was just worried about the I/O in the database. If I'm processing 500k details, then, as things are now, I would expect to see 500k update statements. I was just worried about how MySQL would handle this many transactions in such a short amount of time.

I'm going to mark this as answered and give you the credit.

Thanks,

Branden

Posted by Anverion on 19-Jun-2019 21:19

Yes, you are correct. If they are different values, then I would expect different update statements.  I'm assuming that corticon has no way of grouping like values together in order to optimize the SQL output to the database. Now that I think about it, it makes a little more sense on why it's behaving this way.

This thread is closed