Hello--
I have noticed that when a batch runs and I'm doing anything OTHER than read from the database, the IN indicator doesn't seem to work. now, let me explain.
1. Created a basic ruleflow with a single service connector that updates a detail's status.
2. Created a BATCH_READ query that pulls all details based on an earnings file id.
EXAMPLE
CORTICON_ADC_BATCH_READ
1 | DetailsByFileId | select detail_record_id from cms.cms_efd WHERE earnings_file_id = {CmsEfd.earningsFileId} | CmsEfd | true |
This works and returns all of the distinct detail_record_ids based on the earnings_file_id.
Next, the service connector is called and the WRITE_DEF is the following:
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ({CmsEfd.detailRecordId})
I would expect the following:
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P1','613P2','613P3','613P4','613P5','613P6')
Instead, I get six UPDATE statements like the following:
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P1')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P2')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P3')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P4')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P5')
UPDATE cms.cms_efd SET cms_efd.detail_status_id=1, cms_efd.updated_by='John Wick', cms_efd.updated_at=now() where cms_efd.detail_record_id IN ('613P6')
What's the point of using an IN indicator if it's going to add an update statement for each detail that's processed? Is this how batch works? For a few details, this isn't a big deal, however, we are processing thousands of details. I've seen as many as 500k in our production environment so, 500k individual update statements isn't going to work.
Thanks for your time,
Branden
Hi Branden,
Your batch read query fetches records one by one and seeds it to the decision service. You'd better use an IN statement there too: select detail_record_id from cms.cms_efd WHERE earnings_file_id IN ({CmsEfd.earningsFileId}).
That way, the batch will process a chunked number of records and will allow to write back to the DB with all its records in memory in one single pass (defined in your batch properties).
Hope this helps.
Regards,
Harold
[View:/cfs-file/__key/communityserver-discussions-components-files/14/issue_5F00_with_5F00_batch_5F00_write.docx:320:240]
Thank you for your response, Harold. I made the change you suggested, however, since only a single file id will ever be submitted via the json request payload, this shouldn't have an impact. I have attached a word document with my whole process including the output. The issue is still present.
Thanks,
Branden
Thanks Branden for the very helpful clarifications. I see you're not doing any rules processing in your flow so Corticon may ignore updating your records in your table as it will see there is no difference with what is seeded by the batch processor to the flow (just the IDs), and the write step in your flow. Anyhow, I'm going on a 2 weeks break, so I'll hand over your request to one of my colleagues. Hope he'll get things sorted with you. Regards Harold
Thanks Harold. I guess I'm just making sure that I'm not doing anything wrong with the SQL or how the UPDATE/INSERTS/DELETES work. READ using the IN clause perfectly, however, the UPDATE and DELETES do not follow the same convention by using a single statement and expanding the IN clause with the list of details. If this is how batch works, then great, I can try to make tweaks elsewhere. I'm just surprised if this is the case.
Enjoy your break!
-Branden
Hi Branden,
You are doing everything correctly.
ADC.write is designed to do a discrete SQL statement for each Entity that is in the Decision Service’s working memory. The idea is that you want to update/insert/delete a discrete database record with the values from each discrete Entity instance…maintain this one-to-one mapping. This allows us to keep database data in sync with the values in the in-memory Entities.
In your example, you are updating values in the database with hard coded values instead of what is inside each Entity. Even though you are allowed to update values with hard-coded values in this way, this is the exception and not the rule.
I hope this makes sense.
Regards,
John
Thank you John. Yes, I had to create something simple instead of walking you through a 100+ rulesheet project :) So would it make a difference if all the values were coming from the entity? meaning, nothing hard-coded? Would I then, expect to see an update statement with multiple detail ids in in the IN clause?
Thank you for your prompt reply to the original. I'm a little under the gun right now to deploy my project but the I/O in the database is concerning.
Thanks,
Branden
Hi John, just to see what would happen, I created a single rulesheet before my service connector that sets CmsEfd transient values for my UPDATE SQL. New UPDATE SQL:
UPDATE cms.cms_efd
SET cms_efd.detail_status_id={CmsEfd.detailStatusId},
cms_efd.updated_by={CmsEfd.tempUpdatedBy},
cms_efd.updated_at={CmsEfd.tempUpdatedAt}
where cms_efd.detail_record_id IN ({CmsEfd.detailRecordId})
The same behavior was the result:
SQL 1: select detail_record_id from cms.cms_efd WHERE earnings_file_id IN ('370')
SQL 2: UPDATE cms.cms_efd
SET cms_efd.detail_status_id=1,
cms_efd.updated_by='John Wick',
cms_efd.updated_at='2019-06-19 13:55:59'
where cms_efd.detail_record_id IN ('370P1')
SQL 3: UPDATE cms.cms_efd
SET cms_efd.detail_status_id=1,
cms_efd.updated_by='John Wick',
cms_efd.updated_at='2019-06-19 13:55:59'
where cms_efd.detail_record_id IN ('370P2')
SQL 4: UPDATE cms.cms_efd
SET cms_efd.detail_status_id=1,
cms_efd.updated_by='John Wick',
cms_efd.updated_at='2019-06-19 13:55:59'
where cms_efd.detail_record_id IN ('370P3')
SQL 5:
UPDATE cms.cms_efd
SET cms_efd.detail_status_id=1,
cms_efd.updated_by='John Wick',
cms_efd.updated_at='2019-06-19 13:55:59'
where cms_efd.detail_record_id IN ('370P4')
I'm guessing that this is just the behavior for a WRITE statement using batch. Like I said before, I would prefer to see 2 SQL statements:
SQL 1: select detail_record_id from cms.cms_efd WHERE earnings_file_id IN ('370')
SQL 2: UPDATE cms.cms_efd
SET cms_efd.detail_status_id=1,
cms_efd.updated_by='John Wick',
cms_efd.updated_at='2019-06-19 13:55:59'
where cms_efd.detail_record_id IN ('370P1','370P2','370P3','370P4')
But maybe that'll be something they enhance down the road.
Thanks for your time and effort on this.
-Branden
John, it does help, thanks. I was just worried about the I/O in the database. If I'm processing 500k details, then, as things are now, I would expect to see 500k update statements. I was just worried about how MySQL would handle this many transactions in such a short amount of time.
I'm going to mark this as answered and give you the credit.
Thanks,
Branden
Yes, you are correct. If they are different values, then I would expect different update statements. I'm assuming that corticon has no way of grouping like values together in order to optimize the SQL output to the database. Now that I think about it, it makes a little more sense on why it's behaving this way.