Can you use multi-tenancy for archiving purposes?

Posted by rbf on 28-Feb-2012 03:38

We have a complex database with, say, 500,000 employee records. Currently only about 25,000 of those are actually working for us but of course we have to keep all the data for our ex-employees on file.

However, each time we are searching the database we have have to fetch the last corresponding contract of each employee and check the start- and end dates in order to determine if they are 'active'. These queries are very expensive, especially since 800 concurrent users are executing them all the time. But since the queries are time-sensitive we cannot simply solve this with an index.

I know we could add a redundant logical field 'Active' in the employee record that is updated each night in a batch program and with each contract update but in real life the application is much more complex than described above and I am trying to weigh this option against the following idea.

As mentioned above, only 5% of the employees is really active and over time as the database grows that percentage will drop even further. If we could separate out that 5% while not loosing access to the 95% when needed performance would increase by a factor 20 without changing much else.

What we would really like to do is archive the 95% of inactive employees in such a way that they are still accessible on-line to certain users and available for re-activation when an employee returns. It would be great if we could store them in different tenant areas so the 'active' area is lean and mean while the 'inactive' area is slow but still available, basically for re-activation. There really are no queries on the inactive employees, when they are needed for reactivation they are identified by key and should be moved to the active tenant area.

Is this a feasable idea or is multi-tenancy not suitable for this?

All Replies

Posted by jmls on 28-Feb-2012 06:06

We did something similar in the beta phase, and it seemed to work well.

However, there didn't seem to be a change-tenant command, so buffer-copied

the record into a temp-table (tt records are not part of the tenant

functionality) , set-effective-tenant to the "archive" tenant and then

created the new record from the tt.

hopefully I didn't read the documentation properly , or some extra stuff

has been added to make this easier.

Julian

On Feb 28, 2012 9:39 AM, "Peter van Dam"

Posted by Admin on 28-Feb-2012 06:16

Just to round things up (not relevant for Peter in this situation at he's having 800 users).

Multi-tenancy is only available as an extra feature on top of the enterprise DB. So no option for users that may need to deploy (on other sites or remote workers) to workgroup DB or personal DB.

This lets me hesitate a bit, to look at MT as the wonder tool for every purpose.

Posted by jmls on 28-Feb-2012 06:46

Surely everyone has enterprise?

On Feb 28, 2012 12:17 PM, "Mike Fechner"

Posted by Admin on 28-Feb-2012 06:51

Surely everyone has enterprise?

I love your sense of humor.

Posted by Tim Kuehn on 28-Feb-2012 07:35

jmls wrote:

Surely everyone has enterprise?

That's like saying "surely everyone has deep pockets"

Posted by jmls on 28-Feb-2012 09:21

And you surely missed the

Posted by Tim Kuehn on 28-Feb-2012 10:01

why do you think I added my own ":)"?

Posted by Thomas Mercer-Hursh on 28-Feb-2012 11:09

How many table are involved in what you want to archive?   Could this be as simple as having one table for active employees and one for inactive?

Posted by rbf on 28-Feb-2012 17:14

How many table are involved in what you want to archive?

About a dozen. Some of them have hundreds of records.

Could this be as simple as having one table for active employees and one for inactive?

Sounds a bit oversimplified.

Posted by rbf on 28-Feb-2012 17:17

Julian wrote:

We did something similar in the beta phase, and it seemed to work well.

However, there didn't seem to be a change-tenant command, so buffer-copied

the record into a temp-table (tt records are not part of the tenant

functionality) , set-effective-tenant to the "archive" tenant and then

created the new record from the tt.

hopefully I didn't read the documentation properly , or some extra stuff

has been added to make this easier.

Ouch... I guess in a scenario like this, where the employee history spans maybe a dozen tables and thousands of records, it does not sound attractive to have to move data between tenants in this way.

For example, I would not want any time stamp to be affected (apart from the complexity of the above operation).

Can anybody shed more light on this?

Posted by Thomas Mercer-Hursh on 28-Feb-2012 17:31

Well, it certainly is a strategy used in a number of contexts.  E.g., OpenOrders vs ClosedOrders.   Lets one keep an indefinite amount of history and yet also keep the active set to a minimum.  Do all tables need to be separated or might some be left where they are?

Posted by rbf on 28-Feb-2012 17:41

tamhas wrote:

Well, it certainly is a strategy used in a number of contexts.  E.g., OpenOrders vs ClosedOrders.   Lets one keep an indefinite amount of history and yet also keep the active set to a minimum.

Like I mentioned in the OP there is no such thing as an 'active employee'. If I could create a flag for that I would not need to ponder about separating the data out into tenants.

Instead 'activity' is a property that changes over time and must be determined by querying several related tables by date ranges. That is my original problem.

So I would move employees to the archive if there has been no activity for, say, a year or six months.

There is your flag, you might say.

Do all tables need to be separated or might some be left where they are?

That is an approach that could be considered. It could be sufficient to separate out only the employee table and not all subtables, although it would certainly benefit performance on all secondary processes as well if the size of the active database (tenant) could be reduced by 95%.

Posted by Thomas Mercer-Hursh on 28-Feb-2012 18:00

My point is that either you leave everyone in the same table or you don't.  If you have some basis for moving them to another tenant, they you have a basis for moving them to another table.  Likewise, if you have a basis for separating them at all, then you have a basis for setting a flag that could be used to limit the number of records read.  If there is no basis for flag/tenant/table, then you are stuck with them in the same table.

Posted by Tim Kuehn on 01-Mar-2012 13:14

tamhas wrote:

My point is that either you leave everyone in the same table or you don't.  If you have some basis for moving them to another tenant, they you have a basis for moving them to another table.

Actually, this isn't the case. Had you read up on how MT works, you'd know that each data tenant parition is structurally identical across all partitions, but that each tenant partition's data is completely isolated from all the other tenant parititons. As such, Peter's company could setup an archiving system which copies "enough data" from the operational tenant partition to the archive tenant partition so that the archive tenant partition looks just like the active tenant partition, except that all it's data is archival, not currently operational.

Doing this would be a lot easier and more functional than moving the data to "another table" and then having to write code to maintain and access that alternate table or set of tables.

Posted by Thomas Mercer-Hursh on 01-Mar-2012 13:34

My point was that Peter was saying that "active" was a squishy concept because it ultimately required real time evaluation of the start and stop dates, but, if there is some basis for separating active from inactive ... which one would have to do if one were going to move something to another tenant ... then the same criteria could be used for moving to a separate table.  If there is no such basis, then one can't do either.

As to which might be programmatically easier to access once the division had been made, that is something that would require more examination.  If there is just one or two functions where someone needs to search the archived data, then adapting those to reference an additional table would not be a lot of work.  If it is a whole lot of functions and there were some people accessing the whole thing and most people not, then it might be easier to handle it by having two tenants with some people entitled to see the whole thing.   If the same person needs to look either at just the active or everyone then using multi-tenancy may not be so easy since it would require qualifying all queries for the active set with a restrictive tenant even though the user was otherwise authorized to access everything.

Posted by rbf on 01-Mar-2012 14:31

If it is a whole lot of functions and there were some people accessing the whole thing and most people not, then it might be easier to handle it by having two tenants with some people entitled to see the whole thing.

That is what I would like to do but someone told me this cannot be achieved with multi-tenancy. You cannot access multiple tenants at the same time because they reside in different places and each have their own indexes.

Or can you?

Posted by Thomas Mercer-Hursh on 01-Mar-2012 14:38

It is possible if a person is logged in as a super-tenant.  Then they can see everything.  But, they can see everything everywhere unless the query is qualified by a specific tenant.  I.e., it would take some work for them to look only at active records.

Posted by Tim Kuehn on 01-Mar-2012 16:07

rbf wrote:

That is what I would like to do but someone told me this cannot be achieved with multi-tenancy. You cannot access multiple tenants at the same time because they reside in different places and each have their own indexes.

Or can you?

It is possible to create an MT tenant group, assign a table to that group, and then add user tenants to that group.

There's also the shared data partition, which is visible by all tenants.

While it is possible for a super-tenant to access all data tenant records, I think trying to adapt a long-standing application to do that would be highly problematic.

Posted by Tim Kuehn on 01-Mar-2012 16:07

delete duplicate post

Posted by stefan.lang@havilog.com on 04-Mar-2012 14:01

It looks like if a normal archive application pattern should solve your problem.

1. Create a runing copy of your Application. The archive. Make it read-only.

2. Create an un-load (export to archive and delete from prod) inactive employees data from production to archive function and run it regulary.

3. Create an re-load (import from arch and optionally delete in archive) inactive employees back from archive to production function and call it on demand.

The inactive data can be access in the archive with all normal reports of your application.

Both systems must be managed to have the same version.

If you have write functions to inactive data today or if you do need reports about both data sets, you maybe need to change some designs.

The same is true if you can not make the archive read-only.

You can have some variations about if and how to delete data with the unload and reload functions.

You can do the same with multi-tenant functionality, if you are willing to pay and to add domains to your security.

But in this case the size of the production database will never decrease. In normal archive patterns it is desired to keep production as

small as possible.

Posted by rbf on 06-Mar-2012 15:25

Hi Stefan, thank you for your reply.

The scenario you are describing is certainly feasable, I was just hoping that multi-tenancy could make life a little easier.

I am beginning to think it would not.

Thanks everybody for your replies.

-peter

Posted by jmls on 06-Mar-2012 16:15

And so, for the record (and any poor soul searching for a solution to the same problem) , what was your findings, and what are you going to do moving forward ?

Thanks !

Posted by rbf on 06-Mar-2012 16:47

jmls wrote:

And so, for the record (and any poor soul searching for a solution to the same problem) , what was your findings, and what are you going to do moving forward ?

Thanks !

For now it means this is not low hanging fruit for which multi-tenancy is the holy grail.

Therefore, most likely, this project cannot be pushed higher up the priority list.

Posted by jmls on 06-Mar-2012 17:01

oh well. c'est la vie

This thread is closed