Link to home
Start Free TrialLog in
Avatar of Theo Kouwenhoven
Theo KouwenhovenFlag for Netherlands

asked on

What will happen with recordset when selected records are changed

What will happen in the following situation, I can't force this nor test this.

In a process a recordset is selected with SQL with a selection e.g. WHERE CODE = 1
(The result can contain a few hundred records.)
each record will be processed (within 15-30 seconds) and flagged as "CODE = 2"

Now a second job (same program) is started.
This program will also select a recordset with SQL with a selection e.g. WHERE CODE = 1.

What will happen when:
Job 1 select a records with code=1
(e.g result 250 records)

Job 2 select a records with code=1
(e.g result 240 records, 10 already processd by job 1)

Job 1 change value of a record to CODE = 2.
Is this record still in the recordset from Job 2?
Avatar of Éric Moreau
Éric Moreau
Flag of Canada image

a lot of information for us to correctly answer.

the recordset is a memory copy of the result of your query, it won't change even if your source is updated.

you could add "Where code = 1" in your update statement so that the record does not get updated a second time (but it will get processed a second time)
Avatar of Theo Kouwenhoven

ASKER

Hi Éric,

I will give you the detail information what is happening:
A poller job runs every 30 minutes.
This job is selecting a group of records with a readystatus (e.g. result 200).
followed by a loop like : For every record in Set:
Within this loop it will process each record and status is set to complete
This takes lets say 15 seconds.
So in 30 minutes the job can process round 120 records.

Then a new job is started by the poller and will select the last 80 records.
So they will be processed by job-1 and job-2,

If job-1 is setting a record to complete.
Is this record still available in the selection made by job-2 ?

If not, it will be enough to flag the status to "In progress" before processing it.
Otherwise, I have to change the program so that it will select the records one-by-one instead of a recordset.
if it is marked as complete only in memory (physical database still not updated), the job 2 will read exactly the same data. the 2 processes are not talking to each other.

you have a few options:
-take the TOP X records (that you are sure fits into your 30 minutes frame)
-use an "in progress" status as you said but you will need a way to reset that flag in case the job fails
-add a time to your table and only retrieve records inserted in the last 30 minutes
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

To further clarify:  Until the DML is committed, only the processes that performed the DML can see the modified records.  Most database products will not allow what is called a "dirty read".  That is:  Reading uncommitted changes.

You might also look into using a "select for update" in the jobs.  That will lock the selected rows until they have been processed.

In this case:  Job1 will select the rows for update and start processing them.  Job2 will try to query the rows but the job1 lock will make it wait.  When job1 commits, the lock is released and job2 will query the modified rows.
Hi slightwv,

I'm not sure when job1 commits, there is no commitment control on the file, I'm pretty sure that the status update is executed directly.

So the question is: Are the records in the selected set of job 2?

BTW, The database is DB2/400 approached with jdbc
I cannot answer your question. I can say that if they aren't committed in job1, then job2 will likely see the same records.

I'm not sure why you cannot test this yourself.  I googled around and there appears to be some methods to introduce a "sleep" in DB2.  Add a sleep to job1 to slow it down which will allow you time to query the rows from another session and see what you get.

If you can afford the time to lock the table, I would look into the "select for update" to prevent race conditions.
DB2 does allow dirty reads.  See this documentation page.  Not sure that is what you want as that can cause other issues, but it does allow it.  It all depends on what isolation level you have set.  DB2 is not my area of expertise, but it is certainly allowed.  CURSOR STABILITY is the default isolation level, which is what would be most common.  Allowing dirty reads and doing updates can lead to inconsistent results.
I too saw that when I googled it but didn't want to post it because it can get you into trouble pretty quick unless you 100% understand what you are doing.

For example:  Job1 sets a lot code=2 rows, job2 reads those and takes action then job1 for some reason rolls them back.  Job2 had "bad" information when it read those rows.
I guess I should have used the highlighting.  I meant to be that strong against it, but it didn't come through.  Definitely need to be very aware of the possible pitfalls of allowing dirty reads.  I thought the documentation link did explain the problems, but who reads those.
Oke clear (more or less :-)
So it's preferable to set the status to e.g. "Processing"
Then the question is how to do that safely,

There is a "Select from final table" on an insert command, but not on an update command.
What is the alternative way to get this done?
You would likely need a pre-processing step that "selects for update" to change the rows to "processing" to ensure the rows are locked to other processes.  That process would need to commit that change then continue on its way doing whatever else it does.
You would likely need something more than just PROCESSING.  If two of the processes can be running at once, there is no way to distinguish between the 2 sets of records with the same tag in the status field.  If your processing is keeping track of the records it updates in an array or something like that then the single status is OK, but if you are going to select the records back from the database based on the PROCESSING tag, that isn't enough.
@johnsone, I understand that :-)

@slightwv yes that is the plan, but how to set this status safely?
between select and update there is a little timegap (miliseconds), but still a gap.
I think that this page has a good example of SELECT ... FOR UPDATE and UPDATE ... WHERE CURRENT OF and how to make that flow.
Hi Theo,

DB2 controls concurrent access to data with a locking mechanism that is controlled by something IBM calls "isolation levels".  For your application, you want RS -- Read Stability.  It ensures that when a process reads a row for updating, another process cannot read the same row until the first process commits the update.

Here's IBM's description of the isolation levels:

 https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.perf.doc/doc/c0004121.html

Good Luck!
Kent
Hi Kent,

I tried some locking, but it's not clear.
While using the STRSQL, I tried an select with RS lock, followed by an Update of the same record.
but the record stay unavailable for other processes.
I had to quit SQL to release the lock.
Why isn't is released after the update action?
Locks are released when the transaction is finalized.  After the update, if you issue a commit or rollback?  That will end the transaction and release the locks.
There is no commitment controll, so no commit os dond.
ASKER CERTIFIED SOLUTION
Avatar of Kent Olsen
Kent Olsen
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi Kent,

I'm testing some options.