Restore taking too long - AS400

Hi,

I am having issues restoring a library or objects using RSTOBJ or RSTLIB commands. The restore normally takes 13-17 minutes, however, its taking 8-12 hours.  The save includes ACCPTH(*YES) so it shouldn't take too long to restore. Ironically, whenever, I save the same library using the SAVLIB command, it takes 13-20 minutes. But to restore is a nightmare.

Please help me with this. I have checked sysval (System Values) for access path and other sysval for restoring objects like verify objects with digital signatures, converting objects etc. All those sysval based on my research seem to be correct.

I have a feeling it has to do with journaling. However, I am not sure. The message I got from the log were:
1. New format X was created for the file because sharing of fomat X with file X in Library Y could not be established again. It could be dependencies (logical files).
2. We encountered an error message CPF3281
3. File X in library Y using format X with a correct format level identifier does not exist.  

Please in whatever way you can HELP!

Regards,
Prince
Prince BootheSystems Administrator Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Gary PattersonVP Technology / Senior Consultant Commented:
Can you please post a copy of the entire restore job log with second level text?  Ideally one of the fast ones and the slow one.  That should make it easy to zero in on the cause.

Is the system in a restricted state during these restores, or are there other workloads running at the time of the slow restore?  Were any other restores running simultaneously, or any other job that might have caused an extended period of disk utilization?

Do any of the logicals being restored depend on physical files in other libraries (libraries that were not being restored)?
0
Prince BootheSystems Administrator Author Commented:
Hi Gary,

Thank you for responding. Its greatly appreciated. as per your questions:

1. Is the system in a restricted state during these restores? NO
2.  are there other workloads running at the time of the slow restore? NO
3. Were any other restores running simultaneously? NO
4. any other job that might have caused an extended period of disk utilization? NO
5. Do any of the logicals being restored depend on physical files in other libraries (libraries that were not being restored)? YES. Normally, the physical files would restore first based on my understanding of the process (RSTLIB/RSTOBJ), but I am a novice to this thing. I am still learning so I might be wrong. But this issue occurred immediately after we completed a history purge by splitting a history file on November 11, 2017.

I was able to locate a log for November 16, 2017 when we tried to restore objects to a TEMPLIB as part of testing and I have a picture of the error received for the first restore attempt since the purge.

Please see attached files. The other logs were automatically deleted/removed based on a clean up agent we have on the system. It cleans every week.

I hope these can give you an hint into the issue. Thanks again.

Regards,
Prince.
ObjectRestore_Nov1617.txt
Restore-Error.JPG
0
Gary PattersonVP Technology / Senior Consultant Commented:
Looking at the job log you provided:

Restore job started at 11/16/17  15:37:45.750849
Got up to restoring FST500/CF9435 at (CF9435) at 11/16/17  15:38:11.565129 , less than 1 minute later.

Then nothing on the job log until the job was ended by user CAP500OPR at 11/16/17  17:31:56.027115  - almost 2 hours later.

I suggest you look at the backup tape (DSPTAP SAVRST(*YES)) and see what the next object in sequence is.  What type of object?  How big is it?  Does it have relationships to other objects?

Also, if you decide to do another test restore, use OUTPUT(*OUTFILE) to capture a detailed database table of each object as it is restored.  May help provide more information on long restore time causes.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Introduction to R

R is considered the predominant language for data scientist and statisticians. Learn how to use R for your own data science projects.

Prince BootheSystems Administrator Author Commented:
Hi Gary,

Great response.

I actually did that subsequent to this convo and we were able to identity the next object in the sequence. Ironically, we have a sandbox environment which is a modeled replica of the main environment on a different LPAR (Partition). Whenever, a restore is done using the same parameters in the modeled environment, its successful and ends in 17 minutes, while on our production/main system, its taking longer up to 8 hours.

I have checked and compared the restore parameters, system values (SYSVAL), SMAPP (System-managed access-path protection) parameters and they are all the same, yet the restore is taking long on the main system since the purge but takes normal time on the sandbox. We did an IPL to rebuild the database index incase there might be damaged objects or lost access paths and that didn't work, because it seem as though during the restore identifiers, keyes, access paths were rebuilding within the database. but...  

I maybe or rather I am missing something and I don't know what it is. As I said prior to this, we can backup (SAVLIB) files and it runs in the normal time and is successful. However, to restore is the problem.

Are there any parameters or software settings on the AS400 relating to the restore that might have changed? Are there settings or sysval that relate to the RSTLIB or RSTOBJ that can cause the restore to run longer than normal. Maybe?

Please advise.
Regards,
Prince
0
Gary PattersonVP Technology / Senior Consultant Commented:
It isn't unusual for a restore operation to take longer than a save operation.  

Lots of variables influence the actual speed difference.  The condition you describe is pretty extreme, however.

For most devices (HDD and SDD) raw disk read speeds are slightly faster than raw disk write speeds.  But if you are using a RAID 0 disk array, for example, write performance is cut in half compared to read performance, so it wouldn't be unreasonable to expect a restore operation with all other factors being even, to take twice as long as the original save.

If there is other activity in the system at the time of the restore, this can have an impact on restore performance.  If the restore job has to compete with other jobs to get locks on objects being restored, that is going to add time.  If some of the database files being restored have to have access paths rebuilt, that will cause restore time to increase.

Yes, there are system values and restore parameters that can increase restore time.  Unfortunately, there are a lot of variables - which makes this sort of problem difficult to diagnose by posting back and forth on a forum like this.  My basic recommendation is to run the restore jobs on both partitions, and capture job logs and post them.  Also capture the OUTPUT() information previously discussed.  A simple comparison of timestamps so that we can identify any particularly "slow" objects would go a long way to narrow down the remaining troubleshooting.

It also wouldn't be a bad idea to run a Job Watcher or Disk Watcher collection, or at least a Performance Collection while running the restore jobs.

Are you under and IBM support contract?  If so, you may want to open a support ticket.  IBM may be able to review performance data, job logs, systems settings, etc and provide you with a more accurate response.

What OS version are you running on each partition?
0
Prince BootheSystems Administrator Author Commented:
Good Day Gary,

Thank you for taking the time to respond. I will act upon the recommendations you've given me and let you know the outcome as well as the comparative logs from the OUTPUT() captured during the restores.

Are you under and IBM support contract? YES. We have opened a support ticket with IBM and are awaiting a response on when they would start the checks.

Regards,
Prince.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
IBM System i

From novice to tech pro — start learning today.