AS/400 File Reorganization

We have an IBM System i running i5/OS V5R4. We would like to reorganize the 100 largest files each month. The problem I am having is getting the file information (file name, library, file size) into a file that I can sort. What is the easiest way to do that?

Run the following command:
===> dsplib lib(*all) output(*print)

Once it completes, open iSeries Navigator, select your System i --> Basic Operations --> Printer Output.
The spooled file name should be QPDSPLIB.  Drag and drop it to your local hard drive.  You now have a text document you can work with.  Import it into Excel or another spreadsheet application and you should be able to sort it.

It may be easier to run the dsplib command on each library in question individually.  The output would be much easier to work with and take far less time to generate than with lib(*all).

hope that helps....

Gary Patterson, VP Technology / Senior Consultant, commented:

Use the DSPFD command with TYPE(*ATR) FILEATR(*PF) OUTPUT(*OUTFILE) to dump a list of physical files to an output file.  If your files are in multiple libraries, use generic (wildcard) names, or execute the command multiple times with OUTMBR(*ADD) so records are added to the output member rather than replacing it.
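For example (MYLIB and DATALIB/PFLIST here are just placeholder names; substitute your own libraries and outfile):

 ==>  DSPFD FILE(MYLIB/*ALL) TYPE(*ATR) FILEATR(*PF) OUTPUT(*OUTFILE) OUTFILE(DATALIB/PFLIST) OUTMBR(*FIRST *ADD)

Repeat the command for each library of interest, keeping OUTMBR(*FIRST *ADD) so the records accumulate in a single member.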

In addition to the file name, you'll get other useful information, including the record length, the number of records, and the number of deleted records.  You can name the output file anything you want and use any tool you like (Query/400, DFU, SQL, etc.) to view its contents and field names, or you can download the file using Client Access and view it in Excel or the format of your choice.

In the past, I've developed automated reorg processes for clients that work basically like this:

1) Dump a list of candidate files to an output file using DSPFD
2) Select files needing a reorg using SQL, RUNQRY, FMTDTA, OPNQRYF, etc., sorting by the files that contained the most slack space first (number of deleted records * bytes per record)
3) Process each selected file:
3a) Estimate the time to reorg and rebuild indexes.  If the remaining reorg time window is not long enough, stop here.  Otherwise:
3b) Use the RMVLFM command to remove logical file members.
3c) RGZPFM the file
3d) ADDLFM to rebuild access paths
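A minimal CL sketch of steps 3b-3d, with MYLIB/MYPF and MYLF standing in for an actual physical file and one of its dependent logicals:

 ==>  RMVLFM FILE(MYLIB/MYLF) MBR(MYLF)
 ==>  RGZPFM FILE(MYLIB/MYPF)
 ==>  ADDLFM FILE(MYLIB/MYLF) MBR(MYLF)

A real program would loop over all dependent logical files (DSPDBR can list them) and monitor each command for error messages before continuing.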

If you are reorganizing principally to recover deleted space, you should consider changing your files to reuse deleted records (the REUSEDLT(*YES) parameter of the CHGPF command).

On growing files, you'll never have to reorg again.  Generally, if it is safe to reorg a file, it is safe to Reuse Deleted Records.  Be aware that enabling Reuse Deleted Records can slow down file write activity, since the system has to search out deleted records throughout the file and write to that particular page of the file, rather than just accumulating a block of records and tacking it onto the end of the file in a single operation.
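To set the attribute (again with a placeholder file name):

 ==>  CHGPF FILE(MYLIB/MYPF) REUSEDLT(*YES)

One final RGZPFM afterward recovers the space already lost to deleted records; from then on, new writes reuse the deleted slots instead of extending the file.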

- Gary Patterson

Gary Patterson, VP Technology / Senior Consultant, commented:
By the way, there is no reason that you can't use the DSPLIB method hdgh outlines above.  The DSPFD approach does have several advantages, however:

DSPLIB does not offer an OUTPUT(*OUTFILE) option, so you have to parse a spooled file, dealing with page and column headings.
DSPLIB includes all objects in a given library, not just physical files.  DSPFD FILEATR(*PF) produces a list of only physical files.
DSPFD gives you deleted records and record length, so you can calculate your space savings.
DSPFD gives you a native database file that you can use, for example, to drive an automated file reorg program or print a report for manual reorgs.

- Gary Patterson


> We would like to do a file reorganization on the 100 largest files each month.


It's hard to imagine a system where this would make sense. Is this purely a learning exercise or do you seriously intend to implement something like this?

I did a project a number of years ago for a regional (Washington, Oregon, Idaho, and possibly sections of other states) wholesale food distributor. Their file structure was different from most others I've seen. E.g., there were essentially no "orders" entered from customers; but rather the "line items" were generated by various algorithms that predicted how much of a given commodity food item would need to be loaded onto trucks headed to various cities/towns, based on prior activities. (Simplified, but close enough. It took a while grasp order line items that had no "orders".)

Their files grew and shrank by enormous amounts over a day's processing and reorgs were part of doing business.

But even slightly reasonably designed databases (in DB2 under OS/400 or later) rarely benefit from such regular reorgs... _especially_ if you're really thinking of the "100 largest files".

OTOH, I can imagine regular _examinations_ of the largest files to see if a reorg is applicable.

But even so, what is the reason for considering the largest, rather than looking for ones that would return some benefit? Maybe it's your _smallest_ files that need a reorg? What has led you to think that "largest" makes a difference?

I really hope that that doesn't come across as negatively critical. I only hope to make certain that the point of reorgs is well understood.

Certainly, one benefit of a reorg _can_ be the recovery of space that was previously used for records that have since been deleted. In those cases, for large files that have high volatility, a reorg can be worthwhile. Ideally, though, it probably should be done once and infrequently after that, rather than monthly. But at that first time, the file attributes should be set so that deleted record space doesn't accumulate in the future.

Also, under some circumstances, you might have volatility that's combined with a lot of sequential-by-key accesses through a specific index. You'd want to reorg those few files according to the order of the particular index that controls the sequential access. But I wouldn't expect the "100 largest files" to all fit in that category. Often, it's either volatile _or_ sequential-by-key; and just as often, one of those doesn't apply.

So, do you have a specific business problem to solve? Maybe there are much better methods. Are you looking to learn how to approach a problem such as you've laid out in order to understand how it would be handled? It's certainly an easy request, and example programs could be supplied. Or has the question been completely misunderstood?

Can you elaborate?

Gary Patterson, VP Technology / Senior Consultant, commented:
As usual, Tom brings up a good point (or two).

Reorganizing the 100 largest files (unless you've already done the analysis and determined that the 100 largest also happen to have the largest amount of deleted space) isn't necessarily the best use of reorg time.

One good way to determine the optimum files to reorg is to use the mechanism that I outlined above.  Estimate the number of bytes to be recovered and reorg the files with the highest potential recovery:

Number of deleted records * record length = Estimated space recovered.
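If the member attributes were dumped with DSPFD TYPE(*MBR) into an outfile, an SQL statement along these lines would rank the candidates.  The field names (MBLIB, MBFILE, MBNDTR, MBMXRL) are from my recollection of the QAFDMBR model outfile, and DATALIB/LARGEMBRS is a placeholder; verify the names with DSPFFD against your own outfile before relying on this:

  SELECT MBLIB, MBFILE, MBNDTR * MBMXRL AS SLACK
    FROM DATALIB/LARGEMBRS
    ORDER BY SLACK DESC
    FETCH FIRST 100 ROWS ONLY

The SLACK column is the estimated recoverable bytes per file; reorg from the top of the list down until your time window runs out.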

This assumes, of course, that deleted space recovery is your main goal, as opposed to physical reordering of records into key sequence.  That requires a different analysis process, including analysis of programs accessing these files.  You want to identify those files that have the highest incidence of sequential-by-key access and where a strong preference for a single index exists.

- Gary
jcurtis1027 (Author) commented:
Thanks for all your input...  Tom, we are a wholesale food distributor (but on the East Coast).  We are looking at reorganizing the 100 largest files as part of our month-end processing.  We figure that, in time, all of these files will eventually be reorganized.  This is more for disk space than performance.

Wholesale food distributor? Wow. Serendipity!

In that case, I can accept the possibility. Whether it's appropriate or not is still open.

Recovery of deleted-record space -- you _might_ be better off setting the files to re-use deleted-record space:

 ==>  chgpf  mylib/mylargefile  REUSEDLT( *YES )

This wouldn't "recover" the space, but it would help control future allocations of space. The data spaces wouldn't be extended unless additional space was needed at run-time after the deleted-record areas had been filled with active records.

The consideration would be whether or not you _need_ to ensure that sufficient space is allocated and available when your month-end runs. You might "recover" the space today, but month-end might crash when it tries to grow the files during month-end run-time. It's always possible that there simply won't be enough available space.

When space is released at the start of the month, it becomes available for anything else to use. How will you guarantee that some rogue query won't chew up just enough space in the 2nd week of the month that you'll exceed your limit in the middle of month-end?

That is, your 100 largest files _probably_ ought to hold on to a lot of space.

You can certainly run reorgs, but I would strongly recommend that the files get explicit size allocations also. The allocated sizes should be large enough to accommodate most normal month-ends with minimal additional allocations.

So, first thing would be to log the sizes of your large files after month-end completes. How many records fit into the sizes that are allocated, including deleted records? Whatever those sizes are, they are close to what ought to be allocated _before_ month-end even begins.

Note that space allocation takes time. You probably don't want your month-end spending more time linking new disk sectors into the file spaces than the time used for actual processing.

Consider running a job right after month-end every month. For each large file:

 ==>  dspfd  mylib/mylargefile type( *MBR ) output( *OUTFILE ) fileatr( *PF ) outfile( datalib/largembrs ) outmbr( *FIRST *ADD )

After all of the large files have their member attributes captured, you can collect the records into a history file to track growth or trends or whatever. You will have a picture of how big each file's _allocation_ ought to be. Once you know that, reorgs could be done to give you lots of room during the month.

But maybe a day before next month-end, run another job:

 ==>  chgpf  mylib/mylargefile  size( &x &y &z ) allocate( *YES )

The &x, &y and &z values would come from however you choose to assign sizes. Perhaps you'll have programming to read the member history and to generate size() values based on your history.

That job would allocate the spaces ahead of time. You'd not only know that space is actually available; you'd also have much of the processing done for allocations without impacting month-end run-times.

Note that the REUSEDLT(*YES) attribute probably still is useful.

Also note that the ALLOCATE(*YES) attribute is actually applied to _members_ rather than files. It doesn't actually take effect until the members are "cleared, restored or reorganized". That means that a reorg would be appropriate right after each CHGPF command. It might also mean that similar jobs would be run after month-end -- CHGPF to reduce allocated attributes and RGZPFM to enforce reductions. (CLRPFM would work as well.)

In any case, consider creating a job description that has a library list that includes only those libraries that hold your "100 largest files". You can then submit a job using that job description for this command:

 ==>  dspfd  *USRLIBL/*ALL  type( *MBR ) output( *OUTFILE ) fileatr( *PF ) outfile( datalib/largembrs ) outmbr( *FIRST *ADD )

That job would then produce a member list of the files in the job description library list. You could use SQL DELETE to remove all rows that show smaller than some size limit that you choose. Remaining rows would be your "n largest files". (If you don't have SQL available, there are still plenty of possibilities.)
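For example (the size threshold is illustrative, and the field names are my recollection of the QAFDMBR model outfile -- check them with DSPFFD first):

  DELETE FROM DATALIB/LARGEMBRS
    WHERE MBNRCD * MBMXRL < 500000000

Records-times-record-length is only an approximation of the member's data space size, but it is good enough to trim the list down to the large files you care about.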

From that file, you can generate one, two or more other files to drive other jobs -- reorgs, allocations, whatever.

Overall, the whole thing ought to be possible to set up in a day's worth of time (scattered hours over a few days). Once done, it should more or less run itself.

Am I making sense? Plenty of details can be described or lots of different schemes can be proposed. (Or we can just do straight reorgs if you really want to.)
