Wednesday, February 17, 2010

9. Sorting and merging files

Sorting
Records in files frequently must be sorted into specific sequences for updating, answering inquiries, or generating reports. Sorting is a common procedure used for arranging records into a specific order so that sequential processing can be performed.
COBOL has a SORT verb, so records can be sorted from within a COBOL program instead of by a separate sort utility.
The programmer must specify whether the key field is to be an ASCENDING KEY or a DESCENDING KEY, depending on which sequence is required :
ASCENDING : From lowest to highest
DESCENDING : From highest to lowest
The SORT verb may be used to sequence records on more than one key field. For example, Eg 9.1 sorts an employee file so that it is in alphabetic sequence by name within each department. The keys are listed from major to minor, so S-EMP-DEPT is coded before S-EMP-NAME.

Eg 9.1:
SORT SORT-FILE
    ON ASCENDING KEY S-EMP-DEPT
    ON ASCENDING KEY S-EMP-NAME
    USING EMPLOYEE-FILE
    GIVING SORT-EMPLOYEE-FILE.

There are three major files used in a sort :
Input file : File of unsorted input records.
Work or sort file : File used to store records temporarily during the sorting process.
Output file : File of sorted output records.

All these files would be defined in the ENVIRONMENT DIVISION using standard ASSIGN clauses, which are system dependent. The SORT-FILE is actually assigned to a temporary work area that is used during processing but not saved. Only the unsorted disk file and the sorted output disk file are assigned standard file-names so that they can be permanently stored.
FDs are used in the DATA DIVISION to define and describe the input and output files in the usual way. The sort or work file is described with an SD (sort file description) entry. The only difference between SD and FD entries is that an SD must not have a LABEL RECORDS clause. Note, too, that the field(s) specified as the KEY field(s) for sorting purposes must be defined as part of the sort record format.

Eg 9.2:
ENVIRONMENT DIVISION.
    :
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT SORT-FILE
        ASSIGN TO DISK.
    :
DATA DIVISION.
FILE SECTION.
SD  SORT-FILE.
01  SORT-REC.
    05  S-EMP-NAME.
        10  S-EMP-FIRST-NAME    PIC X(10).
        10  S-EMP-LAST-NAME     PIC X(15).
    05  S-EMP-DEPT              PIC X(4).
    05  FILLER                  PIC 9(13).

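With the USING and GIVING options, the SORT statement simply sorts the input file and places the sorted records in the output file. The SORT statement can, however, also be used in conjunction with procedures that process records just before they are sorted and/or after they are sorted.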

INPUT PROCEDURE
The INPUT PROCEDURE processes data from the incoming file prior to sorting. An INPUT PROCEDURE may be used to perform the following operations prior to sorting : (1) validate data in the input records, (2) eliminate records with blank fields, (3) count input records.

With COBOL 74, the procedure-name of an INPUT PROCEDURE must be a section-name and not a paragraph-name. A section is a series of PROCEDURE DIVISION paragraphs that is treated as a single entity or unit. Rules for forming section-names are the same as rules for forming paragraph-names. The word SECTION, however, follows a section-name (e.g., A000-ERROR SECTION). The end of a section is recognized when another section-name is encountered, or when the end of the program is reached.

Code for an INPUT PROCEDURE
Eg 9.3:
    SORT SORT-FILE
        ON ASCENDING KEY S-EMP-DEPT
        ON ASCENDING KEY S-EMP-NAME
        INPUT PROCEDURE A000-TEST-IT
        GIVING SORT-EMPLOYEE-FILE.
    STOP RUN.
A000-TEST-IT SECTION.
A100-PARA-1.
    OPEN INPUT IN-FILE.
    READ IN-FILE
        AT END MOVE "NO" TO ARE-THERE-MORE-RECORDS.
    PERFORM A200-TEST-RTN
        UNTIL THERE-ARE-NO-MORE-RECORDS.
    CLOSE IN-FILE.
    GO TO A300-TEST-IT-EXIT.
A200-TEST-RTN.
    IF QTY = ZEROS
        NEXT SENTENCE
    ELSE
        MOVE IN-REC TO SORT-REC
        RELEASE SORT-REC.
    READ IN-FILE
        AT END MOVE "NO" TO ARE-THERE-MORE-RECORDS.
A300-TEST-IT-EXIT.
    EXIT.

Explanation
The first section in the PROCEDURE DIVISION contains the SORT instruction, any processing to be performed before or after the SORT verb is executed, and a STOP RUN.
The second section begins with the main module of the INPUT PROCEDURE. It opens the input file, reads the first record, and then performs a process routine (in a separate paragraph within this second section) until there is no more data.
After the separate paragraph has been executed until ARE-THERE-MORE-RECORDS = "NO" (the condition-name THERE-ARE-NO-MORE-RECORDS is assumed to be defined for that value), control returns to the main module of the second section, where the input file is closed. For the INPUT PROCEDURE to be terminated, control must pass to the last statement within the section. This means that a GO TO is required, so we code GO TO A300-TEST-IT-EXIT as the last sentence of the main module. Since no operations are required in this last paragraph, EXIT is coded, which passes control back to the SORT statement, where the file is then sorted.

OUTPUT PROCEDURE
The OUTPUT PROCEDURE is used to process the sorted records prior to, or perhaps even instead of, placing them in the output file. The OUTPUT PROCEDURE can be used instead of the GIVING option. The OUTPUT PROCEDURE is similar to the INPUT PROCEDURE. When the INPUT PROCEDURE is complete, the file is then sorted. An OUTPUT PROCEDURE processes all sorted records in the sort file and handles the transfer of these records to the output file.
In an INPUT PROCEDURE we RELEASE records to a sort file rather than writing them. In an OUTPUT PROCEDURE we RETURN records from the sort file rather than reading them.
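
An OUTPUT PROCEDURE can be coded along the same lines as the INPUT PROCEDURE in Eg 9.3. The program below is only a minimal sketch under stated assumptions: it reuses the record layout of Eg 9.2 (with the FILLER treated as alphanumeric), is laid out in fixed-format source columns, and assumes a compiler that accepts literal file-names in the ASSIGN clause, which is system dependent. The program name, the file-names EMPLOYEE.DAT, SORTWORK and SORTED.DAT, the A000 and B000 section and paragraph names, and the rule of skipping records with a blank department are all hypothetical and chosen purely for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTOUT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * File-names are hypothetical; ASSIGN is system dependent.
           SELECT EMPLOYEE-FILE      ASSIGN TO "EMPLOYEE.DAT".
           SELECT SORT-FILE          ASSIGN TO "SORTWORK".
           SELECT SORT-EMPLOYEE-FILE ASSIGN TO "SORTED.DAT".
       DATA DIVISION.
       FILE SECTION.
       FD  EMPLOYEE-FILE
           LABEL RECORDS ARE STANDARD.
       01  EMPLOYEE-REC             PIC X(42).
      * The SD entry carries no LABEL RECORDS clause.
       SD  SORT-FILE.
       01  SORT-REC.
           05  S-EMP-NAME.
               10  S-EMP-FIRST-NAME PIC X(10).
               10  S-EMP-LAST-NAME  PIC X(15).
           05  S-EMP-DEPT           PIC X(4).
           05  FILLER               PIC X(13).
       FD  SORT-EMPLOYEE-FILE
           LABEL RECORDS ARE STANDARD.
       01  SORTED-REC               PIC X(42).
       WORKING-STORAGE SECTION.
       01  ARE-THERE-MORE-RECORDS   PIC X(3) VALUE "YES".
           88  THERE-ARE-NO-MORE-RECORDS    VALUE "NO ".
       PROCEDURE DIVISION.
       A000-MAIN SECTION.
       A100-SORT-PARA.
      * USING lets SORT read the input file itself; the sorted
      * records are then handed to the OUTPUT PROCEDURE.
           SORT SORT-FILE
               ON ASCENDING KEY S-EMP-DEPT
               ON ASCENDING KEY S-EMP-NAME
               USING EMPLOYEE-FILE
               OUTPUT PROCEDURE B000-WRITE-IT.
           STOP RUN.
       B000-WRITE-IT SECTION.
       B100-PARA-1.
           OPEN OUTPUT SORT-EMPLOYEE-FILE.
      * RETURN takes the place of READ for the sort file.
           RETURN SORT-FILE
               AT END MOVE "NO" TO ARE-THERE-MORE-RECORDS.
           PERFORM B200-WRITE-RTN
               UNTIL THERE-ARE-NO-MORE-RECORDS.
           CLOSE SORT-EMPLOYEE-FILE.
           GO TO B300-WRITE-IT-EXIT.
       B200-WRITE-RTN.
      * Hypothetical rule: skip records with a blank department.
           IF S-EMP-DEPT = SPACES
               NEXT SENTENCE
           ELSE
               WRITE SORTED-REC FROM SORT-REC.
           RETURN SORT-FILE
               AT END MOVE "NO" TO ARE-THERE-MORE-RECORDS.
       B300-WRITE-IT-EXIT.
           EXIT.
```

Because the OUTPUT PROCEDURE opens, writes, and closes SORT-EMPLOYEE-FILE itself, no GIVING clause is coded. The sort file is never opened or closed by the program; the SORT statement handles that.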

Merging
The MERGE statement combines two or more files into a single file. Its format is similar to that of the SORT. The file named after the word MERGE is a work file designated as an SD. At least two file-names must be included in the USING clause, but more than two are permitted, and each of these input files must already be in sequence on the merge key. Unlike the SORT, however, an INPUT PROCEDURE may not be specified with the MERGE statement. That is, using the MERGE statement, you may only process records after they have been merged, not before. The OUTPUT PROCEDURE has the same format as with the SORT.

Eg 9.4:
MERGE MERGE-FILE
    ON ASCENDING KEY M-EMP-DEPT
    USING OLD-PAYROLL
          NEW-PAYROLL
    GIVING EMPLOYEE-FILE.
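
To show where the merge work file fits, the MERGE statement of Eg 9.4 might sit in a complete program roughly as follows. This is a hedged sketch rather than a definitive version: the program name, the literal file-names, the 42-character record length, and every field other than M-EMP-DEPT are assumptions made for illustration, and the ASSIGN clauses are system dependent. It also assumes that OLD-PAYROLL and NEW-PAYROLL are each already in ascending sequence by department.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MERGEPAY.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * File-names are hypothetical; ASSIGN is system dependent.
           SELECT OLD-PAYROLL   ASSIGN TO "OLDPAY.DAT".
           SELECT NEW-PAYROLL   ASSIGN TO "NEWPAY.DAT".
           SELECT MERGE-FILE    ASSIGN TO "MERGEWRK".
           SELECT EMPLOYEE-FILE ASSIGN TO "EMPLOYEE.DAT".
       DATA DIVISION.
       FILE SECTION.
       FD  OLD-PAYROLL
           LABEL RECORDS ARE STANDARD.
       01  OLD-REC                  PIC X(42).
       FD  NEW-PAYROLL
           LABEL RECORDS ARE STANDARD.
       01  NEW-REC                  PIC X(42).
      * The merge work file is an SD, as with SORT, and the
      * key field must be defined in its record.
       SD  MERGE-FILE.
       01  MERGE-REC.
           05  M-EMP-NAME           PIC X(25).
           05  M-EMP-DEPT           PIC X(4).
           05  FILLER               PIC X(13).
       FD  EMPLOYEE-FILE
           LABEL RECORDS ARE STANDARD.
       01  EMPLOYEE-REC             PIC X(42).
       PROCEDURE DIVISION.
       A100-MAIN.
      * Both input files must already be in M-EMP-DEPT sequence.
           MERGE MERGE-FILE
               ON ASCENDING KEY M-EMP-DEPT
               USING OLD-PAYROLL
                     NEW-PAYROLL
               GIVING EMPLOYEE-FILE.
           STOP RUN.
```

None of the four files is opened or closed by the program here; with USING and GIVING, the MERGE statement handles that itself.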
