The Art of
ASSEMBLY LANGUAGE PROGRAMMING

Chapter Thirteen (Part 7)

Table of Content

Chapter Thirteen (Part 9) 

CHAPTER THIRTEEN:
MS-DOS, PC-BIOS AND FILE I/O (Part 8)
13.3.10 - Blocked File I/O
13.3.11 - The Program Segment Prefix (PSP)

13.3.10 Blocked File I/O

The examples in the previous section suffer from a major drawback, they are extremely slow. The performance problems with the code above are entirely due to DOS. Making a DOS call is not, shall we say, the fastest operation in the world. Calling DOS every time we want to read or write a single character from/to a file will bring the system to its knees. As it turns out, it doesn't take (practically) any more time to have DOS read or write two characters than it does to read or write one character. Since the amount of time we (usually) spend processing the data is negligible compared to the amount of time DOS takes to return or write the data, reading two characters at a time will essentially double the speed of the program. If reading two characters doubles the processing speed, how about reading four characters? Sure enough, it almost quadruples the processing speed. Likewise processing ten characters at a time almost increases the processing speed by an order of magnitude. Alas, this progression doesn't continue forever. There comes a point of diminishing returns- when it takes far too much memory to justify a (very) small improvement in performance (keeping in mind that reading 64K in a single operation requires a 64K memory buffer to hold the data). A good compromise is 256 or 512 bytes. Reading more data doesn't really improve the performance much, yet a 256 or 512 byte buffer is easier to deal with than larger buffers.

Reading data in groups or blocks is called blocked I/O. Blocked I/O is often one to two orders of magnitude faster than single character I/O, so obviously you should use blocked I/O whenever possible.

There is one minor drawback to blocked I/O-- it's a little more complex to program than single character I/O. Consider the example presented in the section on the DOS read command:

Example: This example opens a file and reads it to the EOF

                mov     ah, 3dh         ;Open the file
                mov     al, 0           ;Open for reading
                lea     dx, Filename    ;Presume DS points at filename
                int     21h             ; segment
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

LP:             mov     ah,3fh          ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 1           ;Read one byte
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx          ;EOF reached?
                jne     EOF
                mov     al, Buffer      ;Get character read
                putc                    ;Print it (IOSHELL call)
                jmp     LP              ;Read next byte

EOF:            mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jc      CloseError

There isn't much to this program at all. Now consider the same example rewritten to use blocked I/O:

Example: This example opens a file and reads it to the EOF using blocked I/O

                mov     ah, 3dh         ;Open the file
                mov     al, 0           ;Open for reading
                lea     dx, Filename    ;Presume DS points at filename 
                int     21h             ; segment
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

LP:             mov     ah,3fh          ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 256         ;Read 256 bytes
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx          ;EOF reached?
                jne     EOF
                mov     si, 0           ;Note: CX=256 at this point.
PrtLp:          mov     al, Buffer[si]  ;Get character read
                putc                    ;Print it
                inc     si
                loop    PrtLp
                jmp     LP              ;Read next block

; Note, just because the number of bytes read doesn't equal 256, 
; don't get the idea we're through, there could be up to 255 bytes 
; in the buffer still waiting to be processed.

EOF:            mov     cx, ax
                jcxz    EOF2            ;If CX is zero, we're really done.
                mov     si, 0           ;Process the last block of data read 
Finis:          mov     al, Buffer[si]  ; from the file which contains 
                putc                    ; 1..255 bytes of valid data.
                inc     si
                loop    Finis

EOF2:           mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jc      CloseError

This example demonstrates one major hassle with blocked I/O - when you reach the end of file, you haven't necessarily processed all of the data in the file. If the block size is 256 and there are 255 bytes left in the file, DOS will return an EOF condition (the number of bytes read don't match the request). In this case, we've still got to process the characters that were read. The code above does this in a rather straight-forward manner, using a second loop to finish up when the EOF is reached. You've probably noticed that the two print loops are virtually identical. This program can be reduced in size somewhat using the following code which is only a little more complex:

Example: This example opens a file and reads it to the EOF using blocked I/O

                mov     ah, 3dh         ;Open the file
                mov     al, 0           ;Open for reading
                lea     dx, Filename    ;Presume DS points at filename
                int     21h             ; segment.
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

LP:             mov     ah,3fh          ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 256          ;Read 256 bytes
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                mov     bx, ax          ;Save for later
                mov     cx, ax
                jcxz    EOF
                mov     si, 0           ;Note: CX=256 at this point.
PrtLp:          mov     al, Buffer[si]  ;Get character read
                putc                    ;Print it
                inc     si
                loop    PrtLp
                cmp     bx, 256         ;Reach EOF yet?
                je      LP

EOF:            mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jc      CloseError

Blocked I/O works best on sequential files. That is, those files opened only for reading or writing (no seeking). When dealing with random access files, you should read or write whole records at one time using the DOS read/write commands to process the whole record. This is still considerably faster than manipulating the data one byte at a time.

13.3.11 The Program Segment Prefix (PSP)

When a program is loaded into memory for execution, DOS first builds up a program segment prefix immediately before the program is loaded into memory. This PSP contains lots of information, some of it useful, some of it obsolete. Understanding the layout of the PSP is essential for programmers designing assembly language programs.

The PSP is 256 bytes long and contains the following information:

Offset  Length  Description 
0       2       An INT 20h instruction is stored here
2       2       Program ending address
4       1       Unused, reserved by DOS
5       5       Call to DOS function dispatcher
0Ah     4       Address of program termination code
0Eh     4       Address of break handler routine
12h     4       Address of critical error handler routine
16h     22      Reserved for use by DOS
2Ch     2       Segment address of environment area
2Eh     34      Reserved by DOS
50h     3       INT 21h, RETF instructions
53h     9       Reserved by DOS
5Ch     16      Default FCB #1
6Ch     20      Default FCB #2
80h     1       Length of command line string
81h     127     Command line string

Note: locations 80h..FFh are used for the default DTA.

Most of the information in the PSP is of little use to a modern MS-DOS assembly language program. Buried in the PSP, however, are a couple of gems that are worth knowing about. Just for completeness, however, we'll take a look at all of the fields in the PSP.

The first field in the PSP contains an int 20h instruction. Int 20h is an obsolete mechanism used to terminate program execution. Back in the early days of DOS v1.0, your program would execute a jmp to this location in order to terminate. Nowadays, of course, we have DOS function 4Ch which is much easier (and safer) than jumping to location zero in the PSP. Therefore, this field is obsolete.

Field number two contains a value which points at the last paragraph allocated to your program By subtracting the address of the PSP from this value, you can determine the amount of memory allocated to your program (and quit if there is insufficient memory available).

The third field is the first of many "holes" left in the PSP by Microsoft. Why they're here is anyone's guess.

The fourth field is a call to the DOS function dispatcher. The purpose of this (now obsolete) DOS calling mechanism was to allow some additional compatibility with CP/M-80 programs. For modern DOS programs, there is absolutely no need to worry about this field.

The next three fields are used to store special addresses during the execution of a program. These fields contain the default terminate vector, break vector, and critical error handler vectors. These are the values normally stored in the interrupt vectors for int 22h, int 23h, and int 24h. By storing a copy of the values in the vectors for these interrupts, you can change these vectors so that they point into your own code. When your program terminates, DOS restores those three vectors from these three fields in the PSP. For more details on these interrupt vectors, please consult the DOS technical reference manual.

The eighth field in the PSP record is another reserved field, currently unavailable for use by your programs.

The ninth field is another real gem. It's the address of the environment strings area. This is a two-byte pointer which contains the segment address of the environment storage area. The environment strings always begin with an offset zero within this segment. The environment string area consists of a sequence of zero-terminated strings. It uses the following format:

string1 0 string2 0 string3 0 ... 0 stringn 0 0

That is, the environment area consists of a list of zero terminated strings, the list itself being terminated by a string of length zero (i.e., a zero all by itself, or two zeros in a row, however you want to look at it). Strings are (usually) placed in the environment area via DOS commands like PATH, SET, etc. Generally, a string in the environment area takes the form

 		name = parameters

For example, the "SET IPATH=C:\ASSEMBLY\INCLUDE" command copies the string "IPATH=C:\ASSEMBLY\INCLUDE" into the environment string storage area.

Many languages scan the environment storage area to find default filename paths and other pieces of default information set up by DOS. Your programs can take advantage of this as well.

The next field in the PSP is another block of reserved storage, currently undefined by DOS.

The 11th field in the PSP is another call to the DOS function dispatcher. Why this call exists (when the one at location 5 in the PSP already exists and nobody really uses either mechanism to call DOS) is an interesting question. In general, this field should be ignored by your programs.

The 12th field is another block of unused bytes in the PSP which should be ignored.

The 13th and 14th fields in the PSP are the default FCBs (File Control Blocks). File control blocks are another archaic data structure carried over from CP/M-80. FCBs are used only with the obsolete DOS v1.0 file handling routines, so they are of little interest to us. We'll ignore these FCBs in the PSP.

Locations 80h through the end of the PSP contain a very important piece of information- the command line parameters typed on the DOS command line along with your program's name. If the following is typed on the DOS command line:

		MYPGM parameter1, parameter2

the following is stored into the command line parameter field:

		23, " parameter1, parameter2", 0Dh

Location 80h contains 2310, the length of the parameters following the program name. Locations 81h through 97h contain the characters making up the parameter string. Location 98h contains a carriage return. Notice that the carriage return character is not figured into the length of the command line string.

Processing the command line string is such an important facet of assembly language programming that this process will be discussed in detail in the next section.

Locations 80h..FFh in the PSP also comprise the default DTA. Therefore, if you don't use DOS function 1Ah to change the DTA and you execute a FIND FIRST FILE, the filename information will be stored starting at location 80h in the PSP.

One important detail we've omitted until now is exactly how you access data in the PSP. Although the PSP is loaded into memory immediately before your program, that doesn't necessarily mean that it appears 100h bytes before your code. Your data segments may have been loaded into memory before your code segments, thereby invalidating this method of locating the PSP. The segment address of the PSP is passed to your program in the ds register. To store the PSP address away in your data segment, your programs should begin with the following code:

                push    ds              ;Save PSP value
                mov     ax, seg DSEG    ;Point DS and ES at our data
                mov     ds, ax                  ; segment.
                mov     es, ax
                pop     PSP             ;Store PSP value into "PSP"
                                        ; variable.
                 .
                 .
                 .

Another way to obtain the PSP address, in DOS 5.0 and later, is to make a DOS call. If you load ah with 51h and execute an int 21h instruction, MS-DOS will return the segment address of the current PSP in the bx register.

There are lots of tricky things you can do with the data in the PSP. Peter Norton's Programmer's Guide to the IBM PC lists all kinds of tricks. Such operations won't be discussed here because they're a little beyond the scope of this manual.

Chapter Thirteen (Part 7)

Table of Content

Chapter Thirteen (Part 9) 

Chapter Thirteen: MS-DOS, PC-BIOS and File I/O (Part 8)
28 SEP 1996