The Art of
ASSEMBLY LANGUAGE PROGRAMMING

CHAPTER EIGHT:
MASM: DIRECTIVES & PSEUDO-OPCODES (Part 8)
8.14.7 - Macro Functions
8.14.8 - Predefined Macros, Macro Functions, and Symbols
8.14.9 - Macros vs. Text Equates
8.14.10 - Macros: Good and Bad News

8.14.7 Macro Functions

A macro function is a macro whose sole purpose is to return a value for use in the operand field of some other statement. Although there is an obvious parallel between procedures and functions in a high level language and procedural macros and functional macros, the analogy is far from perfect. Macro functions do not let you emit instructions that compute a value when the program actually executes. Instead, macro functions simply compute some value at assembly time that MASM can use as an operand.

A good example of a macro function is the Date function. This macro function packs a five bit day, four bit month, and seven bit year value into 16 bits and returns that 16 bit value as the result. If you needed to create an initialized array of dates, you could use code like the following:

DateArray       word    Date(2, 4, 84)
                word    Date(1, 1, 94)
                word    Date(7, 20, 60)
                word    Date(7, 19, 69)
                word    Date(6, 18, 74)
                 .
                 .
                 .

The Date function would pack the data and the word directive would emit the 16 bit packed value for each date to the object code file. You invoke macro functions by using their name where MASM expects a text expression of some sort. If the macro function requires any parameters, you must enclose them within parentheses, just like the parameters to Date, above.

Macro functions look exactly like standard macros with two exceptions: they do not contain any statements that generate code and they return a text value via an operand to the exitm directive. Note that you cannot return a numeric value with a macro function. If you need to return a numeric value, you must first convert it to a text value.

The following macro function implements Date using the 16 bit date format given in Chapter One:

Date            macro   month, day, year
                local   Value
Value           =       (month shl 12) or (day shl 7) or year
                exitm   %Value
                endm

The text expansion operator ("%") is necessary in the operand field of the exitm directive because macro functions always return textual data, not numeric data. The expansion operator converts the numeric value to a string of digits acceptable to exitm.
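
The arithmetic is easy to check by hand. The following is a minimal sketch, assuming the Date macro function above and MASM's default decimal radix; the label Birthday is hypothetical and is not part of the original example:

```masm
; Date(2, 4, 84) computes (2 shl 12) or (4 shl 7) or 84
;                       =  8192      or  512      or 84  =  8788 (2254h),
; so the two lines below should emit the same 16 bit value.
Birthday        word    Date(2, 4, 84)
                word    8788
```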

One minor problem with the code above is that this function returns garbage if the date isn't legal. A better design would generate an error if the input date is illegal. You can use the ".err" directive and conditional assembly to do this. The following implementation of Date checks the month, day, and year values to see if they are somewhat reasonable:

Date            macro   month, day, year
                local   Value

                if      (month gt 12) or (month lt 1) or \
                        (day gt 31) or (day lt 1) or \
                        (year gt 99) or (year lt 1)
                .err
                exitm   <0>             ;;Must return something!
                endif

Value           =       (month shl 12) or (day shl 7) or year
                exitm   %Value
                endm

With this version, any attempt to specify a totally outrageous date triggers the assembly of the ".err" directive that forces an error at assembly time.
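
As a brief illustration, assuming the range-checked Date macro function described above (the labels GoodDate and BadDate are hypothetical), the first invocation below should assemble normally while the second should trip the ".err" directive:

```masm
GoodDate        word    Date(12, 31, 99)    ;All three fields are in range.
BadDate         word    Date(13, 31, 99)    ;Month 13 is out of range: forces an error.
```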

8.14.8 Predefined Macros, Macro Functions, and Symbols

MASM provides four built-in macros and four corresponding macro functions. In addition, MASM also provides a large number of predefined symbols you can access during assembly. Although you would rarely use these macros, functions, and variables outside of moderately complex macros, they are essential when you do need them.

MASM Predefined Macros

substr  string, start, length
    Returns: text data
    Example: NewStr substr OldStr, 1, 3
    Description: Returns a string consisting of length characters from the string operand, beginning at position start. The length operand is optional. If it is not present, MASM returns all characters from position start through the end of the string.

instr   start, string, substr
    Returns: numeric data
    Example: Pos instr 2, OldStr, <ax>
    Description: Searches for "substr" within "string" starting at position "start." The starting value is optional. If it is missing, MASM begins searching for the string from position one. If MASM cannot find the substring within the string operand, it returns the value zero.

sizestr string
    Returns: numeric data
    Example: StrSize sizestr OldStr
    Description: Returns the size of the string in the operand field.

catstr  string, string, ...
    Returns: text data
    Example: NewStr catstr OldStr, <$$>
    Description: Creates a new string by concatenating each of the strings appearing in the operand field of the catstr macro.

The substr and catstr macros return text data. In some respects, they are similar to the textequ directive since you use them to assign textual data to a symbol at assembly time. The instr and sizestr macros are similar to the "=" directive insofar as they return a numeric value.
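
The four predefined macros can be tried on a small string. This is only a sketch; the symbol names and the string LoopCounter are hypothetical, and the values in the comments follow from the descriptions above (positions start at one):

```masm
OldStr          textequ <LoopCounter>       ;An 11 character text equate.

Prefix          substr  OldStr, 1, 4        ;Prefix  = Loop
Pos             instr   1, OldStr, <Count>  ;Pos     = 5
StrSize         sizestr OldStr              ;StrSize = 11
NewStr          catstr  OldStr, <$$>        ;NewStr  = LoopCounter$$
```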

The catstr macro can eliminate the need for the MakeLbl macro found in the ForLp macro. Compare the following version of ForLp to the previous version (see "A Sample Macro to Implement For Loops").

ForLp           macro   LCV, Start, Stop
                local   ForLoop

                ifndef  $$For&LCV&
$$For&LCV&      =       0
                else
$$For&LCV&      =       $$For&LCV& + 1
                endif

                mov     ax, Start
                mov     LCV, ax

; Due to a bug in MASM, this won't actually work. The idea is sound, though;
; read on for the correct solution.

ForLoop         textequ @catstr($For&LCV&, %$$For&LCV&)
&ForLoop&:
                mov     ax, LCV
                cmp     ax, Stop
                jgDone  $$Next&LCV&, %$$For&LCV&
                endm

MASM also provides macro function forms for catstr, instr, sizestr, and substr. To differentiate these macro functions from the corresponding predefined macros, MASM uses the names @catstr, @instr, @sizestr, and @substr. The following equivalences hold between these operations:

Symbol          catstr  String1, String2, ...
Symbol          textequ @catstr(String1, String2, ...)

Symbol          substr  SomeStr, 1, 5
Symbol          textequ @substr(SomeStr, 1, 5)

Symbol          instr   1, SomeStr, SearchStr
Symbol          =       @instr(1, SomeStr, SearchStr)

Symbol          sizestr SomeStr
Symbol          =       @sizestr(SomeStr)

MASM Predefined Macro Functions

@substr(string, start, length)
    Returns: text data
    Example: ifidn @substr(parm, 1, 4), <[bx]>

@instr(start, string, substr)
    Returns: numeric data
    Example: if @instr(1, parm, <bx>)

@sizestr(string)
    Returns: numeric data
    Example: byte @sizestr(SomeStr)

@catstr(string, string, ...)
    Returns: text data
    Example: jg @catstr($$Next&LCV&, %$$For&LCV&)

The last example above shows how to get rid of the jgDone and jmpLoop macros in the ForLp macro. A final, improved version of the ForLp and Next macros, one that eliminates the three support macros and works around the bug in MASM, might look something like the following:

ForLp           macro   LCV, Start, Stop
                local   ForLoop

                ifndef  $$For&LCV&
$$For&LCV&      =       0
                else
$$For&LCV&      =       $$For&LCV& + 1
                endif

                mov     ax, Start
                mov     LCV, ax

ForLoop         textequ @catstr($For&LCV&, %$$For&LCV&)
&ForLoop&:
                mov     ax, LCV
                cmp     ax, Stop
                jg      @catstr($$Next&LCV&, %$$For&LCV&)
                endm

Next            macro   LCV
                local   NextLbl
                inc     LCV
                jmp     @catstr($$For&LCV&, %$$For&LCV&)
NextLbl         textequ @catstr($Next&LCV&, %$$For&LCV&)
&NextLbl&:
                endm

MASM also provides a large number of built in variables that return information about the current assembly. The following table describes these built in assembly time variables.

MASM Predefined Assembly Time Variables

Date & Time Information

@Date
    Returns the date of assembly.
    Result: text value

@Time
    Returns a string denoting the time of assembly.
    Result: text value

Environment Information

@CPU
    Returns a 16 bit value whose bits determine the active processor directive. Specifying the .8086, .186, .286, .386, .486, and .586 directives enables additional instructions in MASM and sets the corresponding bits in the @CPU variable. Note that MASM sets the bits for all the processors whose instructions it accepts at any given time. For example, if you use the .386 directive, MASM sets bits zero, one, two, and three in the @CPU variable.
    Result:
        Bit 0 - 8086 instrs permissible.
        Bit 1 - 80186 instrs permissible.
        Bit 2 - 80286 instrs permissible.
        Bit 3 - 80386 instrs permissible.
        Bit 4 - 80486 instrs permissible.
        Bit 5 - Pentium instrs permissible.
        Bit 6 - Reserved for 80686 (?).
        Bit 7 - Protected mode instrs okay.
        Bit 8 - 8087 instrs permissible.
        Bit 10 - 80287 instrs permissible.
        Bit 11 - 80387 instrs permissible (bit 11 is also set for the 80486 and Pentium instr sets).

@Environ
    @Environ(name) returns the text associated with the DOS environment variable name. The parameter must be a text value that evaluates to a valid DOS environment variable name.
    Result: text value

@Interface
    Returns a numeric value denoting the current language type in use. Note that this information is similar to that provided by the opattr operator. The H.O. bit (bit 7) determines if you are assembling code for MS-DOS/Windows or OS/2. This variable is mainly useful for those using MASM's simplified segment directives. Since this text does not deal with the simplified directives, further discussion of this variable is unwarranted.
    Result:
        Bits 0-2:
            000 - No language type
            001 - C
            010 - SYSCALL
            011 - STDCALL
            100 - Pascal
            101 - FORTRAN
            110 - BASIC
        Bit 7:
            0 - MS-DOS or Windows
            1 - OS/2

@Version
    Returns a numeric value that is the current MASM version number multiplied by 100. For example, MASM 6.11's @Version variable returns 611.
    Result: numeric value

File Information

@FileCur
    Returns the current source or include file name, including any necessary pathname information.
    Result: text value

@FileName
    Returns the current source file name (base name only, no path information). If in an include file, this variable returns the name of the source file that included the current file.
    Result: text value

@Line
    Returns the current line number in the source file.
    Result: numeric value

Segment Information

@code
    Returns the name of the current code segment.
    Result: text value

@data
    Returns the name of the current data segment.
    Result: text value

@FarData?
    Returns the name of the current far data segment.
    Result: text value

@WordSize
    Returns two if this is a 16 bit segment, four if this is a 32 bit segment.
    Result: numeric value

@CodeSize
    Returns zero for Tiny, Small, Compact, and Flat models. Returns one for Medium, Large, and Huge models.
    Result: numeric value

@DataSize
    Returns zero for Tiny, Small, Medium, and Flat memory models. Returns one for Compact and Large models. Returns two for Huge model programs.
    Result: numeric value

@Model
    Returns one for Tiny model, two for Small model, three for Compact model, four for Medium model, five for Large model, six for Huge model, and seven for Flat model.
    Result: numeric value

@CurSeg
    Returns the name of the current segment.
    Result: text value

@stack
    Returns the name of the current stack segment.
    Result: text value

Although there is insufficient space to go into detail about the possible uses for each of these variables, a few examples might demonstrate some of the possibilities. Other uses of these variables will appear throughout the text; however, the most impressive uses will be the ones you discover.

The @CPU variable is quite useful if you want to assemble different code sequences in your program for different processors. The section on conditional assembly in this chapter described how you could create a symbol to determine if you are assembling the code for an 80386 or later processor or a stock 8086 processor. The @CPU variable tells you exactly which instructions are allowable at any given point in your program. The following is a rework of that example using the @CPU variable:

                if      @CPU and 100b   ;Need an 80286 or later processor
                shl     ax, 4           ; for this instruction.
                else                    ;Must be 8086 processor.
                mov     cl, 4
                shl     ax, cl
                endif
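
The same technique applies to the other bits of @CPU. As a further sketch (not taken from the original example), the following tests bit three to see if 80386 instructions are allowable and, if so, uses movzx to zero extend bl into ax; otherwise it falls back on two 8086 instructions that produce the same result:

```masm
                if      @CPU and 1000b  ;Bit 3 set: 80386 or later.
                movzx   ax, bl          ;Zero extend BL into AX.
                else                    ;Pre-80386 processor.
                mov     al, bl
                mov     ah, 0
                endif
```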

You can use the @Line variable to put special diagnostic messages in your code. If the following code detects an error at run-time, it prints an error message that includes the source file line number of the offending assertion:

                mov     ax, ErrorFlag
                cmp     ax, 0
                je      NoError
                mov     ax, @Line       ;Load AX with current line #
                call    PrintError      ;Go print error message and Line #
                jmp     Quit            ;Terminate program.
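
The @Version variable lends itself to a similar assembly time check. The following is a minimal sketch, assuming your source file relies on macro functions and therefore needs MASM 6.0 or later; it forces an error under an older assembler:

```masm
                if      @Version lt 600 ;Macro functions need MASM 6.0 or later.
                .err
                endif
```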

8.14.9 Macros vs. Text Equates

Macros, macro functions, and text equates all substitute text in a program. While there is some overlap between them, they really do serve different purposes in an assembly language program.

Text equates perform a single text substitution on a line. They do not allow any parameters. However, you can replace text anywhere on a line with a text equate. You can expand a text equate in the label, mnemonic, operand, or even the comment field. Furthermore, you can replace multiple fields, even an entire line with a single symbol.

Macro functions are legal in the operand field only. However, you can pass parameters to macro functions making them considerably more general than simple text equates.

Procedural macros let you emit sequences of statements (with text equates you can emit, at most, one statement).
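
The following fragment sketches all three forms side by side. Every name in it (LoopCnt, KBytes, Swap, Buffer, Var1, and Var2) is hypothetical and chosen only for illustration:

```masm
LoopCnt         textequ <word ptr [bp-2]>   ;Text equate: plain substitution.

KBytes          macro   n                   ;Macro function: operand field only.
                local   Value
Value           =       (n) * 1024
                exitm   %Value
                endm

Swap            macro   a, b                ;Procedural macro: several statements.
                mov     ax, a               ;Note: destroys AX.
                xchg    ax, b
                mov     a, ax
                endm

Buffer          byte    KBytes(2) dup (?)   ;Reserves 2048 bytes.
                 .
                 .
                 .
                mov     LoopCnt, 0          ;Expands to: mov word ptr [bp-2], 0
                Swap    Var1, Var2          ;Expands to three instructions.
```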

8.14.10 Macros: Good and Bad News

Macros offer considerable convenience. They let you insert several instructions into your source file by simply typing a single command. This can save you an incredible amount of typing when entering huge tables, each line of which contains some bizarre, but repeated calculation. Macros are also useful (in certain cases) for making your programs more readable. Few would argue that ForLp I,1,10 is not more readable than the corresponding 80x86 code. Unfortunately, it's easy to get carried away and produce code that is inefficient, hard to read, and hard to maintain.

A lot of so-called "advanced" assembly language programmers get carried away with the idea that they can create their own instructions via macro definitions and they start creating macros for every imaginable function under the sun. The COPY macro presented earlier is a good example. The 80x86 doesn't support a memory to memory move operation. Fine, we'll create a macro that does the job for us. Soon, the assembly language program doesn't look like 80x86 assembly language at all. Instead, a large number of the statements are macro invocations. Now this may be great for the programmer who has created all these macros and intimately understands their operation. To the 80x86 programmer who isn't familiar with those macros, however, it's all gibberish. Maintaining a program someone else wrote, that contains "new" instructions implemented via macros, is a horrible task. Therefore, you should rarely use macros as a device to create new instructions on the 80x86.

Another problem with macros is that they tend to hide side effects. Consider the COPY macro presented earlier. If you encountered a statement of the form COPY VAR1,VAR2 in an assembly language program, you'd think that this was an innocuous statement that copies VAR2 to VAR1. Wrong! It also destroys the current contents of the ax register leaving a copy of the value in VAR2 in the ax register. This macro invocation doesn't make this very clear. Consider the following code sequence:

                mov     ax, 5
                copy    Var2, Var1
                mov     Var1, ax

This code sequence copies Var1 into Var2 and then (supposedly) stores five into Var1. Unfortunately, the COPY macro has overwritten the five in ax with the original value of Var1, so the final mov simply stores that value back into Var1. This instruction sequence does not modify Var1 at all!
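
One way to remove the surprise is to have the macro preserve ax. The following is only a sketch, assuming COPY consists of the usual mov ax, Source / mov Dest, ax pair; the name SafeCopy is hypothetical:

```masm
SafeCopy        macro   Dest, Source
                push    ax              ;Preserve the caller's AX.
                mov     ax, Source
                mov     Dest, ax
                pop     ax              ;Restore AX, removing the side effect.
                endm
```

The price is two extra instructions on every invocation, and that cost is just as invisible at the point of the macro call.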

Another problem with macros is efficiency. Consider the following invocations of the COPY macro:

                copy    Var3, Var1
                copy    Var2, Var1
                copy    Var0, Var1

These three statements generate the code:

                mov     ax, Var1
                mov     Var3, ax
                mov     ax, Var1
                mov     Var2, ax
                mov     ax, Var1
                mov     Var0, ax

Clearly, the last two mov ax,Var1 instructions are superfluous. The ax register already contains a copy of Var1; there is no need to reload ax with this value. Unfortunately, this inefficiency, while perfectly obvious in the expanded code, isn't obvious at all in the macro invocations.
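
For comparison, a programmer writing the instructions by hand would probably load ax only once. A sketch using the same variables:

```masm
                mov     ax, Var1        ;Load the source value once.
                mov     Var3, ax
                mov     Var2, ax
                mov     Var0, ax
```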

Another problem with macros is complexity. In order to generate efficient code, you can create extremely complex macros using conditional assembly (especially ifb, ifidn, etc.), repeat loops (described a little later), and other directives. Unfortunately, these macros are small programs all on their own. You can have bugs in your macros just as you can have bugs in your assembly language program. And the more complex your macros become, the more likely they'll contain bugs that will, of course, become bugs in your program when invoking the macro.

Overusing macros, especially complex ones, produces hard to read code that is hard to maintain. Despite the enthusiastic claims of those who love macros, the unbridled use of macros within a program generally causes more bugs than it helps to prevent. If you're going to use macros, go easy on them.

There is a good side to macros, however. If you standardize on a set of macros and document all your programs as using these macros, they may help make your programs more readable, especially if those macros have easily identifiable names. The UCR Standard Library for 80x86 Assembly Language Programmers uses macros for most library calls.

Chapter Eight: MASM: Directives & Pseudo-Opcodes (Part 8)
26 SEP 1996