FLOATING POINT CALCULATIONS IN ASSEMBLY
-Sanchit Karve
born2c0de@hotmail.com
--------------------------------------------------------------------------------

I. REQUIREMENTS

--> A Simple Working Knowledge of Mathematical Operators in C
--> Working Knowledge of Assembly
--> Any C Compiler. Borland C++ 5.02 is used here
--> Any Disassembler. IDA Pro 4.1 is used here

II. WHAT THIS COVERS or INTRODUCTION

Every code digger analyses commercial programs and tries to find out how a particular job is done, e.g. how the program saves a file. In the earlier stages of computing, code diggers would stay away from floating point arithmetic since not many processors supported it then. But now things are different. Floating point math is now about as powerful and fast as integer arithmetic. Developers have realised that only a select few know how float variables work in Assembly, so wherever possible they use float and double variables to make analysing their code difficult. Hence my objective is to help you understand how floating point arithmetic works in Assembly and to explain the instruction set for float calculations.

This Article is divided into 4 segments, namely:

1) THE BASICS
2) Passing Float Arguments to Functions
3) Floating Point Instructions
4) Returning Float Values via Stack through Functions

So you can jump to any section in case you are interested in just one of them, or just read through the whole thing if you wanna be a floating point master.

III. THE BASICS

Every float operand has to be pushed onto the coprocessor stack, also called the Floating Point Unit (FPU) Stack, before it can be worked on. The mnemonic of every floating point instruction begins with an 'F'. Usually a float operation starts with an FLD INSTRUCTION, which "LOADS A FLOAT NUMBER ON TOP OF THE FPU STACK". The value can then be stored in a variable with the help of the FST and FSTP instructions, which are explained later in this Article. That's the first part.
Another thing we must remember is that both the double and float data types use the same instruction set. So how do we tell whether the number being used is a float or a double? In such a case we have to sink our teeth into the machine code of the program. Depending on the machine code we can determine whether a float or a double is being manipulated, and this will be shown a little later.

IV. PASSING FLOAT ARGUMENTS TO FUNCTIONS

Have a look at this example:

#include <stdio.h>

void func(float f,double d)
{
  printf("f=%f\n",f);
  printf("d=%f\n",d);
}

void main()
{
  float ff=22.22f;
  double dd=11.11;
  func(ff,dd);
}

HERE IS THE DISASSEMBLED LISTING OF THE EXAMPLE

func proc near

var_8 = qword ptr -8
arg_0 = dword ptr 8
arg_4 = byte ptr 0Ch
arg_8 = dword ptr 10h

  push ebp                    ; Original Value of ebp is stored
  mov ebp, esp                ; Stack Frame is Opened
  fld [ebp+arg_0]             ; The First Argument is pushed on top of the FPU Stack.
  add esp, 0FFFFFFF8h         ; 8 bytes are allocated on the stack for the local variable.
  fstp [esp+8+var_8]          ; var_8 is located at (esp - 8). So esp + 8 - 8 = esp.
                              ; Hence the Float Value is stored in var_8 as an 8-byte double
                              ; and popped off the FPU Stack
  push offset aFF             ; "f=%f\n"
  call _printf
  add esp, 0Ch                ; 12 bytes popped off the stack
  push [ebp+arg_8]            ; The high dword of the double is pushed
  push dword ptr [ebp+arg_4]  ; The low dword of the double is pushed
  push offset aDF             ; "d=%f\n"
  call _printf
  add esp, 0Ch                ; 12 bytes popped from stack
  pop ebp                     ; Value of ebp Restored
  retn                        ; Return to main()
func endp

; int __cdecl main(int argc,const char **argv,const char *envp)
; Attributes: bp-based frame

_main proc near               ; DATA XREF: .data:0040A0B8o

var_dbl_1 = dword ptr -0Ch
var_dbl_2 = dword ptr -8
var_float = dword ptr -4
argc = dword ptr 8            ; COMMAND-LINE
argv = dword ptr 0Ch          ; ARGUMENTS
envp = dword ptr 10h          ; ENVIRONMENT VARIABLES

  push ebp                           ; Original Value of ebp saved
  mov ebp, esp                       ; Stack Frame Opened
  add esp, 0FFFFFFF4h                ; 12 bytes allocated on the stack
  mov [ebp+var_float], 41B1C28Fh     ; 22.22 is stored in var_float variable
  mov [ebp+var_dbl_1], 0EB851EB8h    ; The low dword of the double (11.11) is assigned
  mov [ebp+var_dbl_2], 40263851h     ; The high dword of the double is assigned
  push [ebp+var_dbl_2]               ; High dword of the double pushed
  push [ebp+var_dbl_1]               ; Low dword of the double pushed
  push [ebp+var_float]               ; Float Value pushed
  call func                          ; func() called
  add esp, 0Ch                       ; 12 bytes freed from stack
  mov esp, ebp                       ; Stack Frame Closed
  pop ebp                            ; Value of ebp restored
  retn
_main endp

I have included comments on almost every line; in Assembly a comment starts at a semicolon ';' and runs to the End of Line (EOL). However the main explanation is here.

Looking at the listing above we can see that a simple PUSH instruction is used to pass Float and Double types to functions: the float takes one 4-byte PUSH and the double takes two. Depending on the compiler and the calling convention they can also be passed via general purpose registers or via the registers of the FPU Stack. The 80x87 Coprocessor has eight 80-bit registers called ST(0),ST(1),ST(2).....ST(7). When we say that a value is on the top of the FPU Stack it also means that it is located in the ST(0) Register.
The contents of ST(1) to ST(7) are located immediately below the top of the FPU Stack. When a float value needs to be stored or manipulated it is first pushed on the top of the FPU Stack using the FLD instruction. To store a float value in a variable the FST instruction is used. In the listing above the FSTP instruction is used instead, so the value on the top of the FPU Stack is popped after it is assigned to the variable.

V. FLOATING POINT INSTRUCTIONS

Given below is a list of the most frequently encountered float instructions.

--------------------------------------------------------------------------------
Instruction           Purpose
--------------------------------------------------------------------------------
FLD [source]          Pushes a Float Number from the source onto the top of the
                      FPU Stack.
FST [destination]     Copies a Float Number from the top of the FPU Stack into
                      the destination.
FSTP [destination]    Pops a Float Number from the top of the FPU Stack into
                      the destination.
FLDZ                  Pushes +0.0 on top of FPU Stack.
FLD1                  Pushes +1.0 on top of FPU Stack.
FLDPI                 Pushes PI on the top of FPU Stack.
FILD [source]         Pushes an integer from the source to the top of the FPU
                      Stack.
FIST [destination]    Copies the value on the top of the FPU Stack to the
                      destination as an integer.
FISTP [destination]   Pops the value on the top of the FPU Stack into the
                      destination as an integer.
FCHS                  Complements the sign-bit of the float value located on
                      the top of the FPU Stack or ST(0) Register.
FNOP                  Performs no FPU Operation. [It's a 2 byte instruction,
                      unlike NOP which is a 1 byte instruction.]
FABS                  Replaces the float value on the top of the stack with
                      its absolute value.
FADD [operand]        Adds the value of the operand to the value located on
                      the top of the FPU Stack and stores the result on the
                      top of the FPU Stack.
FCOS/FSIN             Replaces the value on the top of the FPU Stack with its
                      cosine/sine value.
FDIV [operand]        Divides the value on the top of the FPU Stack by the
                      operand and stores the result on the top of FPU Stack.
FMUL [operand]        Multiplies the value on the top of the FPU Stack by the
                      operand and stores the result on top of FPU Stack.
FSUB [operand]        Subtracts the operand value from the value on top of FPU
                      Stack and stores the result on top of FPU Stack.
FXCH ST(index)        Exchanges values between top of FPU Stack and the
                      ST(index) register.
FCOM                  Compares the float value located on top of FPU Stack
                      with the operand located in memory or the FPU Stack.
FCOMP                 Same as FCOM but pops the float value from the top of
                      the FPU Stack.
FNSTSW AX             Stores the FPU Status Word in AX. {Used for Conditions.}
--------------------------------------------------------------------------------

There are many more float instructions but these are the prominent ones. If you want to learn about the others you can refer to Volume 2 of Intel's Software Developer's Manual, i.e. the "Instruction Set Reference".

V.I IS FADD == FADD ?

While going through disassembled source code we may encounter instructions such as FADD or FSUB and we may wonder whether it's operating on a double or a float. In such a case we have to look up its machine code. Let's consider this example so you'll understand what I mean.
#include <stdio.h>

void main()
{
  double d=11.11;
  float f=2.2;
  printf("d+f=%f and f+d=%f\n",(d+f),(f+d));
  printf("d-f=%f and f-d=%f\n",(d-f),(f-d));
  printf("d*f=%f and f*d=%f\n",(d*f),(f*d));
  printf("d/f=%f and f/d=%f\n",(d/f),(f/d));
}

Now this example generates the following code:

; int __cdecl main(int argc,const char **argv,const char *envp)
; Attributes: bp-based frame

_main proc near               ; DATA XREF: .data:0040A0B8o

var_14 = qword ptr -14h
var_C = dword ptr -0Ch
var_8 = qword ptr -8
argc = dword ptr 8
argv = dword ptr 0Ch
envp = dword ptr 10h

55      push ebp
8B EC   mov ebp, esp
83 C4+  add esp, 0FFFFFFF4h                      ; 12 bytes Allocated on Stack
C7 45+  mov dword ptr [ebp+var_8], 0EB851EB8h
C7 45+  mov dword ptr [ebp+var_8+4], 40263851h   ; Double Stored in var_8
C7 45+  mov [ebp+var_C], 400CCCCDh               ; Float Stored in var_C
D9 45+  fld [ebp+var_C]                ; Float Loaded into ST(0) Register
DC 45+  fadd [ebp+var_8]               ; Add the double at var_8 to float at ST(0) and store
                                       ; the result in ST(0) Coprocessor Register.
83 C4+  add esp, 0FFFFFFF8h            ; 8 bytes allocated
DD 1C+  fstp [esp]                     ; Value in ST(0) Register is popped into the 8 bytes allocated on
                                       ; the CPU Stack. Since 8 bytes are being used the result is a DOUBLE.
DD 45+  fld [ebp+var_8]                ; The Double value located at var_8 is pushed into ST(0) Register.
D8 45+  fadd [ebp+var_C]               ; The float value is added to double value at ST(0) and the result
                                       ; is stored in ST(0).
83 C4+  add esp, 0FFFFFFF8h            ; 8 more bytes are allocated
DD 1C+  fstp [esp]                     ; The Result of the Addition at ST(0) is stored in the 8 bytes
                                       ; allocated in the CPU stack. Again the result is a double.
68 E8+  push offset aDFFAndFDF         ; The Format "d+f=%f and f+d=%f\n" is pushed on the CPU Stack
E8 CB+  call _printf                   ; The two addition results are displayed on the screen.
83 C4+  add esp, 14h                   ; 0x14 bytes are freed from the stack.
                                       ; two 8-byte doubles + one 4 byte offset = 0x14 bytes
D9 45+  fld [ebp+var_C]                ; The Float value is loaded in ST(0)
DC 65+  fsub [ebp+var_8]               ; Subtracts double value at var_8 from ST(0) and places the result in
                                       ; ST(0).
83 C4+  add esp, 0FFFFFFF8h            ; 8 bytes for the result allocated
DD 1C+  fstp [esp]                     ; The Result of the subtraction is popped on top of the CPU Stack
                                       ; occupying the 8 bytes allocated for it.
DD 45+  fld [ebp+var_8]                ; Load the Double at var_8 into ST(0).
D8 65+  fsub [ebp+var_C]               ; Subtract float value at var_C from ST(0) and place the result in
                                       ; ST(0)
83 C4+  add esp, 0FFFFFFF8h            ; 8 bytes for the result allocated
DD 1C+  fstp [esp]                     ; Result stored on top of CPU Stack occupying the 8 bytes previously
                                       ; allocated by the add instruction.
68 FB+  push offset aDFFAndFDF_0       ; Format "d-f=%f and f-d=%f\n" is pushed
E8 A6+  call _printf                   ; Both Results are Displayed.
83 C4+  add esp, 14h                   ; 0x14 bytes are freed as explained above
D9 45+  fld [ebp+var_C]                ; Float Value located at var_C loaded into ST(0) Register.
DC 4D+  fmul [ebp+var_8]               ; Multiply ST(0) by double value in var_8 and store the result in
                                       ; ST(0).
83 C4+  add esp, 0FFFFFFF8h            ; 8 bytes for the result allocated
DD 1C+  fstp [esp]                     ; The Result of the Multiplication is popped onto the top of the
                                       ; stack occupying the 8 bytes previously allocated by the add instruction
DD 45+  fld [ebp+var_8]                ; Load the Double value into ST(0).
D8 4D+  fmul [ebp+var_C]               ; Multiply ST(0) by float value in var_C and store result in ST(0).
83 C4+  add esp, 0FFFFFFF8h            ; Allocate 8 bytes
DD 1C+  fstp [esp]                     ; Pop the result from ST(0) into the top of the stack occupying 8
                                       ; bytes.
68 0E+  push offset aDFFAndFDF_1       ; The format "d*f=%f and f*d=%f\n" is pushed
E8 81+  call _printf                   ; The Result is displayed on screen.
83 C4+  add esp, 14h                   ; 0x14 bytes are popped off the stack.
D9 45+  fld [ebp+var_C]                ; Load the Float Value into ST(0)
DC 75+  fdiv [ebp+var_8]               ; Divide ST(0) by the double value and store result in ST(0)
83 C4+  add esp, 0FFFFFFF8h            ; Make space for result
DD 1C+  fstp [esp]                     ; Result stored on stack
DD 45+  fld [ebp+var_8]                ; Load Double into ST(0).
D8 75+  fdiv [ebp+var_C]               ; Divide ST(0) by Float Value and store result in ST(0).
83 C4+  add esp, 0FFFFFFF8h            ; Make space for result
DD 1C+  fstp [esp]                     ; Pop result on CPU Stack
68 21+  push offset aDFFAndFDF_2       ; Format "d/f=%f and f/d=%f\n" is pushed
E8 5C+  call _printf                   ; and is displayed on screen
83 C4+  add esp, 14h                   ; 0x14 bytes deallocated off the Stack.
8B E5   mov esp, ebp                   ; Stack Frame Closed
5D      pop ebp                        ; Original Value of ebp Restored
C3      retn
_main endp

To the left of each instruction are the first two bytes of its machine code. Instructions like FADD,FSUB,FDIV,FMUL and FLD have different machine code depending on the data they act on. Each of these instructions has been used twice in every group, once on a float and once on a double, and you can see that the first byte differs from data-type to data-type. Hence I have also made a table that will help us distinguish which form is being used.

You can modify this program to work on float values only, and on disassembling it you will find that the values handed to printf are still doubles. This happens because the FPU holds every value in its 80-bit registers whatever the source type was, and because C promotes a float to a double whenever it is passed to a variadic function such as printf. So if you are using float values for mathematical problems hoping to make the calculations cheaper, I'd suggest using the double data type unless memory is tight, since in code like this the value gets widened in any case. In fact if you use a double throughout, the compiler never needs to widen a float into a double, although on the FPU that saving is likely to be small.
--------------------------------------------------------------------------------
FIRST BYTE OF COMMON FLOAT INSTRUCTIONS DEPENDING ON DATA TYPE
--------------------------------------------------------------------------------
Instruction               DATA TYPE
                     Float        Double
--------------------------------------------------------------------------------
FLD                  0xD9         0xDD
FSTP                 0xD9         0xDD
FST                  0xD9         0xDD
FADD                 0xD8         0xDC
FADDP                0xDE         (register operands only, no memory form)
FSUB                 0xD8         0xDC
FDIV                 0xD8         0xDC
FMUL                 0xD8         0xDC
FCOM                 0xD8         0xDC
FCOMP                0xD8         0xDC
--------------------------------------------------------------------------------

Now that you have seen how basic math is performed on float values, let's move on to another program that will include a few more instructions. Here is the program:

#include <stdio.h>

void main()
{
  int i=16;
  float f=6.6f;
  printf("i+f=%d\n",(i+f));   /* %d is what this program really uses; (i+f) is a
                                 double, so %f would be the correct specifier */
  printf("-f=%f\n",-f);
  float ff=16.16f;
  if(f==ff)
    printf("f==ff\n");
  else
    printf("f!=ff\n");
}

Its Disassembled Listing is as Follows:

; int __cdecl main(int argc,const char **argv,const char *envp)
; Attributes: bp-based frame

_main proc near

var_C = dword ptr -0Ch
var_8 = dword ptr -8
var_4 = dword ptr -4
argc = dword ptr 8
argv = dword ptr 0Ch
envp = dword ptr 10h

  push ebp
  mov ebp, esp
  add esp, 0FFFFFFF4h            ; 12 bytes are allocated on the stack.
  mov eax, 10h                   ; EAX is set to 16
  mov [ebp+var_4], 40D33333h     ; var_4 contains a Float Value (6.6)
  mov [ebp+var_C], eax           ; Now var_C contains an Integer
  fild [ebp+var_C]               ; Integer 16 is converted and loaded in ST(0) as 16.0
  fadd [ebp+var_4]               ; The float in var_4 is added to ST(0) and the result is stored in ST(0)
  add esp, 0FFFFFFF8h            ; 8 bytes are allocated for the result of double type.
  fstp [esp]                     ; The Resulting Double is stored on top of CPU Stack occupying 8 bytes.
  push offset aIFD               ; Format "i+f=%d\n" is pushed
  call _printf                   ; and the result is displayed
  add esp, 0Ch                   ; 12 bytes are freed from the stack.
  fld [ebp+var_4]                ; The Float is loaded in ST(0) Register.
  fchs                           ; Its sign bit is inverted and the result is stored in ST(0)
  add esp, 0FFFFFFF8h            ; 8 bytes for the resulting double are allocated on the CPU Stack
  fstp [esp]                     ; Result is stored on the CPU Stack and popped from FPU Stack
  push offset aFF                ; Format "-f=%f\n" is pushed
  call _printf                   ; And displayed on screen
  add esp, 0Ch                   ; 12 bytes freed from stack
  mov [ebp+var_8], 418147AEh     ; Another float of value 16.16 is stored in var_8
  fld [ebp+var_4]                ; The previous float is loaded in ST(0)
  fcomp [ebp+var_8]              ; Compares ST(0) with float in var_8 and pops the register stack.
  fnstsw ax                      ; Store FPU Status Word in AX
  sahf                           ; Loads the SF,ZF,AF,PF and CF Flags of the EFLAGS Register from the
                                 ; corresponding bits in the AH Register i.e. (bits 7,6,4,2 and 0 respectively)
  jnz short not_equal            ; Jump to not_equal if ZERO_FLAG is ZERO
  push offset aFFf               ; Format "f==ff\n" is pushed
  call _printf                   ; And Displayed
  pop ecx                        ; An equivalent of deallocating 4 bytes on the stack
  jmp short end_condition        ; Unconditional Jump to end_condition

not_equal:
  push offset aFFf_0             ; Format "f!=ff\n" is pushed
  call _printf                   ; and displayed
  pop ecx                        ; 4 bytes popped off from stack

end_condition:
  mov esp, ebp                   ; Stack Frame Closed
  pop ebp                        ; Original Value of ebp restored
  retn
_main endp

As you saw, every float value handed to printf ends up as a double, even when an integer is added to it. The FCOMP mechanism is slightly tricky. The FPU status word contains 4 condition code flags, C0 to C3.
Here is the table by which we can understand the changes in the condition code flags when an FCOMP is used:

--------------------------------------------------------------------------------
THE 3 CONDITION CODE FLAGS MODIFIED BY FCOMP
--------------------------------------------------------------------------------
CONDITION               C3      C2      C0
--------------------------------------------------------------------------------
ST(0) > [source]        0       0       0
ST(0) < [source]        0       0       1
ST(0) = [source]        1       0       0
Unordered (NaN)         1       1       1
--------------------------------------------------------------------------------

The FPU Status Word register holds the 4 condition flags. FNSTSW AX copies the Status Word into AX, so the condition flags end up in the AH Register: C0 in bit 0, C2 in bit 2 and C3 in bit 6. Then using the SAHF instruction the CPU Flags are modified according to the corresponding bits in the AH Register (as shown above), which puts C0 into CF, C2 into PF and C3 into ZF. Then and only then can a conditional jump take place on a float condition.

VI. RETURNING FLOAT VALUES VIA STACK THROUGH FUNCTIONS

Consider this program:

#include <stdio.h>

template <class T>
T ret(T a,T b)
{
  return (a+b);
}

void main()
{
  float f1=1.1f,f2=2.2f;
  double d1=3.3,d2=4.4;
  printf("f1 + f2 = %f\n",ret(f1,f2));
  printf("d1 + d2 = %f\n",ret(d1,d2));
}

To save space I have used function templates since the body of the function is the same for both data-types. If you don't know Function Templates yet, refer to the Function Template Tutorial at: http://www.programmers-corner.com/viewTutorial.php?ID=2

Here is its disassembled listing:

; int __cdecl main(int argc,const char **argv,const char *envp)
; Attributes: bp-based frame

_main proc near               ; DATA XREF: .data:0040A0B8o

var_18 = dword ptr -18h
var_14 = dword ptr -14h
var_10 = dword ptr -10h
var_C = dword ptr -0Ch
var_8 = dword ptr -8
var_4 = dword ptr -4
argc = dword ptr 8
argv = dword ptr 0Ch
envp = dword ptr 10h

  push ebp
  mov ebp, esp
  add esp, 0FFFFFFE8h            ; 24 bytes allocated on the CPU Stack
  mov [ebp+var_4], 3F8CCCCDh     ; Float stored at var_4 (1.1)
  mov [ebp+var_8], 400CCCCDh     ; Float stored at var_8 (2.2)
  mov [ebp+var_10], 66666666h
  mov [ebp+var_C], 400A6666h     ; Double Stored (3.3)
  mov [ebp+var_18], 9999999Ah
  mov [ebp+var_14], 40119999h    ; Double Stored (4.4)
  push [ebp+var_8]
  push [ebp+var_4]               ; Floats are passed to ret_float function
  call ret_float
  add esp, 8                     ; The two float arguments are freed
  add esp, 0FFFFFFF8h            ; 8 bytes allocated for the returned value
  fstp [esp]                     ; Result popped from ST(0) onto top of CPU Stack.
                                 ; Imitates a push instruction
  push offset aF1F2F             ; Format "f1 + f2 = %f\n" is pushed
  call _printf
  add esp, 0Ch
  push [ebp+var_14]
  push [ebp+var_18]
  push [ebp+var_C]
  push [ebp+var_10]              ; Doubles are passed to ret_double function
  call ret_double
  add esp, 10h                   ; The two double arguments are freed
  add esp, 0FFFFFFF8h            ; 8 bytes allocated for the returned value
  fstp [esp]                     ; Result popped from ST(0) onto top of CPU Stack
  push offset aD1D2F             ; Format "d1 + d2 = %f\n" is pushed
  call _printf
  add esp, 0Ch
  mov esp, ebp
  pop ebp
  retn
_main endp

ret_float proc near           ; CODE XREF: _main+36p

arg_0 = dword ptr 8
arg_4 = dword ptr 0Ch

  push ebp
  mov ebp, esp
  fld [ebp+arg_0]
  fadd [ebp+arg_4]               ; Result left on top of FPU Stack itself
  pop ebp
  retn
ret_float endp

ret_double proc near          ; CODE XREF: _main+5Dp

arg_0 = qword ptr 8
arg_8 = qword ptr 10h

  push ebp
  mov ebp, esp
  fld [ebp+arg_0]
  fadd [ebp+arg_8]               ; Result left in ST(0) or top of FPU Stack itself
  pop ebp
  retn
ret_double endp

So when a float has to be returned, instead of being placed in the EAX Register the value to be returned is kept in ST(0), i.e. on the top of the FPU Stack. If the returned value is required just once the FSTP instruction is used, otherwise the FST instruction is used. Notice that even though the source code had just one templated function, the actual code has a different version of the same function for each data type.

This is the end of the Article. Hope you understand how float and double value calculations are done. If you want to learn more, try using the various functions included in the math.h header file, e.g. log10, sin, cos etc., and you can see the other instructions at work.

If you have any question on Float Interpretation you can mail me at: born2c0de@hotmail.com