Rounding functions:  

   Here is how I have interpreted C/C99 with regard to these functions,
   along with a brief explanation of why one rounding function might be
   more useful than the others.  My own opinion, which I believe to be
   correct, is that the ANSI C committee has made an earnest attempt at
   tying the C language to the universally accepted binary floating
   point standard, IEEE 754.  The addition of all the new rounding
   functions gives users an easy, flexible (and standard) means of
   using directed rounding, as specified by IEEE 754, in their code.

In the table below we assume: 
 
 1. the IEEE 754 default rounding mode (round to nearest) is set in the FPU.
 2. sizeof(int)=sizeof(long)=4 bytes, sizeof(long long)=sizeof(double)=8 bytes.
 3. we are using both the range and the full precision of the x87
 extended format when necessary (e.g. the extended range ->
 DBL_MAX+DBL_MAX != INF )
 4. the type double is in the 8 byte double precision format as
 specified by IEEE 754.
 5. U means undefined behavior
 

sample input expression  | rint  |    lrint    |    llrint    |  round  |    lround    |  trunc  |  ceil  | floor  
----------------------------------------------------------------------------------------------------------------
        -1.5             | -2.0  |     -2      |     -2       |  -2.0   |     -2       |  -1.0   | -1.0   | -2.0   
----------------------------------------------------------------------------------------------------------------   
        -0.5             | -0.0  |      0      |      0       |  -1.0   |     -1       |  -0.0   | -0.0   | -1.0   
----------------------------------------------------------------------------------------------------------------  
 2^^54 (~1.801439851e16) | 2^^54 | U(overflow) |     2^^54    |  2^^54  |      U       |  2^^54  | 2^^54  | 2^^54  
----------------------------------------------------------------------------------------------------------------   


Note that the functions returning a floating point value preserve the
sign of 0 (e.g. rint(-0.5) is -0.0).

rint, lrint, and llrint:
------------------------

These are unique amongst the rounding functions in that they don't
check or set the rounding mode before rounding; they simply round
according to whatever mode is currently in effect.  One place where
this can improve the performance of your code is float to int
conversion.

Consider the following simple program.

#include <math.h>
#include <stdio.h>
int main(void)
{
 float x = sqrt(2.0);
 int i = x;               /* implicit conversion: always truncates toward zero */
 printf("x=%g i=%i\n", x, i);
 return 0;
}

An example of the generated code for the statement "int i = x;" is:

 int i  = x ;
     fld        dword ptr [ebp]-0x10         <--- load x onto fp stack
     fnstcw      [ebp]-0x12                  <--- save the control word, which contains the current rounding mode
     or         word ptr [ebp]-0x12,0x0c00   <--- set rounding mode to "toward zero" without disturbing the rest
                                                  of the control word
     fldcw       [ebp]-0x12                  <--- load the control word with new rounding mode
     fistp      dword ptr [ebp]-0x08         <--- round and store integer i
     and        word ptr [ebp]-0x12,0xf3ff   <--- clear the rounding control bits (back to "round to nearest",
                                                  the assumed previous mode)
     fldcw       [ebp]-0x12                  <--- load the restored control word
   

On the other hand, consider:

#include <math.h>
#include <stdio.h>
#include <fenv.h>
int main()
{
 
 float x;
 int i ;
 fesetround(FE_TOWARDZERO);
 x=sqrt(2.0);
 i = lrint(x) ;
 printf("x=%g i=%i\n",x,i);
 return 0;
} 

Now, if we inline the lrint routine, the generated code for
"i = lrint(x) ;" is simply:

 i = lrint(x) ;
fld        x
frndint
fistp      i



Of course there is the overhead of setting the rounding mode to
"toward zero" before calling lrint, but this only needs to be
done once, not every time we convert a float to an int.
Manipulating the control word is an expensive operation, with each
of the instructions involved taking multiple cycles to execute, so
the performance savings here can be significant.

Also, keep in mind that the rounding mode must be changed
carefully, as it affects the results of ALL floating point operations
in the executing thread (e.g. addition and multiplication).

llrint is similar to lrint, but is needed when the rounded result
does not fit in a long, for example when you are computing with
numbers where every significant bit in the floating point number is
integral (e.g. x > pow(2.0,53.0)).  This library assumes at least a
32 bit 386 processor, and therefore the 8 byte integral type long long
is not natural to the targeted 4 byte architecture and requires
"extra" instructions to implement the same operations as would be
required for a natural 4 byte integral type (long and int).

