symbian-qemu-0.9.1-12/qemu-symbian-svp/tcg/README
changeset 1 2fb8b9db1c86
equal deleted inserted replaced
0:ffa851df0825 1:2fb8b9db1c86
       
     1 Tiny Code Generator - Fabrice Bellard.
       
     2 
       
     3 1) Introduction
       
     4 
       
     5 TCG (Tiny Code Generator) began as a generic backend for a C
       
     6 compiler. It was simplified to be used in QEMU. It also has its roots
       
     7 in the QOP code generator written by Paul Brook. 
       
     8 
       
     9 2) Definitions
       
    10 
       
    11 The TCG "target" is the architecture for which we generate the
       
    12 code. It is of course not the same as the "target" of QEMU which is
       
    13 the emulated architecture. As TCG started as a generic C backend used
       
    14 for cross compiling, it is assumed that the TCG target is different
       
    15 from the host, although it is never the case for QEMU.
       
    16 
       
    17 A TCG "function" corresponds to a QEMU Translated Block (TB).
       
    18 
       
    19 A TCG "temporary" is a variable only live in a basic
       
    20 block. Temporaries are allocated explicitly in each function.
       
    21 
       
    22 A TCG "local temporary" is a variable only live in a function. Local
       
    23 temporaries are allocated explicitly in each function.
       
    24 
       
    25 A TCG "global" is a variable which is live in all the functions
       
    26 (equivalent of a C global variable). They are defined before the
       
    27 functions defined. A TCG global can be a memory location (e.g. a QEMU
       
    28 CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
       
    29 or a memory location which is stored in a register outside QEMU TBs
       
    30 (not implemented yet).
       
    31 
       
    32 A TCG "basic block" corresponds to a list of instructions terminated
       
    33 by a branch instruction. 
       
    34 
       
    35 3) Intermediate representation
       
    36 
       
    37 3.1) Introduction
       
    38 
       
    39 TCG instructions operate on variables which are temporaries, local
       
    40 temporaries or globals. TCG instructions and variables are strongly
       
    41 typed. Two types are supported: 32 bit integers and 64 bit
       
    42 integers. Pointers are defined as an alias to 32 bit or 64 bit
       
    43 integers depending on the TCG target word size.
       
    44 
       
    45 Each instruction has a fixed number of output variable operands, input
       
    46 variable operands and always constant operands.
       
    47 
       
    48 The notable exception is the call instruction which has a variable
       
    49 number of outputs and inputs.
       
    50 
       
    51 In the textual form, output operands usually come first, followed by
       
    52 input operands, followed by constant operands. The output type is
       
    53 included in the instruction name. Constants are prefixed with a '$'.
       
    54 
       
    55 add_i32 t0, t1, t2  (t0 <- t1 + t2)
       
    56 
       
    57 3.2) Assumptions
       
    58 
       
    59 * Basic blocks
       
    60 
       
    61 - Basic blocks end after branches (e.g. brcond_i32 instruction),
       
    62   goto_tb and exit_tb instructions.
       
    63 - Basic blocks start after the end of a previous basic block, or at a
       
    64   set_label instruction.
       
    65 
       
    66 After the end of a basic block, the content of temporaries is
       
    67 destroyed, but local temporaries and globals are preserved.
       
    68 
       
    69 * Floating point types are not supported yet
       
    70 
       
    71 * Pointers: depending on the TCG target, pointer size is 32 bit or 64
       
    72   bit. The type TCG_TYPE_PTR is an alias to TCG_TYPE_I32 or
       
    73   TCG_TYPE_I64.
       
    74 
       
    75 * Helpers:
       
    76 
       
    77 Using the tcg_gen_helper_x_y it is possible to call any function
       
    78 taking i32, i64 or pointer types. Before calling an helper, all
       
    79 globals are stored at their canonical location and it is assumed that
       
    80 the function can modify them. In the future, function modifiers will
       
    81 be allowed to tell that the helper does not read or write some globals.
       
    82 
       
    83 On some TCG targets (e.g. x86), several calling conventions are
       
    84 supported.
       
    85 
       
    86 * Branches:
       
    87 
       
    88 Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
       
    89 explicit address. Conditional branches can only jump to labels.
       
    90 
       
    91 3.3) Code Optimizations
       
    92 
       
    93 When generating instructions, you can count on at least the following
       
    94 optimizations:
       
    95 
       
    96 - Single instructions are simplified, e.g.
       
    97 
       
    98    and_i32 t0, t0, $0xffffffff
       
    99     
       
   100   is suppressed.
       
   101 
       
   102 - A liveness analysis is done at the basic block level. The
       
   103   information is used to suppress moves from a dead variable to
       
   104   another one. It is also used to remove instructions which compute
       
   105   dead results. The later is especially useful for condition code
       
   106   optimization in QEMU.
       
   107 
       
   108   In the following example:
       
   109 
       
   110   add_i32 t0, t1, t2
       
   111   add_i32 t0, t0, $1
       
   112   mov_i32 t0, $1
       
   113 
       
   114   only the last instruction is kept.
       
   115 
       
   116 3.4) Instruction Reference
       
   117 
       
   118 ********* Function call
       
   119 
       
   120 * call <ret> <params> ptr
       
   121 
       
   122 call function 'ptr' (pointer type)
       
   123 
       
   124 <ret> optional 32 bit or 64 bit return value
       
   125 <params> optional 32 bit or 64 bit parameters
       
   126 
       
   127 ********* Jumps/Labels
       
   128 
       
   129 * jmp t0
       
   130 
       
   131 Absolute jump to address t0 (pointer type).
       
   132 
       
   133 * set_label $label
       
   134 
       
   135 Define label 'label' at the current program point.
       
   136 
       
   137 * br $label
       
   138 
       
   139 Jump to label.
       
   140 
       
   141 * brcond_i32/i64 cond, t0, t1, label
       
   142 
       
   143 Conditional jump if t0 cond t1 is true. cond can be:
       
   144     TCG_COND_EQ
       
   145     TCG_COND_NE
       
   146     TCG_COND_LT /* signed */
       
   147     TCG_COND_GE /* signed */
       
   148     TCG_COND_LE /* signed */
       
   149     TCG_COND_GT /* signed */
       
   150     TCG_COND_LTU /* unsigned */
       
   151     TCG_COND_GEU /* unsigned */
       
   152     TCG_COND_LEU /* unsigned */
       
   153     TCG_COND_GTU /* unsigned */
       
   154 
       
   155 ********* Arithmetic
       
   156 
       
   157 * add_i32/i64 t0, t1, t2
       
   158 
       
   159 t0=t1+t2
       
   160 
       
   161 * sub_i32/i64 t0, t1, t2
       
   162 
       
   163 t0=t1-t2
       
   164 
       
   165 * neg_i32/i64 t0, t1
       
   166 
       
   167 t0=-t1 (two's complement)
       
   168 
       
   169 * mul_i32/i64 t0, t1, t2
       
   170 
       
   171 t0=t1*t2
       
   172 
       
   173 * div_i32/i64 t0, t1, t2
       
   174 
       
   175 t0=t1/t2 (signed). Undefined behavior if division by zero or overflow.
       
   176 
       
   177 * divu_i32/i64 t0, t1, t2
       
   178 
       
   179 t0=t1/t2 (unsigned). Undefined behavior if division by zero.
       
   180 
       
   181 * rem_i32/i64 t0, t1, t2
       
   182 
       
   183 t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
       
   184 
       
   185 * remu_i32/i64 t0, t1, t2
       
   186 
       
   187 t0=t1%t2 (unsigned). Undefined behavior if division by zero.
       
   188 
       
   189 ********* Logical
       
   190 
       
   191 * and_i32/i64 t0, t1, t2
       
   192 
       
   193 t0=t1&t2
       
   194 
       
   195 * or_i32/i64 t0, t1, t2
       
   196 
       
   197 t0=t1|t2
       
   198 
       
   199 * xor_i32/i64 t0, t1, t2
       
   200 
       
   201 t0=t1^t2
       
   202 
       
   203 * not_i32/i64 t0, t1
       
   204 
       
   205 t0=~t1
       
   206 
       
   207 * andc_i32/i64 t0, t1, t2
       
   208 
       
   209 t0=t1&~t2
       
   210 
       
   211 * eqv_i32/i64 t0, t1, t2
       
   212 
       
   213 t0=~(t1^t2)
       
   214 
       
   215 * nand_i32/i64 t0, t1, t2
       
   216 
       
   217 t0=~(t1&t2)
       
   218 
       
   219 * nor_i32/i64 t0, t1, t2
       
   220 
       
   221 t0=~(t1|t2)
       
   222 
       
   223 * orc_i32/i64 t0, t1, t2
       
   224 
       
   225 t0=t1|~t2
       
   226 
       
   227 ********* Shifts/Rotates
       
   228 
       
   229 * shl_i32/i64 t0, t1, t2
       
   230 
       
   231 t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
       
   232 
       
   233 * shr_i32/i64 t0, t1, t2
       
   234 
       
   235 t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
       
   236 
       
   237 * sar_i32/i64 t0, t1, t2
       
   238 
       
   239 t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
       
   240 
       
   241 * rotl_i32/i64 t0, t1, t2
       
   242 
       
   243 Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
       
   244 
       
   245 * rotr_i32/i64 t0, t1, t2
       
   246 
       
   247 Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
       
   248 
       
   249 ********* Misc
       
   250 
       
   251 * mov_i32/i64 t0, t1
       
   252 
       
   253 t0 = t1
       
   254 
       
   255 Move t1 to t0 (both operands must have the same type).
       
   256 
       
   257 * ext8s_i32/i64 t0, t1
       
   258 ext8u_i32/i64 t0, t1
       
   259 ext16s_i32/i64 t0, t1
       
   260 ext16u_i32/i64 t0, t1
       
   261 ext32s_i64 t0, t1
       
   262 ext32u_i64 t0, t1
       
   263 
       
   264 8, 16 or 32 bit sign/zero extension (both operands must have the same type)
       
   265 
       
   266 * bswap16_i32 t0, t1
       
   267 
       
   268 16 bit byte swap on a 32 bit value. The two high order bytes must be set
       
   269 to zero.
       
   270 
       
   271 * bswap_i32 t0, t1
       
   272 
       
   273 32 bit byte swap
       
   274 
       
   275 * bswap_i64 t0, t1
       
   276 
       
   277 64 bit byte swap
       
   278 
       
   279 * discard_i32/i64 t0
       
   280 
       
   281 Indicate that the value of t0 won't be used later. It is useful to
       
   282 force dead code elimination.
       
   283 
       
   284 ********* Type conversions
       
   285 
       
   286 * ext_i32_i64 t0, t1
       
   287 Convert t1 (32 bit) to t0 (64 bit) and does sign extension
       
   288 
       
   289 * extu_i32_i64 t0, t1
       
   290 Convert t1 (32 bit) to t0 (64 bit) and does zero extension
       
   291 
       
   292 * trunc_i64_i32 t0, t1
       
   293 Truncate t1 (64 bit) to t0 (32 bit)
       
   294 
       
   295 * concat_i32_i64 t0, t1, t2
       
   296 Construct t0 (64-bit) taking the low half from t1 (32 bit) and the high half
       
   297 from t2 (32 bit).
       
   298 
       
   299 * concat32_i64 t0, t1, t2
       
   300 Construct t0 (64-bit) taking the low half from t1 (64 bit) and the high half
       
   301 from t2 (64 bit).
       
   302 
       
   303 ********* Load/Store
       
   304 
       
   305 * ld_i32/i64 t0, t1, offset
       
   306 ld8s_i32/i64 t0, t1, offset
       
   307 ld8u_i32/i64 t0, t1, offset
       
   308 ld16s_i32/i64 t0, t1, offset
       
   309 ld16u_i32/i64 t0, t1, offset
       
   310 ld32s_i64 t0, t1, offset
       
   311 ld32u_i64 t0, t1, offset
       
   312 
       
   313 t0 = read(t1 + offset)
       
   314 Load 8, 16, 32 or 64 bits with or without sign extension from host memory. 
       
   315 offset must be a constant.
       
   316 
       
   317 * st_i32/i64 t0, t1, offset
       
   318 st8_i32/i64 t0, t1, offset
       
   319 st16_i32/i64 t0, t1, offset
       
   320 st32_i64 t0, t1, offset
       
   321 
       
   322 write(t0, t1 + offset)
       
   323 Write 8, 16, 32 or 64 bits to host memory.
       
   324 
       
   325 ********* QEMU specific operations
       
   326 
       
   327 * tb_exit t0
       
   328 
       
   329 Exit the current TB and return the value t0 (word type).
       
   330 
       
   331 * goto_tb index
       
   332 
       
   333 Exit the current TB and jump to the TB index 'index' (constant) if the
       
   334 current TB was linked to this TB. Otherwise execute the next
       
   335 instructions.
       
   336 
       
   337 * qemu_ld_i32/i64 t0, t1, flags
       
   338 qemu_ld8u_i32/i64 t0, t1, flags
       
   339 qemu_ld8s_i32/i64 t0, t1, flags
       
   340 qemu_ld16u_i32/i64 t0, t1, flags
       
   341 qemu_ld16s_i32/i64 t0, t1, flags
       
   342 qemu_ld32u_i64 t0, t1, flags
       
   343 qemu_ld32s_i64 t0, t1, flags
       
   344 
       
   345 Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU
       
   346 address type. 'flags' contains the QEMU memory index (selects user or
       
   347 kernel access) for example.
       
   348 
       
   349 * qemu_st_i32/i64 t0, t1, flags
       
   350 qemu_st8_i32/i64 t0, t1, flags
       
   351 qemu_st16_i32/i64 t0, t1, flags
       
   352 qemu_st32_i64 t0, t1, flags
       
   353 
       
   354 Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
       
   355 address type. 'flags' contains the QEMU memory index (selects user or
       
   356 kernel access) for example.
       
   357 
       
   358 Note 1: Some shortcuts are defined when the last operand is known to be
       
   359 a constant (e.g. addi for add, movi for mov).
       
   360 
       
   361 Note 2: When using TCG, the opcodes must never be generated directly
       
   362 as some of them may not be available as "real" opcodes. Always use the
       
   363 function tcg_gen_xxx(args).
       
   364 
       
   365 4) Backend
       
   366 
       
   367 tcg-target.h contains the target specific definitions. tcg-target.c
       
   368 contains the target specific code.
       
   369 
       
   370 4.1) Assumptions
       
   371 
       
   372 The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
       
   373 64 bit. It is expected that the pointer has the same size as the word.
       
   374 
       
   375 On a 32 bit target, all 64 bit operations are converted to 32 bits. A
       
   376 few specific operations must be implemented to allow it (see add2_i32,
       
   377 sub2_i32, brcond2_i32).
       
   378 
       
   379 Floating point operations are not supported in this version. A
       
   380 previous incarnation of the code generator had full support of them,
       
   381 but it is better to concentrate on integer operations first.
       
   382 
       
   383 On a 64 bit target, no assumption is made in TCG about the storage of
       
   384 the 32 bit values in 64 bit registers.
       
   385 
       
   386 4.2) Constraints
       
   387 
       
   388 GCC like constraints are used to define the constraints of every
       
   389 instruction. Memory constraints are not supported in this
       
   390 version. Aliases are specified in the input operands as for GCC.
       
   391 
       
   392 The same register may be used for both an input and an output, even when
       
   393 they are not explicitly aliased.  If an op expands to multiple target
       
   394 instructions then care must be taken to avoid clobbering input values.
       
   395 GCC style "early clobber" outputs are not currently supported.
       
   396 
       
   397 A target can define specific register or constant constraints. If an
       
   398 operation uses a constant input constraint which does not allow all
       
   399 constants, it must also accept registers in order to have a fallback.
       
   400 
       
   401 The movi_i32 and movi_i64 operations must accept any constants.
       
   402 
       
   403 The mov_i32 and mov_i64 operations must accept any registers of the
       
   404 same type.
       
   405 
       
   406 The ld/st instructions must accept signed 32 bit constant offsets. It
       
   407 can be implemented by reserving a specific register to compute the
       
   408 address if the offset is too big.
       
   409 
       
   410 The ld/st instructions must accept any destination (ld) or source (st)
       
   411 register.
       
   412 
       
   413 4.3) Function call assumptions
       
   414 
       
   415 - The only supported types for parameters and return value are: 32 and
       
   416   64 bit integers and pointer.
       
   417 - The stack grows downwards.
       
   418 - The first N parameters are passed in registers.
       
   419 - The next parameters are passed on the stack by storing them as words.
       
   420 - Some registers are clobbered during the call. 
       
   421 - The function can return 0 or 1 value in registers. On a 32 bit
       
   422   target, functions must be able to return 2 values in registers for
       
   423   64 bit return type.
       
   424 
       
   425 5) Recommended coding rules for best performance
       
   426 
       
   427 - Use globals to represent the parts of the QEMU CPU state which are
       
   428   often modified, e.g. the integer registers and the condition
       
   429   codes. TCG will be able to use host registers to store them.
       
   430 
       
   431 - Avoid globals stored in fixed registers. They must be used only to
       
   432   store the pointer to the CPU state and possibly to store a pointer
       
   433   to a register window.
       
   434 
       
   435 - Use temporaries. Use local temporaries only when really needed,
       
   436   e.g. when you need to use a value after a jump. Local temporaries
       
   437   introduce a performance hit in the current TCG implementation: their
       
   438   content is saved to memory at end of each basic block.
       
   439 
       
   440 - Free temporaries and local temporaries when they are no longer used
       
   441   (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
       
   442   should free it after it is used. Freeing temporaries does not yield
       
   443   a better generated code, but it reduces the memory usage of TCG and
       
   444   the speed of the translation.
       
   445 
       
   446 - Don't hesitate to use helpers for complicated or seldom used target
       
   447   intructions. There is little performance advantage in using TCG to
       
   448   implement target instructions taking more than about twenty TCG
       
   449   instructions.
       
   450 
       
   451 - Use the 'discard' instruction if you know that TCG won't be able to
       
   452   prove that a given global is "dead" at a given program point. The
       
   453   x86 target uses it to improve the condition codes optimisation.