GCC

[|GCC and Portability]

GCC itself aims to be portable to any machine where int is at least a 32-bit type. It aims to target machines with a flat (non-segmented), byte-addressed data address space (the code address space can be separate). Target ABIs may have an 8, 16, 32 or 64-bit int type. char can be wider than 8 bits.
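As a rough illustration of the first requirement, the check below tests whether the machine the code is compiled for has an int of at least 32 bits. It is only a sketch in standard C: the function names `int_storage_bits` and `host_int_is_wide_enough` are made up for this example and are not part of GCC.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: storage size of int in bits.
   This counts every bit of the object, including any padding bits. */
size_t int_storage_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Hypothetical helper: returns 1 if int can hold every value of a
   32-bit two's complement type, 0 otherwise. */
int host_int_is_wide_enough(void)
{
    return INT_MAX >= 2147483647;
}
```

Note that `CHAR_BIT` is only guaranteed to be at least 8, which matches the remark that char can be wider than 8 bits.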

GCC gets most of the information about the target machine from a machine description, which gives an algebraic formula for each of the machine's instructions. This is a very clean way to describe the target. But when the compiler needs information that is difficult to express in this fashion, ad-hoc parameters have been defined for machine descriptions. The purpose of portability is to reduce the total work needed on the compiler; it is not of interest for its own sake.

GCC does not contain machine dependent code, but it does contain code that depends on machine parameters such as endianness (whether the most significant byte has the highest or lowest address of the bytes in a word) and the availability of autoincrement addressing. In the RTL-generation pass, it is often necessary to have multiple strategies for generating code for a particular kind of syntax tree, strategies that are usable for different combinations of parameters. Often, not all possible cases have been addressed, but only the common ones or only the ones that have been encountered. As a result, a new target may require additional strategies. You will know if this happens because the compiler will call abort. Fortunately, the new strategies can be added in a machine-independent fashion, and will affect only the target machines that need them.
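The idea of choosing a strategy from machine parameters, and aborting when no strategy covers a combination, can be sketched as follows. Everything here is hypothetical and much simpler than GCC's real code: the `target_params` structure, the `copy_strategy` enumeration and `choose_copy_strategy` are invented names used only to show the shape of the logic.

```c
#include <stdlib.h>

/* Hypothetical machine parameters (not GCC's actual target macros). */
struct target_params {
    int bytes_big_endian;   /* nonzero: most significant byte at lowest address */
    int has_post_increment; /* nonzero: autoincrement addressing is available */
};

/* Hypothetical strategies for one kind of syntax tree. */
enum copy_strategy {
    COPY_WITH_POST_INCREMENT,
    COPY_WITH_EXPLICIT_ADD
};

/* Pick a strategy using only machine parameters, never a machine name. */
enum copy_strategy choose_copy_strategy(const struct target_params *p)
{
    if (p->has_post_increment)
        return COPY_WITH_POST_INCREMENT;
    if (!p->bytes_big_endian)
        return COPY_WITH_EXPLICIT_ADD;
    /* Combination not addressed yet in this sketch: a new target that
       hits this case shows up as an abort and needs a new strategy. */
    abort();
}
```

A new strategy would be added as another branch keyed on the parameters, so only targets with that combination of parameters are affected.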

[|Structure of GCC]
Like most portable compilers, the compilation process of a GCC-based compiler can be conceptually split up into three phases:
 * There is a separate [|front end] for each supported language. A front end takes the source code and does whatever is needed to translate that source code into a semantically equivalent, language independent abstract syntax tree (AST). The syntax and semantics of this AST are defined by the [|GIMPLE] language, the highest level language independent intermediate representation GCC has.
 * This AST is then run through a list of [|target independent code transformations] that take care of such things as constructing a control flow graph, optimizing the [|AST] for optimizing compilations, lowering to non-strict [|RTL] ([|expand]), and running [|RTL based optimizations] for optimizing compilations. The non-strict RTL is handed over to more low-level passes.
 * The low-level passes are the passes that are part of the [|code generation] process. The first job of these passes is to turn the non-strict RTL representation into strict RTL, or in other words, from RTL patterns that match definitions without taking constraints into consideration into RTL patterns that fully match the complete =insn= definition *including* all operand constraints (right now the one pass that takes care of this is [|reload], but this is suboptimal). Other jobs of the strict RTL passes include scheduling, doing peephole optimizations, and emitting the assembly output.

Neither the AST nor the non-strict RTL representations are completely target independent, but the [|GIMPLE] language is, and in non-strict [|RTL] form the representation is still not really machine assembly, so the passes that work on non-strict RTL can still be considered target independent to some extent (even passes like [|combine] do not have to worry too much about the target machine). The passes working on strict RTL are really assembler optimizers, which clearly need to take into account far more information about the target architecture.

The source file hierarchy is described in its [|documentation]. See also the page on [|regenerating configure scripts].
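The sequence of representations described in this section can be summarized as a small lookup table. This is a toy model for orientation only, not code from GCC: the `ir_form` enumeration, the `lowering_step` structure and `next_lowering_step` are hypothetical names, and each real step consists of many passes.

```c
#include <stddef.h>

/* Hypothetical labels for the representations named in the text. */
enum ir_form {
    IR_SOURCE,
    IR_GIMPLE,
    IR_NONSTRICT_RTL,
    IR_STRICT_RTL,
    IR_ASSEMBLY
};

/* One lowering step: which form goes in, which comes out, and what does it. */
struct lowering_step {
    enum ir_form from;
    enum ir_form to;
    const char *done_by;
};

static const struct lowering_step lowering_steps[] = {
    { IR_SOURCE,        IR_GIMPLE,        "front end" },
    { IR_GIMPLE,        IR_NONSTRICT_RTL, "expand" },
    { IR_NONSTRICT_RTL, IR_STRICT_RTL,    "reload" },
    { IR_STRICT_RTL,    IR_ASSEMBLY,      "assembly output" },
};

/* Returns the step that lowers the given form, or NULL for the final form. */
const struct lowering_step *next_lowering_step(enum ir_form form)
{
    size_t count = sizeof lowering_steps / sizeof lowering_steps[0];
    for (size_t i = 0; i < count; i++) {
        if (lowering_steps[i].from == form)
            return &lowering_steps[i];
    }
    return NULL;
}
```

Optimization passes that keep the representation in the same form (for example scheduling on strict RTL) are deliberately left out of this table.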