Compilation Process in C
C is a compiled language. Compiled languages generally execute faster than interpreted languages. Several compilers can be used to compile a C program, including GCC, Clang, and MSVC. In this chapter, we will explain what goes on in the background when you compile a C program using the GCC compiler.
Compiling a C Program
A sequence of binary instructions consisting of 1 and 0 bits is called machine code. High-level programming languages such as C, C++, Java, etc. consist of keywords that are closer to human languages such as English. A processor cannot execute such source text directly. Hence, a program written in C (or any other high-level language) needs to be converted to its equivalent machine code. This process is called compilation.
Note that the machine code is specific to the hardware architecture and the operating system. In other words, a C program compiled on a computer running Windows will not run as-is on a computer running Linux. Hence, we must use a compiler that targets the intended OS and architecture.
C Compilation Process Steps
In this tutorial, we will be using gcc, the C compiler from the GNU Compiler Collection (GCC). The GNU project is a free-software project started by Richard Stallman that gives developers access to powerful tools for free.
GCC supports various programming languages, including C. To use it, we should install a version compatible with the target computer.
The compilation process has four different steps −
- Preprocessing
- Compiling
- Assembling
- Linking
The following diagram illustrates the compilation process.
Example
To understand this process, let us consider the following source code in the C language (main.c) −
#include <stdio.h>
int main(){
/* my first program in C */
printf("Hello World! \n");
return 0;
}
Output
Run the code and check its output −
Hello World!
The “.c” file extension usually means that the file is written in C. The first line is the preprocessor directive #include, which tells the preprocessor to include the contents of the stdio.h header file. The text between /* and */ is a comment, which is useful for documentation purposes.
The entry point of the program is the main() function. It means the program will start by executing the statements that are inside this function’s block. Here, there are only two statements: one that prints the sentence “Hello World!” on the terminal, and another, “return 0”, that reports to the operating system that the program ended correctly. So, once we have compiled it, running this program will display only the phrase “Hello World!”.
What Goes Inside the C Compilation Process?
To turn our “main.c” code into an executable, we enter the command “gcc main.c”, and gcc runs all four steps of the compilation process in sequence.
Step 1: Preprocessing
The preprocessor performs the following actions −
- It removes all the comments in the source file(s).
- It includes the code of the header file(s). A header file has the extension .h and contains C function declarations and macro definitions.
- It replaces all of the macros (fragments of code which have been given a name) with their values.
Preprocessed C source conventionally uses the “.i” extension, so here it would be “main.i”.
To stop the compilation right after this step, we can use the “-E” option with the gcc command on the source file. By default, gcc then writes the preprocessed code to the terminal (standard output); adding “-o main.i” to the command stores it in “main.i” instead.
gcc -E main.c
Step 2: Compiling
The compiler translates the preprocessed file into assembly code for the target architecture, so this will produce a “.s” file. Internally, GCC first converts the program into intermediate representations (IR) on which it performs its optimizations, but the output of this step is human-readable assembly code.
We can stop after this step with the “-S” option on the gcc command.
gcc -S main.c
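The exact contents depend on the compiler version and the target platform. With MinGW-w64 GCC 8.1.0 on 64-bit Windows, the main.s file looks similar to this. Note that GCC has replaced the printf call with a call to puts, because the string contains no format specifiers and ends with a newline −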
.file "main.c"
.text
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Hello World! \0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
leaq .LC0(%rip), %rcx
call puts
movl $0, %eax
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
.def puts; .scl 2; .type 32; .endef
Step 3: Assembling
The assembler takes the assembly code and transforms it into object code, that is, code in machine language (i.e. binary). This will produce a file ending in “.o”.
We can stop the compilation process after this step by using the “-c” option with the gcc command.
gcc -c main.c
Note that the “main.o” file is not a text file, hence its contents won’t be readable when you open this file with a text editor.
Step 4: Linking
The linker creates the final executable, in binary. It links the object code of all the source files together and resolves calls to library functions such as printf. The linker knows where to look for the function definitions in the static libraries or the dynamic libraries.
With static libraries, the linker copies the library functions that the program uses into the executable file. With dynamic libraries, the code is not copied; only a reference to the library (such as its name) is recorded in the binary file, and the library itself is loaded when the program runs.
By default, after this fourth and last step, that is when you type the whole “gcc main.c” command without any options, the compiler will create an executable program called a.out (or a.exe on Windows) that we can run from the command line.
We can also choose to create an executable program with the name we want, by adding the “-o” option to the gcc command, followed by the desired file name.
gcc main.c -o hello.out
So now we can type “./a.out” if we didn’t use the “-o” option, or “./hello.out” if we did, to execute the compiled code. The output will be “Hello World!” and following it, the shell prompt will appear again.