Relocation

Prev: Symbol Resolution
TOC: Linking
Next: Executable Object Files

Relocation is the linker's second job, after symbol resolution.
Symbol resolution was the linker's internal book-keeping: which symbols are defined where, where each one is referenced, and whether every reference matches a definition. If that step fails, the linker reports an error. If everything checks out, relocation is the step where the linker actually touches the contents of the object files.
Here, it merges the input Relocatable Object Files and assigns a run-time address to each symbol.
The relocation process happens in two steps:
- Relocating sections and symbol definitions:
  - This itself has two parts. First, all sections with the same name are merged; for example, the .data sections of all input Relocatable Object Files are aggregated into one big .data section in the Executable Object File.
  - Second, the linker assigns run-time memory addresses to the new aggregate sections and to every symbol defined in them. After this, each instruction and global variable in the program has its own unique run-time address.
- Relocating symbol references within sections:
  - Once every section, instruction and global variable has a run-time address, the linker modifies each symbol reference in the bodies of the code and data sections so that it points to the correct run-time address assigned in the previous step.
  - To do this, the linker relies on data structures in the input modules called #Relocation Entries.
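The first step can be sketched in C. This is only an illustration under simplifying assumptions: the Module and Placement types, the function names, and the base addresses passed in by the caller are all hypothetical, and alignment padding is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one input module: just its section sizes. */
typedef struct {
    uint64_t text_size;
    uint64_t data_size;
} Module;

/* Run-time addresses given to one module's pieces of the merged sections. */
typedef struct {
    uint64_t text_addr;
    uint64_t data_addr;
} Placement;

/*
 * Merge same-named sections by laying each module's .text (and .data)
 * end to end, recording where each piece starts.
 * text_base and data_base are assumed start addresses of the merged
 * sections, chosen by the caller.
 */
void layout_sections(const Module *mods, size_t n,
                     uint64_t text_base, uint64_t data_base,
                     Placement *out)
{
    uint64_t text_cursor = text_base;
    uint64_t data_cursor = data_base;
    for (size_t i = 0; i < n; i++) {
        out[i].text_addr = text_cursor;
        out[i].data_addr = data_cursor;
        text_cursor += mods[i].text_size;
        data_cursor += mods[i].data_size;
    }
}

/* A symbol's run-time address: where its section piece landed plus the
 * symbol's offset inside that piece. */
uint64_t symbol_addr(uint64_t section_addr, uint64_t offset_in_section)
{
    return section_addr + offset_in_section;
}
```

Once every piece has an address, the second step only has to patch references using those addresses.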

Relocation Entries

When the assembler generates an object module, it does not yet know where code and data will end up in memory. So for each symbol reference whose final address it cannot determine (for example, one with no definition in the current module), it generates a relocation entry that tells the linker how to modify the reference at link time.
Each section gets its own companion section for this as needed: relocation entries for .text go in .rel.text, those for .data go in .rel.data, and so on (x86-64 toolchains name these .rela.text and .rela.data, since the entries carry an explicit addend).
Here's the data structure for a relocation entry:
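A minimal sketch of such an entry, using the field names from the explanations below. The RelocEntry name, the fixed-width types and the helper functions are assumptions for illustration. The real definition in <elf.h> is Elf64_Rela, which stores symbol and type packed into a single 64-bit r_info field (symbol index in the high 32 bits, type in the low 32 bits).

```c
#include <stdint.h>

/* Numeric values of the two relocation types discussed in these notes. */
enum { R_X86_64_PC32 = 2, R_X86_64_32 = 10 };

/* Sketch of one relocation entry; the fixed-width types are an assumption. */
typedef struct {
    uint64_t offset;  /* section offset of the reference to relocate */
    uint32_t type;    /* relocation type, e.g. R_X86_64_PC32 */
    uint32_t symbol;  /* index into .symtab */
    int64_t  addend;  /* signed constant added during relocation */
} RelocEntry;

/* Hypothetical helpers that split a packed 64-bit info word the same way
 * the ELF64_R_SYM and ELF64_R_TYPE macros do. */
uint32_t info_symbol(uint64_t info) { return (uint32_t)(info >> 32); }
uint32_t info_type(uint64_t info)   { return (uint32_t)(info & 0xffffffffu); }
```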
Explanations:
- offset: the offset, within the section, of the reference that needs to be modified.
- type: tells the linker how to modify the reference. ELF defines 32 types, many of them quite arcane, but we only need to remember 2:
  - R_X86_64_PC32: relocates a reference that uses a 32-bit PC-relative address
  - R_X86_64_32: relocates a reference that uses a 32-bit absolute address
- symbol: the index in the .symtab section of the symbol that the reference should point to
- addend: a signed constant that "fine-tunes" the address calculation. For example, say a global array like this is defined in some file:
  int weights[10];
  and we want to reference its element at index 3 from some other file:
  int smref = weights[3];
  The assembler emits a reference to the symbol weights, which resolves to the address of weights[0], and records an addend of 4 x 3 = 12 bytes (assuming a 4-byte int) so that the relocated reference points at weights[3].
- Note that R_X86_64_PC32 and R_X86_64_32 both assume that the total size of code and data in the executable is smaller than 2GB, so everything can be reached at run time with 32-bit addresses. Larger programs need a different code model (for example gcc's -mcmodel=medium or -mcmodel=large) and 64-bit relocation types such as R_X86_64_64.
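The addend arithmetic from the weights example can be checked with a few lines of C. The element_addend helper is hypothetical; it simply measures the byte distance that an absolute reference to weights[index] would need as its addend.

```c
#include <stddef.h>

int weights[10];

/* Byte distance from the start of weights (the symbol's address) to
 * weights[index]. With a 4-byte int, index 3 gives 12. */
long element_addend(size_t index)
{
    return (long)((char *)&weights[index] - (char *)weights);
}
```

For a PC-relative reference the addend recorded by the assembler also has to account for where the program counter points when the instruction runs, so it is generally not just the element distance.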

Relocating Symbol References

We can walk through this with the following example:
The main.c will generate the following assembly:
As we can see, the mov uses an absolute reference and the callq uses a PC-relative reference.
We can use the following algorithm to perform both types of relocation:

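foreach section s {
	foreach relocation entry r {
		refptr = s + r.offset;	/* pointer to the reference to be relocated */

		/* Relocate a PC-relative reference */
		if (r.type == R_X86_64_PC32) {
			refaddr = ADDR(s) + r.offset;	/* run-time address of the reference */
			*refptr = (unsigned) (ADDR(r.symbol) + r.addend - refaddr);
		}

		/* Relocate an absolute reference */
		if (r.type == R_X86_64_32) {
			*refptr = (unsigned) (ADDR(r.symbol) + r.addend);
		}
	}
}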

We are assuming each section s is an array of bytes holding its instructions or data. So s + r.offset is the location of the reference, for example the 4-byte field inside an instruction, that needs to be changed (relocated).
We are also assuming that the linker has already completed the first step and chosen run-time addresses for every section and symbol, so ADDR(s) gives the run-time address of the section and ADDR(r.symbol) gives the run-time address of the symbol in the final executable.
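To make the algorithm concrete, here is a small C sketch that patches one reference of each kind in a byte buffer. The Reloc struct (which stores the symbol's address directly instead of a .symtab index), the function names, and every address used in the usage note are assumptions chosen for illustration, not values taken from a real link.

```c
#include <stdint.h>
#include <string.h>

enum { R_X86_64_PC32 = 2, R_X86_64_32 = 10 };

/* Simplified relocation entry: symbol_addr stands in for ADDR(r.symbol). */
typedef struct {
    uint64_t offset;       /* where in the section the reference lives */
    uint32_t type;         /* R_X86_64_PC32 or R_X86_64_32 */
    uint64_t symbol_addr;  /* assumed run-time address of the symbol */
    int64_t  addend;
} Reloc;

/* Patch one 32-bit reference in the section bytes s, where section_addr
 * plays the role of ADDR(s). */
void apply_reloc(uint8_t *s, uint64_t section_addr, const Reloc *r)
{
    uint32_t value;
    if (r->type == R_X86_64_PC32) {
        uint64_t refaddr = section_addr + r->offset;
        value = (uint32_t)(r->symbol_addr + (uint64_t)r->addend - refaddr);
    } else if (r->type == R_X86_64_32) {
        value = (uint32_t)(r->symbol_addr + (uint64_t)r->addend);
    } else {
        return;  /* other relocation types are outside this sketch */
    }
    memcpy(s + r->offset, &value, sizeof value);
}

/* Read back the 32-bit reference stored at the given section offset. */
uint32_t read_ref(const uint8_t *s, uint64_t offset)
{
    uint32_t value;
    memcpy(&value, s + offset, sizeof value);
    return value;
}
```

As a usage example, suppose .text is placed at 0x4004d0, a callq has its 32-bit operand at offset 0xf with addend -4, and the callee lands at 0x4004e8. Then refaddr = 0x4004df and the patched value is 0x4004e8 - 4 - 0x4004df = 0x5. The -4 is there because at run time the PC points to the instruction after the call, 4 bytes past the start of the operand. For an absolute reference at offset 0xa to a symbol at 0x601018 with addend 0, the patched value is simply 0x601018.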