Section 2.5 The Data Segment

In addition to code, programs also contain static data. The term “static” means that the extent (the size) of the data is known statically, i.e. before the program executes, and does not vary during execution. Hence, the memory for static data can be provisioned during assembly and linking and allocated directly when the program is loaded into memory. As we will see in Section 2.6, the memory the program uses is organized into segments that each serve a distinct purpose. The data segment is the segment that holds the static data.

Remark 2.5.1. Statically versus Dynamically Allocated Memory.

The fact that there is statically-allocated memory indicates that there is also dynamically-allocated memory. We only touch on this in this chapter but will discuss dynamic memory allocation in more detail when we talk about C programming. A large part of the memory that programs use is allocated dynamically, i.e. during the runtime of the program. The reason is that in most situations the input to the program influences the amount of memory that has to be allocated, so there is no way of knowing statically (i.e. before the program runs) how much memory we need. Consider, for example, a program that processes an image that the user can select after starting the program. We do not know the size of that image statically.
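As a minimal sketch of what a dynamic allocation can look like in assembly, the following fragment reads a size from the user and then requests that many bytes at run time. It assumes the system call conventions of the SPIM and MARS simulators (service 5 reads an integer, service 9 is sbrk), which this section has not introduced; treat it as an illustration only.
    .text
    .globl main
main:
    li   $v0, 5             # service 5: read an integer from the user
    syscall                 # result: $v0 = requested number of bytes
    move $a0, $v0           # $a0 = size, only known at run time
    li   $v0, 9             # service 9: sbrk, allocate $a0 bytes
    syscall                 # result: $v0 = address of the new memory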
Static data is commonly initialized, i.e. it is set to certain values before the program is started. A typical example of such data is strings (character sequences, i.e. text) that the program uses. If a program prints out a message, that message has to be kept somewhere.
In the assembler file, the .data directive starts the declaration of static data. Use .space \(n\) to reserve \(n\) uninitialized bytes. To reference this memory later on, you can put a label in front of the directive like so:
    .data
some_bytes:
    .space 1000
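As a minimal sketch of how such a label might be used, the following fragment writes one byte into the reserved area and reads it back. The la and li pseudo-instructions, the registers, and the value 42 are assumptions chosen for illustration.
    .data
some_bytes:
    .space 1000             # reserve 1000 bytes, referenced via some_bytes

    .text
    .globl main
main:
    la   $t0, some_bytes    # $t0 = address of the first reserved byte
    li   $t1, 42            # an arbitrary value to store
    sb   $t1, 0($t0)        # write it to the first byte of the area
    lb   $t2, 0($t0)        # read it back: $t2 = 42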
To make allocating static data more convenient, there are a couple of directives that create static data of different sizes and initialize it at the same time. The directives .byte, .half, and .word allocate bytes, half-words (MIPS slang for 2 bytes), and words (4 bytes), respectively. .ascii \(s\) and .asciiz \(s\) allocate memory that is initialized to the ASCII codes of the string \(s\text{.}\) .asciiz also appends a byte with value 0. This is a common way of signaling the end of a string and is used, for example, in the language C. For example:
    .data
hello:
    .asciiz "Hello World"
corresponds to the directives
    .data
hello:
    .byte 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20
    .byte 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x00
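The terminating 0 byte is what allows code to find the end of the string without knowing its length in advance. A minimal sketch, assuming a simulator without branch delay slots and with register choices made up for illustration, counts the characters of hello:
    .data
hello:
    .asciiz "Hello World"

    .text
    .globl main
main:
    la   $t0, hello         # $t0 points to the current character
    li   $t1, 0             # $t1 counts the characters seen so far
loop:
    lb   $t2, 0($t0)        # load the current character
    beq  $t2, $zero, done   # the 0 byte marks the end of the string
    addi $t0, $t0, 1        # advance to the next character
    addi $t1, $t1, 1        # count it
    j    loop
done:
    move $a0, $t1           # $a0 = 11, the length of "Hello World"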
One peculiar aspect is that data must be aligned depending on its size. If the processor accesses \(n\) bytes of memory using a load or store instruction, the accessed address needs to be divisible by \(n\text{;}\) otherwise the processor will trigger an exception and stop executing. For example, the following program
    .data
    .ascii "Hallo"
x:
    .byte 8, 0, 0, 0

    .text
    .globl main
main:
    lw $t0 x
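would cause an exception: the address of label x is 0x10000005, which is not divisible by 4, but the lw instruction loads a word (4 bytes) into a register and therefore requires an address that is divisible by 4. We can force a specific alignment for the next label using the .align directive. The directive .align \(n\) advances to the next address that is divisible by \(2^n\text{,}\) so .align 2 in the following program places x at address 0x10000008: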
    .data
    .ascii "Hallo"
    .align 2
x:
    .byte 8, 0, 0, 0

    .text
    .globl main
main:
    lw $t0 x
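The same mechanism applies to other access sizes. As a sketch, assuming that .align \(n\) aligns to a multiple of \(2^n\) (the convention in common MIPS assemblers such as SPIM and MARS) and the same data segment start address as above, a half-word load with lh needs an address divisible by 2, which .align 1 provides. The label h is a hypothetical name.
    .data
    .ascii "Hallo"          # occupies 0x10000000 to 0x10000004
    .align 1                # advance to the next multiple of 2
h:
    .byte 8, 0              # h is at 0x10000006

    .text
    .globl main
main:
    lh   $t0, h             # 2-byte access to an address divisible by 2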

Remark 2.5.2. Practice.

In practice, the data segment is further subdivided. First, there is a separate part for read-only data, for example for string constants. The operating system ensures (using virtual memory, which we do not go into here) that writing to read-only segments causes an exception and leads to the termination of the program. Second, data that is initialized to 0 is specifically marked and does not occupy space in the binary file of the program. Instead, the operating system initializes such memory to 0 when the program is loaded.
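As a rough illustration of this subdivision, the following sketch uses the section names that the GNU assembler conventionally uses on ELF systems (.rodata for read-only data, .bss for zero-initialized data). These directives and the labels msg, counter, and buffer are assumptions for illustration; they are not part of the simulator dialect used in the rest of this section and may not be accepted by it.
    .section .rodata        # read-only data, e.g. string constants
msg:
    .asciz "Hello World"    # GNU assembler spelling of a 0-terminated string

    .data                   # initialized, writable data
counter:
    .word 42

    .bss                    # zero-initialized data, not stored in the file
buffer:
    .space 1000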