Machine Code

Skip to main content

\(\newcommand{\powerset}[1]{\mathcal{P}(#1)} \def\domain{\operatorname{dom}} \def\codomain{\operatorname{cod}} \def\funimage{\operatorname{img}} \def\defeq{\coloneqq} \def\eqdef{\eqqcolon} \def\syndef{\Coloneqq} \def\simp{\Rightarrow} \newcommand{\edist}{\mathit{edist}} \newcommand{\rd}{\text{x}d} \newcommand{\rs}{\text{x}s} \newcommand{\rt}{\text{x}t} \newcommand{\lo}{\mathit{lo}} \newcommand{\hi}{\mathit{hi}} \newcommand{\pc}{\text{pc}} \newcommand{\shamt}{\mathit{n}} \newcommand{\offset}{i} \newcommand{\marke}{\mathit{label}} \newcommand{\addr}{\mathit{addr}} \newcommand{\imm}{\mathit{imm}} \newcommand{\sext}{\mathrm{se}} \newcommand{\zext}{\mathrm{ze}} \newcommand{\sei}{\sext(i)} \newcommand{\zei}{\zext(i)} \newcommand{\bft}[2]{[#1:#2]} \newcommand{\ite}[3]{\mathop{\mathit{if}}\,{#1}\mathrel{\mathit{then}}{#2}\mathrel{\mathit{ else }}{#3}} \newcommand{\cbranch}{\mathit{cbr}} \newcommand{\Nat} {\mathbb{N}} \newcommand{\Int} {\mathbb{Z}} \newcommand{\Bit} {\mathbb{B}} \newcommand{\bbase} {B} \newcommand{\bnine} {A} \newcommand{\bneg} [1] {\overline{#1}} \newcommand{\band} {\mathrel{\&}} \newcommand{\bor} {\mathrel{|}} \newcommand{\bxor} {\mathrel{\hat{\phantom{+}}}} \newcommand{\badd} {\mathrel{+}} \newcommand{\blts} {\mathrel{\lt_s}} \newcommand{\bltu} {\mathrel{\lt_u}} \newcommand{\bges} {\mathrel{\ge_s}} \newcommand{\bgeu} {\mathrel{\ge_u}} \newcommand{\baddn} {\mathrel{+_n}} \newcommand{\bsubn} {\mathrel{-_n}} \newcommand{\bsll} {\mathrel{\ll}} \newcommand{\bsrl} {\mathrel{\gg}} \newcommand{\bsra} {\mathrel{\overset s\gg}} \newcommand{\bconcat} {\,} \newcommand{\bseqq} [2] {{#1}_{#2}\dots {#1}_0} \newcommand{\bseq} [2] {{#1}_{#2-1}\dots {#1}_0} \newcommand{\bsintp} [1] {[#1]} \newcommand{\buintp} [1] {\langle #1\rangle} \newcommand{\buintpb}[2] {\buintp{#1}_{#2}} \def\Var{\mathit{Var}} \def\Val{\mathit{Val}} \def\Addr{\mathit{Addr}} \def\Expr{\mathit{Expr}} \def\LExpr{\mathit{LExpr}} \def\ExprS{\mathit{ExprS}} \def\Stmt{\mathit{Stmt}} \def\Prg{\mathit{Prg}} \def\Conf{\mathit{Conf}} \def\Ty{\mathit{Ty}} \def\Assert{\mathit{Assert}} \newcommand{\TWhile}{\mathtt{While}} \newcommand{\TIf}{\mathtt{If}} \newcommand{\TBlock}{\mathtt{Block}} \newcommand{\TAssign}{\mathtt{Assign}} \newcommand{\TAbort}{\mathtt{Abort}} \newcommand{\TVar}{\mathtt{Var}} \newcommand{\TConst}{\mathtt{Const}} \newcommand{\TBinary}{\mathtt{Binary}} \newcommand{\TAddrOf}{\mathtt{Addr}} \newcommand{\TIndir}{\mathtt{Indir}} \newcommand{\TSeq}{\mathtt{Seq}} \newcommand{\TTerm}{\mathtt{Term}} \newcommand{\TCrash}{\mathtt{Crash}} \newcommand{\AIf}[3]{\TIf\left[#1,#2,#3\right]} \newcommand{\AWhile}[2]{\TWhile\left[#1,#2\right]} \newcommand{\ABlock}[2]{\TBlock_{#1}\left[#2\right]} \newcommand{\AAssign}[2]{\TAssign\left[#1,#2\right]} \newcommand{\AAbort}{\TAbort} \newcommand{\ASeq}[2]{\TSeq\left[#1,#2\right]} \newcommand{\ATerm}{\TTerm} \newcommand{\ACrash}{\TCrash} \newcommand{\AVar}[1]{\mathtt{Var}_{#1}} \newcommand{\ALVal}[1]{\mathtt{LVal}\left[#1\right]} \newcommand{\AConst}[1]{\TConst_{#1}} \newcommand{\ABinary}[3]{\TBinary_{#1}\left[#2,#3\right]} \newcommand{\AAddrOf}[1]{\TAddrOf\left[#1\right]} \newcommand{\AIndir}[1]{\TIndir\left[#1\right]} \newcommand{\CIf}[3]{\mathtt{if}\,(#1)\,#2\,\mathtt{else}\,#3} \newcommand{\CWhile}[2]{\mathtt{while}\,(#1)\,#2} \newcommand{\CWhileInv}[3]{\mathtt{while}\,(#1)\,\mathtt{\_Inv}(#3)\,#2} \newcommand{\CBlock}[1]{\{#1\}} \newcommand{\CAssign}[2]{#1=#2;} \newcommand{\CAbort}{\mathtt{abort();}} \newcommand{\CBinary}[3]{#2\mathop{#1}#3} \newcommand{\CAddrOf}[1]{\&#1} \newcommand{\CIndir}[1]{*#1} \newcommand{\CSeq}[2]{{#1}\ {#2}} \newcommand{\CTerm}{\varepsilon} \newcommand{\CCrash}{\Lightning} \newcommand{\CConcat}[2]{#1\mathop{+\!+}#2} \newcommand{\CAssert}[1]{\mathtt{assert}(#1)} \newcommand{\CAssume}[1]{\mathtt{assume}(#1)} \newcommand{\CPtr}[1]{#1*} \newcommand{\CAssnVV}[2]{\CAssign{\CVar #1}{\CVar #2}} \newcommand{\CAssnVC}[2]{\CAssign{\CVar #1}{\CConst #2}} \newcommand{\CSeqThree}[3]{\CSeq{#1}{\CSeq{#2}{#3}}} \newcommand{\CSeqFour}[4]{\CSeq{#1}{\CSeqThree{#2}{#3}{#4}}} \newcommand{\COp}[1]{\mathbin{\mathtt{#1}}} \newcommand{\CVar}[1]{\mathtt{#1}} \newcommand{\CConst}[1]{\mathtt{#1}} \newcommand{\CPrg}[1]{\texttt{#1}} \newcommand{\CInt}{\mathtt{int}} \newcommand{\CChar}{\mathtt{char}} \newcommand{\CVoid}{\mathtt{void}} \newcommand{\tptr}[1]{#1\texttt{*}} \def\pto{\rightharpoonup} \def\steprel{\rightarrow} \def\terminates{\mathrel{\downarrow}} \newcommand\config[2]{\langle#1\mid #2\rangle} \newcommand\configt[1]{\config{\CTerm}{#1}} \def\castrel{\leftrightarrow} \newcommand{\stmtjudge}[2]{{#1}\mathrel{\vdash}{#2}} \newcommand{\stmtGamma}[1]{\stmtjudge{\Gamma}{#1}} \newcommand{\typejudge}[3]{{#1}\mathrel{\vdash}{#2}\mathrel{:}{#3}} \newcommand{\typeGamma}[2]{\typejudge{\Gamma}{#1}{#2}} \newcommand{\codeE}[5]{\mathit{#1}\enspace #2\enspace #3\enspace #4 = #5} \newcommand{\codeR}[3]{\codeE{codeR}{\mathit{offs}}{#1}{#2}{#3}} \newcommand{\codeL}[3]{\codeE{codeL}{\mathit{offs}}{#1}{#2}{#3}} \newcommand{\codeSO}[3]{\mathit{codeS}\enspace #1\enspace #2 = #3} \newcommand{\codeS}[2]{\mathit{codeS}\enspace \mathit{offs}\enspace #1 = #2} \newcommand{\codePO}[3]{\mathit{codeP}\enspace #1\enspace #2 = #3} \newcommand{\codeP}[2]{\mathit{codeP}\enspace \mathit{offs}\enspace #1 = #2} \newcommand{\mreg}[1]{\mathtt{\$#1}} \newcommand{\lcc}{\mathbin{::}} \def\rsl{r_1\lcc rs} \def\rsls{r_1\lcc r_2\lcc rs} \def\rlist{r\lcc{rs}} \ifdefined\isLatex \def\adra{\bigtriangleup} \def\adrb{\pentagon} \def\adrc{\diamond} \def\adrd{\bigtriangledown} \def\adre{\bigstar} \def\adrf{\male} \def\adrg{\female} \let\oldLightning\Lightning \renewcommand{\Lightning}{\text{\oldLightning}} \newcommand{\denot}[1]{\llbracket {#1} \rrbracket} \else \def\adra{\unicode{x25b3}} \def\adrb{\unicode{x2b20}} \def\adrc{\unicode{x25c7}} \def\adrd{\unicode{x25bd}} \def\adre{\unicode{x2605}} \def\adrf{\unicode{x2642}} \def\adrg{\unicode{x2640}} \def\Lightning{\unicode{x21af}} \newcommand{\denot}[1]{[\![{#1}]\!]} \fi \newcommand{\denotL}[1]{L\denot{#1}} \newcommand{\denotR}[1]{R\denot{#1}} \newcommand{\indetval}{\mathord{?}} \def\adra{\bigtriangleup} \def\adrb{\bigcirc} \def\adrc{\lozenge} \def\adrd{\bigtriangledown} \def\adre{\bigstar} \def\adrf{\dagger} \def\adrg{\ddagger} \newcommand{\cleave}{\mathbin{\blacksquare}} \def\models{\vDash} \def\limp{\Rightarrow} \def\liff{\Leftrightarrow} \def\lneg{\neg} \def\bigstep{\mathrel{\Downarrow}} \def\ltrue{\mathit{true}} \def\lfalse{\mathit{false}} \def\exprdef{\operatorname{def}} \newcommand{\hpc}{\operatorname{pc}} \newcommand{\hvc}{\operatorname{vc}} \newcommand{\amodels}[1]{M\denot{#1}} \newcommand{\hoare}[3]{\{#1\}\mathrel{#2}\{#3\}} \newcommand{\hoaret}[3]{[#1]\mathrel{#2}[#3]} \newcommand{\passingt}[1]{{#1}_\checkmark} \newcommand{\failingt}[1]{{#1}_\times} \newcommand{\passing}{\passingt\sigma} \newcommand{\failing}{\failingt\sigma} \newcommand{\slabel}[1]{\mathit{lab}} \newcommand{\subtyperel} {\mathrel{\le}} \newcommand{\prgltr}{s} \newcommand{\lt}{<} \newcommand{\gt}{>} \newcommand{\amp}{&} \definecolor{fillinmathshade}{gray}{0.9} \newcommand{\fillinmath}[1]{\mathchoice{\colorbox{fillinmathshade}{$\displaystyle \phantom{\,#1\,}$}}{\colorbox{fillinmathshade}{$\textstyle \phantom{\,#1\,}$}}{\colorbox{fillinmathshade}{$\scriptstyle \phantom{\,#1\,}$}}{\colorbox{fillinmathshade}{$\scriptscriptstyle\phantom{\,#1\,}$}}} \)

Chapter 2 Machine Code

We start with a very simple imperative programming language, the machine code for the RISC-V instruction set architecture. An instruction set architecture is the definition of a machine code language. There may be various different implementations (i.e. different processors) that all implement the same ISA. For example, all modern ARM processors from different vendors implement the AArch64 ISA.

Machine code is very primitive, does not contain a lot of concepts, and is very close to the machine. By learning machine code, we get an understanding of the lowest abstraction in programming, the so-called hardware-software interface, which is essentially how software “talks” to hardware. Understanding what happens at the machine code level is relevant in many situations: For example, many software security problems exploit the fact that some programming languages cannot really abstract the machine code level entirely which can be exploited to “make the program behave” in an unintended way. Another example is performance optimization: If code needs to run as fast as possible, engineers often look at the machine code generated from a compiler to understand possible performance bottlenecks. Sometimes performance-critical code is even written in machine code.

The abstractions that higher programming languages (such as C or Java) provide are of course helpful because they save us work. They relieve the programmer from having to think about low-level details and enable portable code. Portable means that a program can be run on different machines that have possibly different instruction set architectures. One way of enabling portability is using a compiler. A compiler is a program that translates a program written in one language (e.g. C) to a program in another language (such as machine code).

<Prev ^Top Next>