A Deep Dive into Compilers, Interpreters, Bytecode, and JIT Compilation
Introduction
How do programming languages actually work? When you write code in Python, JavaScript, C++, or Java, what happens between hitting “run” and seeing results? This guide explores the fundamental mechanisms that transform high-level code into instructions your computer can execute.
We’ll cover:
- The difference between compilers and interpreters
- What bytecode is and why it matters
- How Just-In-Time (JIT) compilation revolutionizes performance
- Real examples of machine code generation
- Optimization techniques that make modern languages fast
Compilers vs Interpreters: The Basics
Both compilers and interpreters translate code from high-level programming languages into instructions a computer can execute, but they approach this task very differently.
Compilers
Compilers translate your entire program into machine code before execution. The workflow looks like this:
- Write source code
- Run the compiler
- Get an executable file
- Run the executable directly on your CPU
When you compile code, several stages happen:
- Lexical analysis - breaks source code into tokens (keywords, operators, identifiers)
- Syntax analysis - checks if tokens form valid statements according to grammar rules
- Semantic analysis - verifies the code makes logical sense (type checking, variable declarations)
- Optimization - improves code efficiency without changing behavior
- Code generation - produces machine code or intermediate code
The result is a standalone executable. Languages like C, C++, Rust, and Go are typically compiled.
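You can watch the first two of these stages happen using nothing but Python's standard library; a minimal sketch using the tokenize and ast modules (the toy source string is invented for illustration):

import ast
import io
import tokenize

source = "x = 1 + 2"

# Lexical analysis: break the source into tokens
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tok.type, repr(tok.string))

# Syntax analysis: build an abstract syntax tree from the tokens
tree = ast.parse(source)
print(ast.dump(tree))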
Interpreters
Interpreters translate and execute your code line-by-line (or statement-by-statement) at runtime. There’s no separate compilation step—the interpreter reads your source code directly and executes it.
Interpreters typically follow these steps (sketched in Python after this list):
- Read the next statement
- Parse and analyze it
- Execute it immediately
- Move to the next statement
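Here is that loop as a tiny Python sketch; the three-statement program is invented for illustration, and compile/exec stand in for the parse and execute phases:

program = [
    "x = 2",
    "y = x * 3",
    "print(y)",
]

env = {}
for statement in program:                       # read the next statement
    code = compile(statement, "<toy>", "exec")  # parse and analyze it
    exec(code, env)                             # execute it immediately
# Output: 6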
Traditional interpreted languages include Python, Ruby, and older JavaScript implementations (though modern implementations are more complex).
Trade-offs
Compiled languages generally run faster because the translation work is done once upfront, and optimizations can be applied. Compilers also catch many errors before the program ever runs. However, you need to recompile after every code change.
Interpreted languages are more flexible and portable—the same source code runs on any platform with the interpreter. They’re great for rapid development and debugging, but typically run slower because translation happens during execution.
The Reality: It’s Complicated
Most modern languages use hybrid approaches. Java compiles to bytecode (intermediate representation) which the JVM then interprets or JIT-compiles. Python compiles to bytecode that’s interpreted. JavaScript engines use JIT compilation. The line between compiled and interpreted has blurred significantly.
Concrete Examples
Compiled Language Example: C
// hello.c
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}

The compilation process:
# Compile the source code
gcc hello.c -o hello

# This creates an executable file 'hello'
# Now run it
./hello

What happens: gcc reads hello.c, translates it entirely to machine code, and produces an executable binary file. This binary contains raw CPU instructions specific to your architecture (x86, ARM, etc.). Once compiled, you can run ./hello directly without needing the compiler or source code anymore.
Interpreted Language Example: Python
# hello.py
print("Hello, World!")

Running it:

python hello.py

What happens: The Python interpreter reads hello.py, translates it line-by-line (actually to bytecode first, then executes), and outputs the result immediately. You always need the Python interpreter installed to run the script. No separate executable is created.
Side-by-Side Comparison: Factorial
C (compiled):
// factorial.c
#include <stdio.h>

int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}

int main() {
    printf("5! = %d\n", factorial(5));
    return 0;
}

Workflow:

gcc factorial.c -o factorial   # Compile (takes time)
./factorial                    # Run (very fast)
# Output: 5! = 120

Python (interpreted):
# factorial.py
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

print(f"5! = {factorial(5)}")

Workflow:

python factorial.py   # Interpret and run (slower execution)
# Output: 5! = 120

Hybrid Example: Java
Java demonstrates the middle ground:
// Hello.java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}

The process:

# Step 1: Compile to bytecode
javac Hello.java
# This creates Hello.class (bytecode, not machine code)

# Step 2: Run with the JVM
java Hello
# The JVM interprets or JIT-compiles the bytecode

Java compiles to an intermediate format (bytecode) that’s platform-independent, then the JVM interprets or compiles it to machine code at runtime.
Error Detection: Compile-time vs Runtime
C with compilation error:
// test.c
int main() {
    int x = "hello";  // Type error
    return 0;
}

gcc test.c -o test
# Compiler catches error BEFORE running:
# error: incompatible types when initializing 'int' using 'char *'

Python with runtime error:
# test.py
x = 5
print(x + "hello")  # Type error

python test.py
# Runs until it hits the error:
# TypeError: unsupported operand type(s) for +: 'int' and 'str'

The compiler catches the error before the program ever runs. The interpreter only finds it when that line executes.
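One nuance worth noting: Python is not purely line-by-line. It compiles a whole file to bytecode before running it, so syntax errors surface up front; only semantic errors like the type mismatch above wait until the offending line executes. A quick demonstration:

# A syntax error is caught at compile time, before any code runs
try:
    compile("def broken(:", "<test>", "exec")
except SyntaxError as e:
    print("Caught before execution:", e.msg)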
Performance Comparison
Here’s a loop that shows the speed difference:
C:
// loop.c
#include <stdio.h>

int main() {
    long sum = 0;
    for (long i = 0; i < 1000000000; i++) {
        sum += i;
    }
    printf("%ld\n", sum);
    return 0;
}

Compile and run:

gcc -O3 loop.c -o loop
time ./loop
# Typically runs in under 1 second

Python equivalent:
# loop.py
total = 0
for i in range(1000000000):
    total += i
print(total)

time python loop.py
# Typically takes 30-60+ seconds

The compiled C code is dramatically faster because the machine code is optimized and runs directly on the CPU, while Python interprets each iteration.
What Bytecode Looks Like
Bytecode is an intermediate representation between source code and machine code. Let’s explore what it actually looks like.
Python Bytecode
Python compiles your source code to bytecode before interpreting it. You can actually see this bytecode:
# example.py
def add(a, b):
    return a + b

result = add(5, 3)
print(result)

Viewing the bytecode:

import dis

def add(a, b):
    return a + b

dis.dis(add)

Output:
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE

What this means:
- Column 1 (2): Line number in source code
- Column 2 (0, 2, 4, 6): Byte offset in the bytecode
- Column 3: The instruction name (LOAD_FAST, BINARY_ADD, etc.)
- Column 4: Instruction argument (if any)
- Column 5: Human-readable interpretation
So a + b becomes:
- Load variable a onto the stack
- Load variable b onto the stack
- Pop both, add them, push result
- Return the value on top of stack
More Complex Example: Factorial
import dis

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

dis.dis(factorial)

Output:
  2           0 LOAD_FAST                0 (n)
              2 LOAD_CONST               1 (1)
              4 COMPARE_OP               1 (<=)
              6 POP_JUMP_IF_FALSE        6 (to 12)

  3           8 LOAD_CONST               1 (1)
             10 RETURN_VALUE

  4     >>   12 LOAD_FAST                0 (n)
             14 LOAD_GLOBAL              0 (factorial)
             16 LOAD_FAST                0 (n)
             18 LOAD_CONST               1 (1)
             20 BINARY_SUBTRACT
             22 CALL_FUNCTION            1
             24 BINARY_MULTIPLY
             26 RETURN_VALUE

You can see the if-statement becomes a comparison followed by a conditional jump instruction.
Java Bytecode
Java bytecode is more complex and closer to assembly language. Here’s a simple example:
public class Example {
    public int add(int a, int b) {
        return a + b;
    }
}

Compile it:

javac Example.java

View the bytecode:

javap -c Example.class

Output:
public int add(int, int);
  Code:
     0: iload_1    // Load first parameter (a)
     1: iload_2    // Load second parameter (b)
     2: iadd       // Integer add
     3: ireturn    // Return integer

Actual Bytecode (Hexadecimal)
The instructions above are human-readable disassembly. The actual bytecode is binary. Here’s what Python bytecode looks like in raw form:
import dis

def add(a, b):
    return a + b

# Get the actual bytecode bytes
print(add.__code__.co_code)

Output:

b'|\x00|\x01\x17\x00S\x00'

This is the raw bytecode! Each byte is an instruction or argument:
- | (0x7C): LOAD_FAST, followed by \x00: argument 0 (first local variable)
- | (0x7C): LOAD_FAST, followed by \x01: argument 1 (second local variable)
- \x17: BINARY_ADD, followed by \x00 (padding/argument)
- S (0x53): RETURN_VALUE, followed by \x00 (padding/argument)
Java Bytecode in Hex
Using a hex editor on Example.class, you’d see something like:
cafe babe 0000 0037 ...

- cafe babe: The 0xCAFEBABE magic number identifying Java class files
- Followed by version info, constant pool, and bytecode
The actual add method bytecode in hex:
1a 1b 60 ac

- 1a = iload_1
- 1b = iload_2
- 60 = iadd
- ac = ireturn
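You can check the magic number yourself with a few lines of Python (this assumes the Example.class produced earlier is in the current directory):

# Read the first four bytes of the compiled class file
with open("Example.class", "rb") as f:
    magic = f.read(4)

print(magic.hex())  # cafebabe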
Why Bytecode?
Bytecode sits between source code and machine code:
Source code → def add(a, b): return a + b
Bytecode → LOAD_FAST 0, LOAD_FAST 1, BINARY_ADD, RETURN_VALUE
Machine code (x86) → mov eax, [ebp+8]; add eax, [ebp+12]; ret
Benefits:
- Platform independence: Same bytecode runs on any platform with the VM/interpreter
- Faster than interpreting source: Parsing is already done (see the timing sketch after this list)
- Smaller than machine code: More compact representation
- Security: Can verify bytecode before execution
- Optimization: JIT compilers can optimize bytecode to machine code at runtime
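The “faster than interpreting source” benefit is easy to measure in Python: compile an expression once, then compare re-parsing it on every call against reusing the code object (the expression is invented for illustration, and exact timings will vary by machine):

import timeit

source = "sum(i * i for i in range(100))"
code = compile(source, "<expr>", "eval")  # parse and compile once

reparse = timeit.timeit(lambda: eval(source), number=10_000)
precompiled = timeit.timeit(lambda: eval(code), number=10_000)

print(f"re-parsing every call:  {reparse:.3f}s")
print(f"pre-compiled bytecode: {precompiled:.3f}s")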
Stack-Based vs Register-Based
Most bytecode is stack-based (Python, Java):
LOAD_FAST a // Push a onto stack: [a]
LOAD_FAST b // Push b onto stack: [a, b]
BINARY_ADD // Pop two, add, push result: [a+b]
RETURN_VALUE   // Return top of stack
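To make the stack discipline concrete, here is a toy stack machine in Python. The opcode names mirror CPython's, but the dispatch loop is purely illustrative:

# A toy stack-based interpreter (illustrative, not CPython's real loop)
def run(instructions, local_vars):
    stack = []
    for op, arg in instructions:
        if op == "LOAD_FAST":
            stack.append(local_vars[arg])     # push a local onto the stack
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()   # pop the two operands
            stack.append(a + b)               # push the result
        elif op == "RETURN_VALUE":
            return stack.pop()                # return top of stack

program = [("LOAD_FAST", "a"), ("LOAD_FAST", "b"),
           ("BINARY_ADD", None), ("RETURN_VALUE", None)]
print(run(program, {"a": 5, "b": 3}))  # 8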
Some use registers (like Android’s Dalvik):

add-int v0, v1, v2   // v0 = v1 + v2
return v0
Viewing Python’s .pyc Files
When you run Python code, it creates .pyc files with compiled bytecode:
python -m compileall example.py
# Creates __pycache__/example.cpython-*.pyc

You can decompile these:

import dis
import marshal

with open('__pycache__/example.cpython-311.pyc', 'rb') as f:
    f.read(16)  # Skip the 16-byte header
    code = marshal.load(f)
    dis.dis(code)

Just-In-Time (JIT) Compilation
JIT compilation is one of the most clever optimizations in modern computing—a hybrid approach that combines the best of both interpreted and compiled code.
The Core Idea
Instead of interpreting bytecode every time (slow) or compiling everything upfront (slow startup), JIT compilers watch your program run and compile the “hot” parts (frequently executed code) to native machine code during execution.
How Traditional Interpretation Works
def calculate(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Call it 1000 times
for _ in range(1000):
    calculate(10000)

With a pure interpreter: Every single loop iteration, every addition, every multiplication gets interpreted from bytecode each time through. The interpreter reads bytecode, decodes what to do, executes it, repeat. This happens 1000 × 10000 = 10 million times!
How JIT Works
Step 1: Start by interpreting
First call to calculate(10000):
- Interpreter executes bytecode
- JIT compiler watches and counts

Step 2: Detect hot code

After 10-100 calls:
- JIT notices: "Hey, calculate() is called a lot!"
- Marks it as a "hot spot"

Step 3: Compile to machine code

- JIT compiles calculate() to native x86/ARM instructions
- Replaces bytecode with pointer to compiled code

Step 4: Execute compiled version

Subsequent calls (11-1000):
- Jump directly to machine code
- Runs at full CPU speed
- No more interpretation overhead
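As a loose analogy (a real JIT emits machine code; this sketch only swaps Python functions), here is the hot-spot pattern in miniature. The threshold and the closed-form "compiled" replacement are invented for illustration:

HOT_THRESHOLD = 100  # hypothetical trigger point

def hot_swap(slow_fn, fast_fn):
    """Run slow_fn until it gets hot, then dispatch to fast_fn."""
    calls = 0
    def wrapper(n):
        nonlocal calls
        calls += 1
        if calls > HOT_THRESHOLD:
            return fast_fn(n)  # the "compiled" fast path
        return slow_fn(n)      # the "interpreted" slow path
    return wrapper

def calculate(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Closed form of 0^2 + 1^2 + ... + (n-1)^2 = (n-1)n(2n-1)/6
calculate = hot_swap(calculate, lambda n: (n - 1) * n * (2 * n - 1) // 6)

for _ in range(1000):
    calculate(10000)  # first 100 calls take the slow path, the rest are fast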
Real Example: JavaScript V8 Engine

V8 (used in Chrome and Node.js) has a sophisticated JIT pipeline:
function sum(arr) {
    let total = 0;
    for (let i = 0; i < arr.length; i++) {
        total += arr[i];
    }
    return total;
}

// First few calls
sum([1, 2, 3]);  // Interpreted
sum([4, 5, 6]);  // Still interpreted, profiling...
sum([7, 8, 9]);  // JIT kicks in, compiles to machine code
V8’s multi-tier JIT:
- Ignition (interpreter): Executes bytecode initially
- Sparkplug (baseline JIT): Quick, simple compilation for warm code
- TurboFan (optimizing JIT): Aggressive optimization for very hot code
JIT Optimization Example
Here’s what JIT can do:
function add(a, b) {
    return a + b;
}

// Called with numbers
add(5, 3);
add(10, 20);
add(7, 14);

Initial bytecode interpretation:

LOAD a
LOAD b
CALL_GENERIC_ADD   // Could be number, string, object...
RETURN

After JIT profiling sees only numbers:

; Optimized machine code (x86)
mov eax, [a]   ; Load a into register
add eax, [b]   ; Add b directly
ret            ; Return

The JIT removed:
- Type checking (knows they’re numbers)
- Generic add operation (uses CPU ADD instruction)
- Unnecessary stack manipulation
Result: 10-100x faster for that specific case.
Deoptimization
JIT makes assumptions. If they’re violated, it must “deoptimize”:
function add(a, b) {
    return a + b;
}

// JIT optimizes for integers
add(5, 3);
add(10, 20);

// Oops! Now called with strings
add("hello", "world");

What happens:
- JIT’s optimized code expects integers
- String arrives
- JIT throws away compiled code
- Falls back to interpreter
- May recompile with different assumptions
This is why this code is slow:
function process(x) {
    return x * 2;
}

// Polymorphic: sometimes number, sometimes string
process(5);
process("hello");
process(10);
process("world");

JIT can’t optimize well because the types keep changing.
Java’s HotSpot JVM
Java has one of the most sophisticated JIT compilers:
public class Example {
    public static int fibonacci(int n) {
        if (n <= 1) return n;
        return fibonacci(n - 1) + fibonacci(n - 2);
    }

    public static void main(String[] args) {
        // Warm-up phase
        for (int i = 0; i < 10000; i++) {
            fibonacci(20);
        }

        // Now it's compiled and optimized
        long start = System.nanoTime();
        fibonacci(30);
        long end = System.nanoTime();
        System.out.println("Time: " + (end - start));
    }
}

HotSpot’s tiers (simplified):
- Tier 0: Interpreter (executes bytecode directly)
- Tiers 1-3: C1 compiler (client compiler): fast compilation, basic optimizations, gathers profiling data
- Tier 4: C2 compiler (server compiler): slow compilation, aggressive optimizations
For fibonacci():
- First few calls: Interpreted
- After ~2000 invocations: C1 compiles it
- After ~10000 invocations: C2 recompiles with advanced optimizations
Common JIT Optimizations
1. Inlining
Before:
int square(int x) {
    return x * x;
}

int sumOfSquares(int a, int b) {
    return square(a) + square(b);
}

After JIT inlining:

int sumOfSquares(int a, int b) {
    return (a * a) + (b * b);  // No function calls!
}

2. Dead Code Elimination
Before:
int compute(int x) {
    int unused = x * 5;  // Never used
    return x + 10;
}

After JIT:

int compute(int x) {
    return x + 10;  // Removed the unused calculation
}

3. Loop Unrolling
Before:
for (int i = 0; i < 4; i++) {
    array[i] = i;
}

After JIT:

array[0] = 0;
array[1] = 1;
array[2] = 2;
array[3] = 3;
// No loop overhead!

4. Escape Analysis
Before:
int createPoint() {
    Point p = new Point(10, 20);
    return p.x + p.y;
}

After JIT:

int createPoint() {
    // Point never escapes the function
    // JIT allocates on the stack, not the heap
    // Or eliminates the object entirely!
    return 10 + 20;
}

PyPy: Python with JIT
PyPy is a Python implementation with JIT compilation:
def calculate(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Run many times
for _ in range(1000):
    calculate(10000)

CPython (no JIT): ~5-10 seconds
PyPy (with JIT): ~0.1 seconds
PyPy traces execution, identifies loops, and compiles them to machine code.
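To reproduce the comparison yourself, here is a minimal benchmark script (assuming both interpreters are installed; the numbers above are ballpark and vary by machine):

# bench.py: run as "python3 bench.py" and then "pypy3 bench.py"
import time

def calculate(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.perf_counter()
for _ in range(1000):
    calculate(10000)
print(f"{time.perf_counter() - start:.2f} s")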
Tracing JIT
PyPy uses a “tracing JIT”:
total = 0
for i in range(1000000):
    total += i

What PyPy does:
- Starts interpreting
- Detects loop
- Records what happens in ONE iteration (a “trace”)
- Compiles that trace to machine code
- Executes compiled trace for remaining iterations
The trace might look like:
i = load(i_location)
total = load(total_location)
new_total = int_add(total, i)
store(total_location, new_total)
new_i = int_add(i, 1)
store(i_location, new_i)
if new_i < 1000000: jump_to_start

This gets compiled to tight assembly code.
Trade-offs
Advantages:
- Fast execution (near compiled speed)
- Platform independence (bytecode is portable)
- Can optimize based on actual runtime behavior
- Adaptive optimization (gets faster over time)
Disadvantages:
- Warm-up time (slow initial execution)
- Memory overhead (stores both bytecode and compiled code)
- Unpredictable performance (before/after JIT kicks in)
- Deoptimization can cause sudden slowdowns
Monitoring JIT Activity
Node.js/V8:
node --trace-opt script.js
# Shows when functions get optimized

node --trace-deopt script.js
# Shows when optimizations fail

Java:

java -XX:+PrintCompilation Example
# Shows JIT compilation events

PyPy:

PYPYLOG=jit:jit.log pypy script.py
# Logs JIT decisions

Real JIT-Generated Machine Code
Let’s look at actual machine code produced by JIT compilers.
Java HotSpot Example
Simple Java program:
// JitDemo.java
public class JitDemo {
    public static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Warm up the JIT
        for (int i = 0; i < 20000; i++) {
            add(i, i + 1);
        }

        // Now it should be compiled
        System.out.println(add(5, 3));
    }
}

Compile and run with JIT logging (printing assembly requires the hsdis disassembler plugin):

javac JitDemo.java
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly JitDemo

Output (simplified x86-64 assembly):
# Before JIT (interpreted):
# The JVM is interpreting bytecode, lots of overhead
# After JIT compilation of add():
0x00007f8b2d000020: mov %eax,%eax ; Clear upper bits
0x00007f8b2d000022: add %edx,%eax ; eax = eax + edx (a + b)
0x00007f8b2d000024: ret ; Return
# That's it! Just 3 instructions for a + b

Compare this to the original bytecode:
0: iload_0   ; Load parameter a
1: iload_1   ; Load parameter b
2: iadd      ; Add them
3: ireturn   ; Return result

The bytecode needs the interpreter to decode each instruction. The JIT-compiled version is direct CPU instructions.
Node.js V8 Example
// jit-demo.js
function multiply(a, b) {
    return a * b;
}

// Warm up
for (let i = 0; i < 100000; i++) {
    multiply(i, i + 1);
}

// Use it
console.log(multiply(5, 3));

Run with optimization tracing:

node --trace-opt --trace-deopt jit-demo.js

Output:
[marking 0x... <JS Function multiply> for optimization]
[compiling method 0x... <JS Function multiply> using TurboFan]
[optimizing 0x... <JS Function multiply> - took 0.123 ms]

To see the actual machine code:

node --print-opt-code jit-demo.js

Simplified generated code:
; TurboFan optimized code for multiply
movq rax, [rbp-0x18] ; Load a
movq rbx, [rbp-0x20] ; Load b
imul rax, rbx ; rax = rax * rbx
ret                  ; Return

Advanced JIT Optimization Techniques
Type Specialization
This is one of the most powerful JIT optimizations.
function process(x) {
    return x + x;
}

// Scenario: Always called with numbers
for (let i = 0; i < 10000; i++) {
    process(42);
}

Without type specialization (interpreter):
// Pseudocode of what the interpreter does
function process(x) {
    // Check: is x a number? string? object?
    let type = typeof x;

    if (type === 'number') {
        return x + x;  // Numeric addition
    } else if (type === 'string') {
        return x + x;  // String concatenation
    } else if (type === 'object') {
        return x.valueOf() + x.valueOf();  // Call valueOf
    }
    // ... more type checks
}

With JIT type specialization:
; After profiling shows x is ALWAYS a number
; Generated machine code:
movsd xmm0, [x] ; Load x (as float)
addsd xmm0, xmm0 ; Add x + x (double precision)
ret ; Return
; NO type checks!
; NO branches!
; Just one add instruction!

The Impact of Polymorphism
function add(a, b) {
    return a + b;
}

// Case 1: Monomorphic (one type)
console.time('monomorphic');
for (let i = 0; i < 10000000; i++) {
    add(i, i + 1);  // Always integers
}
console.timeEnd('monomorphic');

// Case 2: Polymorphic (multiple types)
console.time('polymorphic');
for (let i = 0; i < 10000000; i++) {
    if (i % 2 === 0) {
        add(i, i + 1);   // Sometimes integers
    } else {
        add("a", "b");   // Sometimes strings
    }
}
console.timeEnd('polymorphic');

Results (approximate):
monomorphic: ~50ms
polymorphic: ~400ms

Why the difference?
Monomorphic JIT code:
; Fast path - knows both are integers
mov eax, [a]
add eax, [b]
ret

Polymorphic JIT code:
; Must handle multiple types
mov rax, [a]
test rax, 1 ; Check if integer (SMI)
jne string_case ; Jump if not integer
integer_case:
mov ebx, [b]
add eax, ebx
ret
string_case:
; Complex string concatenation code
call string_concat
ret

More branching, more type checks, slower execution.
Hidden Classes / Shapes
V8 and other modern JIT compilers use “hidden classes” (also called “shapes” or “maps”) to optimize object property access.
Without optimization:
function Point(x, y) {
    this.x = x;
    this.y = y;
}

function getX(point) {
    return point.x;
}

Naive property access requires a hash table lookup every time, which is slow.
With hidden classes:
let p1 = new Point(10, 20);
let p2 = new Point(30, 40);

V8 creates a “hidden class” (internal structure):
HiddenClass_Point {
property_map: {
'x': offset 0,
'y': offset 8
}
}

Now getX can be optimized:
; JIT-compiled getX
mov rax, [point] ; Load point object
cmp [rax], HiddenClass_Point ; Check hidden class
jne deopt ; Deoptimize if wrong shape
mov rax, [rax + 0] ; Direct memory access at offset 0!
ret
deopt:
call slow_path   ; Fall back if object has a different shape

Direct memory access! No hash table lookup.
Breaking hidden classes:
This code breaks the optimization:
function Point(x, y) {
    this.x = x;
    this.y = y;
}

let p1 = new Point(10, 20);
let p2 = new Point(30, 40);

// This breaks it!
p2.z = 50;  // p2 now has a different shape than p1
Now you have two hidden classes and the JIT can’t assume a single shape anymore!
Better approach:
function Point(x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z || 0;  // Always initialize all properties
}

// All Points have the same shape
let p1 = new Point(10, 20);
let p2 = new Point(30, 40, 50);

Inline Caching
Related to hidden classes, inline caching is how JIT optimizes property access:
function getX(obj) {
    return obj.x;
}

getX({x: 1, y: 2});
getX({x: 3, y: 4});
getX({x: 5, y: 6});

Evolution of the JIT code:
First call (uninitialized):
call generic_property_lookup ; Slow path
; Records: "obj had HiddenClass_A, x at offset 0"

Second call (monomorphic):
cmp [obj_class], HiddenClass_A
jne slow_path
mov rax, [obj + 0] ; Fast: direct access
ret

If called with a different shape:
; Now polymorphic (2-4 different shapes)
cmp [obj_class], HiddenClass_A
je load_offset_0
cmp [obj_class], HiddenClass_B
je load_offset_0
cmp [obj_class], HiddenClass_C
je load_offset_8
jmp megamorphic_slow_path

If too many shapes (>4):
; Megamorphic - give up on inline cache
call generic_property_lookup   ; Back to the slow path

Bounds Check Elimination
Arrays are heavily optimized:
function sumArray(arr) {
    let sum = 0;
    for (let i = 0; i < arr.length; i++) {
        sum += arr[i];
    }
    return sum;
}

Without optimization:

Every array access includes a bounds check: the engine must verify that i is within the array before loading the element.
With JIT optimization:
; Loop header
mov ecx, [arr.length]
xor eax, eax ; sum = 0
xor ebx, ebx ; i = 0
loop:
cmp ebx, ecx ; i < length?
jge done
; JIT knows: if i < length, arr[i] is safe!
; NO bounds check here!
mov edx, [arr + ebx*8]
add eax, edx
inc ebx
jmp loop
done:
ret

The JIT proved the bounds check in the loop condition is sufficient.
Loop Invariant Code Motion
function process(arr, multiplier) {
    for (let i = 0; i < arr.length; i++) {
        arr[i] = arr[i] * (multiplier * 2);
    }
}

Unoptimized:
Computes multiplier * 2 every iteration!
JIT optimized:
1let temp = multiplier * 2; // Moved outside loop!
2for (let i = 0; i < arr.length; i++) {
3 arr[i] = arr[i] * temp;
4}Machine code:
mov edx, [multiplier]
shl edx, 1 ; edx = multiplier * 2 (ONCE!)
loop:
mov eax, [arr + i]
imul eax, edx ; Use precomputed value
mov [arr + i], eax
inc i
jmp loop

Escape Analysis
This is super clever:
function createPoint() {
    let p = {x: 10, y: 20};
    return p.x + p.y;
}

Naive compilation:
call allocate_object ; Allocate on heap
mov [obj.x], 10
mov [obj.y], 20
mov eax, [obj.x]
add eax, [obj.y]
call garbage_collect ; Later...
ret

With escape analysis:
; Object never escapes function!
; JIT eliminates it entirely:
mov eax, 10
add eax, 20 ; Just compute 10 + 20 = 30
ret
; NO allocation!
; NO garbage collection!

The JIT proved the object doesn’t escape the function, so it completely eliminated it.
On-Stack Replacement (OSR)
What if a function is already running when JIT decides to optimize it?
function longRunning() {
    let sum = 0;
    for (let i = 0; i < 10000000; i++) {  // Long loop
        sum += i;
        // After ~1000 iterations, JIT compiles this
    }
    return sum;
}

On-Stack Replacement:
Iteration 1-1000: Interpreted
Iteration 1001: JIT says "I've compiled optimized version!"
Switch to compiled code MID-EXECUTION
Iteration 1001-10000000: Native compiled code

The JIT replaces the interpreted stack frame with a compiled one while the function is running!
Complete Example: Multiple Optimizations
Here’s a complete example showing multiple optimizations working together:
class Point {
    constructor(x, y) {
        this.x = x;  // Hidden class optimization
        this.y = y;
    }

    distance() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }
}

function processPoints(points) {
    let totalDistance = 0;
    for (let i = 0; i < points.length; i++) {  // Bounds check elimination
        totalDistance += points[i].distance();  // Inlining
    }
    return totalDistance;
}

// Create many points with the same shape
let points = [];
for (let i = 0; i < 100000; i++) {
    points.push(new Point(i, i + 1));  // All same hidden class
}

// Warm up the JIT
for (let i = 0; i < 10; i++) {
    processPoints(points);
}

// Now fully optimized
console.time('optimized');
let result = processPoints(points);
console.timeEnd('optimized');

JIT optimizations applied:
- Type specialization (knows points are always Point objects)
- Hidden classes (all Points have same shape)
- Inline caching (fast property access to x, y)
- Function inlining (distance() inlined into loop)
- Bounds check elimination (loop proves i < length)
- Loop unrolling (might unroll inner calculations)
Conclusion
Modern programming languages use sophisticated techniques to transform high-level code into fast machine instructions:
- Compilers translate entire programs upfront, producing optimized executables
- Interpreters execute code directly, providing flexibility and portability
- Bytecode serves as a platform-independent intermediate representation
- JIT compilers combine the benefits of both approaches, starting with interpretation and dynamically compiling hot code paths to native machine code
The evolution from pure interpretation to JIT compilation represents decades of innovation in making high-level languages fast without sacrificing their ease of use. Understanding these mechanisms helps you write code that plays to each language’s strengths and avoid patterns that prevent optimization.
The key insight: Modern runtimes are incredibly sophisticated. They watch what your code actually does (not what it theoretically could do) and generate highly specialized machine code for those specific patterns. Write consistent, predictable code, and the JIT will reward you with performance that rivals hand-written C.