JVM Overview
JVM Components
The Java compiler translates Java source code to bytecode, which is then interpreted by the JVM on a host machine. The components involved in this process are as follows:
Resolver, Loader
Loads class files and sets up their internal memory. It can do so in two different ways: Eager Loading and Lazy Loading. In Eager Loading, a reference is resolved when the class containing it is first loaded. In Lazy Loading, a reference is resolved only when it is actually needed; e.g., a class used only inside an if statement is loaded only when the condition of the if statement evaluates to true.
Static initialization of a class is non-trivial because it can be triggered concurrently by many threads. It is typically done before the class is first used.
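The lazy behavior and the concurrency concern described above can be observed with a small experiment. The sketch below is only an illustration: the class names LazyInitDemo and Heavy are hypothetical, and it observes static initialization (which is easy to detect from Java code) rather than class loading itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyInitDemo {
    // Counts how many times Heavy's static initializer has run.
    static final AtomicInteger initCount = new AtomicInteger();

    // Hypothetical class whose static initializer leaves a visible trace.
    static class Heavy {
        static final int VALUE;
        static {
            initCount.incrementAndGet();
            VALUE = 42;
        }
    }

    // Heavy is referenced only inside the if statement.
    static int maybeUse(boolean use) {
        if (use) {
            return Heavy.VALUE; // the first access triggers static initialization
        }
        return -1;
    }

    // Lets several threads touch Heavy at once and reports how many times
    // the static initializer has run in total.
    static int initCountAfterConcurrentUse(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> maybeUse(true));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return initCount.get();
    }

    public static void main(String[] args) {
        maybeUse(false);
        System.out.println("initializer runs after maybeUse(false): " + initCount.get());
        System.out.println("initializer runs after 8 threads: " + initCountAfterConcurrentUse(8));
    }
}
```

When run from main, the first count should be 0 (the branch was never taken) and the second should be 1, since the JVM guarantees that a class is initialized at most once no matter how many threads race to use it.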
Bytecode Verification
Automatically verifies that the bytecode provided to the JVM satisfies certain security constraints. This is usually done right after the class is loaded, but before static initialization. JVM bytecode is strongly typed.
Minor Problem: Automated verification is undecidable. This means that the verifier may reject valid programs that actually do satisfy the constraints. The goal is to design a verifier that accepts as many valid programs as possible.
Bytecode Interpreter
A program inside the JVM that interprets the bytecodes in the class files generated by the Java compiler using a stack and local variable storage.
The JVM is a stack-based abstract machine. Bytecodes push and pop values on the stack. It also uses registers (the local variables), which are accessed by load and store instructions. For each method, the number of stack slots and registers is specified in the class file. Most JVM bytecodes are typed.
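To make the stack-and-registers model concrete, here is a minimal sketch of a stack machine. The opcodes are hypothetical and only loosely modeled on JVM instructions such as bipush, iload, istore, iadd and imul; this is not the real instruction set or the real interpreter loop.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyStackMachine {
    // Hypothetical opcodes, invented for this illustration.
    enum Op { PUSH, LOAD, STORE, ADD, MUL }

    static final class Insn {
        final Op op;
        final int arg; // constant for PUSH, slot index for LOAD/STORE, unused otherwise

        Insn(Op op, int arg) {
            this.op = op;
            this.arg = arg;
        }
    }

    // Executes the instructions in order and returns the value left on top of the stack.
    static int run(Insn[] code, int maxLocals) {
        Deque<Integer> stack = new ArrayDeque<>();
        int[] locals = new int[maxLocals]; // the "registers" of the method
        for (Insn insn : code) {
            switch (insn.op) {
                case PUSH:  stack.push(insn.arg); break;
                case LOAD:  stack.push(locals[insn.arg]); break;
                case STORE: locals[insn.arg] = stack.pop(); break;
                case ADD:   stack.push(stack.pop() + stack.pop()); break;
                case MUL:   stack.push(stack.pop() * stack.pop()); break;
            }
        }
        return stack.pop();
    }

    // Encodes: x = 2; y = 3; result = (x + y) * 4
    static Insn[] sample() {
        return new Insn[] {
            new Insn(Op.PUSH, 2), new Insn(Op.STORE, 0),
            new Insn(Op.PUSH, 3), new Insn(Op.STORE, 1),
            new Insn(Op.LOAD, 0), new Insn(Op.LOAD, 1), new Insn(Op.ADD, 0),
            new Insn(Op.PUSH, 4), new Insn(Op.MUL, 0)
        };
    }

    public static void main(String[] args) {
        System.out.println(run(sample(), 2));
    }
}
```

The maxLocals parameter plays the role of the per-method register count that the class file records; a real interpreter would also preallocate the stack from the recorded stack depth instead of growing it dynamically.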
The bytecode interpreter is typically slow because it is constantly pushing and popping values from a stack. One can speed up the interpreter, but in practice the parts of the code that are frequently executed simply get compiled to native code by the Just-In-Time (JIT) compiler.
Just-In-Time Compiler (JIT)
Compiles the bytecode to machine code on demand, especially when a method is frequently executed (a hot method). The JIT makes bytecode execution fast.
Compilation of bytecode to machine code happens during program execution. The JIT typically needs profiling data to know which methods are hot and which are cold, and this data can be expensive to gather during execution.
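One simple way to picture the profiling step is an invocation counter per method with a threshold. The sketch below assumes exactly that; HotMethodCounter, its threshold and its method names are hypothetical, and real JVMs use considerably more elaborate heuristics than a single counter.

```java
import java.util.HashMap;
import java.util.Map;

public class HotMethodCounter {
    private final int threshold;
    private final Map<String, Integer> invocations = new HashMap<>();

    HotMethodCounter(int threshold) {
        this.threshold = threshold;
    }

    // Records one call of the named method. Returns true exactly once per method:
    // on the call where its count reaches the threshold, i.e. the point at which
    // a JIT could decide to compile it.
    boolean recordInvocation(String method) {
        int count = invocations.merge(method, 1, Integer::sum);
        return count == threshold;
    }

    // A method is considered hot once it has been called at least 'threshold' times.
    boolean isHot(String method) {
        return invocations.getOrDefault(method, 0) >= threshold;
    }

    public static void main(String[] args) {
        HotMethodCounter profiler = new HotMethodCounter(3);
        for (int i = 1; i <= 4; i++) {
            boolean justBecameHot = profiler.recordInvocation("compute");
            System.out.println("call " + i + ": becomes hot now = " + justBecameHot);
        }
        System.out.println("cold method is hot: " + profiler.isHot("rarelyUsed"));
    }
}
```

Even this toy version shows the cost mentioned above: every call pays for a map update, which is why gathering profiling data during execution is not free. The sketch is also not thread-safe, which a real profiler would have to address.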
Memory Allocators
Consists of algorithms, often concurrent, which are invoked when a Java program allocates memory.
Object allocation in Java invokes the JVM memory allocator. The JVM memory allocator often has to ask the underlying OS for memory, which it then manages internally. The allocator algorithms typically have to be concurrent because multiple Java threads can allocate memory. Otherwise, if sequential, one may see major pause times in their application.
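As an illustration of a concurrent allocator, the following sketch hands out offsets from a fixed-size region by atomically bumping a pointer. This is an assumed, simplified design (the name BumpAllocator and its API are hypothetical); it does not describe how any particular JVM allocates, and it never frees memory.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BumpAllocator {
    private final int capacity;                           // size of the region obtained "from the OS"
    private final AtomicInteger top = new AtomicInteger(); // next free offset

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Returns the start offset of a block of 'size' bytes, or -1 if the region is full.
    // Safe to call from many threads: a thread that loses the race simply retries.
    int allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        while (true) {
            int start = top.get();
            if (size > capacity - start) {
                return -1; // not enough space left
            }
            if (top.compareAndSet(start, start + size)) {
                return start;
            }
        }
    }

    public static void main(String[] args) {
        BumpAllocator region = new BumpAllocator(64);
        System.out.println(region.allocate(16));
        System.out.println(region.allocate(16));
        System.out.println(region.allocate(1000));
    }
}
```

Because the only shared state is one atomic counter, no thread ever blocks while another allocates, which is the property that avoids the pause times mentioned above.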
Garbage Collector (GC)
JVM uses many different GC algorithms, often concurrent and parallel, invoked periodically to collect memory unreachable by your program.
The GC frees the programmer from having to free memory manually, which is good as it avoids tricky bugs.
There are many different GC algorithms: generational, concurrent, parallel, mark and sweep, and others. The trade-off is that they target different performance goals. Concurrent GC algorithms are very difficult to get correct.
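Of the algorithms listed, mark and sweep is the easiest to sketch. The example below simulates it on a toy heap of hypothetical Obj nodes; it is sequential and stop-the-world, and is meant only to show the idea of reachability, not how a production collector is built.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MarkSweepDemo {
    // A toy heap object: a name, outgoing references, and a mark bit.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> heap = new ArrayList<>();  // every allocated object
    final List<Obj> roots = new ArrayList<>(); // e.g. locals and static fields

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // Mark phase: flag everything reachable from o. Recursion is fine for a
    // small demo; a deep object graph would need an explicit worklist.
    private void mark(Obj o) {
        if (o.marked) {
            return;
        }
        o.marked = true;
        for (Obj ref : o.refs) {
            mark(ref);
        }
    }

    // Marks from the roots, then sweeps unmarked objects off the heap.
    // Returns the number of objects reclaimed.
    int collect() {
        for (Obj root : roots) {
            mark(root);
        }
        int freed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false; // reset for the next collection
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    // a -> b is reachable from a root; c <-> d is an unreachable cycle.
    static int demo() {
        MarkSweepDemo gc = new MarkSweepDemo();
        Obj a = gc.allocate("a");
        Obj b = gc.allocate("b");
        Obj c = gc.allocate("c");
        Obj d = gc.allocate("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        gc.roots.add(a);
        return gc.collect();
    }

    public static void main(String[] args) {
        System.out.println("objects reclaimed: " + demo());
    }
}
```

Note that the cycle between c and d is reclaimed even though each object is still referenced by the other: what matters is reachability from the roots, not whether references to an object exist.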
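The Object.finalize() method may be called on an object once the GC has determined that it is unreachable, before its memory is reclaimed. There is no guarantee about when this happens, and the method has been deprecated since Java 9.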
Native interface
When a Java program calls a native method, one has to convert the JVM parameters (e.g., what is on the stack) into machine registers following the calling convention.
As the JVM interpreter is executing the Java program, at some point it may have to call a native method: for example, some code written in C. To do so, the JVM has to pass the parameters to the native method in a particular way so as to interact with binaries on the particular platform. This is not a large module, but it can be tricky to get the types correct.
java.lang.Object contains several native methods (e.g., hashCode() and getClass()), as do other core classes such as java.lang.Thread (e.g., for starting a thread). For these the JVM provides an internal implementation.
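Which methods of a core class are native can be checked from Java itself via reflection. The sketch below uses only standard reflection calls; the class name NativeMethodLister is hypothetical, and the exact set of names printed depends on the JDK version and vendor (the comments assume an OpenJDK-based runtime).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

public class NativeMethodLister {
    // Returns the sorted names of all methods declared directly by 'type'
    // that carry the 'native' modifier.
    static Set<String> nativeMethodNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // On OpenJDK-based runtimes this list includes hashCode and getClass.
        System.out.println(nativeMethodNames(Object.class));
    }
}
```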
Portability Layer
When the JVM is to run on top of Windows vs. Linux or x86 vs. ARM, the JVM designer must implement a small number of JVM constructs (e.g. synchronization, threading) using the primitives of the underlying operating system and architecture.
Example: Java provides its own notion of thread. However, the operating system has a different notion of what a thread is. So this layer somehow needs to come up with a way to use the OS notion of a thread to provide the Java notion of a thread that the Java programmer expects.
Bytecode
Bytecode can be examined using javap -c ClassName. This is particularly helpful for understanding constructs such as synchronized and volatile.
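A small class to try this on is sketched below; SyncVolatileDemo and its members are hypothetical names chosen for the example. After compiling it, running javap -c -p SyncVolatileDemo should show monitorenter and monitorexit instructions around the body of the synchronized block. The volatile keyword, in contrast, does not appear as an instruction: it is recorded as a flag on the field, which javap -v -p displays.

```java
public class SyncVolatileDemo {
    private volatile boolean done = false; // stored as a field flag, not an instruction
    private final Object lock = new Object();
    private int count = 0;

    void increment() {
        synchronized (lock) { // compiles to monitorenter ... monitorexit
            count++;
        }
    }

    int count() {
        synchronized (lock) {
            return count;
        }
    }

    void finish() {
        done = true;
    }

    boolean isDone() {
        return done;
    }

    // Runs 'threads' workers that each increment the counter 'perThread' times
    // and returns the final count.
    static int runDemo(int threads, int perThread) {
        SyncVolatileDemo demo = new SyncVolatileDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        demo.finish();
        return demo.count();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000));
    }
}
```

It is also worth comparing the output for a synchronized method with that of the synchronized block: the method form is marked with a flag on the method rather than with explicit monitor instructions.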