Java Bindings for Rust: A Comprehensive Guide
by Akil Mohideen, Natalya McKay, Santiago Martinez Sverko, and Seth Kaul Sponsored by Ethan McCue
This document assumes you have Rust 1.81.0 and Java 22 or later. If you have not installed Rust or Java, you can install Rust here and Java here.
Introduction
Welcome to Java Bindings for Rust: A Comprehensive Guide, a guide to using Rust from within Java. This process can be notoriously confusing, and the information on how to do it is dense and scattered across various sources. This guide teaches how to make these bindings in a digestible way. Every section was created to be as short and readable as possible without omitting important information. Complex details are kept in their own sections and linked to where they are applicable.
Purpose of the Manual
The purpose of this manual is to provide a comprehensive guide for creating Java bindings to Rust libraries using Java 22 and Rust 1.81.0. It will cover the essential steps and concepts required to allow Java applications to call Rust functions, making use of the Foreign Function and Memory API. By the end of this manual, developers will be able to seamlessly integrate Rust’s high-performance, memory-safe capabilities into their Java applications, enabling cross-language functionality.
Why Java 22?
Java 22 introduces the Foreign Function and Memory API (FFM API), a modern alternative to the legacy Java Native Interface (JNI). JNI was traditionally used to interact with C-like functions and data types in external libraries. However, JNI is cumbersome, error-prone, and introduces significant overhead due to repeated native function calls and lack of Just-In-Time (JIT) optimizations. Java objects needed to be passed through JNI, requiring additional work on the native side to identify object types and data locations, making the entire process tedious and slow. With the FFM API, Java now pushes much of the integration work to the Java side, eliminating the need for custom C headers and providing more visibility for the JIT compiler.
This change leads to Better Performance, as the JIT compiler can now optimize calls to native libraries more effectively. It also leads to Simplified Integration because there are fewer requirements on native function signatures. This reduces the overhead of native-to-Java translation. Additionally, the API provides Enhanced Flexibility, as it supports working with various languages like Rust while maintaining full control over how memory and function calls are handled.
Java 22 is the first version to stabilize this API, making it the ideal choice for this manual. It enables efficient, direct interaction with Rust libraries without the historical drawbacks of JNI.
How Java and Rust Work Together
Rust is a system-level language that provides fine-grained control over memory management, making it a popular choice for performance-critical applications. Java, on the other hand, excels in providing portability and high-level abstractions. By using the FFM API in Java 22, developers can leverage Rust’s performance and memory safety in Java applications.
It provides access to classes such as SymbolLookup, FunctionDescriptor, Linker, MethodHandle, Arena, and MemorySegment, which enable Java to call foreign functions and manage memory in more effective ways. On Rust's end, functions exposed to Java must adhere to the C ABI, ensuring compatibility between the two languages. The manual will explore how to allocate, manage, and release memory safely between Java and Rust, ensuring optimal performance and avoiding memory leaks or undefined behavior.
What This Manual Hopes to Accomplish
By the end, this manual will:
- Provide a Step-by-Step Guide: Developers will be walked through setting up bindings between Rust and Java, and configuring these bindings for projects.
- Demonstrate Practical Examples: Examples of properly designed bindings will be provided and explained. These examples will be provided for both easy and complex topics including exposing Rust functions, handling complex data types, managing lifetimes and memory, and handling multi-threading.
- Simplify Rust-Java Integration: The manual will demystify the integration process, helping developers avoid common pitfalls related to ownership, memory management, and data layout discrepancies.
- Address Advanced Topics: In addition to the basics, the manual will explore advanced topics such as thread safety, handling Rust’s ownership and borrowing rules in Java, and how to handle complex data structures and edge cases.
By following this guide, developers will gain a deep understanding of how to efficiently and safely call Rust libraries from Java, making full use of both Java 22’s FFM API and Rust’s robust performance and memory safety features.
Setting Up and Linking Rust and Java
In this section, we will explain how to set up and link both Rust and Java code, in order to create Java bindings for Rust libraries. This process involves exporting Rust functions in a way Java can access, and using Java's FFM API to dynamically link to Rust code. We will also cover how to work with FunctionDescriptor, Arena, MemoryLayout, and other key components necessary to ensure safe and efficient communication between Java and Rust.
How Rust and Java Communicate
To enable communication between Rust and Java, the Rust code is compiled into a shared library that Java can load and invoke using the FFM API.
This process requires separate files for Rust code and Java bindings, each with specific configurations and naming conventions. The file names and formats may vary depending on your operating system and Rust version.
- Rust File: Contains the functions to be exposed to Java. This file is typically named lib.rs and located in the src/ directory of a Rust project.
- Shared Library File: This is the compiled output of the Rust code. The file name and extension depend on your operating system.
  - Windows: FileName.dll
  - Linux: FileName.so
  - macOS: FileName.dylib
- Java File: Contains the Java code that binds to the Rust shared library. You can name it based on your application, e.g., RustBindings.java.
Setting Up Rust
Step 1: Exporting Rust Functions
To make Rust functions callable from Java, you need to:
- Prevent name mangling: Add #[no_mangle] to ensure Java can find the Rust function by its exact name.
- Use the C ABI: Add extern "C" so the function adheres to the C Application Binary Interface (C ABI), which Java can interact with.
Before Adding #[no_mangle] and extern "C"
Here’s a simple Rust function that adds two numbers. This version cannot yet be used by Java because the function name will be mangled and will not conform to the C ABI.
// Rust function: Adds two numbers
pub fn add_numbers(x: i32, y: i32) -> i32 {
x + y
}
After Adding #[no_mangle] and extern "C"
To make the function callable from Java, modify it as follows:
// Rust function: Adds two numbers
#[no_mangle]
pub extern "C" fn add_numbers(x: i32, y: i32) -> i32 {
x + y
}
Step 2: Modify Cargo.toml
Add the following to the Cargo.toml file to specify that the library should be compiled as a shared library:
[lib]
crate-type = ["cdylib"]
Step 3: Build the Shared Library
Run the following command to compile the Rust project into a shared library:
cargo build --release
Output: The shared library will be generated in the target/release/ directory.
- On Windows: FileName.dll
- On Linux: FileName.so
- On macOS: FileName.dylib
File Organization Example
Here’s how your project should look after building the shared library:
rust_project/
├── src/
│ └── lib.rs # Contains the Rust function
├── Cargo.toml # Rust project configuration
└── target/
└── release/
└── librust_lib.dylib # Shared library (for macOS)
Setting Up Java
Creating a MethodHandle for Rust Functions
Set up a MethodHandle: This will act as a function name wrapper in Java, allowing you to call Rust functions as if they were Java methods.
public class ClassName {
static MethodHandle yourMethodName;// Creates MethodHandle
// ...
}
Create a static block: This block initializes the native linker and loads the Rust library.
Step-by-Step: Inside the static Block
- Initialize the Native Linker
static{
Linker linker = Linker.nativeLinker(); // Initializes the native linker
// ...
}
The linker is used to bind Java code to native functions from the Rust library.
- Load the Rust Shared Library
SymbolLookup lib = SymbolLookup.libraryLookup("/* path/to/your/FileName.dylib */", Arena.global()); // Loads the Rust library
Replace "/* path/to/your/FileName.dylib */" with the actual path to the shared library (e.g., librust_lib.dylib on macOS). The Arena.global() call ensures global memory scope for the linked symbols.
- Link the Functions
yourMethodName = linker.downcallHandle(
lib.find("/* function_name */").orElseThrow(), // Replace with the Rust function name
FunctionDescriptor.of(
ValueLayout.JAVA_INT, // Match Java's return type to Rust's return type
ValueLayout.JAVA_INT, // Match Java's first parameter type to Rust's first parameter type
ValueLayout.JAVA_INT // Match Java's second parameter type to Rust's second parameter type
)
);
- Replace /* function_name */ with the name of your Rust function (e.g., add_numbers).
- Use FunctionDescriptor to map Java types to Rust types.
- Refer to the Value Layout appendix for a full table of Java and Rust type mappings.
Full Java Binding for Adding Two Numbers
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;
public class RustBindings {
static MethodHandle addNumbers; // Wrapper for the Rust function
static {
// Initialize the linker
Linker linker = Linker.nativeLinker();
// Load the Rust library
SymbolLookup lib = SymbolLookup.libraryLookup("/* path/to/your/FileName.dylib */", Arena.global());
// Link the Rust function
addNumbers = linker.downcallHandle(
lib.find("add_numbers").orElseThrow(), // Replace with the function name from Rust
FunctionDescriptor.of(
ValueLayout.JAVA_INT, // Rust's return type: i32
ValueLayout.JAVA_INT, // Rust's first parameter: i32
ValueLayout.JAVA_INT // Rust's second parameter: i32
)
);
}
}
Main
public class Main {
public static void main(String[] args) throws Throwable {
// Call the Rust function
int result = (int) RustBindings.addNumbers.invokeExact(10, 20);
System.out.println("Result: " + result); // Output should be 30
}
}
Make sure to add throws Throwable to the main function.
.invokeExact() is a powerful and strict method for calling functions using a MethodHandle. It ensures precise type matching, which is especially important for interoperability with external libraries (like Rust) where argument types are fixed. Always ensure that the types match exactly to avoid runtime exceptions.
Mapping Rust Features to Java
This section will cover how to properly account for features that are important to Rust. For each Rust concept, we will show how to identify the feature and handle it properly in Java.
By the end of this section, the steps to analyze a Rust function and determine what needs to be written in Java to bind it correctly will be clear. This includes handling ownership, memory layouts, and thread safety.
Handling Ownership and Borrowing
Identifying Ownership and Borrowing in Rust
Rust enforces strict ownership rules. When a function in Rust takes ownership of a value (e.g., Box, Vec), it means the caller no longer owns that value and cannot use it again unless ownership is returned. Borrowing (&T or &mut T) allows temporary access to a value without transferring ownership.
Pre-Modified Rust Function:
Here’s an example of two functions that demonstrate ownership and borrowing in Rust before modification for interoperability with Java:
fn take_ownership(v: Vec<i32>) -> Vec<i32> { // Takes ownership of v
    v
}

fn borrow(v: &Vec<i32>) -> i32 { // Borrows v temporarily
    v[0]
}
Explanation:
- take_ownership:
  - Transfers ownership of the Vec to the function.
  - The original owner can no longer use v unless ownership is explicitly returned.
- borrow:
  - Borrows the Vec temporarily to access its first element without transferring ownership.
  - The caller retains ownership.
Modified Rust Function
#[no_mangle]
pub extern "C" fn take_ownership(v: *mut Vec<i32>) -> *mut Vec<i32> { // Transfers ownership of the vector
    v
}

#[no_mangle]
pub extern "C" fn borrow(v: *mut Vec<i32>) -> i32 {
    unsafe { (*v)[0] } // Borrowing through a raw pointer
}
Explanation:
Check Chapter 2.2 (Setting Up Rust) for why #[no_mangle] and extern "C" are added.
Handling Ownership in Java
When Rust functions take ownership of values or borrow them, Java developers must manage memory explicitly to prevent leaks or invalid references.
What You Need to Do:
- For functions that take ownership: Call the appropriate Rust cleanup function (like drop or free) using a MethodHandle in Java, so the memory is released once the object is no longer needed.
- For borrowed references: Manage memory using an Arena to ensure that the memory remains valid for the borrowed duration.
Java Example (Handling Ownership):
// Create a Rust-owned Box and pass ownership
MemorySegment rustBox = (MemorySegment) RustBindings.createBox.invokeExact(10);
// Call Rust function to take ownership of the box
RustBindings.takeOwnership.invokeExact(rustBox);
// Manually free the Box when done
RustBindings.freeBox.invokeExact(rustBox); // Ensures no memory leaks
Explanation:
MemorySegment represents the Rust-allocated memory in Java. Java interacts with this using the Foreign Function & Memory API. When takeOwnership is called, the Rust function takes ownership of rustBox. Java then explicitly calls freeBox to release the memory allocated in Rust and prevent leaks.
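The Java snippet above uses three handles (createBox, takeOwnership, freeBox) whose Rust side is not shown in this guide. A minimal sketch of what those exported functions might look like, assuming the boxed value is simply an i32 (the names and payload type are illustrative, not part of the original example):
#[no_mangle]
pub extern "C" fn create_box(value: i32) -> *mut i32 {
    Box::into_raw(Box::new(value)) // Heap-allocate the value and hand the raw pointer to Java
}

#[no_mangle]
pub extern "C" fn take_ownership(ptr: *mut i32) {
    let value = unsafe { *ptr }; // This function now logically owns the data behind ptr
    let _ = value;
}

#[no_mangle]
pub extern "C" fn free_box(ptr: *mut i32) {
    if !ptr.is_null() {
        unsafe { drop(Box::from_raw(ptr)) }; // Rebuild the Box so Rust frees the allocation
    }
}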
Memory Layouts and Structs
Identifying Structs and Memory Layouts in Rust
When Rust returns complex data types like structs or arrays, Java needs to correctly interpret their memory layout. Rust’s struct fields are aligned in memory based on their type sizes, so Java must use StructLayout and ValueLayout to match the Rust memory layout exactly.
Example:
#[repr(C)] // Add this!
struct Point {
    x: i32,
    y: i32,
}
The #[repr(C)] attribute ensures that the memory layout of Point follows the C ABI, making it compatible with Java's FFM API.
Handling Structs in Java
Java uses StructLayout to define memory layouts that match Rust’s struct layouts. When dealing with Rust structs, it’s essential to ensure that the memory allocated on the Java side is properly aligned and of the correct size to match the layout of the Rust struct.
What You Need to Do:
- Use StructLayout to define the memory layout that mirrors the fields of the Rust struct.
- Allocate a MemorySegment that is large enough and properly aligned to hold the struct's data.
Java Example (Handling Structs):
// Define the memory layout of the Rust `Point` struct in Java
StructLayout pointLayout = MemoryLayout.structLayout(
ValueLayout.JAVA_INT.withName("x"), // Field `x` (i32 in Rust)
ValueLayout.JAVA_INT.withName("y") // Field `y` (i32 in Rust)
);
// Allocate memory for the struct
var arena = Arena.ofConfined(); // Confined Arena for memory management
MemorySegment pointSegment = arena.allocate(pointLayout);
// Set the fields of the Point struct
VarHandle xHandle = pointLayout.varHandle(PathElement.groupElement("x"));
VarHandle yHandle = pointLayout.varHandle(PathElement.groupElement("y"));
xHandle.set(pointSegment, 0, 10); // Set x to 10
yHandle.set(pointSegment, 0, 20); // Set y to 20
Explanation:
- StructLayout: Defines the layout of the Rust Point struct, where each field is aligned according to its type (in this case, both fields are i32, so each is 4 bytes).
- VarHandle: Used to access and set individual fields (x and y) in the memory segment allocated for the struct.
- MemorySegment: Represents the allocated memory for the struct, and Java can safely manipulate it according to the struct's layout.
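To hand the filled-in struct to Rust, the segment is passed as a pointer. The following sketch assumes a hypothetical Rust export pub extern "C" fn point_sum(p: *const Point) -> i32 that is not part of the example above, and that linker and lib are the Linker and SymbolLookup set up earlier:
// Bind the hypothetical point_sum(p: *const Point) -> i32 export
MethodHandle pointSum = linker.downcallHandle(
    lib.find("point_sum").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS)
);
// Pass the struct by pointer; pointSegment matches Point's #[repr(C)] layout
int sum = (int) pointSum.invokeExact(pointSegment); // 30 when x = 10 and y = 20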
Handling Thread Safety in Rust
Identifying Thread Safety in Rust
In Rust, thread safety is ensured using the Send and Sync traits. If a Rust function operates across multiple threads, the types used in the function must implement Send or Sync. For example, if a Rust function uses a Mutex or Arc to manage shared data, it is thread-safe.
Modified Rust Function:
use std::sync::{Arc, Mutex};

#[no_mangle]
pub extern "C" fn create_shared_data() -> *mut Arc<Mutex<i32>> {
    let shared_data = Arc::new(Mutex::new(42));
    Box::into_raw(Box::new(shared_data))
}
Read through Chapter 2.2 (Setting Up Rust) if you are confused about the modification.
The function returns a thread-safe Arc<Mutex<i32>>, which ensures that multiple threads can safely access and modify the shared data.
Ensuring Thread Safety in Java
When dealing with thread safety across languages, Java must ensure that memory is safely shared between threads. Java’s FFM API provides Shared Arenas, which allow memory to be safely accessed by multiple threads.
What to Do:
- Use Shared Arenas when shared memory or thread-safe operations are expected in Rust.
- Java also provides synchronization mechanisms like synchronized blocks to ensure thread safety.
Java Example (Handling Thread Safety):
// Create a shared arena for memory that must be visible to multiple threads
var sharedArena = Arena.ofShared();
MemorySegment sharedSegment = sharedArena.allocate(8); // Allocate space for shared memory

// Call the Rust function that creates the thread-safe shared data and returns a pointer to it
MemorySegment sharedData = (MemorySegment) RustBindings.createSharedData.invokeExact();

// Access shared data across threads (ensure proper synchronization in Java)
synchronized (sharedData) {
    // Safe access to shared memory here
}
Explanation:
- Shared Arena: Ensures that memory is safely shared across threads in Java when interacting with Rust's thread-safe types like Arc and Mutex.
- Synchronized Block: Ensures that only one thread accesses the shared memory at a time, mimicking Rust's ownership rules for shared data.
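As a small illustration of why the arena type matters, memory allocated in a shared arena may be touched from more than one Java thread, whereas a confined arena would throw. This is a plain-Java sketch with no Rust call involved, assuming the enclosing method declares throws InterruptedException:
Object lock = new Object();
try (var arena = Arena.ofShared()) {
    MemorySegment counter = arena.allocate(ValueLayout.JAVA_INT);
    Runnable increment = () -> {
        synchronized (lock) { // Java-side synchronization; on the Rust side a Mutex plays this role
            int current = counter.get(ValueLayout.JAVA_INT, 0);
            counter.set(ValueLayout.JAVA_INT, 0, current + 1);
        }
    };
    Thread first = new Thread(increment);
    Thread second = new Thread(increment);
    first.start();
    second.start();
    first.join();
    second.join();
    System.out.println(counter.get(ValueLayout.JAVA_INT, 0)); // Prints 2
}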
Handling Common Data Structures
This section will walk through how to handle common Rust data structures (like structs, arrays, and enums) in Java, explaining why each element is needed, how it functions, and what to watch out for. We’ll go through practical examples, showing how to declare, access, and clean up these data structures from Java.
Handling Rust Structs in Java
Rust Side
In Rust, a struct is a user-defined type that groups related values. Structs use specific memory layouts, which must match on the Java side. The layout of structs is especially crucial for cross-language bindings because memory misalignment can lead to undefined behavior.
Example Rust Struct:
#[repr(C)] // Ensures compatibility with C-style memory layout
struct Point {
    x: i32,
    y: i32,
}
Explanation: The #[repr(C)] attribute ensures that the struct is laid out in memory according to the C ABI, which is compatible with Java's FFM API.
Java Side
To use this struct in Java, we need to:
- Define a StructLayout that matches the Rust struct layout.
- Use VarHandles to access each struct field.
Example Java Code:
StructLayout pointLayout = MemoryLayout.structLayout(
ValueLayout.JAVA_INT.withName("x"), // Maps to Rust's i32 `x`
ValueLayout.JAVA_INT.withName("y") // Maps to Rust's i32 `y`
);
VarHandle xHandle = pointLayout.varHandle(PathElement.groupElement("x"));
VarHandle yHandle = pointLayout.varHandle(PathElement.groupElement("y"));
Explanation:
- ValueLayout.JAVA_INT: This matches Rust's i32 type.
- withName("x") and withName("y"): Naming each field lets us retrieve a VarHandle to read and write to specific fields of the MemorySegment that represents the Rust struct.
Allocating and Using the Struct
Allocate Memory: Use an arena to manage the memory allocation.
Access Fields: Access x and y using VarHandles.
var arena = Arena.ofConfined();
MemorySegment pointSegment = arena.allocate(pointLayout);
xHandle.set(pointSegment, 0, 10); // Set x = 10
yHandle.set(pointSegment, 0, 20); // Set y = 20
int x = (int) xHandle.get(pointSegment, 0); // Get x value
int y = (int) yHandle.get(pointSegment, 0); // Get y value
Explanation:
Arena Allocation: Using an arena (e.g., Arena.ofConfined()) ensures the struct's memory is safely managed.
Set and Get Values: VarHandle operations allow us to interact with Rust struct fields directly, facilitating cross-language data manipulation.
Handling Rust Arrays in Java
Rust Side
Arrays in Rust are fixed-size collections, and their size and layout must be precisely known for Java to interact with them effectively.
Example Rust Array:
#[no_mangle]
pub extern "C" fn create_array() -> *mut [i32; 5] {
    Box::into_raw(Box::new([1, 2, 3, 4, 5]))
}
Explanation: Box::into_raw creates a raw pointer, enabling Java to handle the array. Here, #[no_mangle] ensures the Rust function name remains unmangled, making it accessible from Java.
Java Side
To handle arrays from Rust in Java:
- Define a SequenceLayout for the array.
- Access elements via VarHandle.
SequenceLayout arrayLayout = MemoryLayout.sequenceLayout(5, ValueLayout.JAVA_INT);
VarHandle elementHandle = arrayLayout.varHandle(PathElement.sequenceElement());
Explanation:
SequenceLayout: This layout describes a fixed-size array (5 elements of i32).
VarHandle: Provides access to each element in the array.
Allocating and Accessing Elements
var arena = Arena.ofConfined();
MemorySegment arraySegment = arena.allocate(arrayLayout);
for (int i = 0; i < 5; i++) {
    int value = (int) elementHandle.get(arraySegment, 0L, (long) i); // (segment, base offset, element index)
    System.out.println("Array element " + i + ": " + value);
}
Explanation:
Memory Allocation: The array memory is managed within an arena, ensuring safety and easy cleanup.
Element Access: Each element is accessed via elementHandle, following Rust's array layout.
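The Rust create_array function shown earlier can be bound and read the same way. A sketch, assuming linker and symbolLookup were set up as in the earlier sections; note that the pointer Rust returns must be reinterpreted before Java will allow reads:
// Bind create_array() -> *mut [i32; 5]
MethodHandle createArray = linker.downcallHandle(
    symbolLookup.find("create_array").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.ADDRESS)
);
MemorySegment rustArray = (MemorySegment) createArray.invokeExact();
rustArray = rustArray.reinterpret(arrayLayout.byteSize()); // Returned pointers have length 0 until reinterpreted
for (int i = 0; i < 5; i++) {
    System.out.println(rustArray.getAtIndex(ValueLayout.JAVA_INT, i)); // Prints 1 through 5
}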
Handling Rust Vectors (Vec<T>) in Java
Rust Side
In Rust, a Vec<T> is a dynamically-sized array that includes metadata such as capacity and length. Working with vectors across FFI boundaries requires us to manage these fields carefully on both sides.
Example Rust Vector:
#[no_mangle]
extern "C" fn create_vector() -> *mut Vec<i32> {
    Box::into_raw(Box::new(vec![10, 20, 30]))
}

#[no_mangle]
extern "C" fn vector_push(vec: *mut Vec<i32>, value: i32) {
    unsafe {
        if let Some(vec) = vec.as_mut() {
            vec.push(value);
        }
    }
}

#[no_mangle]
extern "C" fn vector_get(vec: *const Vec<i32>, index: usize) -> i32 {
    unsafe {
        if let Some(vec) = vec.as_ref() {
            vec[index]
        } else {
            0 // Or some error handling
        }
    }
}

#[no_mangle]
extern "C" fn vector_len(vec: *const Vec<i32>) -> usize {
    unsafe {
        if let Some(vec) = vec.as_ref() {
            vec.len()
        } else {
            0
        }
    }
}
Explanation:
- create_vector: Initializes a Vec<i32> and returns a raw pointer to allow Java to manage the vector.
- vector_push: Provides functionality for adding elements to the vector, with error handling in case of null pointers.
- vector_get and vector_len: Fetch elements from the vector and get its length, making direct access possible from Java.
Java Side
To handle Vec<T> in Java:
- Define a StructLayout that represents the memory layout for a Rust vector (data pointer, length, and capacity).
- Use MethodHandles to call Rust functions to manipulate the vector.
Example Java Code:
// Define the layout for Vec<i32> (data pointer, capacity, and length)
StructLayout vecLayout = MemoryLayout.structLayout(
    ValueLayout.ADDRESS.withName("ptr"),   // Data pointer
    ValueLayout.JAVA_LONG.withName("cap"), // Capacity
    ValueLayout.JAVA_LONG.withName("len")  // Length
);
// MethodHandles to call Rust functions
MethodHandle createVector = linker.downcallHandle(
    symbolLookup.find("create_vector").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.ADDRESS)
);
MethodHandle vectorPush = linker.downcallHandle(
    symbolLookup.find("vector_push").orElseThrow(),
    FunctionDescriptor.ofVoid(ValueLayout.ADDRESS, ValueLayout.JAVA_INT)
);
MethodHandle vectorGet = linker.downcallHandle(
    symbolLookup.find("vector_get").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG)
);
MethodHandle vectorLen = linker.downcallHandle(
    symbolLookup.find("vector_len").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS)
);
Explanation:
- vecLayout: Defines the structure of the Vec<T> memory, including data pointer, length, and capacity.
- MethodHandles (createVector, vectorPush, vectorGet, vectorLen): Enable Java to interact with the vector's core functions.
Allocating and Using the Vector
// Obtain a pointer to a Rust-owned Vec<i32>
MemorySegment vecSegment = (MemorySegment) createVector.invokeExact();
vectorPush.invokeExact(vecSegment, 42); // Push 42 to the vector
long len = (long) vectorLen.invokeExact(vecSegment); // Get vector length
int value = (int) vectorGet.invokeExact(vecSegment, 0L); // Get first element
Explanation:
Vector Creation: create_vector hands Java a raw pointer to a Rust-owned vector, which Java holds as a MemorySegment.
Push, Length, and Get: MethodHandle invocations facilitate direct manipulation of the Rust vector from Java.
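One thing the example above does not show is cleanup: the Vec created by create_vector is never freed. A hedged sketch of a free function that Rust could additionally export (not part of the original example); Java would bind it with another downcall handle and call it once the vector is no longer needed:
#[no_mangle]
extern "C" fn free_vector(vec: *mut Vec<i32>) {
    if !vec.is_null() {
        unsafe { drop(Box::from_raw(vec)) }; // Rebuild the Box so the Vec is dropped and its memory freed
    }
}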
Handling Rust Slices (&[T] and &mut [T]) in Java
Rust Side
In Rust, slices (&[T] and &mut [T]) represent a reference to a contiguous sequence of elements, without ownership. For FFI, we pass both the pointer to the data and the length of the slice.
Example Rust Slice:
#[no_mangle]
extern "C" fn sum_slice(slice: *const i32, len: usize) -> i32 {
    let slice = unsafe { std::slice::from_raw_parts(slice, len) };
    slice.iter().sum()
}
Explanation:
sum_slice: Accepts a pointer and a length, allowing Rust to treat them as a slice. This approach enables safe manipulation and reading of slice data in Rust while preserving FFI compatibility.
Java Side
To interact with Rust slices from Java:
- Define a SequenceLayout that reflects the slice structure.
- Use a MethodHandle to invoke Rust's functions on the slice.
Example Java Code:
// Define the layout for an array of 5 integers
SequenceLayout sliceLayout = MemoryLayout.sequenceLayout(5, ValueLayout.JAVA_INT);
// MethodHandle for sum_slice function
MethodHandle sumSlice = linker.downcallHandle(
symbolLookup.find("sum_slice").orElseThrow(),
FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG)
);
Explanation:
sliceLayout
: Defines the memory layout for a fixed-size slice.- MethodHandle (
sumSlice
): Links to Rust’ssum_slice
function, allowing Java to call it with a memory segment and length.
Allocating and Accessing Slice Elements
var arena = Arena.ofConfined();
MemorySegment sliceSegment = arena.allocate(sliceLayout);
VarHandle intHandle = sliceLayout.varHandle(PathElement.sequenceElement());
intHandle.set(sliceSegment, 0L, 0L, 10); // Element 0 = 10
intHandle.set(sliceSegment, 0L, 1L, 20); // Element 1 = 20
int result = (int) sumSlice.invokeExact(sliceSegment, 5L); // Sum the slice
System.out.println("Sum of slice elements: " + result);
Explanation:
Arena Allocation: Allocates the slice's memory in an arena for safe usage.
Setting and Summing Elements: Uses VarHandles for direct element access and sumSlice for calculating the sum, bridging Rust's slice handling with Java effectively.
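If the slice data already lives in a Java array, it can be bulk-copied into the off-heap segment instead of setting elements one at a time. A small sketch reusing the arena, sliceLayout, and sumSlice handle from above:
int[] javaValues = {1, 2, 3, 4, 5};
MemorySegment bulkSegment = arena.allocate(sliceLayout);
bulkSegment.copyFrom(MemorySegment.ofArray(javaValues)); // Copy the heap array into off-heap memory
int total = (int) sumSlice.invokeExact(bulkSegment, 5L); // 15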
Edge Cases and Troubleshooting
This section is designed to provide solutions for challenging edge cases and common errors Java developers may encounter when working with Rust bindings. Each subsection includes practical examples in Rust and Java, with solutions and explanations on handling complex scenarios such as memory alignment issues, lifetimes, and data races.
Handling Rust Lifetimes in Java
Rust’s lifetime annotations ensure that references do not outlive the data they point to. Since Java lacks a direct equivalent, memory management must be handled with precision to avoid accessing invalidated memory.
Example: Short-Lived Borrowed Reference
#[no_mangle]
pub extern "C" fn get_reference<'a>(value: &'a i32) -> &'a i32 {
    value
}
Here, get_reference returns a reference to an integer. In Rust, the lifetime 'a ensures that the reference value will be valid while it's borrowed. This reference cannot outlive its source.
Java Side Solution:
To prevent accessing invalid memory, Java can use confined arenas for short-lived data.
var arena = Arena.ofConfined();
MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT); // Allocate memory for the reference
MethodHandle getReference = RustBindings.getReferenceHandle();
// Pass and retrieve the reference within the arena's lifetime
int value = 42;
segment.set(ValueLayout.JAVA_INT, 0, value);
MemorySegment borrowed = (MemorySegment) getReference.invokeExact(segment);
arena.close(); // Ensures memory is freed
Explanation and Solution:
Confined Arena: The confined arena restricts access to a single thread, ensuring safe memory management. The arena is closed immediately after the operation, so Java cannot access the memory after it’s freed.
Memory Safety: By confining the memory within the arena, Java developers can ensure they only use memory while it’s valid, preventing accidental reuse.
Why It’s Tricky:
Rust’s lifetimes prevent data from being used after it’s freed, while Java’s garbage collection doesn’t directly support this. Confined arenas provide a reliable method to approximate Rust’s memory safety, but they require Java developers to actively manage their memory, which can be challenging.
Handling Enums with Data Variants
Rust enums are often more complex than Java enums since they can carry data. Java needs to map Rust enums to a compatible structure, identifying active variants to avoid misinterpreting memory.
Example: Enum with Multiple Variants
Rust Side:
#[repr(C)]
pub enum Status {
    Ok(i32),
    Error(String),
}

#[no_mangle]
pub extern "C" fn get_status() -> Status {
    Status::Ok(200)
}
Java Side Solution: To handle this enum, Java needs to use a layout that supports both an enum tag (discriminator) and the associated data.
StructLayout statusLayout = MemoryLayout.structLayout(
ValueLayout.JAVA_INT.withName("tag"), // Enum discriminator
ValueLayout.JAVA_INT.withName("value") // Holds Ok value or error pointer
);
VarHandle tagHandle = statusLayout.varHandle(PathElement.groupElement("tag"));
VarHandle valueHandle = statusLayout.varHandle(PathElement.groupElement("value"));
MemorySegment statusSegment = arena.allocate(statusLayout);
int tag = (int) tagHandle.get(statusSegment, 0L);
if (tag == 0) { // Ok variant
    int okValue = (int) valueHandle.get(statusSegment, 0L);
System.out.println("Status OK: " + okValue);
} else { // Error variant
// Process error value appropriately
System.out.println("Status Error");
}
Explanation and Solution:
Discriminator and Value Fields: tag differentiates between Ok and Error, while value holds associated data. By reading tag, Java can branch to handle each case correctly.
Memory Layout Compatibility: Using a StructLayout with specific VarHandles ensures memory alignment and prevents misinterpretation of data.
Why It’s Tricky:
Enums in Rust can carry various data types for each variant, which Java enums don’t support. The solution requires careful layout management and handling each variant’s data accordingly.
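A common way to sidestep reading the enum's raw layout at all is to keep the enum behind a pointer and expose small accessor functions from Rust. The following is a sketch, not part of the example above; the function names are made up for illustration and build on the Status enum defined earlier:
#[no_mangle]
pub extern "C" fn status_new() -> *mut Status {
    Box::into_raw(Box::new(Status::Ok(200)))
}

#[no_mangle]
pub extern "C" fn status_is_ok(status: *const Status) -> bool {
    unsafe { matches!(&*status, Status::Ok(_)) }
}

#[no_mangle]
pub extern "C" fn status_ok_value(status: *const Status) -> i32 {
    unsafe {
        match &*status {
            Status::Ok(value) => *value,
            Status::Error(_) => 0, // A real API would report the error differently
        }
    }
}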
WrongMethodTypeException in invokeExact()
Cause: Java's MethodHandle.invokeExact() requires an exact match between arguments and the function signature. A mismatch in argument types or order will throw this error.
Solution:
- Verify FunctionDescriptor: Ensure that the FunctionDescriptor matches the Rust function's expected argument and return types exactly.
- Check Argument Casts: Explicitly cast arguments to their expected types, and cast return values as needed.
Example:
// Rust function signature: pub extern "C" fn add(x: i32, y: i32) -> i32
FunctionDescriptor addDescriptor = FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.JAVA_INT);
MethodHandle addHandle = linker.downcallHandle(lib.find("add").orElseThrow(), addDescriptor);
int result = (int) addHandle.invokeExact(5, 3); // Cast to int as expected
Explanation and Solution:
Type Matching: FunctionDescriptor ensures that Java and Rust types align.
Exact Casting: Casting return values and arguments to their exact types avoids this error, as Java's type system is stricter here than Rust's.
Why It’s Tricky:
Rust function signatures may allow implicit casting that Java does not, so ensuring exact types in the descriptor is essential.
Segmentation Fault or Undefined Behavior
Cause: This typically results from misaligned memory or accessing freed memory. Common causes include mismatched layouts, accessing unallocated memory, or not using the correct arena.
Solution:
- Verify MemoryLayout Alignment: Ensure the MemoryLayout precisely matches Rust's struct or array layout, particularly if #[repr(C)] is used.
- Use Arenas Appropriately: Manage memory with confined or auto arenas to ensure data remains valid only as long as needed.
Example
In Rust:
#[repr(C)]
struct Data {
    x: i32,
    y: i64,
}

#[no_mangle]
pub extern "C" fn create_data() -> *mut Data {
    Box::into_raw(Box::new(Data { x: 1, y: 2 }))
}
In Java:
StructLayout dataLayout = MemoryLayout.structLayout(
    ValueLayout.JAVA_INT.withName("x"),
    MemoryLayout.paddingLayout(4), // Explicit padding: #[repr(C)] places the i64 field at offset 8
    ValueLayout.JAVA_LONG.withName("y")
);
MethodHandle createData = RustBindings.createDataHandle();
MemorySegment dataSegment = (MemorySegment) createData.invokeExact();
dataSegment = dataSegment.reinterpret(dataLayout.byteSize()); // Attach the struct's bounds to the returned pointer
Explanation and Solution:
Alignment Matching: Ensure JAVA_INT and JAVA_LONG are aligned with Rust's i32 and i64. Java's layout must match precisely, as alignment affects performance and stability; because structLayout does not insert padding for you, the 4-byte gap that #[repr(C)] adds after x must be declared explicitly with a padding layout.
Safe Memory Access: Use confined arenas to allocate and manage Rust data safely, freeing memory once Java no longer requires it.
Why It’s Tricky:
Alignment and memory lifetime issues can cause silent data corruption or segmentation faults, making layout precision and memory management critical for stability.
UnsatisfiedLinkError When Loading Rust Shared Library
Cause: Java cannot find the Rust shared library file (e.g., .so, .dll, .dylib) because the file path is incorrect or the library name is misspelled.
Solution
- Specify Library Path and Name Correctly: Ensure that the shared library file is available in the system path or specified explicitly.
- Check System Compatibility: Ensure that the library file matches the OS format (e.g., .dll on Windows).
Example:
// Ensure correct file name for your OS
SymbolLookup lib = SymbolLookup.libraryLookup("libmylibrary.so", Arena.global());
Explanation and Solution:
Library Path Validation: Confirm that the library file path is correct, and the file exists. Specifying the full path or ensuring the library is on the system’s path will solve this issue.
Why It’s Tricky:
If Java cannot locate the Rust library, it throws a runtime error, which can be hard to trace if the path is only slightly incorrect.
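One way to avoid hard-coding the extension is to let Java build the platform-specific file name. A small sketch; the base name "rust_lib" and the target/release/ directory are placeholders:
// System.mapLibraryName produces librust_lib.so, rust_lib.dll, or librust_lib.dylib depending on the OS
String libraryFileName = System.mapLibraryName("rust_lib");
SymbolLookup lib = SymbolLookup.libraryLookup("target/release/" + libraryFileName, Arena.global());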
Value Layout
ValueLayout is the most primitive layout type, representing the layout of, well, primitives. They are:
- ValueLayout.ADDRESS
- ValueLayout.JAVA_BOOLEAN
- ValueLayout.JAVA_BYTE
- ValueLayout.JAVA_CHAR
- ValueLayout.JAVA_DOUBLE
- ValueLayout.JAVA_FLOAT
- ValueLayout.JAVA_INT
- ValueLayout.JAVA_LONG
- ValueLayout.JAVA_SHORT
- ValueLayout.ADDRESS_UNALIGNED
- ValueLayout.JAVA_CHAR_UNALIGNED
- ValueLayout.JAVA_DOUBLE_UNALIGNED
- ValueLayout.JAVA_FLOAT_UNALIGNED
- ValueLayout.JAVA_INT_UNALIGNED
- ValueLayout.JAVA_LONG_UNALIGNED
- ValueLayout.JAVA_SHORT_UNALIGNED
These all correspond to the Java primitives (ADDRESS is a bit special), aligned and unaligned, which have direct mappings to C primitive types.
Type Mappings: Java, C, and Rust
Java Type | C Type | Rust Type | Description |
---|---|---|---|
ValueLayout.ADDRESS | Pointer | *mut , *const | Pointer to a memory location. |
ValueLayout.JAVA_INT | int | i32 | 32-bit signed integer. |
ValueLayout.JAVA_LONG | long | i64 | 64-bit signed integer. |
ValueLayout.JAVA_SHORT | short | i16 | 16-bit signed integer. |
ValueLayout.JAVA_BYTE | char | i8 | 8-bit signed integer. |
ValueLayout.JAVA_BOOLEAN | char (0 or 1) | bool | Boolean value (true or false). |
ValueLayout.JAVA_FLOAT | float | f32 | 32-bit floating-point number. |
ValueLayout.JAVA_DOUBLE | double | f64 | 64-bit floating-point number. |
ValueLayout.JAVA_CHAR | short (UTF-16) | u16 | 16-bit unsigned integer for UTF-16. |
Unsigned Types
| Java Type | C Type | Rust Type | Description |
|---|---|---|---|
| ValueLayout.JAVA_BYTE | unsigned char | u8 | 8-bit unsigned integer (read back as a signed byte in Java). |
| ValueLayout.JAVA_SHORT | unsigned short | u16 | 16-bit unsigned integer (read back as a signed short; JAVA_CHAR also works since char is unsigned). |
| ValueLayout.JAVA_INT | unsigned int | u32 | 32-bit unsigned integer (read back as a signed int in Java). |
| ValueLayout.JAVA_LONG | unsigned long long | u64 | 64-bit unsigned integer (read back as a signed long in Java). |
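Because Java has no unsigned primitives, large unsigned values come back negative; the standard library's toUnsigned* helpers recover the intended value. A small sketch, where segment is assumed to be a MemorySegment holding a u8 at offset 0 and a u32 at offset 4:
byte rawByte = segment.get(ValueLayout.JAVA_BYTE, 0);   // A Rust u8 of 200 reads back as -56
int u8Value = Byte.toUnsignedInt(rawByte);              // 200
int rawInt = segment.get(ValueLayout.JAVA_INT, 4);      // A Rust u32 above 2^31 - 1 reads back negative
long u32Value = Integer.toUnsignedLong(rawInt);         // The original unsigned value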
The _UNALIGNED versions are exactly the same as their aligned counterparts
except that they have an alignment of 1. This allows storing them unaligned,
but it will also force the JVM to issue special instruction sequences to load
values, since most CPU architectures do not natively support unaligned loads
and stores from or to memory. It is also worth noting that
ValueLayout.JAVA_DOUBLE
and ValueLayout.JAVA_LONG
have
platform-dependent alignment because some CPU architectures require
natural alignment (size = alignment, so 8 in this case) whereas some like
x86 only require an alignment of 4. All other primitives are defined to have
natural alignment.
Beyond representing primitive types, ValueLayouts
also provide access to
different byte ordering (also known as endianness) through the
.withOrder(ByteOrder)
method. The choices for ByteOrder
are BIG_ENDIAN
,
and LITTLE_ENDIAN
, although the static method ByteOrder.nativeOrder()
will return whichever of those your CPU natively uses (usually
LITTLE_ENDIAN
). This is required by many serialization formats, such as
most network formats, because many of them require BIG_ENDIAN
byte
order while most CPU architectures only natively support LITTLE_ENDIAN
.
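For example, a big-endian 32-bit field (as used by most network formats) can be described like this sketch, where segment is any MemorySegment holding the field and ByteOrder comes from java.nio; reads through this layout are byte-swapped automatically on little-endian machines:
ValueLayout.OfInt bigEndianInt = ValueLayout.JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN);
int networkValue = segment.get(bigEndianInt, 0); // Byte-swapped as needed for this CPU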
Rust doesn't have int, long, etc., so the fixed-width mappings shown in the tables above must be used when translating C-style signatures.
For additional information on ValueLayout
, visit Oracle's official documentation, and official Rust resource The Rustonomicon.
Method Handle
MethodHandle is one of the most essential tools in the FFM API. The most important method on MethodHandles returned from the Linker is invokeExact(…).
.invokeExact(…)
takes in the parameters of the function according to the
FunctionDescriptor
and returns a value with type also specified by the
FunctionDescriptor
. Java will throw an exception at runtime if the arguments
passed to the method do not match up with the FunctionDescriptor
. Because
of some Java Virtual Machine details, the return
value must also be explicitly cast to the expected return type. Otherwise, Java will once again throw an exception at
runtime, this time because the return type was wrong. A
function with signature FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_FLOAT)
would be called like so:
int returnValue = (int) handleName.invokeExact(myFloat);
For more information on MethodHandle
, visit Oracle's official documentation.
Memory Layout
Memory Layouts can be used in
order to streamline the allocation of off-heap memory. Here is an overview
of how MemoryLayout
differs from MemorySegment
.
Assume an array of structs needs to be declared for the following example. First an Arena must be created; any arena type desired will do. Next, MemoryLayout.sequenceLayout() can be used, with two arguments: n, the length of the array, and a MemoryLayout.structLayout() that takes in the value layouts and names of the elements within the struct. After this, create VarHandles for each element within the struct, which provide a reference to each respective element. Then create a MemorySegment that corresponds to the entire memory layout of the array, allocate it from the appropriate arena, and finally the structs can be accessed.
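Put together, the walkthrough above might look like the following sketch (the struct shape and element count are arbitrary, and the java.lang.foreign imports are assumed):
try (Arena arena = Arena.ofConfined()) {
    SequenceLayout points = MemoryLayout.sequenceLayout(4,
        MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y")
        )
    );
    // One VarHandle per member; coordinates are (segment, base offset, element index)
    VarHandle xHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("x"));
    VarHandle yHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("y"));
    MemorySegment segment = arena.allocate(points); // Backs the whole array of structs
    xHandle.set(segment, 0L, 2L, 7);                // points[2].x = 7
    yHandle.set(segment, 0L, 2L, 9);                // points[2].y = 9
    System.out.println((int) xHandle.get(segment, 0L, 2L)); // Prints 7
}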
For additional information on MemoryLayout
, visit Oracle's official documentation.
Memory Segment
MemorySegment represents a fat pointer, that is, a pointer with associated bounds information, much like a mutable slice in Rust. The main method associated with memory segments is .get(ValueLayout, offset), which indexes offset bytes into the pointer and reads whatever memory is there as if it's of the associated type.
For instance, segment.getAtIndex(ValueLayout.JAVA_INT, 1) is basically the same as C code doing ((int*)segment)[1] (the plain segment.get(ValueLayout.JAVA_INT, offset) variant takes a byte offset rather than an element index). The only difference from the C code is that Java will
throw an exception if the program attempts to access an index outside of the
bounds associated with the MemorySegment
. The most common sources of
MemorySegments are functions returning pointers. MemorySegments
returned to Java through the foreign function interface will automatically be
assigned a length of zero, since Java does not have enough information to
determine the bounds. However, invoking the .reinterpret(size)
method will
edit the bounds information. This is extremely unsafe and must
be used with caution. Assigning a logically incorrect bound
could allow normal Java code to cause a segmentation fault (or worse).
Finally, like Rust slices, MemorySegments
can be subsliced using
.asSlice(offset, size)
, which is also bounds-checked, returning a new slice
with the associated pointer and length values and the same lifetime as the
original.
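A short sketch of the two operations described above, assuming rustPointer is a MemorySegment returned from a Rust function:
MemorySegment buffer = rustPointer.reinterpret(64); // Unsafe: the 64 bytes must really be valid on the Rust side
int first = buffer.get(ValueLayout.JAVA_INT, 0);     // Now bounds-checked against the 64 bytes
MemorySegment tail = buffer.asSlice(16, 48);         // Subslice: starts at byte 16 and is 48 bytes long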
For more information on MemorySegment
, visit Oracle's official documentation.
Variable Handle
A VarHandle
represents a handle to a sub-layout given a layout. It helps
solve the problem of, say, accessing an int
field of a struct, or accessing
an element of an array. Variable handles are used to construct a path to a value
that needs to be given a certain layout (basically a type). Say there is a pointer to
an array of struct foo
, which has an integer member x
that must be read.
This is how to construct a VarHandle
to get x
from any such
pointer:
MemoryLayout layoutOfPointer =
    ValueLayout.ADDRESS.withTargetLayout(
        MemoryLayout.sequenceLayout(arrayLen,
            MemoryLayout.structLayout(
                ValueLayout.JAVA_INT.withName("x"),
                ValueLayout.JAVA_INT.withName("y")
            )
        )
    );
VarHandle xHandle = layoutOfPointer.varHandle(
    PathElement.dereferenceElement(),
    PathElement.sequenceElement(),
    PathElement.groupElement("x")
);
Now whenever x is needed from this kind of pointer, call (int) xHandle.get(memorySegment, 0, index).
For more information on VarHandle, visit Oracle's official documentation.
Function Descriptor
FunctionDescriptor represents the signature of a function.
FunctionDescriptor.of(MemoryLayout, … )
takes a variadic1 input of
MemoryLayouts
. The first argument is the memory layout of the return
type, and the rest correspond to the memory layouts of the function
arguments.
For example, int foo(float, void*) would be represented as FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_FLOAT, ValueLayout.ADDRESS).
For void functions,
FunctionDescriptor.ofVoid(MemoryLayout, … )
is a static method that is
exactly the same as FunctionDescriptor.of(MemoryLayout, … )
except that its
first argument corresponds to the first function argument rather than the
return value.
For example, void foo(float, void*) would translate to FunctionDescriptor.ofVoid(ValueLayout.JAVA_FLOAT, ValueLayout.ADDRESS).
For additional information on FunctionDescriptor
, visit Oracle's official documentation.
The function can take a variable number of arguments.
Struct Layout
A StructLayout
represents the layout of a C-style struct, including the layout
of all its members, all their members (if applicable), and so on. It does
exactly the same job as a struct definition in C. The class itself has no
interesting methods, but you can create a StructLayout using
MemoryLayout.structLayout(MemoryLayout…)
. To translate the following
structs to the Java FFM API, we would use the
following Java code:
C:
struct foo {
int num;
char* string;
struct bar baz;
}
Java:
StructLayout bar = …;
StructLayout foo = MemoryLayout.structLayout(
    ValueLayout.JAVA_INT.withName("num"),
    MemoryLayout.paddingLayout(4), // Padding so the pointer below is 8-byte aligned
    ValueLayout.ADDRESS.withTargetLayout(
        MemoryLayout.sequenceLayout(0, ValueLayout.JAVA_BYTE)).withName("string"),
    bar.withName("baz")
);
The .withName(String) method allows you to later retrieve a VarHandle using that name, covered in the VarHandle section.
Constructing a StructLayout like this will automatically compute the struct's total size and alignment from its members. Unlike a C compiler, however, Java does not insert padding for you: if a member would land at an offset that violates its alignment constraint, creating the layout throws an exception, so explicit MemoryLayout.paddingLayout(...) members must be added wherever C would add padding (as in the example above). Generally, the size is greater than or equal to the sum of the sizes of the members (accounting for any padding needed to keep all members aligned) and the alignment is the maximum of the member alignments. Some exotic C programs may use overaligned structs1, for which you can add a final .withAlignment(alignment) to override the automatic alignment calculated by Java.
This all still applies to Rust, but only on:
- #[repr(C)] structs
- #[repr(C)] tuple structs2
- #[repr(integer type)] enums with only valueless variants
- enums with exactly one nonnullable #[repr(C)] variant and up to one zero-sized variant3
- #[repr(transparent)] structs and tuple structs with exactly one #[repr(C)] member and all other members being zero-sized
#[repr(C)]
requires all members, and members of members, and members of those members, etc. to be #[repr(C)]
as well, which is very
invasive to code. For the sake of performance, some may choose to do this,
but it also greatly limits what you can use in the standard library.
Common non #[repr(C)] types include:
- Vec
- String
- &str
- slices
- anonymous tuples
- dyn references
- Box<dyn T>
- most enums with a variant that holds a value (Option<T> for most T)
- all enums with more than one variant that holds a value
- every single container type4
If a type uses any of these types (and most types from external libraries too) by
value, that type cannot be #[repr(C)]
. The only way around this restriction
is through pointer indirection, like Box<T>
5, because pointers are always
representable even if the thing they are pointing to is not. People wanting
every last ounce of performance can deal with this, but the average Rust
type cannot, and so it cannot be represented as a StructLayout
or a
MemoryLayout
. The last class important specifically to StructLayout
is PaddingLayout
. This is the layout of padding in StructLayouts. It exists purely to pad
the struct.
For more information on StructLayout
, visit Oracle's official documentation.
Many compilers accept __attribute__((aligned(x))) to align a struct to x, or they keep its original alignment if x is less than or equal to that. Rust has #[repr(align(x))] to specify overalignment.
Tuple structs are just structs with anonymous members.
This case exists pretty much purely to allow Option.
VecDeque, HashMap, HashSet, BTreeMap, BTreeSet, every iterator in the entire standard library, every IO type, every FS type (including File), Rc, Arc, RefCell, RwLock, Mutex.
Still doesn't work for dyn; use ThinBox for that. Box<T> is guaranteed to be represented by just a pointer, semantically like one returned from malloc.
Union Layout
UnionLayout represents a C union. Much like a C union, it is used to specify and access the different members like it was a struct. However, only one of those members exists at any one time. You can create a UnionLayout with MemoryLayout.unionLayout(MemoryLayout…). Just like in C, a MemorySegment referencing a UnionLayout can be treated as actually referencing the layout of one of its members, such as by calling .get() with the associated MemoryLayout.
Alternatively, Variable Handles can be used to
reference members
in a process similar to that used in C.
Generally, union layouts will have a size equal to the maximum size of its
members and an alignment equal to the maximum alignment of its
members. Similarly to structs, unions can be overaligned, which can be
specified by adding .withAlignment(alignment)
to the end of the method
chain to overwrite Java’s automatically-determined alignment for that type.
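For example, a C union such as union Value { int i; float f; } could be modeled and read like this sketch, where arena is an already-created Arena:
UnionLayout valueLayout = MemoryLayout.unionLayout(
    ValueLayout.JAVA_INT.withName("i"),
    ValueLayout.JAVA_FLOAT.withName("f")
);
MemorySegment value = arena.allocate(valueLayout);      // 4 bytes; both members share the same storage
value.set(ValueLayout.JAVA_FLOAT, 0, 1.5f);             // Write through the float member
int sameBitsAsInt = value.get(ValueLayout.JAVA_INT, 0); // Read the same bytes back as an int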
For more information on UnionLayout
, visit Oracle's official documentation.
Sequence Layout
SequenceLayout represents the layout of arrays. To create a SequenceLayout, call MemoryLayout.sequenceLayout(numberOfElements, MemoryLayout). There is no get method or any direct way to get the nth element of an array. Instead, create a special VarHandle to the needed data within the member, then call get on that with the index. For instance, to get the x-coordinates of the structs in an array, use:
SequenceLayout arrayOfStruct = MemoryLayout.sequenceLayout(10,
    MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y")
    ).withName("struct")
);
VarHandle varHandle =
    arrayOfStruct.varHandle(PathElement.sequenceElement(), PathElement.groupElement("x"));
for (int i = 0; i < 10; i++) {
    System.out.println(varHandle.get(memorySegment, 0L, (long) i));
}
SequenceLayout
provides some interesting methods.
sequenceLayout.elementCount()
will, as the name suggests, give the
length of the array, which is useful for passing around slices as it is not necessary to store the length itself.
sequenceLayout.reshape(long dim1, long dim2, …)
and sequenceLayout.flatten()
are both related to reinterpreting
multidimensional arrays. Multidimensional arrays are just arrays of arrays,
but their layout means they can safely be reinterpreted as a single
dimension array of size (dim 1 size)*(dim 2 size)*...
, which is exactly what
sequenceLayout.flatten()
does. sequenceLayout.reshape
does the inverse of
sequenceLayout.flatten()
, but is also fallible. Obviously, if an attempt is made to reshape
an array to AxBxC but the array’s length isn’t divisible by A and B and C, this
method will throw an exception. Another nice property of
sequenceLayout.reshape()
is that one argument may be set to -1, in which
case sequenceLayout.reshape()
will do the math based on the array’s length
to determine what that dimension must be.
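A quick sketch of both calls:
SequenceLayout grid = MemoryLayout.sequenceLayout(6, ValueLayout.JAVA_INT);
SequenceLayout twoByThree = grid.reshape(2, 3); // 2 * 3 must equal the original element count of 6
SequenceLayout derived = grid.reshape(-1, 3);   // -1 lets Java compute the missing dimension (2)
SequenceLayout flat = twoByThree.flatten();     // Back to a single 6-element sequence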
A Java type can be used to act as a wrapper around Rust slices, so
SequenceLayout
would feature heavily in that kind of implementation. While
a slice object, composed of a pointer and a length, is not application binary
interface (ABI) stable, the underlying array is ABI stable.
Rust provides methods to get the pointer and length from a slice, as well as
functions to construct slices from a pointer and a length, so while it is not
ABI safe, it is easy enough to disassemble and
reassemble into safe forms as needed. While it is easier to just keep an
opaque blob of data and ask Rust any time it must be used, it is much
faster for Java to have direct access to the array.
The Just-In-Time (JIT) compiler knows how array accesses work, and can optimize the corresponding Java code, possibly with automatic vectorization, which is a great boost to throughput. In contrast, every time a call is made out to a Rust function, the JIT compiler has no idea what that function is doing. This means that it cannot optimize the memory accesses, and it must also assume that the function breaks every optimization assumption it has. For instance, the function could touch any value in memory, preventing the JIT compiler from reordering any reads or writes from before the function call to after the function call, and vice versa.
The Rust compiler has the same issue: it does not know what the Java code is doing, so it cannot apply optimizations such as automatic vectorization across that boundary either. This does not matter so
much for one-off functions, functions that are only called a few thousand
times, or large functions where execution time is dominated by actually
running the function and not on function call overhead, but for simple code
in loops this can be brutal. And how are arrays typically
used? Usually small bits of code run many times in a loop. The performance
gains are too great to ignore. While doing the loop in Rust will beat Java
almost every time, it is not reasonable for every possible loop body to be put
in Rust. However, developers have the option to write all of their
loops in Rust if they so choose. Still, SequenceLayout
provides a great opportunity to allow easy, direct access to arrays and
array elements for Java.
For more information on SequenceLayout
, visit Oracle's official documentation.
Arenas
Arenas are Java's way of letting developers allocate memory in a manner that is particularly useful for creating bindings. An arena is like a stack of memory: its space can be split up in various ways, and its lifetime depends on the arena's type. The main idea is that arenas create space to store objects in Java called memory segments. These memory segments can store data such as variables, data structures, and functions in a space that the garbage collector treats differently. That means information stored in these arenas can be passed to and from foreign functions without worrying about whether Java's garbage collector has tampered with the space.
There are four different types of arenas: confined, automatic, shared,
and custom. Confined arenas and shared arenas are very similar. They both will live as
long as the Java program unless they are manually closed by the user using
the .close()
method on the arena object. The key difference between the
two is that confined arenas can only be accessed by a single thread, while
shared arenas can be accessed by multiple threads. This causes a weird
interaction with shared arenas. When a confined arena is closed, its memory
is immediately freed and that’s all there is to it. When a shared arena is
closed, it invalidates all Java references to the space in memory, but it does
not immediately free it as the process takes longer, meaning that the space
in memory is technically alive for a very short amount of time after the
arena is closed. These arenas are useful for creating Rust bindings because they can
guarantee a space in memory cannot be accessed once closed, so they can be
implemented into functions to guarantee proper memory safety practices.
The API descriptions for automatic arenas typically describe their closing behavior only vaguely, such as "the garbage collector eventually frees it automatically". To better describe its behavior: the garbage collector will only free the automatic arena either at the end of the Java program or when it determines that the arena is unreachable. But what does the garbage collector see as unreachable?
Testing will show that Java will not close the arena even if every memory segment inside is set to null. The information inside the arena has no bearing on the garbage collector’s decision to keep it around. However, a way to guarantee that the garbage collector determines the arena as unreachable is to set the arena to null. This means that automatic arenas can be useful and reliable for creating bindings as well, especially if it is not clear when a certain arena should be closed. The only downside of the automatic arena is its interaction with the garbage collector. It is possible this could cause some sort of increased overhead.
With an Arena, you can call arena.allocate(size, alignment) to allocate memory within the arena. Allocations cannot be individually freed with Arenas; it's either all or nothing. Global Arenas
are useful for set-and-forget things, like for loading the Rust library, since this
does not need to be freed. Confined Arenas are good for data that cannot be
safely shared across threads, so for types that don’t implement the
Send trait. Auto Arenas are nice if it is difficult to figure out
when something should be deallocated. Although this isn’t very common as drop()
should be called on Rust objects that require cleanup, and Java’s
garbage collector will not take care of this.
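A short sketch of the main arena varieties described above:
try (Arena confined = Arena.ofConfined()) {
    MemorySegment scratch = confined.allocate(16); // Only the creating thread may touch this
} // Memory freed immediately when the block exits

Arena shared = Arena.ofShared();  // Usable from any thread; close() invalidates every reference to its memory
Arena auto = Arena.ofAuto();      // Freed by the garbage collector once unreachable; cannot be closed manually
MemorySegment forever = Arena.global().allocate(8); // Lives for the whole program and is never freed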
For more information on arenas, visit Oracle's official documentation.
Shared Object and Dynamic Library Files
Shared object and dynamic library files effectively serve the same purpose in this scope. They provide executable code to outside sources. This means that once Java is given the address to the code for a function in this file, it is ready to run once called. Although they effectively share the same purpose, their file types differ based on the system running. Below is a table with each file extension used by three of the most common operating systems.
System | File Extension |
---|---|
Linux | .so |
Windows | .dll |
Mac | .dylib |
Ownership
A piece of data must be owned by at most one variable at any given time,
even across an FFI boundary. If Rust has ownership of a Vec<T>
for
instance, Java cannot decide to take control of it, as, in this case, that would
lead to both Java and Rust calling drop when done with the type, causing a
double free of the backing array. And that’s one of the better outcomes, as
generally types do not expect to suddenly be in an invalid state due to
external mucking, nor is there much they can do about it. One exception to
this rule are types that implement Copy, as they can be blindly memcopied
to create an identical clone of the original (barring any atomicity issues if
this is done across threads), though most types do not implement Copy so
this isn’t very useful when creating these bindings.
Example of Ownership
In this calculator code, ownership is demonstrated in how PostfixCalculator manages its stack:
struct PostfixCalculator {
stack: VecDeque<f64>,
}
impl PostfixCalculator {
fn new() -> Self {
PostfixCalculator {
stack: VecDeque::new(),
}
}
}
PostfixCalculator owns its stack. When PostfixCalculator is dropped, so is its stack, which automatically cleans up without the programmer needing to manually manage memory.
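To make that ownership explicit across the FFI boundary, here is a minimal sketch (the calculator_new and calculator_free function names are illustrative, not part of the calculator code) in which Rust hands ownership to Java as a raw pointer and later takes it back, so that drop runs exactly once:
use std::collections::VecDeque;

struct PostfixCalculator {
    stack: VecDeque<f64>,
}

impl PostfixCalculator {
    fn new() -> Self {
        PostfixCalculator { stack: VecDeque::new() }
    }
}

// Rust hands ownership of a heap-allocated calculator to the caller (e.g. Java) as a raw pointer.
#[no_mangle]
pub extern "C" fn calculator_new() -> *mut PostfixCalculator {
    Box::into_raw(Box::new(PostfixCalculator::new()))
}

// The caller hands ownership back exactly once; Rust reconstructs the Box and drops it.
#[no_mangle]
pub unsafe extern "C" fn calculator_free(ptr: *mut PostfixCalculator) {
    if !ptr.is_null() {
        drop(Box::from_raw(ptr));
    }
}
As long as Java calls calculator_free exactly once per calculator_new, only one side owns the value at any time and a double free cannot occur.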
To learn more about ownership, it is recommended to read these official Rust resources: The Rust Programming Language chapter 4, and The Rustonomicon chapter 6.
Borrowing and Aliasing
Data can be “borrowed” as references, either immutably &T
or mutably
&mut T
. The compiler enforces a sort of reader-writer lock on the type: it
can have either multiple readers (immutable/shared references, &T
) or a
singular writer (mutable/exclusive reference, &mut T
). The compiler will
assume that the data behind a shared reference will not mutate (unless the
type opts out of it with UnsafeCell
, which can be used for custom special
types, which should not be used to enforce users’ types) and the compiler
will assume that no other code or data can reference, read, or mutate the
data behind an exclusive reference (there is no opt out, this must never
happen!). The fact that Rust can make these assumptions is what makes it
so fast and efficient, but it also means you are restricted from coding practices
that break them.
This is roughly the opposite of Java’s memory model, where
everything is a mutable reference to the underlying object. While Java can’t
arbitrarily clone objects, meaning it can’t make copies of a class holding an
exclusive reference, it can make those objects live arbitrarily long. This
means it is essential to either detect that the reference is still live and refuse to
service any other borrows, or invalidate the reference in order to service
other borrows. There is a Rust type that effectively performs this latter
approach: RefCell<T>
.
Raw pointers in Rust do not have such aliasing restrictions with regard to
each other, so we are free to have any number of constant *const T
and
mutable *mut T
pointers coexisting. Raw pointer semantics are just like
they are in C, and are in fact even more lenient than C pointers since C
pointers of differing types are not allowed to alias. You’re still not allowed to
mess with ownership – the owner of the type still acts like your pointers
don’t exist and so still assumes it is the arbiter of reads and writes – but if
you have ownership of the type you can just make sure to only interact with
it using raw pointers. This is exactly what UnsafeCell<T>
and Cell<T>
do to
enable shared mutability, and those are the primitives fancy types like
Rc<T>
use to allow shared ownership.
Example of Borrowing and Aliasing
In this calculator code, borrowing and aliasing are demonstrated.
struct PostfixCalculator {
stack: VecDeque<f64>,
}
impl PostfixCalculator {
fn new() -> Self {
PostfixCalculator {
stack: VecDeque::new(),
}
}
}
Rust's borrowing rules ensure that references to data (borrows) do not outlive the data they reference (the owner). This prevents dangling pointers.
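To see the reader-writer rule on the calculator itself, here is a small sketch; the peek and push methods are illustrative additions rather than part of the original code:
use std::collections::VecDeque;

struct PostfixCalculator {
    stack: VecDeque<f64>,
}

impl PostfixCalculator {
    fn new() -> Self {
        PostfixCalculator { stack: VecDeque::new() }
    }

    // Shared borrow: many of these may exist at once, and none may mutate.
    fn peek(&self) -> Option<&f64> {
        self.stack.back()
    }

    // Exclusive borrow: while this runs, no other reference to the calculator exists.
    fn push(&mut self, value: f64) {
        self.stack.push_back(value);
    }
}

fn main() {
    let mut calc = PostfixCalculator::new();
    calc.push(3.0);              // exclusive borrow of `calc`, released immediately
    let top = calc.peek();       // shared borrow of `calc` begins
    // calc.push(4.0);           // would not compile: needs &mut while `top` is still live
    println!("top = {:?}", top); // shared borrow ends after this last use
}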
To learn more about borrowing and aliasing, it is recommended to read these official Rust resources: The Rust Programming Language chapter 4.2, and The Rustonomicon chapters 3.1 and 3.2.
Lifetimes
Rust constantly wants to know “what exactly is that reference referencing?”. Most
things don’t live forever, so Rust also checks that developers don’t use or reference
a value after it has been moved. A move is a change in ownership, which potentially means
physically moving the value in memory and invalidating any pointers to it. drop()
,
for instance, takes ownership of an object so it can kill it. Anyone familiar
with pointers in C has a decent understanding of the concept of
pointer lifetimes: do not use the pointer after the object has been deleted or
moved. As long as a shared reference exists, no mutable references may exist
and the object must not be moved; and as long as a mutable reference
exists, no other references may exist and the object must not be moved.
The compiler enforces a more stringent test on safe code, that breaking
those rules must provably never happen, leading to some cases where you
know it will not happen, yet the compiler can not prove it, so it does not allow
it. Luckily we do not need to follow the compiler’s test, we only need to follow
those simple rules.
Unfortunately, for arbitrary code the lifetimes involved can get quite
intricate. fn foo<'a>(input: &'a T) -> TypeWithLifetime<'a>
creates a
transitive relationship between the lifetime of input and
TypeWithLifetime<'a>
. While we may be able to enforce a simple one-to-one
lifetime relationship, it’s unclear if we can feasibly enforce that A lives as
long as B lives as long as C lives as long as D lives as long as… Certainly, if it
requires invasive changes to types crossing the FFI boundary, such as every
reference in every struct needing to be converted to a RefCell<&T>
, that
would be very inconvenient for users.
Example of Lifetimes
The code does not explicitly use annotated lifetimes because it does not require them due to its simplicity. However, the concept is there implicitly:
struct PostfixCalculator {
    stack: VecDeque<f64>,
}

impl PostfixCalculator {
    fn new() -> Self {
        PostfixCalculator {
            stack: VecDeque::new(),
        }
    }

    fn evaluate(&mut self, tokens: Vec<&str>) -> Result<f64, String> {
        // `&mut self` and the `&str` tokens each carry an implicit (elided) lifetime
        unimplemented!()
    }
}
This example implicitly uses lifetimes to ensure that references within the evaluate function do not outlive the PostfixCalculator instance they reference. Rust's lifetime elision rules automatically handle this in most cases, but explicit lifetime annotations can be used for more complex scenarios.
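For a case where the elision rules are not enough, here is a hedged sketch (not part of the calculator code) of a struct that borrows a stack and therefore needs an explicit lifetime annotation:
use std::collections::VecDeque;

// `StackView` holds a reference, so it must declare the lifetime `'a` and
// cannot outlive the VecDeque it borrows from.
struct StackView<'a> {
    stack: &'a VecDeque<f64>,
}

impl<'a> StackView<'a> {
    fn top(&self) -> Option<&'a f64> {
        self.stack.back()
    }
}

fn main() {
    let stack: VecDeque<f64> = VecDeque::from([1.0, 2.0, 3.0]);
    let view = StackView { stack: &stack };
    println!("{:?}", view.top()); // prints Some(3.0)
}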
To learn more about lifetimes, it is recommended to read these official Rust resources: The Rust Programming Language chapter 10.3, and The Rustonomicon chapter 3.3.
Symbols, Extern, Generics
By default, Rust functions have an undefined application binary interface (ABI), thus
they are incompatible with what C expects. Rust functions also have mangled symbol
names1. To
guarantee a C ABI (assuming the types
themselves are C ABI compatible, the next
section provides details on that), the function declaration must be prefixed with
extern "C". So extern "C" fn foo(number: i32) -> i32 would be equivalent to the
C function int foo(int number)
. To guarantee the symbol name is that of the
function name, like in C, you must annotate the function with the
#[no_mangle] attribute
.
However, this does not cover functions with generic types. Rust allows
creating functions that act on unknown types, so that a function like
fn add<T: Add<Output=T>>(a: T, b: T) -> T { return a + b; }
can be reused
with any type as long as it implements Add
. How does the same function
handle multiple types? On the machine code level it doesn’t, that’s why
functions are first monomorphized, creating a version of the function for
every used combination of generic types. Calling add(1u32, 1u32)
would
generate a function equivalent to fn add(a: u32, b: u32) -> u32
, whereas
calling add(1u8, 1u8)
would generate fn add(a: u8, b: u8) -> u8
.
Java cannot see generic functions, it only sees monomorphized functions that
exist in the shared object file. Rust only generates monomorphizations for
types that are used in that function, so if the Rust library code does not use fn add<T: Add<Output=T>>(a: T, b: T) -> T
at all, there are no used generic
types and thus the compiler does not generate anything related to that
function. Even if it did, it can not possibly support every type a programmer
might use, especially if a function had multiple type parameters. fn foo<A, B>()
would require the square of the number of possible types. The best option, short of using dyn pointers,
is to write wrapper functions without generic parameters:
fn add_u32(a: u32, b: u32) -> u32 { return add::<u32>(a, b); }
.
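Putting those pieces together, a minimal sketch of exposing the generic add through a concrete, unmangled, C-ABI wrapper (the names here are illustrative) might look like this:
use std::ops::Add;

// The generic function: invisible to Java until it is monomorphized for a concrete type.
fn add<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}

// A concrete wrapper with fixed types, a C ABI, and an unmangled symbol name.
#[no_mangle]
pub extern "C" fn add_u32(a: u32, b: u32) -> u32 {
    add::<u32>(a, b)
}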
Specifying dyn
references in a type instructs the Rust compiler to use fat
pointers - pointers that store the normal pointer as well as a pointer to a
vtable containing methods that can be called on the pointer. This works
almost exactly like in C++ with exactly the same tradeoff. There is only
one function in the final binary (no monomorphization needed) but it is not
specialized for a type (so no automatic vectorization on integers for
instance). Additionally, it requires dereferencing the pointer to the vtable, as well as
that function then needing to dereference the real pointer once it is called.
This can lead to memory access / cache missing overhead.
It also breaks a common
idiom: Vec<T>
. &dyn Vec<T>
can be done, but chances are T
will need to be
accessed. If Vec<&dyn T>
is used, there will be
lifetime issues and it will be necessary to
restructure everything that touches the vector to deal with Vec<&dyn T>
,
even if they otherwise could have used the easier Vec<T>
. The biggest issue
with using dyn
, however, is that some trait methods simply do not work with
dyn
. The Rust Reference
specifies the conditions that are required for a method to be object-safe: it
must not return Self
directly2 (the compiler doesn’t know the ABI layout of a function with an unknown return type), it must
not take Self
as an argument directly3, and it must not use any generics beyond Self
4.
A final issue with dyn
is that fat pointers do not
have a stable ABI. There is an experimental feature,
ptr_metadata
, that allows splitting the pointer and its metadata as well as
creating a fat pointer from a raw pointer and metadata. However, the Metadata
is not object safe. DynMetadata<dyn T>
may have a stable representation
for different T
5, but it requires lots of transmuting to make that work and it
might technically be undefined behavior. Ultimately, dyn
saves some code size at
the expense of poor ergonomics, using confusing experimental
Rust features, and performance. Therefore, a developer might just be
better off writing everything in Java instead of trying to interoperate with
Rust code.
1. This means the symbol name for a function can not be known.
2. This is because the compiler does not know the ABI layout of a function with an unknown return type.
3. This is because the compiler does not know the ABI layout of a function with unknown argument types.
4. Ditto the ABI of arguments.
5. This is needed for passing it to Java through the C ABI.
Size and Alignment
Allocating a Rust object within Java to pass to Rust functions requires respecting the type’s size and alignment. If the space allocated is too small, reads and writes will run past the end of the buffer (overreads and overflows); alignment is a second, separate requirement.
An alignment of 2 means that the type must be placed at an address that is a multiple of 2. For instance, a 16-bit integer typically has an alignment of 2, so attempting to load it from, say, the address 0x7ffff01 can fault, because that address is not a multiple of 2. x86 is a little less picky than most other architectures, with the highest alignment being 4 bytes1, but ARM and most other RISCs align a type to its size. This all means that Java needs to know the alignment of a type in order to allocate space for it somewhere.
Some Rust types have well-known
alignments due to matching one-to-one with types defined in the ISA, but
most Rust types have compile-time undefined layout. However, Rust does
provide the compile-time constant functions core::mem::size_of::<T>()
and
core::mem::align_of::<T>()
for querying the size and alignment of a type.
Unfortunately, types are not guaranteed to maintain their layout across
compilations, especially if the compiler version were to change. Therefore, calls to
these functions must be made in the same compiled library as all users of
them.
Technically, SIMD vectors have higher alignment with certain instructions.
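As a sketch of how a library might report a type's layout to the Java side so that matching memory can be allocated (the calculator_size and calculator_align names are illustrative), the exported functions simply wrap size_of and align_of:
use std::collections::VecDeque;
use std::mem::{align_of, size_of};

struct PostfixCalculator {
    stack: VecDeque<f64>,
}

// Compiled into the same shared library as everything else that uses the type,
// so the reported layout matches what that library actually expects.
#[no_mangle]
pub extern "C" fn calculator_size() -> usize {
    size_of::<PostfixCalculator>()
}

#[no_mangle]
pub extern "C" fn calculator_align() -> usize {
    align_of::<PostfixCalculator>()
}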
Subtyping and Variance
As a warning, this section will be complex and type-theory heavy, but the gist for this scope is that there are three types of lifetime relationships:
- Covariant: 'a can be used where a 'b is expected if 'a is as long or longer than 'b. Shared references are covariant because a longer-living reference than required can always be given. Tree structures where you can only delete leaves kind of act like this (so a RefCell<&T> chain of references follows this).
- Contravariant: 'a can be used where a 'b is expected if 'a lives as long or shorter than 'b. This only applies to arguments inside of functions or closures, so those should be banned from use to avoid any headaches. Closures aren't application binary interface safe so they are already banned, and functions as arguments can be replaced with Java upcalls where less care is needed.
- Invariant: 'a cannot be used; the thing you pass in must live exactly as long as 'b. This applies to exclusive references, because Rust allows you to modify data behind an exclusive reference and potentially change its lifetime, and the caller would have no idea its lifetime got changed, so that would fail once the caller tries to use it within its old lifetime but outside its new lifetime. If an exclusive reference is checked for validity before every use, this can work (it's effectively RefCell<&mut T>), but that still bans every function that touches an exclusive reference directly. Honestly, this may not be truly solvable; it might just have to be invasive to the programmer.
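As a small illustration of covariance (a standalone sketch, not taken from the calculator code), a reference that lives longer than a function requires is accepted where a shorter lifetime is expected:
// `print_ref` only asks for a reference that lives at least as long as `'short`.
fn print_ref<'short>(r: &'short str) {
    println!("{}", r);
}

fn main() {
    let long_lived = String::from("outlives the call");
    // Covariance: a reference with a longer lifetime than required is accepted.
    print_ref(&long_lived);
}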
To learn more about subtyping and variance, it is recommended to read the official Rust resource The Rustonomicon chapter 3.8.
Unwinding
With the default panic handler (fittingly named “unwind”), when Rust code
calls panic!()
Rust will begin walking local variables in the call stack to drop
them, then kill the thread. If the type is mutably shared across threads,
as with a Mutex<T>, then it may be left in an inconsistent state,
though it should not be necessary to have a custom type doing that. What
is a concern is Rust calling drop on some types while they’re potentially in
inconsistent states. For example, say a JavaRef<T>
type is used to represent a
reference held by Java. If it is busy updating its pointer for instance, and it
panics in that function, Rust’s unwinding will eventually call drop()
on it, so
now the drop code is working with a JavaRef<T>
with an invalid pointer.
Rust does have another panic handler called “abort” which just prints a stack
trace and aborts the process, which might be a better option if the types being used
are not believed to be unwind-safe.
Example of Unwinding
Rust uses unwinding to handle panics (unexpected errors) by default: any panic (e.g., an out-of-bounds error) unwinds the stack, cleaning up as it goes, and unwinding can be opted out of with panic=abort for faster binaries. Recoverable errors, by contrast, are handled explicitly:
match calculator.evaluate(tokens) {
Ok(result) => println!("Result: {}", result),
Err(e) => println!("Error: {}", e),
}
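When Rust is called from Java, a panic should not be allowed to unwind across the extern "C" boundary. A sketch of containing it with std::panic::catch_unwind (checked_divide is an illustrative name, not part of the calculator code) looks like this:
use std::panic;

#[no_mangle]
pub extern "C" fn checked_divide(a: f64, b: f64) -> f64 {
    // Catch any panic before it can unwind out of this extern "C" function and into Java.
    let result = panic::catch_unwind(|| {
        if b == 0.0 {
            panic!("division by zero");
        }
        a / b
    });

    // On panic, return a sentinel value instead of unwinding across the FFI boundary.
    result.unwrap_or(f64::NAN)
}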
To learn more about unwinding, it is recommended to read the official Rust resource The Rustonomicon chapter 7.
Phantom Data
Sometimes, when working with unsafe code, there may be a situation where lifetimes are associated with a struct, but not part of a field. For example:
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
}
'a isn’t being used in the body of this struct, so it’s unbounded. In Rust,
leaving a struct’s lifetime parameter unbounded like this is not allowed because
of the implications it would have for maintaining correct variance and drop
checking. The solution Rust offers is PhantomData, which is a special marker
type. It takes up no space, but it simulates a field of the desired type
for the purposes of static analysis. It is easy to use; the
resulting struct would be:
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: marker::PhantomData<&'a T>,
}
This way, the lifetime will be bounded to a “field” of the struct Iter
. This may
bring up complications when writing a tool that automatically generates
bindings to call code because of the way it is designed. As previously
explained, method handles must be written for the different types a
function may be working with, and the FFM API
may be incompatible or unable to accommodate for a case where
PhantomData
is used.
To learn more about phantom data, it is recommended to read the official Rust resource The Rustonomicon chapter 3.10.
Send and Sync
- Send: the type can be moved between threads.
- Sync: the type can be shared between threads (logically equivalent to &T being Send).
By default, most types are Send and Sync. If a type is moved to another
thread, it is fine because it owns its data and therefore nothing else can
touch that data or cause thread safety issues. If a shared reference is moved to
another thread, that is fine because the mere existence of a
shared reference means the data can no longer mutate, so there’s nothing
needing synchronization between threads. If an exclusive reference is moved, again
it is fine because that exclusive reference is the only thing
allowed to look at or modify the underlying data, so there is no need to
synchronize anything. The only types that are not both Send and Sync are
types that cheat the aliasing and ownership rules like UnsafeCell<T>
and
Rc<T>
.
Luckily, Java actually allows for this to be enforced. Arena.ofConfined()
gives us
a thread-local memory arena, and if code tries to use a MemorySegment
allocated from this arena in another thread it will throw an exception. This is
an absolute life saver, as it allows for the use of RefCell<T>
, which is neither
Send nor Sync, and which is useful for fixing many of the incongruities
between Java and Rust’s memory models.
Example of Thread Safety and Send and Sync
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let calculator = Arc::new(Mutex::new(PostfixCalculator::new()));
let calculator_clone = Arc::clone(&calculator);
let handle = thread::spawn(move || {
let mut calc = calculator_clone.lock().unwrap();
let tokens: Vec<&str> = "3 4 +".split_whitespace().collect();
calc.evaluate(tokens)
});
match handle.join().unwrap() {
Ok(result) => println!("Result from thread: {}", result),
Err(e) => println!("Error from thread: {}", e),
}
}
Thread Safety: The Arc and Mutex wrapping of PostfixCalculator
ensures that
it can be safely shared and mutated across threads. Arc allows for shared
ownership across threads, while Mutex provides mutual exclusion,
preventing data races.
To learn more about Send and Sync traits, it is recommended to read these official Rust resources: The Rust Programming Language chapter 16.4, and The Rustonomicon chapter 8.2.
Data Races
Data races occur when multiple threads access the same memory location and at least one of them writes to it without synchronization; they cause undefined behavior. Safe Rust guarantees that no data races will occur, and a big part of this is the ownership model: by definition, if a value can have only one owner (who can make changes), then it can only be written to by that single owner. However, general race conditions are not prevented by Rust. They simply can’t be ruled out, because thread scheduling is decided by the operating system and is out of the developer’s control. This means that while a program may deadlock or have incorrect synchronization, a safe Rust program will still be memory safe.
To learn more about data races, it is recommended to read the official Rust resource The Rustonomicon chapter 8.1.
Atomics
Atomics are types that support operations in a thread-safe manner without
external synchronization. For example, consider a counter, foo, that is used
across different threads. It would not be safe to increment the counter using
foo++, because that could result in a race condition: different threads
incrementing foo at the same time cause undefined behavior. Locking can be used
to make sure one thread increments foo and then the other does, but it has
severe performance costs. Say that at first foo = 0; then, after both
threads write to it, foo = 2 should hold. The way an atomic increment handles
this is that each thread reads the current value and writes the incremented
value back only if the value has not changed in the meantime, retrying
otherwise (a compare-and-swap). This ensures that, no matter the order in
which the operating system schedules these operations, foo ends up as 2. Rust
makes it very easy to work with atomics; for foo, just write:
let foo = Arc::new(AtomicUsize::new(0));
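A minimal sketch of the counter described above, using fetch_add so that both increments are applied without a lock:
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let foo = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let foo = Arc::clone(&foo);
            // Each thread performs an atomic read-modify-write; no locking needed.
            thread::spawn(move || {
                foo.fetch_add(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Both increments are observed: foo is 2 regardless of scheduling order.
    assert_eq!(foo.load(Ordering::SeqCst), 2);
}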
To learn more about atomics, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Compiler and Hardware Reordering
Compiler Reordering
Rust’s compiler makes many optimizations to reduce the number of operations the CPU will actually have to process. Sometimes it may as well just remove operations. For example:
let mut x: i32 = 1;
x = 2;
x = 3;
The compiler would remove the second line, x = 2
, because it does not
change the result. The code will still define x
, initialize it as an i32 variable
with value 1, and end with x having the value 3. However, if the result is not
used, the compiler is likely to completely remove all mentions of x. Why
bother generating code and allocating stack space for a value nobody will
notice is missing?
Rust uses the LLVM compiler infrastructure as its backend, the same thing
that the clang C compiler and clang++ C++ compiler use to generate
machine code. LLVM is very smart, and will do things such as delete dead
code, reorder operations to better saturate out-of-order CPUs, merge
redundant operations (x += 1; x += 1
will be transformed to x += 2
), keep
things in registers rather than ever touching memory, turn loops of normal
arithmetic into loops using SIMD/vector instructions. The point is, it is not clear
what the code is actually going to look like. The only thing that is guaranteed is
that the compiler isn’t allowed to reorder things like print statements around each
other, or move x += 1
to after a function call that uses x
.
However, if there is access to another thread, these changes can be observed (with raw pointers at least, Rust won’t normally let you do this sort of thing without synchronization for a reason). So when multithreading, the developer must be explicit to the compiler: “I want all writes performed before this point to be visible before this operation, so other threads see what I want them to see”. That’s where atomics come into play.
Hardware Reordering
Despite compiler reordering, depending on the hardware architecture, some operations may be done in a different order by the CPU. This may be the case due to how memory is accessed internally. Global memory can be accessible everywhere but is slow, and cache memory is localized and faster. Programs may have different threads running at the same time. Rust guarantees that in each thread, the ordering will be correct. Despite that, having different memory access speeds means that if two threads are accessing memories that are vastly different in retrieval speed, the order in which those threads run operations may be in the wrong order relative to each other. If you now take a wrapper class into consideration, ordering might be thrown off even more. In these cases, Rust and Java’s atomic design will put more strain on hardware by stalling some threads so that order guarantees are kept.
To learn more about reordering, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Data Accesses
Another way the atomicity model Rust employs provides strong guarantees is by introducing the concept of causality and providing tools to establish relationships between different parts of a program and the threads executing them. One of these, and potentially the most important, is the “happens before” relationship. It defines the order of a program: if there is a statement 1 and a statement 2, and there is a relationship of “statement 1 happens before statement 2”, then statement 1 will be run before statement 2. This provides extra information to the compiler and hardware about the ordering of operations, and allows for bigger optimizations on operations that are not affected by the order they are executed in. Data accesses are unsynchronized, which allows compilers to move them around as much as they want to optimize performance, especially if the program is single-threaded. The downside is that this can cause data races, which result in undefined behavior. Atomic accesses tell the compiler and hardware that the program is multi-threaded. They are marked with an ordering, which limits how the compiler and hardware can reorder these statements. In Rust, there are four orderings: sequentially consistent, release, acquire, and relaxed.
To learn more about data accesses, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Orderings
Sequentially Consistent
As its name suggests, operations that are sequentially consistent will be executed sequentially. In other words, it guarantees that the execution of a multi-threaded program behaves as if all threads’ operations were interleaved into a single total order that every thread observes, with no reordering within a thread. This means that if thread A is supposed to write to value x before thread B writes to value x, B will only be able to write to x once A has. It is implemented using memory barriers: they protect x from B, and only let their guard down once A has written to it. Compiler and hardware reordering make a big difference in performance, so restricting them this heavily tends to hurt performance.
Acquire and Release
Acquire and release work closely together, much like acquiring and releasing a lock. This is similar to how a lock on a door works in real life: on the outside anything can happen, but once a room is entered through the door, the space inside is separated from the outside. In terms of ordering, this means that operations written after the acquire cannot be moved before it, and operations written before the release cannot be moved after it, so the whole block of code executes as a unit relative to the “outside world”. Once the block has executed and the lock is released, the operations that come after it are free to be reordered again.
Relaxed
Relaxed data accesses can be reordered freely and do not provide the
“happens before” relationship. Despite that, they are still atomic, and they
are used when a section needs to be executed atomically without its order really
mattering. For example, using fetch_add() is a safe way of incrementing a
counter, assuming the counter isn’t used to determine other accesses.
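A sketch of the release/acquire pairing described above, where one thread publishes a value and another waits for it (the variable names are illustrative):
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let data = Arc::clone(&data);
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);    // the write we want to publish
            ready.store(true, Ordering::Release); // release: everything above becomes visible
        })
    };

    let consumer = thread::spawn(move || {
        while !ready.load(Ordering::Acquire) {}       // acquire: pairs with the release store
        assert_eq!(data.load(Ordering::Relaxed), 42); // guaranteed to observe the published value
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}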
To learn more about orderings, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Uninitialized Memory
Rust allows developers to work with uninitialized memory. All memory that is allocated during runtime is uninitialized at first, and it will contain garbage values. Any novice programmer will know that working with this memory will cause undefined behavior. Regardless, Rust provides ways of working with uninitialized memory in safe and unsafe ways.
Checked
Rust by default doesn’t allow access to a memory segment that has not been initialized yet. This is great for Java-Rust bindings: even if the Java side attempts to use memory that Rust allocated through the FFM API before it has been initialized (something Java itself would not stop, and which would normally produce undefined behavior), Rust’s rules prevent that access from producing undefined behavior or garbage values on the Rust side.
Drop Flags
This is related to the concept of lifetimes. Whenever a variable goes out of
scope (suppose a variable named x defined as let mut x = Box::new(0);), Rust
consults its drop flag and, if the value is still initialized, runs its drop
function, drop(x).
The concept of ownership applies here too, where there can be only one owner of a
memory segment.
Drop flags are tracked on the stack, and Rust decides when to drop a
value during runtime. This is relevant to creating bindings, because even though
Rust may have dropped a value, the Java variable that points to it when
using the FFM API
usually would not know that
happened. Having access to a drop flag allows for tracking when such
behavior happens, so they can be invalidated from the Java side too.
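A sketch of when a drop flag is actually needed at runtime: the value is initialized on only one branch, so whether drop must run is only known when the program executes (the condition here is illustrative):
fn main() {
    let condition = std::env::args().count() > 1; // known only at runtime
    let x;
    if condition {
        x = Box::new(0);
        // `x` is initialized only on this branch, so the compiler keeps a hidden
        // drop flag and checks it at the end of the scope to decide whether
        // drop(x) must run.
        println!("{}", x);
    }
    // If the branch was not taken, the drop flag says "not initialized" and
    // nothing is dropped here.
}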
Unchecked
Arrays cannot be partially initialized, since null does not exist in Rust, so arrays that are
defined have to be fully initialized, with a value in every element. This can make
developing code harder, especially when trying to work with dynamically
allocated arrays. To solve this, Rust implements the MaybeUninit
type.
For example, to define an array that may be uninitialized, we would write:
let mut x: [MaybeUninit<Box<u32>>; SIZE] = unsafe {
    MaybeUninit::uninit().assume_init()
};
This works because the MaybeUninit
is the only type that can be partially
initialized, and .assume_init()
makes the Rust compiler think that the array
of MaybeUninit<T>
was fully initialized. In this case, each element holds a
Box, which is an owning pointer to a u32
. The array can then be initialized
with the following:
for i in 0..SIZE {
x[i] = MaybeUninit::new(Box::new(i as u32));
}
Usually, when working with an array of pointers, assigning a new value to
x[i]
would mean that the left hand side value would be dropped. But this is not
a problem when the left hand side contains MaybeUninit<Box<u32>>
because it does not contain anything, it just works as a placeholder. Finally,
the array that may be uninitialized can be turned into an array that we know
has been fully initialized with this line of code:
unsafe { mem::transmute::<_, [Box<u32>; SIZE]>(x) }
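Putting the fragments above together, a minimal runnable sketch (assuming SIZE is a small constant chosen for illustration):
use std::mem::{self, MaybeUninit};

const SIZE: usize = 4;

fn main() {
    // Start with an array of placeholders; nothing inside is initialized yet.
    let mut x: [MaybeUninit<Box<u32>>; SIZE] =
        unsafe { MaybeUninit::uninit().assume_init() };

    // Initialize every slot; overwriting a MaybeUninit does not drop anything.
    for i in 0..SIZE {
        x[i] = MaybeUninit::new(Box::new(i as u32));
    }

    // Now that every element is initialized, reinterpret it as a normal array.
    let x: [Box<u32>; SIZE] = unsafe { mem::transmute::<_, [Box<u32>; SIZE]>(x) };
    println!("{:?}", x);
}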
To learn more about uninitialized memory, it is recommended to read the official Rust resource The Rustonomicon chapter 5.