Java Bindings for Rust: A Comprehensive Guide
by Akil Mohideen, Natalya McKay, Santiago Martinez Sverko, and Seth Kaul Sponsored by Ethan McCue
This document assumes you have Rust 1.81.0 and Java 22 or later. If you have not installed them, see the official Rust and Java download pages.
Introduction
Welcome to Java Bindings for Rust: A Comprehensive Guide, a guide to using Rust from within Java. This process can be notoriously confusing, and the information on how to do it is dense and scattered across various sources. This guide teaches how to make these bindings in a digestible way. Every section was written to be as short and readable as possible without omitting important information. Complex details are kept in their own sections and linked to where they are applicable.
Purpose of the Manual
The purpose of this manual is to provide a comprehensive guide for creating Java bindings to Rust libraries using Java 22 and Rust 1.81.0. It will cover the essential steps and concepts required to allow Java applications to call Rust functions, making use of the Foreign Function and Memory API. By the end of this manual, developers will be able to seamlessly integrate Rust’s high-performance, memory-safe capabilities into their Java applications, enabling cross-language functionality.
Why Java 22?
Java 22 introduces the Foreign Function and Memory API (FFM API), a modern alternative to the legacy Java Native Interface (JNI). JNI was traditionally used to interact with C-like functions and data types in external libraries. However, JNI is cumbersome, error-prone, and introduces significant overhead due to repeated native function calls and lack of Just-In-Time (JIT) optimizations. Java objects needed to be passed through JNI, requiring additional work on the native side to identify object types and data locations, making the entire process tedious and slow. With the FFM API, Java now pushes much of the integration work to the Java side, eliminating the need for custom C headers and providing more visibility for the JIT compiler.
This change leads to Better Performance, as the JIT compiler can now optimize calls to native libraries more effectively. It also leads to Simplified Integration because there are fewer requirements on native function signatures. This reduces the overhead of native-to-Java translation. Additionally, the API provides Enhanced Flexibility, as it supports working with various languages like Rust while maintaining full control over how memory and function calls are handled.
Java 22 is the first version to stabilize this API, making it the ideal choice for this manual. It enables efficient, direct interaction with Rust libraries without the historical drawbacks of JNI.
How Java and Rust Work Together
Rust is a system-level language that provides fine-grained control over memory management, making it a popular choice for performance-critical applications. Java, on the other hand, excels in providing portability and high-level abstractions. By using the FFM API in Java 22, developers can leverage Rust’s performance and memory safety in Java applications.
It provides access to classes such as SymbolLookup, FunctionDescriptor, Linker, MethodHandle, Arena, and MemorySegment, which enable Java to call foreign functions and manage memory in more effective ways. On Rust's end, functions exposed to Java must adhere to the C ABI, ensuring compatibility between the two languages. The manual will explore how to allocate, manage, and release memory safely between Java and Rust, ensuring optimal performance and avoiding memory leaks or undefined behavior.
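As a minimal illustration of how these classes fit together, here is a hedged sketch that calls the C library's strlen through the FFM API (chosen instead of a Rust function so it runs with no extra setup; requires Java 22):

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {
    // Looks up the C library's strlen via the linker's default lookup
    static long strlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        MethodHandle strlen = linker.downcallHandle(
            linker.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string into native memory as a NUL-terminated C string
            MemorySegment cString = arena.allocateFrom(s);
            return (long) strlen.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlen("hello")); // prints 5
    }
}
```

The same five steps — lookup, descriptor, downcall handle, arena allocation, invocation — reappear later with a Rust library in place of libc.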
What This Manual Hopes to Accomplish
By the end, this manual will:
- Provide a Step-by-Step Guide: Developers will be walked through setting up bindings between Rust and Java, and configuring these bindings for projects.
- Demonstrate Practical Examples: Examples of properly designed bindings will be provided and explained. These examples will be provided for both easy and complex topics including exposing Rust functions, handling complex data types, managing lifetimes and memory, and handling multi-threading.
- Simplify Rust-Java Integration: The manual will demystify the integration process, helping developers avoid common pitfalls related to ownership, memory management, and data layout discrepancies.
- Address Advanced Topics: In addition to the basics, the manual will explore advanced topics such as thread safety, handling Rust’s ownership and borrowing rules in Java, and how to handle complex data structures and edge cases.
By following this guide, developers will gain a deep understanding of how to efficiently and safely call Rust libraries from Java, making full use of both Java 22’s FFM API and Rust’s robust performance and memory safety features.
Setting Up and Linking Rust and Java
In this section, we will explain how to set up and link both Rust and Java code in order to create Java bindings for Rust libraries. This process involves exporting Rust functions in a way Java can access, and using Java's FFM API to dynamically link to Rust code. We will also cover how to work with FunctionDescriptor, Arena, MemoryLayout, and other key components necessary to ensure safe and efficient communication between Java and Rust.
How Rust and Java Communicate
Java and Rust communicate through dynamic linking, where Rust compiles into a shared library file (e.g., .dll on Windows, .so on Linux, .dylib on macOS). Java loads this library and can interact with its functions.
At a high level, the process looks like this:
- Rust: Write Rust functions, export them with #[no_mangle] and extern "C", and compile them into a shared library.
- Java: Use the Java FFM API to load the shared library, find the Rust functions, and invoke them.
The next sections will go step by step through both the Rust and Java setups.
Setting Up Rust
Step 1: Exporting Rust Functions
To make Rust functions callable from Java, we need to do two things:
- Use #[no_mangle] to prevent Rust from renaming (mangling) the function name internally. This ensures Java can find the function by its exact name.
- Declare the function with extern "C" to make sure it uses the C Application Binary Interface (ABI), which Java understands.
Rust Example:
#[no_mangle]
pub extern "C" fn create_point(x: i32, y: i32) -> *mut Point {
    Box::into_raw(Box::new(Point { x, y }))
}

#[no_mangle]
pub extern "C" fn get_x(point: *mut Point) -> i32 {
    unsafe { (*point).x }
}

#[no_mangle]
pub extern "C" fn free_point(point: *mut Point) {
    unsafe { drop(Box::from_raw(point)); } // Frees the allocated memory
}

#[repr(C)]
pub struct Point {
    x: i32,
    y: i32,
}
Explanation:
*mut Point: The function returns a raw pointer (*mut Point), which Java can manage using the FFM API.
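Before wiring up Java, the exported functions can be exercised from Rust itself to catch ownership mistakes (double frees, leaks) early. A self-contained sketch that repeats the definitions above so it compiles on its own:

```rust
#[repr(C)]
pub struct Point {
    x: i32,
    y: i32,
}

#[no_mangle]
pub extern "C" fn create_point(x: i32, y: i32) -> *mut Point {
    Box::into_raw(Box::new(Point { x, y }))
}

#[no_mangle]
pub extern "C" fn get_x(point: *mut Point) -> i32 {
    unsafe { (*point).x }
}

#[no_mangle]
pub extern "C" fn free_point(point: *mut Point) {
    // Reconstituting the Box lets Rust drop and free the allocation
    unsafe { drop(Box::from_raw(point)); }
}

// Native-side smoke test: pair every create_point with exactly one free_point
pub fn smoke_test() {
    let p = create_point(10, 20);
    assert_eq!(get_x(p), 10);
    free_point(p);
}
```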
Step 2: Compiling Rust into a Shared Library
To compile the Rust code into a format Java can load, modify the Cargo.toml file:
[lib]
crate-type = ["cdylib"]
Then, compile the Rust project into a shared library:
cargo build --release
This command will generate a shared library file (e.g., libmyrustlib.so or myrustlib.dll) in the target/release/ directory, which Java can dynamically load.
Setting Up Java
Once the Rust library is compiled, Java can load the shared library and access the Rust functions.
Step 1: Loading the Rust Shared Library
Java uses SymbolLookup to load the shared library and retrieve the addresses of the Rust functions. Java’s Linker then binds those addresses to callable MethodHandle objects, which represent native functions in Java.
Here’s how to load the Rust library and link the create_point function:
Java Example
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;
public class RustBindings {
static MethodHandle createPoint;
static MethodHandle getX;
static MethodHandle freePoint;
static {
var linker = Linker.nativeLinker(); // Initializes the native linker
var lib = SymbolLookup.libraryLookup("libmyrustlib.so", Arena.global()); // Loads the Rust library
// Link the Rust functions
createPoint = linker.downcallHandle(
lib.find("create_point").orElseThrow(),
FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.JAVA_INT, ValueLayout.JAVA_INT)
);
getX = linker.downcallHandle(
lib.find("get_x").orElseThrow(),
FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS)
);
freePoint = linker.downcallHandle(
lib.find("free_point").orElseThrow(),
FunctionDescriptor.ofVoid(ValueLayout.ADDRESS)
);
}
}
Explanation:
- libraryLookup: Loads the Rust shared library (libmyrustlib.so). The library must be available on the system’s library path, or be given by an absolute path (the Java classpath is not searched for native libraries).
- FunctionDescriptor: Defines the signature of the Rust function in Java terms. For example:
  - ValueLayout.ADDRESS: Corresponds to a pointer (Rust’s *mut).
  - ValueLayout.JAVA_INT: Corresponds to Rust’s i32.
- MethodHandle: Represents the linked Rust function. This is how Java will call the Rust function.
Step 2: Calling Rust Functions from Java
With the library loaded and the functions linked, we can now call the Rust functions from Java using MethodHandle.invokeExact(). Here’s how to create a point in Rust, get its x value, and free the memory:
Java Example
public class Main {
public static void main(String[] args) throws Throwable {
// Create a point in Rust
MemorySegment point = (MemorySegment) RustBindings.createPoint.invokeExact(10, 20);
// Get the x value from the point
int xValue = (int) RustBindings.getX.invokeExact(point);
System.out.println("X value: " + xValue);
// Free the Rust point
RustBindings.freePoint.invokeExact(point);
}
}
Explanation:
- MemorySegment: This is Java’s way of handling memory passed to and from Rust. Here, it represents the raw pointer to the Rust Point structure.
- invokeExact(): Calls the linked Rust function with the specified arguments. In this case:
  - RustBindings.createPoint.invokeExact(10, 20) creates a Point in Rust with x = 10 and y = 20.
  - RustBindings.getX.invokeExact(point) retrieves the x value from the Rust point.
  - RustBindings.freePoint.invokeExact(point) frees the memory in Rust.
Mapping Rust Features to Java
This section will cover how to properly account for features that are important to Rust. For each Rust concept, we show how to identify the feature and handle it properly in Java.
By the end of this section, the steps to analyze a Rust function, and determine what needs to be written in Java to bind it correctly, will be clear. This includes handling lifetimes, ownership, memory layouts, thread safety, and more.
Handling Ownership and Borrowing
Identifying Ownership and Borrowing in Rust
Rust enforces strict ownership rules. When a function in Rust takes ownership of a value (e.g., Box, Vec), the caller no longer owns that value and cannot use it again unless ownership is returned. Borrowing (&T or &mut T) allows temporary access to a value without transferring ownership.
Example:
fn take_ownership(v: Vec<i32>) -> Vec<i32> {
    // Takes ownership of v
    v
}

fn borrow(v: &Vec<i32>) -> i32 {
    // Borrows v temporarily
    v[0]
}
Handling Ownership in Java
When Rust functions take ownership of values, Java needs to manage when to free the underlying memory. If Java created the object (e.g., by calling a Rust constructor that returns a Box), it must free the object explicitly. Java must also ensure the memory stays valid for the duration of any borrowed reference.
What You Need to Do:
- For functions that take ownership: Call the appropriate Rust cleanup function (such as a drop or free wrapper) through a MethodHandle in Java.
- For borrowed references: Manage memory using an Arena to ensure that it remains valid for the borrowed duration.
Java Example (Handling Ownership):
// Create a Rust-owned Box and pass ownership
MemorySegment rustBox = (MemorySegment) RustBindings.createBox.invokeExact(10);
// Call Rust function to take ownership of the box
RustBindings.takeOwnership.invokeExact(rustBox);
// Manually free the Box when done
RustBindings.freeBox.invokeExact(rustBox); // Ensures no memory leaks
Explanation:
MemorySegment represents the Rust-allocated memory in Java. Java explicitly calls freeBox to release the memory once it is done with it. Note that if a Rust function truly consumes (drops) the value, Java must not also free it: freeing the same allocation twice is undefined behavior.
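The Rust side of the createBox and freeBox handles is not shown in this manual; a plausible, hedged sketch of what it would look like (the snake_case names are assumed to match the Java bindings above):

```rust
// Hypothetical Rust counterparts for the Java `createBox` / `freeBox` handles.
#[no_mangle]
pub extern "C" fn create_box(value: i32) -> *mut i32 {
    // Heap-allocate the value and leak the Box so the caller owns the pointer
    Box::into_raw(Box::new(value))
}

#[no_mangle]
pub extern "C" fn free_box(ptr: *mut i32) {
    if !ptr.is_null() {
        // Reconstitute the Box so Rust drops and frees it exactly once
        unsafe { drop(Box::from_raw(ptr)); }
    }
}
```

The null check makes free_box safe to call defensively; calling it twice on the same non-null pointer is still a double free.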
Memory Layouts and Structs
Identifying Structs and Memory Layouts in Rust
When Rust returns complex data types like structs or arrays, Java needs to correctly interpret their memory layout. Rust’s struct fields are aligned in memory based on their type sizes, so Java must use StructLayout and ValueLayout to match the Rust memory layout exactly.
Example:
#[repr(C)]
struct Point {
    x: i32,
    y: i32,
}
The #[repr(C)] attribute ensures that the memory layout of Point follows the C ABI, making it compatible with Java’s FFM API.
Handling Structs in Java
Java uses StructLayout to define memory layouts that match Rust’s struct layouts. When dealing with Rust structs, it’s essential to ensure that the memory allocated on the Java side is properly aligned and of the correct size to match the layout of the Rust struct.
What You Need to Do:
- Use StructLayout to define the memory layout that mirrors the fields of the Rust struct.
- Allocate a MemorySegment that is large enough and properly aligned to hold the struct’s data.
Java Example (Handling Structs):
// Define the memory layout of the Rust `Point` struct in Java
StructLayout pointLayout = MemoryLayout.structLayout(
ValueLayout.JAVA_INT.withName("x"), // Field `x` (i32 in Rust)
ValueLayout.JAVA_INT.withName("y") // Field `y` (i32 in Rust)
);
// Allocate memory for the struct
var arena = Arena.ofConfined(); // Confined Arena for memory management
MemorySegment pointSegment = arena.allocate(pointLayout);
// Set the fields of the Point struct
VarHandle xHandle = pointLayout.varHandle(PathElement.groupElement("x"));
VarHandle yHandle = pointLayout.varHandle(PathElement.groupElement("y"));
xHandle.set(pointSegment, 0, 10); // Set x to 10
yHandle.set(pointSegment, 0, 20); // Set y to 20
Explanation:
- StructLayout: Defines the layout of the Rust Point struct, where each field is aligned according to its type (in this case, both fields are i32, so each is 4 bytes).
- VarHandle: Used to access and set individual fields (x and y) in the memory segment allocated for the struct.
- MemorySegment: Represents the allocated memory for the struct, which Java can safely manipulate according to the struct’s layout.
Handling Thread Safety in Rust
Identifying Thread Safety in Rust
In Rust, thread safety is ensured using the Send and Sync traits. If a Rust function operates across multiple threads, the types used in the function must implement Send or Sync. For example, if a Rust function uses a Mutex or Arc to manage shared data, it is thread-safe.
Example:
use std::sync::{Arc, Mutex};

pub fn create_shared_data() -> Arc<Mutex<i32>> {
    Arc::new(Mutex::new(42))
}
The function returns a thread-safe Arc<Mutex<i32>>, which ensures that multiple threads can safely access and modify the shared data.
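A caveat worth noting: a function returning Arc<Mutex<i32>> by value is not directly callable over the C ABI. A hedged sketch of how such shared state is usually exposed instead — behind an opaque pointer, with the function names purely illustrative:

```rust
use std::sync::{Arc, Mutex};

#[no_mangle]
pub extern "C" fn shared_new() -> *const Mutex<i32> {
    // Arc::into_raw keeps one strong reference alive until shared_free runs
    Arc::into_raw(Arc::new(Mutex::new(42)))
}

#[no_mangle]
pub extern "C" fn shared_add(ptr: *const Mutex<i32>, delta: i32) -> i32 {
    let mutex = unsafe { &*ptr };
    // The Mutex itself enforces mutual exclusion, regardless of which
    // language the calling thread comes from
    let mut guard = mutex.lock().unwrap();
    *guard += delta;
    *guard
}

#[no_mangle]
pub extern "C" fn shared_free(ptr: *const Mutex<i32>) {
    // Reconstruct the Arc and let it drop, releasing the allocation
    unsafe { drop(Arc::from_raw(ptr)) };
}
```

Because the locking lives on the Rust side, Java threads calling shared_add concurrently are already serialized by the Mutex.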
Ensuring Thread Safety in Java
When dealing with thread safety across languages, Java must ensure that memory is safely shared between threads. Java’s FFM API provides Shared Arenas, which allow memory to be safely accessed by multiple threads.
What to Do:
- Use Shared Arenas when shared memory or thread-safe operations are expected in Rust.
- Java also provides synchronization mechanisms like synchronized blocks to ensure thread safety.
Java Example (Handling Thread Safety):
// Create a shared arena for multi-threaded operations
var sharedArena = Arena.ofShared();
MemorySegment sharedSegment = sharedArena.allocate(8); // Allocate space for shared memory
// Call Rust function that operates on shared data
RustBindings.createSharedData.invokeExact(sharedSegment);
// Access shared data across threads (ensure proper synchronization in Java)
synchronized (sharedSegment) {
// Safe access to shared memory here
}
Explanation:
- Shared Arena: Ensures that memory is safely shared across threads in Java when interacting with Rust’s thread-safe types like Arc and Mutex.
- Synchronized Block: Ensures that only one Java thread accesses the shared memory at a time, mimicking Rust’s ownership rules for shared data.
Handling Common Data Structures
This section will walk through how to handle common Rust data structures (like structs, arrays, and enums) in Java, explaining why each element is needed, how it functions, and what to watch out for. We’ll go through practical examples, showing how to declare, access, and clean up these data structures from Java.
Handling Rust Structs in Java
Rust Side
In Rust, a struct is a user-defined type that groups related values. Structs use specific memory layouts, which must match on the Java side. The layout of structs is especially crucial for cross-language bindings because memory misalignment can lead to undefined behavior.
Example Rust Struct:
#[repr(C)] // Ensures compatibility with C-style memory layout
struct Point {
    x: i32,
    y: i32,
}
Explanation: The #[repr(C)] attribute ensures that the struct is laid out in memory according to the C ABI, which is compatible with Java's FFM API.
Java Side
To use this struct in Java, we need to:
- Define a StructLayout that matches the Rust struct layout.
- Use VarHandles to access each struct field.
Example Java Code:
StructLayout pointLayout = MemoryLayout.structLayout(
ValueLayout.JAVA_INT.withName("x"), // Maps to Rust's i32 `x`
ValueLayout.JAVA_INT.withName("y") // Maps to Rust's i32 `y`
);
VarHandle xHandle = pointLayout.varHandle(PathElement.groupElement("x"));
VarHandle yHandle = pointLayout.varHandle(PathElement.groupElement("y"));
Explanation:
- ValueLayout.JAVA_INT: This matches Rust’s i32 type.
- withName("x") and withName("y"): Naming each field lets us retrieve a VarHandle to read and write specific fields of the MemorySegment that represents the Rust struct.
Allocating and Using the Struct
- Allocate Memory: Use an arena to manage the memory allocation.
- Access Fields: Access x and y using VarHandles.
var arena = Arena.ofConfined();
MemorySegment pointSegment = arena.allocate(pointLayout);
xHandle.set(pointSegment, 0, 10); // Set x = 10
yHandle.set(pointSegment, 0, 20); // Set y = 20
int x = (int) xHandle.get(pointSegment, 0L); // Get x value
int y = (int) yHandle.get(pointSegment, 0L); // Get y value
Explanation:
Arena Allocation: Using an arena (e.g., Arena.ofConfined()) ensures the struct’s memory is safely managed.
Set and Get Values: VarHandle operations allow us to interact with Rust struct fields directly, facilitating cross-language data manipulation.
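Putting the fragments above together, a complete, runnable version of the struct round trip — pure Java, no native library needed, so it can be tried before any Rust code exists (requires Java 22):

```java
import java.lang.foreign.*;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.invoke.VarHandle;

public class PointLayoutDemo {
    // Writes (x, y) into an off-heap struct and reads x back via VarHandles
    static int writeAndReadX(int xv, int yv) {
        StructLayout pointLayout = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));
        VarHandle xHandle = pointLayout.varHandle(PathElement.groupElement("x"));
        VarHandle yHandle = pointLayout.varHandle(PathElement.groupElement("y"));
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment point = arena.allocate(pointLayout);
            xHandle.set(point, 0L, xv);
            yHandle.set(point, 0L, yv);
            return (int) xHandle.get(point, 0L);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadX(10, 20)); // prints 10
    }
}
```

The try-with-resources block closes the confined arena automatically, freeing the segment when the method returns.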
Handling Rust Arrays in Java
Rust Side
Arrays in Rust are fixed-size collections, and their size and layout must be precisely known for Java to interact with them effectively.
Example Rust Array:
#[no_mangle]
pub extern "C" fn create_array() -> *mut [i32; 5] {
    Box::into_raw(Box::new([1, 2, 3, 4, 5]))
}
Explanation: Box::into_raw creates a raw pointer, enabling Java to handle the array. Here, #[no_mangle] ensures the Rust function name remains unmangled, making it accessible from Java.
Java Side
To handle arrays from Rust in Java:
- Define a SequenceLayout for the array.
- Access elements via VarHandle.
SequenceLayout arrayLayout = MemoryLayout.sequenceLayout(5, ValueLayout.JAVA_INT);
VarHandle elementHandle = arrayLayout.varHandle(PathElement.sequenceElement());
Explanation:
SequenceLayout: This layout describes a fixed-size array (5 elements of i32).
VarHandle: Provides access to each element in the array.
Allocating and Accessing Elements
var arena = Arena.ofConfined();
MemorySegment arraySegment = arena.allocate(arrayLayout);
for (int i = 0; i < 5; i++) {
int value = (int) elementHandle.get(arraySegment, 0L, (long) i);
System.out.println("Array element " + i + ": " + value);
}
Explanation:
Memory Allocation: The array memory is managed within an arena, ensuring safety and easy cleanup.
Element Access: Each element is accessed via elementHandle, following Rust’s array layout.
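A self-contained version of the array walk above (pure Java, Java 22; freshly allocated arena memory is zero-initialized, so elements read back 0 until written — here we fill them first):

```java
import java.lang.foreign.*;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.invoke.VarHandle;

public class ArrayLayoutDemo {
    // Fills a 5-element off-heap int array with 1..5 and returns the sum
    static int fillAndSum() {
        SequenceLayout arrayLayout = MemoryLayout.sequenceLayout(5, ValueLayout.JAVA_INT);
        // Coordinates: (segment, base offset, element index)
        VarHandle elementHandle = arrayLayout.varHandle(PathElement.sequenceElement());
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment array = arena.allocate(arrayLayout);
            for (long i = 0; i < 5; i++) {
                elementHandle.set(array, 0L, i, (int) (i + 1));
            }
            int sum = 0;
            for (long i = 0; i < 5; i++) {
                sum += (int) elementHandle.get(array, 0L, i);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum()); // prints 15
    }
}
```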
Handling Rust Vectors (Vec<T>) in Java
Rust Side
In Rust, a Vec<T> is a dynamically sized array that includes metadata such as capacity and length. Working with vectors across FFI boundaries requires us to manage these details carefully on both sides.
Example Rust Vector:
#[no_mangle]
extern "C" fn create_vector() -> *mut Vec<i32> {
    Box::into_raw(Box::new(vec![10, 20, 30]))
}

#[no_mangle]
extern "C" fn vector_push(vec: *mut Vec<i32>, value: i32) {
    unsafe {
        if let Some(vec) = vec.as_mut() {
            vec.push(value);
        }
    }
}

#[no_mangle]
extern "C" fn vector_get(vec: *const Vec<i32>, index: usize) -> i32 {
    unsafe {
        if let Some(vec) = vec.as_ref() {
            vec[index]
        } else {
            0 // Or some error handling
        }
    }
}

#[no_mangle]
extern "C" fn vector_len(vec: *const Vec<i32>) -> usize {
    unsafe {
        if let Some(vec) = vec.as_ref() {
            vec.len()
        } else {
            0
        }
    }
}
Explanation:
- create_vector: Initializes a Vec<i32> and returns a raw pointer so that Java can manage the vector.
- vector_push: Adds elements to the vector, with error handling in case of null pointers.
- vector_get and vector_len: Fetch elements from the vector and get its length, making direct access possible from Java.
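One gap in the example above: no destructor is exported, so the vector returned by create_vector is never freed. A matching cleanup function (the name vector_free is assumed, not part of the original) would look like:

```rust
// Hypothetical destructor pairing with create_vector above.
#[no_mangle]
extern "C" fn vector_free(vec: *mut Vec<i32>) {
    if !vec.is_null() {
        // Box::from_raw reclaims ownership; dropping it frees the Vec
        unsafe { drop(Box::from_raw(vec)); }
    }
}
```

Java would bind this with FunctionDescriptor.ofVoid(ValueLayout.ADDRESS) and call it exactly once per create_vector, mirroring the free_point pattern earlier.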
Java Side
To handle Vec<T> in Java:
- Treat the *mut Vec<i32> returned by Rust as an opaque handle: Rust does not guarantee the internal layout of Vec, so Java should not read its fields directly.
- Use MethodHandles to call the exported Rust functions that manipulate the vector.
Example Java Code:
// Illustrative sketch of Vec<i32>'s internals (pointer, capacity, length).
// Rust does not guarantee this layout, so treat Vec pointers as opaque in practice.
StructLayout vecLayout = MemoryLayout.structLayout(
    ValueLayout.ADDRESS.withName("ptr"), // Data pointer
    ValueLayout.JAVA_LONG.withName("cap"), // Capacity
    ValueLayout.JAVA_LONG.withName("len") // Length
);
// MethodHandles to call Rust functions
MethodHandle vectorPush = linker.downcallHandle(
    symbolLookup.find("vector_push").get(),
    FunctionDescriptor.ofVoid(ValueLayout.ADDRESS, ValueLayout.JAVA_INT)
);
MethodHandle vectorGet = linker.downcallHandle(
    symbolLookup.find("vector_get").get(),
    FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG)
);
MethodHandle vectorLen = linker.downcallHandle(
    symbolLookup.find("vector_len").get(),
    FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS)
);
Explanation:
- vecLayout: Sketches the pointer, capacity, and length fields inside a Vec<T>; since Rust does not guarantee this layout, it is shown for illustration only.
- MethodHandles (vectorPush, vectorGet, vectorLen): Enable Java to interact with the vector’s core functions.
Allocating and Using the Vector
// Obtain an opaque pointer to the Rust-owned vector (rather than allocating one in Java)
MethodHandle createVector = linker.downcallHandle(
    symbolLookup.find("create_vector").get(),
    FunctionDescriptor.of(ValueLayout.ADDRESS)
);
MemorySegment vecPtr = (MemorySegment) createVector.invokeExact();
vectorPush.invokeExact(vecPtr, 42); // Push 42 to vector
long len = (long) vectorLen.invokeExact(vecPtr); // Get vector length
int value = (int) vectorGet.invokeExact(vecPtr, 0L); // Get first element
Explanation:
Rust-Owned Memory: The vector lives in memory owned by Rust; Java only holds the opaque pointer returned by create_vector.
Push, Length, and Get: MethodHandle invocations facilitate direct manipulation of the Rust vector from Java.
Handling Rust Slices (&[T] and &mut [T]) in Java
Rust Side
In Rust, slices (&[T] and &mut [T]) represent a reference to a contiguous sequence of elements, without ownership. For FFI, we pass both the pointer to the data and the length of the slice.
Example Rust Slice:
#[no_mangle]
extern "C" fn sum_slice(slice: *const i32, len: usize) -> i32 {
    let slice = unsafe { std::slice::from_raw_parts(slice, len) };
    slice.iter().sum()
}
Explanation:
sum_slice: Accepts a pointer and a length, allowing Rust to treat them as a slice. This approach enables safe manipulation and reading of slice data in Rust while preserving FFI compatibility.
Java Side
To interact with Rust slices from Java:
- Define a SequenceLayout that reflects the slice structure.
- Use a MethodHandle to invoke Rust’s functions on the slice.
Example Java Code:
// Define the layout for an array of 5 integers
SequenceLayout sliceLayout = MemoryLayout.sequenceLayout(5, ValueLayout.JAVA_INT);
// MethodHandle for sum_slice function
MethodHandle sumSlice = linker.downcallHandle(
symbolLookup.find("sum_slice").get(),
FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG)
);
Explanation:
- sliceLayout: Defines the memory layout for a fixed-size slice.
- MethodHandle (sumSlice): Links to Rust’s sum_slice function, allowing Java to call it with a memory segment and length.
Allocating and Accessing Slice Elements
var arena = Arena.ofConfined();
MemorySegment sliceSegment = arena.allocate(sliceLayout);
VarHandle intHandle = ValueLayout.JAVA_INT.varHandle();
intHandle.set(sliceSegment, 0L, 10); // element 0 at byte offset 0
intHandle.set(sliceSegment, 4L, 20); // element 1 at byte offset 4 (each i32 is 4 bytes)
int result = (int) sumSlice.invokeExact(sliceSegment, 5L); // Sum the slice
System.out.println("Sum of slice elements: " + result);
Explanation: Arena Allocation: Allocates the slice’s memory in an arena for safe usage.
Setting and Summing Elements: Uses VarHandles for direct element access and sumSlice for calculating the sum, bridging Rust’s slice handling with Java effectively.
Edge Cases and Troubleshooting
This section is designed to provide solutions for challenging edge cases and common errors Java developers may encounter when working with Rust bindings. Each subsection includes practical examples in Rust and Java, with solutions and explanations on handling complex scenarios such as memory alignment issues, lifetimes, and data races.
Handling Rust Lifetimes in Java
Rust’s lifetime annotations ensure that references do not outlive the data they point to. Since Java lacks a direct equivalent, memory management must be handled with precision to avoid accessing invalidated memory.
Example: Short-Lived Borrowed Reference
#[no_mangle]
pub extern "C" fn get_reference<'a>(value: &'a i32) -> &'a i32 {
    value
}
Here, get_reference returns a reference to an integer. In Rust, the lifetime 'a ensures that the reference value will be valid while it’s borrowed. This reference cannot outlive its source.
Java Side Solution:
To prevent accessing invalid memory, Java can use confined arenas for short-lived data.
var arena = Arena.ofConfined();
MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT); // Allocate memory for the reference
MethodHandle getReference = RustBindings.getReferenceHandle();
// Pass and retrieve the reference within the arena's lifetime
int value = 42;
segment.set(ValueLayout.JAVA_INT, 0, value);
MemorySegment borrowed = (MemorySegment) getReference.invokeExact(segment);
arena.close(); // Ensures memory is freed
Explanation and Solution:
Confined Arena: The confined arena restricts access to a single thread, ensuring safe memory management. The arena is closed immediately after the operation, so Java cannot access the memory after it’s freed.
Memory Safety: By confining the memory within the arena, Java developers can ensure they only use memory while it’s valid, preventing accidental reuse.
Why It’s Tricky:
Rust’s lifetimes prevent data from being used after it’s freed, while Java’s garbage collection doesn’t directly support this. Confined arenas provide a reliable method to approximate Rust’s memory safety, but they require Java developers to actively manage their memory, which can be challenging.
Handling Enums with Data Variants
Rust enums are often more complex than Java enums since they can carry data. Java needs to map Rust enums to a compatible structure, identifying active variants to avoid misinterpreting memory.
Example: Enum with Multiple Variants
Rust Side:
#[repr(C)]
pub enum Status {
    Ok(i32),
    Error(String), // Note: String is not FFI-safe; real bindings should use a C-compatible payload
}

#[no_mangle]
pub extern "C" fn get_status() -> Status {
    Status::Ok(200)
}
Java Side Solution: To handle this enum, Java needs to use a layout that supports both an enum tag (discriminator) and the associated data.
StructLayout statusLayout = MemoryLayout.structLayout(
ValueLayout.JAVA_INT.withName("tag"), // Enum discriminator
ValueLayout.JAVA_INT.withName("value") // Holds Ok value or error pointer
);
VarHandle tagHandle = statusLayout.varHandle(PathElement.groupElement("tag"));
VarHandle valueHandle = statusLayout.varHandle(PathElement.groupElement("value"));
var arena = Arena.ofConfined();
MemorySegment statusSegment = arena.allocate(statusLayout);
int tag = (int) tagHandle.get(statusSegment);
if (tag == 0) { // Ok variant
int okValue = (int) valueHandle.get(statusSegment);
System.out.println("Status OK: " + okValue);
} else { // Error variant
// Process error value appropriately
System.out.println("Status Error");
}
Explanation and Solution:
Discriminator and Value Fields: tag differentiates between Ok and Error, while value holds the associated data. By reading tag, Java can branch to handle each case correctly.
Memory Layout Compatibility: Using a StructLayout with specific VarHandles ensures memory alignment and prevents misinterpretation of data.
Why It’s Tricky:
Enums in Rust can carry various data types for each variant, which Java enums don’t support. The solution requires careful layout management and handling each variant’s data accordingly.
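One caveat the example glosses over: Status::Error(String) is not FFI-safe, since String is not a C-compatible type. A hedged sketch of a variant Java can consume safely, with illustrative names (FfiStatus and its accessors are not part of the original):

```rust
// An FFI-safe tagged enum: every payload is a C-compatible type.
#[repr(C)]
pub enum FfiStatus {
    Ok(i32),
    Error(i32), // e.g. a numeric error code instead of a Rust String
}

#[no_mangle]
pub extern "C" fn ffi_status_tag(status: *const FfiStatus) -> i32 {
    // 0 = Ok, 1 = Error; mirrors the "tag" field Java reads
    match unsafe { &*status } {
        FfiStatus::Ok(_) => 0,
        FfiStatus::Error(_) => 1,
    }
}

#[no_mangle]
pub extern "C" fn ffi_status_value(status: *const FfiStatus) -> i32 {
    match unsafe { &*status } {
        FfiStatus::Ok(v) | FfiStatus::Error(v) => *v,
    }
}
```

Exposing accessor functions like these also spares Java from knowing the enum's exact tag-plus-union layout.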
WrongMethodTypeException in invokeExact()
Cause: Java’s MethodHandle.invokeExact() requires an exact match between arguments and the function signature. A mismatch in argument types or order will throw this error.
Solution:
- Verify FunctionDescriptor: Ensure that the FunctionDescriptor matches the Rust function’s expected argument and return types exactly.
- Check Argument Casts: Explicitly cast arguments to their expected types, and cast return values as needed.
Example:
// Rust function signature: pub extern "C" fn add(x: i32, y: i32) -> i32
FunctionDescriptor addDescriptor = FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.JAVA_INT);
MethodHandle addHandle = linker.downcallHandle(lib.find("add").orElseThrow(), addDescriptor);
int result = (int) addHandle.invokeExact(5, 3); // Cast to int as expected
Explanation and Solution:
Type Matching: FunctionDescriptor ensures that Java and Rust types align.
Exact Casting: Casting return values and arguments to their exact types avoids this error, as Java’s type system is stricter here than Rust’s.
Why It’s Tricky:
Rust function signatures may allow implicit casting that Java does not, so ensuring exact types in the descriptor is essential.
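The exactness rule is easy to demonstrate without any native code, using a plain Java MethodHandle (a pure-Java sketch; the class and method names are illustrative):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

public class ExactDemo {
    static int add(int x, int y) { return x + y; }

    // Returns true if the mismatched call threw WrongMethodTypeException
    static boolean mismatchThrows() throws Throwable {
        MethodHandle add = MethodHandles.lookup().findStatic(
            ExactDemo.class, "add", MethodType.methodType(int.class, int.class, int.class));
        int ok = (int) add.invokeExact(5, 3); // exact match: returns 8
        try {
            long bad = (long) add.invokeExact(5, 3); // return-type mismatch (long vs int)
            return false;
        } catch (WrongMethodTypeException e) {
            return true; // invokeExact rejects the call before it runs
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(mismatchThrows()); // prints true
    }
}
```

Downcall handles produced by Linker behave the same way: the cast on the call site must match the FunctionDescriptor exactly.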
Segmentation Fault or Undefined Behavior
Cause: This typically results from misaligned memory or accessing freed memory. Common causes include mismatched layouts, accessing unallocated memory, or not using the correct arena.
Solution:
- Verify MemoryLayout Alignment: Ensure the MemoryLayout precisely matches Rust’s struct or array layout, particularly if #[repr(C)] is used.
- Use Arenas Appropriately: Manage memory with confined or automatic arenas to ensure data remains valid only as long as needed.
Example
In Rust:
#[repr(C)]
struct Data {
    x: i32,
    y: i64,
}

#[no_mangle]
pub extern "C" fn create_data() -> *mut Data {
    Box::into_raw(Box::new(Data { x: 1, y: 2 }))
}
In Java:
StructLayout dataLayout = MemoryLayout.structLayout(
    ValueLayout.JAVA_INT.withName("x"),
    MemoryLayout.paddingLayout(4), // matches the 4 bytes of padding Rust inserts so `y` is 8-byte aligned
    ValueLayout.JAVA_LONG.withName("y")
);
MethodHandle createData = RustBindings.createDataHandle();
MemorySegment dataSegment = (MemorySegment) createData.invokeExact();
Explanation and Solution:
Alignment Matching: Ensure JAVA_INT and JAVA_LONG line up with Rust’s i32 and i64, including any padding between fields (here Rust inserts 4 bytes after x so that y is 8-byte aligned). Java’s layout must match precisely, as alignment affects correctness and stability.
Safe Memory Access: Use confined arenas to allocate and manage Rust data safely, freeing memory once Java no longer requires it.
Why It’s Tricky:
Alignment and memory lifetime issues can cause silent data corruption or segmentation faults, making layout precision and memory management critical for stability.
UnsatisfiedLinkError When Loading Rust Shared Library
Cause: Java cannot find the Rust shared library file (e.g., .so
, .dll
, .dylib
) because the file path is incorrect or the library name is misspelled.
Solution:
- Specify the library path and name correctly: ensure that the shared library file is available on the system path or specified explicitly.
- Check system compatibility: ensure that the library file matches the OS format (e.g., .dll on Windows, .so on Linux, .dylib on macOS).
Example:
// Ensure correct file name for your OS
SymbolLookup lib = SymbolLookup.libraryLookup("libmylibrary.so", Arena.global());
Explanation and Solution:
Library Path Validation: Confirm that the library file path is correct, and the file exists. Specifying the full path or ensuring the library is on the system’s path will solve this issue.
Why It’s Tricky:
If Java cannot locate the Rust library, it throws a runtime error, which can be hard to trace if the path is only slightly incorrect.
Value Layout
ValueLayout
is the most primitive layout type, representing the layout of,
well, primitives. They are:
- ValueLayout.ADDRESS
- ValueLayout.JAVA_BOOLEAN
- ValueLayout.JAVA_BYTE
- ValueLayout.JAVA_CHAR
- ValueLayout.JAVA_DOUBLE
- ValueLayout.JAVA_FLOAT
- ValueLayout.JAVA_INT
- ValueLayout.JAVA_LONG
- ValueLayout.JAVA_SHORT
- ValueLayout.ADDRESS_UNALIGNED
- ValueLayout.JAVA_CHAR_UNALIGNED
- ValueLayout.JAVA_DOUBLE_UNALIGNED
- ValueLayout.JAVA_FLOAT_UNALIGNED
- ValueLayout.JAVA_INT_UNALIGNED
- ValueLayout.JAVA_LONG_UNALIGNED
- ValueLayout.JAVA_SHORT_UNALIGNED
These all correspond to the Java
primitives (ADDRESS
is a bit special), aligned and unaligned, which have
direct mappings to C primitive types.
Java Type | C Type | Rust Type |
---|---|---|
ValueLayout.ADDRESS | pointer | pointer or Option<reference> |
ValueLayout.ADDRESS_UNALIGNED | pointer with alignment 1 | ditto with alignment 1 |
ValueLayout.JAVA_BOOLEAN | char but must be 0 or 1 | bool |
ValueLayout.JAVA_BYTE | char | i8 |
ValueLayout.JAVA_CHAR | short storing a UTF-16 codepoint | u16 storing a UTF-16 codepoint |
ValueLayout.JAVA_CHAR_UNALIGNED | ditto with alignment 1 | ditto with alignment 1 |
ValueLayout.JAVA_DOUBLE | double | f64 |
ValueLayout.JAVA_DOUBLE_UNALIGNED | double with alignment 1 | f64 with alignment 1 |
ValueLayout.JAVA_FLOAT | float | f32 |
ValueLayout.JAVA_FLOAT_UNALIGNED | float with alignment 1 | f32 with alignment 1 |
ValueLayout.JAVA_INT | int | i32 |
ValueLayout.JAVA_INT_UNALIGNED | int with alignment 1 | i32 with alignment 1 |
ValueLayout.JAVA_LONG | long | i64 |
ValueLayout.JAVA_LONG_UNALIGNED | long with alignment 1 | i64 with alignment 1 |
ValueLayout.JAVA_SHORT | short | i16 |
ValueLayout.JAVA_SHORT_UNALIGNED | short with alignment 1 | i16 with alignment 1 |
So the _UNALIGNED
versions are exactly the same as their counterparts
except that they have an alignment of 1. This allows storing them unaligned,
but it will also force the JVM to issue special instruction sequences to load
values, since most CPU architectures do not natively support unaligned loads
and stores from or to memory. It is also worth noting that
ValueLayout.JAVA_DOUBLE
and ValueLayout.JAVA_LONG
have
platform-dependent alignment because some CPU architectures require
natural alignment (size = alignment, so 8 in this case) whereas some like
x86 only require an alignment of 4. All other primitives are defined to have
natural alignment.
Beyond representing primitive types, ValueLayouts
also provide access to
different byte ordering (also known as endianness) through the
.withOrder(ByteOrder)
method. The choices for ByteOrder
are BIG_ENDIAN
,
and LITTLE_ENDIAN
, although the static method ByteOrder.nativeOrder()
will return whichever of those your CPU natively uses (usually
LITTLE_ENDIAN
). This is required by many serialization formats, such as
most network formats, because many of them require BIG_ENDIAN
byte
order while most CPU architectures only natively support LITTLE_ENDIAN
.
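Byte order is easy to observe directly in memory. The following self-contained sketch (the class and method names are ours, purely for illustration) writes the same int under each byte order and inspects which byte lands first:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Writes an int with the given byte order and returns the first byte in memory.
    public static byte firstByte(ByteOrder order) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT);
            seg.set(ValueLayout.JAVA_INT.withOrder(order), 0, 0x11223344);
            return seg.get(ValueLayout.JAVA_BYTE, 0);
        }
    }

    public static void main(String[] args) {
        // Big-endian stores the most significant byte first...
        System.out.println(firstByte(ByteOrder.BIG_ENDIAN));    // 17 (0x11)
        // ...little-endian stores the least significant byte first.
        System.out.println(firstByte(ByteOrder.LITTLE_ENDIAN)); // 68 (0x44)
    }
}
```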
Rust doesn’t have int, long, etc., so the table above is needed to translate these layouts to Rust's fixed-width types.
For additional information on ValueLayout
, visit Oracle's official documentation, and official Rust resource The Rustonomicon.
Method Handle
MethodHandle is one of the most essential tools in the FFM API. The most important method on MethodHandles returned from the Linker
is invokeExact(…)
.
.invokeExact(…)
takes in the parameters of the function according to the
FunctionDescriptor
and returns a value with type also specified by the
FunctionDescriptor
. Java will throw an exception at runtime if the arguments
passed to the method do not match up with the FunctionDescriptor
. Because
of some Java Virtual Machine details, the return
value must also be explicitly cast to the expected return type. Otherwise, Java will once again throw an exception at
runtime, this time because the return type was wrong. A
function with signature FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_FLOAT)
would be called like so:
int returnValue = (int)handleName.invokeExact(myFloat)
.
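To see invokeExact end to end without building a Rust library first, note that the native linker's default lookup exposes the C standard library's symbols. A hedged sketch calling strlen (the class and method names here are ours, not from the API):

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {
    // Calls the C standard library's strlen via the FFM API; no Rust library needed,
    // since Linker.nativeLinker().defaultLookup() exposes the standard C symbols.
    public static long cStrlen(String s) {
        Linker linker = Linker.nativeLinker();
        // size_t strlen(const char *s): long return value, one pointer argument.
        FunctionDescriptor descriptor =
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS);
        MethodHandle strlen = linker.downcallHandle(
            linker.defaultLookup().find("strlen").orElseThrow(), descriptor);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8 copy
            // Argument and return types must match the descriptor exactly.
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(cStrlen("hello")); // 5
    }
}
```

The same downcall pattern applies unchanged to a symbol found in a Rust shared library via SymbolLookup.libraryLookup.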
For more information on MethodHandle
, visit Oracle's official documentation.
Memory Layout
Memory Layouts can be used in
order to streamline the allocation of off-heap memory. Here is an overview
of how MemoryLayout
differs from MemorySegment
.
Assume an array of structs needs to be declared for the following example. First, an Arena must be created; any arena type will do. Next, MemoryLayout.sequenceLayout() can be used, with an element count n that reflects the length of the array and a MemoryLayout.structLayout() that takes in the value layouts and names of the elements within the struct. After this, create
VarHandles
for each element within the struct, which create a reference for
each respective element. Then create a MemorySegment
that
corresponds to the entire memory layout of the array, and allocate it to the
appropriate arena, and finally the structs can be accessed.
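The steps above can be sketched as follows (class and variable names are ours, purely illustrative):

```java
import java.lang.foreign.*;
import java.lang.invoke.VarHandle;
import static java.lang.foreign.MemoryLayout.PathElement.*;

public class PointArrayDemo {
    public static int demo() {
        // 1. Create an arena (any kind will do).
        try (Arena arena = Arena.ofConfined()) {
            // 2. A sequence layout of 4 structs { int x; int y; }.
            SequenceLayout points = MemoryLayout.sequenceLayout(4,
                MemoryLayout.structLayout(
                    ValueLayout.JAVA_INT.withName("x"),
                    ValueLayout.JAVA_INT.withName("y")));
            // 3. VarHandles referencing each member of any element.
            VarHandle xs = points.varHandle(sequenceElement(), groupElement("x"));
            VarHandle ys = points.varHandle(sequenceElement(), groupElement("y"));
            // 4. Allocate a segment covering the whole layout.
            MemorySegment seg = arena.allocate(points);
            // 5. Access the structs (the leading 0L is the base byte offset).
            for (long i = 0; i < 4; i++) {
                xs.set(seg, 0L, i, (int) i);
                ys.set(seg, 0L, i, (int) (i * 10));
            }
            return (int) xs.get(seg, 0L, 2L) + (int) ys.get(seg, 0L, 3L); // 2 + 30
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 32
    }
}
```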
For additional information on MemoryLayout
, visit Oracle's official documentation.
Memory Segment
MemorySegment
represents a fat pointer, that is, a pointer with associated
bounds information, much like a mutable slice in Rust. The main methods associated with memory segments are .get(ValueLayout, offset), which reads whatever memory is at the given byte offset as if it’s of the associated type, and .getAtIndex(ValueLayout, index), which scales the index by the layout’s size. For instance, segment.getAtIndex(ValueLayout.JAVA_INT, 1) is basically the same as C code doing ((int*)segment)[1]; note that the equivalent plain get call is segment.get(ValueLayout.JAVA_INT, 4), because get takes a byte offset rather than an element index. The only difference from the C code is that Java will throw an exception if the program attempts to access an index outside of the bounds associated with the MemorySegment. The most common sources of
MemorySegments are functions returning pointers. MemorySegments
returned to Java through the foreign function interface will automatically be
assigned a length of zero, since Java does not have enough information to
determine the bounds. However, invoking the .reinterpret(size)
method will
edit the bounds information. This is extremely unsafe and must
be used with caution. Assigning a logically incorrect bound
could allow normal Java code to cause a segmentation fault (or worse).
Finally, like Rust slices, MemorySegments
can be subsliced using
.asSlice(offset, size)
, which is also bounds-checked, returning a new slice
with the associated pointer and length values and the same lifetime as the
original.
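A minimal, self-contained sketch of bounds-checked access and slicing, with no native code involved (the class name is ours):

```java
import java.lang.foreign.*;

public class SegmentDemo {
    public static int demo() {
        try (Arena arena = Arena.ofConfined()) {
            // Four ints: the segment's bounds are 16 bytes.
            MemorySegment seg = arena.allocateFrom(ValueLayout.JAVA_INT, 10, 20, 30, 40);
            int byIndex = seg.getAtIndex(ValueLayout.JAVA_INT, 1);  // ((int*)seg)[1] -> 20
            int byOffset = seg.get(ValueLayout.JAVA_INT, 4);        // same element, byte offset 4
            // Subslice covering elements 2..3; bounds-checked like the original.
            MemorySegment tail = seg.asSlice(8, 8);
            int fromSlice = tail.getAtIndex(ValueLayout.JAVA_INT, 1); // element 3 -> 40
            boolean caught = false;
            try {
                seg.getAtIndex(ValueLayout.JAVA_INT, 4); // out of bounds
            } catch (IndexOutOfBoundsException e) {
                caught = true;
            }
            return byIndex + byOffset + fromSlice + (caught ? 1 : 0); // 20+20+40+1
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 81
    }
}
```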
For more information on MemorySegment
, visit Oracle's official documentation.
Variable Handle
A VarHandle
represents a handle to a sub-layout given a layout. It helps
solve the problem of, say, accessing an int
field of a struct, or accessing
an element of an array. Variable handles are used to construct a path to a value
that needs to be given a certain layout (basically a type). Say there is a pointer to
an array of struct foo
, which has an integer member x
that must be read.
This is how to construct a VarHandle
to get x
from any such
pointer:
MemoryLayout layoutOfPointer =
    ValueLayout.ADDRESS.withTargetLayout(
        MemoryLayout.sequenceLayout(arrayLen,
            MemoryLayout.structLayout(
                ValueLayout.JAVA_INT.withName("x"),
                ValueLayout.JAVA_INT.withName("y")
            )
        )
    );
// PathElement is the nested interface MemoryLayout.PathElement
VarHandle xHandle = layoutOfPointer.varHandle(
    PathElement.dereferenceElement(),
    PathElement.sequenceElement(),
    PathElement.groupElement("x")
);
Now whenever x is needed from this kind of pointer, call (int) xHandle.get(segment, 0L, index), where segment is the MemorySegment holding the pointer, 0L is the base byte offset, and index selects the array element.
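Assuming an in-memory array rather than one returned from Rust, the same kind of handle can be exercised entirely from Java. A sketch (class name is ours; note that withTargetLayout is a restricted method, so the JVM may print a warning on first use):

```java
import java.lang.foreign.*;
import java.lang.invoke.VarHandle;
import static java.lang.foreign.MemoryLayout.PathElement.*;

public class DerefDemo {
    public static int demo() {
        final long arrayLen = 3;
        try (Arena arena = Arena.ofConfined()) {
            MemoryLayout structs = MemoryLayout.sequenceLayout(arrayLen,
                MemoryLayout.structLayout(
                    ValueLayout.JAVA_INT.withName("x"),
                    ValueLayout.JAVA_INT.withName("y")));
            MemoryLayout layoutOfPointer = ValueLayout.ADDRESS.withTargetLayout(structs);
            VarHandle xHandle = layoutOfPointer.varHandle(
                dereferenceElement(), sequenceElement(), groupElement("x"));

            // The pointee: fill the x fields (ints at even indices) with 0, 10, 20.
            MemorySegment array = arena.allocate(structs);
            for (int i = 0; i < arrayLen; i++) {
                array.setAtIndex(ValueLayout.JAVA_INT, 2L * i, i * 10);
            }
            // A pointer-sized segment holding the array's address.
            MemorySegment pointer = arena.allocate(ValueLayout.ADDRESS);
            pointer.set(ValueLayout.ADDRESS, 0, array);

            // Read x of struct #2 through the pointer.
            return (int) xHandle.get(pointer, 0L, 2L);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 20
    }
}
```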
For more information on VariableHandle
, visit Oracle's official documentation.
Function Descriptor
FunctionDescriptor represents the signature of a function.
FunctionDescriptor.of(MemoryLayout, … )
takes a variadic1 input of
MemoryLayouts
. The first argument is the memory layout of the return
type, and the rest correspond to the memory layouts of the function
arguments.
For example, int foo(float, void*)
would be represented as
FunctionDescriptor.of(
ValueLayout
.JAVA_INT, ValueLayout.JAVA_FLOAT, ValueLayout.ADDRESS)
For void functions,
FunctionDescriptor.ofVoid(MemoryLayout, … )
is a static method that is
exactly the same as FunctionDescriptor.of(MemoryLayout, … )
except that its
first argument corresponds to the first function argument rather than the
return value.
For example, void foo(float, void*) would translate to
FunctionDescriptor.ofVoid(ValueLayout.JAVA_FLOAT, ValueLayout.ADDRESS)
For additional information on FunctionDescriptor
, visit Oracle's official documentation.
A variadic function can take a variable number of arguments.
Struct Layout
A StructLayout
represents the layout of a C-style struct, including the layout
of all its members, all their members (if applicable), and so on. It does
exactly the same job as a struct definition in C. The class itself has no
interesting methods, but you can create a StructLayout using
MemoryLayout.structLayout(MemoryLayout…)
. To translate the following
structs to the Java FFM API, we would use the
following Java code:
C:
struct foo {
int num;
char* string;
struct bar baz;
}
Java:
StructLayout bar = …;
StructLayout foo = MemoryLayout.structLayout(
    ValueLayout.JAVA_INT.withName("num"),
    MemoryLayout.paddingLayout(4), // padding so the pointer member stays 8-byte aligned
    ValueLayout.ADDRESS.withTargetLayout(
        MemoryLayout.sequenceLayout(0, ValueLayout.JAVA_BYTE)
    ).withName("string"),
    bar.withName("baz")
);
The .withName(String)
method allows you to later retrieve a VarHandle
using that name, covered in the VarHandle
section.
Constructing a StructLayout like this will automatically compute the total size and alignment: generally, the size is the sum of the sizes of the members and the alignment is the maximum of the member alignments. Unlike a C compiler, however, Java does not insert padding for you. If a member would land at an offset that violates its alignment, MemoryLayout.structLayout throws an exception, so explicit MemoryLayout.paddingLayout(n) members must be added wherever C would insert padding to keep the members aligned. Some exotic C programs may use overaligned structs1, for which you can add a final .withAlignment(alignment) to override the automatic alignment calculated by Java.
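As a quick check of these rules, here is a sketch (assuming a typical 64-bit platform where JAVA_LONG has 8-byte alignment; the class name is ours):

```java
import java.lang.foreign.*;
import static java.lang.foreign.MemoryLayout.PathElement.groupElement;

public class LayoutSizeDemo {
    // struct { int x; long y; } needs 4 bytes of padding after x,
    // exactly as a C compiler would insert on a typical 64-bit target.
    public static final StructLayout DATA = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        MemoryLayout.paddingLayout(4),
        ValueLayout.JAVA_LONG.withName("y"));

    public static void main(String[] args) {
        System.out.println(DATA.byteSize());                    // 16
        System.out.println(DATA.byteAlignment());               // 8
        System.out.println(DATA.byteOffset(groupElement("y"))); // 8
    }
}
```

Omitting the paddingLayout(4) member makes structLayout throw, because y would sit at offset 4, violating its 8-byte alignment.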
This all still applies to Rust, but only on:
- #[repr(C)] structs
- #[repr(C)] tuple structs2
- #[repr(integer type)] enums with only valueless variants
- enums with exactly one nonnullable #[repr(C)] variant and up to one zero-sized variant3
- #[repr(transparent)] structs and tuple structs with exactly one #[repr(C)] member and all other members being zero-sized
#[repr(C)]
requires all members, and members of members, and members of those members, etc. to be #[repr(C)]
as well, which is very
invasive to code. For the sake of performance, some may choose to do this,
but it also greatly limits what you can use in the standard library.
Common non-#[repr(C)] types include:
- Vec
- String
- &str
- slices
- anonymous tuples
- dyn references
- Box<dyn T>
- most enums with a variant that holds a value (Option<T> for most T)
- all enums with more than one variant that holds a value
- every single container type4
If a type uses any of these types (and most types from external libraries too) by
value, that type cannot be #[repr(C)]
. The only way around this restriction
is through pointer indirection, like Box<T>
5, because pointers are always
representable even if the thing they are pointing to is not. People wanting
every last ounce of performance can deal with this, but the average Rust
type cannot, and so it cannot be represented as a StructLayout
or a
MemoryLayout
. The last class important specifically to StructLayout is PaddingLayout, created with MemoryLayout.paddingLayout(byteSize). It is the layout of the padding bytes inside a StructLayout and exists purely to keep the members that follow it aligned.
For more information on StructLayout
, visit Oracle's official documentation.
Many compilers accept __attribute__((aligned(x)))
to align a struct
to x
, or they keep its original alignment if x
is less than or equal to that. Rust
has #[repr(align(x))]
to specify overalignment.
Tuple structs are just structs with anonymous members.
This case exists pretty much purely to allow types like Option<&T> and Option<Box<T>>, which are represented as a single, possibly-null pointer.
VecDeque
, HashMap
, HashSet
,
BTreeMap
, BTreeSet
, every iterator in the entire standard library, every IO
type, every FS type (including File
), Rc
, Arc
, RefCell
, RwLock
, Mutex
.
This still doesn’t work for dyn; use ThinBox for that. Box<T>
is guaranteed to be represented by just a pointer,
semantically like one returned from malloc.
Union Layout
UnionLayout
represents a C union. Much like a C union, it is used to specify and
access the different members like it was a struct. However, only one of
those members exists at any one time. You can create a UnionLayout
with
MemoryLayout
.unionLayout(MemoryLayout…)
. Just like in C, a
MemorySegment
referencing a UnionLayout
can be treated as actually referencing the
layout of one of its members, such as by calling .get()
with the associated
MemoryLayout
.
Alternatively, Variable Handles can be used to
reference members
in a process similar to that used in C.
Generally, union layouts will have a size equal to the maximum size of its
members and an alignment equal to the maximum alignment of its
members. Similarly to structs, unions can be overaligned, which can be
specified by adding .withAlignment(alignment)
to the end of the method
chain to overwrite Java’s automatically-determined alignment for that type.
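A small sketch of type punning through a union, assuming a 4-byte int/float pair (the class name is ours):

```java
import java.lang.foreign.*;

public class UnionDemo {
    // Stores a float through one member of a union and reads the
    // same 4 bytes back through the int member.
    public static int floatBits(float f) {
        UnionLayout intOrFloat = MemoryLayout.unionLayout(
            ValueLayout.JAVA_INT.withName("i"),
            ValueLayout.JAVA_FLOAT.withName("f"));
        try (Arena arena = Arena.ofConfined()) {
            // Size and alignment are the maximum over the members: 4 bytes here.
            MemorySegment seg = arena.allocate(intOrFloat);
            seg.set(ValueLayout.JAVA_FLOAT, 0, f);   // write through one member...
            return seg.get(ValueLayout.JAVA_INT, 0); // ...read the same bytes as the other
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(floatBits(1.0f))); // 3f800000
    }
}
```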
For more information on UnionLayout
, visit Oracle's official documentation.
Sequence Layout
SequenceLayout
represents the layout of arrays. To create a
SequenceLayout
, call MemoryLayout
.sequenceLayout(numberOfElements, MemoryLayout)
. There is no get method or any direct way to get the nth
element of an array. Instead, create a special VarHandle
to the needed
data within the member, then call get on that with the index. For instance, to
get the x-coordinates of the structs in an array, use:
SequenceLayout arrayOfStruct = MemoryLayout.sequenceLayout(10,
    MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y")
    ).withName("struct")
);
VarHandle varHandle =
    arrayOfStruct.arrayElementVarHandle(PathElement.groupElement("x"));
for (int i = 0; i < 10; i++) {
    System.out.println(varHandle.get(memorySegment, 0L, (long) i));
}
SequenceLayout
provides some interesting methods.
sequenceLayout.elementCount()
will, as the name suggests, give the
length of the array, which is useful for passing around slices as it is not necessary to store the length itself.
sequenceLayout.reshape(long dim1, long dim2, …)
and sequenceLayout.flatten()
are both related to reinterpreting
multidimensional arrays. Multidimensional arrays are just arrays of arrays,
but their layout means they can safely be reinterpreted as a single
dimension array of size (dim 1 size)*(dim 2 size)*...
, which is exactly what
sequenceLayout.flatten()
does. sequenceLayout.reshape
does the inverse of
sequenceLayout.flatten()
, but is also fallible: if an attempt is made to reshape an array to AxBxC but the array’s length isn’t exactly A*B*C, this method will throw an exception. Another nice property of
sequenceLayout.reshape()
is that one argument may be set to -1, in which
case sequenceLayout.reshape()
will do the math based on the array’s length
to determine what that dimension must be.
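A sketch of reshape and flatten on a 6-element layout (the helper names are ours):

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.SequenceLayout;
import java.lang.foreign.ValueLayout;

public class ReshapeDemo {
    // 6 ints viewed as a 2x3 matrix; -1 asks Java to infer the dimension.
    public static long rows() {
        SequenceLayout flat = MemoryLayout.sequenceLayout(6, ValueLayout.JAVA_INT);
        return flat.reshape(-1, 3).elementCount(); // 6 / 3 = 2 rows
    }

    // reshape and flatten round-trip back to the original element count.
    public static long flatCount() {
        SequenceLayout flat = MemoryLayout.sequenceLayout(6, ValueLayout.JAVA_INT);
        return flat.reshape(2, 3).flatten().elementCount();
    }

    public static void main(String[] args) {
        System.out.println(rows());      // 2
        System.out.println(flatCount()); // 6
    }
}
```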
A Java type can be used to act as a wrapper around Rust slices, so
SequenceLayout
would feature heavily in that kind of implementation. While
a slice object, composed of a pointer and a length, is not application binary
interface (ABI) stable, the underlying array is ABI stable.
Rust provides methods to get the pointer and length from a slice, as well as
functions to construct slices from a pointer and a length, so while it is not
ABI safe, it is easy enough to disassemble and
reassemble into safe forms as needed. While it is easier to just keep an
opaque blob of data and ask Rust any time it must be used, it is much
faster for Java to have direct access to the array.
The Just-In-Time (JIT) compiler knows how array accesses work, and can optimize the corresponding Java code, possibly with automatic vectorization, which is a great boost to throughput. In contrast, every time a call is made out to a Rust function, the JIT compiler has no idea what that function is doing. This means that it cannot optimize the memory accesses, and it must also assume that the function breaks every optimization assumption it has. For instance, the function could touch any value in memory, preventing the JIT compiler from reordering any reads or writes from before the function call to after the function call, and vice versa.
The Rust compiler has the same issue: it
does not know what the Java code is doing, so there is no way it can optimize
around that such as automatic vectorization either. This does not matter so
much for one-off functions, functions that are only called a few thousand
times, or large functions where execution time is dominated by actually
running the function and not on function call overhead, but for simple code
in loops this can be brutal. And how are arrays typically
used? Usually small bits of code run many times in a loop. The performance
gains are too great to ignore. While doing the loop in Rust will beat Java
almost every time, it is not reasonable for every possible loop body to be put
in Rust. However, developers have the option to write all of their
loops in Rust if they so choose. Still, SequenceLayout
provides a great opportunity to allow easy, direct access to arrays and
array elements for Java.
For more information on SequenceLayout
, visit Oracle's official documentation.
Arenas
Arenas are the FFM API's mechanism for allocating off-heap memory, and they are particularly useful for creating bindings. An arena is like a stack of memory: its space can be split in various ways, and its lifetime is determined by the arena's type. Arenas create space for objects called memory segments, which can hold data such as variables, data structures, and functions in a region that the garbage collector treats differently. That means information stored in an arena can be passed to and from foreign functions without worrying about whether Java's garbage collector has tampered with the space.
There are four different types of arenas: confined, automatic, shared,
and custom. Confined arenas and shared arenas are very similar. They both will live as
long as the Java program unless they are manually closed by the user using
the .close()
method on the arena object. The key difference between the
two is that confined arenas can only be accessed by a single thread, while
shared arenas can be accessed by multiple threads. This causes a weird
interaction with shared arenas. When a confined arena is closed, its memory
is immediately freed and that’s all there is to it. When a shared arena is
closed, it invalidates all Java references to the space in memory, but it does
not immediately free it as the process takes longer, meaning that the space
in memory is technically alive for a very short amount of time after the
arena is closed. These arenas are useful for creating Rust bindings because they can
guarantee a space in memory cannot be accessed once closed, so they can be
implemented into functions to guarantee proper memory safety practices.
The API descriptions for automatic arenas describe their closing behavior only vaguely, for example: "the garbage collector eventually frees it automatically". More concretely, the garbage collector will only free an automatic arena either at the end of the Java program or when it determines that the arena is unreachable. But what does the garbage collector see as unreachable?
Testing will show that Java will not close the arena even if every memory segment inside is set to null. The information inside the arena has no bearing on the garbage collector’s decision to keep it around. However, a way to guarantee that the garbage collector determines the arena as unreachable is to set the arena to null. This means that automatic arenas can be useful and reliable for creating bindings as well, especially if it is not clear when a certain arena should be closed. The only downside of the automatic arena is its interaction with the garbage collector. It is possible this could cause some sort of increased overhead.
With an Arena, you can call arena.allocate(size, alignment) to allocate memory within the arena. Allocations cannot be individually freed with arenas; it's either all or nothing. Global Arenas
are useful for set-and-forget things, like for loading the Rust library, since this
does not need to be freed. Confined Arenas are good for data that cannot be
safely shared across threads, so for types that don’t implement the
Send trait. Auto Arenas are nice if it is difficult to figure out
when something should be deallocated. Although this isn’t very common as drop()
should be called on Rust objects that require cleanup, and Java’s
garbage collector will not take care of this.
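The all-or-nothing lifetime of a confined arena can be sketched like this (the class name is ours):

```java
import java.lang.foreign.*;

public class ArenaDemo {
    // Returns true if reading the segment after its arena is closed throws.
    public static boolean closedAccessThrows() {
        Arena arena = Arena.ofConfined();
        MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT);
        seg.set(ValueLayout.JAVA_INT, 0, 42);
        arena.close(); // frees every allocation in the arena at once
        try {
            seg.get(ValueLayout.JAVA_INT, 0); // the segment's lifetime has ended
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(closedAccessThrows()); // true
    }
}
```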
For more information on arenas, visit Oracle's official documentation.
Shared Object and Dynamic Library Files
Shared object and dynamic library files effectively serve the same purpose in this scope. They provide executable code to outside sources. This means that once Java is given the address to the code for a function in this file, it is ready to run once called. Although they effectively share the same purpose, their file types differ based on the system running. Below is a table with each file extension used by three of the most common operating systems.
System | File Extension |
---|---|
Linux | .so |
Windows | .dll |
macOS | .dylib |
Ownership
A piece of data must be owned by at most one variable at any given time,
even across an FFI boundary. If Rust has ownership of a Vec<T>
for
instance, Java cannot decide to take control of it, as, in this case, that would
lead to both Java and Rust calling drop when done with the type, causing a
double free of the backing array. And that’s one of the better outcomes, as
generally types do not expect to suddenly be in an invalid state due to
external mucking, nor is there much they can do about it. One exception to
this rule are types that implement Copy, as they can be blindly memcopied
to create an identical clone of the original (barring any atomicity issues if
this is done across threads), though most types do not implement Copy so
this isn’t very useful when creating these bindings.
Example of Ownership
In this calculator code, ownership is demonstrated in how PostfixCalculator manages its stack:
struct PostfixCalculator {
stack: VecDeque<f64>,
}
impl PostfixCalculator {
fn new() -> Self {
PostfixCalculator {
stack: VecDeque::new(),
}
}
}
PostfixCalculator owns its stack. When PostfixCalculator is dropped, so is its stack, which automatically cleans up without the programmer needing to manually manage memory.
To learn more about ownership, it is recommended to read these official Rust resources: The Rust Programming Language chapter 4, and The Rustonomicon chapter 6.
Borrowing and Aliasing
Data can be “borrowed” as references, either immutably &T
or mutably
&mut T
. The compiler enforces a sort of reader-writer lock on the type: it
can have either multiple readers (immutable/shared references, &T
) or a
singular writer (mutable/exclusive reference, &mut T
). The compiler will
assume that the data behind a shared reference will not mutate (unless the
type opts out of it with UnsafeCell
, which can be used for custom special
types, which should not be used to enforce users’ types) and the compiler
will assume that no other code or data can reference, read, or mutate the
data behind an exclusive reference (there is no opt out, this must never
happen!). The fact that Rust can make these assumptions is what makes it
so fast and efficient, but it also means you are restricted from coding practices
that break them.
This is approximately the exact opposite of Java’s memory model, where
everything is a mutable reference to the underlying object. While Java can’t
arbitrarily clone objects, meaning it can’t make copies of a class holding an
exclusive reference, it can make those objects live arbitrarily long. This
means it is essential to either detect that the reference is still live and refuse to
service any other borrows, or invalidate the reference in order to service
other borrows. There is a Rust type that effectively performs this latter
approach: RefCell<T>
.
Raw pointers in Rust do not have such aliasing restrictions with regard to
each other, so we are free to have any number of constant *const T
and
mutable *mut T
pointers coexisting. Raw pointer semantics are just like
they are in C, and are in fact even more lenient than C pointers since C
pointers of differing types are not allowed to alias. You’re still not allowed to
mess with ownership – the owner of the type still acts like your pointers
don’t exist and so still assumes it is the arbiter of reads and writes – but if
you have ownership of the type you can just make sure to only interact with
it using raw pointers. This is exactly what UnsafeCell<T>
and Cell<T>
do to
enable shared mutability, and those are the primitives fancy types like
Rc<T>
use to allow shared ownership.
Example of Borrowing and Aliasing
In this calculator code, Borrowing and Aliasing is demonstrated.
struct PostfixCalculator {
stack: VecDeque<f64>,
}
impl PostfixCalculator {
fn new() -> Self {
PostfixCalculator {
stack: VecDeque::new(),
}
}
}
Rust's borrowing rules ensure that references to data (borrowing) do not outlive the data they reference (ownership). This prevents dangling pointers.
To learn more about borrowing and aliasing, it is recommended to read these official Rust resources: The Rust Programming Language chapter 4.2, and The Rustonomicon chapters 3.1 and 3.2.
Lifetimes
Rust constantly wants to know "what exactly is that reference referencing?". Most
things don’t live
forever, so Rust also checks that developers don’t try to use it or reference it
after it has been moved. A move is a change in ownership which potentially means
physically moving it in memory and invalidating any pointers to it. drop()
,
for instance, takes ownership of an object so it can kill it. Anyone familiar
with pointers in C has a decent understanding of the concept of
pointer lifetimes: do not use the pointer after the object has been deleted or
moved. As long as a shared reference exists, no mutable references may exist
and the object must not be moved; and as long as a mutable reference
exists, no other references may exist and the object must not be moved.
The compiler enforces a more stringent test on safe code, that breaking
those rules must provably never happen, leading to some cases where you
know it will not happen, yet the compiler can not prove it, so it does not allow
it. Luckily we do not need to follow the compiler’s test, we only need to follow
those simple rules.
Unfortunately, for arbitrary code the lifetimes involved can get quite
intricate. fn foo<'a>(input: &'a T) -> TypeWithLifetime<'a>
creates a
transitive relationship between the lifetime of input and
TypeWithLifetime<’a>
. While we may be able to enforce a simple one-to-one
lifetime relationship, it’s unclear if we can feasibly enforce that A lives as
long as B lives as long as C lives as long as D lives as long as… Certainly, if it
requires invasive changes to types crossing the FFI boundary, such as every
reference in every struct needing to be converted to a RefCell<&T>
, that
would be very inconvenient for users.
Example of Lifetimes
The code does not explicitly use annotated lifetimes because it does not require them due to its simplicity. However, the concept is there implicitly:
struct PostfixCalculator {
stack: VecDeque<f64>,
}
impl PostfixCalculator {
fn new() -> Self {
PostfixCalculator {
stack: VecDeque::new(),
}
}
}
impl PostfixCalculator {
    fn evaluate(&mut self, tokens: Vec<&str>) -> Result<f64, String> {
        // use of `self`, which has an implicit lifetime
    }
}
This example implicitly uses lifetimes to ensure that references within the evaluate function do not outlive the PostfixCalculator instance they reference. Rust's lifetime elision rules automatically handle this in most cases, but explicit lifetime annotations can be used for more complex scenarios.
To learn more about lifetimes, it is recommended to read these official Rust resources: The Rust Programming Language chapter 10.3, and The Rustonomicon chapter 3.3.
Symbols, Extern, Generics
By default, Rust functions have an undefined application binary interface (ABI), thus
they are incompatible with what C expects. Rust functions also have mangled symbol
names1. To
guarantee a C ABI (assuming the types
themselves are C ABI compatible, the next
section provides details on that), the function declaration must be prefixed with
extern "C". So extern "C" fn foo(number: i32) -> i32 would be equivalent to the C function int foo(int number)
. To guarantee the symbol name is that of the
function name, like in C, you must annotate the function with the
#[no_mangle] attribute
.
However, this does not cover functions with generic types. Rust allows
creating functions that act on unknown types, so that a function like
fn add<T: Add<Output=T>>(a: T, b: T) -> T { return a + b; }
can be reused
with any type as long as it implements Add
. How does the same function
handle multiple types? On the machine code level it doesn’t, that’s why
functions are first monomorphized, creating a version of the function for
every used combination of generic types. Calling add(1u32, 1u32)
would
generate a function equivalent to fn add(a: u32, b: u32) -> u32
, whereas
calling add(1u8, 1u8)
would generate fn add(a: u8, b: u8) -> u8
.
Java cannot see generic functions, it only sees monomorphized functions that
exist in the shared object file. Rust only generates monomorphizations for
types that are used in that function, so if the Rust library code does not use fn add<T: Add<Output=T>>(a: T, b: T) -> T
at all, there are no used generic
types and thus the compiler does not generate anything related to that
function. Even if it did, it can not possibly support every type a programmer
might use, especially if a function had multiple type parameters. fn foo<A, B>()
would require the square of the number of possible types. The best thing to do, short of using dyn pointers, is writing wrapper functions without generic parameters:
fn add_u32(a: u32, b: u32) -> u32 { return add::<u32>(a, b); }
.
Specifying dyn references in a type instructs the Rust compiler to use fat pointers: pointers that store the normal data pointer alongside a pointer to a vtable containing the methods that can be called through the reference. This works almost exactly like in C++, with exactly the same tradeoff. There is only one function in the final binary (no monomorphization needed), but it is not specialized for a type (so no automatic vectorization on integers, for instance). Additionally, every call must first dereference the pointer to the vtable, and the called function must then dereference the real data pointer, which can lead to memory-access and cache-miss overhead.
It also breaks a common idiom: Vec<T>. A &dyn Vec<T> can be made to work, but chances are T will need to be accessed. If Vec<&dyn T> is used instead, there will be lifetime issues, and everything that touches the vector must be restructured to deal with Vec<&dyn T>, even if it could otherwise have used the simpler Vec<T>
. The biggest issue with using dyn, however, is that some trait methods simply do not work with dyn. The Rust Reference specifies the conditions required for a method to be object-safe: it must not return Self directly (the compiler does not know the ABI layout of a function with an unknown return type), it must not take Self as an argument directly (the same problem, for an unknown argument type), and it must not use any generics beyond Self (again the argument ABIs are unknown, and the mangled symbol names of any monomorphizations cannot be known either).
A final issue with dyn is that fat pointers do not have a stable ABI, and a stable representation is needed to pass a pointer to Java through the C ABI. There is an experimental feature, ptr_metadata, that allows splitting a pointer from its metadata, as well as creating a fat pointer from a raw pointer and metadata. However, the metadata itself is not object-safe. DynMetadata<dyn T> may have a stable representation across different T, but it takes a lot of transmuting to make that work, and doing so might technically be undefined behavior. Ultimately, dyn saves some code size at the expense of ergonomics, performance, and a reliance on confusing experimental Rust features. At that point, a developer might be better off writing everything in Java instead of trying to interoperate with Rust code.
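The object-safety rules can be seen in a small sketch (the trait and type names are illustrative). Shape is usable through dyn; the commented-out methods are the kinds the compiler would reject.

```rust
// An object-safe trait: no method returns or takes Self by value,
// and no method has its own generic parameters.
trait Shape {
    fn area(&self) -> f64;
    // fn clone_me(&self) -> Self;              // not object-safe: returns Self
    // fn combine(self, other: Self);           // not object-safe: takes Self by value
    // fn map<F: Fn(f64) -> f64>(&self, f: F);  // not object-safe: generic method
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Dynamic dispatch through fat pointers: one compiled function,
// with the concrete area() selected at runtime via the vtable.
fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let a = Square(2.0);
    let b = Square(3.0);
    assert_eq!(total_area(&[&a, &b]), 13.0);
}
```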
Size and Alignment
Allocating a Rust object within Java to pass to Rust functions requires respecting the type's size and alignment. If the space allocated is too small, the result is buffer overflows or overreads; alignment is a separate requirement on top of size.
An alignment of 2 means that the type must live at an address that is a multiple of 2. For instance, a 16-bit integer on x86 has an alignment of 2, so trying to load a 16-bit integer from, say, address 0x7ffff01 can make the CPU fault, because that address is not a multiple of 2. x86 is a little less picky than most other architectures, with the highest alignment being 4 bytes, but ARM and most other RISCs align a type to its size. All of this means Java needs to know the alignment of a type in order to allocate space for it somewhere.
Some Rust types have well-known alignments because they match one-to-one with types defined in the ISA, but most Rust types have a layout the language does not specify. However, Rust does provide the compile-time constant functions core::mem::size_of::<T>() and core::mem::align_of::<T>() for querying the size and alignment of a type. Unfortunately, types are not guaranteed to maintain their layout across compilations, especially if the compiler version changes. Therefore, calls to these functions must be made in the same compiled library as everything that uses the types.
(Technically, SIMD vectors have higher alignment with certain instructions.)
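One way to follow that advice is to export the size and alignment of each FFI-visible type from the same compiled library, so the Java side queries layout at runtime instead of hard-coding it. A minimal sketch, with an illustrative Point type and function names:

```rust
use std::mem::{align_of, size_of};

// An example type whose layout the Rust compiler may choose freely.
pub struct Point {
    pub x: u64,
    pub y: u8,
}

// Exported from the same shared library that defines Point, so the
// reported layout always matches what Java must allocate for.
#[no_mangle]
pub extern "C" fn point_size() -> usize {
    size_of::<Point>()
}

#[no_mangle]
pub extern "C" fn point_align() -> usize {
    align_of::<Point>()
}

fn main() {
    // A type's size is always a multiple of its alignment, so
    // consecutive array elements stay aligned.
    assert_eq!(point_size() % point_align(), 0);
    println!("size = {}, align = {}", point_size(), point_align());
}
```

On the Java side these two symbols can be called once at startup to size the MemorySegment allocations correctly.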
Subtyping and Variance
As a warning, this section is complex and type-theory heavy, but the gist for this scope is that there are three kinds of lifetime relationships:
- Covariant: 'a can be used where a 'b is expected if 'a is as long as or longer than 'b. Shared references are covariant because a longer-living reference than required can always be given. Tree structures where only leaves can be deleted act somewhat like this (so a RefCell<&T> chain of references follows this).
- Contravariant: 'a can be used where a 'b is expected if 'a lives as long as or shorter than 'b. This only applies to arguments inside of functions or closures, so those should be banned from use to avoid headaches. Closures are not application-binary-interface safe, so they are already banned, and functions as arguments can be replaced with Java upcalls, where less care is needed.
- Invariant: no substitution is allowed; the thing passed in must live exactly as long as 'b. This applies to exclusive references, because Rust allows modifying data behind an exclusive reference, potentially changing its lifetime, and the caller would have no idea the lifetime changed; things would fail once the caller used the value within its old lifetime but outside its new one. If an exclusive reference is checked for validity before every use, this can work (it is effectively RefCell<&mut T>), but that still bans every function that touches an exclusive reference directly. Honestly, this may not be truly solvable; it may just have to be invasive to the programmer.
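A small sketch of the first and last cases (illustrative names). A &'static str is accepted wherever a shorter-lived &str is expected because shared references are covariant; the commented-out lines show the invariance of &mut T being rejected.

```rust
// Covariance: a longer-lived shared reference is accepted where a
// shorter-lived one is expected.
fn first_word<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let hello: &'static str = "hello world";
    // &'static str used where any shorter &'a str would do: fine.
    assert_eq!(first_word(hello), "hello");

    // Invariance: &'a mut T is invariant in T, so the compiler rejects
    // shrinking the lifetime stored behind an exclusive reference:
    // let mut long: &'static str = "static";
    // {
    //     let short = String::from("short");
    //     let r: &mut &str = &mut long; // would allow storing &short...
    //     *r = &short;                  // ...error: `short` does not live long enough
    // }
}
```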
To learn more about subtyping and variance, it is recommended to read the official Rust resource The Rustonomicon chapter 3.8.
Unwinding
With the default panic handler (fittingly named "unwind"), when Rust code calls panic!(), Rust begins walking the local variables in the call stack to drop them, then kills the thread. If a type is mutably shared across threads, as with a Mutex<T>, it may be left in an inconsistent state, though a custom type doing that should not be necessary here. What is a concern, however, is Rust calling drop on types while they are potentially in inconsistent states. For example, say a JavaRef<T> type is used to represent a reference held by Java. If it is busy updating its pointer and panics in that function, Rust's unwinding will eventually call drop() on it, so the drop code is now working with a JavaRef<T> that holds an invalid pointer.
Rust has another panic handler, "abort", which simply prints a stack trace and aborts the process; it may be the better option when the types in use are not believed to be unwind-safe.
Example of Unwinding
Unwinding happens implicitly when a panic occurs; recoverable errors are instead handled explicitly through Result:
match calculator.evaluate(tokens) {
Ok(result) => println!("Result: {}", result),
Err(e) => println!("Error: {}", e),
}
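When Rust is called from Java, a panic must not unwind across the FFI boundary; that is undefined behavior. A hedged sketch of guarding an exported function with std::panic::catch_unwind and reporting failure through an error code instead (the function names are illustrative):

```rust
use std::panic;

// Stand-in for any Rust code that might panic: integer division
// by zero panics in Rust.
fn divide(a: i32, b: i32) -> i32 {
    a / b
}

// An FFI-safe wrapper: catches any unwind before it can cross into
// Java, and signals failure through the return value instead.
#[no_mangle]
pub extern "C" fn safe_divide(a: i32, b: i32, out: &mut i32) -> i32 {
    match panic::catch_unwind(|| divide(a, b)) {
        Ok(v) => {
            *out = v;
            0 // success
        }
        Err(_) => 1, // a panic occurred and was contained
    }
}

fn main() {
    let mut out = 0;
    assert_eq!(safe_divide(10, 2, &mut out), 0);
    assert_eq!(out, 5);
    assert_eq!(safe_divide(1, 0, &mut out), 1); // panic caught, not propagated
}
```

The Java side can then map the nonzero return code to an exception.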
To learn more about unwinding, it is recommended to read the official Rust resource The Rustonomicon chapter 7.
Phantom Data
Sometimes, when working with unsafe code, there may be a situation where lifetimes are associated with a struct, but not part of a field. For example:
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
}
'a is not used in the body of this struct, so it is unbounded. Rust does not allow unbounded lifetime annotations on structs because of the implications for maintaining correct variance and drop checking. The solution Rust offers is PhantomData, a special marker type. It takes up no memory, but it simulates a field of the desired type for the purposes of static analysis. It is easy to apply; the resulting struct would be:
struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: marker::PhantomData<&'a T>,
}
This way, the lifetime is bounded to a "field" of the struct Iter. This may complicate a tool that automatically generates bindings. As previously explained, method handles must be written for the different types a function may work with, and the FFM API may be incompatible with, or unable to accommodate, a case where PhantomData is used.
To learn more about phantom data, it is recommended to read the official Rust resource The Rustonomicon chapter 3.10.
Send and Sync
- Send: the type can be moved between threads.
- Sync: the type can be shared between threads (logically equivalent to &T being Send).
By default, most types are Send and Sync. If a type is moved to another
thread, it is fine because it owns its data and therefore nothing else can
touch that data or cause thread safety issues. If a shared reference is moved to
another thread, that is fine because the mere existence of a
shared reference means the data can no longer mutate, so there’s nothing
needing synchronization between threads. If an exclusive reference is moved, again
it is fine because that exclusive reference is the only thing
allowed to look at or modify the underlying data, so there is no need to
synchronize anything. The only types that are not both Send and Sync are those that cheat the aliasing and ownership rules, such as UnsafeCell<T> and Rc<T>.
Luckily, Java allows this to be enforced. Arena.ofConfined() gives a thread-confined memory arena, and if code tries to use a MemorySegment allocated from this arena on another thread, it throws an exception. This is an absolute lifesaver, as it allows the use of RefCell<T>, which is neither Send nor Sync, and which is useful for fixing many of the incongruities between Java and Rust's memory models.
Example of Thread Safety and Send and Sync
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let calculator = Arc::new(Mutex::new(PostfixCalculator::new()));
let calculator_clone = Arc::clone(&calculator);
let handle = thread::spawn(move || {
let mut calc = calculator_clone.lock().unwrap();
let tokens: Vec<&str> = "3 4 +".split_whitespace().collect();
calc.evaluate(tokens)
});
match handle.join().unwrap() {
Ok(result) => println!("Result from thread: {}", result),
Err(e) => println!("Error from thread: {}", e),
}
}
Thread Safety: The Arc and Mutex wrapping of PostfixCalculator
ensures that
it can be safely shared and mutated across threads. Arc allows for shared
ownership across threads, while Mutex provides mutual exclusion,
preventing data races.
To learn more about Send and Sync traits, it is recommended to read these official Rust resources: The Rust Programming Language chapter 16.4, and The Rustonomicon chapter 8.2.
Data Races
Data races occur when multiple threads access the same memory location at the same time, with at least one of them writing to it, and they can cause undefined behavior. Safe Rust guarantees that no data races occur, and the ownership model is a big part of that: by definition, if a value can have only one owner (who can make changes), then it can only be written to by that single owner. General race conditions, however, are not prevented in Rust. They simply cannot be prevented from a mathematical standpoint, because of how the scheduler works in different operating systems; that is out of the developer's control. This means that while a Rust program may still deadlock or synchronize incorrectly, it will still be memory-safe.
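A sketch of how the ownership model blocks a data race at compile time (illustrative code). The commented line would not compile; moving ownership into the thread, or synchronizing as in the earlier Arc/Mutex example, is required.

```rust
use std::thread;

// Moving ownership into the spawned thread: exactly one owner at a
// time, so no unsynchronized shared access is possible.
fn push_in_thread(mut data: Vec<i32>) -> Vec<i32> {
    let handle = thread::spawn(move || {
        data.push(4);
        data // ownership is returned through the join handle
    });
    handle.join().unwrap()
}

fn main() {
    let data = vec![1, 2, 3];
    // A racing version does not compile: a second thread merely
    // borrowing `data` while this thread still owns it is rejected:
    // thread::spawn(|| data.push(5)); // error: closure may outlive `data`
    assert_eq!(push_in_thread(data), vec![1, 2, 3, 4]);
}
```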
To learn more about data races, it is recommended to read the official Rust resource The Rustonomicon chapter 8.1.
Atomics
Atomics are types that support operations in a thread-safe manner without external synchronization. For example, consider a counter, foo, to be used across different threads. It would not be safe to increment the counter using foo++, because that could result in a race condition: different threads trying to increment foo by one at the same time cause undefined behavior. Locking can be used so that one thread increments foo and then the other does, but locking has severe performance costs. Say that at first foo = 0; after both threads write to it, foo = 2 should hold. The way atomics handle this is: each thread reads the current value of foo and attempts to write the incremented value, and if another thread got there first, it re-reads and tries again. This ensures that no matter the order in which the operating system schedules these operations, foo ends up as 2. Rust makes it very easy to work with atomics; for foo, just write:
let foo = Arc::new(AtomicUsize::new(0));
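That line can be expanded into a working shared counter (a sketch with illustrative parameters). fetch_add performs the read-modify-write as one indivisible operation, so no increments are lost:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Each of `threads` threads increments the counter `per_thread` times.
fn count(threads: usize, per_thread: usize) -> usize {
    let foo = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let foo = Arc::clone(&foo);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // Atomic read-modify-write: no lock, no lost updates.
                foo.fetch_add(1, Ordering::SeqCst);
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    foo.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(count(2, 1000), 2000);
}
```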
To learn more about atomics, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Compiler and Hardware Reordering
Compiler Reordering
Rust's compiler makes many optimizations to reduce the number of operations the CPU actually has to process; sometimes it removes operations entirely. For example:
let mut x: i32 = 1;
x = 2;
x = 3;
The compiler would remove the second line, x = 2, because it does not change the result. The code still defines x, initializes it as an i32 with value 1, and ends with x having the value 3. If the result is never used, the compiler is likely to remove all mentions of x entirely: why bother generating code and allocating stack space for a value nobody will notice is missing?
Rust uses the LLVM compiler infrastructure as its backend, the same backend the clang C compiler and clang++ C++ compiler use to generate machine code. LLVM is very smart; it will delete dead code, reorder operations to better saturate out-of-order CPUs, merge redundant operations (x += 1; x += 1 will be transformed into x += 2), keep values in registers rather than ever touching memory, and turn loops of normal arithmetic into loops of SIMD/vector instructions. The point is that it is not clear what the generated code will actually look like. The only guarantee is that the compiler is not allowed to reorder observable effects like print statements around each other, or move x += 1 to after a function call that uses x.
However, if another thread has access to the same memory, these changes can be observed (with raw pointers at least; Rust will not normally allow this sort of thing without synchronization, for good reason). So when multithreading, the developer must be explicit with the compiler: "I want all writes performed before this point to be visible before this operation, so other threads see what I want them to see". That is where atomics come into play.
Hardware Reordering
Even without compiler reordering, the CPU may execute some operations in a different order depending on the hardware architecture, because of how memory is accessed internally: global memory is accessible everywhere but slow, while cache memory is localized and faster. Rust guarantees that within each thread, operations appear to happen in program order. However, when two threads access memory at vastly different retrieval speeds, the operations of those threads may end up in the wrong order relative to each other. Taking a wrapper class into consideration can throw ordering off even more. In these cases, Rust and Java's atomic designs put more strain on the hardware, stalling some threads so that ordering guarantees are kept.
To learn more about reordering, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Data Accesses
Another way Rust's atomicity model provides strong guarantees is by introducing the concept of causality, along with tools to establish relationships between different parts of a program and the threads executing them. The most important of these is the "happens before" relationship. It defines an ordering on a program: if there is a relationship "statement 1 happens before statement 2", then statement 1 will run before statement 2. This gives the compiler and hardware extra information about the ordering of operations, and allows bigger optimizations on operations that are unaffected by execution order.
Data accesses are unsynchronized, which lets compilers move them around as much as they want to optimize performance, especially if the program is single-threaded. The downside is that they can cause data races, which result in undefined behavior. Atomic accesses tell the compiler and hardware that the program is multi-threaded. Each is marked with an ordering, which limits how the compiler and hardware can reorder it. In Rust, there are four orderings: sequentially consistent, release, acquire, and relaxed.
To learn more about data accesses, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Orderings
Sequentially Consistent
As its name suggests, sequentially consistent operations execute as if in a single sequential order. In other words, the execution of a multi-threaded program behaves as if each thread's operations occurred in one agreed-upon order, without any reordering or interleaving. This means that if thread A is supposed to write to x before thread B writes to x, B will only be able to write to x once A has. It is implemented with memory barriers: they protect x from B, and only let their guard down once A has written to it. Because compiler and hardware reordering make a big difference in performance, restricting the program in these ways tends to hurt performance.
Acquire and Release
Acquire and release work closely together: one acquires locks and the other releases them. The analogy is a locked room: on the outside, anything can happen, but once the room is entered through the door, the space is completely separated from the outside. For ordering, this means operations written after a lock is acquired cannot be reordered before it, and the whole block of code executes as a unit in relation to the outside world. Once the block finishes and the lock is released, the operations that come after are free to be reordered again.
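A common acquire/release pattern is a flag guarding a payload (a sketch with illustrative names). The Release store publishes everything written before it; any thread whose Acquire load observes the flag as true is then guaranteed to see the payload.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn publish_and_read() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);   // the payload
        r.store(true, Ordering::Release); // publishes everything before it
    });

    // Spin until the Acquire load observes the Release store; after
    // that, the write of 42 is guaranteed to be visible here.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let v = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    v
}

fn main() {
    assert_eq!(publish_and_read(), 42);
}
```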
Relaxed
Relaxed data accesses can be reordered freely and do not establish the "happens before" relationship. They are still atomic, though, and are used when an operation must be indivisible but its ordering does not really matter. For example, fetch_add() with relaxed ordering is a safe way to increment a counter, assuming the counter is not used to determine other accesses.
To learn more about orderings, it is recommended to read the official Rust resource The Rustonomicon chapter 8.3.
Uninitialized Memory
Rust allows developers to work with uninitialized memory. All memory allocated during runtime is uninitialized at first and contains garbage values, and any novice programmer knows that reading such memory causes undefined behavior. Nevertheless, Rust provides both safe and unsafe ways of working with uninitialized memory.
Checked
By default, Rust does not allow access to a memory segment that has not been initialized yet. This is great for Java-Rust bindings: even if the Java side attempts to access uninitialized memory that Rust allocated through the FFM API (something that would normally be allowed and would produce undefined behavior), Rust's rules ensure no undefined behavior occurs and no garbage values are retrieved.
Drop Flags
This is related to the concept of lifetimes. Whenever a variable goes out of scope, say a variable defined as let mut x = Box::new(0);, Rust runs its drop function, drop(x). Whether a value currently needs dropping is tracked at runtime with a drop flag on the stack. The concept of ownership applies here too: there can be only one owner of a memory segment.
Drop flags are tracked on the stack, and Rust decides at runtime when to drop a value. This matters for creating bindings, because even though Rust may have dropped a value, the Java variable pointing to it through the FFM API usually would not know that happened. Having access to a drop flag allows tracking when such a drop happens, so the Java-side handles can be invalidated too.
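Drop flags can be observed directly. In this sketch (illustrative types; a global counter stands in for real drop logic), whether x still owns its value at the end of the scope depends on a runtime condition, so the compiler must track it with a drop flag:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times Rust has dropped a Tracked value.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Whether `x` still owns its value at the end of the scope depends on
// a runtime condition, so a drop flag on the stack tracks it.
fn maybe_move(take_it: bool) -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let x = Tracked;
        if take_it {
            drop(x); // moved out and dropped early; x's drop flag is cleared
        }
        // If take_it was false, x is dropped here at the end of the scope.
    }
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    // Either way the value is dropped exactly once, never twice:
    assert_eq!(maybe_move(true), 1);
    assert_eq!(maybe_move(false), 1);
}
```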
Unchecked
Arrays cannot be partially initialized. Since null does not exist in Rust, a defined array has to be fully initialized, with a value in every slot of the memory its indexes represent. This can make development harder, especially when working with dynamically allocated arrays. To solve this, Rust provides the MaybeUninit type.
For example, to define an array that may be uninitialized, we would write:
let mut x: [MaybeUninit<Box<u32>>; SIZE] = unsafe {
    MaybeUninit::uninit().assume_init()
};
This works because MaybeUninit is the only type that can be safely left uninitialized, and .assume_init() makes the Rust compiler treat the array of MaybeUninit<T> as fully initialized. In this case, each element holds a Box, a heap container for a u32. The array can then be initialized with the following:
for i in 0..SIZE {
x[i] = MaybeUninit::new(Box::new(i as u32));
}
Usually, when working with an array of pointers, assigning a new value to x[i] would drop the old left-hand-side value. That is not a problem here, because a MaybeUninit<Box<u32>> does not drop its contents; it just works as a placeholder. Finally, the possibly-uninitialized array can be turned into an array we know has been initialized with this line of code:
unsafe { mem::transmute::<_, [Box<u32>; SIZE]>(x) }
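Putting the three steps together as one runnable sketch (SIZE is an illustrative constant):

```rust
use std::mem::{self, MaybeUninit};

const SIZE: usize = 4;

fn build_boxes() -> [Box<u32>; SIZE] {
    // Step 1: an array of MaybeUninit needs no initialization.
    let mut x: [MaybeUninit<Box<u32>>; SIZE] =
        unsafe { MaybeUninit::uninit().assume_init() };

    // Step 2: initialize every slot; assigning into a MaybeUninit
    // does not drop the (garbage) previous contents.
    for i in 0..SIZE {
        x[i] = MaybeUninit::new(Box::new(i as u32));
    }

    // Step 3: every element is now initialized, so reinterpret the
    // whole array as fully initialized.
    unsafe { mem::transmute::<_, [Box<u32>; SIZE]>(x) }
}

fn main() {
    let boxes = build_boxes();
    assert_eq!(*boxes[0], 0);
    assert_eq!(*boxes[3], 3);
}
```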
To learn more about checked uninitialized memory, it is recommended to read the official Rust resource The Rustonomicon chapter 5.1.