Elara Math / Elara Array

In Python, it is common to write code like this:

import numpy as np

x = np.linspace(0, 50)
x2 = x**2
x3 = np.sin(x2)

In Rust, this pattern is problematic. If the underlying power and sine operations tried to replicate the "Python feel" and keep all three variables (x, x2, x3) accessible in memory, then in Rust each operation would require a clone, which wastes memory (you end up with three copies of similarly-sized arrays). If those same operations instead followed standard Rust move semantics, the memory would be moved twice, first x -> x2 and then x2 -> x3, leaving both x and x2 inaccessible afterwards. Neither is ideal.
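To make the second failure mode concrete, here is a minimal runnable sketch; pow and sin here are hypothetical value-consuming methods used only for illustration, not Elara API:

struct Arr(Vec<f64>);

impl Arr {
    // both methods take self by value: the standard
    // Rust convention described above
    fn pow(mut self, n: i32) -> Arr {
        for v in &mut self.0 { *v = v.powi(n); }
        self
    }
    fn sin(mut self) -> Arr {
        for v in &mut self.0 { *v = v.sin(); }
        self
    }
}

fn main() {
    let x = Arr((0..50).map(|i| i as f64).collect());
    let x2 = x.pow(2);  // x is moved here; x is no longer usable
    let x3 = x2.sin();  // x2 is moved here; x2 is no longer usable
    // println!("{:?}", x.0); // error[E0382]: use of moved value: `x`
    println!("{:?}", &x3.0[..3]);
}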

So ultimately there are only three real options:

  1. Every time you perform one of these operations (power and sine, specifically), it mutates x in place and returns a reference to x. This means that x2 and x3 would both just be (mutable) references to x, and an explicit clone would be required whenever you want to keep the original values of x.
  2. Implement all operations via copy, and state explicitly in the user guide that it is highly recommended to use the .mapv() (vector map) operation as much as possible so that you can do all your operations at once (for instance, y = x.mapv(|el| el.powi(2).sin())) and don't need intermediate variables that waste memory on unnecessary copies.
  3. Lazy operations: instead of returning an actual new array, you'd return a type like MathOp<Sin<Pow<2>>, NdArray> that evaluates the actual values only when they're read (see the sketch after this list).
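For illustration, here is a minimal runnable sketch of what option (3) could look like, using the expression-template pattern; the Expr trait and the Source, Pow, Sin, and materialize names are hypothetical, not existing Elara code:

struct NdArray { data: Vec<f64> }

// A lazy expression that knows how to compute element i on demand
trait Expr {
    fn eval_at(&self, i: usize) -> f64;
    fn len(&self) -> usize;
}

// Leaf node: borrows an existing array
struct Source<'a>(&'a NdArray);
impl<'a> Expr for Source<'a> {
    fn eval_at(&self, i: usize) -> f64 { self.0.data[i] }
    fn len(&self) -> usize { self.0.data.len() }
}

// Wrapper nodes: apply an operation to an inner expression
struct Pow<E: Expr>(E, i32);
impl<E: Expr> Expr for Pow<E> {
    fn eval_at(&self, i: usize) -> f64 { self.0.eval_at(i).powi(self.1) }
    fn len(&self) -> usize { self.0.len() }
}

struct Sin<E: Expr>(E);
impl<E: Expr> Expr for Sin<E> {
    fn eval_at(&self, i: usize) -> f64 { self.0.eval_at(i).sin() }
    fn len(&self) -> usize { self.0.len() }
}

// Values are only computed when the expression is materialized,
// so sin(x^2) needs no intermediate array for x^2
fn materialize<E: Expr>(e: &E) -> NdArray {
    NdArray { data: (0..e.len()).map(|i| e.eval_at(i)).collect() }
}

fn main() {
    let x = NdArray { data: (0..50).map(|i| i as f64).collect() };
    let x3 = materialize(&Sin(Pow(Source(&x), 2)));
    println!("{:?}", &x3.data[..3]);
}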

The conventions of ndarray follow a hybrid approach that allows both strategies (1) and (2). However, Elara Math is designed for speed, so the plan is to maximize performance; this requires writing code carefully, following the conventions below:

| Operation | Implementation | Dependent on order of operands? | Overwrites/moves (if any) |
| --- | --- | --- | --- |
| c = &a + &b (both operands passed by reference) | Copies data from a and b into a new array, then returns it. | No | None; a and b both remain accessible |
| c = a + b | Consumes a and updates it, then returns it. | Yes; the first operand is always the array that is moved and updated. | a is moved to c; b is also consumed (dropped) |
| c = a + &b | Consumes a and updates it, then returns it. | Yes; the operand passed by value (as opposed to by reference) is always the array that is moved and updated. | a is moved to c; b remains accessible |
| c = b + &a | Consumes b and updates it, then returns it. | Yes; same rule as above. | b is moved to c; a remains accessible |
| c = a.clone() + &b | Copies data from a, then consumes and updates the copy, returning it. | No | None; a and b both remain accessible |
| c = a.sin() | Consumes a and updates it. | N/A | a is moved to c |
| c = a.sin_copy() | Equivalent to c = a.clone().sin(); copies data from a and returns a new array. | N/A | None; a remains accessible |
| c = a.mapv(...) | Consumes a and updates it. | N/A | a is moved to c |
| c = a.mapv_copy(...) | Equivalent to c = a.clone().mapv(...); copies data from a and returns a new array. | N/A | None; a remains accessible |
| c = a.dot(&b) or c = a.dot(b) | Consumes a and updates it. | N/A | a is moved to c |
| c = a.dot_copy(&b) or c = a.dot_copy(b) | Equivalent to c = a.clone().dot(&b); copies data from a and returns a new array. | N/A | None; a remains accessible |

The general idea is this: Elara Array and Elara Math will aggressively avoid clones by consuming and updating your arrays. This means that if you want to make sure the original array survives, you must pass a clone, e.g. c = a.clone() + &b rather than c = a + b. In exchange, most operations are fast and memory-efficient by default (although we'll need benchmarking to be sure).
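To make the consume-and-update rule concrete, here is a self-contained sketch of how a consuming Add impl could work; this illustrates the technique, not Elara's actual implementation:

use std::ops::Add;

#[derive(Clone, Debug)]
struct NdArray { data: Vec<f64> }

// Consuming Add: the left operand is taken by value so its
// buffer can be reused in place; no new allocation happens
impl Add<&NdArray> for NdArray {
    type Output = NdArray;
    fn add(mut self, rhs: &NdArray) -> NdArray {
        for (l, r) in self.data.iter_mut().zip(&rhs.data) {
            *l += r;
        }
        self // the caller's binding for the left operand is now moved
    }
}

fn main() {
    let a = NdArray { data: vec![1.0, 2.0, 3.0] };
    let b = NdArray { data: vec![4.0, 5.0, 6.0] };

    // Default fast path (would move a):
    // let c = a + &b;
    // println!("{:?}", a); // error[E0382]: use of moved value: `a`

    // Explicit clone keeps a accessible at the cost of one copy:
    let c = a.clone() + &b;
    println!("{:?} {:?}", a.data, c.data);
}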

Finally, another important performance optimization is to switch the backing datatype that stores the data in an NdArray from Vec<T> to Cow (the copy-on-write smart pointer). After all, NdArrays are by definition fixed-length arrays: you can reshape them and slice them, but you're not supposed to insert or remove elements. There is thus no need for the backing datatype to be a dynamically-sized, growable array when that's not necessary in the first place.

Additionally, the shape should also be a Cow. This is less about performance and more about user-friendliness: right now, it is necessary to write NdArray<T, N>, specifying the dimensionality in the type signature, which is extremely unintuitive and makes the library very difficult to write and use, since you have to specify both the element type and the dimensionality of the array (which you might not even remember!). It is better to omit the dimensionality entirely and keep a single generic parameter (the dtype), storing the shape at runtime. A minimal sketch of the revised code would be this:

use std::borrow::Cow;
 
#[derive(Clone, Debug)]
struct NdArray<'a, T: Clone> {
    shape: Cow<'a, [usize]>,
    data: Cow<'a, [T]>,
}
 
// Convenience types for specific
// types of arrays
 
type NdArrayi32<'a> = NdArray<'a, i32>;
type NdArrayu32<'a> = NdArray<'a, u32>;
type NdArrayf32<'a> = NdArray<'a, f32>;
type NdArrayf64<'a> = NdArray<'a, f64>;
 
// Subtypes/classes for 2D and 3D
// arrays (since some operations,
// e.g. cross/dot product, are only
// well-defined for arrays of
// certain dimensionality).
// This strong typing helps avoid
// runtime errors like ".cross() operation
// not supported for non-2D/3D arrays"
 
#[derive(Clone, Debug)]
struct NdArray1D<'a, T: Clone> {
    shape: [usize; 1],
    data: Cow<'a, [T]>,
} // the dot product is exclusively supported here;
// you can run .flatten() on other
// ndarrays to convert them to NdArray1D
 
#[derive(Clone, Debug)]
struct NdArray2D<'a, T: Clone> {
    shape: [usize; 2],
    data: Cow<'a, [T]>,
}
 
#[derive(Clone, Debug)]
struct NdArray3D<'a, T: Clone> {
    shape: [usize; 3],
    data: Cow<'a, [T]>,
}
 
impl<'a, T: Clone> NdArray<'a, T> {
    // We default to not owning unless
    // we have to, this also makes it
    // (hopefully) easier to avoid
    // the constant borrow checker
    // problems
    fn new(data: &'a [T], shape: &'a [usize]) -> Self {
        NdArray {
            shape: Cow::Borrowed(shape),
            data: Cow::Borrowed(data)
        }
    }
    
    // This DOES own the data, but it comes from
    // passing already-owned data (so this will cause
    // a move); the data can be either an array or a
    // Vec, though an array is preferred
    fn new_owned<A, S>(data: A, shape: S) -> Self 
    where A: Into<Vec<T>>, S: Into<Vec<usize>>
    {
        NdArray {
            // .into() moves an existing Vec without
            // copying; a fixed-size array is copied
            // once into a new heap-allocated Vec
            data: Cow::Owned(data.into()),
            shape: Cow::Owned(shape.into()),
        }
    }
    
    fn reshape<S>(&mut self, shape: S)
    where S: Into<Vec<usize>>
    {
        let new_shape = shape.into();
        // Allocating the shape is cheap since shapes
        // are tiny, and owning it avoids possible
        // borrow-checker problems later on; reshaping
        // must also preserve the element count
        debug_assert_eq!(
            new_shape.iter().product::<usize>(),
            self.data.len()
        );
        self.shape = Cow::Owned(new_shape);
    }
}
 
fn main() {
    // example of passing data by reference
    // let my_array = NdArray::new(
    //     &[1, 2, 3, 
    //      4, 5, 6], &[2, 3]);
    // println!("{:?}", my_array);
    // example of passing data and
    // transferring ownership
    let mut my_array = NdArray::new_owned(
        [1, 2, 3, 
         4, 5, 6], [2, 3]);
    println!("Before reshape: {:?}", my_array);
    
    my_array.reshape([3, 2]);
    
    // To avoid a move, you must either
    // pass by reference or clone (which is expensive)
    let another_array = &my_array; // 1st option
    // let another_array = my_array.clone(); // 2nd option
    println!("Linked array {:?}", another_array);
    
    println!("After reshape: {:?}", my_array);
}

It should also be mentioned in the user guide (in the cargo doc documentation) that, when we eventually work on ndarray integration, users will need to write use elara_array::NdArray as ElaraArray or similar to avoid namespace clashes.

Planned data saving API:

// use a specialized nanoserde-based
// binary file format for serializing arrays
arr.save("array.elr");
NdArray::from_file("array.elr");
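A minimal sketch of how this could be built on top of nanoserde follows; SerBin and DeBin are real nanoserde traits, but the SavedArray struct and the function bodies are assumptions (a real format would likely add a header with a dtype tag and version):

use nanoserde::{DeBin, SerBin};
use std::fs;

#[derive(SerBin, DeBin)]
struct SavedArray {
    shape: Vec<usize>,
    data: Vec<f64>,
}

// arr.save("array.elr") could reduce to something like this
fn save(arr: &SavedArray, path: &str) -> std::io::Result<()> {
    fs::write(path, arr.serialize_bin())
}

// NdArray::from_file("array.elr") could reduce to this
fn from_file(path: &str) -> Option<SavedArray> {
    let bytes = fs::read(path).ok()?;
    SavedArray::deserialize_bin(&bytes).ok()
}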

Planned indexing/slicing API:

// Indexing (for both getting and setting data)
// these are implemented by two Index implementations,
// one for views and one for direct indices
impl Index<[usize; N]> for NdArray<T, N>;
impl Index<NdArrayView<f64, N>> for NdArray<T, N>;
 
// Construct a view
// These replace numpy-style slices
let yourview: NdArrayView<f64, N> = NdArrayView::row_view(start, end, step);
 
arr[[1, 3, 5]]; // direct indexing
arr[yourview]; // view indexing
 
// Slices
pub fn slice(&self, slice: &[Range<usize>]) -> ArrayView<T, N>;
pub struct ArraySlice;
 
// returns view of entire array
a[&[.., ..]]
 
// returns view of 1st inner element
a[&[s!(0), ..]] // this is equal to a[&[..1, ..]]
 
// returns view of range
a[&[1..2, 1..2]]
 
// for more exotic slices use the dedicated slicer
let s = ArraySlice::new_columns([1, 5, 8]);
a[s] // returns another view

Acknowledgement: a lot of these ideas came from the ndarray crate, and its authors deserve the credit for them.

Elara ML

For Elara ML, there are actually three APIs planned. The first is PyTorch-style:

pub struct MyModel {
    input_layer: Input,
    hidden_1: Dense,
    hidden_2: Dense,
    output_layer: Output
}
 
impl MyModel {
    // x and y are here only for shape determination
    fn new(x: Tensor, y: Tensor) -> MyModel {
        // automatic shape determination by passing
        // (a reference to) the previous layer as
        // the first argument
        let input_layer = Input::new(x);
        let hidden_1 = Dense::new(&input_layer, 16);
        let hidden_2 = Dense::new(&hidden_1, 16);
        let output_layer = Output::new(&hidden_2, y);
        
        MyModel {
            input_layer,
            hidden_1,
            hidden_2,
            output_layer
        }
    }
}
 
impl Model for MyModel {
    // Models can only have one output, for
    // multi-input-output neural networks you
    // need to chain together multiple Models
    fn forward(&self, x: Tensor) -> Tensor {
        // These absolutely don't need to
        // be in the same order as you declared
        // in new() (but probably should be so
        // that the auto shape determination works)
        let a = self.input_layer.forward(x);
        let b = self.hidden_1.forward(a);
        let c = self.hidden_2.forward(b);
        let d = self.output_layer.forward(c);
        d
    }
}
 
fn main() {
    // load x and y...
    let model = MyModel::new(x.clone(), y.clone());
    model.compile(Optimizers::SGD);
    model.fit(&x, &y, 500, 0.00001, true);
}

This API makes it easiest to use pre-made models, because you can simply import a model and compile it. However, it might be too much abstraction: it can be hard to see what the model is actually doing, especially with methods like compile() and fit() that no longer have a one-to-one correspondence with operations on tensors.

The second uses a Sequential! macro to imitate Keras's sequential API (a hypothetical sketch follows below). This makes it the easiest to learn but, again, abstracts away too much, which is not ideal given how much debugging is involved in building neural networks.
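A hypothetical sketch of what the Sequential! macro might expand to; the Layer trait and all names here are assumptions based on the text, with Tensor simplified to Vec<f64>:

// Intended surface syntax (layers listed in order; the macro
// would wire each layer's input shape to the previous output):
// let model = Sequential!(
//     Input::new(x),
//     Dense::new(16),
//     Dense::new(16),
//     Output::new(y),
// );

trait Layer {
    fn forward(&self, x: Vec<f64>) -> Vec<f64>;
}

// The expansion could produce a container that folds the
// input through the boxed layers in order
struct Sequential {
    layers: Vec<Box<dyn Layer>>,
}

impl Sequential {
    fn forward(&self, x: Vec<f64>) -> Vec<f64> {
        self.layers.iter().fold(x, |acc, l| l.forward(acc))
    }
}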

The third and most barebones is the JAX-inspired API. It looks like this:

// This is just a convenient way of
// holding layers, there is nothing
// special about this struct
struct Layers {
    pub input_layer: Input,
    pub hidden_1: Dense,
    pub hidden_2: Dense,
    pub output_layer: Output
}
 
impl Layers {
    // x and y are here only for shape determination
    fn new(x: Tensor, y: Tensor) -> Layers {
        // automatic shape determination by passing
        // (a reference to) the previous layer as
        // the first argument
        let input_layer = Input::new(x);
        let hidden_1 = Dense::new(&input_layer, 16);
        let hidden_2 = Dense::new(&hidden_1, 16);
        let output_layer = Output::new(&hidden_2, y);
        
        Layers {
            input_layer,
            hidden_1,
            hidden_2,
            output_layer
        }
    }
 
    // Note: for zero_grad()
    // and update(), these can be
    // made less verbose by creating an
    // iter() method - see
    // https://stackoverflow.com/questions/30218886/how-to-implement-iterator-and-intoiterator-for-a-simple-struct
    fn zero_grad(&self) {
        self.input_layer.zero_grad();
        self.hidden_1.zero_grad();
        self.hidden_2.zero_grad();
        self.output_layer.zero_grad();
    }
 
    fn update(&self, lr: f64) {
        self.input_layer.update(lr);
        self.hidden_1.update(lr);
        self.hidden_2.update(lr);
        self.output_layer.update(lr);
    }
 
    fn save(&self) {
        let mut weights = NNSerializer::new("weights.bin");
        // Add labels to weights; they will be referred
        // to by their labels when the weights are loaded
        weights.add(self.input_layer, "input_layer");
        weights.add(self.hidden_1, "hidden_1");
        weights.add(self.hidden_2, "hidden_2");
        weights.add(self.output_layer, "output_layer");
        weights.write();
    }
}
 
fn forward(layers: &Layers, x: Tensor) -> Tensor {
    let a = layers.input_layer.forward(x);
    let b = layers.hidden_1.forward(a);
    let c = layers.hidden_2.forward(b);
    let d = layers.output_layer.forward(c);
    d
}
 
fn mean_squared_error(y: &Tensor, y_pred: &Tensor) -> Tensor {
    // (the mean over elements is omitted in this sketch)
    (y_pred - y).pow(2)
}
 
fn main() {
    // load x and y...
    let layers = Layers::new(x.clone(), y.clone());
    let pbar = TrainingProgress::new(); // used to display progress bars
 
    // here we write our custom optimizer
    for i in 0..1000 {
        let preds = forward(&layers, x.clone());
        // preds and loss are both tensors, so they
        // can work with all the standard tensor methods,
        // including output to graphviz files!
        let loss = mean_squared_error(&y, &preds);
        pbar.update(i, &loss); // shows latest progress
        // decay the learning rate from 1.0 to 0.1 over training
        let lr = 1.0 - 0.9 * (i as f64) / 1000.0;
        loss.backward();
        layers.update(lr);
        layers.zero_grad();
    }
}

This approach has just the right amount of abstraction and is very flexible, because it allows defining custom forward passes (including multiple inputs or multiple outputs), custom loss functions, and custom optimizers. Furthermore, this API can easily interoperate with the PyTorch-style API, so it will be the primary focus.

Elara UI

Elara UI/UX guidelines: high accessibility, visual comfort, and clarity are the main priorities.

Elara UI should support responsive layouts by defining breakpoint functions on the Component trait, as in the sketch below.
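A hypothetical sketch of the idea; the Component trait, Element type, and Breakpoint enum are assumptions, not settled Elara UI API:

struct Element; // placeholder for a rendered UI node

#[derive(Clone, Copy, PartialEq, Debug)]
enum Breakpoint { Small, Medium, Large }

impl Breakpoint {
    // map a window width in pixels to a size class
    fn from_width(px: u32) -> Self {
        match px {
            0..=599 => Breakpoint::Small,
            600..=1023 => Breakpoint::Medium,
            _ => Breakpoint::Large,
        }
    }
}

trait Component {
    // components receive the current breakpoint and can
    // return a different layout for each size class
    fn render(&self, bp: Breakpoint) -> Element;
}

struct Sidebar;
impl Component for Sidebar {
    fn render(&self, bp: Breakpoint) -> Element {
        match bp {
            Breakpoint::Small => Element, // e.g. collapsed drawer
            _ => Element,                 // e.g. full sidebar
        }
    }
}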

Elara UI should also provide a pure CPU-based backend that uses a Rust-ported version of fenster. Users can choose which backend they want (see the sketch after this list):

  • The GPU backend is faster and leads to smoother UI rendering, but it uses a lot of battery, can be glitchy if graphics drivers aren't working correctly, and may not be compatible with very old devices
  • For low-powered devices, CPU rendering is the better choice, but it is much slower
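To make the choice concrete, here is a hypothetical sketch of how backend selection could look (RenderBackend, App, and with_backend are assumptions, not settled API):

enum RenderBackend {
    Gpu, // hardware rendering: fast, but power-hungry
    Cpu, // fenster-style software framebuffer: slow, but runs anywhere
}

struct App {
    backend: RenderBackend,
}

impl App {
    fn new() -> Self {
        App { backend: RenderBackend::Gpu } // GPU by default
    }
    // builder-style override, e.g. for low-powered devices
    fn with_backend(mut self, backend: RenderBackend) -> Self {
        self.backend = backend;
        self
    }
    fn run(&self) {
        match self.backend {
            RenderBackend::Gpu => { /* init GPU surface */ }
            RenderBackend::Cpu => { /* init software framebuffer */ }
        }
        // event loop elided
    }
}

fn main() {
    App::new().with_backend(RenderBackend::Cpu).run();
}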