Deeplearning4J 1.0.0-beta7 released

We are proud to announce the 1.0.0-beta7 release of the Deeplearning4J family of libraries. Whether you are using ND4J, SameDiff, Deeplearning4J, RL4J, DataVec, Arbiter or any combination of those, you should update to the latest version. We have fixed a lot of bugs, implemented many optimizations and introduced a lot of new features.

This blog post covers the highlights of the 1.0.0-beta7 release. If you want to see all changes, take a look at the release notes page in the DL4J documentation.

Got any feedback?

If you’d like to give us feedback on this release, feel free to post it on the community forums or send an email to [email protected].

SameDiff and ND4J APIs Synchronized – Operation Namespaces

Over the last few releases, a bit of operation ("op") drift happened between ND4J and SameDiff: some ops were available in only one of the two libraries.

We have solved this problem through code generation. The ops are now defined in a separate repository, and we generate both the code and the reference documentation for those ops from those definitions.

For all namespaces there are now equivalents on both the Nd4j and SameDiff classes. This means that you will find all ops that are available in Nd4j.math also in SameDiff.math. All of those ops will have the same signatures, with the obvious exception that SameDiff will take SDVariable typed tensors, while Nd4j requires INDArray typed tensors.
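As a quick sketch of what that parity looks like in practice (a minimal example following the `Nd4j.math` / `SameDiff.math` naming described above; input values are arbitrary):

```java
import org.nd4j.autodiff.samediff.SDVariable;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class NamespaceParityExample {
    public static void main(String[] args) {
        INDArray x = Nd4j.createFromArray(1.0f, 2.0f, 4.0f);

        // Eager execution via the Nd4j.math namespace
        INDArray eager = Nd4j.math.log(x);

        // The same op, with the same signature, via SameDiff.math --
        // only the tensor type differs (SDVariable instead of INDArray)
        SameDiff sd = SameDiff.create();
        SDVariable in = sd.var("in", x);
        SDVariable out = sd.math.log(in);

        System.out.println(eager);
        System.out.println(out.eval());
    }
}
```

Both calls compute the same element-wise natural logarithm; the only difference is eager versus graph-based execution.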

To see an overview of all available ops, take a look at the new op documentation.

If you want to learn more about it, feel free to ask questions on the community forums.

New Operations and Namespaces

Since beta6 we have added a bunch of new operations to the pre-existing namespaces. If you were using DynamicCustomOp previously, you should check the documentation to see whether that op is now available directly and, if so, remove that workaround.

Also, we have added a whole new namespace: linalg. It provides you with an assortment of operations for linear algebra use cases. Check out the documentation for all supported operations in this namespace.

Multi-Threaded inference with SameDiff

SameDiff now supports multi-threaded inference. This means that you can use the same SameDiff graph instance both safely and efficiently from multiple threads.

This is especially useful if you are using it to deploy your trained model to some kind of web service or in an application server like Apache Tomcat.
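A minimal sketch of what that can look like (the model path, variable names, and input shape here are hypothetical placeholders):

```java
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.File;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelInference {
    public static void main(String[] args) {
        // Load one graph instance... (path is a hypothetical placeholder)
        SameDiff sd = SameDiff.load(new File("model.sd"), true);

        // ...and share it across a pool of worker threads
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                INDArray input = Nd4j.rand(1, 10); // hypothetical input shape
                // Concurrent inference on the same SameDiff instance
                INDArray result = sd.outputSingle(
                        Collections.singletonMap("in", input), "out");
                System.out.println(result.shapeInfoToString());
            });
        }
        pool.shutdown();
    }
}
```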

SameDiff Profiling listener

A really handy feature when you are trying to understand why something runs slowly is the SameDiff profiling listener.

sd.addListeners(ProfilingListener.builder(new File("X:/profiler.json")).build());

You can add it to your SameDiff graph, and it will collect runtime information that can be viewed in the Chrome dev tools: open chrome://tracing and load the resulting JSON file.


That way you can pinpoint where exactly most of your training or inference time is spent!

Easier Custom Loss Functions

Sometimes you need a special loss function for your model. Previously, you had to implement both the forward as well as the backward pass if you wanted to use a custom loss function. Now you can define your loss function using SameDiff and the backward pass will be taken care of automatically.

.layer(new OutputLayer.Builder(new SameDiffLoss() {
    @Override
    public SDVariable defineLoss(SameDiff sameDiff, SDVariable layerInput, SDVariable labels) {
        return labels.squaredDifference(layerInput).mean(1);
    }
}).build())

In principle, it is as simple as this example shows. However, you will usually want to define your loss function as a non-anonymous class – otherwise you will not be able to load your saved model.

This is also of interest for people who import their models from Keras, as we sometimes can’t import custom parts of their models, and they have to provide an implementation in Java. This is now very easy, as they don’t have to deal with the backwards pass anymore.

public class MyLoss extends SameDiffLoss {
    @Override
    public SDVariable defineLoss(SameDiff sd, SDVariable layerInput, SDVariable labels) {
        return sd.math.log(sd.math.cosh(labels.sub(layerInput)));
    }
}

First, you define the loss function you need. Then, before you import your model, you register it along with the name used in the Keras model, like this:

KerasLossUtils.registerCustomLoss("my_loss", new MyLoss());

Keras import improvements

Many have asked for it, and it has finally arrived: the first iteration of tf.keras import support. Keras models have been importable for a long time, but when Keras became a part of TensorFlow, the saved model format changed, and our import ran into trouble. With this release we have finally added support for importing tf.keras models.

The import itself is as simple as ever:

ComputationGraph dl4jModel = KerasModelImport.importKerasModelAndWeights(path);

If you’ve used Keras import before, you will notice that this is exactly the same invocation as for the non-tf.keras version. You shouldn’t need to worry about which version you are importing, as we take care of distinguishing them behind the scenes.

Support for NHWC and NWC input formats

In addition to the NCHW (channels first) data format, we now also support the NHWC (channels last) data format for all 2D CNN layers (conv2d, pooling2d, etc) and global pooling.

The difference between those two formats is the position of the channel in the shape of the input. NCHW formatted data has the shape [numExamples, channels, height, width], while NHWC formatted data has the shape [numExamples, height, width, channels].
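To make the difference concrete, here is a small plain-Java illustration (independent of any DL4J API) of how the same logical element lands at different flat buffer positions in the two layouts:

```java
public class DataFormats {
    // Flat buffer index of element (n, c, h, w) in an NCHW layout
    static int nchwIndex(int n, int c, int h, int w, int C, int H, int W) {
        return ((n * C + c) * H + h) * W + w;
    }

    // Flat buffer index of the same element in an NHWC layout
    static int nhwcIndex(int n, int c, int h, int w, int C, int H, int W) {
        return ((n * H + h) * W + w) * C + c;
    }

    public static void main(String[] args) {
        int C = 3, H = 12, W = 12;
        // Same logical element, two different positions in memory:
        System.out.println(nchwIndex(0, 1, 2, 3, C, H, W)); // prints 171
        System.out.println(nhwcIndex(0, 1, 2, 3, C, H, W)); // prints 82
    }
}
```

This is also why feeding NCHW data into a network configured for NHWC (or vice versa) fails: the channel dimension is simply in the wrong place.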

As DL4J is frequently used as the deployment platform for pretrained Keras models, the importer will now apply the appropriate data format configuration, so imported models should expect data to be provided in the same format as it was in training.

Additionally, DL4J now also supports the NWC (channels last; shape: [numExamples, sequenceLength, channels]) format for all RNN and 1D CNN layers. Again, this makes it easier to use imported models with data formatted in the same way that was used to train them.

If you want to use those data formats outside of model import, you can specify them by configuring your input type accordingly when you define your model architecture.

// Recurrent, i.e. time series input, in NWC format
.setInputType(InputType.recurrent(numFeatures, RNNFormat.NWC))

// Image input in NCHW format
.setInputType(InputType.convolutional(height, width, channels, CNN2DFormat.NCHW))

RL4J Improvements

RL4J has seen a lot of work in this release by Alexandre Boulanger and Chris Bamford. Most of the work went into refactoring the internal structure in order to make it ready for new algorithms.

For anyone who has struggled with RL4J in the past, now would be a good time to try it out again, as the refactoring has also brought with it a lot of bug fixes and performance improvements.


Performance Improvements

Overall, this release has seen a lot of optimizations, so you should see faster training performance than in beta6.

One of the optimizations that we’d like to highlight is that we have moved optimizers into libnd4j, meaning that they are now fully implemented in C++ instead of Java.
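The updater math itself is unchanged by this move; only where it runs differs. For reference, a single Adam step for one scalar parameter looks roughly like this (plain-Java illustration, not the actual libnd4j implementation):

```java
public class AdamStep {
    // One Adam update for a single parameter; m and v are one-element
    // arrays holding the running first and second moment estimates.
    static double adam(double param, double grad, double[] m, double[] v,
                       int t, double lr, double beta1, double beta2, double eps) {
        m[0] = beta1 * m[0] + (1 - beta1) * grad;          // first moment estimate
        v[0] = beta2 * v[0] + (1 - beta2) * grad * grad;   // second moment estimate
        double mHat = m[0] / (1 - Math.pow(beta1, t));     // bias correction
        double vHat = v[0] / (1 - Math.pow(beta2, t));
        return param - lr * mHat / (Math.sqrt(vHat) + eps);
    }

    public static void main(String[] args) {
        double[] m = {0}, v = {0};
        double p = adam(1.0, 0.5, m, v, 1, 0.001, 0.9, 0.999, 1e-8);
        System.out.println(p); // parameter moves slightly below 1.0
    }
}
```

Implementing such per-element update loops in C++ avoids JVM overhead on every parameter of the model, which is where much of the speedup comes from.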

Preview: Support for ARM64 devices

With beta7 we have introduced support for ARM64 based devices. Both the Raspberry Pi 4 (when using a 64-bit distribution) and the Jetson Nano fall into this class of devices. With beta7 you can use those devices without any additional setup.

There is a bug in the latest release of OpenBLAS (version 0.3.9) that sometimes results in GEMM operations producing NaN values. For this reason, the support for ARM64 devices is only a preview in this release. If you want to keep track of the progress, see this issue on GitHub.

If you want to use a Raspberry Pi 4, we suggest sticking with Raspbian in 32-bit mode (armhf), as our testing has indicated that it works out of the box.

As the Jetson Nano also features a CUDA capable GPU, we are also releasing a CUDA backend for ARM64. However, that has only been tested on the Jetson Nano.

Breaking Changes

Even though we are still in a pre-1.0 release, we have tried to keep breaking changes to a minimum, with no breaking changes between most of the releases. Unfortunately, this time around a few breaking changes were necessary.

Work towards Java Modules and OSGi Support

Historically we had some packages that were split over multiple jar files. This is a big problem when trying to use the Java module system that is present since Java 9 or when trying to deploy DL4J based applications in an OSGi context.

In order to rectify this, we had to introduce some breaking changes, by moving classes to different packages.

If you have code that no longer compiles because of those changes, your IDE can probably sort out the required changes for you. For everyone else, the pull request that introduced those changes has all the necessary information.

Keras import now defaults to using model channel order

For some users the new NHWC and NWC data format support, and automatic usage of it when importing Keras models, results in a breaking change.

If you are importing a Keras model, but have built a data pipeline that provides the data in NCHW or NCW format, you will likely run into an error message like this:

org.deeplearning4j.exception.DL4JInvalidInputException: Cannot do forward pass in Convolution layer (layer name = layer0, layer index = 0): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 12, [minibatch, channels, height, width]=[5, 12, 12, 3]; expected input channels = 3) (layer name: layer0, layer index: 0, layer type: ConvolutionLayer)
Note: Convolution layers can be configured for either NCHW (channels first) or NHWC (channels last) format for input images and activations.
Layers can be configured using .dataFormat(CNN2DFormat.NCHW/NHWC) when constructing the layer, or for the entire net using .setInputType(InputType.convolutional(height, width, depth, CNN2DFormat.NCHW/NHWC)).
ImageRecordReader and NativeImageLoader can also be configured to load image data in either NCHW or NHWC format which must match the network

As you can see we tried to make it as obvious as possible what needs to be done. However, if you imported your Keras model in a previous version and then saved it in DL4J format, then it will work just as before and you don’t have to change anything.

Korean NLP users with Scala Dependencies beware

In the last release, version 1.0.0-beta6, we released the deeplearning4j-nlp-korean module with Scala 2.12 support. However, there were runtime issues, as a transitive dependency hadn't been updated to properly support Scala 2.12.

For this reason, we can only release this module with Scala 2.11 support.

SameDiff: DifferentialFunctionFactory Class Removed

Due to the way we handle the Op definition now (see SameDiff and ND4J APIs Synchronized – Operation Namespaces), this class isn’t needed anymore and has been removed.

This class used to be available as SameDiff.f(), but wasn’t considered to be a public API. If you still used it for anything, you will have to adapt to using namespaces, as now all ops that were previously available through it have equivalents in the namespaces.

Removed Modules

We have removed a few modules, some of them didn’t make sense as standalone modules anymore and were merged into others, while others have been removed entirely.

Removed because they weren’t maintained and didn’t see a lot of usage, or were otherwise deprecated: datavec-perf, datavec-camel, nd4j-camel-routes, nd4j-gson, deeplearning4j-aws, deeplearning4j-util, nd4j-jackson

Merged into nd4j-api: nd4j-context, nd4j-buffer

Merged into deeplearning4j-core: dl4j-perf, dl4j-util
