How to set up and run Apache SystemML locally.
Alternatively, you can download Spark directly.
If you are a Python user, we recommend that you download and install Apache SystemML via pip:
Alternatively, if you intend to use SystemML via spark-shell (or spark-submit), you only need systemml-0.12.0-incubating.jar, which is packaged into our official binary release (systemml-0.12.0-incubating.zip). Note: If you have installed SystemML via pip, you can get the location of this jar by executing the following command:
Start up your Jupyter notebook by moving to the folder where you saved the notebook. Then copy and paste the line below:
To use SystemML with Spark Shell, the SystemML jar can be referenced using Spark Shell’s --jars option. Start the Spark Shell with SystemML with the following line of code in your terminal:
To begin, start an MLContext by typing the code below. Once successful, you should see a “Welcome to Apache SystemML!” message.
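A minimal sketch of this step, to be typed into a Spark Shell that was started with the SystemML jar on its classpath. It assumes the 0.12.0 MLContext constructor accepts the shell's SparkContext (sc); later releases may expect a SparkSession instead.

```scala
// Bring the MLContext API and the ScriptFactory helpers (dml, pydml, ...) into scope.
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

// `sc` is the SparkContext that spark-shell creates automatically.
// Assumption: this constructor signature matches the 0.12.0 release.
val ml = new MLContext(sc)
```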
The ScriptFactory class allows DML and PYDML scripts to be created from Strings, Files, URLs, and InputStreams. Here, we’ll use the dml method to create a DML “hello world” script based on a String. We execute the script using MLContext’s execute method, which displays “hello world” to the console. The execute method returns an MLResults object, which contains no results since the script has no outputs.
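The hello world step could look like the sketch below. It assumes ml is the MLContext started earlier in the same Spark Shell session; the names helloScript and helloResults are illustrative only.

```scala
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

// Build a DML script from a String using the ScriptFactory dml method.
val helloScript = dml("print('hello world')")

// Assumes `ml` is the MLContext created earlier in this session.
// The script declares no outputs, so the returned MLResults is empty.
val helloResults = ml.execute(helloScript)
```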
As an example of how to use SystemML, we’ll first use Spark to create a DataFrame called df of random doubles from 0 to 1 consisting of 10,000 rows and 1,000 columns.
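One way to build such a DataFrame in the Spark Shell is sketched below. The column names (C0, C1, ...) and the explicit SQLContext are assumptions made for illustration; on Spark 2.x the shell's spark session could be used instead.

```scala
import org.apache.spark.sql._
import org.apache.spark.sql.types.{StructType, StructField, DoubleType}
import scala.util.Random

val numRows = 10000
val numCols = 1000

// One Row per matrix row, each holding numCols random doubles in [0, 1).
val data = sc.parallelize(0 until numRows).map { _ =>
  Row.fromSeq(Seq.fill(numCols)(Random.nextDouble()))
}

// Hypothetical column names C0 ... C999, all of type Double.
val schema = StructType((0 until numCols).map { i =>
  StructField("C" + i, DoubleType, true)
})

// Assumption: an SQLContext built from `sc`; this constructor is deprecated in Spark 2.x.
val sqlCtx = new SQLContext(sc)
val df = sqlCtx.createDataFrame(data, schema)
```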
We’ll create a DML script using the ScriptFactory dml method to find the minimum, maximum, and mean values in a matrix. This script has one input variable, matrix Xin, and three output variables, minOut, maxOut, and meanOut. For performance, we’ll specify metadata indicating that the matrix has 10,000 rows and 1,000 columns. We execute the script and obtain the results as a Tuple by calling getTuple on the results, specifying the types and names of the output variables.
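The following sketch puts these pieces together. It assumes ml is the MLContext from earlier and df is the 10,000 x 1,000 DataFrame of doubles described above; minMaxMean, mm, and minMaxMeanScript are illustrative names.

```scala
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

// DML source: one input matrix (Xin) and three scalar outputs.
val minMaxMean =
  """
  minOut = min(Xin)
  maxOut = max(Xin)
  meanOut = mean(Xin)
  """

// Metadata giving the matrix dimensions (rows, columns) up front.
val mm = new MatrixMetadata(10000, 1000)

// Bind the DataFrame `df` (assumed to exist) to Xin and register the outputs.
val minMaxMeanScript = dml(minMaxMean)
  .in("Xin", df, mm)
  .out("minOut", "maxOut", "meanOut")

// getTuple takes the output types and names and returns a Scala tuple.
val (min, max, mean) = ml.execute(minMaxMeanScript)
  .getTuple[Double, Double, Double]("minOut", "maxOut", "meanOut")
```

Because the input values are random, the three results differ from run to run.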
Many different types of input and output variables are automatically allowed. These types include Boolean, Long, Double, String, Array[Array[Double]], RDD
Let’s take a look at an example of input matrices as RDDs in CSV format. We’ll create two 2x2 matrices and input these into a DML script. This script will sum each matrix and create a message based on which sum is greater. We will output the sums and the message.
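A sketch of this example follows. The matrix values and message strings are chosen for illustration, and ml is assumed to be the MLContext created earlier. Each RDD element is one CSV-formatted row of a matrix.

```scala
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

// Two 2x2 matrices, each given as an RDD of CSV rows (illustrative values).
val rdd1 = sc.parallelize(Array("1.0,2.0", "3.0,4.0"))
val rdd2 = sc.parallelize(Array("5.0,6.0", "7.0,8.0"))

// DML source: sum each matrix and pick a message based on the larger sum.
val sums =
  """
  s1 = sum(m1);
  s2 = sum(m2);
  if (s1 > s2) {
    message = "s1 is greater"
  } else if (s2 > s1) {
    message = "s2 is greater"
  } else {
    message = "s1 and s2 are equal"
  }
  """

val sumScript = dml(sums)
  .in("m1", rdd1)
  .in("m2", rdd2)
  .out("s1", "s2", "message")

// Assumes `ml` is the MLContext created earlier in this session.
val sumResults = ml.execute(sumScript)
val s1 = sumResults.getDouble("s1")
val s2 = sumResults.getDouble("s2")
val message = sumResults.getString("message")
```

With these inputs the entries of the first matrix add up to 10 and those of the second to 26, so the message should report that the second sum is greater.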
Congratulations! You’ve now run examples in Apache SystemML!