Install from Azure Marketplace
For scalable deployments on Microsoft Azure, the following sections detail the steps to set you up.
We assume you have an operational Azure account and are used to provisioning resources such as virtual machines and virtual networks.
You should also be able to connect remotely to those machines via SSH and run Linux shell commands.
If you don't have an account yet, Microsoft can provide you with a free trial subscription.
We describe two different techniques:
- The first one is an easy installation from the Microsoft Azure Marketplace. However, the currently available version is limited in terms of scalability, which is why we recommend the second one.
- The second one is a separate installation of a Microsoft Azure HDInsight (Hadoop) cluster, followed by the installation of the Zeppelin Docker image on a separate virtual machine located on the same virtual network.
Go to the Datalayer page on the Microsoft Azure Marketplace and click on Install.
You will then have to fill in a few parameters, such as the instance size, and then click on
Once the deployment is done, look up the public IP address that has been assigned and browse to the welcome page served at that address.
This deployment supports Scala, Python and R in
First, create a dedicated Resource Group for your future cluster and any servers you will add later.
In this Resource Group, create a dedicated Virtual Network for your cluster and those additional servers.
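If you prefer the command line to the portal, the two steps above can be sketched with the Azure CLI (az). This is a minimal sketch, not part of the Datalayer tooling: the resource names, the region, the subnet name default and the 10.0.0.0/16 address range are hypothetical placeholders. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: create the Resource Group and the Virtual Network from the Azure CLI.
# All names, the region and the address ranges are hypothetical placeholders.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_rg_and_vnet() {
  local rg="$1" location="$2" vnet="$3"
  # Dedicated Resource Group for the cluster and any servers added later.
  run az group create --name "$rg" --location "$location"
  # Dedicated Virtual Network, with one subnet, inside that Resource Group.
  run az network vnet create \
    --resource-group "$rg" \
    --name "$vnet" \
    --address-prefixes 10.0.0.0/16 \
    --subnet-name default \
    --subnet-prefixes 10.0.0.0/24
}
```

For example, DRY_RUN=1 create_rg_and_vnet zeppelin-rg westeurope zeppelin-vnet shows what would be executed; drop DRY_RUN=1 to actually create the resources.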
Launch a new HDInsight cluster.
Choose the Hadoop cluster type, not Spark: the Zeppelin image ships the Spark libraries it needs to the Hadoop cluster in YARN mode.
As a second step, define the credentials.
Then choose the number of nodes you want. A tiny 3-node cluster is enough to start with.
Configure the network and the resource group (select the ones you created in the previous steps).
Follow the deployment events. If something goes wrong, delete the cluster and start again.
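The portal steps for the cluster roughly map to a single az hdinsight create call. The sketch below assumes you already have a storage account for the cluster's default storage, reads the passwords from the hypothetical HTTP_PASSWORD and SSH_PASSWORD environment variables, and uses WORKER_COUNT=3 as a guess at the "tiny 3-node" suggestion (the text does not say whether head nodes are counted). Check az hdinsight create --help for the options your CLI version supports.

```shell
#!/usr/bin/env bash
# Sketch: create a small Hadoop-type HDInsight cluster in the existing
# Resource Group and Virtual Network. Values are hypothetical placeholders.
# Note: with DRY_RUN=1 the printed command includes the passwords.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_hadoop_cluster() {
  local rg="$1" cluster="$2" vnet="$3" storage="$4"
  # Assumption: 3 worker nodes for a tiny starter cluster.
  local workers="${WORKER_COUNT:-3}"
  # Hadoop type, not Spark: Zeppelin ships its own Spark libraries (YARN mode).
  # Passwords come from the environment so they are not hard-coded.
  run az hdinsight create \
    --resource-group "$rg" \
    --name "$cluster" \
    --type hadoop \
    --http-password "${HTTP_PASSWORD:?set HTTP_PASSWORD}" \
    --ssh-user sshuser \
    --ssh-password "${SSH_PASSWORD:?set SSH_PASSWORD}" \
    --workernode-count "$workers" \
    --storage-account "$storage" \
    --vnet-name "$vnet" \
    --subnet default
}
```

Cluster creation takes a while; the deployment events mentioned above remain the place to watch for failures.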
If everything goes well, your Hadoop cluster will be available in the Resource Group you have defined.
You can verify that your Head and Worker nodes are available on the Virtual Network you have created.
The public IP addresses you will use to connect from your remote environment are also listed.
You can connect via SSH to the Head Node.
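HDInsight exposes the head node over SSH at CLUSTERNAME-ssh.azurehdinsight.net, with sshuser as the default SSH user. The helper below only builds that target string; the cluster name and user are whatever you chose when creating the cluster.

```shell
#!/usr/bin/env bash
# Build the SSH target of an HDInsight head node.
# HDInsight publishes it as <cluster>-ssh.azurehdinsight.net; "sshuser" is the
# default SSH user name, pass another one if you changed it.
headnode_ssh_target() {
  local cluster="$1" user="${2:-sshuser}"
  echo "${user}@${cluster}-ssh.azurehdinsight.net"
}
```

Usage: ssh "$(headnode_ssh_target mycluster)", where mycluster is a placeholder for your cluster name.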
Create Zeppelin Node
This section explains how to create a separate node for Zeppelin.
- A separate node is technically needed because Azure does not allow you to open ports on the cluster nodes.
- It is also good practice to have a dedicated node for each responsibility (notebook and cluster).
First, choose a server type (for example, a CentOS 7 image).
Set the needed parameters:
- Your SSH public key.
- The Virtual Network (the same as the one used for the HDInsight cluster).
Double-check that everything is set up correctly.
For web access to Zeppelin, open port 80 for HTTP.
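The Zeppelin node can likewise be created and opened on port 80 from the Azure CLI. This is a minimal sketch: the image URN OpenLogic:CentOS:7.5:latest, the admin user azureuser, the public key path and the subnet name default are assumptions to adapt. The VM must join the same virtual network as the HDInsight cluster.

```shell
#!/usr/bin/env bash
# Sketch: create the Zeppelin VM in the cluster's Virtual Network and open port 80.
# The image URN, user name, key path and subnet name are assumptions to adapt.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_zeppelin_vm() {
  local rg="$1" vm="$2" vnet="$3"
  # Hypothetical CentOS 7 image URN; list available ones with
  #   az vm image list --publisher OpenLogic --offer CentOS --all
  local image="${IMAGE:-OpenLogic:CentOS:7.5:latest}"
  run az vm create \
    --resource-group "$rg" \
    --name "$vm" \
    --image "$image" \
    --admin-username "${ADMIN_USER:-azureuser}" \
    --ssh-key-values "${SSH_PUBKEY:-$HOME/.ssh/id_rsa.pub}" \
    --vnet-name "$vnet" \
    --subnet default
  # Zeppelin is served over plain HTTP, so only port 80 is opened here.
  run az vm open-port --resource-group "$rg" --name "$vm" --port 80
}
```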
Configure Zeppelin Node
From a shell session, use the SSH command to connect to the Zeppelin node, providing the username and IP address.
If you used a password for the user account, you will be prompted to enter the password.
If you used an SSH key that is secured with a passphrase, you will be prompted to enter the passphrase. Otherwise, SSH will attempt to automatically authenticate by using one of the local private keys on your client.
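Concretely, the connection boils down to one ssh invocation. In this sketch the user, IP and key path are placeholders and the key argument is optional; if your key has a passphrase, loading it once into ssh-agent saves retyping it.

```shell
#!/usr/bin/env bash
# Sketch: connect to the Zeppelin node. The user, IP and key path are placeholders.
# If the key has a passphrase, load it once into an agent to avoid re-typing it:
#   eval "$(ssh-agent -s)" && ssh-add ~/.ssh/id_rsa

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

zeppelin_ssh() {
  local user="$1" ip="$2" key="${3:-}"
  if [ -n "$key" ]; then
    # Use an explicit private key.
    run ssh -i "$key" "${user}@${ip}"
  else
    # Let ssh try the default local keys, or prompt for a password.
    run ssh "${user}@${ip}"
  fi
}
```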
Once logged on, the first action is to Act as
Then install and start Docker. If you use a CentOS 7 image, you can do this with:
sudo yum install -y docker
sudo service docker start
If you use an Ubuntu image, install and start Docker with:
sudo apt-get update
sudo apt-get install -y docker.io
sudo service docker start
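To avoid picking the wrong package manager, the two command sets can be selected from /etc/os-release, whose ID field identifies the distribution. This sketch only prints the commands listed above (it does not run them) so you can review them first; only the CentOS and Ubuntu cases from the text are handled.

```shell
#!/usr/bin/env bash
# Print the Docker install-and-start commands for this node's distribution.
# Only the CentOS and Ubuntu cases from the text are handled.
docker_install_commands() {
  local os_release="${1:-/etc/os-release}" id
  # /etc/os-release is a shell-compatible file; its ID field names the distro.
  id="$(. "$os_release" && echo "$ID")"
  case "$id" in
    centos)
      echo "sudo yum install -y docker"
      ;;
    ubuntu)
      echo "sudo apt-get update"
      echo "sudo apt-get install -y docker.io"
      ;;
    *)
      echo "unsupported distribution: ${id:-unknown}" >&2
      return 1
      ;;
  esac
  echo "sudo service docker start"
}
```

For example, docker_install_commands > install-docker.sh, review the file, then run sh -e install-docker.sh.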
We then expect you to:
- Get the image with: docker pull datalayer/zeppelin
- The start.sh script located in the zeppelin folder will allow you to start the
For more details, follow the Zeppelin Docker documentation to install and configure the Zeppelin Docker image. In particular, check the Spark in YARN mode section, as the Docker container will have to connect to the external HDInsight cluster.
Before launching Zeppelin, there are 2 important additional steps to connect to the correct cluster:
- Copy the complete /etc/hadoop/conf folder from the HDInsight cluster head node to the Zeppelin node. You will first scp it from the HDInsight cluster to your laptop, and then scp it from your laptop to the Zeppelin node.
- Edit /etc/hosts on the Zeppelin node and add the entries present in the /etc/hosts of the HDInsight cluster head node (connect via SSH to see that file).
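The two-hop copy of /etc/hadoop/conf can be scripted as below. A sketch under assumptions: both nodes are reachable over SSH from your laptop, hosts are given as user@host, the hypothetical ./hadoop-conf staging folder does not exist yet (otherwise scp nests conf inside it), and your user on the Zeppelin node can use sudo to write under /etc.

```shell
#!/usr/bin/env bash
# Sketch: copy /etc/hadoop/conf from the HDInsight head node to the Zeppelin
# node via your laptop. ./hadoop-conf is a hypothetical staging folder name.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_hadoop_conf() {
  local headnode="$1" zeppelin="$2"
  local staging="${STAGING_DIR:-./hadoop-conf}"
  # Hop 1: head node -> laptop.
  run scp -r "${headnode}:/etc/hadoop/conf" "$staging"
  # Hop 2: laptop -> home directory of the Zeppelin node.
  run scp -r "$staging" "${zeppelin}:hadoop-conf"
  # Move the files into /etc/hadoop/conf (needs sudo; -t allows a password prompt).
  run ssh -t "$zeppelin" "sudo mkdir -p /etc/hadoop/conf && sudo cp -r hadoop-conf/. /etc/hadoop/conf/"
}
```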
Typical entries in /etc/hosts look like this:
100.117.108.74 PkrVMxrkvxp812q.PkrSrvxrkvxp812q.b2.internal.cloudapp.net PkrVMxrkvxp812q
10.0.0.17 10.0.0.17 headnodehost # SlaveNodeManager
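Rather than copying the lines by hand, a small helper can append the head node's entries to the Zeppelin node's hosts file, skipping comments, loopback lines and entries already present. It assumes you first saved the head node's file locally (for example with ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net cat /etc/hosts > headnode-hosts, where headnode-hosts is a hypothetical file name), copied it to the Zeppelin node, and run the helper there as root.

```shell
#!/usr/bin/env bash
# Append the head node's /etc/hosts entries to another hosts file,
# skipping comments, loopback/IPv6 housekeeping lines and exact duplicates.
merge_hosts() {
  local src="$1" dest="${2:-/etc/hosts}" line ip
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # blank line or comment
    esac
    ip="${line%%[[:space:]]*}"   # first field = IP address
    case "$ip" in
      127.*|::1|fe00::*|ff0*) continue ;;   # keep the local node's own entries
    esac
    # -x: whole line, -F: fixed string, so dots are not regex wildcards.
    grep -qxF -- "$line" "$dest" || printf '%s\n' "$line" >> "$dest"
  done < "$src"
}
```

Running it twice is harmless: lines already present in the destination are not appended again.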
Run Zeppelin Docker Image
These are the steps to fit Azure requirements:
From the datalayer-docker repository you have cloned, go to the zeppelin folder and start Zeppelin on HTTP port 80 with:
Press CTRL-C and stop the running Zeppelin process with:
Edit /opt/datalayer-zeppelin/conf/datalayer-site.xml and set:
<property>
  <name>datalayer.spark.master.mode</name>
  <value>yarn</value>
</property>
<property>
  <name>datalayer.hadoop.conf.dir</name>
  <value>/etc/hadoop/conf</value>
</property>
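Before restarting Zeppelin, a quick preflight check can catch a missing configuration. This is a hypothetical helper, not part of the Datalayer distribution: it assumes that core-site.xml and yarn-site.xml are the Hadoop client files Spark needs in YARN mode, and it simply greps datalayer-site.xml for the two property names set above.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check before starting Zeppelin in YARN mode.
# Returns non-zero and prints what is missing on stderr.
preflight_check() {
  local conf_dir="${1:-/etc/hadoop/conf}"
  local site_xml="${2:-/opt/datalayer-zeppelin/conf/datalayer-site.xml}"
  local status=0 f
  # Client-side Hadoop configs copied from the HDInsight head node.
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing $conf_dir/$f" >&2
      status=1
    fi
  done
  # The two properties that point Zeppelin at YARN and at the Hadoop config.
  for f in datalayer.spark.master.mode datalayer.hadoop.conf.dir; do
    if ! grep -q "<name>$f</name>" "$site_xml" 2>/dev/null; then
      echo "property $f not set in $site_xml" >&2
      status=1
    fi
  done
  return "$status"
}
```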
Finally, take note of the public IP address of your Zeppelin node and enter it in your browser. The Zeppelin welcome page should show up.
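If the page does not show up right away, the container may still be starting. The sketch below polls the node with curl until it returns a 2xx or 3xx status; the default of 30 attempts and the 5-second pause are arbitrary choices, not values from the Zeppelin documentation.

```shell
#!/usr/bin/env bash
# Poll the Zeppelin node until it answers with a 2xx or 3xx HTTP status.
# host may include a port (e.g. 203.0.113.10 or 203.0.113.10:8080).
wait_for_zeppelin() {
  local host="$1" tries="${2:-30}" i=1 code=""
  while [ "$i" -le "$tries" ]; do
    # -w '%{http_code}' prints the status code; curl prints 000 if nothing answered.
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://${host}/")" || true
    case "$code" in
      2??|3??)
        echo "Zeppelin answered on http://${host}/ (HTTP ${code})"
        return 0
        ;;
    esac
    if [ "$i" -lt "$tries" ]; then
      sleep 5   # arbitrary pause between attempts
    fi
    i=$((i + 1))
  done
  echo "no answer from http://${host}/ after ${tries} attempts (last status: ${code:-none})" >&2
  return 1
}
```

Usage: wait_for_zeppelin 203.0.113.10, replacing the documentation IP with your node's public address.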
Use the Sign Up menu to create your profile. Once your profile is created, you can read the documentation to learn more about the available features.