Interpreter Security

We assume you have a Hadoop cluster named DatalayerCluster.

Start with a simple cluster running the HDFS, YARN, MapReduce2, and ZooKeeper services, and define the following custom core-site configuration properties (a quick verification command is shown after the list):

  • hadoop.proxyuser.datalayer.hosts = *
  • hadoop.proxyuser.datalayer.groups = *
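
If you want to confirm that these values are picked up by the Hadoop client configuration, a quick check (run on a cluster node where the Hadoop configuration is deployed; hdfs getconf reads the local client configuration) is:

hdfs getconf -confKey hadoop.proxyuser.datalayer.hosts
hdfs getconf -confKey hadoop.proxyuser.datalayer.groups
# both commands should print "*"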

Enable Kerberos

You have the option to Kerberize your Hadoop Cluster.

As a prerequisite, you need to download the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files from the Oracle web site and place the jar files in the Java JCE folder. You can copy the jars from your local drive to the Docker container with: scp -P 2222 *.jar root@localhost:/opt/java/jre/lib/security.

Follow the steps described in the Hortonworks documentation, Configuring Ambari and Hadoop for Kerberos, to enable Kerberos.

Tip: During Kerberization, if for some reason HDFS does not leave safe mode, you can force it with:

kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-DatalayerCluster@DATALAYER.IO
hdfs dfsadmin -safemode leave
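
If you only want to check whether HDFS is still in safe mode before forcing it out, you can query the status with the same hdfs ticket:

hdfs dfsadmin -safemode get
# prints "Safe mode is ON" or "Safe mode is OFF"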

As part of the process, create the Kerberos admin user and an additional user.

sudo kadmin.local -q "addprinc admin/admin"

Already done in the Docker image with password 'datalayer'.

Check Kerberos and add the zeppelin principal:

sudo kadmin.local -q "listprincs"
# sudo kadmin.local -p admin/admin
sudo kadmin.local -q "addprinc zeppelin@DATALAYER.IO"

Already done in the Docker image with password 'datalayer'.

Check the validity of the user you have created (get a ticket and show it):

kinit zeppelin@DATALAYER.IO
klist
kdestroy

Tips

Kerberos is sensitive to hostname-to-IP-address resolution.

Ensure the hostname of the Docker image resolves the same way inside the image and on the host (check '/etc/hosts').

Pay attention to the encryption types in /etc/krb5.conf.

[libdefaults]
  default_realm = DATALAYER.IO
  rdns = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  dns_lookup_realm = false
  dns_lookup_kdc = false
  forwardable = true
  default_tkt_enctypes = des3-cbc-sha1 rc4-hmac des-cbc-crc des-cbc-md5
  default_tgs_enctypes = des3-cbc-sha1 rc4-hmac des-cbc-crc des-cbc-md5
  permitted_enctypes = aes128-cts-hmac-sha1-96 des-cbc-crc des-cbc-md5 des3-cbc-sha1 rc4-hmac
  allow_weak_crypto = true
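
To see which encryption types are actually negotiated once these settings are in place, get a ticket with the zeppelin principal created above and list it with the -e flag:

kinit zeppelin@DATALAYER.IO
klist -e
kdestroy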

Validate Cluster

Check your Kerberos Cluster:

klist -kt /etc/security/keytabs/hdfs.headless.keytab
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-DatalayerCluster@DATALAYER.IO
hdfs dfs -mkdir /user/zeppelin
hdfs dfs -chown -R datalayer:hdfs /user/zeppelin
hdfs dfs -ls /user
hdfs dfs -ls /user/zeppelin
kinit zeppelin@DATALAYER.IO
klist
# 2.3.4.0-3845 depends on your HDP version...
yarn jar /usr/hdp/2.3.4.0-3845/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10

SPNEGO Keytab

Create a user and export the Keytab to be used for SPNEGO (not yet available in Zeppelin):

sudo kadmin.local -q "addprinc -randkey HTTP/zeppelin-ambari.datalayer.io.local@DATALAYER.IO"
sudo kadmin.local -q "xst -k /etc/security/keytabs/spnego.keytab HTTP/zeppelin-ambari.datalayer.io.local@DATALAYER.IO"
sudo chmod 400 /etc/security/keytabs/spnego.keytab

Already done in the Docker image.

Check the Keytab:

sudo klist -kt /etc/security/keytabs/spnego.keytab

Proxy User Keytab

Create a user and export the Keytab to be used for the Proxy User:

sudo kadmin.local -q "addprinc -randkey datalayer@DATALAYER.IO"
sudo kadmin.local -q "xst -k /etc/security/keytabs/datalayer.keytab datalayer@DATALAYER.IO"
sudo chmod 400 /etc/security/keytabs/datalayer.keytab

Already done in the Docker image.

Check the Keytab:

sudo klist -kt /etc/security/keytabs/datalayer.keytab

You are now ready to enable Zeppelin to run on your Kerberos cluster by configuring these properties in datalayer-site.xml:

  • datalayer.hadoop.keytab.path = /etc/security/keytabs/datalayer.keytab
  • datalayer.hadoop.keytab.principal = datalayer@DATALAYER.IO

Already done in the Docker image.
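
If you are running outside the Docker image, a quick sanity check that the configured keytab and principal actually match (using the values above) is:

sudo klist -kt /etc/security/keytabs/datalayer.keytab
kinit -kt /etc/security/keytabs/datalayer.keytab datalayer@DATALAYER.IO
klist
kdestroy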

Browse http://localhost:8666 to view the welcome page and test your Spark code on your Cluster (yarn-client mode).
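
From the command line, a minimal check that the web UI answers on port 8666 could be (it should typically return an HTTP 200):

curl -sI http://localhost:8666/ | head -1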

Read more in the Datalayer Zeppelin Guide for other configuration options specific to your requirements.

User Repository

The datalayer.persist.storage property defines where users are stored.

The default configuration uses a JDBC connection as defined in $ZEPPELIN_HOME/conf/hibernate.cfg.xml and $ZEPPELIN_HOME/conf/META-INF/persistence.xml.

You will need to add the required JDBC jars to the $ZEPPELIN_HOME/conf/lib folder.
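
For example, if your user repository is backed by PostgreSQL (the driver jar name below is only an illustration; use the driver matching the JDBC URL configured in persistence.xml):

cp postgresql-42.2.5.jar $ZEPPELIN_HOME/conf/lib/
ls $ZEPPELIN_HOME/conf/lib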

User Provisioning

On user creation, the script defined in datalayer.script.on-user-creation is invoked.

The parameters being passed to the script are: username <uid> <gid> <ldap-hostname> <ldap-port>.
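
A minimal sketch of such a script, assuming the positional parameters above; the groupadd/useradd calls are only one possible provisioning strategy (Hadoop YARN with Kerberos typically needs the Linux account to exist on each node, see the Kerberos section below):

#!/bin/bash
# Invoked by Datalayer on user creation with:
#   $1 = username, $2 = uid, $3 = gid, $4 = ldap-hostname, $5 = ldap-port
USERNAME=$1
USER_ID=$2
GROUP_ID=$3
LDAP_HOST=$4
LDAP_PORT=$5
# Create the matching Linux group and user if they do not exist yet.
getent group "$GROUP_ID" >/dev/null || groupadd -g "$GROUP_ID" "$USERNAME"
id -u "$USERNAME" >/dev/null 2>&1 || useradd -u "$USER_ID" -g "$GROUP_ID" -m "$USERNAME"
echo "Provisioned $USERNAME ($USER_ID:$GROUP_ID) against LDAP $LDAP_HOST:$LDAP_PORT"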

Kerberos

There are two facets to bringing Kerberos to Zeppelin:

  • Submit Spark jobs on a Kerberized YARN cluster with a Proxy user.
  • Authenticate an HTTP request initiated from a browser holding a TGT (Ticket-Granting Ticket) via the SPNEGO protocol.

Figure - Kerberos

Follow the steps described in the Hortonworks documentation, Configuring Ambari and Hadoop for Kerberos, to enable Kerberos.

As a prerequisite, you need to download the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files from the Oracle web site and place the jar files in the Java JCE folder. You can copy the jars from your local drive to the Docker container with: scp -P 2222 *.jar root@localhost:/opt/java/jre/lib/security. This is applicable on the host where you will run Zeppelin.

Tip: During Kerberization, if for some reason HDFS does not leave safe mode, you can force it with:

kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-DatalayerCluster@DATALAYER.IO
hdfs dfsadmin -safemode leave

As part of the process, create the Kerberos admin user and an additional user.

sudo kadmin.local -q "addprinc admin/admin"

Check Kerberos and add the zeppelin principal:

sudo kadmin.local -q "listprincs"
# sudo kadmin.local -p admin/admin
sudo kadmin.local -q "addprinc zeppelin@DATALAYER.IO"

Check the validity of the user you have created (get a ticket and show it):

kinit zeppelin@DATALAYER.IO
klist
kdestroy

Kerberos is sensitive to hostname-to-IP-address resolution. Ensure the hostname of the Docker image resolves the same way inside the image and on the host (check '/etc/hosts').

Pay attention to the encryption types in /etc/krb5.conf. For example, set rdns = false on the Kerberos server and clients.

[libdefaults]
  default_realm = DATALAYER.IO
  rdns = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  dns_lookup_realm = false
  dns_lookup_kdc = false
  forwardable = true
  default_tkt_enctypes = des3-cbc-sha1 rc4-hmac des-cbc-crc des-cbc-md5
  default_tgs_enctypes = des3-cbc-sha1 rc4-hmac des-cbc-crc des-cbc-md5
  permitted_enctypes = aes128-cts-hmac-sha1-96 des-cbc-crc des-cbc-md5 des3-cbc-sha1 rc4-hmac
  allow_weak_crypto = true

Finally, check that your Kerberos cluster is operational with a few simple commands, e.g.:

klist -kt /etc/security/keytabs/hdfs.headless.keytab
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-DatalayerCluster@DATALAYER.IO
hdfs dfs -mkdir /user/zeppelin
hdfs dfs -chown -R datalayer:hdfs /user/zeppelin
hdfs dfs -ls /user
hdfs dfs -ls /user/zeppelin
kinit zeppelin@DATALAYER.IO
klist
# 2.3.4.0-3845 depends on your HDP version...
yarn jar /usr/hdp/2.3.4.0-3845/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10

Tip: To test with Docker, follow the Kerberos Quick Start section in the Datalayer Zeppelin with Ambari Guide.

SPNEGO

SPNEGO is supported as an authentication mechanism for remote browsers that hold a Kerberos ticket-granting ticket (TGT).

The client running the browser must have a TGT, and the authentication is transmitted via the SPNEGO protocol.

Create a user and export the Keytab to be used for SPNEGO (not yet available in Zeppelin):

Ensure you have a JDK version lower than update 40 (JDK 1.8.0_40 and above have a bug affecting SPNEGO and will not work).

sudo kadmin.local -q "addprinc -randkey HTTP/zeppelin-ambari.datalayer.io.local@DATALAYER.IO"
sudo kadmin.local -q "xst -k /etc/security/keytabs/spnego.keytab HTTP/zeppelin-ambari.datalayer.io.local@DATALAYER.IO"
sudo chmod 400 /etc/security/keytabs/spnego.keytab

(replace zeppelin-ambari.datalayer.io.local with the hostname of the server running Zeppelin).

Check the Keytab:

sudo klist -kt /etc/security/keytabs/spnego.keytab

Configure the following settings:

  • datalayer.http.authentication.type = kerberos
  • datalayer.http.keytab.path = Path to the exported keytab
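
Once Zeppelin has been restarted with these settings, you can test SPNEGO from a client that already holds a TGT. This assumes a curl build with GSS-API/SPNEGO support and reuses the example hostname used above together with the default port 8666:

kinit zeppelin@DATALAYER.IO
curl --negotiate -u : -sI http://zeppelin-ambari.datalayer.io.local:8666/ | head -1
# without --negotiate the server should answer 401; with it, 200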

Proxy User

You need a Kerberos-enabled Hadoop YARN cluster with a foo user configured as a proxy user in core-site.xml:

...
<property>
  <name>hadoop.proxyuser.foo.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.foo.groups</name>
  <value>*</value>
</property>
...

Ensure datalayer.master.mode is set to yarn-client mode.

Create a user and export the Keytab to be used for the Proxy User:

sudo kadmin.local -q "addprinc -randkey datalayer@DATALAYER.IO"
sudo kadmin.local -q "xst -k /etc/security/keytabs/datalayer.keytab datalayer@DATALAYER.IO"
sudo chmod 400 /etc/security/keytabs/datalayer.keytab

Check the Keytab:

sudo klist -kt /etc/security/keytabs/datalayer.keytab

foo must also be a valid Kerberos user, and you will need to export a keytab from the Kerberos server and make it available on the node running Zeppelin.

sudo /usr/sbin/kadmin.local -q "xst -k foo.keytab foo@BAR.COM"

Copy foo.keytab to the Zeppelin server and configure datalayer-site.xml accordingly (the keytab path and the principal).
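
Independently of Zeppelin, you can verify the proxy-user setup from the command line: the Hadoop client honours the HADOOP_PROXY_USER environment variable, so after authenticating as foo you can impersonate an end user (zeppelin is used here purely as an example, and the proxyuser rules above must of course allow it):

kinit -kt foo.keytab foo@BAR.COM
HADOOP_PROXY_USER=zeppelin hdfs dfs -ls /user/zeppelin
kdestroy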

For each user being created in the Datalayer User database (during the signup process), the script defined by datalayer.script.on-user-creation is executed.

This allows you, for example, to provision the user on each cluster node, since Hadoop YARN with Kerberos requires the foo Linux user to exist on each node.

The parameters being passed to the script are: username <uid> <gid> <ldap-hostname> <ldap-port>.

You are now ready to enable Zeppelin to run on your Kerberos cluster by configuring the following properties in datalayer-site.xml:

  • datalayer.hadoop.conf.dir: the Hadoop configuration directory; its core-site.xml must set hadoop.security.authentication to kerberos, and of course any other properties needed to access the Kerberos cluster must also be available.
  • datalayer.hadoop.keytab.path = e.g. /etc/security/keytabs/foo.keytab
  • datalayer.hadoop.keytab.principal = e.g. foo@BAR.COM
  • datalayer.script.on-user-creation
  • datalayer.linux.userid
  • datalayer.linux.groupid

Upgrading Zeppelin with Multiuser

Current Implementation

Modes

It is possible to execute many paragraphs in parallel. However, on the back-end side, we are still using synchronous queries. Asynchronous execution is only possible when the InterpreterResult can return a Future value. This may be an interesting proposal for the Zeppelin project.

Zeppelin now allows you to choose the level of isolation for your interpreters (see Interpreter Binding Mode).

Long story short, you have 3 available bindings:

  • shared: same JVM and same Interpreter instance for all notes
  • scoped: same JVM but different Interpreter instances, one for each note
  • isolated: a different JVM running a single Interpreter instance, one JVM for each note

Using the shared binding, the same com.datastax.driver.core.Session object is used for all notes and paragraphs. Consequently, if you use the USE keyspace_name; statement to log into a keyspace, it will change the keyspace for all current users of the Cassandra interpreter, because only one com.datastax.driver.core.Session object is created per instance of the Cassandra interpreter.

The same remark applies to the prepared-statement hash map: it is shared by all users of the same Cassandra interpreter instance.

When using the scoped binding, Zeppelin will create multiple instances of the Cassandra interpreter in the same JVM, and thus multiple com.datastax.driver.core.Session objects. Beware of resource and memory usage with this binding!

The isolated mode is the most extreme and will create as many JVMs (and com.datastax.driver.core.Session objects) as there are distinct notes.

Each Interpreter Setting can choose one of the 'shared', 'scoped', or 'isolated' interpreter binding modes. In 'shared' mode, every notebook bound to the Interpreter Setting shares a single Interpreter instance. In 'scoped' mode, each notebook creates a new Interpreter instance in the same interpreter process. In 'isolated' mode, each notebook creates a new Interpreter process.

Mode       option().isPerNoteSession()   option().isPerNoteProcess()
shared     false                         false
scoped     true                          false
isolated   false                         true

InterpreterSetting.java

private String getInterpreterProcessKey(String noteId) {
  if (getOption().isExistingProcess()) {
    // Connect to an already running (external) interpreter process.
    return Constants.EXISTING_PROCESS;
  } else if (getOption().isPerNoteProcess()) {
    // 'isolated' binding: one interpreter process per note.
    return noteId;
  } else {
    // 'shared' and 'scoped' bindings: a single shared process.
    return SHARED_PROCESS;
  }
}

InterpreterFactory.java

private String getInterpreterInstanceKey(String noteId, InterpreterSetting setting) {
  if (setting.getOption().isExistingProcess()) {
    return Constants.EXISTING_PROCESS;
  } else if (setting.getOption().isPerNoteSession() || setting.getOption().isPerNoteProcess()) {
    return noteId;
  } else {
    return SHARED_SESSION;
  }
}
