Interpreters on Kubernetes

Apache Zeppelin with Spark on Kubernetes is experimental

At the time of writing, the needed code is not integrated into the master branch of either apache/zeppelin or apache-spark-on-k8s/spark. You are welcome to try it out already and to send any feedback and questions.

First things first, you have to choose the modes in which you will run Zeppelin with Spark on Kubernetes:

  • The Kubernetes mode: in-cluster (within a Pod) or out-cluster (from outside the Kubernetes cluster).
  • The Spark deploy mode: client or cluster.

Only three combinations of these options are supported:

  1. in-cluster with spark-client mode.
  2. in-cluster with spark-cluster mode.
  3. out-cluster with spark-cluster mode.

For now, to test these combinations, you need to build specific branches (see hereafter) or to use third-party Helm charts or Docker images. The needed branches and related pull requests are listed here:

  1. In-cluster client mode: see pull request #456
  2. Add support to run Spark interpreter on a Kubernetes cluster: see pull request #2637

In-Cluster with Spark-Client

Figure - In-Cluster with Spark-Client

Build a new Zeppelin based on #456 (in-cluster client mode).

Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

  • spark.master: k8s://https://kubernetes:443
  • spark.submit.deployMode: client
  • spark.kubernetes.driver.pod.name: The name of the pod where your Zeppelin instance is running.
  • spark.app.name: Any name you want, without spaces or special characters.
  • Any other spark.kubernetes properties you need to make Spark work (see Running Spark on Kubernetes), such as spark.kubernetes.initcontainer.docker.image, spark.kubernetes.driver.docker.image, spark.kubernetes.executor.docker.image...
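The settings above translate into a spark-submit invocation along these lines. This is only a sketch: the Docker image names and the app name are placeholders, not values prescribed by this guide, and inside the Zeppelin Pod $HOSTNAME resolves to the Pod name.

```shell
# Sketch of the spark-submit call implied by the in-cluster client-mode settings.
# Image names below are placeholders for your own registry.
SPARK_MASTER="k8s://https://kubernetes:443"
APP_NAME="zeppelin-spark"                      # no spaces or special characters
POD_NAME="${HOSTNAME:-zeppelin-0}"             # inside the Pod, this is the Pod name

SPARK_SUBMIT_CMD="spark-submit \
  --master ${SPARK_MASTER} \
  --deploy-mode client \
  --conf spark.app.name=${APP_NAME} \
  --conf spark.kubernetes.driver.pod.name=${POD_NAME} \
  --conf spark.kubernetes.initcontainer.docker.image=your-registry/spark-init:latest \
  --conf spark.kubernetes.driver.docker.image=your-registry/spark-driver:latest \
  --conf spark.kubernetes.executor.docker.image=your-registry/spark-executor:latest"

echo "${SPARK_SUBMIT_CMD}"
```
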

In-Cluster with Spark-Cluster

Figure - In-Cluster with Spark-Cluster

Build a new Zeppelin Docker image based on #2637 (Spark interpreter on a Kubernetes cluster).

Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

  • spark.master: k8s://https://kubernetes:443
  • spark.submit.deployMode: cluster
  • spark.kubernetes.driver.pod.name: Do not set this property.
  • spark.app.name: Any name you want, without spaces or special characters.
  • Any other spark.kubernetes properties you need to make Spark work (see Running Spark on Kubernetes), such as spark.kubernetes.initcontainer.docker.image, spark.kubernetes.driver.docker.image, spark.kubernetes.executor.docker.image...

Out-Cluster with Spark-Cluster

Figure - Out-Cluster with Spark-Cluster

Build a new Spark distribution and its associated Docker images based on #2637 (Spark interpreter on a Kubernetes cluster).

Once done, any vanilla Apache Zeppelin deployed in a Kubernetes Pod (you can use a Helm chart for this) will work out of the box with the following interpreter settings:

  • spark.master: k8s://https://ip-address-of-the-kube-api:6443 (port may depend on your setup)
  • spark.submit.deployMode: cluster
  • spark.kubernetes.driver.pod.name: Do not set this property.
  • spark.app.name: Any name you want, without spaces or special characters.
  • Any other spark.kubernetes properties you need to make Spark work (see Running Spark on Kubernetes), such as spark.kubernetes.initcontainer.docker.image, spark.kubernetes.driver.docker.image, spark.kubernetes.executor.docker.image...
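One way to find the value for spark.master is to read the API server endpoint from your kubeconfig and prefix it with the k8s:// scheme. The kubectl query below assumes a client already configured against your cluster; the IP address shown is only an example.

```shell
# Read the API server endpoint from the current kubeconfig
# (uncomment when kubectl is configured against your cluster):
# API_SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
API_SERVER="https://10.0.0.1:6443"   # example value only

# spark.master is that endpoint prefixed with the k8s:// scheme
SPARK_MASTER="k8s://${API_SERVER}"
echo "${SPARK_MASTER}"   # → k8s://https://10.0.0.1:6443
```
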

How to test

For now, you will have to build custom Spark or Zeppelin Docker images to suit your needs.

Helm Charts for Zeppelin are available to deploy on your Kubernetes cluster.

Zeppelin on Kubernetes

# Recreate the Zeppelin deployment
kubectl delete -f k8s/zeppelin-k8s.yaml
kubectl create -f k8s/zeppelin-k8s.yaml
# Check the ports exposed by the Zeppelin service
kubectl get svc --selector=app=zeppelin-k8-svc -o jsonpath='{.items[0].spec.ports}'
# Open a shell in the Zeppelin Pod
export POD=$(kubectl get pods -n default -l "app=zeppelin-k8s" -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it $POD -- bash
# Cluster IP of the Spark resource staging service
kubectl get svc spark-resource-staging-service -o jsonpath='{.spec.clusterIP}'
# Expose the interpreter process so the Spark driver can call back ($PORT is the interpreter port)
kubectl expose pod $HOSTNAME --port=$PORT --target-port=$PORT --name=zeppelin-interpreter-spark-$PORT
CALLBACK_HOST=$(kubectl get -o template svc zeppelin-interpreter-spark-$PORT --template={{.spec.clusterIP}})

Driver Memory

The driver memory is currently hard-coded in interpreter.sh; to change it, edit the --driver-memory value in the two spark-submit invocations (with and without --proxy-user):

interpreter.sh:       INTERPRETER_RUN_COMMAND+=' '` echo ${SPARK_SUBMIT} --driver-memory 60g --driver-java-options \"${JAVA_INTP_OPTS}\" --class ${ZEPPELIN_SERVER} --driver-class-path \"${ZEPPELIN_INTP_CLASSPATH_OVERRIDES}:${ZEPPELIN_INTP_CLASSPATH}\" ${SPARK_SUBMIT_OPTIONS} --proxy-user ${ZEPPELIN_IMPERSONATE_USER} ${SPARK_APP_JAR} ${PORT}`
interpreter.sh:       INTERPRETER_RUN_COMMAND+=' '` echo ${SPARK_SUBMIT} --driver-memory 40g --driver-java-options \"${JAVA_INTP_OPTS}\" --class ${ZEPPELIN_SERVER} --driver-class-path \"${ZEPPELIN_INTP_CLASSPATH_OVERRIDES}:${ZEPPELIN_INTP_CLASSPATH}\" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT}`
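Rather than patching interpreter.sh, driver memory can also be passed through SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh, which interpreter.sh already appends to the spark-submit command. The 8g figure below is only an example value.

```shell
# conf/zeppelin-env.sh — picked up by interpreter.sh and appended to spark-submit
export SPARK_SUBMIT_OPTIONS="--driver-memory 8g"
```
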

Info

# Launch an interpreter process manually (here the Markdown interpreter on port 40030)
./bin/interpreter.sh -d /src/zeppelin/interpreter/md -p 40030 -l .//local-repo/2BXKBTJ9W
# Enable remote debugging of the interpreter JVM on port 5005
export ZEPPELIN_MEM="-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"
# Equivalent modern JVM flag
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

Multi User

Distributed / HA

Misc

  • ZEPPELIN-1774 Export notebook as a pixel-perfect printable document i.e. export as a PDF
  • ZEPPELIN-1793 Import/export between Zeppelin and Jupyter notebook formats
  • ZEPPELIN-2089 Run spell as display system with backend interpreter
