This fork was created mainly to support Spark 3.0. It also includes the following changes:
- Artifact sizes can be up to 1 GB
- Built on Hadoop 3.2, as earlier versions had issues with AWS S3
- Built with AWS Fargate and AWS ECS in mind
Build with:
sbt -DscalaVersion=2.12.7 -DsparkVersion=3.0.1 docker
This fork adds a third build variable, imagePath, for cases where the image is built for a private repository. Use it as follows:
sbt -DscalaVersion=2.12.7 -DsparkVersion=3.0.1 -DimagePath=REPO_NAME_HERE docker
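The two build variants above differ only in the optional imagePath flag, so they can be wrapped in one small helper. The following is a minimal sketch, not part of this repository: the function name mist_build_cmd and the SCALA_VERSION and SPARK_VERSION environment variables are hypothetical, and the defaults simply mirror the versions used in this README. It only prints the command, so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this fork): prints the sbt command
# that builds the Mist Docker image.
# Usage: mist_build_cmd [IMAGE_PATH]
mist_build_cmd() (
  image_path="${1:-}"                          # optional private repository path
  scala_version="${SCALA_VERSION:-2.12.7}"     # default taken from this README
  spark_version="${SPARK_VERSION:-3.0.1}"      # default taken from this README
  cmd="sbt -DscalaVersion=${scala_version} -DsparkVersion=${spark_version}"
  # Only add -DimagePath when a repository path was given.
  if [ -n "${image_path}" ]; then
    cmd="${cmd} -DimagePath=${image_path}"
  fi
  printf '%s docker\n' "${cmd}"
)

# Print the command for the default image, then for a private repository.
mist_build_cmd
mist_build_cmd "REPO_NAME_HERE"
```

To actually run the build rather than print it, the output can be passed to the shell, for example with eval "$(mist_build_cmd REPO_NAME_HERE)".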
Docker images:
- kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2
First, pull the Mist image:
docker pull kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2
Then run it with Docker:
docker run -p 2004:2004 -p 4040:4040 -v /var/run/docker.sock:/var/run/docker.sock kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2
If the container needs to reach services running on the host's localhost, use:
docker run -p 2004:2004 -p 4040:4040 --add-host=host.docker.internal:host-gateway -v /var/run/docker.sock:/var/run/docker.sock kaitumisuuringutekeskus/mist:1.1.3-3.0.1-scala-2.12-hadoop3.2
Inside the container, the host's localhost is reachable as "host.docker.internal".
You can now connect to the instance at localhost:2004.
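The container may take a few seconds to start listening, so scripts that call Mist right after docker run can poll the port first. The following is a minimal sketch under stated assumptions: the function name wait_for_http and its retry defaults are hypothetical, it relies on curl being installed, and it treats any HTTP response on the given URL as "ready" rather than checking a specific Mist endpoint.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Mist).
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as URL answers with any HTTP response, 1 if it never does.
wait_for_http() (
  url="$1"
  attempts="${2:-30}"   # assumed default: 30 tries
  delay="${3:-2}"       # assumed default: 2 seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, --max-time: cap each attempt
    if curl -s -o /dev/null --max-time 2 "$url"; then
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)
```

For the setup described above, a call might look like wait_for_http http://localhost:2004/ && echo "Mist is reachable".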
Hydrosphere Mist is a serverless proxy for Spark clusters. Mist provides a new functional programming framework and deployment model for Spark applications.
Please see our quick start guide and documentation.
Features:
- Spark Function as a Service. Deploy Spark functions rather than notebooks or scripts.
- Spark Cluster and Session management. Fully managed Spark sessions backed by on-demand EMR, Hortonworks, Cloudera, DC/OS and vanilla Spark clusters.
- Typesafe programming framework that clearly defines inputs and outputs of every Spark job.
- REST HTTP & Messaging (MQTT, Kafka) API for Scala & Python Spark jobs.
- Multi-cluster mode: seamless on-demand Spark cluster provisioning, autoscaling, and termination (pending).
It creates a unified API layer for building enterprise solutions and microservices on top of Spark functions.
Please report bugs/problems to: https://github.com/Hydrospheredata/mist/issues.