Databricks Connect is a client library for Databricks Runtime. It allows you to write jobs using Spark APIs and run them remotely on a Databricks cluster instead of in the local Spark session. Anywhere you can import pyspark, import org.apache.spark, or require(SparkR), you can run Spark jobs directly from your application, without needing to install any IDE plugins or use Spark submission scripts.

For example, when you run a DataFrame command such as (...).groupBy(...).agg(...).show() using Databricks Connect, the parsing and planning of the job run on your local machine. Then, the logical representation of the job is sent to the Spark server running in Databricks for execution in the cluster.

You do not need to restart the cluster after changing Python or Java library dependencies in Databricks Connect, because each client session is isolated from the others in the cluster.
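To make that flow concrete, here is a minimal sketch in Python, assuming the databricks-connect package is installed and `databricks-connect configure` has already pointed the session at a cluster; the input path /tmp/events and the column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# With Databricks Connect configured, getOrCreate() returns a session
# that routes execution to the remote Databricks cluster.
spark = SparkSession.builder.getOrCreate()

# Parsing and planning of this job happen locally; the logical plan is
# then sent to the cluster, where the actual work runs.
df = spark.read.format("parquet").load("/tmp/events")  # hypothetical path
df.groupBy("event_type").agg(F.count("*").alias("n")).show()
```

The script looks identical to ordinary local PySpark code; only the session configuration determines that the plan is shipped to Databricks rather than executed in a local Spark session.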