Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-6078] Reduce interpreter size #4816

Merged
merged 2 commits into from
Oct 4, 2024
Merged

Conversation

Reamer
Copy link
Contributor

@Reamer Reamer commented Sep 6, 2024

What is this PR for?

This PR removes the hadoop-runtime in all interpreters that are already contained in the zeppelin-interpreter-shaded.jar.

This shrinks the Zeppelin distribution image back to its normal size.

What type of PR is it?

Hot Fix

What is the Jira issue?

How should this be tested?

  • CI
  • local tests

Screenshots (if appropriate)

New image is is nearly back to normal
zeppelin-size

Questions:

  • Does the license files need to update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

@Reamer Reamer marked this pull request as draft September 6, 2024 11:16
@Reamer Reamer force-pushed the jarsize branch 2 times, most recently from 6093f31 to 1a80ac9 Compare September 6, 2024 13:41
@Reamer Reamer marked this pull request as ready for review September 9, 2024 06:18
@@ -36,6 +36,7 @@
<groupId>${project.groupId}</groupId>
<artifactId>zeppelin-interpreter-shaded</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks correct to me, as zeppelin-interpreter-shaded-*.jar will always be added to the interpreter process's classpath.

zeppelin/bin/interpreter.sh

Lines 109 to 110 in 940cf13

ZEPPELIN_INTERPRETER_API_JAR=$(find "${ZEPPELIN_HOME}/interpreter" -name 'zeppelin-interpreter-shaded-*.jar')
ZEPPELIN_INTP_CLASSPATH+=":${CLASSPATH}:${ZEPPELIN_INTERPRETER_API_JAR}"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this change, some interperters pom.xml <exclude>org.apache.zeppelin:zeppelin-interpreter-shaded</exclude> can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, this should be investigated in more detail. @jongyoul once suggested that zeppelin-interpreter and zeppelin-interpreter-shaded should be merged, if I remember correctly

@@ -52,6 +53,10 @@
<groupId>io.atomix</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this have real effects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, I changed the image in the pull request description and labeled the German columns additionally.

Copy link
Contributor Author

@Reamer Reamer Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before PR

root@f8d6a0394383:/# du -sh /opt/zeppelin/interpreter/*
83M	/opt/zeppelin/interpreter/alluxio
50M	/opt/zeppelin/interpreter/angular
57M	/opt/zeppelin/interpreter/bigquery
87M	/opt/zeppelin/interpreter/cassandra
75M	/opt/zeppelin/interpreter/elasticsearch
54M	/opt/zeppelin/interpreter/file
101M	/opt/zeppelin/interpreter/flink
36M	/opt/zeppelin/interpreter/flink-cmd
57M	/opt/zeppelin/interpreter/groovy
64M	/opt/zeppelin/interpreter/hbase
55M	/opt/zeppelin/interpreter/influxdb
51M	/opt/zeppelin/interpreter/java
24M	/opt/zeppelin/interpreter/jdbc
69M	/opt/zeppelin/interpreter/jupyter
57M	/opt/zeppelin/interpreter/livy
72M	/opt/zeppelin/interpreter/md
51M	/opt/zeppelin/interpreter/mongodb
57M	/opt/zeppelin/interpreter/neo4j
81M	/opt/zeppelin/interpreter/python
202M	/opt/zeppelin/interpreter/r
69M	/opt/zeppelin/interpreter/sh
88M	/opt/zeppelin/interpreter/spark
65M	/opt/zeppelin/interpreter/spark-submit
65M	/opt/zeppelin/interpreter/sparql
69M	/opt/zeppelin/interpreter/zeppelin-interpreter-shaded-0.12.0-SNAPSHOT.jar
root@3b21ce44e005:/# du -sh /opt/zeppelin/interpreter/
1.7G	/opt/zeppelin/interpreter/

After PR

root@e052296353cb:/# du -sh /opt/zeppelin/interpreter/*
83M	/opt/zeppelin/interpreter/alluxio
484K	/opt/zeppelin/interpreter/angular
7.4M	/opt/zeppelin/interpreter/bigquery
87M	/opt/zeppelin/interpreter/cassandra
25M	/opt/zeppelin/interpreter/elasticsearch
4.3M	/opt/zeppelin/interpreter/file
52M	/opt/zeppelin/interpreter/flink
36M	/opt/zeppelin/interpreter/flink-cmd
7.3M	/opt/zeppelin/interpreter/groovy
14M	/opt/zeppelin/interpreter/hbase
4.8M	/opt/zeppelin/interpreter/influxdb
780K	/opt/zeppelin/interpreter/java
24M	/opt/zeppelin/interpreter/jdbc
19M	/opt/zeppelin/interpreter/jupyter
7.5M	/opt/zeppelin/interpreter/livy
22M	/opt/zeppelin/interpreter/md
1.6M	/opt/zeppelin/interpreter/mongodb
7.6M	/opt/zeppelin/interpreter/neo4j
31M	/opt/zeppelin/interpreter/python
202M	/opt/zeppelin/interpreter/r
20M	/opt/zeppelin/interpreter/sh
88M	/opt/zeppelin/interpreter/spark
16M	/opt/zeppelin/interpreter/spark-submit
16M	/opt/zeppelin/interpreter/sparql
69M	/opt/zeppelin/interpreter/zeppelin-interpreter-shaded-0.12.0-SNAPSHOT.jar
root@e052296353cb:/# du -sh /opt/zeppelin/interpreter  
839M	/opt/zeppelin/interpreter

Copy link
Member

@pan3793 pan3793 Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Reamer I do not mean the whole PR, just this

+ <exclusion>
+   <groupId>org.apache.hadoop</groupId>
+   <artifactId>*</artifactId>
+ </exclusion>
./mvnw clean package -Pbuild-distr -DskipTests

without the above exclusion

du -hd1 interpreter 
7.5M    interpreter/neo4j
 15M    interpreter/sparql
 83M    interpreter/alluxio
4.2M    interpreter/file
 31M    interpreter/python
 30M    interpreter/flink
154M    interpreter/r
476K    interpreter/angular
772K    interpreter/java
7.9M    interpreter/bigquery
1.6M    interpreter/mongodb
2.7M    interpreter/jdbc
4.7M    interpreter/influxdb
 20M    interpreter/sh
 15M    interpreter/spark-submit
7.5M    interpreter/livy
 39M    interpreter/cassandra
 22M    interpreter/md
 14M    interpreter/hbase
7.2M    interpreter/groovy
 16M    interpreter/flink-cmd
 19M    interpreter/jupyter
 25M    interpreter/elasticsearch
 38M    interpreter/spark
585M    interpreter

with the above exclusion

du -hd1 interpreter                           
7.5M    interpreter/neo4j
 15M    interpreter/sparql
 96M    interpreter/alluxio
4.2M    interpreter/file
 31M    interpreter/python
 30M    interpreter/flink
166M    interpreter/r
476K    interpreter/angular
772K    interpreter/java
7.9M    interpreter/bigquery
1.6M    interpreter/mongodb
2.7M    interpreter/jdbc
4.7M    interpreter/influxdb
 20M    interpreter/sh
 15M    interpreter/spark-submit
7.5M    interpreter/livy
 39M    interpreter/cassandra
 22M    interpreter/md
 14M    interpreter/hbase
7.2M    interpreter/groovy
 16M    interpreter/flink-cmd
 19M    interpreter/jupyter
 26M    interpreter/elasticsearch
 38M    interpreter/spark
612M    interpreter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan3793
Yes, this is also necessary when working with the profile -Pinclude-hadoop.

@Reamer Reamer requested a review from pan3793 September 16, 2024 06:46
@pan3793
Copy link
Member

pan3793 commented Sep 29, 2024

the rlang and python interpreters explicitly declare zeppelin-jupyter-interpreter-shaded as compile scope, I think we should remove them and respect zeppelin-interpreter-parent/pom.xml

zeppelin/rlang/pom.xml

Lines 48 to 52 in c5ccc8e

<dependency>
<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-jupyter-interpreter-shaded</artifactId>
<version>${project.version}</version>
</dependency>

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, there's no harm, let me get it in.

@pan3793 pan3793 merged commit 6e5f30c into apache:master Oct 4, 2024
28 checks passed
pan3793 pushed a commit that referenced this pull request Oct 4, 2024
### What is this PR for?
This PR removes the hadoop-runtime in all interpreters that are already contained in the zeppelin-interpreter-shaded.jar.

This shrinks the Zeppelin distribution image back to its normal size.

### What type of PR is it?
Hot Fix

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-6078

### How should this be tested?
* CI
* [ ] local tests

### Screenshots (if appropriate)
New image is is nearly back to normal
![zeppelin-size](https://github.com/user-attachments/assets/87311906-5d2e-44d2-bbf2-92e18b505dc3)

### Questions:
* Does the license files need to update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Closes #4816 from Reamer/jarsize.

Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit 6e5f30c)
Signed-off-by: Cheng Pan <[email protected]>
@pan3793
Copy link
Member

pan3793 commented Oct 4, 2024

Merged into master/0.12

@Reamer Reamer deleted the jarsize branch October 7, 2024 15:00
@Reamer
Copy link
Contributor Author

Reamer commented Oct 7, 2024

the rlang and python interpreters explicitly declare zeppelin-jupyter-interpreter-shaded as compile scope, I think we should remove them and respect zeppelin-interpreter-parent/pom.xml

zeppelin/rlang/pom.xml

Lines 48 to 52 in c5ccc8e

<dependency>
<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-jupyter-interpreter-shaded</artifactId>
<version>${project.version}</version>
</dependency>

This is another construction site

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants