TypeError: 'JavaPackage' object is not callable #2122

Open
Shekharrajak opened this issue Feb 19, 2019 · 13 comments

@Shekharrajak

Shekharrajak commented Feb 19, 2019

Python:

When I try to run ADAMContext(sparkSession), I get this error:

 c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)
TypeError: 'JavaPackage' object is not callable

The full code I am executing is:

from bdgenomics.adam.adamContext import ADAMContext
from pyspark.sql import SparkSession

class_name = 'Spark shell'
ss = SparkSession.builder.master('local[*]')\
    .appName(class_name)\
    .getOrCreate()
sc = ss.sparkContext
ac = ADAMContext(ss)

I also followed this comment: JohnSnowLabs/spark-nlp#232 (comment), but it didn't work for me.

 $ java --version
openjdk 10.0.2 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode)
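
For context on the error itself: py4j resolves dotted names lazily, and any name it cannot match to a class loaded in the JVM comes back as a JavaPackage placeholder. Calling that placeholder raises this TypeError. A minimal sketch of a check, assuming the cause here is that the ADAM jar is missing from the driver classpath (the helper names `is_unresolved` and `check_adam_on_classpath` are made up for illustration and are not part of ADAM or PySpark):

```python
def is_unresolved(jvm_member):
    """Return True if py4j handed back a package placeholder instead of a class.

    py4j's placeholder type is py4j.java_gateway.JavaPackage; comparing by
    type name keeps this helper importable without py4j installed.
    """
    return type(jvm_member).__name__ == "JavaPackage"


def check_adam_on_classpath(spark_session):
    """Hypothetical helper: report whether the JVM can see ADAMContext."""
    # _jvm is a private PySpark attribute; the ADAM binding uses it the same way.
    member = spark_session.sparkContext._jvm.org.bdgenomics.adam.rdd.ADAMContext
    return not is_unresolved(member)
```

If `check_adam_on_classpath(ss)` returns False, the Python binding is installed but the JVM was started without the ADAM classes.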
@heuermh
Member

heuermh commented Feb 19, 2019

Hello @Shekharrajak!

Thanks for submitting this issue. How did you install Spark and ADAM?

@Shekharrajak
Author

@heuermh
Member

heuermh commented Feb 25, 2019

I cut a new ADAM release, version 0.26.0, last week and pushed it to PyPI. Could you give it another try with pip install bdgenomics.adam?

@Shekharrajak
Author

I have updated the library and the sample code:

from bdgenomics.adam.adamContext import ADAMContext
from util import resourceFile
from pyspark.sql import SparkSession

def main():
  class_name = 'Spark shell'
  ss = SparkSession.builder.master('local[*]')\
  .appName(class_name)\
  .config("spark.driver.memory","8G")\
  .config("spark.driver.maxResultSize", "2G")\
  .config("spark.kryoserializer.buffer.max", "500m")\
  .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.2")\
  .getOrCreate()
  sc = ss.sparkContext
  ac = ADAMContext(ss)

  # Load the file
  testFile = resourceFile("small.sam")
  reads = ac.loadAlignments(testFile)
  print(reads.toDF().count())

if __name__ == '__main__':
  main()

But I am still getting the same error:

Traceback (most recent call last):
  File "load_alignments.py", line 24, in <module>
    main()
  File "load_alignments.py", line 16, in main
    ac = ADAMContext(ss)
  File "/home/lib/python3.7/site-packages/bdgenomics/adam/adamContext.py", line 55, in __init__
    c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)

@Shekharrajak
Author

Here are the details:

$  pip show bdgenomics.adam    
Name: bdgenomics.adam
Version: 0.26.0
Summary: A fast, scalable genome analysis system
Home-page: https://github.com/bdgenomics/adam
Author: Big Data Genomics
Author-email: [email protected]
License: UNKNOWN
Location: /home/lib/python3.7/site-packages
Requires: pyspark
Required-by: 

@akmorrow13
Contributor

@heuermh do you know if pip installs the ADAM binary? by just installing ADAM via pip, there is no way the binaries would be accessible, right?

@heuermh
Member

heuermh commented Feb 26, 2019

do you know if pip installs the ADAM binary? by just installing ADAM via pip, there is no way the binaries would be accessible, right?

The docs ("Pip will install the bdgenomics.adam Python binding, as well as the ADAM CLI.") and the original pull requests (#1848, #1849) read as if that were so.

I plan to fire up a new vanilla EC2 instance and give it a try this afternoon.

@heuermh
Member

heuermh commented Feb 26, 2019

@akmorrow13
Contributor

Ahh fancy. @Shekharrajak how are you starting python?

@heuermh
Member

heuermh commented Feb 26, 2019

It does look like we have a problem.

Starting with a new Amazon Linux 2 AMI on EC2, adam-submit and adam-shell appear to work fine, but pyadam fails to start:

$ ssh ...

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
1 package(s) needed for security, out of 3 available
Run "sudo yum update" to apply all updates.

$ sudo yum update
...

$ python --version
Python 2.7.14

$ sudo easy_install pip
...
Installed /usr/lib/python2.7/site-packages/pip-19.0.3-py2.7.egg

$ sudo pip install pyspark
...
Successfully installed py4j-0.10.7 pyspark-2.4.0

$ which pyspark
/usr/bin/pyspark

$ pyspark --version
JAVA_HOME is not set

$ which java
/usr/bin/which: no java in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin)

$ sudo yum install java-1.8.0-openjdk-devel
...
Installed:
  java-1.8.0-openjdk-devel.x86_64 1:1.8.0.191.b12-0.amzn2

$ pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_191

$ sudo pip install bdgenomics.adam
...
Requirement already satisfied: pyspark>=1.6.0 in /usr/lib/python2.7/site-packages
(from bdgenomics.adam) (2.4.0)
Requirement already satisfied: py4j==0.10.7 in /usr/lib/python2.7/site-packages
(from pyspark>=1.6.0->bdgenomics.adam) (0.10.7)
Installing collected packages: bdgenomics.adam
  Running setup.py install for bdgenomics.adam ... done
Successfully installed bdgenomics.adam-0.26.0

$ which pyadam
/usr/bin/pyadam

$ pyadam --version
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
ls: cannot access
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist:
No such file or directory
Failed to find ADAM egg in
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.

$ which adam-submit
/usr/bin/adam-submit

$ adam-submit --version
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using spark-submit=/usr/bin/spark-submit
2019-02-26 19:21:31 INFO  ADAMMain:109 - ADAM invoked with args: "--version"

       e        888~-_         e            e    e
      d8b       888   \       d8b          d8b  d8b
     /Y88b      888    |     /Y88b        d888bdY88b
    /  Y88b     888    |    /  Y88b      / Y88Y Y888b
   /____Y88b    888   /    /____Y88b    /   YY   Y888b
  /      Y88b   888_-~    /      Y88b  /          Y888b

ADAM version: 0.26.0
Built for: Apache Spark 2.3.3, Scala 2.11.12, and Hadoop 2.7.5


$ touch empty.sam
$ adam-shell
Using SPARK_SHELL=/usr/bin/spark-shell
Spark context available as 'sc' (master = local[*], app id = local-1551208961369).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.adam.rdd.ADAMContext._

scala> val alignments = sc.loadAlignments("empty.sam")
alignments: org.bdgenomics.adam.rdd.read.AlignmentRecordDataset =
RDDBoundAlignmentRecordDataset with 0 reference sequences, 0 read groups,
and 0 processing steps

scala> alignments.toDF().count()
res0: Long = 0

scala> :quit

After a bit of messing around, it appears find-adam-egg.sh is complaining about not finding an egg file:

...
$ find-adam-assembly.sh
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
/usr/lib/python2.7/site-packages/bdgenomics/adam/jars/adam.jar

$ find-adam-egg.sh
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
ls: cannot access
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist:
No such file or directory
Failed to find ADAM egg in
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.
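
The output above suggests the pip package ships the assembly jar as jars/adam.jar inside the installed bdgenomics/adam directory, while no egg is present. A small sketch that looks the jar up from Python, assuming that layout holds for a given install (`find_adam_jar` and `installed_adam_dir` are illustrative names, not part of ADAM):

```python
import os


def find_adam_jar(package_dir):
    """Return the path to jars/adam.jar under package_dir, or None if absent."""
    candidate = os.path.join(package_dir, "jars", "adam.jar")
    return candidate if os.path.isfile(candidate) else None


def installed_adam_dir():
    """Directory holding the installed bdgenomics.adam modules.

    The import is deferred so this file loads even without ADAM installed.
    """
    import bdgenomics.adam.adamContext as adam_context
    return os.path.dirname(os.path.abspath(adam_context.__file__))
```

With ADAM installed, `find_adam_jar(installed_adam_dir())` would give the same path that find-adam-assembly.sh prints, provided the layout assumption is right.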

If I modify the pyspark command in pyadam to not include the --py-files ${ADAM_EGG} argument, it seems to work OK:

$ pyspark \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.bdgenomics.adam.serialization.ADAMKryoRegistrator \
  --jars `find-adam-assembly.sh` \
  --driver-class-path `find-adam-assembly.sh`

Python 2.7.14 (default, Jul 26 2018, 19:59:38)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Python version 2.7.14 (default, Jul 26 2018 19:59:38)
SparkSession available as 'spark'.
>>> from bdgenomics.adam.adamContext import ADAMContext
>>> from pyspark.sql import SparkSession
>>> ss = SparkSession.builder.master('local').getOrCreate()
>>> ac = ADAMContext(ss)
>>> alignments = ac.loadAlignments("empty.sam")
>>> print(alignments.toDF().count())
0
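
For a script started with plain python rather than through pyspark, the same options can be passed via the PYSPARK_SUBMIT_ARGS environment variable, which PySpark reads when it launches the JVM. It has to be set before the first SparkSession is created, and the value must end with pyspark-shell. A sketch, assuming the jar path is the one printed by find-adam-assembly.sh and contains no spaces (the function name `adam_submit_args` is illustrative):

```python
import os


def adam_submit_args(adam_jar):
    """Build a PYSPARK_SUBMIT_ARGS value mirroring the pyspark command above."""
    return " ".join([
        "--conf spark.serializer=org.apache.spark.serializer.KryoSerializer",
        "--conf spark.kryo.registrator="
        "org.bdgenomics.adam.serialization.ADAMKryoRegistrator",
        "--jars " + adam_jar,
        "--driver-class-path " + adam_jar,
        "pyspark-shell",  # PySpark expects this as the final token
    ])


if __name__ == "__main__":
    # Path copied from the find-adam-assembly.sh output; adjust for your install.
    jar = "/usr/lib/python2.7/site-packages/bdgenomics/adam/jars/adam.jar"
    os.environ["PYSPARK_SUBMIT_ARGS"] = adam_submit_args(jar)
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

After the environment variable is set, `SparkSession.builder.getOrCreate()` followed by `ADAMContext(ss)` should behave like the interactive session above.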

@Shekharrajak
Author

Ahh fancy. @Shekharrajak how are you starting python?

@akmorrow13, I put those lines of code into a .py file and run it with python filename.py.

@heuermh, I tried the commands above and got similar output.

Can I do something like this:

  class_name = 'Spark shell'
  ss = SparkSession.builder.master('local[*]')\
  .appName(class_name)\
  .config("spark.driver.memory","8G")\
  .config("spark.driver.maxResultSize", "2G")\
  .config("spark.kryoserializer.buffer.max", "500m")\
  .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.2")\
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")\
  .config("spark.kryo.registrator", 
    "org.bdgenomics.adam.serialization.ADAMKryoRegistrator")\
  .config("spark.driver.extraClassPath", "`find-adam-assembly.sh`")\
  .config("spark.jars.packages", "`find-adam-assembly.sh`")\
  .getOrCreate()

?

I got this error when I tried the code above:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: `find-adam-assembly.sh`
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1018)
	at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1016)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.deploy.SparkSubmitUtils$.extractMavenCoordinates(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1264)
	at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "load_alignments.py", line 29, in <module>
    main()
  File "load_alignments.py", line 18, in main
    .config("spark.jars.packages", "`find-adam-assembly.sh`")\
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

@heuermh
Member

heuermh commented May 23, 2019

Sorry for the delay in responding. Re:

.config("spark.driver.extraClassPath", "`find-adam-assembly.sh`")\

find-adam-assembly.sh returns the path to the ADAM assembly jar, whereas spark.jars.packages, the setting that triggered the error above, expects Maven coordinates (groupId:artifactId:version).

For version 0.26.0 of ADAM, that would be org.bdgenomics.adam:adam-assembly-spark2_2.11:0.26.0.
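
To illustrate the difference: a value for spark.jars.packages has to split into three non-empty parts on ":", which a file path or a backtick-quoted command does not. Python also does not expand backticks inside a string, so Spark received the literal text. A rough approximation of that check (not Spark's actual code; `looks_like_maven_coordinate` is an illustrative name):

```python
def looks_like_maven_coordinate(value):
    """True if value has the shape groupId:artifactId:version."""
    parts = value.split(":")
    return len(parts) == 3 and all(part.strip() for part in parts)


print(looks_like_maven_coordinate("`find-adam-assembly.sh`"))  # False
print(looks_like_maven_coordinate(
    "org.bdgenomics.adam:adam-assembly-spark2_2.11:0.26.0"))  # True
```

A local jar path would go in spark.jars instead, which takes a comma-separated list of file paths rather than coordinates.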

@heuermh
Member

heuermh commented Jan 6, 2020

Hello @akmorrow13, this issue looks similar to #2225 to me, in that removing the egg stuff seems to help. Curious if we should do that or if there might be another more Python-y approach to take.
