Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: handle self-loops (more efficiently) #1231

Open
wants to merge 52 commits into
base: titan10
Choose a base branch
from

Conversation

imri
Copy link

@imri imri commented Dec 27, 2015

As of Titan 1.0, the implementation of TitanVertexDeserializer does not handle scenario of an Edge connecting a Vertex to itself.
In such a case, the same edge loaded twice, once with IN direction, and another time with OUT. Therefore, the edge is added to the graph twice, and while the first addition succeeds, the second causes an 'Edge with id already exists' exception (because there is an attempt to add an existing key to 'graph.edges' dictionary).

As of Titan 1.1, a new method called isLoopAdded was added in order to fix this issue, which loops through the adjacent vertices of a vertex to check if its on a self-loop.

The same purpose can be achieved if you only add the edge once -
If the edge otherVertexId == vertexId, and the direction is OUT, the vertex is added to the graph, and if the direction is IN, it is ignored.
Doing so prevents the exception.

spmallette and others added 30 commits September 4, 2015 07:10
Three tests not passing right now.
… suite.

These tests were related to consistency of behavior around deleted elements.  This proved a bit hard to enforce properly on the TinkerPop side and so this requirement was removed for 3.1.x.
Currently opted to reference the shaded jackson classes rather than depend on jackson directly in titan.  Seems like it is better to handle it that way as it prevents jackson version divergence.
TP3 at 3.1.0 no longer supports hadoop 1.x.  It supports 2.x.  Had to make a package name change and specifically include spark-gremlin.  No guarantees that what i've done packages things properly as I had to make some exclusions given hadoop 1.x dependencies.  There is likely a bigger set of chagnes to the project structure and poms that what was done here to just get this to build.
…pdate. Added a test case to make sure that TitanGraphStep folds in HasContainers even in mid-traversal (@dalaro -- I don't know where this should have gone. So please feel free to put somewhere else if you want. I put it in the standard pattern we use in TinkerPop). Added GraphComputer.config(key,value) as a default method to TitanGraphComputer. It simply does nothing. I don't know if you guys want to have it set configurations. If you do, please do what is appropriate instead of it just doing nothing. Again, that was a TinkerPop3.1 update.
…kwards compatible. Given that Titan just returned this, removed the method.
Assertions after calling clopen() in BerkeleyElasticsearchTest have
been failing since the upgrade from TP 3.0 to 3.1.  I can't tell
precisely when it broke, since many of the recent predecessor commits
that bisect would touch have compile errors.

However, squinting at the failure stack traces and playing with tweaks
eventually pointed to a change in the default on-close behavior of
transactions between TP 3.0 and 3.1.  TitanIndexTest relies on commit.
However, this was changed to rollback in 3.1:

http://tinkerpop.apache.org/docs/3.1.0-incubating/upgrade.html#_transaction_close_default_behavior

I overrode TitanIndexTest's clopen() to commit the graph's transaction
before invoking super.clopen().
remove rexster dir and reference in pom.xml
Didn't do too much replacement with "Gremlin Server" stuff in the debian packaging stuff as it is generally out of date and expected to see removal as a whole.
dalaro and others added 22 commits December 17, 2015 03:10
This changes the default behavior of thinkaurelius#1156.

Instead of hardcoding SizeTieredCompactionStrategy as the default and
applying it in the absence of an explicit user override, this commit
removes the default value for COMPACTION_STRATEGY.  It also updates
the three use sites to check whether the config contains
COMPACTION_STRATEGY before attempting to read the option's value.

Similarly, if COMPACTION_OPTIONS was not set, or if it was set to an
empty map, then its use sites will not call the associated builder
method, allowing the implementation to use whatever default it might
have (probably empty anyway).
This upgrades Jackson from 2.3.0 to 2.4.4 (for compatibility with the
version of Spark preferred by TP 3.1).

This downgrades Jersey from 1.18.2 to 1.9 to match Gremlin-Server.
This is probably going to break something else.

Independent of the version changes, this commit also puts
jersey-client and jersey-guice under version management.  Before, they
were unmanaged, so they could get out of sync with the versions of the
rest of the (managed) jersey artifacts.  Hence jersey-client
mismatched to jersey-server, etc.
Conflicts:
	pom.xml
	titan-core/src/main/java/com/thinkaurelius/titan/core/attribute/Geoshape.java
	titan-core/src/main/java/com/thinkaurelius/titan/graphdb/tinkerpop/io/graphson/TitanGraphSONModule.java
TP 3.1 changed the default tx.close() behavior from commit to
rollback.  This broke TitanGraphBaseTest and the classes that extend
it.  This has been floating in my stash for a while as I swapped
between branches.  It's enough to fix TitanGraphBaseTest and
subclasses, but I am not convinced that it fixes all instances of the
underlying problem (there could be tests that relied on the autocommit
behavior that do not extend TGBT).
PR thinkaurelius#1195 contains a fix for a bug that makes the InputFormat throw an
exception when attempting to read a self-loop edge.  It does not
contain a test.  This commit adds a test that fails (since the PR has
not yet been merged).  The PR should fix it.
Formatting and changing a boxed Boolean parameter to a primitive
The problem here is that thinkaurelius#1197 caused the first job configuration
executed in a JVM to be cached for the life of the JVM.  The tests
execute multiple jobs with distinct configurations while reusing the
same JVM (Hadoop's LocalJobRunner).

This commit changes GiraphInputFormat such that the configuration most
recently supplied to getConf anywhere in the JVM (this is statically
stored) wwill be used any time the refcount of its
TitanVertexDeserializer rises from zero to one.
…tan into issue_1208-graben1437-update-slf4j-version
This script properly instantiates "g".
@spmallette
Copy link
Member

This pull request seems to come with a lot of "extra commits" or something. it makes it a bit hard to evaluate. could you please resubmit this PR against the titan11 branch and we can evaluate it at that point?

@analytically
Copy link

+1

@technotrance
Copy link

Call green lol

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants