giraph.git
5 years ago Fixing RAT checks for Apache Giraph release release-1.2 23/head 95/head rel/1.2.0-RC1
Sergey Edunov [Tue, 11 Oct 2016 23:49:16 +0000 (16:49 -0700)] 
Fixing RAT checks for Apache Giraph release

Test Plan:
mvn apache-rat:check -Phadoop_2
mvn apache-rat:check -Phadoop_1
mvn clean verify -Phadoop_facebook

Reviewers: maja.kabiljo, majakabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64917

5 years ago GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2 rel/1.2.0-RC0
Sergey Edunov [Thu, 6 Oct 2016 18:13:49 +0000 (11:13 -0700)] 
GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2

Test Plan:
mvn clean verify -Phadoop_facebook
rm -rf ~/.m2/repository/org/apache/giraph
mvn clean install -Phadoop_1
rm -rf ~/.m2/repository/org/apache/giraph
mvn clean install -Phadoop_2

Reviewers: maja.kabiljo, majakabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64719

5 years ago GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2
Sergey Edunov [Wed, 5 Oct 2016 22:05:58 +0000 (15:05 -0700)] 
GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2

Test Plan:
mvn clean verify -Phadoop_facebook
mvn clean install -Phadoop_1
mvn clean install -Phadoop_2

Reviewers: majakabiljo, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D64683

6 years ago Update version to 1.2.0
Sergey Edunov [Wed, 28 Sep 2016 22:31:11 +0000 (15:31 -0700)] 
Update version to 1.2.0

6 years ago GIRAPH-1094 remove hbase1 from distribution for hadoop_1
Sergey Edunov [Wed, 21 Sep 2016 21:47:10 +0000 (14:47 -0700)] 
GIRAPH-1094 remove hbase1 from distribution for hadoop_1

Summary: Missed that part in the last diff.

Test Plan:
mvn clean package -Phadoop_2 -fae
then checked that giraph-hbase.jar is in the distribution

mvn clean package -Phadoop_1 -fae
then checked that giraph-hbase.jar is not in the distribution

Reviewers: maja.kabiljo, majakabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64203

6 years ago GIRAPH-1094 Remove hbase from hadoop_1
Sergey Edunov [Wed, 21 Sep 2016 18:04:34 +0000 (11:04 -0700)] 
GIRAPH-1094 Remove hbase from hadoop_1

Summary: Hadoop_1 and current versions of hbase are incompatible. Removing support for HBASE from Hadoop_1 profile

Test Plan: mvn clean package -Phadoop_1 -fae

Reviewers: majakabiljo, maja.kabiljo, dionysis.logothetis

Reviewed By: maja.kabiljo, dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64197

6 years ago hbase_fix
Sergey Edunov [Fri, 9 Sep 2016 17:43:28 +0000 (10:43 -0700)] 
hbase_fix

6 years ago GIRAPH-1098 Job may get stuck if zookeeper port fixed and is in use
Sergey Edunov [Wed, 20 Jul 2016 17:20:36 +0000 (10:20 -0700)] 
GIRAPH-1098 Job may get stuck if zookeeper port fixed and is in use

Test Plan: mvn clean verify -Phadoop_facebook

Reviewers: majakabiljo, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60945

6 years ago Fixing Giraph pom.xml to reflect new project committers
Hassan Eslami [Tue, 26 Jul 2016 18:27:24 +0000 (11:27 -0700)] 
Fixing Giraph pom.xml to reflect new project committers

Summary:
Fixed the list of project committers. Please review your information and let me know if I should change anything.

This is the first diff that I'll be committing all by myself, so it is also a test to see whether my username has gone through Apache's internal systems :-)

Test Plan: N/A

Reviewers: ikabiljo, pavanka, avery.ching, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D61197

6 years ago GIRAPH-1105: Fix number of open requests in FacebookConfiguration
Maja Kabiljo [Fri, 12 Aug 2016 21:57:53 +0000 (14:57 -0700)] 
GIRAPH-1105: Fix number of open requests in FacebookConfiguration

Test Plan: This was significantly better in some experiments, but we can investigate more in the future

Differential Revision: https://reviews.facebook.net/D62019

6 years ago GIRAPH-1097 Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor
Sergey Edunov [Tue, 19 Jul 2016 00:30:04 +0000 (17:30 -0700)] 
GIRAPH-1097 Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor

Summary:
On my laptop it failed because of an NPE in WorkerSuperstepMetrics.
I tracked it down and found that it is triggered from the branch of code that prints out metrics. We don't normally print out metrics in unit tests, so I suspect this feature doesn't exist or isn't functional in hadoop_1. I'll try to disable it, to see how Jenkins reacts.

Test Plan:  mvn test -pl giraph-examples -am -Dtest=TestOutOfCore -DfailIfNoTests=false -Phadoop_1

Reviewers: maja.kabiljo, dionysis.logothetis, heslami

Reviewed By: heslami

Differential Revision: https://reviews.facebook.net/D60873

6 years ago [GIRAPH-1095] Performance regression after GIRAPH-1068
Sergey Edunov [Fri, 15 Jul 2016 21:22:59 +0000 (14:22 -0700)] 
[GIRAPH-1095] Performance regression after GIRAPH-1068

Summary: Need to pass some missing parameters to zookeeper

Test Plan: run a few jobs

Reviewers: dionysis.logothetis, heslami, majakabiljo, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60831

6 years ago GIRAPH-1092 TestCollections.testLargeBasicList fails with OOM
Sergey Edunov [Wed, 13 Jul 2016 21:38:02 +0000 (14:38 -0700)] 
GIRAPH-1092 TestCollections.testLargeBasicList fails with OOM

Summary: This test case requires too much memory to run in Jenkins. Talked to Sergey Pupyrev and we decided to disable it.

Test Plan: none

Reviewers: majakabiljo, maja.kabiljo, spupyrev

Reviewed By: spupyrev

Differential Revision: https://reviews.facebook.net/D60753

6 years ago [GIRAPH-1091] Fix SimpleRangePartitionFactoryTest
Maja Kabiljo [Wed, 13 Jul 2016 18:05:48 +0000 (11:05 -0700)] 
[GIRAPH-1091] Fix SimpleRangePartitionFactoryTest

Summary: SimpleRangePartitionFactoryTest relied on old logic for calculating number of partitions and got broken with GIRAPH-1082.

Test Plan: Ran the test

Differential Revision: https://reviews.facebook.net/D60747

6 years ago FindBugs issues. Preparing for release
Sergey Edunov [Wed, 13 Jul 2016 00:25:30 +0000 (17:25 -0700)] 
FindBugs issues. Preparing for release

6 years ago Preparing for release
Sergey Edunov [Wed, 13 Jul 2016 00:09:42 +0000 (17:09 -0700)] 
Preparing for release

6 years ago [GIRAPH-1089] Fix a bug in out-of-core infrastructure
Hassan Eslami [Tue, 12 Jul 2016 18:33:38 +0000 (11:33 -0700)] 
[GIRAPH-1089] Fix a bug in out-of-core infrastructure

Summary: This diff fixes a bug in the out-of-core infrastructure that caused the user requirement (max number of partitions in memory) for the fixed out-of-core strategy to be violated. The cause of the problem was the unclear definition of in-memory partitions. In this diff, we distinguish the partitions that are entirely in memory from those that are partially in memory.

Test Plan:
mvn clean verify

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60573

6 years ago GIRAPH-1085: Add InMemoryDataAccessor
Maja Kabiljo [Wed, 6 Jul 2016 21:57:33 +0000 (14:57 -0700)] 
GIRAPH-1085: Add InMemoryDataAccessor

Summary: When we deal with graphs which have a lot of vertices with very little total data associated with them (values + edges), we start experiencing memory problems because too many objects are created, since every vertex has multiple objects associated with it. To solve this problem, we should have a serialized partition representation (the current ByteArrayPartition just keeps a byte[] per vertex, not per partition). We can leverage the out-of-core infrastructure and just add a data accessor which is backed by in-memory buffers instead of disk.
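
To illustrate the idea of one serialized buffer per partition rather than several objects per vertex, here is a minimal sketch. `SerializedPartitionDemo` and its layout (a count followed by id/value pairs) are assumptions made for illustration, not the format used by InMemoryDataAccessor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical demo class; not part of Giraph. */
final class SerializedPartitionDemo {
  private SerializedPartitionDemo() { }

  /** Packs all (id, value) pairs of a partition into a single byte[]. */
  static byte[] pack(long[] ids, double[] values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(ids.length);
      for (int i = 0; i < ids.length; i++) {
        out.writeLong(ids[i]);
        out.writeDouble(values[i]);
      }
    }
    return bytes.toByteArray();
  }

  /** Streams over the packed partition without creating per-vertex objects. */
  static double sumValues(byte[] partition) throws IOException {
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(partition))) {
      int count = in.readInt();
      double sum = 0;
      for (int i = 0; i < count; i++) {
        in.readLong(); // vertex id, unused in this sum
        sum += in.readDouble();
      }
      return sum;
    }
  }
}
```

With this layout a partition of n vertices costs one array of 4 + 16n bytes instead of several heap objects per vertex.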

Test Plan: Successfully ran a job which was failing without this.

Differential Revision: https://reviews.facebook.net/D60435

6 years ago GIRAPH-1082: Remove limit on the number of partitions
Maja Kabiljo [Fri, 1 Jul 2016 14:39:25 +0000 (07:39 -0700)] 
GIRAPH-1082: Remove limit on the number of partitions

Summary: Currently we have a limit on how many partitions we can have because we write all partition information to Zookeeper. We can instead send this information in requests and remove the hard limit.

Test Plan: Ran pagerank for 100 iterations with 500k partitions.

Differential Revision: https://reviews.facebook.net/D60267

6 years ago GIRAPH-1083: Make sure we fail after exception in ooc-io thread happens
Maja Kabiljo [Fri, 1 Jul 2016 20:26:50 +0000 (13:26 -0700)] 
GIRAPH-1083: Make sure we fail after exception in ooc-io thread happens

Summary: Currently, if an exception happens in the ooc-io thread, the job is left running for a long time after the exception. We should make sure we fail early.
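
One way to fail early, shown here only as a sketch and not as the mechanism used in this diff, is to record the first uncaught exception of the IO thread and have the main loop rethrow it. `FailFastIoThread` and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper; not part of Giraph. */
final class FailFastIoThread {
  /** First failure seen on the IO thread, or null if there was none. */
  private final AtomicReference<Throwable> failure = new AtomicReference<>();

  /** Starts the IO work on its own thread and records any uncaught exception. */
  Thread start(Runnable ioWork) {
    Thread thread = new Thread(ioWork, "ooc-io-demo");
    thread.setUncaughtExceptionHandler(
        (t, e) -> failure.compareAndSet(null, e));
    thread.start();
    return thread;
  }

  /** Called from the main loop: rethrows instead of waiting on a dead thread. */
  void checkFailed() {
    Throwable cause = failure.get();
    if (cause != null) {
      throw new IllegalStateException("IO thread failed", cause);
    }
  }
}
```

The main computation would call `checkFailed()` at regular points, so the job stops shortly after the IO thread dies.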

Test Plan: Ran a job with ooc on where I simulated the failure. Without the change the job hangs for a long time; with the change it fails right after the exception happens and logs it to the command line.

Differential Revision: https://reviews.facebook.net/D60291

6 years ago GIRAPH-1080: Add FacebookConfiguration
Maja Kabiljo [Tue, 28 Jun 2016 20:14:32 +0000 (13:14 -0700)] 
GIRAPH-1080: Add FacebookConfiguration

Summary: Just copied from internal

Test Plan: verify

Differential Revision: https://reviews.facebook.net/D60135

6 years ago GIRAPH-1081: Fix a bug in internal out-of-core infra: multithreaded accesses to buffers
Hassan Eslami [Wed, 29 Jun 2016 01:43:18 +0000 (18:43 -0700)] 
GIRAPH-1081: Fix a bug in internal out-of-core infra: multithreaded accesses to buffers

Summary: Multi-threaded accesses to raw data buffers in `DiskBackedDataStore` were overlooked, violating the assumption that data is properly partitioned among the different IO threads.

Test Plan: mvn clean verify

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60147

6 years ago GIRAPH-1079: Add triangle counting example
Maja Kabiljo [Mon, 27 Jun 2016 17:56:02 +0000 (10:56 -0700)] 
GIRAPH-1079: Add triangle counting example

Summary: Just moved from internal

Test Plan: mvn verify

Differential Revision: https://reviews.facebook.net/D60057

6 years ago Decouple out-of-core persistence infrastructure from out-of-core computation
Hassan Eslami [Mon, 27 Jun 2016 21:13:29 +0000 (14:13 -0700)] 
Decouple out-of-core persistence infrastructure from out-of-core computation

Summary:
This diff proposes the following:
  - The persistence layer is decoupled from out-of-core infrastructure. This way one can simply implement different data accessors for various persistence resources. The persistence layer for reading/writing from/to local file system is implemented in this diff.
  - Previously, out-of-core data were indexed by string literals. This has changed for more flexibility. Now, data are accessible through a more flexible indexing mechanism, in which a chain of indices is used to address a particular piece of data.
  - With different implementations of data accessor, now there may be more emphasis on having more IO threads. It is important that these IO threads are load-balanced. In this diff, the mechanism to assign partitions to IO threads has changed.
  - All the coolness of Kryo's (de)serialization and RandomAccessFile (in D59277) is included in this diff, all in one place.

Test Plan:
mvn clean verify
out-of-core snapshot test passes

Reviewers: dionysis.logothetis, maja.kabiljo, sergey.edunov

Differential Revision: https://reviews.facebook.net/D59691

6 years ago GIRAPH-1078 createZooKeeperServerList should use task instead of port number
Sergey Edunov [Fri, 24 Jun 2016 17:15:30 +0000 (10:15 -0700)] 
GIRAPH-1078 createZooKeeperServerList should use task instead of port number

Summary: createZooKeeperServerList doesn't have a port yet, as we haven't started zookeeper. What we actually have is the task number. Port will be later set by the master.

Test Plan: run a few jobs.

Reviewers: maja.kabiljo, majakabiljo, heslami, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D59961

6 years ago GIRAPH-1062: Page rank in Blocks&Pieces
Maja Kabiljo [Wed, 11 May 2016 23:09:47 +0000 (16:09 -0700)] 
GIRAPH-1062: Page rank in Blocks&Pieces

Summary: We have some examples of pagerank, but they all have some things missing. Make one which will take sinks into account, have convergence checks, support both weighted and unweighted graphs.

Test Plan: mvn clean verify -P hadoop_facebook. We use this app internally

Differential Revision: https://reviews.facebook.net/D58059

6 years ago GIRAPH-1077: Jobs getting stuck after channel failure
Maja Kabiljo [Tue, 21 Jun 2016 18:54:40 +0000 (11:54 -0700)] 
GIRAPH-1077: Jobs getting stuck after channel failure

Summary: Currently, when a channel fails we just log the failure. Since we don't wait on open requests from every place, the request check doesn't always get called, and we've seen jobs staying stuck, for example during the input stage when a worker's request to the master for a split to read fails. When we know that a channel failed, we should try to resend the requests from that channel.

Test Plan: Ran a job multiple times until a failure of the channel between master and worker happened. Without this change the job would get stuck, but with it the job ran successfully.

Differential Revision: https://reviews.facebook.net/D59895

6 years ago GIRAPH-1076 Race condition in FileTxnSnapLog
Sergey Edunov [Tue, 21 Jun 2016 17:14:34 +0000 (10:14 -0700)] 
GIRAPH-1076 Race condition in FileTxnSnapLog

Summary:
org.apache.zookeeper.server.persistence.FileTxnSnapLog has a potential for race condition:

    if (!this.dataDir.exists()) {
        if (!this.dataDir.mkdirs()) {
            throw new IOException("Unable to create data directory " + this.dataDir);
        }
    }

If two threads try to create FileTxnSnapLog simultaneously it can trigger IOException.
We saw this happening in Giraph where FileTxnSnapLog is being created by PurgeTask created by DatadirCleanupManager and by InProcessZooKeeperRunner#runFromConfig.
Until the zookeeper code is fixed (if it ever is), we need to make sure zookeeper starts first and only then start PurgeTask.
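
The check-then-create pattern quoted above fails when another thread creates the directory between `exists()` and `mkdirs()`. As a minimal sketch only (this is not the fix made in this diff, which changes startup order), a creation helper can tolerate losing that race by re-checking after `mkdirs()` returns false. `DataDirs` and `ensureDir` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper; not part of ZooKeeper or Giraph. */
final class DataDirs {
  private DataDirs() { }

  /** Creates dir if needed; also succeeds if another thread created it first. */
  static void ensureDir(File dir) throws IOException {
    // mkdirs() returns false when the directory already exists, so a false
    // result is an error only if the directory is still missing afterwards.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Unable to create data directory " + dir);
    }
  }
}
```

Calling `ensureDir` twice on the same path, or from two threads at once, does not throw as long as the directory ends up existing.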

Test Plan: run a few jobs and mvn clean verify

Reviewers: majakabiljo, dionysis.logothetis, heslami, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59883

6 years ago Improve out-of-core metrics
Hassan Eslami [Mon, 20 Jun 2016 19:23:42 +0000 (12:23 -0700)] 
Improve out-of-core metrics

Summary: For the metric showing the percentage of the graph in memory, it makes more sense to show the lowest fraction of the graph that was in memory during a superstep. Basically, a user is more interested in seeing how bad the out-of-core execution was, and how many more machines he/she needs to run the job entirely in memory.

Test Plan:
mvn clean verify
visual, looking at Hadoop metric and per-worker metric

Reviewers: sergey.edunov, dionysis.logothetis, maja.kabiljo

Reviewed By: dionysis.logothetis, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59451

6 years ago GIRAPH-1075 checkstyle
Maja Kabiljo [Mon, 20 Jun 2016 17:24:26 +0000 (10:24 -0700)] 
GIRAPH-1075 checkstyle

6 years ago GIRAPH-1075: UnsafeByteArrayOutputStream silently writes long UTFs incorrectly
Maja Kabiljo [Fri, 17 Jun 2016 19:23:09 +0000 (12:23 -0700)] 
GIRAPH-1075: UnsafeByteArrayOutputStream silently writes long UTFs incorrectly

Summary: UnsafeByteArrayOutputStream.writeUTF was copied from DataOutputStream, but the part which checks the length was left out. When we try to write long strings they serialize without an issue, but when we try to deserialize them we get a wrong value back and don't read the same number of bytes. Make it fail like DataOutputStream instead.
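
For context, `DataOutputStream.writeUTF` stores the encoded length in two bytes, so it throws `UTFDataFormatException` when the modified UTF-8 encoding of a string exceeds 65535 bytes. The sketch below shows a length check of that kind; `UtfLengthCheck` is a hypothetical class for illustration, not the code added to UnsafeByteArrayOutputStream.

```java
import java.io.UTFDataFormatException;

/** Hypothetical helper; not part of Giraph. */
final class UtfLengthCheck {
  /** Largest length that fits in the two-byte length prefix. */
  static final int MAX_UTF_BYTES = 65535;

  private UtfLengthCheck() { }

  /** Number of bytes s occupies in modified UTF-8. */
  static long encodedLength(String s) {
    long bytes = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= 0x0001 && c <= 0x007F) {
        bytes += 1;
      } else if (c > 0x07FF) {
        bytes += 3;
      } else {
        bytes += 2; // includes the null character
      }
    }
    return bytes;
  }

  /** Fails loudly instead of writing a truncated length prefix. */
  static void check(String s) throws UTFDataFormatException {
    long length = encodedLength(s);
    if (length > MAX_UTF_BYTES) {
      throw new UTFDataFormatException(
          "encoded string too long: " + length + " bytes");
    }
  }
}
```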

Test Plan: Added a test

Differential Revision: https://reviews.facebook.net/D59817

6 years ago GIRAPH-1068 Make Zookeeper accept 0 as a port number and let it choose any available...
Sergey Edunov [Wed, 15 Jun 2016 21:50:44 +0000 (14:50 -0700)] 
GIRAPH-1068 Make Zookeeper accept 0 as a port number and let it choose any available free port

Summary:
We have a few use cases where having zookeeper bound to specific port is very inconvenient.
1) Unit tests that run in parallel.
2) Shared clusters where multiple giraph instances can run on the same machines.

In theory we don't need to know what port zookeeper will run on. In most cases we're fine with any port available.
Picking any available port is currently supported by the server socket, but is not supported in the code that parses zookeeper configs (this code lives in zookeeper).
We don't have to parse configs though, as we have a way to run zookeeper in process. In that case we have full control over how zookeeper is initialized.

For this task I want to allow 0 as a port number for zookeeper, which will allow us to run zookeeper on any available port. I will also remove "out of process" zookeeper, as it clearly provides no benefits to us.

Note: it will still be possible to run external zookeeper, if you have it running somewhere as a service.

Note2: this change is intended to remove the functionality to support multiple zk servers running as a part of the Giraph job and only support a single zk server. If you want to run multiple zookeepers, you need to configure them separately and let Giraph use the existing zookeeper quorum.
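
The server-socket behaviour mentioned above can be seen with plain `java.net.ServerSocket`: binding to port 0 lets the OS choose a free port, which can then be read back. This is a small sketch with a hypothetical class name, not the zookeeper startup code.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical demo class; not part of Giraph or ZooKeeper. */
final class AnyFreePort {
  private AnyFreePort() { }

  /** Binds to port 0 and reports the port the OS actually assigned. */
  static int bindAndReport() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
```

The port is released when the socket closes, so a real server would keep the socket open and read the port from it.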

Test Plan:
tested a few things:
picking any available port
-Dgiraph.zkServerPort=0
using external zookeeper:
-Dgiraph.zkList="hadoopXXX.YYY.facebook.com:22181"
using specified port:
-Dgiraph.zkServerPort=22128

Reviewers: majakabiljo, maja.kabiljo, dionysis.logothetis, avery.ching, heslami

Reviewed By: avery.ching, heslami

Differential Revision: https://reviews.facebook.net/D59109

6 years ago GIRAPH-1070 Comparators in PartitionUtils can overflow
Sergey Edunov [Sat, 11 Jun 2016 00:41:02 +0000 (17:41 -0700)] 
GIRAPH-1070 Comparators in PartitionUtils can overflow
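
The message does not say how the comparators overflowed. A common cause, assumed here, is comparing by subtraction and narrowing the result to int. The sketch contrasts that with `Long.compare`; `OverflowDemo` is a hypothetical class, not code from PartitionUtils.

```java
import java.util.Comparator;

/** Hypothetical demo class; not part of Giraph. */
final class OverflowDemo {
  /** Broken: the cast drops the high bits of the difference. */
  static final Comparator<Long> BY_SUBTRACTION = (a, b) -> (int) (a - b);

  /** Safe: Long.compare never computes a difference. */
  static final Comparator<Long> BY_COMPARE = Long::compare;

  private OverflowDemo() { }
}
```

For example, comparing `1L << 32` with `0L` by subtraction yields 0 (reported as equal), while `Long.compare` returns a positive value.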

Test Plan: mvn clean verify

Reviewers: majakabiljo, maja.kabiljo, dionysis.logothetis, heslami

Reviewed By: heslami

Differential Revision: https://reviews.facebook.net/D59547

6 years ago GIRAPH-1069 Race condition in all *ConfOption classes
Sergey Edunov [Wed, 8 Jun 2016 22:01:40 +0000 (15:01 -0700)] 
GIRAPH-1069 Race condition in all *ConfOption classes

Summary:
*ConfOption classes, such as ClassConfOption, IntConfOption, FloatConfOption etc, call AllOptions.add(this) from their constructor. This call updates a static list without any synchronization. Hence, if you create conf option classes in different threads you run into a race condition.
The only reason we have AllOptions is to create documentation. It can be done with reflection instead. So, let's remove AllOptions.add(this) from all conf classes and implement a reflection-based approach in AllOptions.
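
A reflection-based collection of options could look roughly like the sketch below. All class names here (`DemoOption`, `DemoConstants`, `OptionScanner`) are hypothetical stand-ins; the actual AllOptions implementation may differ.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical option class: its constructor registers nothing globally. */
final class DemoOption {
  final String key;

  DemoOption(String key) {
    this.key = key;
  }
}

/** Hypothetical holder of option constants. */
final class DemoConstants {
  static final DemoOption MAX_WORKERS = new DemoOption("demo.maxWorkers");
  static final DemoOption USE_CACHE = new DemoOption("demo.useCache");
  static final int NOT_AN_OPTION = 42;

  private DemoConstants() { }
}

/** Finds options by scanning static fields, so no shared list is mutated. */
final class OptionScanner {
  private OptionScanner() { }

  static List<String> keysIn(Class<?> holder) throws IllegalAccessException {
    List<String> keys = new ArrayList<>();
    for (Field field : holder.getDeclaredFields()) {
      boolean isStatic = Modifier.isStatic(field.getModifiers());
      if (isStatic && DemoOption.class.isAssignableFrom(field.getType())) {
        field.setAccessible(true);
        keys.add(((DemoOption) field.get(null)).key);
      }
    }
    Collections.sort(keys); // getDeclaredFields() has no guaranteed order
    return keys;
  }
}
```

Because the scan only reads fields, creating option objects from several threads no longer touches shared mutable state.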

Test Plan:
mvn clean verify -Phadoop_facebook

checked that generated options.xml is the same as before

Reviewers: majakabiljo, dionysis.logothetis, heslami, maja.kabiljo

Reviewed By: heslami, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59331

6 years ago Cleanup the old out-of-core message mechanism
Hassan Eslami [Tue, 31 May 2016 17:37:00 +0000 (10:37 -0700)] 
Cleanup the old out-of-core message mechanism

Summary: With the new out-of-core infrastructure, the old out-of-core message mechanism no longer seems necessary, and it also interferes with the new mechanism. This diff removes the old out-of-core messages and cleans up their implications on the rest of the code base.

Test Plan:
mvn clean verify
snapshot tests pass

Reviewers: maja.kabiljo, dionysis.logothetis, sergey.edunov

Differential Revision: https://reviews.facebook.net/D58701

6 years ago Integrating out-of-core mechanism with credit-based flow-control and data generation...
Sergey Edunov [Fri, 20 May 2016 22:14:08 +0000 (15:14 -0700)] 
Integrating out-of-core mechanism with credit-based flow-control and data generation tethering

Summary: This diff integrates out-of-core infrastructure with credit-based flow control and adds the ability to tether the rate of data generation/processing. Data generation/processing rate is controlled by changing the number of active processing (input/compute) threads. This diff also implements a new (and more performant) adaptive out-of-core policy.

Test Plan:
mvn clean verify
all snapshot tests including ones with large data pass
Running adaptive out-of-core on large graph with very limited memory does not fail.
This diff should enable us to prevent *any* reasonable job from failing!

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Subscribers: ramesh-muthusamy

Differential Revision: https://reviews.facebook.net/D55479

6 years ago GIRAPH-1063: Make primitive type generated fixed capacity min heaps
Maja Kabiljo [Tue, 17 May 2016 13:55:58 +0000 (06:55 -0700)] 
GIRAPH-1063: Make primitive type generated fixed capacity min heaps

Summary: It's often needed to get top k (key, value) pairs, but existing implementations deal with objects making them inefficient. Make one with primitive types. Most of the added code is generated.
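
As a simplified illustration of the technique, the sketch below keeps the top k values in a fixed-capacity min heap over a primitive `double[]`, so no boxed objects are created. It stores values only rather than (key, value) pairs, and `FixedCapacityDoubleMinHeap` is a hypothetical name, not one of the generated classes.

```java
/** Hypothetical simplified heap; not one of the generated Giraph classes. */
final class FixedCapacityDoubleMinHeap {
  private final double[] heap;
  private int size;

  FixedCapacityDoubleMinHeap(int capacity) {
    heap = new double[capacity];
  }

  /** Adds v, evicting the current minimum if the heap is already full. */
  void add(double v) {
    if (size < heap.length) {
      heap[size] = v;
      siftUp(size);
      size++;
    } else if (heap.length > 0 && v > heap[0]) {
      heap[0] = v;
      siftDown(0);
    }
  }

  /** Smallest retained value; only meaningful when size() > 0. */
  double min() {
    return heap[0];
  }

  int size() {
    return size;
  }

  private void siftUp(int i) {
    while (i > 0) {
      int parent = (i - 1) / 2;
      if (heap[parent] <= heap[i]) {
        break;
      }
      swap(parent, i);
      i = parent;
    }
  }

  private void siftDown(int i) {
    while (true) {
      int left = 2 * i + 1;
      int right = left + 1;
      int smallest = i;
      if (left < size && heap[left] < heap[smallest]) {
        smallest = left;
      }
      if (right < size && heap[right] < heap[smallest]) {
        smallest = right;
      }
      if (smallest == i) {
        return;
      }
      swap(smallest, i);
      i = smallest;
    }
  }

  private void swap(int a, int b) {
    double tmp = heap[a];
    heap[a] = heap[b];
    heap[b] = tmp;
  }
}
```

After adding 5, 1, 9, 3 and 7 to a heap of capacity 3, the heap holds 5, 7 and 9, and `min()` returns 5.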

Test Plan: Added tests, mvn verify passed

Differential Revision: https://reviews.facebook.net/D58299

6 years ago GIRAPH-1065: Allow extending JobProgressTrackerService
Maja Kabiljo [Wed, 18 May 2016 16:27:06 +0000 (09:27 -0700)] 
GIRAPH-1065: Allow extending JobProgressTrackerService

Summary: We might want to perform additional actions on events from JobProgressTrackerService. Allow overriding it and specifying another class to use.

Test Plan: Ran a job with custom JobProgressTrackerService and verified actions on it are called

Differential Revision: https://reviews.facebook.net/D58383

6 years ago GIRAPH-1064: Reconnect JobProgressTracker
Maja Kabiljo [Tue, 17 May 2016 19:22:19 +0000 (12:22 -0700)] 
GIRAPH-1064: Reconnect JobProgressTracker

Summary: When workers/master don't talk to JobProgressTracker it can disconnect and throw RejectedExecutionException - we should catch and retry on that exception too.

Test Plan: Ran a job where master would fail to talk to JobProgressTracker after a while without this change, with the change it worked

Differential Revision: https://reviews.facebook.net/D58323

6 years ago GIRAPH-1061: Add Connected Components block factory
Maja Kabiljo [Mon, 9 May 2016 23:47:13 +0000 (16:47 -0700)] 
GIRAPH-1061: Add Connected Components block factory

Summary: Add block factory for Connected Components to make it easy to run it.

Test Plan: Added a test, mvn clean verify

Differential Revision: https://reviews.facebook.net/D57951

6 years ago Block API handle
Dionysios Logothetis [Tue, 10 May 2016 17:41:40 +0000 (10:41 -0700)] 
Block API handle

Summary:
- Some apps need a reference to the Block API objects (e.g. BlockOutputApi) before they are actually executed. See documentation of `BlockApiHandle` for more details.
- Also, I made `BlockWorkerApi` implement the `BlockOutputApi` as opposed to only the `BlockWorkerReceiveApi` so that output is possible inside the sender, too.

Test Plan:
- `mvn install`
- internal app that uses the api handle from the master and from the workers
- internal snapshot tests

Reviewers: maja.kabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D57939

6 years ago GIRAPH-1060: Add combiner to connected components
Maja Kabiljo [Mon, 9 May 2016 18:07:48 +0000 (11:07 -0700)] 
GIRAPH-1060: Add combiner to connected components

Summary: Connected components should use combiner to make it more efficient and require less memory. A few additional cleanups while at it.
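
To show why a combiner reduces memory, the sketch below assumes the usual connected-components scheme in which vertices exchange candidate component labels and keep the smallest. With a min combiner, only one value is stored per target vertex instead of every incoming message. `MinLabelInbox` is a hypothetical class, not the Giraph combiner API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical inbox that combines messages on arrival; not Giraph API. */
final class MinLabelInbox {
  private final Map<Long, Long> combined = new HashMap<>();

  /** Folds an incoming label into the single value kept for the target. */
  void receive(long targetVertex, long label) {
    combined.merge(targetVertex, label, Long::min);
  }

  /** Combined label for a vertex; throws if it received no message. */
  long labelFor(long targetVertex) {
    return combined.get(targetVertex);
  }

  /** Number of stored values: one per target, however many messages came. */
  int storedMessages() {
    return combined.size();
  }
}
```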

Test Plan: mvn clean verify

Differential Revision: https://reviews.facebook.net/D57879

6 years ago GIRAPH-1058: Fix connection retry logic
Maja Kabiljo [Fri, 29 Apr 2016 20:23:29 +0000 (13:23 -0700)] 
GIRAPH-1058: Fix connection retry logic

Summary: Currently when we fail to connect to a channel we retry immediately and that retry most often fails. Add a short wait between retries, and improve the check for whether the channel connected successfully.
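
The retry-with-wait idea can be sketched as below. This is an illustration under assumed names (`RetryWithWait`, `connect`) with a caller-supplied attempt function; it is not the Netty client code changed in this diff.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; not part of Giraph. */
final class RetryWithWait {
  private RetryWithWait() { }

  /**
   * Runs attempt until it succeeds or maxAttempts is reached, sleeping
   * waitMillis between tries. Returns the 1-based number of the attempt
   * that succeeded, or -1 if every attempt failed.
   */
  static int connect(BooleanSupplier attempt, int maxAttempts, long waitMillis)
      throws InterruptedException {
    for (int i = 1; i <= maxAttempts; i++) {
      if (attempt.getAsBoolean()) {
        return i;
      }
      if (i < maxAttempts) {
        Thread.sleep(waitMillis); // give the remote side time to come up
      }
    }
    return -1;
  }
}
```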

Test Plan: Ran multiple jobs which were often failing before the fix, with fix they worked

Differential Revision: https://reviews.facebook.net/D57447

6 years ago fixing cases when there is no conf
spupyrev [Fri, 29 Apr 2016 16:47:29 +0000 (09:47 -0700)] 
fixing cases when there is no conf

Summary:
conf is not needed anymore -- the question is why LongDiffNullArrayEdges extends ConfigurableOutEdges:)
I'd prefer having LongDiffNullArray + CompressedOutEdges instead

Test Plan: test

Reviewers: ikabiljo, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D56505

6 years ago Decoupling NettyClient from control flow policy
Sergey Edunov [Tue, 26 Apr 2016 22:34:42 +0000 (15:34 -0700)] 
Decoupling NettyClient from control flow policy

Summary: This diff refactors NettyClient by decoupling flow control mechanism from NettyClient. Through the refactoring process, some performance and correctness bugs have been found due to the better readability of the refactored code.

Test Plan:
mvn clean verify
Tested large jobs and the output was correct
Tested large jobs and there was no performance degradation for code using the old mechanism

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D56367

6 years ago Setting auto-read in Netty to false
Hassan Eslami [Tue, 26 Apr 2016 17:51:13 +0000 (10:51 -0700)] 
Setting auto-read in Netty to false

Summary: By default, the auto-read flag is set to true in Netty. This means Netty proactively reads requests as they become available to a worker. However, this behavior sometimes causes the off-heap memory to increase continuously. This happens specifically in the presence of a spike in the amount of received requests. In that situation, the processing/handling rate of incoming requests may be less than the request receipt rate, leading to a high-memory kill (CGroup kill or OOM). With the auto-read flag set to false, we read and process requests one by one and (hopefully/presumably) let the transport layer do the flow control (i.e. dropping packets or reducing the congestion window of TCP).

Test Plan:
mvn clean verify
PageRank-like application at large scale fails with auto-read set to true, and successfully runs with auto-read set to false.
**DO NOT ACCEPT THIS DIFF.** We should do more testing and prove it is reliable.

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D57213

6 years ago GIRAPH-1055: Javadoc fails build with Java 8
Avery Ching [Fri, 22 Apr 2016 23:07:22 +0000 (16:07 -0700)] 
GIRAPH-1055: Javadoc fails build with Java 8

Summary:
Java 8 javadoc has stricter checking, which results in mvn javadoc:javadoc failing:
Example:
100 errors
200 warnings
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent ............................... SUCCESS [ 1.196 s]
[INFO] Apache Giraph Core ................................. FAILURE [ 9.583 s]

Test Plan:
[INFO] --- maven-javadoc-plugin:2.9:javadoc (default-cli) @ giraph-dist ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent ............................... SUCCESS [  1.093 s]
[INFO] Apache Giraph Core ................................. SUCCESS [ 10.934 s]
[INFO] Apache Giraph Blocks Framework ..................... SUCCESS [  3.245 s]
[INFO] Apache Giraph Examples ............................. SUCCESS [  3.841 s]
[INFO] Apache Giraph Accumulo I/O ......................... SUCCESS [  2.048 s]
[INFO] Apache Giraph HBase I/O ............................ SUCCESS [  1.132 s]
[INFO] Apache Giraph HCatalog I/O ......................... SUCCESS [  3.053 s]
[INFO] Apache Giraph Gora I/O ............................. SUCCESS [  3.500 s]
[INFO] Apache Giraph Rexster I/O .......................... SUCCESS [  0.091 s]
[INFO] Apache Giraph Rexster Kibble ....................... SUCCESS [  1.276 s]
[INFO] Apache Giraph Rexster I/O Formats .................. SUCCESS [  3.193 s]
[INFO] Apache Giraph Distribution ......................... SUCCESS [  2.074 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 35.880 s
[INFO] Finished at: 2016-04-22T16:09:04-07:00
[INFO] Final Memory: 62M/1131M
[INFO] ------------------------------------------------------------------------

Reviewers: maja.kabiljo, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D57105

6 years ago GIRAPH-1054: Separate ThriftService from JobProgressTrackerService on the client
Avery Ching [Wed, 13 Apr 2016 23:03:40 +0000 (16:03 -0700)] 
GIRAPH-1054: Separate ThriftService from JobProgressTrackerService on the client

Summary:
* Moves the job tracker conf options into the GiraphConstants
* Factors out the static GiraphJob#startThriftServer and GiraphJob#stopThriftServer methods from createJobProgressServer
* Allows adding other Thrift services to the ThriftServer

Test Plan: Tried on a cluster

Reviewers: maja.kabiljo, sergey.edunov

Reviewed By: sergey.edunov

Subscribers: sergey.edunov

Differential Revision: https://reviews.facebook.net/D57087

6 years ago [GIRAPH-1053] Log exceptions to command line
Maja Kabiljo [Tue, 19 Apr 2016 00:34:36 +0000 (17:34 -0700)] 
[GIRAPH-1053] Log exceptions to command line

Summary: When we know an exception occurred, log it to command line to make it easier for people running jobs to see what the issue was.

Test Plan: Ran two jobs, one with error in input one with error in compute, verified exception is printed to command line. Also ran a normal job and verified it didn't print anything new to command line

Differential Revision: https://reviews.facebook.net/D56931

6 years ago [GIRAPH-1041] Generate primitive type specific code
Igor Kabiljo [Tue, 15 Mar 2016 21:41:35 +0000 (14:41 -0700)] 
[GIRAPH-1041] Generate primitive type specific code

Summary:
- Use FreeMarker library to generate primitive type specific code.
Initially generating three sets of files:
{TYPE}Consumer, {TYPE}TypeOps and W{TYPE}ArrayList

Right now generation happens manually, and generated files are being committed.
In the future we can move those to a separate project, and have them generated
when maven is compiling and deploying.

In addition to the generation change, BasicArrayList is renamed to WArrayList and
directly extends the fastutil implementation, so it now serves two purposes:
- generic handling of efficient arrays through TypeOps
- extended fastutil class - to make it writable, to add useful Java8 methods,
  or anything else we can think of. Since we are just extending it, and there is
  no efficiency penalty, we can always use WLongArrayList instead of LongArrayList.

There is an additional WReusableLongArrayList, which when readFields is called,
doesn't size it to exact size, but reuses the old length.

Test Plan:
mvn clean install

There are no changes in logic in this diff. Will send a small separate diff
with some examples of what is now simpler.

Reviewers: sergey.edunov, dionysis.logothetis, spupyrev, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D52515

6 years agoGIRAPH-1052: Fix makeSymmetricUnweighted
Maja Kabiljo [Fri, 8 Apr 2016 21:33:50 +0000 (14:33 -0700)] 
GIRAPH-1052: Fix makeSymmetricUnweighted

Summary: PrepareGraphPieces.makeSymmetricUnweighted is currently very inefficient for skewed degree graphs, because it reuses set objects based on the number of in edges, but also adds all out edges to the set, so sets which should be small can become huge. Since incoming ids are unique anyways, we don't need to add them to the set.

Test Plan: Ran a job without and with the change, verified result is the same but it's much faster now

Reviewers: ikabiljo

Differential Revision: https://reviews.facebook.net/D56481

6 years agoGIRAPH-1050: Add MapperObserver
Maja Kabiljo [Thu, 7 Apr 2016 16:40:27 +0000 (09:40 -0700)] 
GIRAPH-1050: Add MapperObserver

Summary: Add MapperObserver which will be called once per mapper before anything else happens.

Test Plan: Ran a job with MapperObserver set, verified it's called at the right time

Differential Revision: https://reviews.facebook.net/D56373

6 years agoGIRAPH-1046: Add a way to synchronize full GC calls across workers
Maja Kabiljo [Thu, 10 Mar 2016 22:30:11 +0000 (14:30 -0800)] 
GIRAPH-1046: Add a way to synchronize full GC calls across workers

Summary: In applications which use memory more heavily, we can see full GC pauses happening on different workers at different times, and each of these is causing some delay because other workers are often waiting on something from the worker in GC (closing open requests, finishing superstep, etc). Having a way to coordinate when full GCs are called could help them have less effect on job performance.

Test Plan: Ran some memory heavy jobs where I observed overall better performance from using this feature.

Differential Revision: https://reviews.facebook.net/D55347

6 years agoImprove flow control on sender side (pre-requisite for credit-based flow control)
Hassan Eslami [Mon, 4 Apr 2016 23:29:36 +0000 (16:29 -0700)] 
Improve flow control on sender side (pre-requisite for credit-based flow control)

Summary: Currently, a sender worker keeps all open requests (optionally capped at a certain total number of open requests) in its own memory. This can cause high memory usage on the sender side. Also, since messages can arrive at a worker at an arbitrary rate, the receiver may not be able to handle all incoming messages, so we may see a large memory footprint on the receiver as well. This diff addresses the problem by limiting the number of open requests per worker on the sender side. It also provides a cache of unsent requests on the sender, for the case where the sender has already sent enough messages to another worker but has not received any response back.
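
A minimal sketch of the sender-side idea described above, under stated assumptions: the class name, method names and the String stand-in for a request are all hypothetical and are not the Giraph classes touched by this diff. It only shows the bookkeeping: a per-worker cap on open requests, and a cache that holds anything over the cap until a response frees a slot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sender-side throttle; names are illustrative, not Giraph's. */
final class SenderFlowControlSketch {
  private final int maxOpenPerWorker;
  /** Requests sent to each worker that have not been acknowledged yet. */
  private final Map<Integer, Integer> openRequests = new HashMap<>();
  /** Requests held back because the destination worker is at its cap. */
  private final Map<Integer, Deque<String>> unsentCache = new HashMap<>();
  /** Stand-in for the network: records what was actually transmitted. */
  private final List<String> sent = new ArrayList<>();

  SenderFlowControlSketch(int maxOpenPerWorker) {
    this.maxOpenPerWorker = maxOpenPerWorker;
  }

  /** Send now if the worker is under its cap, otherwise cache the request. */
  synchronized void send(int worker, String request) {
    if (openRequests.getOrDefault(worker, 0) < maxOpenPerWorker) {
      transmit(worker, request);
    } else {
      unsentCache.computeIfAbsent(worker, w -> new ArrayDeque<>())
          .addLast(request);
    }
  }

  /** A response frees one slot; use it for the oldest cached request. */
  synchronized void onResponse(int worker) {
    openRequests.merge(worker, -1, Integer::sum);
    Deque<String> cached = unsentCache.get(worker);
    if (cached != null && !cached.isEmpty()) {
      transmit(worker, cached.pollFirst());
    }
  }

  private void transmit(int worker, String request) {
    openRequests.merge(worker, 1, Integer::sum);
    sent.add(worker + ":" + request);
  }

  synchronized int openCount(int worker) {
    return openRequests.getOrDefault(worker, 0);
  }

  synchronized int cachedCount(int worker) {
    Deque<String> cached = unsentCache.get(worker);
    return cached == null ? 0 : cached.size();
  }

  synchronized List<String> sentLog() {
    return new ArrayList<>(sent);
  }
}
```

With a cap of 2, a third send to the same worker stays in the cache until onResponse is called for that worker.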

Test Plan: mvn clean verify

Reviewers: avery.ching, sergey.edunov, maja.kabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Subscribers: Alessio

Differential Revision: https://reviews.facebook.net/D43797

6 years agounsafe readers for varints
spupyrev [Fri, 1 Apr 2016 18:06:38 +0000 (11:06 -0700)] 
unsafe readers for varints

Summary:
Varint encoding (and hence, LongDiffNullArrayEdges) can be much faster when using UnsafeByteInput/Output. In fact, iterating over LongDiffNullArrayEdges is almost as fast as iterating over LongNullArrayEdges after the change. The difference is less than a few percent for jobs that require a lot of edge iterators, while it is significant (over 20%) without the change.

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1049

Test Plan: mvn clean install

Reviewers: sergey.edunov, maja.kabiljo, dionysis.logothetis, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D56169

6 years agofaster varint
spupyrev [Wed, 23 Mar 2016 17:34:05 +0000 (10:34 -0700)] 
faster varint

Summary:
Varint is improved in two ways:
- faster readLong and readInt
- making sure that negative numbers can be encoded
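
For illustration only, a small variable-length codec that can encode negative numbers. This is not the Giraph Varint class; in particular, the zigzag mapping used here to keep small negative values short is an assumption, since the commit does not say how negatives are handled.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative varint codec; class and method names are hypothetical. */
final class VarintSketch {
  private VarintSketch() { }

  /** Map signed to unsigned so small magnitudes stay small (assumption). */
  static long zigZag(long v) {
    return (v << 1) ^ (v >> 63);
  }

  /** Inverse of zigZag. */
  static long unZigZag(long v) {
    return (v >>> 1) ^ -(v & 1);
  }

  /** Emit 7 bits per byte; the high bit is set while more bytes follow. */
  static byte[] encode(long value) {
    long v = zigZag(value);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
    return out.toByteArray();
  }

  /** Reassemble the 7-bit groups, lowest group first. */
  static long decode(byte[] bytes) {
    long result = 0;
    int shift = 0;
    for (byte b : bytes) {
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break;
      }
      shift += 7;
    }
    return unZigZag(result);
  }
}
```

With this mapping, -1 fits in a single byte, whereas a plain two's-complement varint would need the maximum ten bytes for it.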

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1049

Test Plan: TestVarint.java

Reviewers: dionysis.logothetis, maja.kabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D55755

6 years ago[GIRAPH-1041] Generate primitive type specific code for functions
Igor Kabiljo [Tue, 15 Mar 2016 21:40:52 +0000 (14:40 -0700)] 
[GIRAPH-1041] Generate primitive type specific code for functions

Summary:
- Use FreeMarker library to generate primitive type specific code.
Initially generating two sets of files:
{TYPE}Consumer, Obj2{TYPE}Function

Right now generation happens manually, and generated files are being committed.
In the future we can move those to a separate project, and have them generated
when maven is compiling and deploying.

Splitting of D52515 into reviewable pieces

Test Plan:
mvn clean install

There are no changes in logic in this diff.

Reviewers: spupyrev, sergey.edunov, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D55527

6 years agoIncrease info-logging while waiting for straggler workers
Tyler Serdar Bulut [Tue, 15 Mar 2016 19:10:36 +0000 (12:10 -0700)] 
Increase info-logging while waiting for straggler workers

Summary:
Keep logging info messages while waiting for task-time-out

Test Plan:
All unit tests are passing.
Manual tests to ensure desired functionality is observed.

Reviewers: maja.kabiljo

Subscribers: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D55467

6 years agoGIRAPH-1039: Fix stopping jmap histo thread
Maja Kabiljo [Tue, 3 Nov 2015 19:01:22 +0000 (11:01 -0800)] 
GIRAPH-1039: Fix stopping jmap histo thread

Summary: Currently, if the jmap histo frequency is set to a long period, we end up stuck at the end of the job for a long time waiting on the jmap histo thread.

Test Plan: Ran a job with long jmap frequency - verified it gets stuck without this change and finishes fine with it

Differential Revision: https://reviews.facebook.net/D50085

6 years agounsafe byte readers/writers
spupyrev [Tue, 15 Mar 2016 23:47:15 +0000 (16:47 -0700)] 
unsafe byte readers/writers

Summary: using unsafe readers/writers

Test Plan:
tested on PageRank app, and Fanout computation. In both cases, there is a ~20% speedup

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1049

Reviewers: sergey.edunov, maja.kabiljo, dionysis.logothetis, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D55509

6 years agoNew out-of-core infrastructure (first patch including fixed out-of-core mechanism)
Sergey Edunov [Tue, 15 Mar 2016 17:40:20 +0000 (10:40 -0700)] 
New out-of-core infrastructure (first patch including fixed out-of-core mechanism)

Summary: This is a re-design of out-of-core mechanism. The new implementation allows for much more intelligent partition scheduling and IO.

Test Plan:
mvn clean verify

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D54549

6 years ago[easy] Log aggregate times per piece at the end
Igor Kabiljo [Fri, 4 Mar 2016 20:33:35 +0000 (12:33 -0800)] 
[easy] Log aggregate times per piece at the end

Test Plan:
It prints:

  16/03/04 12:46:38 INFO internal.BlockMasterLogic: Time sums master:
     count    time %       time name
        34    37.50%   00:00:00 PageRankCheckConvergence
        34    62.50%   00:00:00 PageRankUpdate
        68             00:00:00 total

  16/03/04 12:46:38 INFO internal.BlockMasterLogic: Time sums worker:
     count    time %       time name
        33    50.00%   00:00:00 [receiver=PageRankCheckConvergence,sender=PageRankUpdate]
         1     0.82%   00:00:00 [receiver=PageRankCheckConvergence,sender=null]
        34    47.54%   00:00:00 [receiver=PageRankUpdate,sender=PageRankCheckConvergence]
         1     1.64%   00:00:00 [receiver=null,sender=PageRankUpdate]
        69             00:00:00 total

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo, spupyrev

Reviewed By: spupyrev

Differential Revision: https://reviews.facebook.net/D55113

6 years agoGIRAPH-1044. Update book info in the User Docs / Related Literature page of the site
Roman Shaposhnik [Mon, 7 Mar 2016 01:41:28 +0000 (17:41 -0800)] 
GIRAPH-1044. Update book info in the User Docs / Related Literature page of the site

6 years agoMaking threads in JobProgressService daemons
Sergey Edunov [Fri, 26 Feb 2016 19:48:08 +0000 (11:48 -0800)] 
Making threads in JobProgressService daemons

Summary: We noticed that sometimes job client doesn't finish because threads in JobProgressService are still running. Here we're making them daemons, so that if everything else is done, we will be able to finish application.

Test Plan: run test job

Reviewers: majakabiljo, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D53829

6 years agoAdd PrepareGraphPieces.isSymmetricBlock to check for a symmetric graph
Yuri Schimke [Thu, 25 Feb 2016 18:42:21 +0000 (10:42 -0800)] 
Add PrepareGraphPieces.isSymmetricBlock to check for a symmetric graph

Summary:
PrepareGraphPieces.isSymmetricBlock is a reusable factory function
for creating blocks that check if a graph is symmetric by
XOR reducing a predictable hash of the edge pairs (V1, V2)
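
A rough sketch of how an XOR-reduced hash can test symmetry, assuming an order-independent hash of each edge's endpoints. The class, the hash function and the long[][] edge list are hypothetical and are not the PrepareGraphPieces code. In a symmetric graph every pair is hashed twice (once per direction) and cancels out, so the reduced value is 0.

```java
/** Hypothetical symmetry check over a directed edge list {source, target}. */
final class SymmetryCheckSketch {
  private SymmetryCheckSketch() { }

  /** Order-independent hash: (a, b) and (b, a) give the same value. */
  static long pairHash(long a, long b) {
    long lo = Math.min(a, b);
    long hi = Math.max(a, b);
    long h = lo * 0x9E3779B97F4A7C15L + hi;
    h ^= h >>> 32;
    h *= 0xBF58476D1CE4E5B9L;
    h ^= h >>> 29;
    return h;
  }

  /**
   * XOR-reduce the hash of every directed edge. Self-loops are skipped
   * because they are their own reverse. A result of 0 means "probably
   * symmetric": hash collisions or duplicate edges can cancel by accident.
   */
  static boolean looksSymmetric(long[][] edges) {
    long acc = 0;
    for (long[] edge : edges) {
      if (edge[0] == edge[1]) {
        continue;
      }
      acc ^= pairHash(edge[0], edge[1]);
    }
    return acc == 0;
  }
}
```

XOR is associative and commutative, which is what makes it usable as a reducer: partial results from different workers can be combined in any order.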

Test Plan:
Unit Tests for all changed files, testing on demo graphs.
Will run a full test job.

Reviewers: spupyrev, dionysis.logothetis, ikabiljo, maja.kabiljo

Reviewed By: maja.kabiljo

Subscribers: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D54411

6 years agoUse Partitions in LocalBlockRunner
Igor Kabiljo [Tue, 29 Dec 2015 23:14:32 +0000 (15:14 -0800)] 
Use Partitions in LocalBlockRunner

Summary:
Speed up LocalBlockRunner, by not operating on a TestGraph, but on vertices stored in partitions.
With it - deprecate old non-SimplePartitionerFactory way of specifying partitioning.
(and with it renamed SimplePartitionerFactory to old name GraphPartitionerFactory, and changing it to
 GraphPartitionerFactoryInterface)

Test Plan:
Run unit-test for speed:

  testEmptyIterationsSmallGraph
    6.5 -> 6.3
  testEmptyIterationsSyntheticGraphLowDegree()
    42.0 -> 13.8
  testEmptyIterationsSyntheticGraphHighDegree()
    3.6 -> 2.0
  testPageRankSyntheticGraphLowDegree()
    51.0 -> 47.2
  testPageRankSyntheticGraphHighDegree()
    20.3 -> 17.4

Reviewers: maja.kabiljo, sergey.edunov, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D52425

6 years agoImplementation of DirectWritableSerializerCopyTest.copy()
Sergey Edunov [Fri, 15 Jan 2016 17:42:50 +0000 (09:42 -0800)] 
Implementation of DirectWritableSerializerCopyTest.copy()

Summary: Needed in certain application.

Test Plan:
  mvn clean install
  Also run an actual application.

Reviewers: maja.kabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D52149

6 years agoCorrection in interface documentation
Sergey Edunov [Tue, 1 Dec 2015 19:52:29 +0000 (11:52 -0800)] 
Correction in interface documentation

Summary: The description of the arguments in TypeOps.set() is reversed.

Test Plan: n/a

Reviewers: majakabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D43425

6 years agoAdded vldb publication
Sergey Edunov [Tue, 1 Dec 2015 19:50:19 +0000 (11:50 -0800)] 
Added vldb publication

Summary: Added new vldb paper in literature.xml.

Test Plan: n/a

Reviewers: avery.ching, sergey.edunov, maja.kabiljo, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D45399

6 years agoMake IntSupplier extend Serializable
Sergey Edunov [Wed, 18 Nov 2015 19:58:11 +0000 (11:58 -0800)] 
Make IntSupplier extend Serializable

Summary: Lambdas with IntSuppliers don't get serialized.
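
To illustrate the problem in plain Java: a lambda is only serializable when its target interface extends Serializable. The interface and helper names below are hypothetical stand-ins, not the Giraph IntSupplier itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.IntSupplier;

/** Hypothetical supplier whose lambdas can be serialized. */
interface SerializableIntSupplier extends Serializable {
  int get();
}

final class LambdaSerializationSketch {
  private LambdaSerializationSketch() { }

  /** Lambda targeting the Serializable interface. */
  static SerializableIntSupplier serializableConstant(int value) {
    return () -> value;
  }

  /** Same lambda body, but java.util.function.IntSupplier is not Serializable. */
  static IntSupplier plainConstant(int value) {
    return () -> value;
  }

  /** Returns false if Java serialization rejects the object. */
  static boolean canSerialize(Object obj) {
    try (ObjectOutputStream out =
        new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(obj);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Serialize and deserialize, returning the copy. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in = new ObjectInputStream(
          new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```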

Test Plan: n/a

Reviewers: ikabiljo, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D50973

6 years ago[GIRAPH-1037] Surface worker index information to computations
Igor Kabiljo [Sat, 24 Oct 2015 00:34:07 +0000 (17:34 -0700)] 
[GIRAPH-1037] Surface worker index information to computations

Summary:
It can be useful for applications to surface:
number of workers
index of a worker particular VertexId is assigned to
index of a current worker.

Test Plan: mvn clean install

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D49401

6 years agoGIRAPH-1036: Allow mappers to fail early on exceptions
Maja Kabiljo [Wed, 21 Oct 2015 01:19:36 +0000 (18:19 -0700)] 
GIRAPH-1036: Allow mappers to fail early on exceptions

Summary:
Often when something fails in a mapper we see it stuck until its timeout passes. Digging through this issue I found two root causes:
- Many threads we are creating were not daemon, preventing the process from exiting; only the main thread should be non-daemon
- When calling submit on ExecutorService, exceptions are not propagated back to the caller unless get is called on the future. In ProgressableUtils.getResultsWithNCallables we were calling get on the futures one by one, so we had to wait for earlier futures to finish before seeing an exception thrown in a later one.
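
Both points can be sketched with standard java.util.concurrent classes. This is not the ProgressableUtils code; using ExecutorCompletionService is an assumption about one way to observe futures in completion order rather than submission order, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

/** Hypothetical fail-fast runner; names are illustrative only. */
final class FailFastSketch {
  private FailFastSketch() { }

  /**
   * Runs all tasks and returns the first failure seen, or null if every
   * task succeeded. Futures are taken in completion order, so a task that
   * fails quickly is reported without waiting for slower tasks.
   */
  static Throwable firstFailure(List<Callable<Void>> tasks) {
    // Daemon worker threads cannot keep the JVM alive after main exits.
    ThreadFactory daemonFactory = runnable -> {
      Thread thread = new Thread(runnable);
      thread.setDaemon(true);
      return thread;
    };
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, tasks.size()), daemonFactory);
    CompletionService<Void> completed = new ExecutorCompletionService<>(pool);
    for (Callable<Void> task : tasks) {
      completed.submit(task);
    }
    try {
      for (int i = 0; i < tasks.size(); i++) {
        try {
          completed.take().get();
        } catch (ExecutionException e) {
          return e.getCause();
        }
      }
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return e;
    } finally {
      // Interrupt whatever is still running once the outcome is known.
      pool.shutdownNow();
    }
  }
}
```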

Test Plan: Run jobs in which I simulated exceptions on some partitions in loading, compute and storing phases, for each verified we exit quickly with exception clearly shown, and without this change we'd wait for timeout and other threads from same ProgressableUtils.getResultsWithNCallables to finish. Run a normal job successfully. mvn clean verify

Differential Revision: https://reviews.facebook.net/D49143

6 years agoGIRAPH-1034 Allow IPs for Worker2Worker communication
Sergey Edunov [Tue, 20 Oct 2015 00:35:58 +0000 (17:35 -0700)] 
GIRAPH-1034 Allow IPs for Worker2Worker communication

Test Plan:
Run several jobs in an unreliable DNS environment, with and without -Dgiraph.preferIP=true.
Without this option the jobs fail; with it they pass.

Reviewers: dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D48825

6 years agoGIRAPH-1035: Make sure we are able to use all compute threads
Maja Kabiljo [Mon, 19 Oct 2015 17:35:54 +0000 (10:35 -0700)] 
GIRAPH-1035: Make sure we are able to use all compute threads

Summary: The default logic for choosing the number of partitions, when we use few workers and a lot of compute threads, ends up choosing fewer partitions than there are threads. Add an additional setting to prevent that.

Test Plan: Ran a job with a few workers and a lot of threads and verified the number of partitions is set properly. mvn verify passed.

Differential Revision: https://reviews.facebook.net/D48993

6 years agoGIRAPH-1033: Remove zookeeper from input splits handling
Maja Kabiljo [Mon, 12 Oct 2015 17:56:39 +0000 (10:56 -0700)] 
GIRAPH-1033: Remove zookeeper from input splits handling

Summary: Currently we use zookeeper for handling input splits, with each worker checking each split, and when a lot of splits are used this becomes very slow. We should have the master coordinate input split allocation instead, making the complexity proportional to #splits instead of #workers*#splits. The master holds all the splits, and workers send requests to it asking for splits when they need them.
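
A toy model of the master-coordinated scheme, with hypothetical names and a String standing in for a split; the real implementation sends requests between workers and the master over the network. It shows why the work is proportional to #splits: each split is handed out exactly once, and each worker makes one extra request that comes back empty.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** Hypothetical master-side split allocator; not the Giraph class. */
final class SplitMasterSketch {
  private final Queue<String> unassigned;
  private int requestsServed;

  SplitMasterSketch(List<String> splits) {
    this.unassigned = new ArrayDeque<>(splits);
  }

  /**
   * Called by a worker when it is ready for more input. Returns the next
   * split, or null when none are left. The worker id is unused here; a
   * real allocator could use it, for example to prefer local splits.
   */
  synchronized String requestSplit(int workerId) {
    requestsServed++;
    return unassigned.poll();
  }

  synchronized int requestsServed() {
    return requestsServed;
  }
}
```

With S splits and W workers the master serves S + W requests in total, instead of every worker examining every split.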

Test Plan: Run a job with 200 machines and 200k small splits - without this change input superstep takes 30 minutes, and with it less than 2 minutes. Also verified correctness on sample job. mvn clean verify passes.

Differential Revision: https://reviews.facebook.net/D48531

6 years agoMerge seeded and unseeded BFS into a single BFS implementation
Mayank Pundir [Fri, 9 Oct 2015 20:18:37 +0000 (13:18 -0700)] 
Merge seeded and unseeded BFS into a single BFS implementation

Summary: This change creates a general Breadth First Search version which supports the default BFS where distances to one or more seeds are computed. Additionally, this new version also supports assigning vertices to closest seeds for the purpose of clustering the vertices. This change provides a BlockFactory which highlights this functionality in addition to test cases.

Test Plan: Test cases for the new functionality added.

Reviewers: spupyrev, ikabiljo

Reviewed By: ikabiljo

Subscribers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D47985

7 years ago[GIRAPH-1031] Adding onAllMappersStarted callback
Sergey Edunov [Sat, 19 Sep 2015 00:27:20 +0000 (17:27 -0700)] 
[GIRAPH-1031] Adding onAllMappersStarted callback

7 years agofixing large sets
spupyrev [Tue, 8 Sep 2015 23:05:29 +0000 (16:05 -0700)] 
fixing large sets

Summary:
Sets with a large number of elements (>800000000) are not supported by IntOpenHashSet.
Changing it to IntOpenHashBigSet.

ArrayLists have the same problem, but we'll postpone the fix until we have a use case

https://issues.apache.org/jira/browse/GIRAPH-1028

Test Plan:
mvn clean install
see also a new test TestCollections (that needs 32G to run)

Reviewers: sergey.edunov, maja.kabiljo, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D44859

7 years agoAdding Blocks Framework documentation
Igor Kabiljo [Mon, 24 Aug 2015 21:58:09 +0000 (14:58 -0700)] 
Adding Blocks Framework documentation

Summary:
Adding basic documentation - full example walkthrough and migration library, and only minor info about the framework itself.
Will extend framework part more in the future.

Test Plan: Not sure how to test

Reviewers: avery.ching, sergey.edunov, dionysis.logothetis, maja.kabiljo

Reviewed By: dionysis.logothetis, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D45411

7 years ago[GIRAPH-1023] Adding out-of-core messages to previously implemented adaptive out...
Hassan Eslami [Thu, 30 Jul 2015 21:03:12 +0000 (14:03 -0700)] 
[GIRAPH-1023] Adding out-of-core messages to previously implemented adaptive out-of-core mechanism

Summary:
This is the continuation of the previous diff on out-of-core mechanism. This diff completes the last diff by adding out-of-core messages, making the entire out-of-core mechanism a cohesive entity in Giraph.

This diff also improves the API of PartitionStore by some minor refactoring.

Test Plan:
mvn clean verify
running pagerank and turning message combiner off on a large graph with limited memory does not fail

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D42897

7 years agoGIRAPH-1024: mvn release:prepare not committing changes to pom.xml
Sergey Edunov [Wed, 29 Jul 2015 21:31:52 +0000 (14:31 -0700)] 
GIRAPH-1024: mvn release:prepare not committing changes to pom.xml

7 years ago[GIRAPH-1022] Adaptive out-of-core mechanism for input superstep and graph
Hassan Eslami [Mon, 27 Jul 2015 18:59:21 +0000 (11:59 -0700)] 
[GIRAPH-1022] Adaptive out-of-core mechanism for input superstep and graph

Summary: This code adds the ability to adaptively control the out-of-core mechanism for graph data structure at run-time during input/output superstep and computation superstep. Basically, the implemented mechanism monitors the amount of available free memory in a separate thread. If there is not enough memory, the code adjusts the number of partitions in memory, and spills a series of partitions/buffers to disk. Also, if the amount of free memory is more than expected, some of the on-disk partitions are brought back to memory. Additionally, if amount of free memory is marginal, the mechanism mocks the memory usage by gradually bringing partitions to memory.

Test Plan:
mvn clean verify
Unit tests added to giraph-core
End-to-end test added to giraph-example
Running the code on PageRank on a large graph and not getting OOM failures.

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D40563

7 years agoGIRAPH-1021: Missing progress report for graph mutations
Hassan Eslami [Thu, 16 Jul 2015 22:48:26 +0000 (15:48 -0700)] 
GIRAPH-1021: Missing progress report for graph mutations
Fix progress report for graph mutations

Summary: Progress report in the new implementation of graph mutation is missing. This can cause lack of progress errors at runtime for mutations on high degree vertices

Test Plan: mvn clean verify

Reviewers: avery.ching, dionysis.logothetis, maja.kabiljo, sergey.edunov

Differential Revision: https://reviews.facebook.net/D42309

7 years ago[GIRAPH-1020] TaskInfo equality condition bug fix
Hassan Eslami [Wed, 15 Jul 2015 19:48:22 +0000 (12:48 -0700)] 
[GIRAPH-1020] TaskInfo equality condition bug fix

Summary: Currently equality is checked by comparing the raw host name of one object with the lower-cased host name of the other. These are not the right semantics and can cause subsequent bugs in partition assignment and migration.
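
A minimal sketch of a consistent equality check, assuming the intended behaviour is a case-insensitive host name comparison. The class below is hypothetical and is not TaskInfo; it normalizes the host name once, so equals and hashCode always see the same form on both sides.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical host/port key with case-insensitive host name equality. */
final class HostPortSketch {
  private final String hostname;
  private final int port;

  HostPortSketch(String hostname, int port) {
    // Normalize at construction so both sides of equals() are lower-case.
    this.hostname = hostname.toLowerCase(Locale.ROOT);
    this.port = port;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof HostPortSketch)) {
      return false;
    }
    HostPortSketch that = (HostPortSketch) other;
    return port == that.port && hostname.equals(that.hostname);
  }

  @Override
  public int hashCode() {
    return Objects.hash(hostname, port);
  }
}
```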

Test Plan: mvn clean verify

Reviewers: maja.kabiljo, sergey.edunov, dionysis.logothetis, avery.ching

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D42273

7 years ago[GIRAPH-1019] Optimizing and debugging vertex mutation mechanism
Hassan Eslami [Mon, 6 Jul 2015 17:39:53 +0000 (10:39 -0700)] 
[GIRAPH-1019] Optimizing and debugging vertex mutation mechanism

Summary:
The old implementation of vertex mutation mechanism was single-threaded and had some redundant computation. The single threaded behavior is causing a huge performance degradation for out-of-core case, since all the partitions are being read and written sequentially in one thread to apply mutations. Also, in case where a vertex is mutated and has messages at the same time, the current code fails to execute which does not seem to be the expected behavior.

This diff implements an optimized multi-threaded approach for vertex mutations. With this diff, vertex mutation happens at the beginning of processing each partition. Also, parts of the partition migration code are modified to migrate mutations as well.

Test Plan: mvn clean verify

Reviewers: avery.ching, maja.kabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D40821

7 years ago[GIRAPH-1013] Adding more libraries, algos and examples
Igor Kabiljo [Tue, 30 Jun 2015 06:20:58 +0000 (23:20 -0700)] 
[GIRAPH-1013] Adding more libraries, algos and examples

Summary:
Adding more libraries, algos and examples

Only changes from our internal state:

New classes:
PairReduce
MaxMessageCombiner
PartitioningStats
TestMessageChain

Change to:
Pieces
SendMessageChain

Test Plan: mvn clean install -Phadoop_facebook

Reviewers: maja.kabiljo, dionysis.logothetis, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D40935

7 years ago[GIRAPH-1013] Adding prepare graph library in Java8
Igor Kabiljo [Thu, 25 Jun 2015 21:13:18 +0000 (14:13 -0700)] 
[GIRAPH-1013] Adding prepare graph library in Java8

Summary:
Adding simple graph preparation:
- symmetric
- removal of isolated edges
- normalizing
- connected components

Creating a new module, only used with -Phadoop_facebook, which is written in Java8.

Tests are new/modified from what is in our repo, the rest is identical.

Test Plan: mvn clean install

Reviewers: maja.kabiljo, dionysis.logothetis, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D40719

7 years agoGIRAPH-1018: Improving PartitionStore API to better match its expected behaviour
Hassan Eslami [Mon, 29 Jun 2015 22:47:12 +0000 (15:47 -0700)] 
GIRAPH-1018: Improving PartitionStore API to better match its expected behaviour
(heslami via aching)

Summary: Currently for statistics operations on each partition, entire partition is loaded using getOrCreatePartition method of PartitionStore. This diff improves the API of PartitionStore by adding required methods to only return the statistics.

Test Plan: mvn clean verify

Reviewers: dionysis.logothetis, maja.kabiljo, avery.ching

Reviewed By: avery.ching

Differential Revision: https://reviews.facebook.net/D40731

7 years ago[GIRAPH 1013] Adding TestGraphUtils and NumericTestGraph
Igor Kabiljo [Tue, 23 Jun 2015 04:44:49 +0000 (21:44 -0700)] 
[GIRAPH 1013] Adding TestGraphUtils and NumericTestGraph

Summary:
Adding simplified framework for running application tests.
Code for testing is going to be much shorter, especially
when Java 8 is used.

Only difference compared to our codebase is addition of
SendingMessagesTest to showcase these capabilities

Test Plan: mvn clean install

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D40533

7 years ago[GIRAPH-1013] Adding reducer handle utilities
Igor Kabiljo [Wed, 17 Jun 2015 19:47:52 +0000 (12:47 -0700)] 
[GIRAPH-1013] Adding reducer handle utilities

Summary: And more functional interfaces, and PairWritable

Test Plan: mvn clean install

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D40269

7 years agoGIRAPH-1017: Add support for ImmutableMap in Kryo
Maja Kabiljo [Tue, 23 Jun 2015 23:48:55 +0000 (16:48 -0700)] 
GIRAPH-1017: Add support for ImmutableMap in Kryo

Summary: Trying to serialize ImmutableMap currently throws an exception - we should add a support for it.

Test Plan: Added a test, verified that app which was failing without the change passes now

Reviewers: ikabiljo

Differential Revision: https://reviews.facebook.net/D40575

7 years ago[GIRAPH 1013] Apply @edunov fix for block output
Igor Kabiljo [Thu, 18 Jun 2015 23:52:22 +0000 (16:52 -0700)] 
[GIRAPH 1013] Apply @edunov fix for block output

Summary:
Apply fix:
https://phabricator.fb.com/D2141200

Test Plan: mvn clean install

Reviewers: maja.kabiljo, sergey.edunov, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D40395

7 years agoGIRAPH-1015: Support vertex combiner in TestGraph
Maja Kabiljo [Wed, 17 Jun 2015 00:30:22 +0000 (17:30 -0700)] 
GIRAPH-1015: Support vertex combiner in TestGraph

Summary: TestGraph should use vertex combiner which is specified in the conf passed, instead of replacing the vertex with latest added.

Test Plan: Added a test, mvn clean verify

Reviewers: sergey.edunov, ikabiljo

Differential Revision: https://reviews.facebook.net/D40227

7 years ago[GIRAPH-1013] Add library of common pieces and functions
Igor Kabiljo [Wed, 10 Jun 2015 22:33:35 +0000 (15:33 -0700)] 
[GIRAPH-1013] Add library of common pieces and functions

Summary:
StripingUtils has been modified to compile with Java7, and to
include a snippet of the MIT license for the hash algorithm used.

Test Plan: mvn clean install

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D39915

7 years ago[GIRAPH-1013] Add migration library
Igor Kabiljo [Wed, 10 Jun 2015 17:33:04 +0000 (10:33 -0700)] 
[GIRAPH-1013] Add migration library

Summary:
Add library that simplifies migration to Blocks Framework

Copied one of the example tests, which uses master, computation and worker context,
to show it all works without any code change

Test Plan: mvn clean install -Phadoop_facebook

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D39891

7 years ago[GIRAPH-1013] Cleanup use of conf for local testing
Igor Kabiljo [Thu, 11 Jun 2015 23:17:53 +0000 (16:17 -0700)] 
[GIRAPH-1013] Cleanup use of conf for local testing

Summary:
Right now we are creating two immutable conf options, and using them both,
which is unnecessary and confusing.

Change to do it only once, and not need to pass it around (TestGraph has it)

Test Plan: mvn clean install

Reviewers: laxman.dhulipala, maja.kabiljo, dionysis.logothetis, sergey.edunov

Differential Revision: https://reviews.facebook.net/D39987

7 years agoGIRAPH-1014: Decrease number of nifty threads created
Maja Kabiljo [Sat, 13 Jun 2015 02:06:12 +0000 (19:06 -0700)] 
GIRAPH-1014: Decrease number of nifty threads created

Summary: By default, ThriftClientManager creates 2*numProcessors threads, making it harder to look through jstack. We use them just for job progress reporting, so no need to have that many.

Test Plan: Run a job, verified number of threads decreased

Reviewers: ikabiljo, sergey.edunov

Differential Revision: https://reviews.facebook.net/D40125

7 years ago[GIRAPH-1013] Add BlockExecutionTest
Igor Kabiljo [Mon, 8 Jun 2015 23:24:45 +0000 (16:24 -0700)] 
[GIRAPH-1013] Add BlockExecutionTest

Summary:
Add support for executing single blocks, as well as adding a test for core of the framework

Equivalent to internal https://phabricator.fb.com/D2137589 diff.

Test Plan: mvn clean install

Reviewers: maja.kabiljo, dionysis.logothetis, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D39873

7 years ago[GIRAPH-1013] Add local (single machine) implementation
Igor Kabiljo [Mon, 8 Jun 2015 18:48:28 +0000 (11:48 -0700)] 
[GIRAPH-1013] Add local (single machine) implementation

Summary:
This allows you to run application written in Blocks Framework
very efficiently on single machine.

Specifically this is interesting for having fast unit tests.

Test Plan:
mvn clean install -Phadoop_facebook

Making TargetVertexIdIterator public is in addition to just adding classes to open source

Reviewers: maja.kabiljo, dionysis.logothetis, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D39717

7 years agoGIRAPH-1012: Remove giraph-hive
Maja Kabiljo [Fri, 12 Jun 2015 18:48:18 +0000 (11:48 -0700)] 
GIRAPH-1012: Remove giraph-hive

Summary: We are not using hive-io-experimental anymore and we'll be deprecating that project. Since we are not aware of anyone else using it, we are thinking of removing giraph-hive completely from the repository. Please comment if you have any objections.

Test Plan: compile with different profiles

Reviewers: ikabiljo, sergey.edunov

Differential Revision: https://reviews.facebook.net/D40053