giraph.git
5 years ago closes #28
Dionysios Logothetis [Thu, 30 Mar 2017 16:19:41 +0000 (11:19 -0500)] 
closes #28

5 years ago Fix findbug issue
Hassan Eslami [Thu, 30 Mar 2017 00:30:21 +0000 (19:30 -0500)] 
Fix findbug issue

5 years ago closes #29
Dionysios Logothetis [Wed, 29 Mar 2017 16:56:59 +0000 (09:56 -0700)] 
closes #29

5 years ago JIRA-1137
Hassan Eslami [Mon, 27 Mar 2017 18:22:09 +0000 (13:22 -0500)] 
JIRA-1137

closes #26

5 years ago JIRA-1134
Maja Kabiljo [Fri, 17 Mar 2017 17:40:53 +0000 (10:40 -0700)] 
JIRA-1134

closes #24

5 years ago closes #25
Sergey Edunov [Wed, 15 Mar 2017 17:00:58 +0000 (10:00 -0700)] 
closes #25

5 years ago JIRA-1133
Maja Kabiljo [Tue, 7 Mar 2017 20:50:45 +0000 (12:50 -0800)] 
JIRA-1133

closes #22

5 years ago GIRAPH-1132
Sergey Edunov [Wed, 1 Mar 2017 22:05:54 +0000 (14:05 -0800)] 
GIRAPH-1132

closes #21

5 years ago closes #20
Dionysios Logothetis [Mon, 27 Feb 2017 18:39:56 +0000 (10:39 -0800)] 
closes #20

5 years ago GIRAPH-1130 Fix RepeatUntilBlock
Igor Kabiljo [Fri, 27 Jan 2017 19:34:06 +0000 (11:34 -0800)] 
GIRAPH-1130 Fix RepeatUntilBlock

closes #16

5 years ago GIRAPH-1129
Igor Kabiljo [Fri, 20 Jan 2017 18:51:20 +0000 (10:51 -0800)] 
GIRAPH-1129

closes #15

5 years ago GIRAPH-1129
Igor Kabiljo [Fri, 13 Jan 2017 19:20:12 +0000 (11:20 -0800)] 
GIRAPH-1129

closes #14

5 years ago GIRAPH-1129
Igor Kabiljo [Fri, 13 Jan 2017 18:48:45 +0000 (10:48 -0800)] 
GIRAPH-1129

closes #13

5 years ago GIRAPH-1128. Giraph does not build because of maven-dependency-plugin (patch submitte...
Roman Shaposhnik [Fri, 13 Jan 2017 02:25:50 +0000 (18:25 -0800)] 
GIRAPH-1128. Giraph does not build because of maven-dependency-plugin (patch submitted by Naresh Bafna)

5 years ago Add missing files for GIRAPH-1125. Closes #12
Sergey Edunov [Tue, 27 Dec 2016 21:49:25 +0000 (13:49 -0800)] 
Add missing files for GIRAPH-1125. Closes #12

5 years ago GIRAPH-1125
Hassan Eslami [Fri, 23 Dec 2016 18:03:37 +0000 (12:03 -0600)] 
GIRAPH-1125

Closes #12

5 years ago Fix typo
Sergey Edunov [Tue, 29 Nov 2016 00:23:58 +0000 (16:23 -0800)] 
Fix typo

Author: edunov

Closes #11

5 years ago Correct typo in word "initialize"
KidEinstein [Tue, 29 Nov 2016 00:16:07 +0000 (16:16 -0800)] 
Correct typo in word "initialize"

Author: KidEinstein

Reviewer: edunov

Closes #10

5 years ago GIRAPH-1124 - Create documentation on how to make Giraph release
Sergey Edunov [Fri, 18 Nov 2016 19:19:16 +0000 (11:19 -0800)] 
GIRAPH-1124 - Create documentation on how to make Giraph release

Test Plan: mvn clean site

Reviewers: rvs, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D65313

5 years ago Fix Checkstyle
Sergey Edunov [Mon, 14 Nov 2016 19:58:59 +0000 (11:58 -0800)] 
Fix Checkstyle

Test Plan:
mvn clean site -DskipTests -Phadoop_2 -Ddependency.locations.enabled=false
mvn clean install -Phadoop_2 -Prelease
mvn clean install -Phadoop_1 -Prelease

Reviewers: dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D65499

5 years ago Bump Apache Giraph version to 1.3.0-SNAPSHOT
Sergey Edunov [Tue, 25 Oct 2016 00:20:02 +0000 (17:20 -0700)] 
Bump Apache Giraph version to 1.3.0-SNAPSHOT

Test Plan: none

Reviewers: dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D65391

5 years ago GIRAPH-1122 Javadoc generation fails for Giraph 1.2.0
Sergey Edunov [Fri, 14 Oct 2016 20:43:50 +0000 (13:43 -0700)] 
GIRAPH-1122 Javadoc generation fails for Giraph 1.2.0

Test Plan: mvn clean site -DskipTests -Phadoop_2 -Ddependency.locations.enabled=false

Reviewers: majakabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64995

5 years ago Fixing RAT checks for Apache Giraph release
Sergey Edunov [Tue, 11 Oct 2016 23:49:16 +0000 (16:49 -0700)] 
Fixing RAT checks for Apache Giraph release

Test Plan:
mvn apache-rat:check -Phadoop_2
mvn apache-rat:check -Phadoop_1
mvn clean verify -Phadoop_facebook

Reviewers: maja.kabiljo, majakabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64917

5 years ago GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2
Sergey Edunov [Thu, 6 Oct 2016 18:13:49 +0000 (11:13 -0700)] 
GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2

Test Plan:
mvn clean verify -Phadoop_facebook
rm -rf ~/.m2/repository/org/apache/giraph
mvn clean install -Phadoop_1
rm -rf ~/.m2/repository/org/apache/giraph
mvn clean install -Phadoop_2

Reviewers: maja.kabiljo, majakabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64719

5 years ago GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2
Sergey Edunov [Wed, 5 Oct 2016 22:05:58 +0000 (15:05 -0700)] 
GIRAPH-1118 - Giraph-gora and Giraph-rexster test cases fail in release-1.2

Test Plan:
mvn clean verify -Phadoop_facebook
mvn clean install -Phadoop_1
mvn clean install -Phadoop_2

Reviewers: majakabiljo, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D64683

6 years ago [GIRAPH-1117] Provide a flexible way to decide whether to create vertex when it is...
Sergey Edunov [Thu, 29 Sep 2016 23:54:18 +0000 (16:54 -0700)] 
[GIRAPH-1117] Provide a flexible way to decide whether to create vertex when it is not present in the input

Test Plan: run hello pagerank with this feature on and off

Reviewers: majakabiljo, maja.kabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64485

6 years ago GIRAPH-1094 remove hbase1 from distribution for hadoop_1
Sergey Edunov [Wed, 21 Sep 2016 21:47:10 +0000 (14:47 -0700)] 
GIRAPH-1094 remove hbase1 from distribution for hadoop_1

Summary: Missed that part in the last diff.

Test Plan:
mvn clean package -Phadoop_2 -fae
then checked that giraph-hbase.jar is in the distribution

mvn clean package -Phadoop_1 -fae
then checked that giraph-hbase.jar is not in the distribution

Reviewers: maja.kabiljo, majakabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64203

6 years ago GIRAPH-1094 Remove hbase from hadoop_1
Sergey Edunov [Wed, 21 Sep 2016 18:04:34 +0000 (11:04 -0700)] 
GIRAPH-1094 Remove hbase from hadoop_1

Summary: Hadoop_1 and current versions of hbase are incompatible. Removing support for HBASE from Hadoop_1 profile

Test Plan: mvn clean package -Phadoop_1 -fae

Reviewers: majakabiljo, maja.kabiljo, dionysis.logothetis

Reviewed By: maja.kabiljo, dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D64197

6 years ago GIRAPH-1114: Expose StatusReporter from workers in blocks framework
Maja Kabiljo [Wed, 14 Sep 2016 23:35:22 +0000 (16:35 -0700)] 
GIRAPH-1114: Expose StatusReporter from workers in blocks framework

Summary: Sometimes we need to call progress or update status from workers, expose this functionality

Test Plan: verify

Differential Revision: https://reviews.facebook.net/D63999

6 years ago GIRAPH-1115: Move UncaughtExceptionHandler setup to GraphTaskManager
Maja Kabiljo [Mon, 19 Sep 2016 19:26:49 +0000 (12:26 -0700)] 
GIRAPH-1115: Move UncaughtExceptionHandler setup to GraphTaskManager

Test Plan: Ran a job which isn't using GraphMapper and verified exception handler was set properly

Differential Revision: https://reviews.facebook.net/D64113

6 years ago GIRAPH-1111 - FileOutputFormat#setOutputPath is not always available
Sergey Edunov [Wed, 14 Sep 2016 17:20:25 +0000 (10:20 -0700)] 
GIRAPH-1111 - FileOutputFormat#setOutputPath is not always available

Test Plan:
mvn clean install
+ run a few jobs

Reviewers: majakabiljo, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D63837

6 years ago faster maps
spupyrev [Wed, 31 Aug 2016 00:32:13 +0000 (17:32 -0700)] 
faster maps

Summary:
The idea is to replace HashMap<LongWritable, V> with Long2ObjectOpenHashMap<V> (and Map<Int...> with Int2Object...)
This will save space and speed up some applications.

I changed the type of such a map in TestGraph.java, which gives up to 2x speed up on an
example of page rank computation (see comment below)

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1049

Test Plan: TestBasicCollections.java contains some tests

Reviewers: sergey.edunov, maja.kabiljo, dionysis.logothetis, heslami, ikabiljo

Reviewed By: heslami

Differential Revision: https://reviews.facebook.net/D55587

6 years ago GIRAPH-1108: Allow measuring time spent doing GC in some interval
Maja Kabiljo [Fri, 26 Aug 2016 20:51:37 +0000 (13:51 -0700)] 
GIRAPH-1108: Allow measuring time spent doing GC in some interval

Summary: Sometimes when things are slow, we want to know whether it's because of GC or not. Keep track of the last k GC pauses and add a way to check how much time since some timestamp was spent doing GC.

Test Plan: Ran a job which periodically prints stats from this and manually verified based on GC logs that it's measuring it correctly

Differential Revision: https://reviews.facebook.net/D62727
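
Note: the bookkeeping described in GIRAPH-1108 above (remember the last k GC pauses, then ask how much GC time fell after a given timestamp) can be sketched as a small ring buffer. This is an illustrative sketch only, not the code from the commit: the class name RecentGcPauses and its methods are hypothetical, and it makes the simplifying assumption that a pause counts in full if it ended at or after the timestamp.

```java
// Hypothetical sketch: keeps the last k GC pauses in a ring buffer.
final class RecentGcPauses {
  private final long[] endMillis;
  private final long[] durationMillis;
  private int next;   // slot the next pause will overwrite
  private int count;  // number of valid slots, at most k

  RecentGcPauses(int k) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    endMillis = new long[k];
    durationMillis = new long[k];
  }

  // Record one pause; once k pauses are stored, the oldest is overwritten.
  synchronized void record(long pauseEndMillis, long pauseDurationMillis) {
    endMillis[next] = pauseEndMillis;
    durationMillis[next] = pauseDurationMillis;
    next = (next + 1) % endMillis.length;
    if (count < endMillis.length) {
      count++;
    }
  }

  // Sum of remembered pauses that ended at or after sinceMillis.
  synchronized long gcMillisSince(long sinceMillis) {
    long total = 0;
    for (int i = 0; i < count; i++) {
      if (endMillis[i] >= sinceMillis) {
        total += durationMillis[i];
      }
    }
    return total;
  }
}
```

Because only k pauses are kept, the answer is a lower bound when the queried interval reaches further back than the oldest remembered pause.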

6 years ago Out-of-core is logging too aggressively
Tyler Serdar Bulut [Tue, 30 Aug 2016 18:31:13 +0000 (13:31 -0500)] 
Out-of-core is logging too aggressively

Summary:
Example aggressive logging at INFO level:

INFO    <datestamp> [ooc-io-0] org.apache.giraph.ooc.policy.ThresholdBasedOracle  - getNextIOActions: usedMemoryFraction = 0.79
INFO    <datestamp> [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallable  - call: thread 0's next IO command is: LoadPartitionIOCommand: (partitionId = 4676, superstep = 0)
INFO    <datestamp> [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallable  - call: thread 0's command LoadPartitionIOCommand: (partitionId = 4676, superstep = 0) completed: bytes= 0, duration=0, bandwidth=NaN, bandwidth (excluding GC time)=NaN

Test Plan: mvn clean verify -P hadoop_facebook

Reviewers: majakabiljo, maja.kabiljo, sergey.edunov, heslami

Reviewed By: heslami

Subscribers: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D62853

6 years ago GIRAPH-1103: Another try to fix jobs getting stuck after channel failure
Maja Kabiljo [Mon, 8 Aug 2016 18:13:35 +0000 (11:13 -0700)] 
GIRAPH-1103: Another try to fix jobs getting stuck after channel failure

Summary:
With GIRAPH-1087 we see jobs stuck after channel failure less often, but it still happens. There are several additional issues I found: requests failing to send in the first place so they never get retried, and callbacks for channel failures not always being triggered.
Added a thread which will periodically check on open requests even when we are not waiting on all open requests (since in many places we don't), removed the check that the request was sent when retrying it, and added some thread utils while at it.

Test Plan: Before the change, failure rate of a particular job was about 1 in 50. Had over 200 successful runs with this change.

Differential Revision: https://reviews.facebook.net/D61719

6 years ago GIRAPH-1107: Allow observers to access job counters
Maja Kabiljo [Tue, 23 Aug 2016 18:52:26 +0000 (11:52 -0700)] 
GIRAPH-1107: Allow observers to access job counters

Summary: From mapper/master/worker observer we might want to update some job counters for stats. For that we should allow observers to access job context.

Test Plan: Ran a job which accesses counters from WorkerObserver

Reviewers: sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D62391

6 years ago GIRAPH-1105: Fix number of open requests in FacebookConfiguration
Maja Kabiljo [Fri, 12 Aug 2016 21:57:53 +0000 (14:57 -0700)] 
GIRAPH-1105: Fix number of open requests in FacebookConfiguration

Test Plan: This was significantly better in some experiments, but we can investigate more in the future

Differential Revision: https://reviews.facebook.net/D62019

6 years ago GIRAPH-1104: NegativeArraySize exception in BigDataOutput
Maja Kabiljo [Wed, 10 Aug 2016 19:56:19 +0000 (12:56 -0700)] 
GIRAPH-1104: NegativeArraySize exception in BigDataOutput

Summary:
BigDataIO is not properly handling large byte[] being written to it. Chunk them up when needed to respect the max single data output size.
With D61791 job was still failing with the same exception.

Test Plan: The job which was failing because of large edges now works, added a test

Differential Revision: https://reviews.facebook.net/D61839
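
Note: the "chunk them up" idea in GIRAPH-1104 above can be illustrated with a generic splitter that never produces a piece larger than a configured maximum. This is a sketch under assumptions, not the BigDataOutput code: ChunkSplitter and maxChunk are hypothetical names, and the real change writes into data outputs rather than returning copies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a large byte[] into pieces of at most maxChunk bytes.
final class ChunkSplitter {
  static List<byte[]> split(byte[] data, int maxChunk) {
    if (maxChunk <= 0) {
      throw new IllegalArgumentException("maxChunk must be positive");
    }
    List<byte[]> chunks = new ArrayList<>();
    int offset = 0;
    while (offset < data.length) {
      // The last chunk may be shorter than maxChunk.
      int length = Math.min(maxChunk, data.length - offset);
      byte[] chunk = new byte[length];
      System.arraycopy(data, offset, chunk, 0, length);
      chunks.add(chunk);
      // Advancing by the copied length keeps offset <= data.length, so it cannot overflow.
      offset += length;
    }
    return chunks;
  }
}
```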

6 years ago Fixing Giraph pom.xml to reflect new project committers
Hassan Eslami [Tue, 26 Jul 2016 18:27:24 +0000 (11:27 -0700)] 
Fixing Giraph pom.xml to reflect new project committers

Summary:
Fixed the list of project committers. Please review your information and let me know if I should change anything.

This will be the first diff that I'll be committing all by myself, more like a test to see my username is gone through Apache's internal :-)

Test Plan: N/A

Reviewers: ikabiljo, pavanka, avery.ching, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D61197

6 years ago GIRAPH-1098 Job may get stuck if zookeeper port fixed and is in use
Sergey Edunov [Wed, 20 Jul 2016 17:20:36 +0000 (10:20 -0700)] 
GIRAPH-1098 Job may get stuck if zookeeper port fixed and is in use

Test Plan: mvn clean verify -Phadoop_facebook

Reviewers: majakabiljo, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60945

6 years ago GIRAPH-1087: Retry requests after channel failure
Maja Kabiljo [Tue, 12 Jul 2016 17:27:47 +0000 (10:27 -0700)] 
GIRAPH-1087: Retry requests after channel failure

Summary: We currently don't have a callback to retry requests after channel failure, so we either wait for the request timeout or, in places where we don't wait for open requests, never retry the request at all.

Test Plan: Hard to reproduce the issue (ran many jobs but was unable to), we'll see if the problem happens again in prod with this change.

Differential Revision: https://reviews.facebook.net/D60675

6 years ago GIRAPH-1097 Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor
Sergey Edunov [Tue, 19 Jul 2016 00:30:04 +0000 (17:30 -0700)] 
GIRAPH-1097 Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor

Summary:
On my laptop it failed because of an NPE in WorkerSuperstepMetrics.
I tracked it down and found that it is triggered from the branch of code that prints out metrics. We don't normally print out metrics in unit tests, so I'd expect this feature doesn't exist or isn't functional in hadoop_1. I'll try to disable it, to see how Jenkins reacts.

Test Plan:  mvn test -pl giraph-examples -am -Dtest=TestOutOfCore -DfailIfNoTests=false -Phadoop_1

Reviewers: maja.kabiljo, dionysis.logothetis, heslami

Reviewed By: heslami

Differential Revision: https://reviews.facebook.net/D60873

6 years ago [GIRAPH-1095] Performance regression after GIRAPH-1068
Sergey Edunov [Fri, 15 Jul 2016 21:22:59 +0000 (14:22 -0700)] 
[GIRAPH-1095] Performance regression after GIRAPH-1068

Summary: Need to pass some missing parameters to zookeeper

Test Plan: run a few jobs

Reviewers: dionysis.logothetis, heslami, majakabiljo, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60831

6 years ago GIRAPH-1092 TestCollections.testLargeBasicList fails with OOM
Sergey Edunov [Wed, 13 Jul 2016 21:38:02 +0000 (14:38 -0700)] 
GIRAPH-1092 TestCollections.testLargeBasicList fails with OOM

Summary: This test case requires too much memory to run in Jenkins. Talked to Sergey Pupyrev and we decided to disable it.

Test Plan: none

Reviewers: majakabiljo, maja.kabiljo, spupyrev

Reviewed By: spupyrev

Differential Revision: https://reviews.facebook.net/D60753

6 years ago [GIRAPH-1091] Fix SimpleRangePartitionFactoryTest
Maja Kabiljo [Wed, 13 Jul 2016 18:05:48 +0000 (11:05 -0700)] 
[GIRAPH-1091] Fix SimpleRangePartitionFactoryTest

Summary: SimpleRangePartitionFactoryTest relied on old logic for calculating number of partitions and got broken with GIRAPH-1082.

Test Plan: Ran the test

Differential Revision: https://reviews.facebook.net/D60747

6 years ago GIRAPH-1086: Use pool of byte arrays with InMemoryDataAccessor
Maja Kabiljo [Mon, 11 Jul 2016 18:07:18 +0000 (11:07 -0700)] 
GIRAPH-1086: Use pool of byte arrays with InMemoryDataAccessor

Summary: Have a pool of byte arrays with InMemoryDataAccessor, to save on byte array creation and initialization.

Test Plan: Improved performance of a job using InMemoryDataAccessor

Differential Revision: https://reviews.facebook.net/D60621
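
Note: a pool of byte arrays, as GIRAPH-1086 above describes, trades a little bookkeeping for fewer allocations and less zero-initialization. The following is a minimal sketch of the general technique, assuming fixed-size arrays; ByteArrayPool is a hypothetical name and this is not the InMemoryDataAccessor implementation.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: reuse fixed-size byte arrays instead of allocating new ones.
final class ByteArrayPool {
  private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
  private final int arraySize;

  ByteArrayPool(int arraySize) {
    if (arraySize <= 0) {
      throw new IllegalArgumentException("arraySize must be positive");
    }
    this.arraySize = arraySize;
  }

  // Returns a pooled array if one is available, otherwise allocates a new one.
  byte[] acquire() {
    byte[] pooled = free.poll();
    return pooled != null ? pooled : new byte[arraySize];
  }

  // Hands an array back for reuse; arrays of a different size are simply dropped.
  void release(byte[] array) {
    if (array.length == arraySize) {
      free.offer(array);
    }
  }
}
```

A recycled array still holds whatever was last written to it, so callers need to track how many bytes are valid rather than rely on zeroed contents.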

6 years ago [GIRAPH-1089] Fix a bug in out-of-core infrastructure
Hassan Eslami [Tue, 12 Jul 2016 18:33:38 +0000 (11:33 -0700)] 
[GIRAPH-1089] Fix a bug in out-of-core infrastructure

Summary: This diff fixes a bug in the out-of-core infrastructure that caused the user requirement (max number of partitions in memory) for the fixed out-of-core strategy to be violated. The cause of the problem was the unclear definition of in-memory partitions. In this diff, we distinguish the partitions that are entirely in memory from those that are partially in memory.

Test Plan:
mvn clean verify

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60573

6 years ago GIRAPH-1085: Add InMemoryDataAccessor
Maja Kabiljo [Wed, 6 Jul 2016 21:57:33 +0000 (14:57 -0700)] 
GIRAPH-1085: Add InMemoryDataAccessor

Summary: When we deal with graphs which have a lot of vertices with very little total data associated with them (values + edges) we start experiencing memory problems because of too many objects created, since every vertex has multiple objects associated with it. To solve this problem, we should have a serialized partition representation (current ByteArrayPartition just keeps byte[] per vertex, not per partition). We can leverage the out-of-core infrastructure and just add data accessor which won't be backed by disk but in memory buffers.

Test Plan: Successfully ran a job which was failing without this.

Differential Revision: https://reviews.facebook.net/D60435

6 years ago GIRAPH-1082: Remove limit on the number of partitions
Maja Kabiljo [Fri, 1 Jul 2016 14:39:25 +0000 (07:39 -0700)] 
GIRAPH-1082: Remove limit on the number of partitions

Summary: Currently we have a limit on how many partitions we can have because we write all partition information to Zookeeper. We can instead send this information in requests and remove the hard limit.

Test Plan: Ran pagerank for 100 iterations with 500k partitions.

Differential Revision: https://reviews.facebook.net/D60267

6 years ago GIRAPH-1083: Make sure we fail after exception in ooc-io thread happens
Maja Kabiljo [Fri, 1 Jul 2016 20:26:50 +0000 (13:26 -0700)] 
GIRAPH-1083: Make sure we fail after exception in ooc-io thread happens

Summary: Currently if some exception happens in the ooc-io thread, the job is left running for a long time after the exception. We should make sure we fail early.

Test Plan: Ran a job with ooc on where I simulated the failure. Without the change the job hangs for a long time; with the change it fails right after the exception happens and logs it to the command line.

Differential Revision: https://reviews.facebook.net/D60291

6 years ago GIRAPH-1080: Add FacebookConfiguration
Maja Kabiljo [Tue, 28 Jun 2016 20:14:32 +0000 (13:14 -0700)] 
GIRAPH-1080: Add FacebookConfiguration

Summary: Just copied from internal

Test Plan: verify

Differential Revision: https://reviews.facebook.net/D60135

6 years ago GIRAPH-1081: Fix a bug in internal out-of-core infra: multithreaded accesses to buffers
Hassan Eslami [Wed, 29 Jun 2016 01:43:18 +0000 (18:43 -0700)] 
GIRAPH-1081: Fix a bug in internal out-of-core infra: multithreaded accesses to buffers

Summary: Multi-threaded access to raw data buffers in `DiskBackedDataStore` was overlooked, violating the assumption that data is properly partitioned across IO threads.

Test Plan: mvn clean verify

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D60147

6 years ago GIRAPH-1079: Add triangle counting example
Maja Kabiljo [Mon, 27 Jun 2016 17:56:02 +0000 (10:56 -0700)] 
GIRAPH-1079: Add triangle counting example

Summary: Just moved from internal

Test Plan: mvn verify

Differential Revision: https://reviews.facebook.net/D60057

6 years ago Decouple out-of-core persistence infrastructure from out-of-core computation
Hassan Eslami [Mon, 27 Jun 2016 21:13:29 +0000 (14:13 -0700)] 
Decouple out-of-core persistence infrastructure from out-of-core computation

Summary:
This diff proposes the following:
  - The persistence layer is decoupled from out-of-core infrastructure. This way one can simply implement different data accessors for various persistence resources. The persistence layer for reading/writing from/to local file system is implemented in this diff.
  - Previously, out-of-core data were indexed by string literals. This has changed for more flexibility. Now, data are accessible by a more flexible data indexing mechanism, in which a chain of indices is used to address a particular piece of data.
  - With different implementations of data accessor, now there may be more emphasis on having more IO threads. It is important that these IO threads are load-balanced. In this diff, the mechanism to assign partitions to IO threads has changed.
  - All the coolness of Kryo's (de)serialization and RandomAccessFile (in D59277) is included in this diff, all in one place.

Test Plan:
mvn clean verify
out-of-core snapshot test passes

Reviewers: dionysis.logothetis, maja.kabiljo, sergey.edunov

Differential Revision: https://reviews.facebook.net/D59691

6 years ago GIRAPH-1078 createZooKeeperServerList should use task instead of port number
Sergey Edunov [Fri, 24 Jun 2016 17:15:30 +0000 (10:15 -0700)] 
GIRAPH-1078 createZooKeeperServerList should use task instead of port number

Summary: createZooKeeperServerList doesn't have a port yet, as we haven't started zookeeper. What we actually have is the task number. Port will be later set by the master.

Test Plan: run a few jobs.

Reviewers: maja.kabiljo, majakabiljo, heslami, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D59961

6 years ago GIRAPH-1062: Page rank in Blocks&Pieces
Maja Kabiljo [Wed, 11 May 2016 23:09:47 +0000 (16:09 -0700)] 
GIRAPH-1062: Page rank in Blocks&Pieces

Summary: We have some examples of pagerank, but they all have some things missing. Make one which will take sinks into account, have convergence checks, support both weighted and unweighted graphs.

Test Plan: mvn clean verify -P hadoop_facebook. We use this app internally

Differential Revision: https://reviews.facebook.net/D58059

6 years ago GIRAPH-1077: Jobs getting stuck after channel failure
Maja Kabiljo [Tue, 21 Jun 2016 18:54:40 +0000 (11:54 -0700)] 
GIRAPH-1077: Jobs getting stuck after channel failure

Summary: When a channel fails currently we just log the failure. Since we don't wait on open requests from every place, checking requests doesn't get called always, and we've seen issues with jobs staying stuck, for example during the input stage when request for split to read from worker to master fails. When we know that channel failed, we should try to resend the requests from that channel.

Test Plan: Ran a job multiple times until I got a failure of the channel between master and worker. Without this change the job would get stuck, but with it the job ran successfully.

Differential Revision: https://reviews.facebook.net/D59895

6 years ago GIRAPH-1076 Race condition in FileTxnSnapLog
Sergey Edunov [Tue, 21 Jun 2016 17:14:34 +0000 (10:14 -0700)] 
GIRAPH-1076 Race condition in FileTxnSnapLog

Summary:
org.apache.zookeeper.server.persistence.FileTxnSnapLog has a potential for race condition:

    if (!this.dataDir.exists()) {
        if (!this.dataDir.mkdirs()) {
               throw new IOException("Unable to create data directory " + this.dataDir);
        }
    }

If two threads try to create FileTxnSnapLog simultaneously it can trigger IOException.
We saw this happening in Giraph where FileTxnSnapLog is being created by PurgeTask created by DatadirCleanupManager and by InProcessZooKeeperRunner#runFromConfig.
Until the zookeeper code is fixed, if it ever is, we need to make sure zookeeper starts first and only then start PurgeTask.

Test Plan: run a few jobs and mvn clean verify

Reviewers: majakabiljo, dionysis.logothetis, heslami, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59883
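
Note: the check-then-act race quoted in GIRAPH-1076 above comes from File.mkdirs() returning false when another thread has created the directory between exists() and mkdirs(). The commit works around it by ordering startup; as a separate illustration (not what the commit or ZooKeeper does), a creation helper can tolerate the race by re-checking the directory after a failed mkdirs(). DataDirs and ensureDir are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: directory creation that tolerates a concurrent creator.
final class DataDirs {
  static void ensureDir(File dir) throws IOException {
    // mkdirs() returns false both on real failure and when the directory
    // already exists, for example because another thread created it first.
    // Only treat it as an error if the directory is still not there.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Unable to create data directory " + dir);
    }
  }
}
```

Calling ensureDir twice on the same path, or from two threads at once, succeeds as long as the directory ends up existing.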

6 years ago Improve out-of-core metrics
Hassan Eslami [Mon, 20 Jun 2016 19:23:42 +0000 (12:23 -0700)] 
Improve out-of-core metrics

Summary: For the metric showing the percentage of the graph in memory, it makes more sense to show the lowest fraction of the graph that was in memory during a superstep. Basically, a user is more interested in seeing how bad the out-of-core execution was, and how many more machines they need to run the job entirely in memory.

Test Plan:
mvn clean verify
visual, looking at Hadoop metric and per-worker metric

Reviewers: sergey.edunov, dionysis.logothetis, maja.kabiljo

Reviewed By: dionysis.logothetis, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59451

6 years ago GIRAPH-1075 checkstyle
Maja Kabiljo [Mon, 20 Jun 2016 17:24:26 +0000 (10:24 -0700)] 
GIRAPH-1075 checkstyle

6 years ago GIRAPH-1075: UnsafeByteArrayOutputStream silently writes long UTFs incorrectly
Maja Kabiljo [Fri, 17 Jun 2016 19:23:09 +0000 (12:23 -0700)] 
GIRAPH-1075: UnsafeByteArrayOutputStream silently writes long UTFs incorrectly

Summary: UnsafeByteArrayOutputStream.writeUTF was copied from DataOutputStream, but the part which checks the length was left out. When we try to write long strings they serialize without an issue, but when we try to deserialize them we get a wrong value back and don't read the same number of bytes. Make it fail like DataOutputStream instead.

Test Plan: Added a test

Differential Revision: https://reviews.facebook.net/D59817
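
Note: the length check mentioned in GIRAPH-1075 above exists because writeUTF stores the encoded length in two bytes, so anything over 65535 encoded bytes cannot be represented. DataOutputStream.writeUTF throws UTFDataFormatException in that case. A sketch of such a check follows; UtfLengthCheck is a hypothetical helper, not the code added to UnsafeByteArrayOutputStream.

```java
import java.io.UTFDataFormatException;

// Hypothetical sketch of the length check used by modified UTF-8 writers.
final class UtfLengthCheck {
  // Number of bytes the string occupies in Java's modified UTF-8 encoding.
  static int encodedLength(String s) {
    int length = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= 0x0001 && c <= 0x007F) {
        length += 1;
      } else if (c > 0x07FF) {
        length += 3;
      } else {
        length += 2; // includes '\u0000', which is written as two bytes
      }
    }
    return length;
  }

  // Fails early instead of silently writing a truncated length prefix.
  static void checkWritable(String s) throws UTFDataFormatException {
    int length = encodedLength(s);
    if (length > 65535) {
      throw new UTFDataFormatException("encoded string too long: " + length + " bytes");
    }
  }
}
```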

6 years ago GIRAPH-1068 Make Zookeeper accept 0 as a port number and let it choose any available...
Sergey Edunov [Wed, 15 Jun 2016 21:50:44 +0000 (14:50 -0700)] 
GIRAPH-1068 Make Zookeeper accept 0 as a port number and let it choose any available free port

Summary:
We have a few use cases where having zookeeper bound to specific port is very inconvenient.
1) Unit tests that run in parallel.
2) Shared clusters where multiple giraph instances can run on the same machines.

In theory we don't need to know what port zookeeper will run on. In most cases we're fine with any port available.
Picking any available port is currently supported by the server socket, but is not supported in the code that parses zookeeper configs (this code lives in zookeeper).
We don't have to parse configs though, as we have a way to run zookeeper in process. And in that case we can have full control over how zookeeper is initialized.

For this task I want to allow 0 as a port number for zookeeper, which will allow us to run zookeeper on any available port. I will also remove "out of process" zookeeper, as it clearly provides no benefits to us.

Note: it will still be possible to run external zookeeper, if you have it running somewhere as a service.

Note2: this change is intended to remove the functionality to support multiple zk servers running as a part of the Giraph job and only support a single zk server. If you want to run multiple zookeepers, you need to configure them separately and let Giraph use the existing zookeeper quorum.

Test Plan:
tested a few things:
picking any available port
-Dgiraph.zkServerPort=0
using external zookeeper:
-Dgiraph.zkList="hadoopXXX.YYY.facebook.com:22181"
using specified port:
-Dgiraph.zkServerPort=22128

Reviewers: majakabiljo, maja.kabiljo, dionysis.logothetis, avery.ching, heslami

Reviewed By: avery.ching, heslami

Differential Revision: https://reviews.facebook.net/D59109
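
Note: the statement in GIRAPH-1068 above that the server socket already supports "any available port" refers to standard java.net behavior: binding to port 0 lets the operating system pick a free port, which can then be read back. A minimal sketch of just that mechanism, with a hypothetical AnyFreePort helper; how Giraph wires this into its in-process zookeeper is not shown here.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch: let the OS choose a free port by binding to port 0.
final class AnyFreePort {
  static int bindAndReport() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      // getLocalPort() returns the port the OS actually assigned.
      return socket.getLocalPort();
    }
  }
}
```

A real server would keep the socket open and publish the assigned port, since a port that is released again can be taken by another process.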

6 years ago GIRAPH-1070 Comparators in PartitionUtils can overflow
Sergey Edunov [Sat, 11 Jun 2016 00:41:02 +0000 (17:41 -0700)] 
GIRAPH-1070 Comparators in PartitionUtils can overflow

Test Plan: mvn clean verify

Reviewers: majakabiljo, maja.kabiljo, dionysis.logothetis, heslami

Reviewed By: heslami

Differential Revision: https://reviews.facebook.net/D59547
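
Note: GIRAPH-1070 above does not show the comparators in question. A common way for a comparator to overflow is to return the difference of two numbers, so the sketch below assumes that pattern purely for illustration; ComparatorOverflowDemo and its fields are hypothetical and are not taken from PartitionUtils.

```java
import java.util.Comparator;

// Hypothetical sketch: subtraction-based comparison versus Integer.compare.
final class ComparatorOverflowDemo {
  // Overflows when the operands are far apart, which flips the sign of the result.
  static final Comparator<Integer> BY_SUBTRACTION = (a, b) -> a - b;

  // Compares without arithmetic on the operands, so it cannot overflow.
  static final Comparator<Integer> SAFE = Integer::compare;
}
```

With Integer.MIN_VALUE and 1 as operands, the subtraction wraps around to a positive number and reports the smaller value as larger, while Integer.compare returns a negative result.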

6 years ago GIRAPH-1069 Race condition in all *ConfOption classes
Sergey Edunov [Wed, 8 Jun 2016 22:01:40 +0000 (15:01 -0700)] 
GIRAPH-1069 Race condition in all *ConfOption classes

Summary:
*ConfOption classes, such as ClassConfOption, IntConfOption, FloatConfOption etc, call AllOptions.add(this) from their constructor. This call updates a static list without any synchronization. Hence, if you create conf option classes in different threads you run into a race condition.
The only reason we have AllOptions is to create documentation. It can be done with reflection instead. So, let's remove AllOptions.add(this) from all conf classes and implement a reflection-based approach in AllOptions

Test Plan:
mvn clean verify -Phadoop_facebook

checked that generated options.xml is the same as before

Reviewers: majakabiljo, dionysis.logothetis, heslami, maja.kabiljo

Reviewed By: heslami, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D59331
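
Note: the reflection-based approach proposed in GIRAPH-1069 above replaces self-registration in constructors with a scan over static fields, so no shared list is mutated while options are being created. The sketch below shows the general technique with stand-in types; SketchOption, SketchConstants and SketchOptionScanner are hypothetical and do not mirror the real AllOptions code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a *ConfOption class; note the constructor registers nothing.
final class SketchOption {
  final String key;

  SketchOption(String key) {
    this.key = key;
  }
}

// Hypothetical holder of option constants.
final class SketchConstants {
  static final SketchOption MAX_WORKERS = new SketchOption("sketch.maxWorkers");
  static final SketchOption USE_CACHE = new SketchOption("sketch.useCache");
  static final String NOT_AN_OPTION = "ignored";
}

final class SketchOptionScanner {
  // Collects every static field of type SketchOption declared on the holder class.
  static List<SketchOption> scan(Class<?> holder) {
    List<SketchOption> options = new ArrayList<>();
    for (Field field : holder.getDeclaredFields()) {
      boolean isStatic = Modifier.isStatic(field.getModifiers());
      if (isStatic && SketchOption.class.isAssignableFrom(field.getType())) {
        try {
          field.setAccessible(true);
          options.add((SketchOption) field.get(null));
        } catch (IllegalAccessException e) {
          throw new IllegalStateException("Cannot read " + field.getName(), e);
        }
      }
    }
    return options;
  }
}
```

The order returned by getDeclaredFields() is not specified, so a documentation generator built this way would want to sort the result by key.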

6 years ago Cleanup the old out-of-core message mechanism
Hassan Eslami [Tue, 31 May 2016 17:37:00 +0000 (10:37 -0700)] 
Cleanup the old out-of-core message mechanism

Summary: With the new out-of-core infrastructure, there is no need for the old version of message out-of-core. The old version of message out-of-core also interferes with the new mechanism. It seems that the old out-of-core message mechanism is not necessary anymore. This diff removes the old out-of-core messages and cleans up its implications on the rest of the code base.

Test Plan:
mvn clean verify
snapshot tests pass

Reviewers: maja.kabiljo, dionysis.logothetis, sergey.edunov

Differential Revision: https://reviews.facebook.net/D58701

6 years ago Integrating out-of-core mechanism with credit-based flow-control and data generation...
Sergey Edunov [Fri, 20 May 2016 22:14:08 +0000 (15:14 -0700)] 
Integrating out-of-core mechanism with credit-based flow-control and data generation tethering

Summary: This diff integrates out-of-core infrastructure with credit-based flow control and adds the ability to tether the rate of data generation/processing. Data generation/processing rate is controlled by changing the number of active processing (input/compute) threads. This diff also implements a new (and more performant) adaptive out-of-core policy.

Test Plan:
mvn clean verify
all snapshot tests including ones with large data pass
Running adaptive out-of-core on large graph with very limited memory does not fail.
This diff should keep *any* reasonable job from failing!

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Subscribers: ramesh-muthusamy

Differential Revision: https://reviews.facebook.net/D55479

6 years ago GIRAPH-1063: Make primitive type generated fixed capacity min heaps
Maja Kabiljo [Tue, 17 May 2016 13:55:58 +0000 (06:55 -0700)] 
GIRAPH-1063: Make primitive type generated fixed capacity min heaps

Summary: It's often needed to get top k (key, value) pairs, but existing implementations deal with objects making them inefficient. Make one with primitive types. Most of the added code is generated.

Test Plan: Added tests, mvn verify passed

Differential Revision: https://reviews.facebook.net/D58299
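
Note: a fixed-capacity min heap over primitives, as in GIRAPH-1063 above, keeps the k largest values seen so far without boxing: the root is the smallest of the kept values, so a new value only enters if it beats the root. The sketch below keeps plain double values for brevity, whereas the commit describes (key, value) pairs and generated classes; TopKDoubles is a hypothetical name.

```java
// Hypothetical sketch: keep the k largest doubles in a fixed-size binary min heap.
final class TopKDoubles {
  private final double[] heap;
  private int size;

  TopKDoubles(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    heap = new double[capacity];
  }

  void add(double value) {
    if (size < heap.length) {
      heap[size] = value;
      siftUp(size);
      size++;
    } else if (value > heap[0]) {
      // Replace the smallest kept value and restore heap order.
      heap[0] = value;
      siftDown(0);
    }
  }

  // Smallest of the kept values, i.e. the current k-th largest.
  double min() {
    if (size == 0) {
      throw new IllegalStateException("heap is empty");
    }
    return heap[0];
  }

  int size() {
    return size;
  }

  private void siftUp(int i) {
    while (i > 0) {
      int parent = (i - 1) / 2;
      if (heap[parent] <= heap[i]) {
        return;
      }
      swap(parent, i);
      i = parent;
    }
  }

  private void siftDown(int i) {
    while (true) {
      int left = 2 * i + 1;
      int right = left + 1;
      int smallest = i;
      if (left < size && heap[left] < heap[smallest]) {
        smallest = left;
      }
      if (right < size && heap[right] < heap[smallest]) {
        smallest = right;
      }
      if (smallest == i) {
        return;
      }
      swap(i, smallest);
      i = smallest;
    }
  }

  private void swap(int a, int b) {
    double tmp = heap[a];
    heap[a] = heap[b];
    heap[b] = tmp;
  }
}
```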

6 years ago GIRAPH-1065: Allow extending JobProgressTrackerService
Maja Kabiljo [Wed, 18 May 2016 16:27:06 +0000 (09:27 -0700)] 
GIRAPH-1065: Allow extending JobProgressTrackerService

Summary: We might want to perform additional actions on events from JobProgressTrackerService. Allow overriding it and specifying another class to use.

Test Plan: Ran a job with a custom JobProgressTrackerService and verified that actions on it are called

Differential Revision: https://reviews.facebook.net/D58383

6 years agoGIRAPH-1064: Reconnect JobProgressTracker
Maja Kabiljo [Tue, 17 May 2016 19:22:19 +0000 (12:22 -0700)]
GIRAPH-1064: Reconnect JobProgressTracker

Summary: When workers/master don't talk to JobProgressTracker it can disconnect and throw RejectedExecutionException - we should catch and retry on that exception too.
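
The catch-and-retry pattern described above can be sketched roughly as follows. The helper `callWithRetry`, its `reconnect` callback, and the `maxAttempts` parameter are hypothetical names chosen for illustration; the actual Giraph change may be structured differently.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Supplier;

/** Hypothetical sketch of retrying a call that may hit a disconnected client. */
final class RetryOnRejection {
  /**
   * Runs the call; on RejectedExecutionException it reconnects and tries again,
   * up to maxAttempts times in total.
   */
  static <T> T callWithRetry(Supplier<T> call, Runnable reconnect, int maxAttempts) {
    RejectedExecutionException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.get();
      } catch (RejectedExecutionException e) {
        // The underlying executor refused the work, e.g. after a disconnect.
        last = e;
        reconnect.run();
      }
    }
    throw new IllegalStateException("Gave up after " + maxAttempts + " attempts", last);
  }
}
```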

Test Plan: Ran a job where master would fail to talk to JobProgressTracker after a while without this change, with the change it worked

Differential Revision: https://reviews.facebook.net/D58323

6 years agoGIRAPH-1061: Add Connected Components block factory
Maja Kabiljo [Mon, 9 May 2016 23:47:13 +0000 (16:47 -0700)] 
GIRAPH-1061: Add Connected Components block factory

Summary: Add block factory for Connected Components to make it easy to run it.

Test Plan: Added a test, mvn clean verify

Differential Revision: https://reviews.facebook.net/D57951

6 years agoBlock API handle
Dionysios Logothetis [Tue, 10 May 2016 17:41:40 +0000 (10:41 -0700)] 
Block API handle

Summary:
- Some apps need a reference to the Block API objects (e.g. BlockOutputApi) before they are actually executed. See documentation of `BlockApiHandle` for more details.
- Also made `BlockWorkerApi` implement `BlockOutputApi`, rather than only `BlockWorkerReceiveApi`, so that output is possible inside the sender too.

Test Plan:
- `mvn install`
- internal app that uses the api handle from the master and from the workers
- internal snapshot tests

Reviewers: maja.kabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D57939

6 years agoGIRAPH-1060: Add combiner to connected components
Maja Kabiljo [Mon, 9 May 2016 18:07:48 +0000 (11:07 -0700)] 
GIRAPH-1060: Add combiner to connected components

Summary: Connected components should use combiner to make it more efficient and require less memory. A few additional cleanups while at it.

Test Plan: mvn clean verify

Differential Revision: https://reviews.facebook.net/D57879

6 years agoGIRAPH-1058: Fix connection retry logic
Maja Kabiljo [Fri, 29 Apr 2016 20:23:29 +0000 (13:23 -0700)] 
GIRAPH-1058: Fix connection retry logic

Summary: Currently when we fail to connect to a channel we retry immediately and that retry most often fails. Add a short wait between retries, and improve the check for whether the channel connected successfully.
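
A minimal sketch of retrying with a short wait between attempts, assuming the connection attempt is exposed as a `BooleanSupplier` that returns true only when the channel is really connected. The names `ConnectWithWait`, `tryConnect`, and `waitMillis` are hypothetical and not taken from the Giraph code.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: retry a connection attempt, pausing between failures. */
final class ConnectWithWait {
  /** Returns true as soon as one attempt succeeds, false if all attempts fail. */
  static boolean connect(BooleanSupplier tryConnect, int maxAttempts, long waitMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (tryConnect.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          // Give the remote side a moment instead of retrying immediately.
          Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
          // Preserve the interrupt status and stop retrying.
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }
}
```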

Test Plan: Ran multiple jobs which were often failing before the fix, with fix they worked

Differential Revision: https://reviews.facebook.net/D57447

6 years agofixing cases when there is no conf
spupyrev [Fri, 29 Apr 2016 16:47:29 +0000 (09:47 -0700)] 
fixing cases when there is no conf

Summary:
conf is not needed anymore -- the question is why LongDiffNullArrayEdges extends ConfigurableOutEdges:)
I'd prefer having LongDiffNullArray + CompressedOutEdges instead

Test Plan: test

Reviewers: ikabiljo, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D56505

6 years agoDecoupling NettyClient from control flow policy
Sergey Edunov [Tue, 26 Apr 2016 22:34:42 +0000 (15:34 -0700)] 
Decoupling NettyClient from control flow policy

Summary: This diff refactors NettyClient by decoupling flow control mechanism from NettyClient. Through the refactoring process, some performance and correctness bugs have been found due to the better readability of the refactored code.

Test Plan:
mvn clean verify
Tested large jobs and the output was correct
Tested large jobs and saw no performance degradation for code using the old mechanism

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D56367

6 years agoSetting auto-read in Netty to false
Hassan Eslami [Tue, 26 Apr 2016 17:51:13 +0000 (10:51 -0700)] 
Setting auto-read in Netty to false

Summary: By default, the auto-read flag is set to true in Netty. This means Netty proactively reads requests as they become available to a worker. However, this behavior sometimes causes off-heap memory to grow continuously, specifically when there is a spike in the number of received requests. In that situation, the rate at which incoming requests are processed may be lower than the rate at which they are received, leading to a high-memory kill (CGroup kill or OOM). With the auto-read flag set to false, we read and process requests one by one and (hopefully/presumably) let the transport layer do the flow control (i.e. dropping packets or reducing the TCP congestion window).

Test Plan:
mvn clean verify
PageRank-like application at large scale fails with auto-read set to true, and successfully runs with auto-read set to false.
**DO NOT ACCEPT THIS DIFF.** We should do more testing and prove it is reliable.

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D57213

6 years agoGIRAPH-1055: Javadoc fails build with Java 8
Avery Ching [Fri, 22 Apr 2016 23:07:22 +0000 (16:07 -0700)] 
GIRAPH-1055: Javadoc fails build with Java 8

Summary:
Java 8 javadoc has stricter checking, which results in mvn javadoc:javadoc failing:
Example:
100 errors
200 warnings
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent ............................... SUCCESS [ 1.196 s]
[INFO] Apache Giraph Core ................................. FAILURE [ 9.583 s]

Test Plan:
[INFO] --- maven-javadoc-plugin:2.9:javadoc (default-cli) @ giraph-dist ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent ............................... SUCCESS [  1.093 s]
[INFO] Apache Giraph Core ................................. SUCCESS [ 10.934 s]
[INFO] Apache Giraph Blocks Framework ..................... SUCCESS [  3.245 s]
[INFO] Apache Giraph Examples ............................. SUCCESS [  3.841 s]
[INFO] Apache Giraph Accumulo I/O ......................... SUCCESS [  2.048 s]
[INFO] Apache Giraph HBase I/O ............................ SUCCESS [  1.132 s]
[INFO] Apache Giraph HCatalog I/O ......................... SUCCESS [  3.053 s]
[INFO] Apache Giraph Gora I/O ............................. SUCCESS [  3.500 s]
[INFO] Apache Giraph Rexster I/O .......................... SUCCESS [  0.091 s]
[INFO] Apache Giraph Rexster Kibble ....................... SUCCESS [  1.276 s]
[INFO] Apache Giraph Rexster I/O Formats .................. SUCCESS [  3.193 s]
[INFO] Apache Giraph Distribution ......................... SUCCESS [  2.074 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 35.880 s
[INFO] Finished at: 2016-04-22T16:09:04-07:00
[INFO] Final Memory: 62M/1131M
[INFO] ------------------------------------------------------------------------

Reviewers: maja.kabiljo, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D57105

6 years agoGIRAPH-1054: Separate ThriftService from JobProgressTrackerService on the client
Avery Ching [Wed, 13 Apr 2016 23:03:40 +0000 (16:03 -0700)] 
GIRAPH-1054: Separate ThriftService from JobProgressTrackerService on the client

Summary:
* Moves the job tracker conf options into the GiraphConstants
* Factors out the static GiraphJob#startThriftServer and GiraphJob#stopThriftServer methods from createJobProgressServer
* Allows adding other Thrift services to the ThriftServer

Test Plan: Tried on a cluster

Reviewers: maja.kabiljo, sergey.edunov

Reviewed By: sergey.edunov

Subscribers: sergey.edunov

Differential Revision: https://reviews.facebook.net/D57087

6 years ago[GIRAPH-1053] Log exceptions to command line
Maja Kabiljo [Tue, 19 Apr 2016 00:34:36 +0000 (17:34 -0700)] 
[GIRAPH-1053] Log exceptions to command line

Summary: When we know an exception occurred, log it to command line to make it easier for people running jobs to see what the issue was.

Test Plan: Ran two jobs, one with error in input one with error in compute, verified exception is printed to command line. Also ran a normal job and verified it didn't print anything new to command line

Differential Revision: https://reviews.facebook.net/D56931

6 years ago[GIRAPH-1041] Generate primitive type specific code
Igor Kabiljo [Tue, 15 Mar 2016 21:41:35 +0000 (14:41 -0700)] 
[GIRAPH-1041] Generate primitive type specific code

Summary:
- Use FreeMarker library to generate primitive type specific code.
Initially generating three sets of files:
{TYPE}Consumer, {TYPE}TypeOps and W{TYPE}ArrayList

Right now generation happens manually, and generated files are being committed.
In the future we can move those to a separate project, and have them generated
when maven is compiling and deploying.

In addition to the generation change, BasicArrayList is renamed to WArrayList and
directly extends the fastutil implementation, so it now serves two purposes:
- generic handling of efficient arrays through TypeOps
- extended fastutil class - to make it writable, to add useful Java8 methods,
  or anything else we can think of. Since we are just extending it, and there is
  no efficiency penalty, we can always use WLongArrayList instead of LongArrayList.

There is an additional WReusableLongArrayList which, when readFields is called,
doesn't size itself to the exact size, but reuses the old length.

Test Plan:
mvn clean install

There are no changes in logic in this diff. Will send a small separate diff
with some examples of what is now simpler.

Reviewers: sergey.edunov, dionysis.logothetis, spupyrev, maja.kabiljo

Differential Revision: https://reviews.facebook.net/D52515

6 years agoGIRAPH-1052: Fix makeSymmetricUnweighted
Maja Kabiljo [Fri, 8 Apr 2016 21:33:50 +0000 (14:33 -0700)] 
GIRAPH-1052: Fix makeSymmetricUnweighted

Summary: PrepareGraphPieces.makeSymmetricUnweighted is currently very inefficient for skewed degree graphs, because it reuses set objects based on the number of in edges, but also adds all out edges to the set, so sets which should be small can become huge. Since incoming ids are unique anyways, we don't need to add them to the set.

Test Plan: Ran a job without and with the change, verified result is the same but it's much faster now

Reviewers: ikabiljo

Differential Revision: https://reviews.facebook.net/D56481

6 years agoGIRAPH-1050: Add MapperObserver
Maja Kabiljo [Thu, 7 Apr 2016 16:40:27 +0000 (09:40 -0700)] 
GIRAPH-1050: Add MapperObserver

Summary: Add MapperObserver which will be called once per mapper before anything else happens.

Test Plan: Ran a job with MapperObserver set, verified it's called at the right time

Differential Revision: https://reviews.facebook.net/D56373

6 years agoGIRAPH-1046: Add a way to synchronize full GC calls across workers
Maja Kabiljo [Thu, 10 Mar 2016 22:30:11 +0000 (14:30 -0800)] 
GIRAPH-1046: Add a way to synchronize full GC calls across workers

Summary: In applications which use memory more heavily, we can see full GC pauses happening on different workers at different times, and each of these is causing some delay because other workers are often waiting on something from the worker in GC (closing open requests, finishing superstep, etc). Having a way to coordinate when full GCs are called could help them have less effect on job performance.

Test Plan: Ran some memory heavy jobs where I observed overall better performance from using this feature.

Differential Revision: https://reviews.facebook.net/D55347

6 years agoImprove flow control on sender side (pre-requisite for credit-based flow control)
Hassan Eslami [Mon, 4 Apr 2016 23:29:36 +0000 (16:29 -0700)] 
Improve flow control on sender side (pre-requisite for credit-based flow control)

Summary: Currently, a sender worker keeps all open requests (optionally up to a certain total number of open requests) in its own memory. This behavior may cause high memory usage on the sender side. Also, since messages can arrive at a worker at an arbitrary rate, the receiver may not be able to handle all incoming messages, so we may see a large memory footprint on the receiver as well. This diff addresses the problem by limiting the number of open requests per worker on the sender side. It also provides a cache of unsent requests on the sender for the case where the sender has already sent enough messages to another worker but has not yet received a response.
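
The per-worker limit and the unsent-request cache can be sketched as below. This is only an illustration under simplifying assumptions: requests are plain strings, workers are identified by an `int`, and the class `PerWorkerRequestLimiter` with its methods is hypothetical rather than Giraph's actual flow-control code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch: cap open requests per destination worker, cache the rest. */
final class PerWorkerRequestLimiter {
  private final int maxOpenPerWorker;
  private final Map<Integer, Integer> openRequests = new HashMap<>();
  private final Map<Integer, Queue<String>> unsentCache = new HashMap<>();

  PerWorkerRequestLimiter(int maxOpenPerWorker) {
    this.maxOpenPerWorker = maxOpenPerWorker;
  }

  /** Returns true if the request may be sent now, false if it was cached instead. */
  synchronized boolean trySend(int workerId, String request) {
    int inFlight = openRequests.getOrDefault(workerId, 0);
    if (inFlight < maxOpenPerWorker) {
      openRequests.put(workerId, inFlight + 1);
      return true;
    }
    unsentCache.computeIfAbsent(workerId, id -> new ArrayDeque<>()).add(request);
    return false;
  }

  /**
   * Called when a response arrives from the worker. Returns the next cached
   * request to send (it takes over the freed slot), or null if none is waiting.
   */
  synchronized String onResponse(int workerId) {
    Queue<String> cached = unsentCache.get(workerId);
    if (cached != null && !cached.isEmpty()) {
      return cached.poll();
    }
    int inFlight = openRequests.getOrDefault(workerId, 0);
    if (inFlight > 0) {
      openRequests.put(workerId, inFlight - 1);
    }
    return null;
  }

  synchronized int openCount(int workerId) {
    return openRequests.getOrDefault(workerId, 0);
  }
}
```

Bounding the in-flight count per destination keeps sender memory proportional to the limit, while the cache preserves the order of requests that had to wait.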

Test Plan: mvn clean verify

Reviewers: avery.ching, sergey.edunov, maja.kabiljo, dionysis.logothetis

Reviewed By: dionysis.logothetis

Subscribers: Alessio

Differential Revision: https://reviews.facebook.net/D43797

6 years agounsafe readers for varints
spupyrev [Fri, 1 Apr 2016 18:06:38 +0000 (11:06 -0700)] 
unsafe readers for varints

Summary:
Varint encoding (and hence LongDiffNullArrayEdges) can be much faster when using UnsafeByteInput/Output. In fact, after the change, iterating over LongDiffNullArrayEdges is almost as fast as iterating over LongNullArrayEdges. The difference is less than a few percent for jobs that require a lot of edge iterators, while it is significant (over 20%) without the change.

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1049

Test Plan: mvn clean install

Reviewers: sergey.edunov, maja.kabiljo, dionysis.logothetis, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D56169

6 years agofaster varint
spupyrev [Wed, 23 Mar 2016 17:34:05 +0000 (10:34 -0700)] 
faster varint

Summary:
Varint is improved in two ways:
- faster readLong and readInt
- making sure that negative numbers can be encoded
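
For readers unfamiliar with varints, the following sketch shows a base-128 variable-length encoding of a `long`, plus one common way to make negative numbers encodable (a zigzag mapping). The zigzag step is an assumption used for illustration; the commit does not say how Giraph's Varint class handles negative values, and the class `VarintSketch` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of varint encoding; not Giraph's Varint implementation. */
final class VarintSketch {
  /** Encodes the value 7 bits at a time, low bits first; the high bit marks "more". */
  static byte[] encodeUnsigned(long value) {
    byte[] buffer = new byte[10]; // a 64-bit value needs at most 10 groups of 7 bits
    int length = 0;
    while ((value & ~0x7FL) != 0) {
      buffer[length++] = (byte) ((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    buffer[length++] = (byte) value;
    return Arrays.copyOf(buffer, length);
  }

  static long decodeUnsigned(byte[] bytes) {
    long result = 0;
    int shift = 0;
    for (byte b : bytes) {
      if (shift > 63) {
        throw new IllegalArgumentException("Varint too long");
      }
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
    throw new IllegalArgumentException("Truncated varint");
  }

  /** Zigzag maps small negative numbers to small unsigned ones: -1 -> 1, 1 -> 2. */
  static byte[] encodeSigned(long value) {
    return encodeUnsigned((value << 1) ^ (value >> 63));
  }

  static long decodeSigned(byte[] bytes) {
    long raw = decodeUnsigned(bytes);
    return (raw >>> 1) ^ -(raw & 1);
  }
}
```

Without such a mapping, a negative `long` has its top bit set and always takes the maximum of 10 bytes, which defeats the purpose of a variable-length encoding for small magnitudes.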

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1049

Test Plan: TestVarint.java

Reviewers: dionysis.logothetis, maja.kabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D55755

6 years ago[GIRAPH-1041] Generate primitive type specific code for functions
Igor Kabiljo [Tue, 15 Mar 2016 21:40:52 +0000 (14:40 -0700)] 
[GIRAPH-1041] Generate primitive type specific code for functions

Summary:
- Use FreeMarker library to generate primitive type specific code.
Initially generating two sets of files:
{TYPE}Consumer, Obj2{TYPE}Function

Right now generation happens manually, and generated files are being committed.
In the future we can move those to a separate project, and have them generated
when maven is compiling and deploying.

Splitting of D52515 into reviewable pieces

Test Plan:
mvn clean install

There are no changes in logic in this diff.

Reviewers: spupyrev, sergey.edunov, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D55527

6 years agoIncrease info-logging while waiting for straggler workers
Tyler Serdar Bulut [Tue, 15 Mar 2016 19:10:36 +0000 (12:10 -0700)] 
Increase info-logging while waiting for straggler workers

Summary:
Keep logging info messages while waiting for task-time-out

Test Plan:
All unit tests are passing.
Manual tests to ensure desired functionality is observed.

Reviewers: maja.kabiljo

Subscribers: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D55467

6 years agoGIRAPH-1039: Fix stopping jmap histo thread
Maja Kabiljo [Tue, 3 Nov 2015 19:01:22 +0000 (11:01 -0800)] 
GIRAPH-1039: Fix stopping jmap histo thread

Summary: Currently, if the jmap histo frequency is set to a long period, we end up stuck at the end of the job for a long time waiting on the jmap histo thread.

Test Plan: Ran a job with long jmap frequency - verified it gets stuck without this change and finishes fine with it

Differential Revision: https://reviews.facebook.net/D50085

6 years agounsafe byte readers/writers
spupyrev [Tue, 15 Mar 2016 23:47:15 +0000 (16:47 -0700)] 
unsafe byte readers/writers

Summary: using unsafe readers/writers

Test Plan:
tested on PageRank app, and Fanout computation. In both cases, there is a ~20% speedup

JIRA: https://issues.apache.org/jira/browse/GIRAPH-1049

Reviewers: sergey.edunov, maja.kabiljo, dionysis.logothetis, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D55509

6 years agoNew out-of-core infrastructure (first patch including fixed out-of-core mechanism)
Sergey Edunov [Tue, 15 Mar 2016 17:40:20 +0000 (10:40 -0700)] 
New out-of-core infrastructure (first patch including fixed out-of-core mechanism)

Summary: This is a re-design of out-of-core mechanism. The new implementation allows for much more intelligent partition scheduling and IO.

Test Plan:
mvn clean verify

Reviewers: maja.kabiljo, sergey.edunov, avery.ching, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D54549

6 years ago[easy] Log aggregate times per piece at the end
Igor Kabiljo [Fri, 4 Mar 2016 20:33:35 +0000 (12:33 -0800)] 
[easy] Log aggregate times per piece at the end

Test Plan:
It prints:

  16/03/04 12:46:38 INFO internal.BlockMasterLogic: Time sums master:
     count    time %       time name
        34    37.50%   00:00:00 PageRankCheckConvergence
        34    62.50%   00:00:00 PageRankUpdate
        68             00:00:00 total

  16/03/04 12:46:38 INFO internal.BlockMasterLogic: Time sums worker:
     count    time %       time name
        33    50.00%   00:00:00 [receiver=PageRankCheckConvergence,sender=PageRankUpdate]
         1     0.82%   00:00:00 [receiver=PageRankCheckConvergence,sender=null]
        34    47.54%   00:00:00 [receiver=PageRankUpdate,sender=PageRankCheckConvergence]
         1     1.64%   00:00:00 [receiver=null,sender=PageRankUpdate]
        69             00:00:00 total

Reviewers: dionysis.logothetis, sergey.edunov, maja.kabiljo, spupyrev

Reviewed By: spupyrev

Differential Revision: https://reviews.facebook.net/D55113

6 years agoGIRAPH-1044. Update book info in the User Docs / Related Literature page of the site
Roman Shaposhnik [Mon, 7 Mar 2016 01:41:28 +0000 (17:41 -0800)] 
GIRAPH-1044. Update book info in the User Docs / Related Literature page of the site

6 years agoMaking threads in JobProgressService daemons
Sergey Edunov [Fri, 26 Feb 2016 19:48:08 +0000 (11:48 -0800)] 
Making threads in JobProgressService daemons

Summary: We noticed that sometimes job client doesn't finish because threads in JobProgressService are still running. Here we're making them daemons, so that if everything else is done, we will be able to finish application.
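
A minimal sketch of the daemon-thread idea, using only the standard `ThreadFactory` interface. The class `DaemonThreads` and the name prefix are hypothetical; how JobProgressService actually creates its threads is not shown in this commit message.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a factory whose threads never keep the JVM alive. */
final class DaemonThreads {
  static ThreadFactory daemonFactory(String namePrefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread thread = new Thread(runnable, namePrefix + "-" + counter.getAndIncrement());
      // Daemon threads do not prevent the JVM from exiting once all other threads finish.
      thread.setDaemon(true);
      return thread;
    };
  }
}
```

Such a factory can be passed to the standard executors, for example `Executors.newFixedThreadPool(2, DaemonThreads.daemonFactory("progress"))`.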

Test Plan: run test job

Reviewers: majakabiljo, dionysis.logothetis, maja.kabiljo

Reviewed By: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D53829

6 years agoAdd PrepareGraphPieces.isSymmetricBlock to check for a symmetric graph
Yuri Schimke [Thu, 25 Feb 2016 18:42:21 +0000 (10:42 -0800)] 
Add PrepareGraphPieces.isSymmetricBlock to check for a symmetric graph

Summary:
PrepareGraphPieces.isSymmetricBlock is a reusable factory function
for creating blocks that check if a graph is symmetric by
XOR reducing a predictable hash of the edge pairs (V1, V2)
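
The XOR-reduction idea can be sketched outside the Block API as follows. The assumptions here are mine, not the commit's: edges are given as an array of (source, target) pairs, the hash is order-independent so that an edge and its reverse cancel, self-loops are skipped, and there are no duplicate edges. The class `SymmetryCheckSketch` and its hash function are hypothetical.

```java
/** Hypothetical sketch of a probabilistic symmetry check by XOR-reducing edge hashes. */
final class SymmetryCheckSketch {
  /** Deterministic hash that is the same for (a, b) and (b, a). */
  static long pairHash(long a, long b) {
    long low = Math.min(a, b);
    long high = Math.max(a, b);
    long h = low * 0x9E3779B97F4A7C15L + high;
    h ^= h >>> 32;
    h *= 0xBF58476D1CE4E5B9L;
    h ^= h >>> 29;
    return h;
  }

  /**
   * XORs the hash of every directed edge. If each edge has its reverse, every
   * hash appears twice and the result is 0. A non-zero result proves asymmetry
   * (given no duplicate edges); a zero result only makes symmetry very likely.
   */
  static boolean looksSymmetric(long[][] edges) {
    long accumulator = 0;
    for (long[] edge : edges) {
      if (edge[0] != edge[1]) { // a self-loop is its own reverse, so skip it
        accumulator ^= pairHash(edge[0], edge[1]);
      }
    }
    return accumulator == 0;
  }
}
```

XOR is associative and commutative, which is what makes it usable as a reducer: each worker can fold its own edges in any order and the partial results can be combined afterwards.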

Test Plan:
Unit Tests for all changed files, testing on demo graphs.
Will run a full test job.

Reviewers: spupyrev, dionysis.logothetis, ikabiljo, maja.kabiljo

Reviewed By: maja.kabiljo

Subscribers: maja.kabiljo

Differential Revision: https://reviews.facebook.net/D54411

6 years agoUse Partitions in LocalBlockRunner
Igor Kabiljo [Tue, 29 Dec 2015 23:14:32 +0000 (15:14 -0800)] 
Use Partitions in LocalBlockRunner

Summary:
Speed up LocalBlockRunner, by not operating on a TestGraph, but on vertices stored in partitions.
With it, deprecate the old non-SimplePartitionerFactory way of specifying partitioning
(and, along with that, rename SimplePartitionerFactory to the old name GraphPartitionerFactory, changing it to
 GraphPartitionerFactoryInterface)

Test Plan:
Run unit-test for speed:

  testEmptyIterationsSmallGraph
    6.5 -> 6.3
  testEmptyIterationsSyntheticGraphLowDegree()
    42.0 -> 13.8
  testEmptyIterationsSyntheticGraphHighDegree()
    3.6 -> 2.0
  testPageRankSyntheticGraphLowDegree()
    51.0 -> 47.2
  testPageRankSyntheticGraphHighDegree()
    20.3 -> 17.4

Reviewers: maja.kabiljo, sergey.edunov, dionysis.logothetis

Reviewed By: dionysis.logothetis

Differential Revision: https://reviews.facebook.net/D52425

6 years agoImplementation of DirectWritableSerializerCopyTest.copy()
Sergey Edunov [Fri, 15 Jan 2016 17:42:50 +0000 (09:42 -0800)] 
Implementation of DirectWritableSerializerCopyTest.copy()

Summary: Needed in certain applications.

Test Plan:
  mvn clean install
  Also run an actual application.

Reviewers: maja.kabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D52149

6 years agoCorrection in interface documentation
Sergey Edunov [Tue, 1 Dec 2015 19:52:29 +0000 (11:52 -0800)] 
Correction in interface documentation

Summary: The description of the arguments in TypeOps.set() was reversed.

Test Plan: n/a

Reviewers: majakabiljo, sergey.edunov, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D43425

6 years agoAdded vldb publication
Sergey Edunov [Tue, 1 Dec 2015 19:50:19 +0000 (11:50 -0800)] 
Added vldb publication

Summary: Added new vldb paper in literature.xml.

Test Plan: n/a

Reviewers: avery.ching, sergey.edunov, maja.kabiljo, ikabiljo

Reviewed By: ikabiljo

Differential Revision: https://reviews.facebook.net/D45399

6 years agoMake IntSupplier extend Serializable
Sergey Edunov [Wed, 18 Nov 2015 19:58:11 +0000 (11:58 -0800)] 
Make IntSupplier extend Serializable

Summary: Lambdas with IntSuppliers don't get serialized.
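
The reason this matters: in Java, a lambda is serializable only if its target functional interface extends `Serializable`. The sketch below illustrates that with a stand-in interface; `SerializableIntSupplier` and `LambdaSerializationSketch` are hypothetical names, not Giraph's own `IntSupplier`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a supplier interface that extends Serializable. */
interface SerializableIntSupplier extends Serializable {
  int get();
}

final class LambdaSerializationSketch {
  /** The lambda is serializable because its target type extends Serializable. */
  static SerializableIntSupplier constant(int value) {
    return () -> value;
  }

  /** Serializes the object to bytes and reads it back, as a job framework might. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T object) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(object);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException e) {
      // Includes NotSerializableException for lambdas with a non-serializable target type.
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With a plain `java.util.function.IntSupplier` as the target type, the same `writeObject` call would fail with `NotSerializableException`.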

Test Plan: n/a

Reviewers: ikabiljo, sergey.edunov

Reviewed By: sergey.edunov

Differential Revision: https://reviews.facebook.net/D50973