GIRAPH-1085: Add InMemoryDataAccessor
author: Maja Kabiljo <majakabiljo@fb.com>
Wed, 6 Jul 2016 21:57:33 +0000 (14:57 -0700)
committer: Maja Kabiljo <majakabiljo@fb.com>
Mon, 11 Jul 2016 17:28:14 +0000 (10:28 -0700)
commit: b51ecd27cccc520764c9ae53cabcb61d67d46d15
tree: b286697cc5d45c8d326762416c1b3aaeda19f5df
parent: 28cbe037cf9299ed6a089cc78039d0a16d0116ce

Summary: When a graph has many vertices and each vertex carries very little data (value plus edges), we run into memory problems because of the sheer number of objects created: every vertex has several objects associated with it. To solve this, we need a serialized representation at the partition level (the current ByteArrayPartition keeps one byte[] per vertex, not per partition). We can leverage the out-of-core infrastructure and simply add a data accessor that is backed by in-memory buffers instead of disk.
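The idea above, reusing the out-of-core write/read path but keeping each partition serialized in memory, can be sketched roughly as follows. This is a hypothetical, simplified illustration that uses only java.io and java.util.concurrent. The class name InMemoryPartitionStoreSketch and its methods prepareOutput, prepareInput and dataExist are assumptions made for illustration; they are not the actual Giraph InMemoryDataAccessor or OutOfCoreDataAccessor API. Each partition is stored as a single growable byte buffer, so a partition of many tiny vertices costs a handful of objects rather than several per vertex.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not Giraph's API): keeps one serialized buffer per
 * partition in memory instead of spilling it to disk.
 */
public class InMemoryPartitionStoreSketch {
  // One growable buffer per partition id.
  private final Map<Integer, ByteArrayOutputStream> buffers =
      new ConcurrentHashMap<>();

  /** Returns a stream writing into the partition's buffer; append=false starts fresh. */
  public DataOutputStream prepareOutput(int partitionId, boolean append) {
    if (!append) {
      buffers.remove(partitionId);
    }
    ByteArrayOutputStream buf =
        buffers.computeIfAbsent(partitionId, id -> new ByteArrayOutputStream());
    return new DataOutputStream(buf);
  }

  /** In this sketch, reading a partition back consumes (removes) its buffer. */
  public DataInputStream prepareInput(int partitionId) throws IOException {
    ByteArrayOutputStream buf = buffers.remove(partitionId);
    if (buf == null) {
      throw new IOException("No data for partition " + partitionId);
    }
    return new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
  }

  /** Whether any serialized data is currently held for the partition. */
  public boolean dataExist(int partitionId) {
    return buffers.containsKey(partitionId);
  }

  public static void main(String[] args) throws IOException {
    InMemoryPartitionStoreSketch store = new InMemoryPartitionStoreSketch();
    int numVertices = 1000;
    // Serialize many tiny vertices (id, value, edge count) into ONE buffer.
    try (DataOutputStream out = store.prepareOutput(0, false)) {
      for (long id = 0; id < numVertices; id++) {
        out.writeLong(id);          // vertex id
        out.writeDouble(id * 0.5);  // vertex value
        out.writeInt(0);            // number of edges (none here)
      }
    }
    System.out.println(store.dataExist(0));

    // Deserialize the partition and aggregate the vertex values.
    double sum = 0;
    int count = 0;
    try (DataInputStream in = store.prepareInput(0)) {
      for (int i = 0; i < numVertices; i++) {
        in.readLong();              // vertex id (unused here)
        sum += in.readDouble();     // vertex value
        in.readInt();               // number of edges (unused here)
        count++;
      }
    }
    System.out.println(count + " " + sum + " " + store.dataExist(0));
  }
}
```

Running main prints `true` and then `1000 249750.0 false`. The trade-off is that every access to a vertex now goes through deserialization, in exchange for far fewer live objects. Note also that a single ByteArrayOutputStream is backed by one byte[] and therefore cannot grow past roughly 2 GB; a production version would need a buffer that spans several arrays, such as Giraph's BigDataOutput.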

Test Plan: Successfully ran a job which was failing without this.

Differential Revision: https://reviews.facebook.net/D60435
giraph-core/src/main/java/org/apache/giraph/ooc/data/DiskBackedDataStore.java
giraph-core/src/main/java/org/apache/giraph/ooc/persistence/InMemoryDataAccessor.java [new file with mode: 0644]
giraph-core/src/main/java/org/apache/giraph/ooc/persistence/OutOfCoreDataAccessor.java
giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java