crunch.git

Josh Wills [Tue, 2 Feb 2021 17:20:15 +0000 (09:20 -0800)]
Merge pull request #34 from noslowerdna/CRUNCH-698

CRUNCH-698: Inclusion of local patch for AVRO-2944

Andrew Olson [Tue, 2 Feb 2021 16:38:53 +0000 (10:38 -0600)] 
CRUNCH-698: Inclusion of local patch for AVRO-2944

Josh Wills [Tue, 12 May 2020 18:41:24 +0000 (11:41 -0700)] 
Merge pull request #33 from ben-roling/CRUNCH-696

CRUNCH-696 update FormatBundle.readFields() compatibility

Ben Roling [Tue, 12 May 2020 17:24:21 +0000 (12:24 -0500)] 
Update crunch-core/src/main/java/org/apache/crunch/io/FormatBundle.java

Co-authored-by: Andrew Olson <930946+noslowerdna@users.noreply.github.com>

Ben Roling [Mon, 11 May 2020 17:02:19 +0000 (12:02 -0500)]
CRUNCH-696 update FormatBundle.readFields() compatibility

Make FormatBundle.readFields() compatible with FormatBundles serialized
with an older version of Crunch.  This ensures jobs don't fail during an
upgrade to a cluster-provided Crunch dependency.  Without this some jobs
get submitted without the filesystem field in the serialized
FormatBundle and then encounter EOFException when the job gets
scheduled to run and uses the newer Crunch to deserialize the FormatBundle.
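The backward-compatibility problem described above can be illustrated with a minimal Java sketch. It assumes a hypothetical Writable-style class, `VersionedBundle`, with one original field and one field added in a later release; it is not Crunch's `FormatBundle`, and it does not claim to reproduce how CRUNCH-696 was actually implemented. Here `readFields()` treats an `EOFException` while reading the newer field as "field absent", so bytes written by the older version still deserialize instead of failing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for a Writable-style bundle; NOT Crunch's FormatBundle.
class VersionedBundle {
  String formatClass;
  String fileSystem; // field added in a newer release; absent in older serialized data

  public void write(DataOutput out) throws IOException {
    out.writeUTF(formatClass);
    out.writeBoolean(fileSystem != null); // presence flag for the optional field
    if (fileSystem != null) {
      out.writeUTF(fileSystem);
    }
  }

  public void readFields(DataInput in) throws IOException {
    formatClass = in.readUTF();
    try {
      fileSystem = in.readBoolean() ? in.readUTF() : null;
    } catch (EOFException e) {
      // Bytes written by the older version end here: treat the new field as unset.
      fileSystem = null;
    }
  }

  // Simulates the bytes an older release would have written (no fileSystem field).
  static byte[] legacyBytes(String formatClass) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeUTF(formatClass);
    out.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    VersionedBundle b = new VersionedBundle();
    b.readFields(new DataInputStream(new ByteArrayInputStream(legacyBytes("TextInputFormat"))));
    System.out.println(b.formatClass + " fs=" + b.fileSystem); // TextInputFormat fs=null
  }
}
```

Catching `EOFException` only works when the record is the last thing in its stream; if other data can follow it, writing an explicit version marker before the fields is the safer design.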

Andrew Olson [Wed, 25 Mar 2020 16:14:37 +0000 (11:14 -0500)] 
CRUNCH-695: Fix NullPointerException in RegionLocationTable (#32)

Co-authored-by: Andrew Olson <aolson1@cerner.com>

Jan Van Besien [Fri, 21 Feb 2020 13:22:59 +0000 (14:22 +0100)]
Update to kafka 2.2.1

Remove duplication of KafkaData, KafkaSource, KafkaInputFormat in order to only retain
the variants from org.apache.crunch.kafka.record that were already mostly compatible
with kafka 2.2.1. Fix some remaining incompatibilities, in particular related to reading
offset information from the broker.

Signed-off-by: Josh Wills <jwills@apache.org>

Josh Wills [Thu, 16 Jan 2020 00:57:42 +0000 (16:57 -0800)]
Merge pull request #30 from apache/jwills_great_version_upgrade

The great version upgrade PR

Josh Wills [Tue, 14 Jan 2020 23:14:17 +0000 (15:14 -0800)] 
Fixup duplicate hadoop-hdfs dep

Josh Wills [Tue, 14 Jan 2020 23:08:40 +0000 (15:08 -0800)] 
Merge pull request #31 from apache/CRUNCH-693

CRUNCH-693: Make text parsing locale-independent

Gabriel Reid [Sat, 11 Jan 2020 15:35:20 +0000 (16:35 +0100)] 
CRUNCH-693: Make text parsing locale-independent

Standardize on US-based locale for number formatting (which is
backwards-compatible with historical behavior).
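As a generic illustration of the locale issue (not the specific Crunch classes that CRUNCH-693 changed), the Java sketch below pins number formatting and parsing to `Locale.US`. Under a locale such as `Locale.GERMANY`, `%.2f` uses a comma as the decimal separator, which is the kind of machine-dependent output the change avoids. The class name `LocaleIndependentNumbers` is hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper: always format/parse numbers with US conventions,
// regardless of the JVM's default locale.
class LocaleIndependentNumbers {
  static String format(double value) {
    return String.format(Locale.US, "%.2f", value);
  }

  static double parse(String text) throws ParseException {
    // US conventions: '.' is the decimal separator, ',' the grouping separator.
    return NumberFormat.getInstance(Locale.US).parse(text).doubleValue();
  }

  public static void main(String[] args) throws ParseException {
    System.out.println(String.format(Locale.GERMANY, "%.2f", 1234.5)); // 1234,50
    System.out.println(format(1234.5)); // 1234.50
    System.out.println(parse("1234.5")); // 1234.5
  }
}
```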

Josh Wills [Fri, 15 Nov 2019 00:48:33 +0000 (16:48 -0800)] 
Fix unnecessary stubbings in the kafka test suite

Josh Wills [Fri, 15 Nov 2019 00:34:52 +0000 (16:34 -0800)] 
oops should have fixed that one

Josh Wills [Fri, 15 Nov 2019 00:22:49 +0000 (16:22 -0800)] 
mostly kafka fixes; some jackson fixes

Josh Wills [Thu, 14 Nov 2019 21:50:59 +0000 (13:50 -0800)] 
and more more fixes

Josh Wills [Thu, 14 Nov 2019 21:50:15 +0000 (13:50 -0800)] 
Ever more fixes

Josh Wills [Wed, 13 Nov 2019 23:14:51 +0000 (15:14 -0800)] 
WIP for modernizing Crunch deps

Josh Wills [Tue, 8 Oct 2019 23:24:41 +0000 (16:24 -0700)] 
[maven-release-plugin] prepare for next development iteration

Josh Wills [Tue, 8 Oct 2019 23:24:23 +0000 (16:24 -0700)] 
[maven-release-plugin] prepare branch apache-crunch-1.0

Josh Wills [Tue, 8 Oct 2019 17:19:03 +0000 (10:19 -0700)] 
Wire up crunch-kafka to work as part of the distribution/release
process.

Josh Wills [Mon, 23 Sep 2019 17:57:52 +0000 (10:57 -0700)] 
CRUNCH-670: Make AvroPathPerKeyTarget work with the Spark Runtime.

Josh Wills [Fri, 2 Aug 2019 23:12:03 +0000 (16:12 -0700)] 
Merge pull request #27 from noslowerdna/CRUNCH-688

CRUNCH-688: Fix HFile node affinity for non-default namespace HBase t…

Andrew Olson [Fri, 2 Aug 2019 21:47:09 +0000 (16:47 -0500)] 
CRUNCH-688: Fix HFile node affinity for non-default namespace HBase tables

Andrew Olson [Mon, 15 Jul 2019 16:42:30 +0000 (11:42 -0500)] 
CRUNCH-679: Improvements for usage of DistCp (#20)

* CRUNCH-679: Improvements for usage of DistCp

* CRUNCH-679: Fix NPE bug by preserving IOUtils.cleanup logic

* CRUNCH-679: CrunchRenameCopyListing's constructor needs to be public

* CRUNCH-679: Unset rename configuration after loading into copy listing

* CRUNCH-679: Reduce default max distcp map tasks from 1000 to 100

* CRUNCH-679: Update log message formatting

Andrew Olson [Fri, 12 Jul 2019 21:43:19 +0000 (16:43 -0500)] 
CRUNCH-681: Updating HFileUtils to accept a filesystem parameter for … (#22)

* CRUNCH-681: Updating HFileUtils to accept a filesystem parameter for targets and sources

* CRUNCH-681: Add and update javadoc

Ben Roling [Fri, 12 Jul 2019 21:36:57 +0000 (16:36 -0500)] 
CRUNCH-685 Use whitelist and blacklist for .fileSystem() properties (#25)

* CRUNCH-685 Use whitelist and blacklist for .fileSystem() properties

* CRUNCH-685 fix noisy logging

* CRUNCH-686 Fix FormatBundle to hide redacted properties

Ben Roling [Fri, 12 Jul 2019 21:30:24 +0000 (16:30 -0500)] 
CRUNCH-683 avoid unnecessary listStatus() calls from getPathSize() (#26)

Suyash Agarwal [Wed, 3 Jul 2019 02:35:47 +0000 (08:05 +0530)] 
CRUNCH-635: Output path per key for Text target

Signed-off-by: Josh Wills <jwills@apache.org>

Andrew Olson [Thu, 2 May 2019 12:36:34 +0000 (07:36 -0500)]
CRUNCH-684: Fix NullPointerException

Signed-off-by: Josh Wills <jwills@apache.org>

Andrew Olson [Wed, 1 May 2019 21:20:17 +0000 (16:20 -0500)]
CRUNCH-684: Fix .equals and .hashCode for Targets

Signed-off-by: Josh Wills <jwills@apache.org>

Andrew Olson [Thu, 18 Apr 2019 20:54:47 +0000 (15:54 -0500)]
CRUNCH-681: Add and update javadoc

Signed-off-by: Josh Wills <jwills@apache.org>

Andrew Olson [Thu, 18 Apr 2019 15:26:48 +0000 (10:26 -0500)]
CRUNCH-681: Updating HFileUtils to accept a filesystem parameter for targets and sources

Signed-off-by: Josh Wills <jwills@apache.org>

Micah Whitacre [Fri, 1 Mar 2019 16:40:48 +0000 (10:40 -0600)]
Merge pull request #21 from noslowerdna/CRUNCH-680

CRUNCH-680: Kafka Source should split very large partitions

Andrew Olson [Fri, 22 Feb 2019 19:34:32 +0000 (13:34 -0600)] 
CRUNCH-680: Kafka Source should split very large partitions

Micah Whitacre [Tue, 26 Feb 2019 16:27:37 +0000 (10:27 -0600)] 
Merge pull request #19 from ben-roling/CRUNCH-677_master2

CRUNCH-677 Source and Target accept FileSystem

Ben Roling [Thu, 21 Feb 2019 17:17:25 +0000 (11:17 -0600)] 
CRUNCH-677 fix merge mistakes

Ben Roling [Wed, 20 Feb 2019 17:42:24 +0000 (11:42 -0600)] 
CRUNCH-677 Source and Target accept FileSystem

Andrew Olson [Tue, 19 Feb 2019 22:46:20 +0000 (16:46 -0600)] 
CRUNCH-678: Avoid unnecessary last modified time retrieval

Signed-off-by: Josh Wills <jwills@apache.org>

Andrew Olson [Wed, 23 Jan 2019 17:23:57 +0000 (11:23 -0600)]
CRUNCH-660, CRUNCH-675: Use DistCp instead of FileUtils.copy when source and destination paths are in different filesystems

Signed-off-by: Josh Wills <jwills@apache.org>

Jun He [Thu, 9 Aug 2018 05:49:09 +0000 (05:49 +0000)]
CRUNCH-671: Failed to generate reports using "mvn site"

Crunch build failed due to "ClassNotFound" in doxia.
This is caused by maven-project-info-reports-plugin being updated to 3.0.0, which depends on
doxia-site-renderer 1.8 (which has the class org.apache.maven.doxia.siterenderer.DocumentContent),
while maven-site-plugin:3.3 depends on doxia-site-renderer:1.4 (which
doesn't have org.apache.maven.doxia.siterenderer.DocumentContent).
Specifying maven-site-plugin version 3.7 resolves this.

Signed-off-by: Jun He <jun.he@linaro.org>
Signed-off-by: Josh Wills <jwills@apache.org>

Josh Wills [Mon, 23 Jul 2018 20:31:00 +0000 (13:31 -0700)]
CRUNCH-619: Update to HBase 2.0.1. Contributed by Attila Sasvari.

Josh Wills [Mon, 30 Apr 2018 18:47:15 +0000 (11:47 -0700)] 
CRUNCH-669: Add an option to disable temp dir deletion in the finalize() method of a DistributedPipeline

Clément MATHIEU [Tue, 27 Mar 2018 15:55:15 +0000 (17:55 +0200)] 
CRUNCH-668: Support globbing patterns in From#avroFile

Signed-off-by: Josh Wills <jwills@apache.org>

Clément MATHIEU [Tue, 6 Mar 2018 16:47:48 +0000 (17:47 +0100)]
Fix HCatSourceITSpec.testBasic

Signed-off-by: Josh Wills <jwills@apache.org>

Clément MATHIEU [Wed, 7 Mar 2018 09:13:51 +0000 (10:13 +0100)]
CRUNCH-665: Add crunch.max.poll.interval property

Signed-off-by: Josh Wills <jwills@apache.org>

Nathan Schile [Mon, 5 Feb 2018 15:08:46 +0000 (09:08 -0600)]
CRUNCH-664 Fixes HBase configuration properties being overwritten

Signed-off-by: Josh Wills <jwills@apache.org>

Ben Roling [Wed, 24 Jan 2018 16:40:18 +0000 (10:40 -0600)]
Expose combine file split file path via Hadoop config

Signed-off-by: Josh Wills <jwills@apache.org>

Bryan Baugher [Wed, 24 Jan 2018 20:14:31 +0000 (14:14 -0600)]
CRUNCH-662: Updated KafkaRecordReader to better handle errors, empty reads and appropriately retry

Signed-off-by: Josh Wills <jwills@apache.org>

Josh Wills [Thu, 18 Jan 2018 21:11:26 +0000 (13:11 -0800)]
CRUNCH-661: Make DataBaseSource.Builder methods public

Josh Wills [Mon, 11 Dec 2017 17:56:38 +0000 (09:56 -0800)] 
CRUNCH-654: KafkaSource should use the new Kafka Consumer API instead of the SimpleConsumer. Contributed by Bryan Baugher.

Stephen Durfey [Mon, 4 Dec 2017 16:49:59 +0000 (10:49 -0600)] 
CRUNCH-340: added HCatSource & HCatTarget

Signed-off-by: Josh Wills <jwills@apache.org>

Stephen Durfey [Thu, 7 Dec 2017 15:55:56 +0000 (09:55 -0600)]
CRUNCH-659: updated hive dependency to 2.1

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>

Josh Wills [Fri, 27 Oct 2017 04:09:27 +0000 (21:09 -0700)]
CRUNCH-652: Fix to make the SourceTargetHelperTest less flakey on hadoop 3.0.0. Contributed by Gergo Repas.

Bryan Baugher [Wed, 16 Aug 2017 21:19:42 +0000 (16:19 -0500)] 
CRUNCH-653: Created KafkaSource that provides ConsumerRecord messages

Signed-off-by: Josh Wills <jwills@apache.org>

Josh Wills [Fri, 12 May 2017 16:52:49 +0000 (09:52 -0700)]
CRUNCH-647: Remove obsolete jackson dependencies

Gabriel Reid [Thu, 27 Apr 2017 12:52:16 +0000 (14:52 +0200)] 
CRUNCH-644 Supply preferred node for HFile writes

Designate the preferred HDFS data node when creating HFiles for
bulk load to improve data locality of the created HFiles.

Tom White [Thu, 13 Apr 2017 15:10:23 +0000 (16:10 +0100)] 
CRUNCH-618: Run on Spark 2. Contributed by Gergő Pásztor.

Xavier Talpe [Thu, 13 Apr 2017 05:52:43 +0000 (07:52 +0200)] 
CRUNCH-642 Enable GroupingOptions for Distinct operations.

This fixes the existing call for numReducers, which was not working as
intended for non-memory PCollections because it used an invalid
number of reducers. To increase flexibility when using the API,
another call was added that allows passing GroupingOptions directly.

Signed-off-by: Josh Wills <jwills@apache.org>

Tom White [Wed, 12 Apr 2017 14:03:41 +0000 (15:03 +0100)]
CRUNCH-641: Wrong decimal format in dot files. Contributed by Gergő Pásztor.

Xavier Talpe [Mon, 10 Apr 2017 13:51:32 +0000 (15:51 +0200)] 
CRUNCH-642 Enable numReducers option for Distinct operations.

Signed-off-by: Josh Wills <jwills@apache.org>

Attila Sasvari [Thu, 23 Mar 2017 20:35:36 +0000 (21:35 +0100)]
CRUNCH-636: amend Make replication factor for temporary files configurable

Signed-off-by: Josh Wills <jwills@apache.org>

Attila Sasvari [Mon, 20 Mar 2017 10:17:55 +0000 (11:17 +0100)]
CRUNCH-636: Make replication factor for temporary files configurable

Signed-off-by: Josh Wills <jwills@apache.org>

Tom White [Tue, 7 Mar 2017 14:38:52 +0000 (14:38 +0000)]
CRUNCH-638: Improve dot file generation for better supportability. Contributed by Gergő Pásztor.

Tom White [Mon, 20 Feb 2017 10:28:05 +0000 (10:28 +0000)] 
CRUNCH-633: Remove the commons-httpclient:commons-httpclient dependency. Contributed by Gergő Pásztor.

Gabriel Reid [Mon, 13 Feb 2017 18:57:32 +0000 (19:57 +0100)] 
CRUNCH-634 Fix typo in log message

Contributed by Attila Sasvari

Micah Whitacre [Wed, 7 Dec 2016 02:50:02 +0000 (21:50 -0500)] 
CRUNCH-628: Upgraded to Kafka 0.10.0.x

Josh Wills [Sun, 5 Feb 2017 19:22:06 +0000 (11:22 -0800)] 
[maven-release-plugin] prepare for next development iteration

Josh Wills [Sun, 5 Feb 2017 19:22:05 +0000 (11:22 -0800)] 
[maven-release-plugin] prepare branch apache-crunch-0.15

Micah Whitacre [Thu, 12 Jan 2017 02:51:26 +0000 (20:51 -0600)] 
CRUNCH-632: Added support for compressed CSVSource files.

CRUNCH-632: Wrote simple test showing it now working on compressed CSV file.

Signed-off-by: Josh Wills <jwills@apache.org>

Micah Whitacre [Thu, 12 Jan 2017 01:53:05 +0000 (19:53 -0600)]
Merge branch 'CRUNCH-630'

Brian Tieman [Tue, 13 Dec 2016 15:01:08 +0000 (09:01 -0600)] 
CRUNCH-629: Kafka source pulling is aggressive

Added some parentheses to force the proper order of operations in KafkaRecordReader.

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>

Micah Whitacre [Tue, 3 Jan 2017 17:39:31 +0000 (11:39 -0600)]
CRUNCH-630: set a better default for the situation where offsets are out of range.

Dimitry Goldin [Fri, 14 Oct 2016 16:39:41 +0000 (18:39 +0200)] 
Quick and Dirty Workaround for Crunch DistCache

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
Signed-off-by: Josh Wills <jwills@apache.org>

Josh Wills [Sat, 3 Dec 2016 19:56:59 +0000 (11:56 -0800)]
CRUNCH-622: From.avroFile fails if path not on default filesystem. Contributed by Micah Whitacre.

Stefan Mendoza [Tue, 13 Sep 2016 03:38:41 +0000 (22:38 -0500)] 
CRUNCH-620: Reduce "isn't a known config" warnings by slimming down ConsumerConfig properties

Resolved by tagging the Kafka connection properties so that the Kafka Consumers can be built with slimmer ConsumerConfig properties.

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>

David Whiting [Thu, 20 Oct 2016 14:17:00 +0000 (16:17 +0200)]
CRUNCH-625: Add missing .union implementations for LTables with LTables and PTables

Micah Whitacre [Tue, 13 Sep 2016 15:35:35 +0000 (10:35 -0500)] 
CRUNCH-621: Added check into hasPendingData to check if there is a large number of requests with no data to make sure there is still data there.

Nathan Schile [Thu, 29 Sep 2016 21:24:14 +0000 (16:24 -0500)] 
CRUNCH-623: Improves Javadoc of PTable#cogroup

Signed-off-by: Josh Wills <jwills@apache.org>

Micah Whitacre [Tue, 6 Sep 2016 20:55:56 +0000 (15:55 -0500)]
CRUNCH-617: Support defensively handling null when partition leader cannot be found.

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>

Tom White [Thu, 8 Sep 2016 13:12:30 +0000 (14:12 +0100)]
CRUNCH-616: Replace (possibly copyrighted) Maugham text with Dickens. Contributed by Sean Owen.

Remove non-applicable Project Gutenberg license. Adjust lots of tests to match new text.

Josh Wills [Wed, 24 Aug 2016 17:59:14 +0000 (10:59 -0700)] 
CRUNCH-601: Handle empty PCollections correctly in Crunch-on-Spark. Created by Micah Whitacre,
Mikael Goldmann, and Josh Wills.

Josh Wills [Wed, 24 Aug 2016 03:07:23 +0000 (20:07 -0700)] 
CRUNCH-519: Add more detail to plan dot file. Contributed by Ron Hashimshony.

Micah Whitacre [Tue, 2 Aug 2016 21:29:55 +0000 (16:29 -0500)] 
CRUNCH-604: Avoid expensive Writables.reloadWritableComparableCodes

Micah Whitacre [Tue, 2 Aug 2016 20:58:29 +0000 (15:58 -0500)] 
CRUNCH-611: Corrected files that were missing the APL headers.

Micah Whitacre [Wed, 13 Jul 2016 15:18:17 +0000 (10:18 -0500)] 
CRUNCH-611: Added API for Offset reading/writing along with a simple implementation that supports doing it from hdfs.

Signed-off-by: Micah Whitacre <mkwhit@apache.org>

Josh Wills [Sat, 30 Jul 2016 00:47:09 +0000 (17:47 -0700)]
CRUNCH-614: Fix HFileUtils.writeToHFilesForIncrementalLoad slowed dramatically by copying KeyValue byte array. Contributed by Ben Roling.

Clément MATHIEU [Tue, 19 Jul 2016 19:30:35 +0000 (21:30 +0200)] 
CRUNCH-613: Fix FileTargetImplTest.testHandleOutputsMovesFilesToDestination instability

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>
CRUNCH-613: Fixed up the test to consolidate constants used.

Clément MATHIEU [Tue, 19 Jul 2016 19:01:20 +0000 (21:01 +0200)] 
CRUNCH-612: Add support of private ctors to AvroDeepCopier

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>

Micah Whitacre [Tue, 28 Jun 2016 20:44:15 +0000 (15:44 -0500)]
CRUNCH-609: Improved KafkaRecordReader to keep retrying when the range of offsets has not been fully consumed.

Micah Whitacre [Mon, 23 May 2016 20:13:02 +0000 (15:13 -0500)] 
CRUNCH-606: Handle setting version correctly and removed stray System.out in test.

Micah Whitacre [Mon, 11 Apr 2016 14:47:33 +0000 (09:47 -0500)] 
CRUNCH-606: Kafka Source for Crunch which supports reading data as BytesWritable

* Some of the code contributed by Bryan Baugher and Andrew Olson

Gabriel Reid [Tue, 10 May 2016 09:02:11 +0000 (11:02 +0200)] 
CRUNCH-608 Write Bloom filters in HFiles

Use a correctly-configured StoreFile.Writer (instead of HFile.Writer)
for writing HFiles so that Bloom filter data is also included in
the written HFiles.

Gabriel Reid [Mon, 2 May 2016 15:31:20 +0000 (17:31 +0200)] 
CRUNCH-607 Allow collection reuse in MemPipeline

Prevent SingleUseIterable from throwing an IllegalArgumentException
when PGroupedCollections are legitimately reused with the
MemPipeline.

The fix simply defers materializing the transformed contents of
a MemCollection until it is iterated over.

Josh Wills [Sun, 24 Apr 2016 02:23:02 +0000 (19:23 -0700)] 
[maven-release-plugin] prepare for next development iteration

Josh Wills [Sun, 24 Apr 2016 02:23:01 +0000 (19:23 -0700)] 
[maven-release-plugin] prepare branch apache-crunch-0.14

mkwhitacre [Mon, 23 Nov 2015 00:07:30 +0000 (18:07 -0600)] 
CRUNCH-579: Supported access to counters from original TaskContext

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>

Igor Bernstein [Sun, 10 Apr 2016 19:42:10 +0000 (15:42 -0400)]
CRUNCH-600: pass job credentials when building multiple outputs

Signed-off-by: Micah Whitacre <mkwhit@gmail.com>

David Whiting [Thu, 31 Mar 2016 10:06:45 +0000 (12:06 +0200)]
CRUNCH-599: Fix increment and incrementIf methods in crunch-lambda so they also emit the incoming element

Josh Wills [Thu, 24 Mar 2016 16:55:16 +0000 (09:55 -0700)] 
CRUNCH-597: Upgrade to Parquet 1.8.1

tworec [Fri, 4 Mar 2016 17:58:01 +0000 (18:58 +0100)] 
CRUNCH-596 Support right-outer bloom join

Signed-off-by: Gabriel Reid <greid@apache.org>