gobblin.git
43 min ago [GOBBLIN-1714] Use FileNotFoundException when determining files in source/target... master
Andy Jiang [Mon, 26 Sep 2022 20:48:29 +0000 (13:48 -0700)] 
[GOBBLIN-1714] Use FileNotFoundException when determining files in source/target instead of generic IOException (#3568)

* modify getFilesAtPath to throw FileNotFoundException if no files are found

* log using targetFs

* Change catching IOException to FileNotFoundException instead

* Fix trailing whitespaces

* Fix indentation

* Fix indentation
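
The change described above can be sketched as follows. This is a minimal illustration only, not the Gobblin code: it uses `java.nio` rather than a Hadoop `FileSystem`, and the class name `FileListingSketch` and the helper `listOrEmpty` are hypothetical. The point is that a caller can treat "nothing found" differently from other I/O failures once the specific exception type is thrown.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

class FileListingSketch {
  // Hypothetical stand-in for getFilesAtPath: reports "nothing found"
  // with FileNotFoundException instead of a generic IOException.
  static List<Path> getFilesAtPath(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      throw new FileNotFoundException("No such directory: " + dir);
    }
    List<Path> files = new ArrayList<>();
    try (Stream<Path> entries = Files.list(dir)) {
      entries.filter(Files::isRegularFile).forEach(files::add);
    }
    if (files.isEmpty()) {
      throw new FileNotFoundException("No files found at " + dir);
    }
    return files;
  }

  // Hypothetical caller: a missing path means "no files", while any
  // other I/O error is still surfaced to the caller.
  static List<Path> listOrEmpty(Path dir) {
    try {
      return getFilesAtPath(dir);
    } catch (FileNotFoundException e) {
      return Collections.emptyList();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Catching only `FileNotFoundException` keeps genuine failures (permissions, network errors) from being silently read as an empty source or target.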

2 days ago [GOBBLIN-1705] New consumer service to monitor changes to FlowSpecStore (#3557)
umustafi [Fri, 23 Sep 2022 22:28:05 +0000 (15:28 -0700)] 
[GOBBLIN-1705] New consumer service to monitor changes to FlowSpecStore (#3557)

* [GOBBLIN-1705] New consumer service to monitor changes to FlowSpecStore

* enabled by configuration for now
* the monitor processes changes to specs due to API requests and notifies the Scheduler to execute actions

* Initialize monitor correctly with right partitions, emit metrics, process heartbeat

* fix guice binding issues

* minor fixes with log and typo

Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
3 days ago [GOBBLIN-1706] Add DagActionStore to store the action to kill/resume one flow executi...
Zihan Li [Fri, 23 Sep 2022 17:04:41 +0000 (10:04 -0700)] 
[GOBBLIN-1706] Add DagActionStore to store the action to kill/resume one flow execution (#3558)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1706]Add DagActionStore to store the action to kill/resume one flow execution

* add new flow execution handler which uses DagActionStore to persist dag actions and let other hosts get the info

* Make dag manager integrate with the dag action store

* address comments

* address comments

* fix typo and add comments

* [GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs #3552

* before starting reduce
* after first record is reduced
* after reducing every 1000 records

Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
* [GOBBLIN-1673][GOBBLIN-1683] Skeleton code for handling messages between task runner / application master for Dynamic work unit allocation (#3539)

* [GOBBLIN-1673] Schema for dynamic work unit message

* [GOBBLIN-1683] Dynamic Work Unit messaging abstractions

* [GOBBLIN-1698] Fast fail during work unit generation based on config. (#3542)

* fast fail during work unit generation based on config.

* [GOBBLIN-1690] Added logging to ORC writer

Closes #3543 from rdsr/master

* [GOBBLIN-1678] Refactor git flowgraph component to be extensible (#3536)

* Refactor git flowgraph component to be extensible

* Move files to appropriate modules

* Cleanup and add javadocs

* Cleanup, add missing javadocs

* Address review and import order

* Fix findbugs

* Use java sort instead of collections

* Add GMCE topic explicitly to hive commit event (#3547)

* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode (#3544)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode

* add orchestrator as listener before service start

* fix code style

* address comments

* fix test case to test orchestrator as one listener of flow spec

* remove unintentional change

* remove unused import

* address comments

* fix typo

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
* fast fail during work unit generation based on config.

Co-authored-by: Meeth Gala <mgala@linkedin.com>
Co-authored-by: Ratandeep <rdsr.me@gmail.com>
Co-authored-by: William Lo <lo.william97@gmail.com>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>
Co-authored-by: Zihan Li <zihli@linkedin.com>
Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
* Define basics for collecting Iceberg metadata for the current snapshot (#3559)

* [GOBBLIN-1701] Replace jcenter with either maven central or gradle plugin portal (#3554)

* remove jcenter
* Use gradle plugin portal for shadow
* Use maven central in all other cases

* [GOBBLIN-1695] Fix: Failure to add spec executors doesn't block deployment (#3551)

* Allow first time failure to authenticate with Azkaban to fail silently

* Fix findbugs report

* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow

* Add handling for fetchSession throwing an exception

* Add logging when fails on constructor and initialization, but continue to local deploy

* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor

* Fixed vars

* Revert changes on azkabanSpecProducer

* clean up error throwing

* revert function checking changes

* Reformat file

* Clean up function

* Format file for try/catch

* Allow first time failure to authenticate with Azkaban to fail silently

* Fix findbugs report

* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow

* Fixed rebase

* Fixed rebase

* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor

* Add whitespace back

* fix helix job wait completion bug when job goes to STOPPING state (#3556)

address comments

update stoppingStateEndTime with currentTime

update test cases

* [GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs #3552

* before starting reduce
* after first record is reduced
* after reducing every 1000 records

Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
* Define basics for collecting Iceberg metadata for the current snapshot

* [GOBBLIN-1673][GOBBLIN-1683] Skeleton code for handling messages between task runner / application master for Dynamic work unit allocation (#3539)

* [GOBBLIN-1673] Schema for dynamic work unit message

* [GOBBLIN-1683] Dynamic Work Unit messaging abstractions

* Address review comments

* Correct import order

Co-authored-by: Matthew Ho <homatt999@gmail.com>
Co-authored-by: Andy Jiang <20528968+AndyJiang99@users.noreply.github.com>
Co-authored-by: Hanghang Nate Liu <nate.hanghang.liu@gmail.com>
Co-authored-by: umustafi <umust77@gmail.com>
Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
Co-authored-by: William Lo <lo.william97@gmail.com>
* [GOBBLIN-1710] Codecov should be optional in CI and not fail Github Actions (#3562)

* [GOBBLIN-1711] Replace Jcenter with maven central (#3566)

* [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding (#3549)

* address comments

* use connectionmanager when httpclient is not closeable

* fix test case to test orchestrator as one listener of flow spec

* remove unintentional change

* [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding

* fix compilation error

* address comments

* address comments

* address comments

* update outdated javadoc

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
* [GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet to generate Copy Entities to support Distcp for Iceberg (#3560)

* initial commit for iceberg distcp.

* adding copy entity helper and iceberg distcp template and test case.

* Adding unit tests and refactoring method definitions for an Iceberg dataset.

* resolve conflicts after cleaning history

* update iceberg dataset and finder to include javadoc

* addressed comments on PR and aligned code check style

* renamed vars, added logging and updated javadoc

* update dataset descriptor with ternary operation and rename fs to sourceFs

* added source and target fs and update iceberg dataset finder constructor

* Update source and dest dataset methods as protected and add req args constructor

* change the order of attributes for iceberg dataset finder ctor

* update iceberg dataset methods with correct source and target fs

Co-authored-by: Meeth Gala <mgala@linkedin.com>
* [GOBBLIN-1707] Add `IcebergTableTest` unit test (#3564)

* Add `IcebergTableTest` unit test

* Fixup comment and indentation

* Minor correction of `Long` => `Integer`

* Correct comment

* [GOBBLIN-1711] Replace Jcenter with maven central (#3566)

* Minor rename of local var

Co-authored-by: Matthew Ho <homatt999@gmail.com>
* [GOBBLIN-1708] Improve TimeAwareRecursiveCopyableDataset to lookback only into datefolders that match range (#3563)

* Check datetime range validity prior to recursing

* Remove unused packages

* Remove extra line

* Reformat function

* Check string prior to parsing

* removed unused import

* Change checkpathdatetimevalidity to use available localdatetime library parsing functions

* Change to isempty

* Modify check path to be flexible

* Update javadoc

* Add unit tests and refactor

* change bind class as GOBBLIN-1697 get merged

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
Co-authored-by: umustafi <umust77@gmail.com>
Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
Co-authored-by: Matthew Ho <homatt999@gmail.com>
Co-authored-by: meethngala <meethgala.16@gmail.com>
Co-authored-by: Meeth Gala <mgala@linkedin.com>
Co-authored-by: Ratandeep <rdsr.me@gmail.com>
Co-authored-by: William Lo <lo.william97@gmail.com>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>
Co-authored-by: Kip Kohn <ckohn@linkedin.com>
Co-authored-by: Andy Jiang <20528968+AndyJiang99@users.noreply.github.com>
Co-authored-by: Hanghang Nate Liu <nate.hanghang.liu@gmail.com>
3 days ago [GOBBLIN-1704] Purge offline helix instances during startup (#3561)
Matthew Ho [Fri, 23 Sep 2022 00:13:54 +0000 (17:13 -0700)] 
[GOBBLIN-1704] Purge offline helix instances during startup (#3561)

* [GOBBLIN-1704] Purge offline helix instances during startup

To avoid accumulation of helix instances, we should clean up
over time. But we can only perform the clean up at startup
because it's unsafe to call the API while instances are being
created / removed.

* Emit Gobblin Tracking Event during failure / completion

3 days ago [GOBBLIN-1708] Improve TimeAwareRecursiveCopyableDataset to lookback only into datefo...
Andy Jiang [Thu, 22 Sep 2022 23:46:11 +0000 (16:46 -0700)] 
[GOBBLIN-1708] Improve TimeAwareRecursiveCopyableDataset to lookback only into datefolders that match range (#3563)

* Check datetime range validity prior to recursing

* Remove unused packages

* Remove extra line

* Reformat function

* Check string prior to parsing

* removed unused import

* Change checkpathdatetimevalidity to use available localdatetime library parsing functions

* Change to isempty

* Modify check path to be flexible

* Update javadoc

* Add unit tests and refactor
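
A rough sketch of the idea in this commit, checking that a date folder falls inside the lookback range before recursing into it, might look like the following. The class `DateFolderSketch`, the method `inRange`, and the `yyyy/MM/dd/HH/mm` folder pattern are all assumptions made for illustration; the actual dataset class and its configured pattern may differ.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class DateFolderSketch {
  // Assumed folder layout; the real date pattern is configurable.
  static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd/HH/mm");

  // Returns true only when the folder name parses as a timestamp that
  // lies within [start, end]; unparseable names are skipped.
  static boolean inRange(String folder, LocalDateTime start, LocalDateTime end) {
    if (folder == null || folder.isEmpty()) {
      return false; // check the string prior to parsing
    }
    try {
      LocalDateTime time = LocalDateTime.parse(folder, FORMAT);
      return !time.isBefore(start) && !time.isAfter(end);
    } catch (DateTimeParseException e) {
      return false; // folder does not match the date format
    }
  }
}
```

Only folders for which `inRange` returns true would be listed recursively, which avoids walking directories outside the lookback window.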

4 days ago [GOBBLIN-1707] Add `IcebergTableTest` unit test (#3564)
Kip Kohn [Thu, 22 Sep 2022 17:56:53 +0000 (10:56 -0700)] 
[GOBBLIN-1707] Add `IcebergTableTest` unit test (#3564)

* Add `IcebergTableTest` unit test

* Fixup comment and indentation

* Minor correction of `Long` => `Integer`

* Correct comment

* [GOBBLIN-1711] Replace Jcenter with maven central (#3566)

* Minor rename of local var

Co-authored-by: Matthew Ho <homatt999@gmail.com>
4 days ago [GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet to generat...
meethngala [Thu, 22 Sep 2022 17:03:49 +0000 (12:03 -0500)] 
[GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet to generate Copy Entities to support Distcp for Iceberg (#3560)

* initial commit for iceberg distcp.

* adding copy entity helper and iceberg distcp template and test case.

* Adding unit tests and refactoring method definitions for an Iceberg dataset.

* resolve conflicts after cleaning history

* update iceberg dataset and finder to include javadoc

* addressed comments on PR and aligned code check style

* renamed vars, added logging and updated javadoc

* update dataset descriptor with ternary operation and rename fs to sourceFs

* added source and target fs and update iceberg dataset finder constructor

* Update source and dest dataset methods as protected and add req args constructor

* change the order of attributes for iceberg dataset finder ctor

* update iceberg dataset methods with correct source and target fs

Co-authored-by: Meeth Gala <mgala@linkedin.com>
4 days ago [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message...
Zihan Li [Thu, 22 Sep 2022 03:13:28 +0000 (20:13 -0700)] 
[GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding (#3549)

* address comments

* use connectionmanager when httpclient is not closeable

* fix test case to test orchestrator as one listener of flow spec

* remove unintentional change

* [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding

* fix compilation error

* address comments

* address comments

* address comments

* update outdated javadoc

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
6 days ago [GOBBLIN-1711] Replace Jcenter with maven central (#3566)
Matthew Ho [Tue, 20 Sep 2022 17:46:32 +0000 (10:46 -0700)] 
[GOBBLIN-1711] Replace Jcenter with maven central (#3566)

11 days ago [GOBBLIN-1710] Codecov should be optional in CI and not fail Github Actions (#3562)
Matthew Ho [Thu, 15 Sep 2022 01:54:25 +0000 (18:54 -0700)] 
[GOBBLIN-1710] Codecov should be optional in CI and not fail Github Actions (#3562)

13 days ago Define basics for collecting Iceberg metadata for the current snapshot (#3559)
Kip Kohn [Tue, 13 Sep 2022 18:40:07 +0000 (11:40 -0700)] 
Define basics for collecting Iceberg metadata for the current snapshot (#3559)

* [GOBBLIN-1701] Replace jcenter with either maven central or gradle plugin portal (#3554)

* remove jcenter
* Use gradle plugin portal for shadow
* Use maven central in all other cases

* [GOBBLIN-1695] Fix: Failure to add spec executors doesn't block deployment (#3551)

* Allow first time failure to authenticate with Azkaban to fail silently

* Fix findbugs report

* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow

* Add handling for fetchSession throwing an exception

* Add logging when fails on constructor and initialization, but continue to local deploy

* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor

* Fixed vars

* Revert changes on azkabanSpecProducer

* clean up error throwing

* revert function checking changes

* Reformat file

* Clean up function

* Format file for try/catch

* Allow first time failure to authenticate with Azkaban to fail silently

* Fix findbugs report

* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow

* Fixed rebase

* Fixed rebase

* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor

* Add whitespace back

* fix helix job wait completion bug when job goes to STOPPING state (#3556)

address comments

update stoppingStateEndTime with currentTime

update test cases

* [GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs #3552

* before starting reduce
* after first record is reduced
* after reducing every 1000 records

Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
* Define basics for collecting Iceberg metadata for the current snapshot

* [GOBBLIN-1673][GOBBLIN-1683] Skeleton code for handling messages between task runner / application master for Dynamic work unit allocation (#3539)

* [GOBBLIN-1673] Schema for dynamic work unit message

* [GOBBLIN-1683] Dynamic Work Unit messaging abstractions

* Address review comments

* Correct import order

Co-authored-by: Matthew Ho <homatt999@gmail.com>
Co-authored-by: Andy Jiang <20528968+AndyJiang99@users.noreply.github.com>
Co-authored-by: Hanghang Nate Liu <nate.hanghang.liu@gmail.com>
Co-authored-by: umustafi <umust77@gmail.com>
Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
Co-authored-by: William Lo <lo.william97@gmail.com>
13 days ago [GOBBLIN-1698] Fast fail during work unit generation based on config. (#3542)
meethngala [Tue, 13 Sep 2022 17:32:41 +0000 (13:32 -0400)] 
[GOBBLIN-1698] Fast fail during work unit generation based on config. (#3542)

* fast fail during work unit generation based on config.

* [GOBBLIN-1690] Added logging to ORC writer

Closes #3543 from rdsr/master

* [GOBBLIN-1678] Refactor git flowgraph component to be extensible (#3536)

* Refactor git flowgraph component to be extensible

* Move files to appropriate modules

* Cleanup and add javadocs

* Cleanup, add missing javadocs

* Address review and import order

* Fix findbugs

* Use java sort instead of collections

* Add GMCE topic explicitly to hive commit event (#3547)

* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode (#3544)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode

* add orchestrator as listener before service start

* fix code style

* address comments

* fix test case to test orchestrator as one listener of flow spec

* remove unintentional change

* remove unused import

* address comments

* fix typo

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
* fast fail during work unit generation based on config.

Co-authored-by: Meeth Gala <mgala@linkedin.com>
Co-authored-by: Ratandeep <rdsr.me@gmail.com>
Co-authored-by: William Lo <lo.william97@gmail.com>
Co-authored-by: Jack Moseley <jmoseley@linkedin.com>
Co-authored-by: Zihan Li <zihli@linkedin.com>
Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
2 weeks ago [GOBBLIN-1673][GOBBLIN-1683] Skeleton code for handling messages between task runner...
Matthew Ho [Mon, 12 Sep 2022 16:56:48 +0000 (09:56 -0700)] 
[GOBBLIN-1673][GOBBLIN-1683] Skeleton code for handling messages between task runner / application master for Dynamic work unit allocation (#3539)

* [GOBBLIN-1673] Schema for dynamic work unit message

* [GOBBLIN-1683] Dynamic Work Unit messaging abstractions

2 weeks ago [GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs...
umustafi [Fri, 9 Sep 2022 18:49:54 +0000 (11:49 -0700)] 
[GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs #3552

* before starting reduce
* after first record is reduced
* after reducing every 1000 records

Co-authored-by: Urmi Mustafi <umustafi@umustafi-mn1.linkedin.biz>
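
The three logging points listed in this commit (before starting, after the first record, and after every 1000 records) can be sketched as below. This is an illustrative stand-alone version, not the compaction reducer itself: `ReduceProgressSketch` and `reduceWithProgress` are hypothetical names, and messages are collected into a list instead of being written to a logger so the behaviour is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

class ReduceProgressSketch {
  static final int LOG_INTERVAL = 1000;

  // Returns the progress messages that would be logged while
  // "reducing" recordCount records.
  static List<String> reduceWithProgress(int recordCount) {
    List<String> log = new ArrayList<>();
    log.add("Starting reduce");
    for (int processed = 1; processed <= recordCount; processed++) {
      // ... reduce one record here ...
      if (processed == 1) {
        log.add("Reduced first record");
      } else if (processed % LOG_INTERVAL == 0) {
        log.add("Reduced " + processed + " records");
      }
    }
    return log;
  }

  public static void main(String[] args) {
    reduceWithProgress(2500).forEach(System.out::println);
  }
}
```

Logging at a fixed interval keeps output volume bounded for large inputs while still showing that a slow job is making progress.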
2 weeks ago fix helix job wait completion bug when job goes to STOPPING state (#3556)
Hanghang Nate Liu [Fri, 9 Sep 2022 18:07:10 +0000 (11:07 -0700)] 
fix helix job wait completion bug when job goes to STOPPING state (#3556)

address comments

update stoppingStateEndTime with currentTime

update test cases

2 weeks ago [GOBBLIN-1695] Fix: Failure to add spec executors doesn't block deployment (#3551)
Andy Jiang [Thu, 8 Sep 2022 23:20:47 +0000 (16:20 -0700)] 
[GOBBLIN-1695] Fix: Failure to add spec executors doesn't block deployment (#3551)

* Allow first time failure to authenticate with Azkaban to fail silently

* Fix findbugs report

* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow

* Add handling for fetchSession throwing an exception

* Add logging when fails on constructor and initialization, but continue to local deploy

* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor

* Fixed vars

* Revert changes on azkabanSpecProducer

* clean up error throwing

* revert function checking changes

* Reformat file

* Clean up function

* Format file for try/catch

* Allow first time failure to authenticate with Azkaban to fail silently

* Fix findbugs report

* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow

* Fixed rebase

* Fixed rebase

* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor

* Add whitespace back

2 weeks ago [GOBBLIN-1701] Replace jcenter with either maven central or gradle plugin portal...
Matthew Ho [Thu, 8 Sep 2022 22:15:37 +0000 (15:15 -0700)] 
[GOBBLIN-1701] Replace jcenter with either maven central or gradle plugin portal (#3554)

* remove jcenter
* Use gradle plugin portal for shadow
* Use maven central in all other cases

2 weeks ago Merge pull request #3555 from umustafi/removeUnusedGradlePlugin
William Lo [Thu, 8 Sep 2022 21:48:08 +0000 (14:48 -0700)] 
Merge pull request #3555 from umustafi/removeUnusedGradlePlugin

[GOBBLIN-1700] Remove unused coveralls-gradle-plugin dependency

2 weeks ago [GOBBLIN-1700] Remove unused coveralls-gradle-plugin dependency 3555/head
Urmi Mustafi [Thu, 8 Sep 2022 21:06:58 +0000 (14:06 -0700)] 
[GOBBLIN-1700] Remove unused coveralls-gradle-plugin dependency

2 weeks ago add MysqlUserQuotaManager (#3545)
Arjun Singh Bora [Tue, 6 Sep 2022 20:07:03 +0000 (13:07 -0700)] 
add MysqlUserQuotaManager (#3545)

fix unit test and address review comments
address review comments
merge conflicts

3 weeks ago [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode (#3544)
Zihan Li [Thu, 1 Sep 2022 16:20:23 +0000 (09:20 -0700)] 
[GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode (#3544)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode

* add orchestrator as listener before service start

* fix code style

* address comments

* fix test case to test orchestrator as one listener of flow spec

* remove unintentional change

* remove unused import

* address comments

* fix typo

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
3 weeks ago Add GMCE topic explicitly to hive commit event (#3547)
Jack Moseley [Wed, 31 Aug 2022 20:36:29 +0000 (13:36 -0700)] 
Add GMCE topic explicitly to hive commit event (#3547)

3 weeks ago [GOBBLIN-1678] Refactor git flowgraph component to be extensible (#3536)
William Lo [Wed, 31 Aug 2022 18:55:06 +0000 (11:55 -0700)] 
[GOBBLIN-1678] Refactor git flowgraph component to be extensible (#3536)

* Refactor git flowgraph component to be extensible

* Move files to appropriate modules

* Cleanup and add javadocs

* Cleanup, add missing javadocs

* Address review and import order

* Fix findbugs

* Use java sort instead of collections

4 weeks ago [GOBBLIN-1690] Added logging to ORC writer
Ratandeep [Thu, 25 Aug 2022 05:07:50 +0000 (01:07 -0400)] 
[GOBBLIN-1690] Added logging to ORC writer

Closes #3543 from rdsr/master

4 weeks ago Allow all iceberg exceptions to be fault tolerant (#3541)
Jack Moseley [Tue, 23 Aug 2022 17:40:25 +0000 (10:40 -0700)] 
Allow all iceberg exceptions to be fault tolerant (#3541)

5 weeks ago Guard against exists fs call as well (#3538)
William Lo [Fri, 19 Aug 2022 17:47:31 +0000 (10:47 -0700)] 
Guard against exists fs call as well (#3538)

5 weeks ago Add error handling for timeaware finder to handle scenarios where fil… (#3537)
William Lo [Thu, 18 Aug 2022 22:20:03 +0000 (15:20 -0700)] 
Add error handling for timeaware finder to handle scenarios where fil… (#3537)

* Add error handling for timeaware finder to handle scenarios where files do not exist or folders not matching date format

* Check path exists before attempting ls

5 weeks ago [GOBBLIN-1675] Add pagination for GaaS on server side (#3533)
Andy Jiang [Thu, 18 Aug 2022 20:28:53 +0000 (13:28 -0700)] 
[GOBBLIN-1675] Add pagination for GaaS on server side (#3533)

* added mysql statement and take in pageContext

* Working functionality for count and start in api layer

* Revert unnecessary changes

* Revert unnecessary changes

* Add tests for pagination with filterFlow

* Handle start and count for getall call

* Handle start and count for getAll and getFilterFlows call in API layer of GaaS

* Add tests and fix for pagination for getAll

* Refactor conditions

* Fixed tests

* Add in condition to separate user defined count and start from default values

* Fix checkstyle

* Handle get all with pagination case separately from filter

* Completed ServiceManagerTest for getAll

* remove modified_time field from Get all statement

* Completed GobblinServiceManager tests for filtered flows with pagination

* Remove separate call for tests

* Add context for debug logs and documentation

* Update user.name and user.email on commits

* Add in second ordering field

* Fix testGitCreate test by utilizing json

Co-authored-by: Andy Jiang <andjiang@andjiang-mn1.linkedin.biz>
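
The `start` and `count` parameters mentioned in this commit follow the usual offset/limit pagination scheme. A minimal in-memory sketch is shown below; `PaginationSketch` and `page` are hypothetical names, and the commit itself applies pagination in the API layer and a MySQL statement rather than on a Java list.

```java
import java.util.ArrayList;
import java.util.List;

class PaginationSketch {
  // Returns at most `count` elements beginning at index `start`.
  // Out-of-range requests yield an empty page rather than an error.
  static <T> List<T> page(List<T> all, int start, int count) {
    if (start < 0 || count < 0) {
      throw new IllegalArgumentException("start and count must be non-negative");
    }
    int from = Math.min(start, all.size());
    int to = (int) Math.min((long) from + count, all.size());
    return new ArrayList<>(all.subList(from, to));
  }
}
```

Pages are only stable across requests if the underlying results have a deterministic order, which is presumably why the commit also adds a second ordering field.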
6 weeks ago [GOBBLIN-1672] Refactor metrics from DagManager into its own class, add metrics per...
William Lo [Tue, 9 Aug 2022 22:02:31 +0000 (15:02 -0700)] 
[GOBBLIN-1672] Refactor metrics from DagManager into its own class, add metrics per … (#3532)

* Refactor metrics from DagManager into its own class, add metrics per executor for SUCCESS, FAILED, SLA_EXCEEDED, START_SLA_EXCEEDED

* Address review, fix flow gauge for failed flows

6 weeks ago [GOBBLIN-1677] Fix timezone property to read from key correctly (#3535)
William Lo [Mon, 8 Aug 2022 23:29:14 +0000 (16:29 -0700)] 
[GOBBLIN-1677] Fix timezone property to read from key correctly (#3535)

* Fix timezone property to read from key correctly

* Code cleanup and additional test around invalid timezones

7 weeks ago [Gobblin-931] Fix typo in gobblin CLI usage (#3530)
Bharath Krishna [Mon, 8 Aug 2022 20:23:01 +0000 (13:23 -0700)] 
[Gobblin-931] Fix typo in gobblin CLI usage (#3530)

* Correct typo "gobblon" to "gobblin"
* Fix some documentation wording

Co-authored-by: Bharath Krishna <bmurali@roku.com>
7 weeks ago [GOBBLIN-1671] : Fix gobblin.sh script to add external jars as colon separated to...
Bharath Krishna [Mon, 8 Aug 2022 20:22:27 +0000 (13:22 -0700)] 
[GOBBLIN-1671] : Fix gobblin.sh script to add external jars as colon separated to HADOOP_CLASSPATH (#3531)

When using `mapreduce` mode in gobblin.sh, the additional jars passed to
gobblin.sh through --jars are comma separated. They were incorrectly
added as-is to HADOOP_CLASSPATH, which takes colon (:) separated jars.
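
The fix itself lives in a shell script, but the transformation it needs is simple enough to illustrate in Java. The sketch below converts a comma-separated jar list into the colon-separated form a classpath expects; `ClasspathSketch` and `toClasspath` are hypothetical names and are not part of gobblin.sh.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class ClasspathSketch {
  // Converts a comma-separated jar list (as passed via --jars) into a
  // colon-separated classpath fragment, dropping empty entries.
  static String toClasspath(String commaSeparatedJars) {
    return Arrays.stream(commaSeparatedJars.split(","))
        .map(String::trim)
        .filter(jar -> !jar.isEmpty())
        .collect(Collectors.joining(":"));
  }

  public static void main(String[] args) {
    System.out.println(toClasspath("a.jar, b.jar,,c.jar"));
  }
}
```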

2 months ago [GOBBLIN-1656] Return a http status 503 on GaaS when quota is exceeded for user or...
William Lo [Mon, 25 Jul 2022 20:49:01 +0000 (13:49 -0700)] 
[GOBBLIN-1656] Return a http status 503 on GaaS when quota is exceeded for user or flowgroup (#3516)

* Add e2e tests and set http response code for quota exceeded

* cleanup

* Fix checkstyle test

* Improve guard against schedule change if quota is exceeded

* Fix bug relating to exception propagation and scheduler not checking quota due to current attempt number

* Address review comments

* Refactor based on review feedback

* Fix test

* Cleanup around handling responses from callbacks in GaaS API

* Fix checkstyle

* catch quotaexceededexception instead of checking type explicitly

* Log other errors and throw 500

* Fix checkstyle dead store

* Fix checkstyle again
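
The status mapping described in this commit (503 when the quota is exceeded, 500 for other errors) can be sketched as follows. Everything here is a simplified assumption for illustration: `QuotaStatusSketch`, its nested `QuotaExceededException`, and the `FlowAction` interface are local stand-ins, not the GaaS classes.

```java
class QuotaStatusSketch {
  // Local stand-in for the quota exception type named in the commit.
  static class QuotaExceededException extends Exception {
    QuotaExceededException(String message) {
      super(message);
    }
  }

  // Hypothetical unit of work, such as creating or scheduling a flow.
  interface FlowAction {
    void run() throws Exception;
  }

  // Maps the outcome of an action to an HTTP status code.
  static int statusFor(FlowAction action) {
    try {
      action.run();
      return 200;
    } catch (QuotaExceededException e) {
      return 503; // quota exceeded: catch the specific type
    } catch (Exception e) {
      System.err.println("Unexpected error: " + e.getMessage());
      return 500; // log other errors and report a server error
    }
  }
}
```

Catching the specific exception type, rather than inspecting the type of a generic exception, matches the "catch quotaexceededexception instead of checking type explicitly" step in the list above.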

2 months ago [GOBBLIN-1669] Clean up TimeAwareRecursiveCopyableDataset to support seconds in time...
William Lo [Thu, 21 Jul 2022 20:15:54 +0000 (13:15 -0700)] 
[GOBBLIN-1669] Clean up TimeAwareRecursiveCopyableDataset to support seconds in time… (#3528)

* Clean up TimeAwareRecursiveCopyableDataset to support seconds in timestamp, squash logs for timestamp paths that do not exist, and improve efficiency on search for non-nested timestamps

* Fix checkstyle

* Address review

* Refactor based on reviews to handle every date pattern format recursively without the iterator

* Calculate start and end date in helper

2 months ago [GOBBLIN-1670] Remove rat tasks and unneeded checkstyles blocking build pipeline...
William Lo [Thu, 21 Jul 2022 16:41:09 +0000 (09:41 -0700)] 
[GOBBLIN-1670] Remove rat tasks and unneeded checkstyles blocking build pipeline (#3529)

* Remove rat task on ci-cd

* Disable checkstyle for generated code

* Remove checkstyle on generated rest files as well

* Fix checkstyle for generated java code

* Remove checkstyle for gobblin-metrics-base generated code

* Remove checkstyle for generated code in test utils and http

2 months ago [GOBBLIN-1668] Add audit counts for iceberg registration (#3527)
Jack Moseley [Mon, 18 Jul 2022 19:04:55 +0000 (12:04 -0700)] 
[GOBBLIN-1668] Add audit counts for iceberg registration (#3527)

* Add audit counts for iceberg registration

* Address comments

* Address comments

2 months ago [GOBBLIN-1667] Create new predicate - ExistingPartitionSkipPredicate (#3526)
Christopher Harris [Mon, 11 Jul 2022 21:48:43 +0000 (16:48 -0500)] 
[GOBBLIN-1667] Create new predicate - ExistingPartitionSkipPredicate (#3526)

Currently, hive.dataset.existing.entity.policy.ABORT does not
abort if there is an existing partition. One option to resolve this
is to support the ABORT configuration in that case, but that might
be backwards incompatible. This change instead introduces a new skip
predicate called ExistingPartitionSkipPredicate that skips any
partition that already exists in the target table.
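
A skip predicate of this kind can be sketched as below. This is a simplified illustration that represents partitions as plain strings; `PartitionSkipSketch`, `existingPartitionSkip`, and `partitionsToCopy` are hypothetical, and the real ExistingPartitionSkipPredicate operates on Gobblin's Hive copy types rather than strings.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PartitionSkipSketch {
  // Hypothetical predicate: returns true (skip) when the partition
  // already exists in the target table.
  static Predicate<String> existingPartitionSkip(Set<String> targetPartitions) {
    return targetPartitions::contains;
  }

  // Keeps only the source partitions that the predicate does not skip.
  static List<String> partitionsToCopy(List<String> source, Set<String> target) {
    Predicate<String> skip = existingPartitionSkip(target);
    return source.stream().filter(skip.negate()).collect(Collectors.toList());
  }
}
```

Because the predicate only skips, existing partitions are left untouched and the remaining partitions are still copied.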

3 months ago Calculate requested container count based on adding allocated count and outstanding...
Hanghang Nate Liu [Tue, 21 Jun 2022 20:10:49 +0000 (13:10 -0700)] 
Calculate requested container count based on adding allocated count and outstanding ContainerRequests in Yarn (#3524)

3 months ago make the requestedContainerCountMap correctly update the container count (#3523)
Hanghang Nate Liu [Fri, 17 Jun 2022 18:28:59 +0000 (11:28 -0700)] 
make the requestedContainerCountMap correctly update the container count (#3523)

update the place to decrease the requestedContainerCountMap

3 months ago Fix running counts for retried flows (#3520)
William Lo [Wed, 15 Jun 2022 22:04:14 +0000 (15:04 -0700)] 
Fix running counts for retried flows (#3520)

3 months ago Allow table to flush after write failure (#3522)
Jack Moseley [Wed, 15 Jun 2022 19:01:02 +0000 (12:01 -0700)] 
Allow table to flush after write failure (#3522)

3 months ago [GOBBLIN-1652]Add more log in the KafkaJobStatusMonitor in case it fails to process...
Zihan Li [Tue, 14 Jun 2022 20:21:01 +0000 (13:21 -0700)] 
[GOBBLIN-1652]Add more log in the KafkaJobStatusMonitor in case it fails to process one GobblinTrackingEvent (#3513)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1652]Add more log in the KafkaJobStatusMonitor in case it fails to process one GobblinTrackingEvent

* add test

* fix test

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
3 months ago Make Yarn container and helix instance allocation group by tag (#3519)
Hanghang Nate Liu [Thu, 9 Jun 2022 19:27:42 +0000 (12:27 -0700)] 
Make Yarn container and helix instance allocation group by tag (#3519)

3 months ago [GOBBLIN-1657] Update completion watermark on change_property in IcebergMetadataWrite...
vbohra [Tue, 7 Jun 2022 21:48:27 +0000 (14:48 -0700)] 
[GOBBLIN-1657] Update completion watermark on change_property in IcebergMetadataWriter (#3517)

* [GOBBLIN-1655] Update completion watermark for quiet tables during iceberg registration

* [GOBBLIN-1657] Update completion watermark on change_property GMCE

* Added test case to check watermark update on change_property

Co-authored-by: Vikram Bohra <vbohra@vbohra-mn1.linkedin.biz>
3 months ago [GOBBLIN-1654] Add capacity floor to avoid aggressively requesting resource and small...
Zihan Li [Wed, 1 Jun 2022 21:39:47 +0000 (14:39 -0700)] 
[GOBBLIN-1654] Add capacity floor to avoid aggressively requesting resource and small files. (#3515)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1654] Add capacity floor to avoid aggressively requesting resource and small files.

* address comments

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
3 months ago [GOBBLIN-1653] Shorten job name length if it exceeds 255 characters (#3514)
William Lo [Wed, 1 Jun 2022 18:31:32 +0000 (11:31 -0700)] 
[GOBBLIN-1653] Shorten job name length if it exceeds 255 characters (#3514)

* Shorten job name length if it exceeds 255 characters (max size for a directory component)

* Address review to account for flow name in hash
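
One way to shorten an over-long job name while keeping it unique is to truncate it and append a hash that also covers the flow name, as the second bullet suggests. The sketch below is an assumption-laden illustration: `JobNameSketch`, `shorten`, `shortHash`, the SHA-256 choice, and the 16-character hex suffix are all hypothetical and may not match what the commit actually does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class JobNameSketch {
  // Maximum size of a directory component, per the commit message.
  static final int MAX_LENGTH = 255;

  // Leaves short names unchanged; otherwise truncates and appends a
  // hash of flow name plus job name so distinct jobs stay distinct.
  static String shorten(String flowName, String jobName) {
    if (jobName.length() <= MAX_LENGTH) {
      return jobName;
    }
    String suffix = "_" + shortHash(flowName + "/" + jobName);
    return jobName.substring(0, MAX_LENGTH - suffix.length()) + suffix;
  }

  // First 8 bytes of a SHA-256 digest, rendered as 16 hex characters.
  static String shortHash(String input) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(input.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (int i = 0; i < 8; i++) {
        hex.append(String.format("%02x", digest[i] & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is not available", e);
    }
  }
}
```

Including the flow name in the hashed input means two flows with identically named long jobs would still be very unlikely to collide after truncation.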

4 months ago[GOBBLIN-1650] Implement flowGroup quotas for the DagManager (#3511)
William Lo [Fri, 27 May 2022 20:15:30 +0000 (13:15 -0700)] 
[GOBBLIN-1650] Implement flowGroup quotas for the DagManager (#3511)

* Implement flowGroup quotas for the DagManager

* Address review and add comments to tests

* Add guard for double increments on already tracked dags

* Fix tests

4 months ago[GOBBLIN-1648] Complete use of JDBC `DataSource` 'read-only' validation query by...
Kip Kohn [Thu, 26 May 2022 17:11:00 +0000 (10:11 -0700)] 
[GOBBLIN-1648] Complete use of JDBC `DataSource` 'read-only' validation query by incorporating where previously omitted (#3509)

* Complete use of JDBC `DataSource` 'read-only' validation query by incorporating where previously omitted

* Add logging to demonstrate validation query successfully configured

4 months agoAdd config to set close timeout in HiveRegister (#3512)
Jack Moseley [Wed, 25 May 2022 18:20:38 +0000 (11:20 -0700)] 
Add config to set close timeout in HiveRegister (#3512)

4 months agoadd an API in AbstractBaseKafkaConsumerClient to list selected topics (#3501)
Arjun Singh Bora [Mon, 23 May 2022 17:57:47 +0000 (10:57 -0700)] 
add an API in AbstractBaseKafkaConsumerClient to list selected topics (#3501)

4 months ago[GOBBLIN-1649] Revert gobblin-1633 (#3510)
Matthew Ho [Wed, 18 May 2022 22:59:57 +0000 (15:59 -0700)] 
[GOBBLIN-1649] Revert gobblin-1633 (#3510)

4 months ago[GOBBLIN-1639] Prevent metrics reporting if configured, clean up workunit count metri...
William Lo [Wed, 18 May 2022 22:34:54 +0000 (15:34 -0700)] 
[GOBBLIN-1639] Prevent metrics reporting if configured, clean up workunit count metric (#3500)

* Move workunit count metrics emitting to Gobblin pipeline, add configuration to prevent metrics reporting if configured

* rename config key

* fix test

* Fix checkstyle and other tests

* Create a custom extensible hook for GaaS metrics on JobLauncher

* Add tests

* Fix failing tests

* Address review

* Address review comment

4 months ago[GOBBLIN-1647] Add hive commit GTE to HiveMetadataWriter (#3508)
Jack Moseley [Tue, 17 May 2022 21:02:43 +0000 (14:02 -0700)] 
[GOBBLIN-1647] Add hive commit GTE to HiveMetadataWriter (#3508)

* Add hive commit GTE to HiveMetadataWriter

* Address comment

4 months ago[GOBBLIN-1633] Fix compaction actions on job failure not retried if compaction succee...
Matthew Ho [Fri, 13 May 2022 16:32:54 +0000 (09:32 -0700)] 
[GOBBLIN-1633] Fix compaction actions on job failure not retried if compaction succeeds (#3494)

* GOBBLIN-1633

Fix compaction on job failure not retried if compaction succeeds

* Fix typos

4 months ago[GOBBLIN-1646] Revert yarn container / helix tag group changes (#3507)
Matthew Ho [Thu, 12 May 2022 22:13:06 +0000 (15:13 -0700)] 
[GOBBLIN-1646] Revert yarn container / helix tag group changes (#3507)

Revert "Fix bug when shrinking the container in Yarn service (#3504)"
This reverts commit dd6d910a7e7a90d15258c6c77ebe626ae6d573f9.

Revert "[GOBBLIN-1620]Make yarn container allocation group by helix tag (#3487)"
This reverts commit 3e877951c284ccd68be3634522f9fc2c3d39f81a.

4 months ago[GOBBLIN-1641] Add meter for sla exceeded flows (#3502)
William Lo [Wed, 11 May 2022 20:48:40 +0000 (13:48 -0700)] 
[GOBBLIN-1641] Add meter for sla exceeded flows (#3502)

* Add meter for sla exceeded flows

* fix tests

* Fix test nullpointer

* Address review + augment tests

4 months agoGOBBLIN-1644 (#3506)
Matthew Ho [Wed, 11 May 2022 18:36:12 +0000 (11:36 -0700)] 
GOBBLIN-1644 (#3506)

Log assigned participant when helix participant check fails

4 months ago[GOBBLIN-1645]Change the prefix of dagManager heartbeat to make it consistent with...
Zihan Li [Wed, 11 May 2022 17:45:27 +0000 (10:45 -0700)] 
[GOBBLIN-1645]Change the prefix of dagManager heartbeat to make it consistent with other metrics (#3505)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1645]Change the prefix of dagManager heartbeat to make it consistent with other metrics

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
4 months agoFix bug when shrinking the container in Yarn service (#3504)
Hanghang Nate Liu [Tue, 10 May 2022 00:12:41 +0000 (17:12 -0700)] 
Fix bug when shrinking the container in Yarn service (#3504)

* Fix bug when shrinking the container in Yarn service

update unit test

update requestedContainerCountMap

* address comment

4 months ago[GOBBLIN-1637] Add writer, operation, and partition info to failed metadata writer...
Jack Moseley [Thu, 5 May 2022 17:27:42 +0000 (10:27 -0700)] 
[GOBBLIN-1637] Add writer, operation, and partition info to failed metadata writer events (#3498)

* Add writer, operation, and partition info to failed metadata writer events

* Add partitionKeys to failure event

* Add all failed writers to event

4 months ago[GOBBLIN-1638] Fix unbalanced running count metrics due to Azkaban failures (#3499)
William Lo [Thu, 5 May 2022 00:30:19 +0000 (17:30 -0700)] 
[GOBBLIN-1638] Fix unbalanced running count metrics due to Azkaban failures (#3499)

* Fix unbalanced running count metrics due to Azkaban failures

* Fix tests

* Address comments

* Rename test

* Update comment with review

4 months ago[GOBBLIN-1634] Add retries on flow sla kills (#3495)
William Lo [Thu, 28 Apr 2022 22:24:10 +0000 (15:24 -0700)] 
[GOBBLIN-1634] Add retries on flow sla kills (#3495)

* Add retries on flow sla kills

* Address review

* Address review comment

4 months ago[GOBBLIN-1620]Make yarn container allocation group by helix tag (#3487)
Hanghang Nate Liu [Thu, 28 Apr 2022 19:46:35 +0000 (12:46 -0700)] 
[GOBBLIN-1620]Make yarn container allocation group by helix tag (#3487)

* make yarn service aware of helix tag and resource requirement for each workflow so that containers will be assigned to correct task

update test cases

update helix instance tag during task runner initiation

update logs

update test case

* remove lib not used, add test case

address comments

* update test cases

* remove container min and max config

4 months ago[GOBBLIN-1636] Close DatasetCleaner after clean task (#3497)
Zihan Li [Thu, 28 Apr 2022 00:49:55 +0000 (17:49 -0700)] 
[GOBBLIN-1636] Close DatasetCleaner after clean task (#3497)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1636] Close DatasetCleaner after clean task

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
4 months ago[GOBBLIN-1635] Avoid loading env configuration when using config store to improve...
Zihan Li [Wed, 27 Apr 2022 21:28:43 +0000 (14:28 -0700)] 
[GOBBLIN-1635] Avoid loading env configuration when using config store to improve the performance (#3496)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1631]Emit heartbeat for dagManagerThread

* [GOBBLIN-1635] Avoid loading env configuration when using config store to improve the performance

* [GOBBLIN-1630] Remove flow level metrics for adhoc flows (#3491)

* Remove emitting metrics for adhoc flows in dagmanager and orchestrator

* Add tests

* Fix tests

* Address comments

* Improve test by validating gauge value

* use data node aliases to figure out data node names before using DMAS (#3493)

* [GOBBLIN-1619] WriterUtils.mkdirsWithRecursivePermission contains race condition and puts unnecessary load on filesystem (#3477)

* [GOBBLIN-1619] Fix race cond. in writerutil mkdirs

* writer util mkdirs previously had race condition when multiple processes
try to create the same parent directory. This causes incorrect
FileNotFoundException
* new implementation does not change the behavior

* Test coverage for retry config

* Wait for file to exist via retry cfg before setting perms

* use user supplied props to create FileSystem in DatasetCleanerTask (#3483)

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
Co-authored-by: William Lo <lo.william97@gmail.com>
Co-authored-by: Arjun Singh Bora <abora@linkedin.com>
Co-authored-by: Matthew Ho <homatt999@gmail.com>
5 months agouse user supplied props to create FileSystem in DatasetCleanerTask (#3483)
Arjun Singh Bora [Tue, 26 Apr 2022 18:00:36 +0000 (11:00 -0700)] 
use user supplied props to create FileSystem in DatasetCleanerTask (#3483)

5 months ago[GOBBLIN-1619] WriterUtils.mkdirsWithRecursivePermission contains race condition...
Matthew Ho [Thu, 21 Apr 2022 19:21:02 +0000 (12:21 -0700)] 
[GOBBLIN-1619] WriterUtils.mkdirsWithRecursivePermission contains race condition and puts unnecessary load on filesystem (#3477)

* [GOBBLIN-1619] Fix race cond. in writerutil mkdirs

* writer util mkdirs previously had race condition when multiple processes
try to create the same parent directory. This causes incorrect
FileNotFoundException
* new implementation does not change the behavior

* Test coverage for retry config

* Wait for file to exist via retry cfg before setting perms

5 months agouse data node aliases to figure out data node names before using DMAS (#3493)
Arjun Singh Bora [Tue, 19 Apr 2022 22:02:11 +0000 (15:02 -0700)] 
use data node aliases to figure out data node names before using DMAS (#3493)

5 months ago[GOBBLIN-1630] Remove flow level metrics for adhoc flows (#3491)
William Lo [Mon, 18 Apr 2022 23:37:30 +0000 (16:37 -0700)] 
[GOBBLIN-1630] Remove flow level metrics for adhoc flows (#3491)

* Remove emitting metrics for adhoc flows in dagmanager and orchestrator

* Add tests

* Fix tests

* Address comments

* Improve test by validating gauge value

5 months ago[GOBBLIN-1631]Emit heartbeat for dagManagerThread (#3492)
Zihan Li [Wed, 13 Apr 2022 22:17:00 +0000 (15:17 -0700)] 
[GOBBLIN-1631]Emit heartbeat for dagManagerThread (#3492)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1631]Emit heartbeat for dagManagerThread

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
5 months ago[GOBBLIN-1624] Refactor quota management, fix various bugs in accounting of running...
William Lo [Mon, 11 Apr 2022 21:42:12 +0000 (14:42 -0700)] 
[GOBBLIN-1624] Refactor quota management, fix various bugs in accounting of running … (#3481)

* Refactor quota management, fix various bugs in accounting of running jobs

* Add javadocs

* Address comments, add metric counts to tests

* Address scenario on startup where quota is decreased

* rename onstartup to onInit

5 months ago[GOBBLIN-1613] Add metadata writers field to GMCE schema (#3490)
Matthew Ho [Mon, 11 Apr 2022 20:53:58 +0000 (13:53 -0700)] 
[GOBBLIN-1613] Add metadata writers field to GMCE schema (#3490)

* [GOBBLIN-1613] Add metadata writers field to GMCE schema

* generalize dataset, platform, and table naming
* more test coverage for GMCE writer

* Reverting data.json syntax change.
- Avro doesn't follow regular json syntax

* Clean up random semi colon

* Improve naming

5 months agoUpdate README.md
Zihan Li [Thu, 7 Apr 2022 21:17:42 +0000 (14:17 -0700)] 
Update README.md

5 months ago[GOBBLIN-1629] Make GobblinMCEWriter be able to catch error when calculating hive...
Zihan Li [Tue, 29 Mar 2022 22:39:03 +0000 (15:39 -0700)] 
[GOBBLIN-1629] Make GobblinMCEWriter be able to catch error when calculating hive specs (#3489)

* address comments

* use connectionmanager when httpclient is not closeable

* [GOBBLIN-1629] Make GobblinMCEWriter be able to catch error when calculating hive specs

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
5 months agoAdd/fix some fields of MetadataWriterFailureEvent (#3485)
Jack Moseley [Tue, 29 Mar 2022 21:16:32 +0000 (14:16 -0700)] 
Add/fix some fields of MetadataWriterFailureEvent (#3485)

5 months ago[GOBBLIN-1627] provide option to convert datanodes names (#3484)
Arjun Singh Bora [Mon, 28 Mar 2022 17:43:00 +0000 (10:43 -0700)] 
[GOBBLIN-1627] provide option to convert datanodes names (#3484)

* provide option to convert datanodes names
address review comments

* address review comments

* address review comments

* address review comment

5 months agoAdd coverage for edge cases when table paths do not exist, check parents (#3482)
William Lo [Mon, 28 Mar 2022 17:42:23 +0000 (10:42 -0700)] 
Add coverage for edge cases when table paths do not exist, check parents (#3482)

5 months ago[GOBBLIN-1616] Add close connection logic in salesforceSource (#3486)
Zihan Li [Mon, 28 Mar 2022 17:39:44 +0000 (10:39 -0700)] 
[GOBBLIN-1616] Add close connection logic in salesforceSource (#3486)

* [GOBBLIN-1616] Add close connection logic in salesforceSource

* remove unused import

* address comments

* use connectionmanager when httpclient is not closeable

Co-authored-by: Zihan Li <zihli@zihli-mn2.linkedin.biz>
6 months ago[GOBBLIN-1621] Make HelixRetriggeringJobCallable emit job skip event when job is...
Zihan Li [Thu, 17 Mar 2022 21:45:24 +0000 (14:45 -0700)] 
[GOBBLIN-1621] Make HelixRetriggeringJobCallable emit job skip event when job is dropped due to previous job is running (#3478)

* [GOBBLIN-1621] Make HelixRetriggeringJobCallable emit job skip event when job is dropped due to previous job is running

* address typo

* address comments

* fix checkStyle

* address comments

6 months ago[GOBBLIN-1623] Fix NPE when try to close RestApiConnector (#3480)
Zihan Li [Wed, 16 Mar 2022 20:36:26 +0000 (13:36 -0700)] 
[GOBBLIN-1623] Fix NPE when try to close RestApiConnector (#3480)

* Fix NPE when try to close RestApiConnector

* fix typo

6 months agoClear bad mysql packages from cache in CI/CD machines (#3479)
William Lo [Tue, 15 Mar 2022 23:24:38 +0000 (16:24 -0700)] 
Clear bad mysql packages from cache in CI/CD machines (#3479)

6 months ago[GOBBLIN-1617] pass configurations to some HadoopUtils APIs (#3475)
Arjun Singh Bora [Mon, 7 Mar 2022 19:49:29 +0000 (11:49 -0800)] 
[GOBBLIN-1617] pass configurations to some HadoopUtils APIs (#3475)

* pass configurations to some HadoopUtils APIs

* address review comments

6 months ago[GOBBLIN-1616] Make RestApiConnector be able to close the connection finally (#3474)
Zihan Li [Thu, 3 Mar 2022 02:06:33 +0000 (18:06 -0800)] 
[GOBBLIN-1616] Make RestApiConnector be able to close the connection finally (#3474)

* [GOBBLIN-1616] Make RestliAPIConnector be able to close the connection finally

* address comments

6 months agoadd config to set log level for any class (#3473)
Arjun Singh Bora [Wed, 2 Mar 2022 05:37:22 +0000 (21:37 -0800)] 
add config to set log level for any class (#3473)

7 months agoFix bug where partitioned tables would always return the wrong equality in paths...
William Lo [Thu, 24 Feb 2022 22:35:51 +0000 (14:35 -0800)] 
Fix bug where partitioned tables would always return the wrong equality in paths (#3472)

7 months ago[GOBBLIN-1602] Change hive table location and partition check to validate using FS...
William Lo [Thu, 17 Feb 2022 18:42:26 +0000 (10:42 -0800)] 
[GOBBLIN-1602] Change hive table location and partition check to validate using FS r… (#3459)

* Change hive table location and partition check to validate using FS resolvePath to resolve logical paths

* Add tests for Unpartitioned file set

* Address review, add additional throw if locations mismatch for partition location validation

* Fix checkstyles again

* allow partial success policy for workunits

7 months agoDon't flush on change_property operation (#3467)
Jack Moseley [Tue, 15 Feb 2022 03:59:37 +0000 (19:59 -0800)] 
Don't flush on change_property operation (#3467)

7 months agoFix case where error GTE is incorrectly sent from MCE writer (#3466)
Jack Moseley [Sat, 12 Feb 2022 01:35:01 +0000 (17:35 -0800)] 
Fix case where error GTE is incorrectly sent from MCE writer (#3466)

7 months agopartial rollback of PR 3464 (#3465)
Arjun Singh Bora [Fri, 11 Feb 2022 16:15:13 +0000 (08:15 -0800)] 
partial rollback of PR 3464 (#3465)

7 months ago[GOBBLIN-1604] Throw exception if there are no allocated requests due to lack of...
William Lo [Thu, 10 Feb 2022 19:03:08 +0000 (11:03 -0800)] 
[GOBBLIN-1604] Throw exception if there are no allocated requests due to lack of res… (#3461)

* Throw exception if there are no allocated requests due to lack of resources

* Fix typo

7 months ago[GOBBLIN-1603] Throws error if configured when encountering an IO exception while...
William Lo [Tue, 8 Feb 2022 21:22:34 +0000 (13:22 -0800)] 
[GOBBLIN-1603] Throws error if configured when encountering an IO exception while co… (#3460)

* Throws error if configured when encountering an IO exception while collecting copy entities

* Fix checkstyle

7 months ago[GOBBLIN-1606] change DEFAULT_GOBBLIN_COPY_CHECK_FILESIZE value (#3464)
Arjun Singh Bora [Tue, 8 Feb 2022 19:11:14 +0000 (11:11 -0800)] 
[GOBBLIN-1606] change DEFAULT_GOBBLIN_COPY_CHECK_FILESIZE value (#3464)

* change DEFAULT_GOBBLIN_COPY_CHECK_FILESIZE value
do not reuse the metric

* fix unit tests

* address review comments

7 months agoUpgraded dropwizard metrics library version from 3.2.3 -> 4.1.2 and added a new wrapp...
Abhishek Nath [Mon, 7 Feb 2022 23:36:45 +0000 (15:36 -0800)] 
Upgraded dropwizard metrics library version from 3.2.3 -> 4.1.2 and added a new wrapper class on dropwizard Timer.Context class to handle the code compatibility as the newer version of this class implements AutoCloseable instead of Closeable. (#3463)

7 months ago[GOBBLIN-1605] Fix mysql ubuntu download 404 not found for Github Actions CI/CD ...
William Lo [Mon, 7 Feb 2022 18:51:21 +0000 (10:51 -0800)] 
[GOBBLIN-1605] Fix mysql ubuntu download 404 not found for Github Actions CI/CD (#3462)

* fix mysql ubuntu download 404 not found for Github Actions CI/CD

* use fix missing flag instead of updating every package

* use apt-get update

7 months ago[GOBBLIN-1601] implement ChangePermissionCommitStep (#3457)
Arjun Singh Bora [Mon, 31 Jan 2022 18:42:01 +0000 (10:42 -0800)] 
[GOBBLIN-1601] implement ChangePermissionCommitStep (#3457)

* implement ChangePermissionCommitStep
add a configuration to match permission of ancestor directories permissions in source and destination

* address review comments

8 months ago[GOBBLIN-1598]Fix metrics already exist issue in dag manager (#3454)
Zihan Li [Tue, 25 Jan 2022 02:58:27 +0000 (18:58 -0800)] 
[GOBBLIN-1598]Fix metrics already exist issue in dag manager (#3454)

* [GOBBLIN-1598]Fix metrics already exist issue in dag manager

* fix typo

* address comments

8 months ago[GOBBLIN-1597] Add error handling in dagmanager to continue if dag fails to process...
William Lo [Wed, 19 Jan 2022 19:51:25 +0000 (11:51 -0800)] 
[GOBBLIN-1597] Add error handling in dagmanager to continue if dag fails to process,… (#3452)

* Add error handling in dagmanager to continue if dag fails to process, make Azkaban client retry on timeouts

* Addressed comments

8 months agoGOBBLIN-1579 Fail job on hive existing target table location mismatch (#3433)
vgnanasekaran [Wed, 19 Jan 2022 19:46:00 +0000 (11:46 -0800)] 
GOBBLIN-1579 Fail job on hive existing target table location mismatch (#3433)

Co-authored-by: Gnanasekaran <vgnanasekaran@paypal.com>
8 months ago[GOBBLIN-1596] Ignore already exists exception if the table has already been created...
William Lo [Fri, 14 Jan 2022 01:05:48 +0000 (17:05 -0800)] 
[GOBBLIN-1596] Ignore already exists exception if the table has already been created… (#3451)

* Ignore already exists exception if the table has already been created by another thread or job entirely

* Address review + add concurrency fix

8 months ago[GOBBLIN-1595]Fix the deadlock during hive registration (#3450)
Zihan Li [Thu, 13 Jan 2022 19:42:02 +0000 (11:42 -0800)] 
[GOBBLIN-1595]Fix the deadlock during hive registration (#3450)