samza.git
2 days agoMinor fixes to KeyValueStore and RocksDBKeyValueStore master
Prateek Maheshwari [Thu, 19 Oct 2017 18:36:02 +0000 (11:36 -0700)] 
Minor fixes to KeyValueStore and RocksDBKeyValueStore

1. Replaced extension class in KeyValueStore with default methods.
2. Fixed formatting in RocksDBKeyValueStore#openDB.
3. Now logs original RocksDBException on errors opening the db. Other minor log message cleanup.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #332 from prateekm/store-fixes

2 days agoSAMZA-1466: Flaky test: TestRocksDbKeyValueStore suite
Prateek Maheshwari [Thu, 19 Oct 2017 18:01:56 +0000 (11:01 -0700)] 
SAMZA-1466: Flaky test: TestRocksDbKeyValueStore suite

TestRocksDbKeyValueStore tests are failing intermittently with errors like:
```
testFlush FAILED
    org.apache.samza.SamzaException: Error opening RocksDB store dbStore at location /tmp, received the following exception from RocksDB org.rocksdb.RocksDBException: "
        at org.apache.samza.storage.kv.RocksDbKeyValueStore$.openDB(RocksDbKeyValueStore.scala:99)
        at org.apache.samza.storage.kv.TestRocksDbKeyValueStore.testFlush(TestRocksDbKeyValueStore.scala:73)
```

These happen intermittently because:
1. RocksDB throws an exception if open is called on a store that's already open.
2. A new unit test was added in PR #327 that didn't close the store.
3. Other tests in this class use the same store directory (System.getProperty("java.io.tmpdir")) and run concurrently. Any test that runs after the one in 2 above fails.

Even when we log the RocksDBException (fixed in pr #332), the exception message is malformed due to a bug in RocksDB: https://github.com/facebook/rocksdb/issues/1688. This is fixed in the latest RocksDB version (verified with 5.8.0), so the messages should be more meaningful after an upgrade. Fixed stack trace will say something like:
```
Caused by: org.rocksdb.RocksDBException: While lock file: /var/folders/1b/4nqqvf4s27sby0frjr0q5t_h0004hp/T/LOCK: No locks available
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:231)
at org.apache.samza.storage.kv.RocksDbKeyValueStore$.openDB(RocksDbKeyValueStore.scala:70)
```

Fixing just the test in the mean time.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #333 from prateekm/store-test-fix

3 days agoSAMZA-1465: Performance regression for KafkaCheckpointManager
Jacob Maes [Wed, 18 Oct 2017 22:30:34 +0000 (15:30 -0700)] 
SAMZA-1465: Performance regression for KafkaCheckpointManager

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>, Boris Shkolnik <boryas@apache.org>, Fred Ji <fredji97@yahoo.com>

Closes #331 from jmakes/master

5 days agoSAMZA-1461; Expose RocksDB properties as metrics
Janek Lasocki-Biczysko [Mon, 16 Oct 2017 23:06:28 +0000 (16:06 -0700)] 
SAMZA-1461; Expose RocksDB properties as metrics

Automatically build gauges for RocksDB properties via configuration:
`stores.<storename>.rocksdb.telemetry.list=<rocksDbProperty1>, <rocksDbProperty1>`

Author: Janek Lasocki-Biczysko <janek.lasocki-biczysko@skyscanner.net>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #327 from janeklb/jlb_rocksDBPropertiesAsMetrics

5 days agoSAMZA-1463 disable some flaky tests on hdfs system
Hai Lu [Mon, 16 Oct 2017 18:26:45 +0000 (11:26 -0700)] 
SAMZA-1463 disable some flaky tests on hdfs system

disable some flaky tests on hdfs system until future investigation

Author: Hai Lu <halu@linkedin.com>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #329 from lhaiesp/master

8 days agoChangeLog should not require KafkaSystemAdmin
Boris S [Fri, 13 Oct 2017 22:04:22 +0000 (15:04 -0700)] 
ChangeLog should not require KafkaSystemAdmin

Author: Boris S <boryas@apache.org>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #325 from sborya/ChangeLogRequireKafkaSystemAdmin

8 days agoSAMZA-1424 SAMZA-1425 SAMZA-1426; Support serdes and persistent state for windows
Jagadish [Fri, 13 Oct 2017 20:22:56 +0000 (13:22 -0700)] 
SAMZA-1424 SAMZA-1425 SAMZA-1426; Support serdes and persistent state for windows

Notable changes:
* Made windows durable with support for persistent recoverable stores
* New storage format to support multiple messages in windows (the previous storage format
of storing the entire message-list as a value in the store incurs significant serde overhead)
* Wire `TimeSeriesStore` with the WindowOperator implementation.

Testing:
* Existing unit tests and integration tests pass with serdes wired-up
* Will follow-up with a PR for hello-samza soon.

Note: Majority of changes are in `samza-core/src/main/java/org/apache/samza/operators/impl/WindowOperatorImpl.java` and github collapsed the diff.

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari<pmaheshw@linkedin.com>

Closes #321 from vjagadish1989/window-operator-serde

10 days agoSAMZA-1447; Swapping out CLI JobStatusProvider for REST based implementation in samza...
Abhishek Shivanna [Wed, 11 Oct 2017 20:57:20 +0000 (13:57 -0700)] 
SAMZA-1447; Swapping out CLI JobStatusProvider for REST based implementation in samza-rest

Removing the YarnCliJobStatusProvider since forking a new shell for every request on the JobsResource endpoint is resource intensive.

Author: Abhishek Shivanna <ashivanna@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #317 from abhishekshivanna/samza-rest-oom

10 days agoSAMZA-1451: Disable integration tests conditionally in build.
Shanthoosh Venkataraman [Wed, 11 Oct 2017 20:56:24 +0000 (13:56 -0700)] 
SAMZA-1451: Disable integration tests conditionally in build.

Remove runIntegrationTests gradle property added as a part of SAMZA-1355 and introduce skipIntegrationTests property(which inverts it).
If skipIntegrationTests gradle project property is enabled, execution of all tests in samza-test project will be skipped from the build.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #319 from shanthoosh/skip_integration_tests

10 days agoSAMZA-1409; add missing parts and make guideline clearer for docs/README.md
Fred Ji [Wed, 11 Oct 2017 19:09:33 +0000 (12:09 -0700)] 
SAMZA-1409; add missing parts and make guideline clearer for docs/README.md

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #288 from fredji97/master_docs_readme

10 days agoSAMZA-1453; Update README with Travis-CI badge.
Daniel Nishimura [Wed, 11 Oct 2017 19:03:28 +0000 (12:03 -0700)] 
SAMZA-1453; Update README with Travis-CI badge.

Added Travis-CI to README.md file.

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #323 from dnishimura/samza-1453-readme-travisci

10 days agoSAMZA-1452: Clean up interrupted thread bugs
Nacho Solis [Wed, 11 Oct 2017 14:49:52 +0000 (07:49 -0700)] 
SAMZA-1452: Clean up interrupted thread bugs

Call Thread.currentThread().interrupt(); when capturing InterruptedException

Author: Nacho Solis <nsolis@linkedin.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>,Jagadish <jvenkatr@linkedin.com>

Closes #322 from isolis/cleancodebugs

12 days agoSAMZA-1292: Merge operator can be no-op when there are no streams to merge
Prateek Maheshwari [Mon, 9 Oct 2017 21:48:05 +0000 (14:48 -0700)] 
SAMZA-1292: Merge operator can be no-op when there are no streams to merge

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #320 from prateekm/small-fixes

2 weeks agoSAMZA-272, SAMZA-1440, SAMZA-1269: Fixed thread interrupt tests in TestExponentialSle...
Prateek Maheshwari [Fri, 6 Oct 2017 23:32:06 +0000 (16:32 -0700)] 
SAMZA-272, SAMZA-1440, SAMZA-1269: Fixed thread interrupt tests in TestExponentialSleepStrategy.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #318 from prateekm/ess-test-fix

2 weeks agoSAMZA-1435: Changed samza-api Serde implementations from Scala to Java.
Prateek Maheshwari [Fri, 6 Oct 2017 17:47:37 +0000 (10:47 -0700)] 
SAMZA-1435: Changed samza-api Serde implementations from Scala to Java.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #316 from prateekm/java-serdes

2 weeks agoSAMZA-1442: use systemAdmin.validateStream in KafkaCheckpointManager
Boris S [Thu, 5 Oct 2017 22:36:37 +0000 (15:36 -0700)] 
SAMZA-1442: use systemAdmin.validateStream in KafkaCheckpointManager

Author: Boris S <boryas@apache.org>

Reviewers: Jacob Maes <jacob.maes@gmail.com>

Closes #314 from sborya/CheckpointTopicValidation

2 weeks agoSAMZA-1444: Removed circular dependency b/w samza-core test and samza-kv-rocksdb...
Prateek Maheshwari [Thu, 5 Oct 2017 17:59:39 +0000 (10:59 -0700)] 
SAMZA-1444: Removed circular dependency b/w samza-core test and samza-kv-rocksdb test

Also added an implementation for KVSerde.
vjagadish1989, thanks for the TestInMemoryStore implementation!

The only usage of InternalInMemoryStore is now in WindowOperatorImpl, which will be removed as part of store wiring for the window operator.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>, Xinyu Liu <xiliu@linkedin.com>

Closes #315 from prateekm/test-store

2 weeks agoSAMZA-1429: Add callback success/failure metrics to async tasks
Daniel Nishimura [Thu, 5 Oct 2017 16:18:37 +0000 (09:18 -0700)] 
SAMZA-1429: Add callback success/failure metrics to async tasks

jmakes & nickpan47 please take a look when you get the chance.

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #306 from dnishimura/samza-1429-async-metrics

2 weeks agoSAMZA-1056: Added wiring for High Level API state stores, their serdes and changelogs.
Prateek Maheshwari [Wed, 4 Oct 2017 22:37:50 +0000 (15:37 -0700)] 
SAMZA-1056: Added wiring for High Level API state stores, their serdes and changelogs.

Provided join operator access to durable state stores.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>

Closes #309 from prateekm/operator-store-wiring

2 weeks agoSAMZA-1109: Updated High Level API serde impl with Yi's feedback
Prateek Maheshwari [Wed, 4 Oct 2017 22:28:37 +0000 (15:28 -0700)] 
SAMZA-1109: Updated High Level API serde impl with Yi's feedback

nickpan47 for review.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: "Jagadish Venkatraman <jvenkatr@linkedin.com>"

Closes #310 from prateekm/serde-updates

2 weeks agoSAMZA-1439: Address late review feedback from SAMZA-1434
Jacob Maes [Wed, 4 Oct 2017 21:02:26 +0000 (14:02 -0700)] 
SAMZA-1439: Address late review feedback from SAMZA-1434

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #313 from jmakes/samza-1439

2 weeks agoSAMZA-1427; use systemFactory in checkpoint manager.
Boris S [Wed, 4 Oct 2017 17:48:58 +0000 (10:48 -0700)] 
SAMZA-1427; use systemFactory in checkpoint manager.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>
Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>, Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #299 from sborya/LiKafkaClient

2 weeks agoSAMZA-1431: Enable SonarCloud integration with Travis-CI.
Daniel Nishimura [Wed, 4 Oct 2017 17:39:42 +0000 (10:39 -0700)] 
SAMZA-1431: Enable SonarCloud integration with Travis-CI.

These are modifications needed to integrate Travis-CI with SonarCloud. Also, the JaCoCo plugin is enabled in gradle for code coverage reporting to SonarCloud.

For reference, here is the request to Apache Infra to enable Travis-CI: https://issues.apache.org/jira/browse/INFRA-15132

nickpan47 isolis

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #305 from dnishimura/samza-1431-sonarcloud

2 weeks agoAdded Prateek's code signing public key to KEYS
Prateek Maheshwari [Wed, 4 Oct 2017 01:00:04 +0000 (18:00 -0700)] 
Added Prateek's code signing public key to KEYS

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #312 from prateekm/commiters-update

2 weeks agoAdded self to committers list.
Prateek Maheshwari [Tue, 3 Oct 2017 23:06:33 +0000 (16:06 -0700)] 
Added self to committers list.

vjagadish1989

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #311 from prateekm/commiters-update

2 weeks agoMerge branch '0.14.0'
Xinyu Liu [Tue, 3 Oct 2017 22:10:13 +0000 (15:10 -0700)] 
Merge branch '0.14.0'

2 weeks agoMerge branch 'master' into 0.14.0
Xinyu Liu [Tue, 3 Oct 2017 22:09:41 +0000 (15:09 -0700)] 
Merge branch 'master' into 0.14.0

2 weeks agoadded Boris as a committer
Boris Shkolnik [Tue, 3 Oct 2017 07:01:16 +0000 (00:01 -0700)] 
added Boris as a committer

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #304 from sborya/addCommitter

2 weeks agoSAMZA-1423; Follow-up to PR:303 - Remove version constant from TimeSeriesKey
Jagadish [Mon, 2 Oct 2017 19:54:22 +0000 (12:54 -0700)] 
SAMZA-1423; Follow-up to PR:303 -  Remove version constant from TimeSeriesKey

2 weeks agoSAMZA-1423; Implement time series storage for joins and windows
Jagadish [Mon, 2 Oct 2017 19:23:24 +0000 (12:23 -0700)] 
SAMZA-1423; Implement time series storage for joins and windows

Notable changes:
* New interface for storing and retrieving time series data.
* New store and serde implementation for use in windows and joins

Pending:
* Documentation, and minor clean-ups
* Wire-up of stores from ExecutionPlanner
* Usage of the store to implement various windows and joins

Author: Jagadish <jagadish@apache.org>
Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>,Xinyu Liu<xiliu@linkedin.com>

Closes #303 from vjagadish1989/window-store

3 weeks agoSAMZA-1434: Fix issues found in Hadoop 0.14.0
Xinyu Liu [Fri, 29 Sep 2017 22:05:55 +0000 (15:05 -0700)] 
SAMZA-1434: Fix issues found in Hadoop

Fix the following bugs found when running Samza on hadoop:

1. Hdfs allows output partitions to be 0 (empty folder)
2. Add null check for the changelog topic generation
3. Call getStreamSpec() instead of using streamSpec member in StreamEdge. This is due to getStreamSpec will do more transformation.
4. Bound the auto-generated intermediate topic partition by a certain count (256).

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>

Closes #307 from xinyuiscool/SAMZA-1434

3 weeks agoSAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl
Xinyu Liu [Fri, 29 Sep 2017 00:11:28 +0000 (17:11 -0700)] 
SAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl

This patch contains the following changes:
1. Refactor watermark and end-of-stream logic. The aggregation/handling has been moved from WatermarkManager/EndOfStreamManager to be inline inside OperatorImpl. This is for keeping the logic in one place.
2. Now subclass of OperatorImpl will override handleWatermark() to do its specific handling, such as fire trigger.
3. Add emitWatermark() in OperatorImpl so subclass can call it to emit watermark upon receiving a message or watermark.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>
Author: Xinyu Liu <xiliu@xiliu-ld.linkedin.biz>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #277 from xinyuiscool/SAMZA-1386

3 weeks agoSAMZA-1109; Allow setting Serdes for fluent API operators in code
Prateek Maheshwari [Thu, 28 Sep 2017 00:08:04 +0000 (17:08 -0700)] 
SAMZA-1109; Allow setting Serdes for fluent API operators in code

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>, Jacob Maes <jmakes@apache.org>

Closes #293 from prateekm/serde-instance

3 weeks agoSAMZA-1417: Clear and recreate intermediate and metadata streams for batch processing
Xinyu Liu [Mon, 25 Sep 2017 21:52:42 +0000 (14:52 -0700)] 
SAMZA-1417: Clear and recreate intermediate and metadata streams for batch processing

For each run of a batch application, we need to clear the internal streams from the previous run and recreate new ones. This patch introduces the following:
1) bounded flag in StreamSpec
2) app.mode (BATCH/STREAM) in the application config. An application is a batch app iff all the input streams are bounded.
3) app.runId and use it to generate the internal topics for each run.

run.id generation is not addressed in this pr. There will be another patch to resolve it for both yarn and standalone. For now, this patch only works for yarn.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jmakes@apache.org>

Closes #297 from xinyuiscool/SAMZA-1417

4 weeks agoSAMZA-1428: Fix scala 2.10 compilation issue with java 8 interface st…
Jacob Maes [Thu, 21 Sep 2017 18:57:31 +0000 (11:57 -0700)] 
SAMZA-1428: Fix scala 2.10 compilation issue with java 8 interface st…

…atic methods

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>

Closes #300 from jmakes/samza-1428

4 weeks agoSAMZA-1392: KafkaSystemProducer performance and correctness with conc…
Jacob Maes [Wed, 20 Sep 2017 19:40:47 +0000 (12:40 -0700)] 
SAMZA-1392: KafkaSystemProducer performance and correctness with conc…

…urrent sends and flushes

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>, Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Closes #272 from jmakes/samza-1392

4 weeks agoSAMZA-1414: SamzaContainer log statements incorrectly expect containerId is a number
Daniel Nishimura [Tue, 19 Sep 2017 21:43:55 +0000 (14:43 -0700)] 
SAMZA-1414: SamzaContainer log statements incorrectly expect containerId is a number

Correctly log ID as a string in SamzaContainer.
jmakes

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #298 from dnishimura/samza-1414

4 weeks agoSAMZA-1416: Better logging around the exception where class loading failed in initial...
Daniel Nishimura [Tue, 19 Sep 2017 21:26:38 +0000 (14:26 -0700)] 
SAMZA-1416: Better logging around the exception where class loading failed in initializing the SystemFactory for a input/output system

Also added test coverage for the Util.getObj method.
nickpan47 jmakes

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #296 from dnishimura/samza-1416

4 weeks agoSAMZA-1419; Refactor StreamProcessor tests to remove Thread.sleep()
Bharath Kumarasubramanian [Mon, 18 Sep 2017 17:32:32 +0000 (10:32 -0700)] 
SAMZA-1419; Refactor StreamProcessor tests to remove Thread.sleep()

Refactoring the tests to enable parallel execution. Also, removed unnecessary sleeping during execution. This PR will serve as a prototype to get some feedback on this testing pattern which eliminates state sharing between the tests and opens up the avenue for parallel execution.

This change brings down the test time for these 4 tests to **14 sec** compared to **44 secs**.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #291 from bharathkk/master

5 weeks agoSAMZA-1406: Fix potential orphaned containers problem in stand alone
Shanthoosh Venkataraman [Fri, 15 Sep 2017 04:17:36 +0000 (21:17 -0700)] 
SAMZA-1406: Fix potential orphaned containers problem in stand alone

5 weeks agoFix some integration tests after merging from master
Xinyu Liu [Tue, 12 Sep 2017 21:16:27 +0000 (14:16 -0700)] 
Fix some integration tests after merging from master

5 weeks agoMerge branch 'master' into 0.14.0
Xinyu Liu [Tue, 12 Sep 2017 18:32:36 +0000 (11:32 -0700)] 
Merge branch 'master' into 0.14.0

6 weeks agoSAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs
Xinyu Liu [Thu, 7 Sep 2017 23:49:20 +0000 (16:49 -0700)] 
SAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs

The patch does the following:

1) add clearStream() APi in SystemAdmin. Currently it's only supported in Kafka with broker configuring delete.topic.enable=true.

2) remove the deprecated APIs including createChangeLogStream(), validateChangelogStream() and createCoordinatorStream().

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jacob.maes@gmail.com>

Closes #292 from xinyuiscool/SAMZA-1415

7 weeks agoSAMZA-1413: Config for CoordinationUtilsFactory class name.
Boris Shkolnik [Wed, 30 Aug 2017 23:56:28 +0000 (16:56 -0700)] 
SAMZA-1413: Config for CoordinationUtilsFactory class name.

Author: Boris Shkolnik <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@bshkolni-ld1.linkedin.biz>

Reviewers: Navina Ramesh <navina@apache.org>, Fred Ji <fji@linkedin.com>

Closes #290 from sborya/configUtilsFactoryClassName

7 weeks agoSAMZA-1385: Coordination utils factory with distributed lock
Boris Shkolnik [Tue, 29 Aug 2017 22:37:17 +0000 (15:37 -0700)] 
SAMZA-1385: Coordination utils factory with distributed lock

this PR includes some changes from another PR. I will re-merge it again, after the other PR is in.

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #284 from sborya/CoordinationUtilsFactory_withDistributedLock

7 weeks agoIntroduces CoordinationUtilsFactory to create different implementations of Coordinati...
Boris Shkolnik [Tue, 29 Aug 2017 20:23:47 +0000 (13:23 -0700)] 
Introduces CoordinationUtilsFactory to create different implementations of CoordinationUtils

Some refactoring and cleanup.

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #282 from sborya/CoordinationUtilsFactory

7 weeks agoSAMZA-1410 make tests pass in different Junit versions for testPlanIdWithShuffledStre...
Fred Ji [Mon, 28 Aug 2017 21:17:32 +0000 (14:17 -0700)] 
SAMZA-1410 make tests pass in different Junit versions for testPlanIdWithShuffledStreamSpecs and testGeneratePlanIdWithDifferentStreamSpec

More details are in https://issues.apache.org/jira/browse/SAMZA-1410.

gradlew build and test passed.

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #289 from fredji97/assertNotEquals

7 weeks agoSAMZA-1404: Add warning in case of potential process starvation due to longer window...
Bharath Kumarasubramanian [Mon, 28 Aug 2017 19:27:47 +0000 (12:27 -0700)] 
SAMZA-1404: Add warning in case of potential process starvation due to longer window method

Currently, we use the average windowNs as the lower bound for window trigger time to determine if user needs to warned. We could potentially make this complicated by also including average commit ns and some delta to be more accurate.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #281 from bharathkk/master

8 weeks agoSAMZA-1408 update master doc to use 0.14.0-SNAPSHOT after 0.13.1 release
Fred Ji [Sat, 26 Aug 2017 00:21:21 +0000 (17:21 -0700)] 
SAMZA-1408 update master doc to use 0.14.0-SNAPSHOT after 0.13.1 release

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #285 from fredji97/master_doc and squashes the following commits:

58cc775 [Fred Ji] SAMZA-1408 keep the doc unchanged for minor release in master and update PR based on feedback
fc41efe [Fred Ji] SAMZA-1408 update master doc to use 0.14.0-SNAPSHOT after 0.13.1 release

8 weeks agoFix broken link for KeyValueStore in Samza's feature preview web-page
Jagadish [Thu, 24 Aug 2017 04:47:50 +0000 (21:47 -0700)] 
Fix broken link for KeyValueStore in Samza's feature preview web-page

8 weeks agoSAMZA-1382: added Zk communication protocol version verification
Boris Shkolnik [Wed, 23 Aug 2017 00:55:54 +0000 (17:55 -0700)] 
SAMZA-1382: added Zk communication protocol version verification

Author: Boris Shkolnik <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #255 from sborya/ZkCommunicationVersion

2 months agoSAMZA-1375: Implement Lock for Azure using Lease Blob
Pawas Chhokra [Sat, 19 Aug 2017 00:49:52 +0000 (17:49 -0700)] 
SAMZA-1375: Implement Lock for Azure using Lease Blob

Author: PawasChhokra <pawas2703@gmail.com>
Author: PawasChhokra <Jaimatadi1$>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #280 from PawasChhokra/AzureDistributedLock

2 months agoSAMZA-1378: Introduce and Implement Scheduler Interface for Polling in Azure
Pawas Chhokra [Fri, 18 Aug 2017 21:38:46 +0000 (14:38 -0700)] 
SAMZA-1378: Introduce and Implement Scheduler Interface for Polling in Azure

PR 1: AzureClient + AzureConfig
PR 2: LeaseBlobManager
PR 3: BlobUtils + JobModelBundle
PR 4: TableUtils + ProcessorEntity
PR 5: AzureLeaderElector
PR 6: Added all schedulers (current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #261 from PawasChhokra/AzureSchedulers

2 months agoSAMZA-1401: update NOTICE for 0.13.1 release with RocksDB BSD+patents license
Fred Ji [Fri, 18 Aug 2017 00:54:40 +0000 (17:54 -0700)] 
SAMZA-1401: update NOTICE for 0.13.1 release with RocksDB BSD+patents license

Please see details in SAMZA-1401.

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #279 from fredji97/rocksdb

2 months agoSAMZA-1400: disabling flaky test testTwoStreamProcessors in TestZkStreamProcessorSession
Fred Ji [Thu, 17 Aug 2017 18:16:10 +0000 (11:16 -0700)] 
SAMZA-1400: disabling flaky test testTwoStreamProcessors in TestZkStreamProcessorSession

In this SAMZA-1400, We are disabling this flaky one in master and will cherry pick for 0.13.1. We have created a ticket SAMZA-1399 for fixing it in later build.

navina sborya Please take a look.

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #278 from fredji97/SAMZA1400

2 months agoSAMZA-1355 : Enable standalone integration tests conditionally in build.
Shanthoosh Venkataraman [Wed, 16 Aug 2017 23:25:49 +0000 (16:25 -0700)] 
SAMZA-1355 : Enable standalone integration tests conditionally in build.

It's run only when includeSamzaTest gradle project property is set.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #237 from shanthoosh/disable_samza_test_build_with_a_flag

2 months agoSAMZA-1397; log.debug loop to run only if debug is enabled
Boris Shkolnik [Wed, 16 Aug 2017 22:08:51 +0000 (15:08 -0700)] 
SAMZA-1397; log.debug loop to run only if debug is enabled

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Jagadish <jagadish@apache.org?

Closes #275 from sborya/LocalityLogDebug

2 months agoSAMZA-1396 TestZkLocalApplicationRunner tests fails after SAMZA-1385
Navina Ramesh [Wed, 16 Aug 2017 21:42:18 +0000 (14:42 -0700)] 
SAMZA-1396 TestZkLocalApplicationRunner tests fails after SAMZA-1385

* Fixes ZkPath issues
* Fixes appname / jobname mismatch

Author: Navina Ramesh <navina@apache.org>

Reviewers: Xinyu Liu <xinyu@apache.org>, Bharath Kumarasubramanian <codin.martial@gmail.com>

Closes #274 from navina/SAMZA-1396

2 months agoSAMZA-1394 add Fred Ji's PGP key for release purpose
Fred Ji [Wed, 16 Aug 2017 16:53:13 +0000 (09:53 -0700)] 
SAMZA-1394 add Fred Ji's PGP key for release purpose

Author: Fred Ji <fji@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #273 from fredji97/master

2 months agoSAMZA-1385: Fix zookeeper path conflict between LocalApplicationRunner and ZkJobCoord...
Bharath Kumarasubramanian [Tue, 15 Aug 2017 05:02:24 +0000 (22:02 -0700)] 
SAMZA-1385: Fix zookeeper path conflict between LocalApplicationRunner and ZkJobCoordinator

Tested the fix w/ sample page view adclick joiner job.
navina sborya nickpan47 can you please take a look at the RB?

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>, Yi Pan <nickpan47@gmail.com>, Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #265 from bharathkk/master

2 months agoSAMZA-1390; Update SamzaMonitorService to spawn deamon threads
Shanthoosh Venkataraman [Tue, 15 Aug 2017 00:01:09 +0000 (17:01 -0700)] 
SAMZA-1390; Update SamzaMonitorService to spawn deamon threads

Observed in LinkedIn production setup that samza-rest jvm process doesn’t stop after main
thread death(due to jetty server failures) since non-deamon threads spawned for
`SamzaMonitorService` are alive.

This affects samza-rest jvm process lifecycle management. To fix this, plugging
in ThreadFactory which sets Thread name format & marks them as daemon.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #270 from shanthoosh/make_samza_rest_non_daemon

2 months agoSAMZA-1388: Flaky test - TestStatefulTask#testShouldStartAndRestore
Jacob Maes [Fri, 11 Aug 2017 22:50:12 +0000 (15:50 -0700)] 
SAMZA-1388: Flaky test - TestStatefulTask#testShouldStartAndRestore

I believe the problem originated from SAMZA-173.

The core issue is testShouldRestoreStore was not updated to expect 6 messages after 2 more messages were added to testShouldStartTaskForFirstTime.

Fixed the issue and refactored the code so the 2 methods wouldn't disagree again in the future.

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #269 from jmakes/samza-1388

2 months agoSAMZA-1389: Fix ZkProcessorLatch await(timeout, TimeUnit) api.
Shanthoosh Venkataraman [Fri, 11 Aug 2017 18:51:32 +0000 (11:51 -0700)] 
SAMZA-1389: Fix ZkProcessorLatch await(timeout, TimeUnit) api.

Use passed in timeUnit value for zkClient.waitUnitExists method rather than hardcoding with  `TimeUnit.MILLISECONDS`.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>,Fred Ji <fredji97@yahoo.com>,Jagadish <jvenkatr@linkedin.com>

Closes #268 from shanthoosh/fix_zklatch_impl

2 months agoSAMZA-1387: Unable to Start Samza App Because Regex Check
Jacob Maes [Fri, 11 Aug 2017 16:28:20 +0000 (09:28 -0700)] 
SAMZA-1387: Unable to Start Samza App Because Regex Check

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Fred Ji <fredji97@yahoo.com>

Closes #266 from jmakes/samza-1387

2 months agoSAMZA-1384: Race condition with async commit affects checkpoint correctness
Jacob Maes [Thu, 10 Aug 2017 22:44:42 +0000 (15:44 -0700)] 
SAMZA-1384: Race condition with async commit affects checkpoint correctness

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #263 from jmakes/samza-1384

2 months agoMerge branch 'master' into 0.14.0
Yi Pan (Data Infrastructure) [Wed, 9 Aug 2017 17:39:55 +0000 (10:39 -0700)] 
Merge branch 'master' into 0.14.0

Conflicts:
samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java

2 months agoSAMZA-1374: Implement Leader Election using Lease Blob in Azure
Pawas Chhokra [Wed, 9 Aug 2017 01:26:50 +0000 (18:26 -0700)] 
SAMZA-1374: Implement Leader Election using Lease Blob in Azure

PR 1: AzureClient + AzureConfig
PR 2: LeaseBlobManager
PR 3: BlobUtils + JobModelBundle
PR 4: TableUtils + ProcessorEntity
**PR 5: AzureLeaderElector** (current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>, Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #259 from PawasChhokra/LeaderElection

2 months agoSAMZA-1381: Create Utility Class for interacting with Azure Table Storage
Pawas Chhokra [Tue, 8 Aug 2017 00:17:33 +0000 (17:17 -0700)] 
SAMZA-1381: Create Utility Class for interacting with Azure Table Storage

PR 1: AzureClient + AzureConfig
PR 2: LeaseBlobManager
PR 3: BlobUtils + JobModelBundle
**PR 4: TableUtils + ProcessorEntity** (Current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #258 from PawasChhokra/TableUtils

2 months agoSAMZA-1380: Create Utility Class for interacting with Azure Blob Storage
Pawas Chhokra [Mon, 7 Aug 2017 23:28:05 +0000 (16:28 -0700)] 
SAMZA-1380: Create Utility Class for interacting with Azure Blob Storage

PR 1: AzureClient + AzureConfig
PR 2: LeaseBlobManager
**PR 3: BlobUtils + JobModelBundle** (current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #257 from PawasChhokra/BlobUtils

2 months agoSAMZA-1376: Create a leasing utility class for blobs in Azure
Pawas Chhokra [Mon, 7 Aug 2017 21:15:23 +0000 (14:15 -0700)] 
SAMZA-1376: Create a leasing utility class for blobs in Azure

navina
PR 1: AzureClient + AzureConfig
**PR 2: LeaseBlobManager** (current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>, Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #256 from PawasChhokra/LeaseUtils

2 months agoReenable LocalZkApplicationRunner tests.
Shanthoosh Venkataraman [Fri, 4 Aug 2017 18:49:29 +0000 (11:49 -0700)] 
Reenable LocalZkApplicationRunner tests.

Add back commented out ZkLocalApplicationRunner tests(Was dependent upon error propagation from SamzaContainer to LocalApplicationRunner).

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #250 from shanthoosh/fix_broken_tests

2 months agoSamza-1379: Create Azure Client
Pawas Chhokra [Fri, 4 Aug 2017 01:58:10 +0000 (18:58 -0700)] 
Samza-1379: Create Azure Client

navina
**PR 1: AzureClient + AzureConfig** (current PR)

Author: PawasChhokra <Jaimatadi1$>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #254 from PawasChhokra/AzureStorageClient

2 months agoFix flaky, slow integration tests in TestZkStreamProcessor and TestZkStreamProcessorS...
Shanthoosh Venkataraman [Thu, 3 Aug 2017 22:48:23 +0000 (15:48 -0700)] 
Fix flaky, slow integration tests in TestZkStreamProcessor and TestZkStreamProcessorSession

Fix flaky and slow integration tests in TestZkStreamProcessor and TestZkStreamProcessorSession
Reason for failures:

There’re three configurable wait times in rebalancing phase in samza standalone before consensus is acheived and processing resumes with updated jobModel.

* debounceTime (Specified by `job.debounce.time.ms`. Upon processor change, leader waits for this interval before generating jobModel expecting stabilization in processors group(new arrival, deletion etc)).
* taskShutdownMs (Specified by `task.shutdown.ms`. Wait time for SamzaContainer shutdown in StreamProcessor).
* barrierWaitTimeOutMs (Specified by `job.coordinator.zk.consensus.timeout.ms`. Wait time for all processors in the group to join the barrier after creation).

Above wait times affects rebalancing phase duration. All these wait time have defaults in order of 40-60 seconds and not set to low values.

Flaky tests expects processors to come back up after rebalancing phase and drain message sources(Accomplished by checking a latch.count. RemoteApplicationRunner integration tests does exact same thing by checking if kafka input queue is drained directly with similar logic).

In worst case rebalancing phases can last upto 3-4 minutes(Making these tests sometime take 10 minutes at worst case).

Change:

Set all the above timeouts to 2 seconds(Sufficient for tests and verified by local build).

Benefits:

* Faster build time(Average runtime of these individual tests were reduced from 1m56s to 14s)
* More predicability in assertions(Didn’t fail even once in 30-40 attempts locally).

NOTE: If this doesn’t fix TestZkStreamProcessor and TestZkStreamProcessorSession,
longer term fix should be to use message markers in input source and
shutdown taskCoordinator upon receiving them from TaskImpl(Or use
bounded collection based pluggable InMemorySystemConsumer/InMemorySystemProducer).

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Bharath Kumarasubramanian <codin.martial@gmail.com>, Navina Ramesh <navina@apache.org>

Closes #260 from shanthoosh/FIX_ZK_PROCESSOR_FLAKY_TESTS

2 months agoSAMZA-1365: Calling zkClient.close from zkWatch impl blocks indefinitely.
Shanthoosh Venkataraman [Thu, 3 Aug 2017 21:32:09 +0000 (14:32 -0700)] 
SAMZA-1365: Calling zkClient.close from zkWatch impl blocks indefinitely.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #253 from shanthoosh/SAMZA-1365

2 months agoSAMZA-1366: ScriptRunner should allow callers to control the child pr…
Jacob Maes [Wed, 2 Aug 2017 16:22:51 +0000 (09:22 -0700)] 
SAMZA-1366: ScriptRunner should allow callers to control the child pr…

…ocess environment.

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Jagadish <jvenkatr@linkedin.com>, Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #245 from jmakes/script-runner-improvements

2 months agoSAMZA-1361; OperatorImplGraph is using wrong keys to store/retrieve OperatorImpl...
Prateek Maheshwari [Thu, 27 Jul 2017 22:13:33 +0000 (15:13 -0700)] 
SAMZA-1361; OperatorImplGraph is using wrong keys to store/retrieve OperatorImpl in the map

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish <jagadish@apachce.org>

Closes #248 from prateekm/operatorimpl-key and squashes the following commits:

e733e9d3 [Prateek Maheshwari] Dummy commit to trigger jenkins build.
5a16f162 [Prateek Maheshwari] Updated with Yi's feedback.
8eb2c5df [Prateek Maheshwari] SAMZA-1361: OperatorImplGraph is using wrong keys to store/retrieve OperatorImpl in the map

2 months agoSAMZA-1370; Memory leak in CachedStore when using ByteBufferSerde as key Serde
Prateek Maheshwari [Thu, 27 Jul 2017 22:10:43 +0000 (15:10 -0700)] 
SAMZA-1370; Memory leak in CachedStore when using ByteBufferSerde as key Serde

ByteBufferSerde uses relative bulk get to serialize the provided ByteBuffer which changes its internal position. ByteBuffer's `equals` and `hashCode` depend upon its remaining elements, i.e. on its position. This means that when using ByteBuffers as keys in the CachedStore, flushing cache contents to the underlying store changes their hashCode. Since the hashCode for the key no longer matches the one used when inserting it into the map, the LinkedHashMap cannot correctly evict or remove these entries, leading to a memory leak.

Changing ByteBufferSerde to duplicate the provided ByteBuffer before copying should fix this issue. Prefer this over using absolute gets since there's no bulk absolute get API for ByteBuffer.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #251 from prateekm/bytebufferserde

2 months agoSamza-1364: Handle ZKExceptions in zkCoordinationUtils.reset.
Shanthoosh Venkataraman [Wed, 26 Jul 2017 00:29:08 +0000 (17:29 -0700)] 
Samza-1364: Handle ZKExceptions in zkCoordinationUtils.reset.

In some cases LocalAppRunner.waitForFinish indefinitely blocks after LocalApplicationRunner.kill. Last step in LocalAppRunner.kill(streamApp) is zkClient.close()[zkClient belongs to ZkCoordinationService].

ApplicationRunner.kill triggers listeners chain and in final listener zkClient.close throws ZkInterruptedException(RuntimeException) & it's swallowed in listeners preventing shutdownLatch update in LocalApplicationRunner(required for proper shutdown).

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>, Navina Ramesh <navina@apache.org>

Closes #246 from shanthoosh/SAMZA-1364

2 months agoFix build failures after master merge
Shanthoosh Venkataraman [Wed, 26 Jul 2017 00:04:53 +0000 (17:04 -0700)] 
Fix build failures after master merge

Changes
* Fix checkstyle errors from #243
* Fix failure after bad merge in #244

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #252 from shanthoosh/fix_NPE_after_master_merge

2 months agoSAMZA-1282: Spinning up more containers than number of tasks.
Shanthoosh Venkataraman [Tue, 25 Jul 2017 22:32:22 +0000 (15:32 -0700)] 
SAMZA-1282: Spinning up more containers than number of tasks.

Changes

* Stop streamProcessor in onNewJobModelAvailable eventHandler(instead of onNewJobModelConfirmed
  eventHandler) when it's not part of the group and prevent it from joining the barrier.
* When numContainerIds > numTaskModels, generate JobModel by choosing lexicographically
  least `x` containerIds(where x = numTaskModels).
* Added unit and integration tests in appropriate classes to verify the expected behavior.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>, Navina Ramesh <navina@apache.org>

Closes #244 from shanthoosh/more_processor_than_tasks

2 months agoSAMZA-1359; Handle phantom container notifications cleanly during an RM fail-over
Jagadish Venkatraman [Tue, 25 Jul 2017 18:18:14 +0000 (11:18 -0700)] 
SAMZA-1359; Handle phantom container notifications cleanly during an RM fail-over

1. Improved our container handling logic to be resilient to phantom notifications.
2. Added a new metric to Samza's ContainerProcessManager module that tracks the number of such invalid notifications.
3. Add a couple of tests that simulate this exact scenario above that we encountered during the cluster upgrade. (container starts -> container fails -> legitimate notification for the failure - container re-start -> RM fail-over -> phantom notification with a different exit code)
4. As an aside, there are a whole bunch of tests in ContainerProcessManager that rely on Thread.sleep to ensure that threads get to run in a certain order. Removed this non-determinism and made them predictable.

Author: Jagadish Venkatraman <jvenkatr@jvenkatr-mn2.linkedin.biz>

Reviewers: Jake Maes <jmaes@linkedin.com>

Closes #243 from vjagadish1989/am-bug

2 months agoFix log messages from StreamProcessor(onJobModelExpired event).
Shanthoosh Venkataraman [Tue, 25 Jul 2017 17:35:31 +0000 (10:35 -0700)] 
Fix log messages from StreamProcessor(onJobModelExpired event).

Log messages published in onJobModelExpired event have `processorId` as null. `processorId` is cached as final var in jobCoordinatorListener method. JLS for final fields/variables states that they're initialized before the constructor. This sets local final variable copy as null(since it relies upon value of instance variable to be set in constructor).
Changes
* Use processorId directly in `createCoordinatorListener` method.
* Remove StreamProcessor.toString since it has no usages.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #249 from shanthoosh/fix_logging_in_stream_processor

3 months agoSAMZA-1368; make sure new job model will be generated in case of barrier timeout.
Boris Shkolnik [Fri, 21 Jul 2017 22:32:58 +0000 (15:32 -0700)] 
SAMZA-1368; make sure new job model will be generated in case of barrier timeout.

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Shanthoosh V <svenkata@linkedin.com>

Closes #247 from sborya/onBarrierTimeout1

3 months agoSAMZA-1165; cleanup old zk versions.
Boris Shkolnik [Fri, 21 Jul 2017 20:43:18 +0000 (13:43 -0700)] 
SAMZA-1165; cleanup old zk versions.

Author: Boris Shkolnik <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Navina <navina@apache.org>, Shanthoosh V<svenkata@linkedin.com>

Closes #239 from sborya/zkCleanUpBarrier1

3 months agoSAMZA-1336. Session disconnect propagation.
Boris Shkolnik [Mon, 17 Jul 2017 22:57:29 +0000 (15:57 -0700)] 
SAMZA-1336. Session disconnect propagation.

If ZK doesn't receive any communication from a zkClient (including heartbeats), it closes the session with the client. It removes all the ephemeral nodes associated with the client. That's why we need to restore all these nodes - need to re-register.
We are using ZkClient library to connect to zookeeper. This library allows us to get notification when the session is closed and when a new session is created. So when the new session is created we reset all session related state and re-register.
One weird feature of the library/zookeeper is that when a new session is established, it is still possible to receive old notifications. To avoid this we introduced generation number which we pass into each callback. And if the generation number has changed when the callback was invoked, we will ignore this callback.

Author: Boris Shkolnik <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@bshkolni-ld1.linkedin.biz>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>, Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #229 from sborya/SessionFailReregister

3 months agoSAMZA-1304: Handling duplicate stream processor registration.
Shanthoosh Venkataraman [Thu, 13 Jul 2017 23:19:56 +0000 (16:19 -0700)] 
SAMZA-1304: Handling duplicate stream processor registration.

When a stream processor registers with same processorId as already existing
processor in processor group, it's registration should fail.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>, Jagadish V <jvenkatr@linkedin.com>

Closes #240 from shanthoosh/standalone_duplicate_processor_fix

3 months agoSAMZA-1324: Fix NullPointerException in ZkUtils api's.
Shanthoosh Venkataraman [Wed, 12 Jul 2017 18:13:09 +0000 (11:13 -0700)] 
SAMZA-1324: Fix NullPointerException in ZkUtils api's.

Problem:
Read/Write api methods in ZkUtils updates counters/timers in `metrics` field. In a ZkUtils constructor this fields is not initialized properly. Java default for uninitialized field is null resulting in NPE.

Fix:
Initialize private fields of ZkUtils class with appropriate defaults.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #235 from shanthoosh/fix_zkutils_api

3 months agoSAMZA-1358: fix the bug in validating task.class empty string when app.class is confi...
Yi Pan (Data Infrastructure) [Tue, 11 Jul 2017 17:55:51 +0000 (10:55 -0700)] 
SAMZA-1358: fix the bug in validating task.class empty string when app.class is configured

…p.class is configured

Another bug due to scala/java differences.

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #242 from nickpan47/SAMZA-1358

3 months agoSAMZA-1321: Propagate end-of-stream and watermark messages
Xinyu Liu [Thu, 29 Jun 2017 00:16:10 +0000 (17:16 -0700)] 
SAMZA-1321: Propagate end-of-stream and watermark messages

The patch completes the end-of-stream work flow across multi-stage pipeline. It also contains initial commit for supporting watermarks. For watermark, there are issues raised in the review feedback and will be addressed by further prs. The main logic this patch adds:

- EndOfStreamManager aggregates the end-of-stream control messages, propagate the result to to downstream intermediate topics based on the topology of the IO in the StreamGraph.

- WatermarkManager aggregates the watermark control messages from the upstage tasks, pass it through the operators, and propagate it to downstream.

In operator impl, I implemented similar watermark logic as Beam for watermark propagation:
* InputWatermark(op) = min { OutputWatermark(op') | op1 is upstream of op}
* OutputWatermark(op) = min { InputWatermark(op), OldestWorkTime(op) }

Add quite a few unit tests and integration test. The code is 100% covered as reported by Intellij. Both control messages work as expected.

Author: Xinyu Liu <xiliu@xiliu-ld.linkedin.biz>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #236 from xinyuiscool/SAMZA-1321

3 months agoTrigger notification
Xinyu Liu [Wed, 28 Jun 2017 21:28:07 +0000 (14:28 -0700)] 
Trigger notification

3 months agoSAMZA-1296 : Stand alone integration tests.
Shanthoosh Venkataraman [Wed, 28 Jun 2017 18:12:15 +0000 (11:12 -0700)] 
SAMZA-1296 : Stand alone integration tests.

Changes
Brings up a test bed that contains embedded kafka broker and zookeeper to test the following scenarios.

A) Rolling upgrade of stream processors.
B) Reelection of leader upon failures.
C) Registering multiple processors with same processor id.
D) Zookeeper failure before job model regeneration upon leader death should kill all running stream applications.

NOTE:
Some tests are commented out since zookeeper exceptions are swallowed in ZKJobCoordinator/ZKUtils.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>, Jagadish V <jvenkatr@linkedin.com>

Closes #196 from shanthoosh/standalone_happy_integration_tests

3 months agoSAMZA-1340 - StreamProcessor does not propagate container failures from StreamTask
Navina Ramesh [Wed, 28 Jun 2017 18:09:27 +0000 (11:09 -0700)] 
SAMZA-1340 - StreamProcessor does not propagate container failures from StreamTask

Storing the exception seen from the container in the `SamzaContainerListener#onFailure(Throwable)` in the StreamProcessor.
`JobCoordinator#stop` callback inspects this stored exception and invokes the correct callback for StreamProcessorLifecycleListener.
It is pretty difficult to add all test cases. Suggestion welcome for improving code/testing.

Author: Navina Ramesh <navina@apache.org>

Reviewers: Chris Pettitt <cpettitt@linkedin.com>, Boris Shkolnik <boryas@apache.org>

Closes #230 from navina/LISAMZA-5272

3 months agoSAMZA-1347: GroupByContainerIds NPE if containerIds list is null
Jacob Maes [Tue, 27 Jun 2017 18:42:51 +0000 (11:42 -0700)] 
SAMZA-1347: GroupByContainerIds NPE if containerIds list is null

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>

Closes #233 from jmakes/samza-1347

3 months agoAdd a basic .travis.yml file to run on Travis.
Nacho Solis [Tue, 27 Jun 2017 00:25:57 +0000 (17:25 -0700)] 
Add a basic .travis.yml file to run on Travis.

Enable gradle caching
Use Trusty as a platform
Set JDK to 8

Author: Nacho Solis <nsolis@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #234 from isolis/add-travis-yml

3 months agoSAMZA-1324 : Add a metricsreporter lifecycle for JobCoordinator component of StreamPr...
PawasChhokra [Mon, 26 Jun 2017 23:35:26 +0000 (16:35 -0700)] 
SAMZA-1324 : Add a metricsreporter lifecycle for JobCoordinator component of StreamProcessor

Added a metrics class for ZK based job coordinator that reports a few metrics.

Author: PawasChhokra <Jaimatadi1$>
Author: Pawas Chhokra <pchhokra@pchhokra-mn1.linkedin.biz>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #223 from PawasChhokra/ZkJobCoordinatorMetrics

3 months agoSAMZA-1346: GroupByContainerCount.balance() should guard against null…
Jacob Maes [Mon, 26 Jun 2017 20:43:33 +0000 (13:43 -0700)] 
SAMZA-1346: GroupByContainerCount.balance() should guard against null…

… LocalityManager

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Chris Pettitt <cpettitt@linkedin.com>

Closes #232 from jmakes/samza-1346

3 months agoSAMZA-1337: Use StreamTask with the LocalApplicationRunner
Boris Shkolnik [Mon, 26 Jun 2017 19:25:48 +0000 (12:25 -0700)] 
SAMZA-1337: Use StreamTask with the LocalApplicationRunner

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Navina Ramesh <navina@apache.org>, Xinyu Liu <xiliu@apache.org>, Bharath Kumarasubramanian <codin.martial@gmail.com>

Closes #231 from sborya/LocalAppRunnerWithStreamTask

4 months agoSAMZA-1334: fix pre-condition for ContainerAllocator to work properly
Yi Pan (Data Infrastructure) [Tue, 20 Jun 2017 15:35:39 +0000 (08:35 -0700)] 
SAMZA-1334: fix pre-condition for ContainerAllocator to work properly

We have observed issues when the LocalityManager reports the container locality mapping while the host-affinity is disabled in ContainerAllocator, in which the ContainerAllocator failed to release extra containers.

Hence, fix is in the form of make sure the pre-condition is met for the ContainerAllocator w/o host-affinity: the localityMap from the JobModel should contain no preferred host info.

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Reviewers: Jagadish <jagadish1989@gmail.com>

Closes #228 from nickpan47/SAMZA-1334 and squashes the following commits:

ad3320f [Yi Pan (Data Infrastructure)] SAMZA-1334: fix the pre-conditions for ContainerAllocator to work properly. Make sure JobModel is generated w/o LocalityManager if host-affinity is disabled
f76fff1 [Yi Pan (Data Infrastructure)] WIP: SAMZA-1334 fix

4 months agoSAMZA-1335: Improve logging for LocalStoreMonitor
Jacob Maes [Thu, 15 Jun 2017 20:40:43 +0000 (13:40 -0700)] 
SAMZA-1335: Improve logging for LocalStoreMonitor

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #226 from jmakes/samza-1335

4 months agoUpdated test build and README versions to next version
Prateek Maheshwari [Wed, 14 Jun 2017 22:18:45 +0000 (15:18 -0700)] 
Updated test build and README versions to next version

Updated the versions used in smoke tests and the README for overnight tests to point to the next release. Using 0.13.1 instead of 0.13.1-SNAPSHOT here since these tests are usually run against the release branch when release testing, where the version doesn't have -SNAPSHOT in it.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #224 from prateekm/version-update