samza.git
17 hours agoSAMZA-1507; Create changelog streams in Leader(ZkJobCoordinator) for stateful operators. master
Shanthoosh Venkataraman [Wed, 22 Nov 2017 19:54:10 +0000 (11:54 -0800)] 
SAMZA-1507; Create changelog streams in Leader(ZkJobCoordinator) for stateful operators.

Verified with a test standalone job. Will add integration test for this as a part of fixing and reenabling standalone integration tests.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>

Closes #362 from shanthoosh/master

40 hours agoSAMZA-1494; Flush operator state at end of stream
Jagadish [Tue, 21 Nov 2017 20:42:17 +0000 (12:42 -0800)] 
SAMZA-1494; Flush operator state at end of stream

- Propagate operator messages at endOfStream to all down-stream operators.
- Emit all pending windows when endOfStream is reached.
- Flush all state on endOfStream irrespective of auto-commit behavior.

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #366 from vjagadish1989/eos-flush

41 hours agoSAMZA-1505: Fix CheckpointTool writing only one ssp per task
Xinyu Liu [Tue, 21 Nov 2017 19:26:55 +0000 (11:26 -0800)] 
SAMZA-1505: Fix CheckpointTool writing only one ssp per task

Currently when using CheckpointTool to write checkpoints, it only writes a checkpoint of a single ssp per task. By debugging the code, looks like the flatMap() on the checkpoint of Optional tuple(taskname -> Map(ssp -> offset)) merges the results by key taskname. This patch stores the results explicitly in a list and then groupBy() on it, which fixes the problem.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jmakes@gmail.com>

Closes #364 from xinyuiscool/SAMZA-1505

2 days agoSAMZA-1482: add config documentation for auto-restart/fail behavior o…
Yi Pan (Data Infrastructure) [Mon, 20 Nov 2017 19:48:14 +0000 (11:48 -0800)] 
SAMZA-1482: add config documentation for auto-restart/fail behavior o…

…n partition count changes

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #363 from nickpan47/partition-change-docsite and squashes the following commits:

5342ea0 [Yi Pan (Data Infrastructure)] SAMZA-1482: fix configuration table in documentation site
7c9f326 [Yi Pan (Data Infrastructure)] SAMZA-1482: add config documentation for auto-restart/fail behavior on partition count changes

5 days agoSAMZA-1479; Refactor KafkaCheckpointManager, KafkaCheckpointLogKey and their tests
Jagadish [Fri, 17 Nov 2017 22:41:05 +0000 (14:41 -0800)] 
SAMZA-1479; Refactor KafkaCheckpointManager, KafkaCheckpointLogKey and their tests

Notable changes:
* Rewrite `KafkaCheckpointLogKey` into two classes - an immutable class, and a SerDe
* Remove dependency on static setters in the `KafkaCheckpointLogKey`
* Change lifecycle of components in KafkaCheckpointManager
  - It's safe to start producers and consumers during `start` as opposed to lazy loading them during writes, and reads.
  - Initialize systemProducer and systemConsumer during construction
* Simplify logic for ignoring checkpoint validations
* Re-write checkpointManager#readLog() to use a simpler API.
* Remove unnecessary complexity after the migration from 0.8
* Remove unnecessary locking in startup, and shut-down
* Remove dependencies on SimpleConsumer configs like bufferSize, fetchSize, socketTimeout
* Refactor KafkaCheckpointManagerFactory and remove static getCheckpointSystemNameAndFactory
* Bug-fix : Register the taskName correctly (instead of using a dummy string for the taskName)

* Add unit tests to verify more checkpoint scenarios
* Consolidate unit tests into utils for creating producer, consumer and admin instances
* Convert/consolidate most long-running integration tests into unit tests

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #348 from vjagadish1989/kafka-checkpointmanager-refactor

6 days agoSamza SQL implementation for basic projects, filtering and UDFs
Srinivasulu Punuru [Thu, 16 Nov 2017 19:14:32 +0000 (11:14 -0800)] 
Samza SQL implementation for basic projects, filtering and UDFs

## Samza SQL implementation for basic projects, filtering and

## Design document:
https://docs.google.com/document/d/1bE-ZuPfTpntm1hT3GwQEShYDiTqU3IkxeP4-3ZcGHgU/edit?usp=sharing

Author: Srinivasulu Punuru <spunuru@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #295 from srinipunuru/samza-sql.1

6 days agoSAMZA-1504: Allow user to register container-level metrics
Xinyu Liu [Thu, 16 Nov 2017 18:46:05 +0000 (10:46 -0800)] 
SAMZA-1504: Allow user to register container-level metrics

This change allows user to register the metrics on the per-container basis.

Tested in beam runner and works as expected.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Prateek Maheshwari <prateekm@apache.org>

Closes #361 from xinyuiscool/SAMZA-1504

7 days agoSAMZA-1495: Set intermediate streams as higher priority by default
Prateek Maheshwari [Wed, 15 Nov 2017 18:45:08 +0000 (10:45 -0800)] 
SAMZA-1495: Set intermediate streams as higher priority by default

Most changes in StreamConfig are formatting fixes.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #358 from prateekm/intermediate-stream-priority

7 days agoSAMZA-1501: Validate operator IDs so that they don't contain special characters and...
Prateek Maheshwari [Wed, 15 Nov 2017 18:27:08 +0000 (10:27 -0800)] 
SAMZA-1501: Validate operator IDs so that they don't contain special characters and spaces

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>, Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #359 from prateekm/operator-id-validation

8 days agoSAMZA-1482: Restart or fail Samza jobs in YARN when detecting changes…
Yi Pan (Data Infrastructure) [Tue, 14 Nov 2017 20:45:23 +0000 (12:45 -0800)] 
SAMZA-1482: Restart or fail Samza jobs in YARN when detecting changes…

… in input topic partitions

Some high-lights of the changes:
- always instantiating StreamPartitionCountMonitor on all input system streams now
-- it is debatable whether we want to include systems that do not implement the optimized ExtendedSystemAdmin interface. We may need to configure a long partition monitor interval for this case and the case where there are tons of input topics. (Pending perf test)
- moved the instantiation of StreamPartitionCountMonitor out of JobModelManager and allow ClusterBasedJobCoordinator associate a callback method directly to the monitor
- allow callbacks to set different application status code before throwing exception to shutdown the job

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>, Jagadish <jagadish@apache.org>

Closes #351 from nickpan47/restart-on-partition-change and squashes the following commits:

8d04cd6 [Yi Pan (Data Infrastructure)] SAMZA-1482: restart or fail the job when input topic partition count changes
ee3fa65 [Yi Pan (Data Infrastructure)] SAMZA-1482: Restart or fail Samza jobs in YARN when detecting changes in input topic partitions

8 days agoSAMZA-1474: Bump up rocksdb version to 5.7.3 to include licensing changes
Bharath Kumarasubramanian [Tue, 14 Nov 2017 19:00:34 +0000 (11:00 -0800)] 
SAMZA-1474: Bump up rocksdb version to 5.7.3 to include licensing changes

You can find more details on the bug fixes and API changes [here](https://github.com/facebook/rocksdb/releases). I upgraded to 5.7.3 since 5.8.0 has a regression [KAFKA-6100](https://issues.apache.org/jira/browse/KAFKA-6100)

All of our tests passed locally. I will monitor travis for failures.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Boris Shkolnik <boryas@gmail.com>

Closes #344 from bharathkk/master

8 days agoSAMZA-1487: Disable Flaky Zk Integration tests.
Shanthoosh Venkataraman [Tue, 14 Nov 2017 18:54:32 +0000 (10:54 -0800)] 
SAMZA-1487: Disable Flaky Zk Integration tests.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #355 from shanthoosh/master

8 days agoSAMZA-1490: Fix TestRepartitionJoinWindowApp
Dong Lin [Tue, 14 Nov 2017 18:48:50 +0000 (10:48 -0800)] 
SAMZA-1490: Fix TestRepartitionJoinWindowApp

Author: Dong Lin <lindong28@gmail.com>

Reviewers: Prateek Maheshwari <prateekm@apache.org>

Closes #356 from lindong28/SAMZA-1490

8 days agoSAMZA-1492: Add exception counter to LocalStoreMonitor.
Shanthoosh Venkataraman [Tue, 14 Nov 2017 15:56:13 +0000 (07:56 -0800)] 
SAMZA-1492: Add exception counter to LocalStoreMonitor.

Add storeGCExceptionCounter to LocalStoreMonitor(as a part of LocalStoreMonitorMetrics) to enable alerts setup in an production environment.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #357 from shanthoosh/local_store_monitor_metrics_1

13 days agoSAMZA-1480: TaskStorageManager improperly initializes changelog consu…
Jacob Maes [Thu, 9 Nov 2017 19:54:32 +0000 (11:54 -0800)] 
SAMZA-1480: TaskStorageManager improperly initializes changelog consu…

…mer position when restoring a store from disk

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #350 from jmakes/samza-1480

13 days agoIgnore java fatal error log files from git and rat
Prateek Maheshwari [Thu, 9 Nov 2017 18:45:04 +0000 (10:45 -0800)] 
Ignore java fatal error log files from git and rat

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #354 from prateekm/rat-exclude-hserr

13 days agoSAMZA-1487: Disable Flaky Zk Integration tests.
Shanthoosh Venkataraman [Thu, 9 Nov 2017 18:24:53 +0000 (10:24 -0800)] 
SAMZA-1487: Disable Flaky Zk Integration tests.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Prateek Maheshwari <prateekm@apache.org>

Closes #353 from shanthoosh/master

2 weeks agoFix test failures caused by PR345
xiliu [Wed, 8 Nov 2017 02:00:03 +0000 (18:00 -0800)] 
Fix test failures caused by PR345

2 weeks agoSAMZA-1486; Checkpoint manager implementation with Azure Table
Daniel Chen [Wed, 8 Nov 2017 01:25:58 +0000 (17:25 -0800)] 
SAMZA-1486; Checkpoint manager implementation with Azure Table

vjagadish1989

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #341 from dxichen/azure-checkpoint-manager

2 weeks agoSAMZA-1437; Added tests, make eventData creation extensible
Daniel Chen [Wed, 8 Nov 2017 01:15:28 +0000 (17:15 -0800)] 
SAMZA-1437; Added tests, make eventData creation extensible

vjagadish1989 Required for internal custom eventData creation

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #352 from dxichen/create-event-data-extensible

2 weeks agoSAMZA-1477: Fix issues found by BEAM tests
Xinyu Liu [Tue, 7 Nov 2017 20:16:15 +0000 (12:16 -0800)] 
SAMZA-1477: Fix issues found by BEAM tests

A bunch of issues were found by BEAM tests, which includes:

1) WatermarkFunction needs to be able to return output after processWatermark()
2) control message doesn't implement the equals() and hashcode()
3) Some kafka system related code is not scala 2.10 compatible for tests.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Prateek

Closes #345 from xinyuiscool/SAMZA-1477

3 weeks agoSAMZA-1476: Disable flaky TestStatefulTask.testShouldStartAndRestore
Jagadish [Tue, 31 Oct 2017 15:50:19 +0000 (08:50 -0700)] 
SAMZA-1476: Disable flaky TestStatefulTask.testShouldStartAndRestore

3 weeks agoSAMZA-1473; Fix handling of initial values for aggregating windows
Jagadish [Fri, 27 Oct 2017 06:14:41 +0000 (23:14 -0700)] 
SAMZA-1473; Fix handling of initial values for aggregating windows

- Handle initial values for windows with foldLeftFunctions configured
- Improve trace level logging

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek M< <pmaheshw@linkedin.com>

Closes #343 from vjagadish1989/aggregating-window-fix

3 weeks agoSAMZA-1438; SystemProducer, Consumer and Admin interfaces for EventHubs
Daniel Chen [Fri, 27 Oct 2017 05:26:29 +0000 (22:26 -0700)] 
SAMZA-1438; SystemProducer, Consumer and Admin interfaces for EventHubs

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish <jagadish@apache.org>, Prateek M<pmaheshw@linkedin.com>

Closes #308 from dxichen/eventHub-Connector

3 weeks agoSAMZA-1471: SystemConsumers should not poll ssp that hit end of stream
Hai Lu [Fri, 27 Oct 2017 00:48:47 +0000 (17:48 -0700)] 
SAMZA-1471: SystemConsumers should not poll ssp that hit end of stream

When SystemConsumers poll from SSPs that have hit end of stream, obviously there will be no data return and the poll will exhaust the timeout. This would cause performance issue.

Author: Hai Lu <halu@linkedin.com>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #342 from lhaiesp/master

4 weeks agoSAMZA-1472: YarnRestJobStatusProvider constructor must be public
Jacob Maes [Thu, 26 Oct 2017 01:03:11 +0000 (18:03 -0700)] 
SAMZA-1472: YarnRestJobStatusProvider constructor must be public

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>,Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #340 from jmakes/samza-1472

4 weeks agoSAMZA-1454: Globally unique and user settable IDs for stateful operators
Prateek Maheshwari [Thu, 26 Oct 2017 00:18:36 +0000 (17:18 -0700)] 
SAMZA-1454: Globally unique and user settable IDs for stateful operators

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>, Yi Pan <nickpan47@gmail.com>

Closes #324 from prateekm/operator-id-uniqueness

4 weeks agoSAMZA-1464: Flushing a closed RocksDB store causes SIGSEGVs
Prateek Maheshwari [Wed, 25 Oct 2017 23:27:52 +0000 (16:27 -0700)] 
SAMZA-1464: Flushing a closed RocksDB store causes SIGSEGVs

Made RocksDB operations check if DB is still open to avoid segfaults.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>, Xinyu Liu <xinyuiscool@gmail.com>

Closes #334 from prateekm/segfault-fix

4 weeks agoSAMZA-1470: Wrong job status returned by YarnRestJobStatusProvider wh…
Jacob Maes [Wed, 25 Oct 2017 22:18:11 +0000 (15:18 -0700)] 
SAMZA-1470: Wrong job status returned by YarnRestJobStatusProvider wh…

…en there are multiple app

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Jagadish <jvenkatr@linkedin.com>

Closes #339 from jmakes/samza-1470-4

4 weeks agoTestExecutionPlanner compilation error
Boris S [Wed, 25 Oct 2017 01:43:51 +0000 (18:43 -0700)] 
TestExecutionPlanner compilation error

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #337 from sborya/TestExecutionPlannerCompilationError

4 weeks agoSAMZA-1457: Set retention for internal streams for Batch application
Xinyu Liu [Wed, 25 Oct 2017 01:20:32 +0000 (18:20 -0700)] 
SAMZA-1457: Set retention for internal streams for Batch application

For intermediate streams, checkpoint and changelog, we need to set a short retention period for batch.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>
Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #328 from xinyuiscool/SAMZA-1457

4 weeks agoStreamOperatorTask does not need to be final.
Boris S [Wed, 25 Oct 2017 00:44:21 +0000 (17:44 -0700)] 
StreamOperatorTask does not need to be final.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #336 from sborya/UnmakeStreamOperatorTaskFinal and squashes the following commits:

c44ee58 [Boris S] StreamOperatorTask does not need to be final
d4620d6 [Boris S] Merge branch 'master' of https://github.com/apache/samza
410ce78 [Boris S] Merge branch 'master' of https://github.com/apache/samza
a31a7aa [Boris Shkolnik] reduce debugging from info to debug in KafkaCheckpointManager.java

4 weeks agoMinor fixes to KeyValueStore and RocksDBKeyValueStore
Prateek Maheshwari [Thu, 19 Oct 2017 18:36:02 +0000 (11:36 -0700)] 
Minor fixes to KeyValueStore and RocksDBKeyValueStore

1. Replaced extension class in KeyValueStore with default methods.
2. Fixed formatting in RocksDBKeyValueStore#openDB.
3. Now logs original RocksDBException on errors opening the db. Other minor log message cleanup.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #332 from prateekm/store-fixes

4 weeks agoSAMZA-1466: Flaky test: TestRocksDbKeyValueStore suite
Prateek Maheshwari [Thu, 19 Oct 2017 18:01:56 +0000 (11:01 -0700)] 
SAMZA-1466: Flaky test: TestRocksDbKeyValueStore suite

TestRocksDbKeyValueStore tests are failing intermittently with errors like:
```
testFlush FAILED
    org.apache.samza.SamzaException: Error opening RocksDB store dbStore at location /tmp, received the following exception from RocksDB org.rocksdb.RocksDBException: "
        at org.apache.samza.storage.kv.RocksDbKeyValueStore$.openDB(RocksDbKeyValueStore.scala:99)
        at org.apache.samza.storage.kv.TestRocksDbKeyValueStore.testFlush(TestRocksDbKeyValueStore.scala:73)
```

These happen intermittently because:
1. RocksDB throws an exception if open is called on a store that's already open.
2. A new unit test was added in PR #327 that didn't close the store.
3. Other tests in this class use the same store directory (System.getProperty("java.io.tmpdir")) and run concurrently. Any test that runs after the one in 2 above fails.

Even when we log the RocksDBException (fixed in pr #332), the exception message is malformed due to a bug in RocksDB: https://github.com/facebook/rocksdb/issues/1688. This is fixed in the latest RocksDB version (verified with 5.8.0), so the messages should be more meaningful after an upgrade. Fixed stack trace will say something like:
```
Caused by: org.rocksdb.RocksDBException: While lock file: /var/folders/1b/4nqqvf4s27sby0frjr0q5t_h0004hp/T/LOCK: No locks available
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:231)
at org.apache.samza.storage.kv.RocksDbKeyValueStore$.openDB(RocksDbKeyValueStore.scala:70)
```

Fixing just the test in the mean time.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #333 from prateekm/store-test-fix

5 weeks agoSAMZA-1465: Performance regression for KafkaCheckpointManager
Jacob Maes [Wed, 18 Oct 2017 22:30:34 +0000 (15:30 -0700)] 
SAMZA-1465: Performance regression for KafkaCheckpointManager

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>, Boris Shkolnik <boryas@apache.org>, Fred Ji <fredji97@yahoo.com>

Closes #331 from jmakes/master

5 weeks agoSAMZA-1461; Expose RocksDB properties as metrics
Janek Lasocki-Biczysko [Mon, 16 Oct 2017 23:06:28 +0000 (16:06 -0700)] 
SAMZA-1461; Expose RocksDB properties as metrics

Automatically build gauges for RocksDB properties via configuration:
`stores.<storename>.rocksdb.telemetry.list=<rocksDbProperty1>, <rocksDbProperty1>`

Author: Janek Lasocki-Biczysko <janek.lasocki-biczysko@skyscanner.net>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #327 from janeklb/jlb_rocksDBPropertiesAsMetrics

5 weeks agoSAMZA-1463 disable some flaky tests on hdfs system
Hai Lu [Mon, 16 Oct 2017 18:26:45 +0000 (11:26 -0700)] 
SAMZA-1463 disable some flaky tests on hdfs system

disable some flaky tests on hdfs system until future investigation

Author: Hai Lu <halu@linkedin.com>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #329 from lhaiesp/master

5 weeks agoChangeLog should not require KafkaSystemAdmin
Boris S [Fri, 13 Oct 2017 22:04:22 +0000 (15:04 -0700)] 
ChangeLog should not require KafkaSystemAdmin

Author: Boris S <boryas@apache.org>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #325 from sborya/ChangeLogRequireKafkaSystemAdmin

5 weeks agoSAMZA-1424 SAMZA-1425 SAMZA-1426; Support serdes and persistent state for windows
Jagadish [Fri, 13 Oct 2017 20:22:56 +0000 (13:22 -0700)] 
SAMZA-1424 SAMZA-1425 SAMZA-1426; Support serdes and persistent state for windows

Notable changes:
* Made windows durable with support for persistent recoverable stores
* New storage format to support multiple messages in windows (the previous storage format
of storing the entire message-list as a value in the store incurs significant serde overhead)
* Wire `TimeSeriesStore` with the WindowOperator implementation.

Testing:
* Existing unit tests and integration tests pass with serdes wired-up
* Will follow-up with a PR for hello-samza soon.

Note: Majority of changes are in `samza-core/src/main/java/org/apache/samza/operators/impl/WindowOperatorImpl.java` and github collapsed the diff.

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari<pmaheshw@linkedin.com>

Closes #321 from vjagadish1989/window-operator-serde

6 weeks agoSAMZA-1447; Swapping out CLI JobStatusProvider for REST based implementation in samza...
Abhishek Shivanna [Wed, 11 Oct 2017 20:57:20 +0000 (13:57 -0700)] 
SAMZA-1447; Swapping out CLI JobStatusProvider for REST based implementation in samza-rest

Removing the YarnCliJobStatusProvider since forking a new shell for every request on the JobsResource endpoint is resource intensive.

Author: Abhishek Shivanna <ashivanna@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #317 from abhishekshivanna/samza-rest-oom

6 weeks agoSAMZA-1451: Disable integration tests conditionally in build.
Shanthoosh Venkataraman [Wed, 11 Oct 2017 20:56:24 +0000 (13:56 -0700)] 
SAMZA-1451: Disable integration tests conditionally in build.

Remove runIntegrationTests gradle property added as a part of SAMZA-1355 and introduce skipIntegrationTests property(which inverts it).
If skipIntegrationTests gradle project property is enabled, execution of all tests in samza-test project will be skipped from the build.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #319 from shanthoosh/skip_integration_tests

6 weeks agoSAMZA-1409; add missing parts and make guideline clearer for docs/README.md
Fred Ji [Wed, 11 Oct 2017 19:09:33 +0000 (12:09 -0700)] 
SAMZA-1409; add missing parts and make guideline clearer for docs/README.md

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #288 from fredji97/master_docs_readme

6 weeks agoSAMZA-1453; Update README with Travis-CI badge.
Daniel Nishimura [Wed, 11 Oct 2017 19:03:28 +0000 (12:03 -0700)] 
SAMZA-1453; Update README with Travis-CI badge.

Added Travis-CI to README.md file.

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #323 from dnishimura/samza-1453-readme-travisci

6 weeks agoSAMZA-1452: Clean up interrupted thread bugs
Nacho Solis [Wed, 11 Oct 2017 14:49:52 +0000 (07:49 -0700)] 
SAMZA-1452: Clean up interrupted thread bugs

Call Thread.currentThread().interrupt(); when capturing InterruptedException

Author: Nacho Solis <nsolis@linkedin.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>,Jagadish <jvenkatr@linkedin.com>

Closes #322 from isolis/cleancodebugs

6 weeks agoSAMZA-1292: Merge operator can be no-op when there are no streams to merge
Prateek Maheshwari [Mon, 9 Oct 2017 21:48:05 +0000 (14:48 -0700)] 
SAMZA-1292: Merge operator can be no-op when there are no streams to merge

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #320 from prateekm/small-fixes

6 weeks agoSAMZA-272, SAMZA-1440, SAMZA-1269: Fixed thread interrupt tests in TestExponentialSle...
Prateek Maheshwari [Fri, 6 Oct 2017 23:32:06 +0000 (16:32 -0700)] 
SAMZA-272, SAMZA-1440, SAMZA-1269: Fixed thread interrupt tests in TestExponentialSleepStrategy.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #318 from prateekm/ess-test-fix

6 weeks agoSAMZA-1435: Changed samza-api Serde implementations from Scala to Java.
Prateek Maheshwari [Fri, 6 Oct 2017 17:47:37 +0000 (10:47 -0700)] 
SAMZA-1435: Changed samza-api Serde implementations from Scala to Java.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #316 from prateekm/java-serdes

6 weeks agoSAMZA-1442: use systemAdmin.validateStream in KafkaCheckpointManager
Boris S [Thu, 5 Oct 2017 22:36:37 +0000 (15:36 -0700)] 
SAMZA-1442: use systemAdmin.validateStream in KafkaCheckpointManager

Author: Boris S <boryas@apache.org>

Reviewers: Jacob Maes <jacob.maes@gmail.com>

Closes #314 from sborya/CheckpointTopicValidation

6 weeks agoSAMZA-1444: Removed circular dependency b/w samza-core test and samza-kv-rocksdb...
Prateek Maheshwari [Thu, 5 Oct 2017 17:59:39 +0000 (10:59 -0700)] 
SAMZA-1444: Removed circular dependency b/w samza-core test and samza-kv-rocksdb test

Also added an implementation for KVSerde.
vjagadish1989, thanks for the TestInMemoryStore implementation!

The only usage of InternalInMemoryStore is now in WindowOperatorImpl, which will be removed as part of store wiring for the window operator.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>, Xinyu Liu <xiliu@linkedin.com>

Closes #315 from prateekm/test-store

6 weeks agoSAMZA-1429: Add callback success/failure metrics to async tasks
Daniel Nishimura [Thu, 5 Oct 2017 16:18:37 +0000 (09:18 -0700)] 
SAMZA-1429: Add callback success/failure metrics to async tasks

jmakes & nickpan47 please take a look when you get the chance.

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #306 from dnishimura/samza-1429-async-metrics

7 weeks agoSAMZA-1056: Added wiring for High Level API state stores, their serdes and changelogs.
Prateek Maheshwari [Wed, 4 Oct 2017 22:37:50 +0000 (15:37 -0700)] 
SAMZA-1056: Added wiring for High Level API state stores, their serdes and changelogs.

Provided join operator access to durable state stores.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>

Closes #309 from prateekm/operator-store-wiring

7 weeks agoSAMZA-1109: Updated High Level API serde impl with Yi's feedback
Prateek Maheshwari [Wed, 4 Oct 2017 22:28:37 +0000 (15:28 -0700)] 
SAMZA-1109: Updated High Level API serde impl with Yi's feedback

nickpan47 for review.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: "Jagadish Venkatraman <jvenkatr@linkedin.com>"

Closes #310 from prateekm/serde-updates

7 weeks agoSAMZA-1439: Address late review feedback from SAMZA-1434
Jacob Maes [Wed, 4 Oct 2017 21:02:26 +0000 (14:02 -0700)] 
SAMZA-1439: Address late review feedback from SAMZA-1434

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #313 from jmakes/samza-1439

7 weeks agoSAMZA-1427; use systemFactory in checkpoint manager.
Boris S [Wed, 4 Oct 2017 17:48:58 +0000 (10:48 -0700)] 
SAMZA-1427; use systemFactory in checkpoint manager.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>
Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>, Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #299 from sborya/LiKafkaClient

7 weeks agoSAMZA-1431: Enable SonarCloud integration with Travis-CI.
Daniel Nishimura [Wed, 4 Oct 2017 17:39:42 +0000 (10:39 -0700)] 
SAMZA-1431: Enable SonarCloud integration with Travis-CI.

These are modifications needed to integrate Travis-CI with SonarCloud. Also, the JaCoCo plugin is enabled in gradle for code coverage reporting to SonarCloud.

For reference, here is the request to Apache Infra to enable Travis-CI: https://issues.apache.org/jira/browse/INFRA-15132

nickpan47 isolis

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #305 from dnishimura/samza-1431-sonarcloud

7 weeks agoAdded Prateek's code signing public key to KEYS
Prateek Maheshwari [Wed, 4 Oct 2017 01:00:04 +0000 (18:00 -0700)] 
Added Prateek's code signing public key to KEYS

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #312 from prateekm/commiters-update

7 weeks agoAdded self to committers list.
Prateek Maheshwari [Tue, 3 Oct 2017 23:06:33 +0000 (16:06 -0700)] 
Added self to committers list.

vjagadish1989

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #311 from prateekm/commiters-update

7 weeks agoMerge branch '0.14.0'
Xinyu Liu [Tue, 3 Oct 2017 22:10:13 +0000 (15:10 -0700)] 
Merge branch '0.14.0'

7 weeks agoMerge branch 'master' into 0.14.0
Xinyu Liu [Tue, 3 Oct 2017 22:09:41 +0000 (15:09 -0700)] 
Merge branch 'master' into 0.14.0

7 weeks agoadded Boris as a committer
Boris Shkolnik [Tue, 3 Oct 2017 07:01:16 +0000 (00:01 -0700)] 
added Boris as a committer

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #304 from sborya/addCommitter

7 weeks agoSAMZA-1423; Follow-up to PR:303 - Remove version constant from TimeSeriesKey
Jagadish [Mon, 2 Oct 2017 19:54:22 +0000 (12:54 -0700)] 
SAMZA-1423; Follow-up to PR:303 -  Remove version constant from TimeSeriesKey

7 weeks agoSAMZA-1423; Implement time series storage for joins and windows
Jagadish [Mon, 2 Oct 2017 19:23:24 +0000 (12:23 -0700)] 
SAMZA-1423; Implement time series storage for joins and windows

Notable changes:
* New interface for storing and retrieving time series data.
* New store and serde implementation for use in windows and joins

Pending:
* Documentation, and minor clean-ups
* Wire-up of stores from ExecutionPlanner
* Usage of the store to implement various windows and joins

Author: Jagadish <jagadish@apache.org>
Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>,Xinyu Liu<xiliu@linkedin.com>

Closes #303 from vjagadish1989/window-store

7 weeks agoSAMZA-1434: Fix issues found in Hadoop 0.14.0
Xinyu Liu [Fri, 29 Sep 2017 22:05:55 +0000 (15:05 -0700)] 
SAMZA-1434: Fix issues found in Hadoop

Fix the following bugs found when running Samza on hadoop:

1. Hdfs allows output partitions to be 0 (empty folder)
2. Add null check for the changelog topic generation
3. Call getStreamSpec() instead of using streamSpec member in StreamEdge. This is due to getStreamSpec will do more transformation.
4. Bound the auto-generated intermediate topic partition by a certain count (256).

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>

Closes #307 from xinyuiscool/SAMZA-1434

7 weeks agoSAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl
Xinyu Liu [Fri, 29 Sep 2017 00:11:28 +0000 (17:11 -0700)] 
SAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl

This patch contains the following changes:
1. Refactor watermark and end-of-stream logic. The aggregation/handling has been moved from WatermarkManager/EndOfStreamManager to be inline inside OperatorImpl. This is for keeping the logic in one place.
2. Now subclass of OperatorImpl will override handleWatermark() to do its specific handling, such as fire trigger.
3. Add emitWatermark() in OperatorImpl so subclass can call it to emit watermark upon receiving a message or watermark.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>
Author: Xinyu Liu <xiliu@xiliu-ld.linkedin.biz>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #277 from xinyuiscool/SAMZA-1386

8 weeks agoSAMZA-1109; Allow setting Serdes for fluent API operators in code
Prateek Maheshwari [Thu, 28 Sep 2017 00:08:04 +0000 (17:08 -0700)] 
SAMZA-1109; Allow setting Serdes for fluent API operators in code

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>, Jacob Maes <jmakes@apache.org>

Closes #293 from prateekm/serde-instance

8 weeks agoSAMZA-1417: Clear and recreate intermediate and metadata streams for batch processing
Xinyu Liu [Mon, 25 Sep 2017 21:52:42 +0000 (14:52 -0700)] 
SAMZA-1417: Clear and recreate intermediate and metadata streams for batch processing

For each run of a batch application, we need to clear the internal streams from the previous run and recreate new ones. This patch introduces the following:
1) bounded flag in StreamSpec
2) app.mode (BATCH/STREAM) in the application config. An application is a batch app iff all the input streams are bounded.
3) app.runId and use it to generate the internal topics for each run.

run.id generation is not addressed in this pr. There will be another patch to resolve it for both yarn and standalone. For now, this patch only works for yarn.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jmakes@apache.org>

Closes #297 from xinyuiscool/SAMZA-1417

2 months agoSAMZA-1428: Fix scala 2.10 compilation issue with java 8 interface st…
Jacob Maes [Thu, 21 Sep 2017 18:57:31 +0000 (11:57 -0700)] 
SAMZA-1428: Fix scala 2.10 compilation issue with java 8 interface st…

…atic methods

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>

Closes #300 from jmakes/samza-1428

2 months agoSAMZA-1392: KafkaSystemProducer performance and correctness with conc…
Jacob Maes [Wed, 20 Sep 2017 19:40:47 +0000 (12:40 -0700)] 
SAMZA-1392: KafkaSystemProducer performance and correctness with conc…

…urrent sends and flushes

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>, Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Closes #272 from jmakes/samza-1392

2 months agoSAMZA-1414: SamzaContainer log statements incorrectly expect containerId is a number
Daniel Nishimura [Tue, 19 Sep 2017 21:43:55 +0000 (14:43 -0700)] 
SAMZA-1414: SamzaContainer log statements incorrectly expect containerId is a number

Correctly log ID as a string in SamzaContainer.
jmakes

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #298 from dnishimura/samza-1414

2 months agoSAMZA-1416: Better logging around the exception where class loading failed in initial...
Daniel Nishimura [Tue, 19 Sep 2017 21:26:38 +0000 (14:26 -0700)] 
SAMZA-1416: Better logging around the exception where class loading failed in initializing the SystemFactory for a input/output system

Also added test coverage for the Util.getObj method.
nickpan47 jmakes

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #296 from dnishimura/samza-1416

2 months agoSAMZA-1419; Refactor StreamProcessor tests to remove Thread.sleep()
Bharath Kumarasubramanian [Mon, 18 Sep 2017 17:32:32 +0000 (10:32 -0700)] 
SAMZA-1419; Refactor StreamProcessor tests to remove Thread.sleep()

Refactoring the tests to enable parallel execution. Also, removed unnecessary sleeping during execution. This PR will serve as a prototype to get some feedback on this testing pattern which eliminates state sharing between the tests and opens up the avenue for parallel execution.

This change brings down the test time for these 4 tests to **14 sec** compared to **44 secs**.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #291 from bharathkk/master

2 months agoSAMZA-1406: Fix potential orphaned containers problem in stand alone
Shanthoosh Venkataraman [Fri, 15 Sep 2017 04:17:36 +0000 (21:17 -0700)] 
SAMZA-1406: Fix potential orphaned containers problem in stand alone

2 months agoFix some integration tests after merging from master
Xinyu Liu [Tue, 12 Sep 2017 21:16:27 +0000 (14:16 -0700)] 
Fix some integration tests after merging from master

2 months agoMerge branch 'master' into 0.14.0
Xinyu Liu [Tue, 12 Sep 2017 18:32:36 +0000 (11:32 -0700)] 
Merge branch 'master' into 0.14.0

2 months agoSAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs
Xinyu Liu [Thu, 7 Sep 2017 23:49:20 +0000 (16:49 -0700)] 
SAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs

The patch does the following:

1) add clearStream() APi in SystemAdmin. Currently it's only supported in Kafka with broker configuring delete.topic.enable=true.

2) remove the deprecated APIs including createChangeLogStream(), validateChangelogStream() and createCoordinatorStream().

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jacob.maes@gmail.com>

Closes #292 from xinyuiscool/SAMZA-1415

2 months agoSAMZA-1413: Config for CoordinationUtilsFactory class name.
Boris Shkolnik [Wed, 30 Aug 2017 23:56:28 +0000 (16:56 -0700)] 
SAMZA-1413: Config for CoordinationUtilsFactory class name.

Author: Boris Shkolnik <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@bshkolni-ld1.linkedin.biz>

Reviewers: Navina Ramesh <navina@apache.org>, Fred Ji <fji@linkedin.com>

Closes #290 from sborya/configUtilsFactoryClassName

2 months agoSAMZA-1385: Coordination utils factory with distributed lock
Boris Shkolnik [Tue, 29 Aug 2017 22:37:17 +0000 (15:37 -0700)] 
SAMZA-1385: Coordination utils factory with distributed lock

this PR includes some changes from another PR. I will re-merge it again, after the other PR is in.

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #284 from sborya/CoordinationUtilsFactory_withDistributedLock

2 months agoIntroduces CoordinationUtilsFactory to create different implementations of Coordinati...
Boris Shkolnik [Tue, 29 Aug 2017 20:23:47 +0000 (13:23 -0700)] 
Introduces CoordinationUtilsFactory to create different implementations of CoordinationUtils

Some refactoring and cleanup.

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #282 from sborya/CoordinationUtilsFactory

2 months agoSAMZA-1410 make tests pass in different Junit versions for testPlanIdWithShuffledStre...
Fred Ji [Mon, 28 Aug 2017 21:17:32 +0000 (14:17 -0700)] 
SAMZA-1410 make tests pass in different Junit versions for testPlanIdWithShuffledStreamSpecs and testGeneratePlanIdWithDifferentStreamSpec

More details are in https://issues.apache.org/jira/browse/SAMZA-1410.

gradlew build and test passed.

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #289 from fredji97/assertNotEquals

2 months agoSAMZA-1404: Add warning in case of potential process starvation due to longer window...
Bharath Kumarasubramanian [Mon, 28 Aug 2017 19:27:47 +0000 (12:27 -0700)] 
SAMZA-1404: Add warning in case of potential process starvation due to longer window method

Currently, we use the average windowNs as the lower bound for window trigger time to determine if user needs to warned. We could potentially make this complicated by also including average commit ns and some delta to be more accurate.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #281 from bharathkk/master

2 months agoSAMZA-1408 update master doc to use 0.14.0-SNAPSHOT after 0.13.1 release
Fred Ji [Sat, 26 Aug 2017 00:21:21 +0000 (17:21 -0700)] 
SAMZA-1408 update master doc to use 0.14.0-SNAPSHOT after 0.13.1 release

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #285 from fredji97/master_doc and squashes the following commits:

58cc775 [Fred Ji] SAMZA-1408 keep the doc unchanged for minor release in master and update PR based on feedback
fc41efe [Fred Ji] SAMZA-1408 update master doc to use 0.14.0-SNAPSHOT after 0.13.1 release

3 months agoFix broken link for KeyValueStore in Samza's feature preview web-page
Jagadish [Thu, 24 Aug 2017 04:47:50 +0000 (21:47 -0700)] 
Fix broken link for KeyValueStore in Samza's feature preview web-page

3 months agoSAMZA-1382: added Zk communication protocol version verification
Boris Shkolnik [Wed, 23 Aug 2017 00:55:54 +0000 (17:55 -0700)] 
SAMZA-1382: added Zk communication protocol version verification

Author: Boris Shkolnik <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #255 from sborya/ZkCommunicationVersion

3 months agoSAMZA-1375: Implement Lock for Azure using Lease Blob
Pawas Chhokra [Sat, 19 Aug 2017 00:49:52 +0000 (17:49 -0700)] 
SAMZA-1375: Implement Lock for Azure using Lease Blob

Author: PawasChhokra <pawas2703@gmail.com>
Author: PawasChhokra <Jaimatadi1$>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #280 from PawasChhokra/AzureDistributedLock

3 months agoSAMZA-1378: Introduce and Implement Scheduler Interface for Polling in Azure
Pawas Chhokra [Fri, 18 Aug 2017 21:38:46 +0000 (14:38 -0700)] 
SAMZA-1378: Introduce and Implement Scheduler Interface for Polling in Azure

PR 1: AzureClient + AzureConfig
PR 2: LeaseBlobManager
PR 3: BlobUtils + JobModelBundle
PR 4: TableUtils + ProcessorEntity
PR 5: AzureLeaderElector
PR 6: Added all schedulers (current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #261 from PawasChhokra/AzureSchedulers

3 months agoSAMZA-1401: update NOTICE for 0.13.1 release with RocksDB BSD+patents license
Fred Ji [Fri, 18 Aug 2017 00:54:40 +0000 (17:54 -0700)] 
SAMZA-1401: update NOTICE for 0.13.1 release with RocksDB BSD+patents license

Please see details in SAMZA-1401.

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #279 from fredji97/rocksdb

3 months agoSAMZA-1400: disabling flaky test testTwoStreamProcessors in TestZkStreamProcessorSession
Fred Ji [Thu, 17 Aug 2017 18:16:10 +0000 (11:16 -0700)] 
SAMZA-1400: disabling flaky test testTwoStreamProcessors in TestZkStreamProcessorSession

In this SAMZA-1400, We are disabling this flaky one in master and will cherry pick for 0.13.1. We have created a ticket SAMZA-1399 for fixing it in later build.

navina sborya Please take a look.

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #278 from fredji97/SAMZA1400

3 months agoSAMZA-1355 : Enable standalone integration tests conditionally in build.
Shanthoosh Venkataraman [Wed, 16 Aug 2017 23:25:49 +0000 (16:25 -0700)] 
SAMZA-1355 : Enable standalone integration tests conditionally in build.

It's run only when includeSamzaTest gradle project property is set.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #237 from shanthoosh/disable_samza_test_build_with_a_flag

3 months agoSAMZA-1397; log.debug loop to run only if debug is enabled
Boris Shkolnik [Wed, 16 Aug 2017 22:08:51 +0000 (15:08 -0700)] 
SAMZA-1397; log.debug loop to run only if debug is enabled

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Jagadish <jagadish@apache.org?

Closes #275 from sborya/LocalityLogDebug

3 months agoSAMZA-1396 TestZkLocalApplicationRunner tests fails after SAMZA-1385
Navina Ramesh [Wed, 16 Aug 2017 21:42:18 +0000 (14:42 -0700)] 
SAMZA-1396 TestZkLocalApplicationRunner tests fails after SAMZA-1385

* Fixes ZkPath issues
* Fixes appname / jobname mismatch

Author: Navina Ramesh <navina@apache.org>

Reviewers: Xinyu Liu <xinyu@apache.org>, Bharath Kumarasubramanian <codin.martial@gmail.com>

Closes #274 from navina/SAMZA-1396

3 months agoSAMZA-1394 add Fred Ji's PGP key for release purpose
Fred Ji [Wed, 16 Aug 2017 16:53:13 +0000 (09:53 -0700)] 
SAMZA-1394 add Fred Ji's PGP key for release purpose

Author: Fred Ji <fji@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #273 from fredji97/master

3 months agoSAMZA-1385: Fix zookeeper path conflict between LocalApplicationRunner and ZkJobCoord...
Bharath Kumarasubramanian [Tue, 15 Aug 2017 05:02:24 +0000 (22:02 -0700)] 
SAMZA-1385: Fix zookeeper path conflict between LocalApplicationRunner and ZkJobCoordinator

Tested the fix w/ sample page view adclick joiner job.
navina sborya nickpan47 can you please take a look at the RB?

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Navina Ramesh <navina@apache.org>, Yi Pan <nickpan47@gmail.com>, Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #265 from bharathkk/master

3 months agoSAMZA-1390; Update SamzaMonitorService to spawn deamon threads
Shanthoosh Venkataraman [Tue, 15 Aug 2017 00:01:09 +0000 (17:01 -0700)] 
SAMZA-1390; Update SamzaMonitorService to spawn deamon threads

Observed in LinkedIn production setup that samza-rest jvm process doesn’t stop after main
thread death(due to jetty server failures) since non-deamon threads spawned for
`SamzaMonitorService` are alive.

This affects samza-rest jvm process lifecycle management. To fix this, plugging
in ThreadFactory which sets Thread name format & marks them as daemon.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #270 from shanthoosh/make_samza_rest_non_daemon

3 months agoSAMZA-1388: Flaky test - TestStatefulTask#testShouldStartAndRestore
Jacob Maes [Fri, 11 Aug 2017 22:50:12 +0000 (15:50 -0700)] 
SAMZA-1388: Flaky test - TestStatefulTask#testShouldStartAndRestore

I believe the problem originated from SAMZA-173.

The core issue is testShouldRestoreStore was not updated to expect 6 messages after 2 more messages were added to testShouldStartTaskForFirstTime.

Fixed the issue and refactored the code so the 2 methods wouldn't disagree again in the future.

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #269 from jmakes/samza-1388

3 months agoSAMZA-1389: Fix ZkProcessorLatch await(timeout, TimeUnit) api.
Shanthoosh Venkataraman [Fri, 11 Aug 2017 18:51:32 +0000 (11:51 -0700)] 
SAMZA-1389: Fix ZkProcessorLatch await(timeout, TimeUnit) api.

Use passed in timeUnit value for zkClient.waitUnitExists method rather than hardcoding with  `TimeUnit.MILLISECONDS`.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>,Fred Ji <fredji97@yahoo.com>,Jagadish <jvenkatr@linkedin.com>

Closes #268 from shanthoosh/fix_zklatch_impl

3 months agoSAMZA-1387: Unable to Start Samza App Because Regex Check
Jacob Maes [Fri, 11 Aug 2017 16:28:20 +0000 (09:28 -0700)] 
SAMZA-1387: Unable to Start Samza App Because Regex Check

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Fred Ji <fredji97@yahoo.com>

Closes #266 from jmakes/samza-1387

3 months agoSAMZA-1384: Race condition with async commit affects checkpoint correctness
Jacob Maes [Thu, 10 Aug 2017 22:44:42 +0000 (15:44 -0700)] 
SAMZA-1384: Race condition with async commit affects checkpoint correctness

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #263 from jmakes/samza-1384

3 months agoMerge branch 'master' into 0.14.0
Yi Pan (Data Infrastructure) [Wed, 9 Aug 2017 17:39:55 +0000 (10:39 -0700)] 
Merge branch 'master' into 0.14.0

Conflicts:
samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java

3 months agoSAMZA-1374: Implement Leader Election using Lease Blob in Azure
Pawas Chhokra [Wed, 9 Aug 2017 01:26:50 +0000 (18:26 -0700)] 
SAMZA-1374: Implement Leader Election using Lease Blob in Azure

PR 1: AzureClient + AzureConfig
PR 2: LeaseBlobManager
PR 3: BlobUtils + JobModelBundle
PR 4: TableUtils + ProcessorEntity
**PR 5: AzureLeaderElector** (current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>, Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #259 from PawasChhokra/LeaderElection

3 months agoSAMZA-1381: Create Utility Class for interacting with Azure Table Storage
Pawas Chhokra [Tue, 8 Aug 2017 00:17:33 +0000 (17:17 -0700)] 
SAMZA-1381: Create Utility Class for interacting with Azure Table Storage

PR 1: AzureClient + AzureConfig
PR 2: LeaseBlobManager
PR 3: BlobUtils + JobModelBundle
**PR 4: TableUtils + ProcessorEntity** (Current PR)

Author: PawasChhokra <Jaimatadi1$>
Author: PawasChhokra <pawas2703@gmail.com>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #258 from PawasChhokra/TableUtils