samza.git
14 hours agoSAMZA-1536; Adding docs for Kinesis consumer master
Aditya Toomula [Thu, 14 Dec 2017 23:33:29 +0000 (15:33 -0800)] 
SAMZA-1536; Adding docs for Kinesis consumer

Author: Aditya Toomula <atoomula@atoomula-ld1.linkedin.biz>

Reviewers: Jagadish <jagadish@apache.org>

Closes #384 from atoomula/kinesis-docs

17 hours agoAdded document for table API to feature preview
Wei Song [Thu, 14 Dec 2017 20:04:34 +0000 (12:04 -0800)] 
Added document for table API to feature preview

Added document for table API to feature preview
 - Brief description of table
 - sendTo() operator for table
 - join() operator for stream-table-join

Author: Wei Song <wsong@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #387 from weisong44/table-api-14

2 days agoSAMZA-1437; Added Eventhub producer and consumer docs
Daniel Chen [Wed, 13 Dec 2017 05:47:38 +0000 (21:47 -0800)] 
SAMZA-1437; Added Eventhub producer and consumer docs

Still need to add tutorials, and configs to configurations table

vjagadish1989  for review

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #382 from dxichen/eventhub-docs

2 days agoSAMZA-1502; Make AllSspToSingleTaskGrouper work with Yarn and ZK JobCoordinator
Aditya Toomula [Wed, 13 Dec 2017 05:41:37 +0000 (21:41 -0800)] 
SAMZA-1502; Make AllSspToSingleTaskGrouper work with Yarn and ZK JobCoordinator

Sending a fresh review as I lost the earlier diffs. This is the new approach that we discussed by adding the processor list in the config and passing it to grouper.

Author: Aditya Toomula <atoomula@atoomula-ld1.linkedin.biz>

Reviewers: Yi Pan <nickpan47@gmail.com>, Shanthoosh V <svenkataraman@linkedin.com>

Closes #383 from atoomula/samza

2 days agoSAMZA-1512: Documentation on the multi-stage batch processing
Xinyu Liu [Wed, 13 Dec 2017 01:07:07 +0000 (17:07 -0800)] 
SAMZA-1512: Documentation on the multi-stage batch processing

Add overview documentation to explain how partitionBy(), checkpoint and state works in batch. Also organized the existing hdfs consumer/producer docs into the same hadoop folder under documentation.

Author: xinyuiscool <xinyuliu.us@gmail.com>

Reviewers: Jake Maes <jmakes@gmail.com>

Closes #381 from xinyuiscool/SAMZA-1512

2 days agoSAMZA-1534: Fix the visualization in job graph with the new PartitionBy Op
Xinyu Liu [Tue, 12 Dec 2017 20:16:39 +0000 (12:16 -0800)] 
SAMZA-1534: Fix the visualization in job graph with the new PartitionBy Op

Seems the stream and the partitionBy op has the same id. So in rendering I added the stream as the id for the node. Also resolved the run.id collision issue.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #385 from xinyuiscool/SAMZA-1534

2 days agoDocumentation for Samza SQL
Srinivasulu Punuru [Tue, 12 Dec 2017 18:46:01 +0000 (10:46 -0800)] 
Documentation for Samza SQL

**Samza tools** :
Contains the following tools that can be used for playing with Samza sql or any other samza job.

1. Generate kafka events : Tool used to generate avro serialized kafka events
2. Event hub consumer : Tool used to consume events from event hubs topic. This can be used if the samza job writes events to event hubs.
3. Samza sql console : Tool used to execute SQL using samza sql.

Adds documentation on how to use Samza SQL on a local machine and on a yarn environment and their associated Samza tooling.

https://issues.apache.org/jira/browse/SAMZA-1526

Author: Srinivasulu Punuru <spunuru@linkedin.com>

Reviewers: Yi Pan<nickpan47@gmail.com>, Jagadish<jagadish@apachehe.org>

Closes #374 from srinipunuru/docs.1

3 days agoSAMZA-1532; Eventhub connector fix
Daniel Chen [Tue, 12 Dec 2017 00:45:44 +0000 (16:45 -0800)] 
SAMZA-1532; Eventhub connector fix

Key fixes vjagadish1989 lhaiesp srinipunuru
- Switched Producer source vs destination assumptions in `send`, `register`
- Check `OME.key` if `OME.partitionId` is null for to get partitionId
- Upcoming offset changed the `END_OF_STREAM` rather than `newestOffset` + 1, eventHub returns an error if the offset does not exist in the system
- Made the NewestOffset+1 as upcoming offset, consumer checks if the offset is valid on startup
- Differentiated between streamNames and streamIds in configs, consumer, producer
- Checkpoint table named after job name
- Checkpoint prints better message for invalid key on write

QOL
- How to ignore integration tests
- Improved logging

EDIT:
- Also added Round Robin producer partitioning

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #377 from dxichen/eventhub-connector-fix

9 days agoInitial version of Table API
Wei Song [Tue, 5 Dec 2017 21:23:44 +0000 (13:23 -0800)] 
Initial version of Table API

Initial version of table API, it includes
 - Core table API (Table, TableDescriptor, TableSpec)
 - Local table implementation for in-memory and RocksDb
 - The writeTo() and stream-table join operators

nickpan47 xinyuiscool prateekm could you help review?

Author: Wei Song <wsong@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>, Christopher Pettitt <cpettitt@linkedin.com>

Closes #349 from weisong44/table-api-14

10 days agoFix compile errors with scala 2.12 and update release notes to use check all
Bharath Kumarasubramanian [Tue, 5 Dec 2017 01:11:03 +0000 (17:11 -0800)] 
Fix compile errors with scala 2.12 and update release notes to use check all

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #378 from bharathkk/master

10 days agoUpdated Serde related documentation and error messages for High Level API
Prateek Maheshwari [Mon, 4 Dec 2017 22:21:21 +0000 (14:21 -0800)] 
Updated Serde related documentation and error messages for High Level API

Updated and clarified the documentation and error messages related to Serdes for Input/Output/PartitionBy streams.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #376 from prateekm/documentation-cleanup

13 days agoSAMZA-1506: Fix for robust ContainerHeartbeatMonitor exception handling.
Abhishek Shivanna [Fri, 1 Dec 2017 23:23:12 +0000 (15:23 -0800)] 
SAMZA-1506: Fix for robust ContainerHeartbeatMonitor exception handling.

The Fix includes the following changes:
- Catch all exceptions inside the heartbeat thread and not just
  IOException.
- A time based force kill when the heartbeat is invalid,
  this makes the monitor immune to threads that may keep the
  container stuck in the shutdown sequence. When the timeout
  occurs, a System.exit(1) is called.
- Increasing number of retries for failed heartbeats from 3 to 6.
  This prevents short intermittent network failurs from causing the
  containers to be invalidated.

Author: Abhishek Shivanna <abhisheks91@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #375 from abhishekshivanna/container-heartbeat

13 days agoSAMZA-1519: Add release notes to website documentation
navina [Fri, 1 Dec 2017 16:57:06 +0000 (08:57 -0800)] 
SAMZA-1519: Add release notes to website documentation

Adding a versioned page for release/upgrade notes. We can start this process from the next major version release, aka 0.14.0.

Please update this page as and when you add new features/configs/API or deprecate features/configs/API. Basically, anything that can be useful for Samza users trying to upgrade.

Note: `site.version` is not necessarily same as samza release version. For now, I am using it as a placeholder. Hopefully, with the next generation of our website, it will be better defined.

Author: navina <navina@apache.org>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #301 from navina/versioning

2 weeks agoSAMZA-1518: Add CPU utilization and file descriptor count to JvmMetrics
Jacob Maes [Fri, 1 Dec 2017 01:37:40 +0000 (17:37 -0800)] 
SAMZA-1518: Add CPU utilization and file descriptor count to JvmMetrics

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Jagadish <jvenkatr@linkedin.com>, Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #372 from jmakes/samza-1518

2 weeks agoSAMZA-1516: Another round of issues found by BEAM tests
xiliu [Wed, 29 Nov 2017 19:34:28 +0000 (11:34 -0800)] 
SAMZA-1516: Another round of issues found by BEAM tests

A couple of more fixes: 1. fix a bug of identifying input streams for an operator. 2. for partitionBy, set the partitionKey to 0L when key is null.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #370 from xinyuiscool/SAMZA-1516

2 weeks agoSAMZA-1514: Prevent changelog stream names with empty strings.
Daniel Nishimura [Wed, 29 Nov 2017 00:19:45 +0000 (16:19 -0800)] 
SAMZA-1514: Prevent changelog stream names with empty strings.

This fix prevents changelog stream names with empty or whitespace-only strings.

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #371 from dnishimura/samza-1514-empty-string-changelog

2 weeks agoSAMZA-1412 replace mockito-all with mockito-core
Fred Ji [Tue, 28 Nov 2017 22:36:24 +0000 (14:36 -0800)] 
SAMZA-1412 replace mockito-all with mockito-core

ran "./gradlew clean check" and all tests passed

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #365 from fredji97/mockito-core

2 weeks agoSAMZA-1515; Implement a consumer for Kinesis
Aditya Toomula [Tue, 28 Nov 2017 21:12:10 +0000 (13:12 -0800)] 
SAMZA-1515; Implement a consumer for Kinesis

Author: Aditya Toomula <atoomula@atoomula-ld1.linkedin.biz>

Reviewers: Jagadish<jagadish@apache.org>

Closes #368 from atoomula/kinesis

2 weeks agoSAMZA-1513; Doc updates for persistent windows, joins and serdes.
Jagadish [Tue, 28 Nov 2017 20:30:35 +0000 (12:30 -0800)] 
SAMZA-1513; Doc updates for persistent windows, joins and serdes.

jmakes prateekm nickpan47 for review.

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jake Maes<jmaes@linkedin.com>

Closes #369 from vjagadish1989/doc-updates

2 weeks agoFixing broken link between Yarn Host Affinity and Resource Localizati…
navina [Mon, 27 Nov 2017 21:49:35 +0000 (13:49 -0800)] 
Fixing broken link between Yarn Host Affinity and Resource Localizati…

Fixing broken link between Yarn Host Affinity and Resource Localization pages under Documentation

Patch needs to be back-ported to 0.13.0 website!

Author: navina <navina@apache.org>

Reviewers: Jake Maes <jmakes@gmail.com>

Closes #302 from navina/website-link-fix

3 weeks agoSAMZA-1507; Create changelog streams in Leader(ZkJobCoordinator) for stateful operators.
Shanthoosh Venkataraman [Wed, 22 Nov 2017 19:54:10 +0000 (11:54 -0800)] 
SAMZA-1507; Create changelog streams in Leader(ZkJobCoordinator) for stateful operators.

Verified with a test standalone job. Will add integration test for this as a part of fixing and reenabling standalone integration tests.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>

Closes #362 from shanthoosh/master

3 weeks agoSAMZA-1494; Flush operator state at end of stream
Jagadish [Tue, 21 Nov 2017 20:42:17 +0000 (12:42 -0800)] 
SAMZA-1494; Flush operator state at end of stream

- Propagate operator messages at endOfStream to all down-stream operators.
- Emit all pending windows when endOfStream is reached.
- Flush all state on endOfStream irrespective of auto-commit behavior.

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #366 from vjagadish1989/eos-flush

3 weeks agoSAMZA-1505: Fix CheckpointTool writing only one ssp per task
Xinyu Liu [Tue, 21 Nov 2017 19:26:55 +0000 (11:26 -0800)] 
SAMZA-1505: Fix CheckpointTool writing only one ssp per task

Currently when using CheckpointTool to write checkpoints, it only writes a checkpoint of a single ssp per task. By debugging the code, looks like the flatMap() on the checkpoint of Optional tuple(taskname -> Map(ssp -> offset)) merges the results by key taskname. This patch stores the results explicitly in a list and then groupBy() on it, which fixes the problem.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jmakes@gmail.com>

Closes #364 from xinyuiscool/SAMZA-1505

3 weeks agoSAMZA-1482: add config documentation for auto-restart/fail behavior o…
Yi Pan (Data Infrastructure) [Mon, 20 Nov 2017 19:48:14 +0000 (11:48 -0800)] 
SAMZA-1482: add config documentation for auto-restart/fail behavior o…

…n partition count changes

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #363 from nickpan47/partition-change-docsite and squashes the following commits:

5342ea0 [Yi Pan (Data Infrastructure)] SAMZA-1482: fix configuration table in documentation site
7c9f326 [Yi Pan (Data Infrastructure)] SAMZA-1482: add config documentation for auto-restart/fail behavior on partition count changes

3 weeks agoSAMZA-1479; Refactor KafkaCheckpointManager, KafkaCheckpointLogKey and their tests
Jagadish [Fri, 17 Nov 2017 22:41:05 +0000 (14:41 -0800)] 
SAMZA-1479; Refactor KafkaCheckpointManager, KafkaCheckpointLogKey and their tests

Notable changes:
* Rewrite `KafkaCheckpointLogKey` into two classes - an immutable class, and a SerDe
* Remove dependency on static setters in the `KafkaCheckpointLogKey`
* Change lifecycle of components in KafkaCheckpointManager
  - It's safe to start producers and consumers during `start` as opposed to lazy loading them during writes, and reads.
  - Initialize systemProducer and systemConsumer during construction
* Simplify logic for ignoring checkpoint validations
* Re-write checkpointManager#readLog() to use a simpler API.
* Remove unnecessary complexity after the migration from 0.8
* Remove unnecessary locking in startup, and shut-down
* Remove dependencies on SimpleConsumer configs like bufferSize, fetchSize, socketTimeout
* Refactor KafkaCheckpointManagerFactory and remove static getCheckpointSystemNameAndFactory
* Bug-fix : Register the taskName correctly (instead of using a dummy string for the taskName)

* Add unit tests to verify more checkpoint scenarios
* Consolidate unit tests into utils for creating producer, consumer and admin instances
* Convert/consolidate most long-running integration tests into unit tests

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #348 from vjagadish1989/kafka-checkpointmanager-refactor

4 weeks agoSamza SQL implementation for basic projects, filtering and UDFs
Srinivasulu Punuru [Thu, 16 Nov 2017 19:14:32 +0000 (11:14 -0800)] 
Samza SQL implementation for basic projects, filtering and UDFs

## Samza SQL implementation for basic projects, filtering and

## Design document:
https://docs.google.com/document/d/1bE-ZuPfTpntm1hT3GwQEShYDiTqU3IkxeP4-3ZcGHgU/edit?usp=sharing

Author: Srinivasulu Punuru <spunuru@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #295 from srinipunuru/samza-sql.1

4 weeks agoSAMZA-1504: Allow user to register container-level metrics
Xinyu Liu [Thu, 16 Nov 2017 18:46:05 +0000 (10:46 -0800)] 
SAMZA-1504: Allow user to register container-level metrics

This change allows user to register the metrics on the per-container basis.

Tested in beam runner and works as expected.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Prateek Maheshwari <prateekm@apache.org>

Closes #361 from xinyuiscool/SAMZA-1504

4 weeks agoSAMZA-1495: Set intermediate streams as higher priority by default
Prateek Maheshwari [Wed, 15 Nov 2017 18:45:08 +0000 (10:45 -0800)] 
SAMZA-1495: Set intermediate streams as higher priority by default

Most changes in StreamConfig are formatting fixes.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #358 from prateekm/intermediate-stream-priority

4 weeks agoSAMZA-1501: Validate operator IDs so that they don't contain special characters and...
Prateek Maheshwari [Wed, 15 Nov 2017 18:27:08 +0000 (10:27 -0800)] 
SAMZA-1501: Validate operator IDs so that they don't contain special characters and spaces

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>, Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #359 from prateekm/operator-id-validation

4 weeks agoSAMZA-1482: Restart or fail Samza jobs in YARN when detecting changes…
Yi Pan (Data Infrastructure) [Tue, 14 Nov 2017 20:45:23 +0000 (12:45 -0800)] 
SAMZA-1482: Restart or fail Samza jobs in YARN when detecting changes…

… in input topic partitions

Some high-lights of the changes:
- always instantiating StreamPartitionCountMonitor on all input system streams now
-- it is debatable whether we want to include systems that do not implement the optimized ExtendedSystemAdmin interface. We may need to configure a long partition monitor interval for this case and the case where there are tons of input topics. (Pending perf test)
- moved the instantiation of StreamPartitionCountMonitor out of JobModelManager and allow ClusterBasedJobCoordinator associate a callback method directly to the monitor
- allow callbacks to set different application status code before throwing exception to shutdown the job

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>, Jagadish <jagadish@apache.org>

Closes #351 from nickpan47/restart-on-partition-change and squashes the following commits:

8d04cd6 [Yi Pan (Data Infrastructure)] SAMZA-1482: restart or fail the job when input topic partition count changes
ee3fa65 [Yi Pan (Data Infrastructure)] SAMZA-1482: Restart or fail Samza jobs in YARN when detecting changes in input topic partitions

4 weeks agoSAMZA-1474: Bump up rocksdb version to 5.7.3 to include licensing changes
Bharath Kumarasubramanian [Tue, 14 Nov 2017 19:00:34 +0000 (11:00 -0800)] 
SAMZA-1474: Bump up rocksdb version to 5.7.3 to include licensing changes

You can find more details on the bug fixes and API changes [here](https://github.com/facebook/rocksdb/releases). I upgraded to 5.7.3 since 5.8.0 has a regression [KAFKA-6100](https://issues.apache.org/jira/browse/KAFKA-6100)

All of our tests passed locally. I will monitor travis for failures.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Boris Shkolnik <boryas@gmail.com>

Closes #344 from bharathkk/master

4 weeks agoSAMZA-1487: Disable Flaky Zk Integration tests.
Shanthoosh Venkataraman [Tue, 14 Nov 2017 18:54:32 +0000 (10:54 -0800)] 
SAMZA-1487: Disable Flaky Zk Integration tests.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #355 from shanthoosh/master

4 weeks agoSAMZA-1490: Fix TestRepartitionJoinWindowApp
Dong Lin [Tue, 14 Nov 2017 18:48:50 +0000 (10:48 -0800)] 
SAMZA-1490: Fix TestRepartitionJoinWindowApp

Author: Dong Lin <lindong28@gmail.com>

Reviewers: Prateek Maheshwari <prateekm@apache.org>

Closes #356 from lindong28/SAMZA-1490

4 weeks agoSAMZA-1492: Add exception counter to LocalStoreMonitor.
Shanthoosh Venkataraman [Tue, 14 Nov 2017 15:56:13 +0000 (07:56 -0800)] 
SAMZA-1492: Add exception counter to LocalStoreMonitor.

Add storeGCExceptionCounter to LocalStoreMonitor(as a part of LocalStoreMonitorMetrics) to enable alerts setup in an production environment.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #357 from shanthoosh/local_store_monitor_metrics_1

5 weeks agoSAMZA-1480: TaskStorageManager improperly initializes changelog consu…
Jacob Maes [Thu, 9 Nov 2017 19:54:32 +0000 (11:54 -0800)] 
SAMZA-1480: TaskStorageManager improperly initializes changelog consu…

…mer position when restoring a store from disk

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #350 from jmakes/samza-1480

5 weeks agoIgnore java fatal error log files from git and rat
Prateek Maheshwari [Thu, 9 Nov 2017 18:45:04 +0000 (10:45 -0800)] 
Ignore java fatal error log files from git and rat

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #354 from prateekm/rat-exclude-hserr

5 weeks agoSAMZA-1487: Disable Flaky Zk Integration tests.
Shanthoosh Venkataraman [Thu, 9 Nov 2017 18:24:53 +0000 (10:24 -0800)] 
SAMZA-1487: Disable Flaky Zk Integration tests.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Prateek Maheshwari <prateekm@apache.org>

Closes #353 from shanthoosh/master

5 weeks agoFix test failures caused by PR345
xiliu [Wed, 8 Nov 2017 02:00:03 +0000 (18:00 -0800)] 
Fix test failures caused by PR345

5 weeks agoSAMZA-1486; Checkpoint manager implementation with Azure Table
Daniel Chen [Wed, 8 Nov 2017 01:25:58 +0000 (17:25 -0800)] 
SAMZA-1486; Checkpoint manager implementation with Azure Table

vjagadish1989

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #341 from dxichen/azure-checkpoint-manager

5 weeks agoSAMZA-1437; Added tests, make eventData creation extensible
Daniel Chen [Wed, 8 Nov 2017 01:15:28 +0000 (17:15 -0800)] 
SAMZA-1437; Added tests, make eventData creation extensible

vjagadish1989 Required for internal custom eventData creation

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #352 from dxichen/create-event-data-extensible

5 weeks agoSAMZA-1477: Fix issues found by BEAM tests
Xinyu Liu [Tue, 7 Nov 2017 20:16:15 +0000 (12:16 -0800)] 
SAMZA-1477: Fix issues found by BEAM tests

A bunch of issues were found by BEAM tests, which includes:

1) WatermarkFunction needs to be able to return output after processWatermark()
2) control message doesn't implement the equals() and hashcode()
3) Some kafka system related code is not scala 2.10 compatible for tests.

Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Prateek

Closes #345 from xinyuiscool/SAMZA-1477

6 weeks agoSAMZA-1476: Disable flaky TestStatefulTask.testShouldStartAndRestore
Jagadish [Tue, 31 Oct 2017 15:50:19 +0000 (08:50 -0700)] 
SAMZA-1476: Disable flaky TestStatefulTask.testShouldStartAndRestore

7 weeks agoSAMZA-1473; Fix handling of initial values for aggregating windows
Jagadish [Fri, 27 Oct 2017 06:14:41 +0000 (23:14 -0700)] 
SAMZA-1473; Fix handling of initial values for aggregating windows

- Handle initial values for windows with foldLeftFunctions configured
- Improve trace level logging

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek M< <pmaheshw@linkedin.com>

Closes #343 from vjagadish1989/aggregating-window-fix

7 weeks agoSAMZA-1438; SystemProducer, Consumer and Admin interfaces for EventHubs
Daniel Chen [Fri, 27 Oct 2017 05:26:29 +0000 (22:26 -0700)] 
SAMZA-1438; SystemProducer, Consumer and Admin interfaces for EventHubs

Author: Daniel Chen <29577458+dxichen@users.noreply.github.com>

Reviewers: Jagadish <jagadish@apache.org>, Prateek M<pmaheshw@linkedin.com>

Closes #308 from dxichen/eventHub-Connector

7 weeks agoSAMZA-1471: SystemConsumers should not poll ssp that hit end of stream
Hai Lu [Fri, 27 Oct 2017 00:48:47 +0000 (17:48 -0700)] 
SAMZA-1471: SystemConsumers should not poll ssp that hit end of stream

When SystemConsumers poll from SSPs that have hit end of stream, obviously there will be no data return and the poll will exhaust the timeout. This would cause performance issue.

Author: Hai Lu <halu@linkedin.com>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #342 from lhaiesp/master

7 weeks agoSAMZA-1472: YarnRestJobStatusProvider constructor must be public
Jacob Maes [Thu, 26 Oct 2017 01:03:11 +0000 (18:03 -0700)] 
SAMZA-1472: YarnRestJobStatusProvider constructor must be public

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>,Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #340 from jmakes/samza-1472

7 weeks agoSAMZA-1454: Globally unique and user settable IDs for stateful operators
Prateek Maheshwari [Thu, 26 Oct 2017 00:18:36 +0000 (17:18 -0700)] 
SAMZA-1454: Globally unique and user settable IDs for stateful operators

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>, Yi Pan <nickpan47@gmail.com>

Closes #324 from prateekm/operator-id-uniqueness

7 weeks agoSAMZA-1464: Flushing a closed RocksDB store causes SIGSEGVs
Prateek Maheshwari [Wed, 25 Oct 2017 23:27:52 +0000 (16:27 -0700)] 
SAMZA-1464: Flushing a closed RocksDB store causes SIGSEGVs

Made RocksDB operations check if DB is still open to avoid segfaults.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>, Xinyu Liu <xinyuiscool@gmail.com>

Closes #334 from prateekm/segfault-fix

7 weeks agoSAMZA-1470: Wrong job status returned by YarnRestJobStatusProvider wh…
Jacob Maes [Wed, 25 Oct 2017 22:18:11 +0000 (15:18 -0700)] 
SAMZA-1470: Wrong job status returned by YarnRestJobStatusProvider wh…

…en there are multiple app

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Jagadish <jvenkatr@linkedin.com>

Closes #339 from jmakes/samza-1470-4

7 weeks agoTestExecutionPlanner compilation error
Boris S [Wed, 25 Oct 2017 01:43:51 +0000 (18:43 -0700)] 
TestExecutionPlanner compilation error

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #337 from sborya/TestExecutionPlannerCompilationError

7 weeks agoSAMZA-1457: Set retention for internal streams for Batch application
Xinyu Liu [Wed, 25 Oct 2017 01:20:32 +0000 (18:20 -0700)] 
SAMZA-1457: Set retention for internal streams for Batch application

For intermediate streams, checkpoint and changelog, we need to set a short retention period for batch.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>
Author: xiliu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #328 from xinyuiscool/SAMZA-1457

7 weeks agoStreamOperatorTask does not need to be final.
Boris S [Wed, 25 Oct 2017 00:44:21 +0000 (17:44 -0700)] 
StreamOperatorTask does not need to be final.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #336 from sborya/UnmakeStreamOperatorTaskFinal and squashes the following commits:

c44ee58 [Boris S] StreamOperatorTask does not need to be final
d4620d6 [Boris S] Merge branch 'master' of https://github.com/apache/samza
410ce78 [Boris S] Merge branch 'master' of https://github.com/apache/samza
a31a7aa [Boris Shkolnik] reduce debugging from info to debug in KafkaCheckpointManager.java

8 weeks agoMinor fixes to KeyValueStore and RocksDBKeyValueStore
Prateek Maheshwari [Thu, 19 Oct 2017 18:36:02 +0000 (11:36 -0700)] 
Minor fixes to KeyValueStore and RocksDBKeyValueStore

1. Replaced extension class in KeyValueStore with default methods.
2. Fixed formatting in RocksDBKeyValueStore#openDB.
3. Now logs original RocksDBException on errors opening the db. Other minor log message cleanup.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #332 from prateekm/store-fixes

8 weeks agoSAMZA-1466: Flaky test: TestRocksDbKeyValueStore suite
Prateek Maheshwari [Thu, 19 Oct 2017 18:01:56 +0000 (11:01 -0700)] 
SAMZA-1466: Flaky test: TestRocksDbKeyValueStore suite

TestRocksDbKeyValueStore tests are failing intermittently with errors like:
```
testFlush FAILED
    org.apache.samza.SamzaException: Error opening RocksDB store dbStore at location /tmp, received the following exception from RocksDB org.rocksdb.RocksDBException: "
        at org.apache.samza.storage.kv.RocksDbKeyValueStore$.openDB(RocksDbKeyValueStore.scala:99)
        at org.apache.samza.storage.kv.TestRocksDbKeyValueStore.testFlush(TestRocksDbKeyValueStore.scala:73)
```

These happen intermittently because:
1. RocksDB throws an exception if open is called on a store that's already open.
2. A new unit test was added in PR #327 that didn't close the store.
3. Other tests in this class use the same store directory (System.getProperty("java.io.tmpdir")) and run concurrently. Any test that runs after the one in 2 above fails.

Even when we log the RocksDBException (fixed in pr #332), the exception message is malformed due to a bug in RocksDB: https://github.com/facebook/rocksdb/issues/1688. This is fixed in the latest RocksDB version (verified with 5.8.0), so the messages should be more meaningful after an upgrade. Fixed stack trace will say something like:
```
Caused by: org.rocksdb.RocksDBException: While lock file: /var/folders/1b/4nqqvf4s27sby0frjr0q5t_h0004hp/T/LOCK: No locks available
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:231)
at org.apache.samza.storage.kv.RocksDbKeyValueStore$.openDB(RocksDbKeyValueStore.scala:70)
```

Fixing just the test in the mean time.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #333 from prateekm/store-test-fix

8 weeks agoSAMZA-1465: Performance regression for KafkaCheckpointManager
Jacob Maes [Wed, 18 Oct 2017 22:30:34 +0000 (15:30 -0700)] 
SAMZA-1465: Performance regression for KafkaCheckpointManager

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>, Boris Shkolnik <boryas@apache.org>, Fred Ji <fredji97@yahoo.com>

Closes #331 from jmakes/master

8 weeks agoSAMZA-1461; Expose RocksDB properties as metrics
Janek Lasocki-Biczysko [Mon, 16 Oct 2017 23:06:28 +0000 (16:06 -0700)] 
SAMZA-1461; Expose RocksDB properties as metrics

Automatically build gauges for RocksDB properties via configuration:
`stores.<storename>.rocksdb.telemetry.list=<rocksDbProperty1>, <rocksDbProperty1>`

Author: Janek Lasocki-Biczysko <janek.lasocki-biczysko@skyscanner.net>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #327 from janeklb/jlb_rocksDBPropertiesAsMetrics

8 weeks agoSAMZA-1463 disable some flaky tests on hdfs system
Hai Lu [Mon, 16 Oct 2017 18:26:45 +0000 (11:26 -0700)] 
SAMZA-1463 disable some flaky tests on hdfs system

disable some flaky tests on hdfs system until future investigation

Author: Hai Lu <halu@linkedin.com>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #329 from lhaiesp/master

2 months agoChangeLog should not require KafkaSystemAdmin
Boris S [Fri, 13 Oct 2017 22:04:22 +0000 (15:04 -0700)] 
ChangeLog should not require KafkaSystemAdmin

Author: Boris S <boryas@apache.org>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #325 from sborya/ChangeLogRequireKafkaSystemAdmin

2 months agoSAMZA-1424 SAMZA-1425 SAMZA-1426; Support serdes and persistent state for windows
Jagadish [Fri, 13 Oct 2017 20:22:56 +0000 (13:22 -0700)] 
SAMZA-1424 SAMZA-1425 SAMZA-1426; Support serdes and persistent state for windows

Notable changes:
* Made windows durable with support for persistent recoverable stores
* New storage format to support multiple messages in windows (the previous storage format
of storing the entire message-list as a value in the store incurs significant serde overhead)
* Wire `TimeSeriesStore` with the WindowOperator implementation.

Testing:
* Existing unit tests and integration tests pass with serdes wired-up
* Will follow-up with a PR for hello-samza soon.

Note: Majority of changes are in `samza-core/src/main/java/org/apache/samza/operators/impl/WindowOperatorImpl.java` and github collapsed the diff.

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari<pmaheshw@linkedin.com>

Closes #321 from vjagadish1989/window-operator-serde

2 months agoSAMZA-1447; Swapping out CLI JobStatusProvider for REST based implementation in samza...
Abhishek Shivanna [Wed, 11 Oct 2017 20:57:20 +0000 (13:57 -0700)] 
SAMZA-1447; Swapping out CLI JobStatusProvider for REST based implementation in samza-rest

Removing the YarnCliJobStatusProvider since forking a new shell for every request on the JobsResource endpoint is resource intensive.

Author: Abhishek Shivanna <ashivanna@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #317 from abhishekshivanna/samza-rest-oom

2 months agoSAMZA-1451: Disable integration tests conditionally in build.
Shanthoosh Venkataraman [Wed, 11 Oct 2017 20:56:24 +0000 (13:56 -0700)] 
SAMZA-1451: Disable integration tests conditionally in build.

Remove runIntegrationTests gradle property added as a part of SAMZA-1355 and introduce skipIntegrationTests property(which inverts it).
If skipIntegrationTests gradle project property is enabled, execution of all tests in samza-test project will be skipped from the build.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #319 from shanthoosh/skip_integration_tests

2 months agoSAMZA-1409; add missing parts and make guideline clearer for docs/README.md
Fred Ji [Wed, 11 Oct 2017 19:09:33 +0000 (12:09 -0700)] 
SAMZA-1409; add missing parts and make guideline clearer for docs/README.md

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #288 from fredji97/master_docs_readme

2 months agoSAMZA-1453; Update README with Travis-CI badge.
Daniel Nishimura [Wed, 11 Oct 2017 19:03:28 +0000 (12:03 -0700)] 
SAMZA-1453; Update README with Travis-CI badge.

Added Travis-CI to README.md file.

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #323 from dnishimura/samza-1453-readme-travisci

2 months agoSAMZA-1452: Clean up interrupted thread bugs
Nacho Solis [Wed, 11 Oct 2017 14:49:52 +0000 (07:49 -0700)] 
SAMZA-1452: Clean up interrupted thread bugs

Call Thread.currentThread().interrupt(); when capturing InterruptedException

Author: Nacho Solis <nsolis@linkedin.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>,Jagadish <jvenkatr@linkedin.com>

Closes #322 from isolis/cleancodebugs

2 months agoSAMZA-1292: Merge operator can be no-op when there are no streams to merge
Prateek Maheshwari [Mon, 9 Oct 2017 21:48:05 +0000 (14:48 -0700)] 
SAMZA-1292: Merge operator can be no-op when there are no streams to merge

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #320 from prateekm/small-fixes

2 months agoSAMZA-272, SAMZA-1440, SAMZA-1269: Fixed thread interrupt tests in TestExponentialSle...
Prateek Maheshwari [Fri, 6 Oct 2017 23:32:06 +0000 (16:32 -0700)] 
SAMZA-272, SAMZA-1440, SAMZA-1269: Fixed thread interrupt tests in TestExponentialSleepStrategy.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #318 from prateekm/ess-test-fix

2 months agoSAMZA-1435: Changed samza-api Serde implementations from Scala to Java.
Prateek Maheshwari [Fri, 6 Oct 2017 17:47:37 +0000 (10:47 -0700)] 
SAMZA-1435: Changed samza-api Serde implementations from Scala to Java.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>

Closes #316 from prateekm/java-serdes

2 months agoSAMZA-1442: use systemAdmin.validateStream in KafkaCheckpointManager
Boris S [Thu, 5 Oct 2017 22:36:37 +0000 (15:36 -0700)] 
SAMZA-1442: use systemAdmin.validateStream in KafkaCheckpointManager

Author: Boris S <boryas@apache.org>

Reviewers: Jacob Maes <jacob.maes@gmail.com>

Closes #314 from sborya/CheckpointTopicValidation

2 months agoSAMZA-1444: Removed circular dependency b/w samza-core test and samza-kv-rocksdb...
Prateek Maheshwari [Thu, 5 Oct 2017 17:59:39 +0000 (10:59 -0700)] 
SAMZA-1444: Removed circular dependency b/w samza-core test and samza-kv-rocksdb test

Also added an implementation for KVSerde.
vjagadish1989, thanks for the TestInMemoryStore implementation!

The only usage of InternalInMemoryStore is now in WindowOperatorImpl, which will be removed as part of store wiring for the window operator.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>, Xinyu Liu <xiliu@linkedin.com>

Closes #315 from prateekm/test-store

2 months agoSAMZA-1429: Add callback success/failure metrics to async tasks
Daniel Nishimura [Thu, 5 Oct 2017 16:18:37 +0000 (09:18 -0700)] 
SAMZA-1429: Add callback success/failure metrics to async tasks

jmakes & nickpan47 please take a look when you get the chance.

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #306 from dnishimura/samza-1429-async-metrics

2 months agoSAMZA-1056: Added wiring for High Level API state stores, their serdes and changelogs.
Prateek Maheshwari [Wed, 4 Oct 2017 22:37:50 +0000 (15:37 -0700)] 
SAMZA-1056: Added wiring for High Level API state stores, their serdes and changelogs.

Provided join operator access to durable state stores.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>

Closes #309 from prateekm/operator-store-wiring

2 months agoSAMZA-1109: Updated High Level API serde impl with Yi's feedback
Prateek Maheshwari [Wed, 4 Oct 2017 22:28:37 +0000 (15:28 -0700)] 
SAMZA-1109: Updated High Level API serde impl with Yi's feedback

nickpan47 for review.

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: "Jagadish Venkatraman <jvenkatr@linkedin.com>"

Closes #310 from prateekm/serde-updates

2 months agoSAMZA-1439: Address late review feedback from SAMZA-1434
Jacob Maes [Wed, 4 Oct 2017 21:02:26 +0000 (14:02 -0700)] 
SAMZA-1439: Address late review feedback from SAMZA-1434

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #313 from jmakes/samza-1439

2 months agoSAMZA-1427; use systemFactory in checkpoint manager.
Boris S [Wed, 4 Oct 2017 17:48:58 +0000 (10:48 -0700)] 
SAMZA-1427; use systemFactory in checkpoint manager.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>
Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>, Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #299 from sborya/LiKafkaClient

2 months agoSAMZA-1431: Enable SonarCloud integration with Travis-CI.
Daniel Nishimura [Wed, 4 Oct 2017 17:39:42 +0000 (10:39 -0700)] 
SAMZA-1431: Enable SonarCloud integration with Travis-CI.

These are modifications needed to integrate Travis-CI with SonarCloud. Also, the JaCoCo plugin is enabled in gradle for code coverage reporting to SonarCloud.

For reference, here is the request to Apache Infra to enable Travis-CI: https://issues.apache.org/jira/browse/INFRA-15132

nickpan47 isolis

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #305 from dnishimura/samza-1431-sonarcloud

2 months agoAdded Prateek's code signing public key to KEYS
Prateek Maheshwari [Wed, 4 Oct 2017 01:00:04 +0000 (18:00 -0700)] 
Added Prateek's code signing public key to KEYS

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #312 from prateekm/commiters-update

2 months agoAdded self to committers list.
Prateek Maheshwari [Tue, 3 Oct 2017 23:06:33 +0000 (16:06 -0700)] 
Added self to committers list.

vjagadish1989

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Closes #311 from prateekm/commiters-update

2 months agoMerge branch '0.14.0'
Xinyu Liu [Tue, 3 Oct 2017 22:10:13 +0000 (15:10 -0700)] 
Merge branch '0.14.0'

2 months agoMerge branch 'master' into 0.14.0
Xinyu Liu [Tue, 3 Oct 2017 22:09:41 +0000 (15:09 -0700)] 
Merge branch 'master' into 0.14.0

2 months agoadded Boris as a committer
Boris Shkolnik [Tue, 3 Oct 2017 07:01:16 +0000 (00:01 -0700)] 
added Boris as a committer

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #304 from sborya/addCommitter

2 months agoSAMZA-1423; Follow-up to PR:303 - Remove version constant from TimeSeriesKey
Jagadish [Mon, 2 Oct 2017 19:54:22 +0000 (12:54 -0700)] 
SAMZA-1423; Follow-up to PR:303 -  Remove version constant from TimeSeriesKey

2 months agoSAMZA-1423; Implement time series storage for joins and windows
Jagadish [Mon, 2 Oct 2017 19:23:24 +0000 (12:23 -0700)] 
SAMZA-1423; Implement time series storage for joins and windows

Notable changes:
* New interface for storing and retrieving time series data.
* New store and serde implementation for use in windows and joins

Pending:
* Documentation, and minor clean-ups
* Wire-up of stores from ExecutionPlanner
* Usage of the store to implement various windows and joins

Author: Jagadish <jagadish@apache.org>
Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>,Xinyu Liu<xiliu@linkedin.com>

Closes #303 from vjagadish1989/window-store

2 months agoSAMZA-1434: Fix issues found in Hadoop
Xinyu Liu [Fri, 29 Sep 2017 22:05:55 +0000 (15:05 -0700)] 
SAMZA-1434: Fix issues found in Hadoop

Fix the following bugs found when running Samza on hadoop:

1. Hdfs allows output partitions to be 0 (empty folder)
2. Add null check for the changelog topic generation
3. Call getStreamSpec() instead of using streamSpec member in StreamEdge. This is due to getStreamSpec will do more transformation.
4. Bound the auto-generated intermediate topic partition by a certain count (256).

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jagadish Venkatraman <jagadish@apache.org>

Closes #307 from xinyuiscool/SAMZA-1434

2 months agoSAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl
Xinyu Liu [Fri, 29 Sep 2017 00:11:28 +0000 (17:11 -0700)] 
SAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl

This patch contains the following changes:
1. Refactor watermark and end-of-stream logic. The aggregation/handling has been moved from WatermarkManager/EndOfStreamManager to be inline inside OperatorImpl. This is for keeping the logic in one place.
2. Now subclass of OperatorImpl will override handleWatermark() to do its specific handling, such as fire trigger.
3. Add emitWatermark() in OperatorImpl so subclass can call it to emit watermark upon receiving a message or watermark.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>
Author: Xinyu Liu <xiliu@xiliu-ld.linkedin.biz>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #277 from xinyuiscool/SAMZA-1386

2 months agoSAMZA-1109; Allow setting Serdes for fluent API operators in code
Prateek Maheshwari [Thu, 28 Sep 2017 00:08:04 +0000 (17:08 -0700)] 
SAMZA-1109; Allow setting Serdes for fluent API operators in code

Author: Prateek Maheshwari <pmaheshw@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>, Jacob Maes <jmakes@apache.org>

Closes #293 from prateekm/serde-instance

2 months agoSAMZA-1417: Clear and recreate intermediate and metadata streams for batch processing
Xinyu Liu [Mon, 25 Sep 2017 21:52:42 +0000 (14:52 -0700)] 
SAMZA-1417: Clear and recreate intermediate and metadata streams for batch processing

For each run of a batch application, we need to clear the internal streams from the previous run and recreate new ones. This patch introduces the following:
1) bounded flag in StreamSpec
2) app.mode (BATCH/STREAM) in the application config. An application is a batch app iff all the input streams are bounded.
3) app.runId and use it to generate the internal topics for each run.

run.id generation is not addressed in this pr. There will be another patch to resolve it for both yarn and standalone. For now, this patch only works for yarn.

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jmakes@apache.org>

Closes #297 from xinyuiscool/SAMZA-1417

2 months agoSAMZA-1428: Fix scala 2.10 compilation issue with java 8 interface st…
Jacob Maes [Thu, 21 Sep 2017 18:57:31 +0000 (11:57 -0700)] 
SAMZA-1428: Fix scala 2.10 compilation issue with java 8 interface st…

…atic methods

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Boris Shkolnik <boryas@apache.org>

Closes #300 from jmakes/samza-1428

2 months agoSAMZA-1392: KafkaSystemProducer performance and correctness with conc…
Jacob Maes [Wed, 20 Sep 2017 19:40:47 +0000 (12:40 -0700)] 
SAMZA-1392: KafkaSystemProducer performance and correctness with conc…

…urrent sends and flushes

Author: Jacob Maes <jmaes@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshw@linkedin.com>, Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Closes #272 from jmakes/samza-1392

2 months agoSAMZA-1414: SamzaContainer log statements incorrectly expect containerId is a number
Daniel Nishimura [Tue, 19 Sep 2017 21:43:55 +0000 (14:43 -0700)] 
SAMZA-1414: SamzaContainer log statements incorrectly expect containerId is a number

Correctly log ID as a string in SamzaContainer.
jmakes

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #298 from dnishimura/samza-1414

2 months agoSAMZA-1416: Better logging around the exception where class loading failed in initial...
Daniel Nishimura [Tue, 19 Sep 2017 21:26:38 +0000 (14:26 -0700)] 
SAMZA-1416: Better logging around the exception where class loading failed in initializing the SystemFactory for a input/output system

Also added test coverage for the Util.getObj method.
nickpan47 jmakes

Author: Daniel Nishimura <dnishimura@gmail.com>

Reviewers: Jacob Maes <jmaes@linkedin.com>

Closes #296 from dnishimura/samza-1416

2 months agoSAMZA-1419; Refactor StreamProcessor tests to remove Thread.sleep()
Bharath Kumarasubramanian [Mon, 18 Sep 2017 17:32:32 +0000 (10:32 -0700)] 
SAMZA-1419; Refactor StreamProcessor tests to remove Thread.sleep()

Refactoring the tests to enable parallel execution. Also, removed unnecessary sleeping during execution. This PR will serve as a prototype to get some feedback on this testing pattern which eliminates state sharing between the tests and opens up the avenue for parallel execution.

This change brings down the test time for these 4 tests to **14 sec** compared to **44 secs**.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #291 from bharathkk/master

3 months agoSAMZA-1406: Fix potential orphaned containers problem in stand alone
Shanthoosh Venkataraman [Fri, 15 Sep 2017 04:17:36 +0000 (21:17 -0700)] 
SAMZA-1406: Fix potential orphaned containers problem in stand alone

3 months agoFix some integration tests after merging from master
Xinyu Liu [Tue, 12 Sep 2017 21:16:27 +0000 (14:16 -0700)] 
Fix some integration tests after merging from master

3 months agoMerge branch 'master' into 0.14.0
Xinyu Liu [Tue, 12 Sep 2017 18:32:36 +0000 (11:32 -0700)] 
Merge branch 'master' into 0.14.0

3 months agoSAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs
Xinyu Liu [Thu, 7 Sep 2017 23:49:20 +0000 (16:49 -0700)] 
SAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs

The patch does the following:

1) add clearStream() APi in SystemAdmin. Currently it's only supported in Kafka with broker configuring delete.topic.enable=true.

2) remove the deprecated APIs including createChangeLogStream(), validateChangelogStream() and createCoordinatorStream().

Author: Xinyu Liu <xiliu@xiliu-ld1.linkedin.biz>

Reviewers: Jake Maes <jacob.maes@gmail.com>

Closes #292 from xinyuiscool/SAMZA-1415

3 months agoSAMZA-1413: Config for CoordinationUtilsFactory class name.
Boris Shkolnik [Wed, 30 Aug 2017 23:56:28 +0000 (16:56 -0700)] 
SAMZA-1413: Config for CoordinationUtilsFactory class name.

Author: Boris Shkolnik <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@bshkolni-ld1.linkedin.biz>

Reviewers: Navina Ramesh <navina@apache.org>, Fred Ji <fji@linkedin.com>

Closes #290 from sborya/configUtilsFactoryClassName

3 months agoSAMZA-1385: Coordination utils factory with distributed lock
Boris Shkolnik [Tue, 29 Aug 2017 22:37:17 +0000 (15:37 -0700)] 
SAMZA-1385: Coordination utils factory with distributed lock

this PR includes some changes from another PR. I will re-merge it again, after the other PR is in.

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #284 from sborya/CoordinationUtilsFactory_withDistributedLock

3 months agoIntroduces CoordinationUtilsFactory to create different implementations of Coordinati...
Boris Shkolnik [Tue, 29 Aug 2017 20:23:47 +0000 (13:23 -0700)] 
Introduces CoordinationUtilsFactory to create different implementations of CoordinationUtils

Some refactoring and cleanup.

Author: Boris Shkolnik <boryas@apache.org>

Reviewers: Navina Ramesh <navina@apache.org>

Closes #282 from sborya/CoordinationUtilsFactory

3 months agoSAMZA-1410 make tests pass in different Junit versions for testPlanIdWithShuffledStre...
Fred Ji [Mon, 28 Aug 2017 21:17:32 +0000 (14:17 -0700)] 
SAMZA-1410 make tests pass in different Junit versions for testPlanIdWithShuffledStreamSpecs and testGeneratePlanIdWithDifferentStreamSpec

More details are in https://issues.apache.org/jira/browse/SAMZA-1410.

gradlew build and test passed.

Author: Fred Ji <haifeng.ji@gmail.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #289 from fredji97/assertNotEquals

3 months agoSAMZA-1404: Add warning in case of potential process starvation due to longer window...
Bharath Kumarasubramanian [Mon, 28 Aug 2017 19:27:47 +0000 (12:27 -0700)] 
SAMZA-1404: Add warning in case of potential process starvation due to longer window method

Currently, we use the average windowNs as the lower bound for window trigger time to determine if user needs to warned. We could potentially make this complicated by also including average commit ns and some delta to be more accurate.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #281 from bharathkk/master