samza.git
4 hours agoSAMZA-1907: Add metrics to monitor watermarks master
xinyuiscool [Tue, 25 Sep 2018 18:20:14 +0000 (11:20 -0700)] 
SAMZA-1907: Add metrics to monitor watermarks

Add initial metric to monitor the aggregated watermark time.

Author: xinyuiscool <xiliu@linkedin.com>

Reviewers: Boris Shkolnik <boris@apache.org>

Closes #658 from xinyuiscool/SAMZA-1907

22 hours agoSAMZA-1904: Added test case in TestLocalTable for low level API
Wei Song [Tue, 25 Sep 2018 00:19:23 +0000 (17:19 -0700)] 
SAMZA-1904: Added test case in TestLocalTable for low level API

As per subject, adding a new test case that uses TaskApplication

Author: Wei Song <wsong@linkedin.com>

Reviewers: Aditya Toomula <atoomula@linkedin.com>

Closes #656 from weisong44/SAMZA-1904 and squashes the following commits:

ee3942ff [Wei Song] SAMZA-1904: Added test case in TestLocalTest for low level API
1c6a2eae [Wei Song] Merge remote-tracking branch 'upstream/master'
a6c94add [Wei Song] Merge remote-tracking branch 'upstream/master'
41299b5b [Wei Song] Merge remote-tracking branch 'upstream/master'
239a0950 [Wei Song] Merge remote-tracking branch 'upstream/master'
eca00204 [Wei Song] Merge remote-tracking branch 'upstream/master'
51562391 [Wei Song] Merge remote-tracking branch 'upstream/master'
de708f5e [Wei Song] Merge remote-tracking branch 'upstream/master'
df2f8d7b [Wei Song] Merge remote-tracking branch 'upstream/master'
f28b491d [Wei Song] Merge remote-tracking branch 'upstream/master'
4782c61d [Wei Song] Merge remote-tracking branch 'upstream/master'
0440f75f [Wei Song] Merge remote-tracking branch 'upstream/master'
aae0f380 [Wei Song] Merge remote-tracking branch 'upstream/master'
a15a7c9a [Wei Song] Merge remote-tracking branch 'upstream/master'
5cbf9af9 [Wei Song] Merge remote-tracking branch 'upstream/master'
3f7ed71f [Wei Song] Added self to committer list

4 days agoSAMZA-1898: New UI layout for the Samza website
Angela Murrell [Fri, 21 Sep 2018 18:43:51 +0000 (11:43 -0700)] 
SAMZA-1898: New UI layout for the Samza website

- This is still a work in progress!

Author: Angela Murrell <amasiell.g@gmail.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #635 from amurrell/master

5 days agoSamza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input
Aditya Toomula [Thu, 20 Sep 2018 21:22:38 +0000 (14:22 -0700)] 
Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input

This PR has the following changes:
- Let QueryTranslator take Calcite IR as input
- Include 'INSERT INTO' sql statement for Calcite plan
- Basic DSLConverter Framework with SamzaSQL dialect as an example
- Some fixes to stream-table join wrt Serde

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu <spunuru@linkedin.com>, Weiqing <wiyang@linkedin.com>

Closes #630 from atoomula/dsl3 and squashes the following commits:

93c66cee [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
21c0175b [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
15a1e9fb [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
5bf0c7e1 [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
98cd9777 [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
63a66fb1 [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
6794b512 [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
c9d434a9 [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
94e53b64 [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.
30c76ed9 [Aditya Toomula] Samza SQL: Code re-org to accomodate Samza SQL engine to take Calcite IR as input.

5 days agoSAMZA-1854: Changed caching table descriptor to take table descriptor instead of...
Wei Song [Thu, 20 Sep 2018 18:19:06 +0000 (11:19 -0700)] 
SAMZA-1854: Changed caching table descriptor to take table descriptor instead of run-time objects

As per subject, changed caching table descriptor to take table descriptor instead of run-time objects
 - Added BaseHybridTableDescriptor, which models a hybrid table that may contain other tables
 - Modified StreamApplicationDescriptorImpl to also include tables contained within a hybrid table

Author: Wei Song <wsong@linkedin.com>

Reviewers: Jagadish Venkatraman <jvenkatraman@linkedin.com>

Closes #645 from weisong44/SAMZA-1854 and squashes the following commits:

2c0d1362 [Wei Song] Updated based on review comments
dd18bbee [Wei Song] Merge branch 'master' into SAMZA-1854
a6c94add [Wei Song] Merge remote-tracking branch 'upstream/master'
41299b5b [Wei Song] Merge remote-tracking branch 'upstream/master'
239a0950 [Wei Song] Merge remote-tracking branch 'upstream/master'
a87a9b04 [Wei Song] SAMZA-1854: Changed caching table descriptor to take table descriptor instead of run-time objects
eca00204 [Wei Song] Merge remote-tracking branch 'upstream/master'
51562391 [Wei Song] Merge remote-tracking branch 'upstream/master'
de708f5e [Wei Song] Merge remote-tracking branch 'upstream/master'
df2f8d7b [Wei Song] Merge remote-tracking branch 'upstream/master'
f28b491d [Wei Song] Merge remote-tracking branch 'upstream/master'
4782c61d [Wei Song] Merge remote-tracking branch 'upstream/master'
0440f75f [Wei Song] Merge remote-tracking branch 'upstream/master'
aae0f380 [Wei Song] Merge remote-tracking branch 'upstream/master'
a15a7c9a [Wei Song] Merge remote-tracking branch 'upstream/master'
5cbf9af9 [Wei Song] Merge remote-tracking branch 'upstream/master'
3f7ed71f [Wei Song] Added self to committer list

5 days agoSAMZA-1849: Table Descriptors should take Serde at construction time so that descript...
Wei Song [Thu, 20 Sep 2018 17:17:49 +0000 (10:17 -0700)] 
SAMZA-1849: Table Descriptors should take Serde at construction time so that descriptors can be typed

Changed table descriptor to take serde in constructor, and removed withSerde() from all table descriptors.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@linkedin.com>

Closes #649 from weisong44/SAMZA-1849 and squashes the following commits:

a3ba2f70 [Wei Song] Merge branch 'master' into SAMZA-1849
41299b5b [Wei Song] Merge remote-tracking branch 'upstream/master'
e7a716c0 [Wei Song] Updated based on review comments
0601566f [Wei Song] SAMZA-1849: Table Descriptors should take Serde at construction time so that descriptors can be typed
239a0950 [Wei Song] Merge remote-tracking branch 'upstream/master'
eca00204 [Wei Song] Merge remote-tracking branch 'upstream/master'
51562391 [Wei Song] Merge remote-tracking branch 'upstream/master'
de708f5e [Wei Song] Merge remote-tracking branch 'upstream/master'
df2f8d7b [Wei Song] Merge remote-tracking branch 'upstream/master'
f28b491d [Wei Song] Merge remote-tracking branch 'upstream/master'
4782c61d [Wei Song] Merge remote-tracking branch 'upstream/master'
0440f75f [Wei Song] Merge remote-tracking branch 'upstream/master'
aae0f380 [Wei Song] Merge remote-tracking branch 'upstream/master'
a15a7c9a [Wei Song] Merge remote-tracking branch 'upstream/master'
5cbf9af9 [Wei Song] Merge remote-tracking branch 'upstream/master'
3f7ed71f [Wei Song] Added self to committer list

5 days agoSAMZA-1890: Fix the creation of MapFunction in TestExecutionPlanner.
Pawas Chhokra [Wed, 19 Sep 2018 23:34:40 +0000 (16:34 -0700)] 
SAMZA-1890: Fix the creation of MapFunction in TestExecutionPlanner.

dnishimura Kindly take a look.

Author: Pawas Chhokra <pchhokra@pchhokra-mn3.linkedin.biz>

Reviewers: Sanil Jain <snjain@linkedin.com>, Daniel Nishimura <dnishimu@linkedin.com>

Closes #648 from PawasChhokra/TestExecutionPlanner

6 days agoSAMZA-1840: Refactor TestRunner Apis to use StreamDescriptor and SystemDescriptor
Sanil Jain [Wed, 19 Sep 2018 19:21:12 +0000 (12:21 -0700)] 
SAMZA-1840: Refactor TestRunner Apis to use StreamDescriptor and SystemDescriptor

CollectionStream -> InMemoryInputDescriptor & InMemoryOutputDescriptor
CollectionStreamSystemSpec -> InMemorySystemDescriptor

Author: Sanil Jain <snjain@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>, Cameron Lee <calee@linkedin.com>

Closes #634 from Sanil15/SAMZA-1840

6 days agoSAMZA-1879: Remove deprecated containerId from ContainerModel
Cameron Lee [Wed, 19 Sep 2018 19:18:46 +0000 (12:18 -0700)] 
SAMZA-1879: Remove deprecated containerId from ContainerModel

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

Closes #639 from cameronlee314/remove_container_id

6 days agoSAMZA-1859: Zookeeper implementation of MetadataStore.
Shanthoosh Venkataraman [Wed, 19 Sep 2018 19:16:15 +0000 (12:16 -0700)] 
SAMZA-1859: Zookeeper implementation of MetadataStore.

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>, Daniel Nishumura <dnishimu@linkedin.com>

Closes #629 from shanthoosh/metadata_store_zk_impl

6 days agoSAMZA-1874: Refactor SamzaContainer and TaskInstance unit tests to make shared contex...
Cameron Lee [Wed, 19 Sep 2018 17:21:57 +0000 (10:21 -0700)] 
SAMZA-1874: Refactor SamzaContainer and TaskInstance unit tests to make shared context changes easier

This replaces https://github.com/apache/samza/pull/638, I accidentally messed up that branch.
The difference between this PR and the last review by prateekm is https://github.com/apache/samza/pull/646/commits/5d552996ac50d2a0b1dd5034a624d9417e74dc57

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #646 from cameronlee314/refactor_unit_tests_for_shared_context_new

6 days agoCleaning the docs, these create confusion for other reviewers
Sanil Jain [Wed, 19 Sep 2018 17:08:20 +0000 (10:08 -0700)] 
Cleaning the docs, these create confusion for other reviewers

bharathkk for review

Author: Sanil Jain <snjain@linkedin.com>

Reviewers: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Closes #641 from Sanil15/SAMZA-1886

7 days agoStream-table join for Samza-sql: Use SamzaSQLRelMsgSerde as value serde for repartiti...
Aditya Toomula [Mon, 17 Sep 2018 23:37:30 +0000 (16:37 -0700)] 
Stream-table join for Samza-sql: Use SamzaSQLRelMsgSerde as value serde for repartitioning the stream

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #643 from atoomula/join-serde

8 days agoSAMZA-1795: table: add common retry for IO functions
Peng Du [Mon, 17 Sep 2018 19:16:39 +0000 (12:16 -0700)] 
SAMZA-1795: table: add common retry for IO functions

Add common retry functionality to table IO functions for data stores
that do not have native retry support. We use failsafe as the retry
library.

Author: Peng Du <pdu@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #618 from pdu-mn1/retry-support

11 days agoSAMZA-1838: Make some minor improvements to ExecutionPlanner
Ahmed Abdul Hamid [Fri, 14 Sep 2018 17:51:55 +0000 (10:51 -0700)] 
SAMZA-1838: Make some minor improvements to ExecutionPlanner

prateekm vjagadish1989 whenever you get a chance. This is an initial set of mostly minor changes.

Author: Ahmed Abdul Hamid <ahabdulh@ahabdulh-mn1.linkedin.biz>

Reviewers: Jagadish<jagadish@apache.org>

Closes #623 from ahmedahamid/dev/ahabdulh/execution-planner-stylistic-improv

13 days agoSAMZA-1875: Reuse table instances in TableManager
Wei Song [Wed, 12 Sep 2018 04:25:06 +0000 (21:25 -0700)] 
SAMZA-1875: Reuse table instances in TableManager

We currently are invoking TableProvider.getTable() when TableManager.getTable(tableId) is invoked every time, this would in turn cause a new table instance being created. The assumption used so far is that end user would cache table instances, which is not always true. Therefore we should reuse table instance created earlier in TableManager.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Peng Du <pdu@linkedin.com>

Closes #636 from weisong44/fix_table_manager and squashes the following commits:

839bf76d [Wei Song] Added caching for table instances in TableManager
51562391 [Wei Song] Merge remote-tracking branch 'upstream/master'
de708f5e [Wei Song] Merge remote-tracking branch 'upstream/master'
df2f8d7b [Wei Song] Merge remote-tracking branch 'upstream/master'
f28b491d [Wei Song] Merge remote-tracking branch 'upstream/master'
4782c61d [Wei Song] Merge remote-tracking branch 'upstream/master'
0440f75f [Wei Song] Merge remote-tracking branch 'upstream/master'
aae0f380 [Wei Song] Merge remote-tracking branch 'upstream/master'
a15a7c9a [Wei Song] Merge remote-tracking branch 'upstream/master'
5cbf9af9 [Wei Song] Merge remote-tracking branch 'upstream/master'
3f7ed71f [Wei Song] Added self to committer list

2 weeks agoSAMZA-1865: Increase coordination timeouts in zookeeper integration tests.
Shanthoosh Venkataraman [Tue, 11 Sep 2018 18:39:07 +0000 (11:39 -0700)] 
SAMZA-1865: Increase coordination timeouts in zookeeper integration tests.

Author: Shanthoosh Venkataraman <svenkataraman@linkedin.com>
Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Jagadish<jagadish@apache.org>, Boris Shkolnik<boryas@apache.org>

Closes #632 from shanthoosh/fix_zk_crap

2 weeks agoSAMZA-1837: Fix TestLocalTableWithSideInputs failures.
Shanthoosh Venkataraman [Tue, 11 Sep 2018 18:12:07 +0000 (11:12 -0700)] 
SAMZA-1837: Fix TestLocalTableWithSideInputs failures.

* Fix TestRunner API config overriding.
* By default, RocksDBTableProvider enables host affinity which failed this particular test. To fix test failure, turn off host affinity through `TestRunner.overrideConfig` API.

Author: Shanthoosh Venkataraman <svenkata@linkedin.com>
Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Yi Pan <nickpan47@gmail.com>, Sanil Jain <sanil.jain15@gmail.com>

Closes #621 from shanthoosh/fix_broken_local_table_test and squashes the following commits:

1694a1b7 [Shanthoosh Venkataraman] Review comments.
f99afe00 [Shanthoosh Venkataraman] SAMZA-1837: Fix TestLocalTableWithSideInputs failures.

2 weeks agoSAMZA-1870: hdfs offset comparator to handle end of stream offset
Hai Lu [Tue, 11 Sep 2018 16:57:59 +0000 (09:57 -0700)] 
SAMZA-1870: hdfs offset comparator to handle end of stream offset

This happens particularly when using HDFS as a bootstrap stream:

org.apache.samza.SamzaException: Invalid offset for MultiFileHdfsReader: END_OF_STREAM
at org.apache.samza.system.hdfs.reader.MultiFileHdfsReader.getCurFileIndex(MultiFileHdfsReader.java:64)
at org.apache.samza.system.hdfs.HdfsSystemAdmin.offsetComparator(HdfsSystemAdmin.java:224)
at org.apache.samza.system.chooser.BootstrappingChooser.org$apache$samza$system$chooser$BootstrappingChooser$$checkOffset(BootstrappingChooser.scala:274)
at org.apache.samza.system.chooser.BootstrappingChooser.choose(BootstrappingChooser.scala:204)
at org.apache.samza.system.chooser.DefaultChooser.choose(DefaultChooser.scala:294)
at org.apache.samza.system.SystemConsumers.choose(SystemConsumers.scala:210)
at org.apache.samza.task.AsyncRunLoop.chooseEnvelope(AsyncRunLoop.java:208)
at org.apache.samza.task.AsyncRunLoop.run(AsyncRunLoop.java:156)
at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:787)
at org.apache.samza.runtime.LocalContainerRunner.run(LocalContainerRunner.java:101)
at org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:148)

Author: Hai Lu <halu@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #633 from lhaiesp/master

2 weeks agoSAMZA-1824: Handle errors from the async-NMClient when launching containers
Jagadish [Mon, 10 Sep 2018 16:48:35 +0000 (09:48 -0700)] 
SAMZA-1824: Handle errors from the async-NMClient when launching containers

- Updated internal state that tracks "pending" containers correctly
- Refactored `YarnClusterResourceManager` for testability. Add an unit test

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jake Maes<jmaes@linkedin.com>

Closes #615 from vjagadish1989/container-launch-error

2 weeks agoSAMZA-1817: Long classpath support for non-split deployments
Sanil Jain [Fri, 7 Sep 2018 22:44:15 +0000 (15:44 -0700)] 
SAMZA-1817: Long classpath support for non-split deployments

https://jira.apache.org/jira/browse/SAMZA-1817

Author: Sanil Jain <snjain@linkedin.com>

Reviewers: Shanthoosh Venkataraman <spvenkat@usc.edu>, Boris Shkolnik <boryas@apache.org>

Closes #619 from Sanil15/SAMZA-1817

2 weeks agoSAMZA-1789: unify ApplicationDescriptor and ApplicationRunner for high- and low-level...
Yi Pan (Data Infrastructure) [Fri, 7 Sep 2018 06:35:59 +0000 (23:35 -0700)] 
SAMZA-1789: unify ApplicationDescriptor and ApplicationRunner for high- and low-level APIs in YARN and standalone environment

This is the initial PR for SEP-13. High-lighted changes:
- Define StreamApplication and TaskApplication with describe(ApplicationDescriptor) API to define processing logic of a Stream application
   - the objects instantiated and registered to ApplicationDescriptor in describe() method should be serializable
- Define ApplicationRunner to have mandatory constructor parameter of ApplicationDescriptor
- Define ProcessorLifecycleListenerFactory to allow user inject local logic and instantiate local objects in the processors in an application

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>
Author: Yi Pan (Data Infrastructure) <yipan@yipan-mn1.linkedin.biz>
Author: Yi Pan (Data Infrastructure) <yipan@yipan-ld2.linkedin.biz>
Author: Prateek Maheshwari <pmaheshwari@linkedin.com>
Author: Prateek Maheshwari <prateekm@utexas.edu>
Author: prateekm <prateekm@utexas.edu>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>, Cameron Lee <calee@linkedin.com>

Closes #606 from nickpan47/app-runtime-with-processor-callbacks and squashes the following commits:

3e60d44a [Yi Pan (Data Infrastructure)] SAMZA-1789: final revision on ApplicationDescriptor and ApplicationRunner APIs
bdb5b0fc [Yi Pan (Data Infrastructure)] SAMZA-1789: ApplicationRunner and ApplicationDescriptor final revision
66af5b70 [Yi Pan (Data Infrastructure)] SAMZA-1789: addressing Cameron's review comments.
ec4bb1dc [Yi Pan (Data Infrastructure)] SAMZA-1789: merge with fix for SAMZA-1836
9c89c63d [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
91fcd73a [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
34ffda8a [Yi Pan (Data Infrastructure)] SAMZA-1789: disabling tests due to SAMZA-1836
02076c85 [Yi Pan (Data Infrastructure)] SAMZA-1789: fixed the modifier for the mandatory constructor of ApplicationRunner; Disabled three tests due to wrong configure for test systems
222abf21 [Yi Pan (Data Infrastructure)] SAMZA-1789: added a constructor to StreamProcessor to take a StreamProcessorListenerFactory
7a73992a [Yi Pan (Data Infrastructure)] SAMZA-1789: fixing checkstyle and javadoc errors
9997b98b [Yi Pan (Data Infrastructure)] SAMZA-1789: renamed all ApplicationDescriptor classes with full-spelling of Application
f4b3d43a [Yi Pan (Data Infrastructure)] SAMZA-1789: Fxing TaskApplication examples and some checkstyle errors
f2969f8d [Yi Pan (Data Infrastructure)] SAMZA-1789: fixed ApplicationDescriptor to use InputDescriptor and OutputDescriptor; addressed Prateek's comments.
f04404cc [Yi Pan (Data Infrastructure)] SAMZA-1789: move createStreams out of the loop in prepareJobs
33753f72 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
12c09af0 [Yi Pan (Data Infrastructure)] SAMZA-1789: Fix a merging error (with SAMZA-1813)
a072118d [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
e7af6932 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
8d4d3ffd [Yi Pan (Data Infrastructure)] Merge with master
055bd91e [Yi Pan (Data Infrastructure)] SAMZA-1789: fix unit test with ThreadJobFactory
247dcff4 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
1621c4d0 [Yi Pan (Data Infrastructure)] SAMZA-1789: a few more fixes to address Cameron's reviews
6e446fe6 [Yi Pan (Data Infrastructure)] SAMZA-1789: address Cameron's review comments.
4382d45d [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
3b2f04d5 [Yi Pan (Data Infrastructure)] SAMZA-1789: moved all impl classes from samza-api to samza-core.
db96da83 [Yi Pan (Data Infrastructure)] SAMZA-1789: WIP - revision to address review feedbacks.
01433717 [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
a82708bb [Yi Pan (Data Infrastructure)] SAMZA-1789: unify ApplicationDescriptor and ApplicationRunner for high- and low-level APIs in YARN and standalone environment
c4bb0dce [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-runtime-with-processor-callbacks
f20cdcda [Yi Pan (Data Infrastructure)] WIP: adding unit tests. Pending update on StreamProcessorLifecycleListener, LocalContainerRunner, and SamzaContainerListener
973eb526 [Yi Pan (Data Infrastructure)] WIP: compiles, still working on LocalContainerRunner refactor
fb1bc49e [Yi Pan (Data Infrastructure)] Merge branch 'master' into app-spec-with-app-runtime-Jul-16-18
30a4e5f0 [Yi Pan (Data Infrastructure)] WIP: application runner refactor - proto-type for SEP-13
95577b74 [Yi Pan (Data Infrastructure)] WIP: trying to figure out the two interface classes for spec: a) spec builder in init(); b) spec reader in all other lifecycle methods
42782d81 [Yi Pan (Data Infrastructure)] Merge branch 'prateek-remove-app-runner-stream-spec' into app-spec-with-app-runtime-Jul-16-18
d43e9231 [Yi Pan (Data Infrastructure)] WIP: proto-type with ApplicationRunnable and no ApplicationRunner exposed to user
f1cb8f0e [Yi Pan (Data Infrastructure)] Merge branch 'master' into single-app-api-May-21-18
7e71dc7e [Yi Pan (Data Infrastructure)] Merge with master
85619301 [Prateek Maheshwari] Merge branch 'master' into stream-spec-cleanup
7d7aa508 [Prateek Maheshwari] Updated with Cameron and Daniel's feedback.
8e6fc2da [prateekm] Remove all usages of StreamSpec and ApplicationRunner from the operator spec and impl layers.

2 weeks agoSAMZA-1821: fix the IllegalStateException complaining duplicate key when fetching...
Weiqing Yang [Tue, 4 Sep 2018 23:27:30 +0000 (16:27 -0700)] 
SAMZA-1821: fix the IllegalStateException complaining duplicate key when fetching systemStream configs

## What changes were proposed in this pull request?
This PR is to fix IllegalStateException complaining duplicate key when fetching resource configs in SamzaSqlApplicationConfig.

## How was this patch tested?
Pass the local build and current unit tests.
Added a new unit test.

Author: Weiqing Yang <wiyang@wiyang-mn1.linkedin.biz>

Reviewers: Aditya Toomula <atoomula@linkedin.com>, Srinivasulu Punuru <spunuru@linkedin.com>

Closes #616 from weiqingy/SAMZA-1821 and squashes the following commits:

e85df23f [Weiqing Yang] use distinct()
61245627 [Weiqing Yang] SAMZA-1821: fix the IllegalStateException complaining duplicate key when fetching systemStream configs

3 weeks agoSAMZA-1836: StreamManager created before ExecutionPlanner should also apply configura...
Yi Pan (Data Infrastructure) [Fri, 31 Aug 2018 22:57:58 +0000 (15:57 -0700)] 
SAMZA-1836: StreamManager created before ExecutionPlanner should also apply configuration overrides

Our integration test framework uses configuration overrides (i.e. jobs.*) to override the user system configuration set in the code (e.g. KafkaSystemDescriptor) to test systems. However, the StreamManager we created before calling ExecutionPlanner.plan() does not apply the overrides and causes failure in tests since the system was not correctly set to in-memory systems by configuration overrides.

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>, Boris S <boryas@apache.org>, Xinyu Liu <xinyu@apache.org>, Shanthoosh Venkataraman <santhoshvenkat1988@gmail.com>

Closes #620 from nickpan47/SAMZA-1836 and squashes the following commits:

a376b888 [Yi Pan (Data Infrastructure)] SAMZA-1836: StreamManager created before ExecutionPlanner should also apply the configuration overrides
35f2c0b7 [Yi Pan (Data Infrastructure)] SAMZA-1836: fixing two unit tests that should use InMemorySystemFactory instead of KafkaSystemDescriptor

3 weeks agoSAMZA-1786: Introduce metadata store abstraction.
Shanthoosh Venkataraman [Wed, 29 Aug 2018 22:38:36 +0000 (15:38 -0700)] 
SAMZA-1786: Introduce metadata store abstraction.

As a part of SEP-11, this patch adds MetadataStore interface to store task and container locality in both yarn and standalone deployment models. Please refer to SEP-11 for more details.

Few important points to note:
1. As a part of this changes, LocalityManager/TaskAsssignmentManager alone will be updated to use this interface(subsequently in upcoming future RB's other util classes will be moved to use this interface as well).
2. In an immediate followup RB, ZkMetadataStore(storing metadata information in zookeeper) will be added. It will be used in standalone to read/write locality into zookeeper(through LocalityManager & other standard util classes).
3. In future, ExecutionPlan, streamGraph and other job related metadata can be stored in any custom store through the same abstraction.

Testing:
1. Added unit tests for new classes introduced in the patch(Fixed the existing unit tests in LocalityManager/TaskAssignmentManager).
2. All the changes in the patch were validated with test jobs in samza-hello-samza(https://github.com/apache/samza-hello-samza).
3. LinkedIn testing job(maes-tests-host-affinity) was verified with these changes to validate if things work end-to-end.

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Bharath Kumarasubramanian <bkumaras@linkedin.com>, Prateek Maheshwari <pmaheshwari@apache.org>, Daniel Nishimura <dnishimu@linkedin.com>

Closes #583 from shanthoosh/metadata_store_iface

4 weeks agoSAMZA-1822: Samza 0.14.1 not correctly handling OffsetOutOfRangeException exception
Gaurav Agarwal [Mon, 27 Aug 2018 21:45:05 +0000 (14:45 -0700)] 
SAMZA-1822: Samza 0.14.1 not correctly handling OffsetOutOfRangeException exception

4 weeks agoSAMZA-1804: System and Stream Descriptors
Prateek Maheshwari [Mon, 27 Aug 2018 19:06:27 +0000 (12:06 -0700)] 
SAMZA-1804: System and Stream Descriptors

Design details: https://cwiki.apache.org/confluence/display/SAMZA/SEP-14%3A+System+and+Stream+Descriptors

Author: Prateek Maheshwari <pmaheshwari@apache.org>

Reviewers: Yi Pan <nickpan47@gmail.com>, Cameron Lee <calee@linkedin.com>

Closes #603 from prateekm/stream-descriptor

4 weeks agoSAMZA-1826: Fix unit test failure in table tests
Yi Pan (Data Infrastructure) [Mon, 27 Aug 2018 01:51:12 +0000 (18:51 -0700)] 
SAMZA-1826: Fix unit test failure in table tests

The way that we verify the join counts in PageViewToProfileJoinFunction across multiple table tests (i.e. TestLocalTable, TestRemoteTable, TestLocalTableWithSideInputs) creates some conflicts in the static counter map and triggers test failure in certain sequence of ordering of tests. Fixing it by requiring explicit name of the join functions in tests and register / verify by unique op names.

Author: Yi Pan (Data Infrastructure) <nickpan47@gmail.com>

Closes #617 from nickpan47/SAMZA-1826 and squashes the following commits:

a25c484d [Yi Pan (Data Infrastructure)] SAMZA-1826: removing assertion on internal state of MapFunction in integration tests
566f13c5 [Yi Pan (Data Infrastructure)] SAMZA-1826: Fix unit test failure in table tests

4 weeks agoSAMZA-1825: Fix for the test failure
Srinivasulu Punuru [Fri, 24 Aug 2018 18:27:19 +0000 (11:27 -0700)] 
SAMZA-1825: Fix for the test failure

4 weeks agoSAMZA-1820: Support for all the calcite timestamp functions
Srinivasulu Punuru [Fri, 24 Aug 2018 00:04:29 +0000 (17:04 -0700)] 
SAMZA-1820: Support for all the calcite timestamp functions

The DataContextImpl that we are passing to Calcite supports just the current_timestamp. This change adds support for all the other timestamp functions.
Timestamp functions like MONTH(DATE) needs support for EXTRACT function. Adding that to the operator table.

Author: Srinivasulu Punuru <spunuru@linkedin.com>

Reviewers: Aditya Toomula <atoomula@linkedin.com>

Closes #614 from srinipunuru/time.1

4 weeks agoSAMZA-1812: Added configuration for changelog to local store backed tables
Wei Song [Thu, 23 Aug 2018 23:33:16 +0000 (16:33 -0700)] 
SAMZA-1812: Added configuration for changelog to local store backed tables

Currently changelog for tables can be supported implicitly by adding user defined configuration, to be more user friendly we want to expose it though table API. This is applicable to RocksDB and in memory table implementation.
 - By default, changelog is enabled with auto-generated name <job-name><job-id>-table<table-id>
 - Added ability to disable changelog
 - Added ability to set changelog stream name, by default
 - Added validation of changelog stream name, and modify to confirm to Kafka topic name spec
 - Added configuration for replication factor
 - Disable changelog when user enables side input

Author: Wei Song <wsong@linkedin.com>

Reviewers: Bharath Kumarasubramanian <bkumarasubramanian@linkedin.com>

Closes #611 from weisong44/fix-changelog and squashes the following commits:

010bdfd0 [Wei Song] Updated based on review comments
9a58d566 [Wei Song] Merge branch 'master' into fix-changelog
f28b491d [Wei Song] Merge remote-tracking branch 'upstream/master'
2a4e85fa [Wei Song] Updated based on review comments
eae48aee [Wei Song] Updated based on review comments
25fe2ebc [Wei Song] Updated based on review comments
56fb7f27 [Wei Song] Merge branch 'master' into fix-changelog
4782c61d [Wei Song] Merge remote-tracking branch 'upstream/master'
721d261b [Wei Song] Added more unit tests to TestRocksDbTableDescriptor
83b7c2d5 [Wei Song] Merge branch 'master' into fix-changelog
0440f75f [Wei Song] Merge remote-tracking branch 'upstream/master'
9ec54788 [Wei Song] SAMZA-1812: Added configuration for changelog to local store backed tables
aae0f380 [Wei Song] Merge remote-tracking branch 'upstream/master'
a15a7c9a [Wei Song] Merge remote-tracking branch 'upstream/master'
5cbf9af9 [Wei Song] Merge remote-tracking branch 'upstream/master'
3f7ed71f [Wei Song] Added self to committer list

4 weeks agoString format params fix.
Boris S [Thu, 23 Aug 2018 00:51:16 +0000 (17:51 -0700)] 
String format params fix.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Bharath Kumarasubramanian <codin.martial@gmail.com>, Prateek Maheshwari <pmaheshwari@apache.org>

Closes #613 from sborya/StringFormatParams

5 weeks agoSAMZA-1813: ApplicationRunner should use Planner generated configs for StreamManager
Prateek Maheshwari [Tue, 21 Aug 2018 17:59:18 +0000 (10:59 -0700)] 
SAMZA-1813: ApplicationRunner should use Planner generated configs for StreamManager

Author: Prateek Maheshwari <pmaheshwari@apache.org>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>, Bharath Kumarasubramanian <bkumaras@linkedin.com>

Closes #612 from prateekm/stream-manager

5 weeks agoSAMZA-1733: Adding comments, adding emptyness check to MetricsSnapshotReporter message
rmatharu@linkedin.com [Sat, 18 Aug 2018 01:41:09 +0000 (18:41 -0700)] 
SAMZA-1733: Adding comments, adding emptyness check to MetricsSnapshotReporter message

* Added comments for MetricsConfig
* Added simple emptyness check for MetricsSnapshotReporter

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Prateek M<pmaheshw@linkedin.com>

Closes #610 from rmatharu/bugfix-emptymessage

5 weeks agoSAMZA-1733: Bugfix: Moving to use string for type info, to avoid ClassNotFound on...
rmatharu@linkedin.com [Thu, 16 Aug 2018 00:41:25 +0000 (17:41 -0700)] 
SAMZA-1733: Bugfix: Moving to use string for type info, to avoid ClassNotFound on deserial

Exception type was serialized as a Class. However in case of Exceptions defined in non-samza-related or user-defined libs, the ExceptionEvent fails to be serialized.

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #609 from rmatharu/stringclasstype

5 weeks agoSAMZA-1809: Add more logging to intermediate topic partition count inference
bharathkk [Wed, 15 Aug 2018 22:18:57 +0000 (15:18 -0700)] 
SAMZA-1809: Add more logging to intermediate topic partition count inference

Author: bharathkk <codin.martial@gmail.com>

Reviewers: Ahmed Abdul Hamid <ahabdulh@linkedin.com>

Closes #607 from bharathkk/execution-planner-log-fix

5 weeks agoSAMZA-1733: Create diagnostic topic if diagnostics enabled
rmatharu@linkedin.com [Wed, 15 Aug 2018 17:55:38 +0000 (10:55 -0700)] 
SAMZA-1733: Create diagnostic topic if diagnostics enabled

* Current SnapshotReporter semantics are to specify stream as <SYS-NAME>.<STREAM-NAME>

* We create topic in JobRunner so that AM can also emit metrics (if desired).

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Cameron Lee <calee@linkedin.com>

Closes #602 from rmatharu/topiccreate

6 weeks agoSAMZA-1752: Pass full config to the IO resolver
Srinivasulu Punuru [Fri, 10 Aug 2018 23:45:12 +0000 (16:45 -0700)] 
SAMZA-1752: Pass full config to the IO resolver

SQL IO Resolver needs full configs so that it can filter out the configs specific to the source that the SQL application is interested in. This change provides the IO resolver with the full config.

Author: Srinivasulu Punuru <spunuru@linkedin.com>

Reviewers: Aditya Toomula <atoomula@linkedin.com>

Closes #557 from srinipunuru/fullconfig.1

6 weeks agoSAMZA-1806: Allow `task.broadcast.inputs` to be set to empty string
Abhishek Shivanna [Fri, 10 Aug 2018 22:55:36 +0000 (15:55 -0700)] 
SAMZA-1806: Allow `task.broadcast.inputs` to be set to empty string

This fix addresses the issue that getBroadcastSystemStreamPartitions
in TaskConfigJava throws an IllegalArgumentException exception when
`task.broadcast.inputs` is set to an empty string.

Author: Abhishek Shivanna <abhisheks91@gmail.com>

Reviewers: Bharath K <bkumarasubramanian@linkedin.com>

Closes #604 from abhishekshivanna/master

6 weeks agoSAMZA-1788: Add LocationIdProvider abstraction.
Shanthoosh Venkataraman [Fri, 10 Aug 2018 21:26:42 +0000 (14:26 -0700)] 
SAMZA-1788: Add LocationIdProvider abstraction.

Currently in standalone, by default hostName of the standalone processor is used as LocationId. However, for containerized environments like azure cloud, kubernetes this defaulting does not work. Standalone processors can be launched from different kubernetes container on a physical machine(where each kubernetes container has different locatliyID than other kubernetes container within same machine).

To solve this problem, we introduce locationID abstraction to allow users to plugin a uniqueId identifying the execution environment of the processor.

In containerized environments, LocationId is a composite key of multiple fields: (sliceId, containerId, hostname) By default hostname will be used as LocationId(if not configured by the user).

All the processors of an application registered from an locationID should be able to share(read/write) their local state stores. Any custom LocationIdProvider is expected to honor this contract when generating the locationID.

This patch is part of SEP-11. Please refer to it for more details.

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Jagadish<jagadish@apache.org>

Closes #585 from shanthoosh/add_location_id_interface

6 weeks agoSAMZA-1763: Add async methods to Table API
Peng Du [Fri, 10 Aug 2018 18:21:12 +0000 (11:21 -0700)] 
SAMZA-1763: Add async methods to Table API

Currently, Table API only has blocking/sync methods which limit the
throughput of remote tables. This change adds async methods to the API
to enable high throughput remote table accesses through usage of async
IO. The new methods are added to ReadableTable and ReadWriteTable. A
high level summary of the change is below:

- added async methods to table ReadableTable and ReadWriteTable.
- added async methods to TableRead/WriteFunction
- CompletableFuture is used for the async abstraction
- CachingTable are updated to support async methods
- added default impls for sync methods backed by async in table functions
- added helper class, Throttler/AsyncHelper to ease table development
- fixed existing test cases with table implementations
- added more thorough unit tests to RemoteTable CRUD methods

Additionally remove explicit check of config entries for remote table in
TestTableDescriptorsProvider since there is already a test case on
RemoteTableDescriptor.

Author: Peng Du <pdu@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>, Wei <wsong@linkedin.com>

Closes #593 from pdu-mn1/async-table-api-futures

6 weeks agoSAMZA-1803: Make InMemoryManager system aware
Bharath Kumarasubramanian [Thu, 9 Aug 2018 17:49:51 +0000 (10:49 -0700)] 
SAMZA-1803: Make InMemoryManager system aware

- Fix getSystemStreamMetadata in InMemoryManager to filter based on the system name on top of stream names

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Xinyu Liu <xinyuiscool@github.com>

Closes #601 from bharathkk/in-memory-fix

6 weeks agoSAMZA-1733: Defaulting to non-null serde, in case none is specified
rmatharu@linkedin.com [Wed, 8 Aug 2018 00:43:15 +0000 (17:43 -0700)] 
SAMZA-1733: Defaulting to non-null serde, in case none is specified

Currently, if no system or stream serde is specified the SnapshotReporter simply crashes
because a MetricsSnapshot cannot be serialized using a
ByteArraySerializer (what it defaults to).

This change ensures that in this case the MetricsSnapshotReporter defaults to a valid constructor.

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #596 from rmatharu/defaultserde

6 weeks agoFix for transient error failure -- concurrent evictions causing incorrectness
rmatharu@linkedin.com [Wed, 8 Aug 2018 00:39:04 +0000 (17:39 -0700)] 
Fix for transient error failure -- concurrent evictions causing incorrectness

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #599 from rmatharu/bugfix

7 weeks agoSAMZA-1802: Enable host affinity when RocksDB is present
Wei Song [Tue, 7 Aug 2018 17:13:01 +0000 (10:13 -0700)] 
SAMZA-1802: Enable host affinity when RocksDB is present

We should enable host affinity when RocksDB table is present, this should be done in RocksDB table provider.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@linkedin.com>

Closes #600 from weisong44/add-host-affinity and squashes the following commits:

78e1b84a [Wei Song] Enable host affinity in RocksDB table provider
a15a7c9a [Wei Song] Merge remote-tracking branch 'upstream/master'
5cbf9af9 [Wei Song] Merge remote-tracking branch 'upstream/master'
3f7ed71f [Wei Song] Added self to committer list

7 weeks agoy
bharathkk [Sat, 4 Aug 2018 00:21:33 +0000 (17:21 -0700)] 
y

- Fixed serde issues during persisted offsets locally
- Added unit tests for TaskSideInputStorageManager
- Added integration tests for table integration

Author: bharathkk <codin.martial@gmail.com>
Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #594 from bharathkk/side-input-tests

7 weeks agoSAMZA-1769: Remote app runner status
Boris Shkolnik [Fri, 3 Aug 2018 18:21:43 +0000 (11:21 -0700)] 
SAMZA-1769: Remote app runner status

Call for 'status' or 'kill' does not require Execution plan calculation.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyuiscool@github.com>

Closes #597 from sborya/RemoteAppRunnerStatus and squashes the following commits:

7e1feea0 [Boris S] retry
1c0b3e4f [Boris S] checkstyle
e8d8d517 [Boris S] skipp graph planner for app status command
88f85595 [Boris S] Merge branch 'master' of https://github.com/apache/samza
0edf343b [Boris S] Merge branch 'master' of https://github.com/apache/samza
67e611ee [Boris S] Merge branch 'master' of https://github.com/apache/samza
dd39d089 [Boris S] Merge branch 'master' of https://github.com/apache/samza
1ad58d43 [Boris S] Merge branch 'master' of https://github.com/apache/samza
06b1ac36 [Boris Shkolnik] Merge branch 'master' of https://github.com/sborya/samza
5e6f5fb5 [Boris Shkolnik] Merge branch 'master' of https://github.com/apache/samza
010fa168 [Boris S] Merge branch 'master' of https://github.com/apache/samza
bbffb79b [Boris S] Merge branch 'master' of https://github.com/apache/samza
d4620d66 [Boris S] Merge branch 'master' of https://github.com/apache/samza
410ce78b [Boris S] Merge branch 'master' of https://github.com/apache/samza
a31a7aa2 [Boris Shkolnik] reduce debugging from info to debug in KafkaCheckpointManager.java

7 weeks agoSAMZA-1790: LocalContainerRunner should not extend AbstractApplicationRunner.
Prateek Maheshwari [Thu, 2 Aug 2018 13:55:54 +0000 (06:55 -0700)] 
SAMZA-1790: LocalContainerRunner should not extend AbstractApplicationRunner.

LocalContainerRunner is the launcher for the process running SamzaContainer in YARN. It extends the AbstractApplicationRunner since the container was using ApplicationRunner#getStreamSpec to create StreamSpecs from config to create the High Level API DAG. It doesn't implement any of the other APIs from the ApplicationRunner.

With SAMZA-1659 and SAMZA-1745, SamzaContainer no longer needs access to StreamSpec to create and execute the High Level API DAG. We can now clean up the LocalContainerRunner implementation so that it doesn't need to implement the ApplicationRunner interface.

Author: Prateek Maheshwari <pmaheshwari@apache.org>

Reviewers: Jagadish Venkatraman <vjagadis1989@gmail.com>, Yi Pan <nickpan47@gmail.com>

Closes #586 from prateekm/lcr-cleanup

7 weeks agoSAMZA-1733: Adding filter mechanism to MetricsSnapshotReporter
rmatharu@linkedin.com [Thu, 2 Aug 2018 01:40:27 +0000 (18:40 -0700)] 
SAMZA-1733: Adding filter mechanism to MetricsSnapshotReporter

This regex based filter mechanism allow for blacklisting of metrics being reporter by the Snapshot Reporter, by specifying the blacklist in config.

* Backwards compatible -- no config => reports everything.
* Regex based, e.g.,
metrics.reporter.snapshot.blacklist=
.*
or
^(?!.*?(?:SamzaContainerMetrics|TaskInstanceMetrics)).*$

to report only container metrics and task instance metrics.

* Uses a hashset to ensure blacklisted metric are matched against the regex only once.

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #591 from rmatharu/filter

7 weeks agoSAMZA-1759: Stream Assert utilities for low level and high level api for TestFramework
Sanil Jain [Thu, 2 Aug 2018 00:18:21 +0000 (17:18 -0700)] 
SAMZA-1759: Stream Assert utilities for low level and high level api for TestFramework

Adding utilities and corresponding test for low and high level api

Author: Sanil Jain <snjain@linkedin.com>

Reviewers: Shanthoosh Venkataraman <spvenkat@usc.edu>

Closes #568 from Sanil15/SAMZA-1759 and squashes the following commits:

a4861089 [Sanil Jain] Reverting back travis increase for wait time
876a3a58 [Sanil Jain] Increase travis timeout
9e6482b1 [Sanil Jain] Fixing travis build, removing unused imports
526244e8 [Sanil Jain] Merge branch 'master' into SAMZA-1759
9f489acf [Sanil Jain] Moving tests that use MessageStreamAssert to same package name in test folder to use package private
a93e5a14 [Sanil Jain] Marking collection transient to ensure newer api changes work
5e6d3ed1 [Sanil Jain] Making MessageStreamAssert package private
a5a521cc [Sanil Jain] Splitting operator assertions outside StreamAssert to MessageStreamAssert, addressing review, renaming utils
d1e64180 [Sanil Jain] Cleaning unused imports
ff218ff7 [Sanil Jain] Removing contains method for operator level assertios for high level api
c5768772 [Sanil Jain] Merge branch 'SAMZA-1759' of https://github.com/Sanil15/samza into SAMZA-1759
c69d1bbb [Sanil Jain] StreamAssert Utilities for Low level and High Level Api, Adding More Test for Low Level api for testing multiple partitions and in mulithreaded mode
e3c8e2a5 [Sanil Jain] StreamAssert Utilities for Low level and High Level Api, Adding More Test for Low Level api for testing multiple partitions and in mulithreaded mode

7 weeks agoSAMZA-1796: PassthroughJobCoordinator doesn't create changelog streams
xinyuiscool [Wed, 1 Aug 2018 23:12:06 +0000 (16:12 -0700)] 
SAMZA-1796: PassthroughJobCoordinator doesn't create changelog streams

Currently only the ClusterBasedJobCoordinator and ZkJobCoordinator are creating changelog streams. The Passthrough one should also do it.

Author: xinyuiscool <xiliu@linkedin.com>

Reviewers: Bharath K <bharathkk@gmail.com>

Closes #595 from xinyuiscool/SAMZA-1796

7 weeks agoSAMZA-1794: setting application acl in launch context
Hai Lu [Wed, 1 Aug 2018 22:20:10 +0000 (15:20 -0700)] 
SAMZA-1794: setting application acl in launch context

Currently we don't set application acl for container launch context. See https://hadoop.apache.org/docs/r2.6.4/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setApplicationACLs(java.util.Map)

This could potentially cause problem if samza job is running on a secured YARN cluster. Say user A submits the job, then by default only user A can view the log and the status of the job. Even worse case is that user A submits the job through some proxy account, then even user A herself/himself couldn't access to logs/status of the application.

We need to make some changes for the YARN application submission to set application acls in launch context as configured.

Author: Hai Lu <halu@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #592 from lhaiesp/master

8 weeks agoSAMZA-1768: Handle corrupted OFFSET file
Xinyu Liu [Mon, 30 Jul 2018 18:46:39 +0000 (11:46 -0700)] 
SAMZA-1768: Handle corrupted OFFSET file

This patch addresses the following tickets:

SAMZA-1778: SIGSEGV when reading properties (metrics) on a closed RocksDB store
SAMZA-1777: Logged store OFFSET file write during flush should be atomic
SAMZA-1768: Handle corrupted OFFSET file elegantly

Author: xinyuiscool <xiliu@linkedin.com>

Reviewers: Prateek M <prateek@apache.org>

Closes #588 from xinyuiscool/SAMZA-1768

8 weeks agoSAMZA-1785: add retry logic in eventhubs system consumer for non transient error
Hai Lu [Mon, 30 Jul 2018 16:28:16 +0000 (09:28 -0700)] 
SAMZA-1785: add retry logic in eventhubs system consumer for non transient error

Implement a retry logic in EH system consumer because of lack of nurse job on azure and lack of retry logic in samza standlone.

The retry logic can be tuned through config to control max retry count allowed within a certain time window (sliding window).

Author: Hai Lu <halu@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #587 from lhaiesp/master

8 weeks agoSAMZA-1791: Fixing problem with Serde of StackTraceElement, was causing test failure
rmatharu@linkedin.com [Sat, 28 Jul 2018 01:29:33 +0000 (18:29 -0700)] 
SAMZA-1791: Fixing problem with Serde of StackTraceElement, was causing test failure

Root cause of the failing test (after constructor param addition) is that java's StackTraceElement does not serialize/deserialize properly in all cases.

Counter example:

StackTraceElement s1 = new StackTraceElement("a", "b", null, 10);
byte[] b = new ObjectMapper().writeValueAsString(s1).getBytes("UTF-8");
StackTraceElement s2 = new ObjectMapper().readValue(b, StackTraceElement.class);
System.out.println(s1.equals(s2));// prints false

In reality, the null fileName for StackTraceElement **occurrs** for java's internal method calls
such as org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:50), sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Shanthoosh Venkatraman <svenkata@linkedin.com>

Closes #590 from rmatharu/testfix

8 weeks agoSAMZA-1733: Populating ListGauge metric using DiagnosticsAppender for exceptions
Ray Matharu [Fri, 27 Jul 2018 18:53:41 +0000 (11:53 -0700)] 
SAMZA-1733: Populating ListGauge metric using DiagnosticsAppender for exceptions

This PR shows how the ListGauge can be used to emit exceptions using a DiagnosticsAppender.
1. DiagnosticsAppender is enabled using a config (diagnostics.appender.enable)
2. DiagnosticsAppender adds exception-events to a listgauge which is a samza container metric
2. This ListGauge uses a time-and-count based eviction policy, so that exception-events are not emitted to Kafka(SnapshotReporter) forever.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>, Jagadish Venkatraman <jvenkatr1989@gmail.com>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

Closes #543 from rayman7718/diagnosticsappender

8 weeks agoSAMZA-1745: Remove all usages of StreamSpec and ApplicationRunner from the operator...
Prateek Maheshwari [Fri, 27 Jul 2018 18:24:00 +0000 (11:24 -0700)] 
SAMZA-1745: Remove all usages of StreamSpec and ApplicationRunner from the operator spec and impl layers.

This PR is a pre-requisite for adding support for user-provided SystemDescriptors and StreamDescriptors to the High Level API.

It removes all usages of StreamSpec and ApplicationRunner from the OperatorSpec and OperatorImpl layers. DAG specification (StreamGraphSpec, OperatorSpecs) now only relies on logical streamIds (and in future, will use the user-provided StreamDescriptors). DAG execution (i.e., StreamOperatorTask, OperatorImpls) now only relies on logical streamIds and their corresponding SystemStreams, which are obtained using StreamConfig in OperatorImplGraph.

After this change, StreamSpec can be thought of as the API between StreamManager and SystemAdmins for creating and validating streams. Ideally ExecutionPlanner shouldn't rely on StreamSpec either, but it currently does so extensively, so I'll leave that refactor for later.

Additional changes:
1. ApplicationRunner is no longer responsible for creating/returning StreamSpec instances. Instances can be created directly using the StreamSpec constructors, or by using one of the util methods in the new StreamUtil class.

2. StreamSpec class no longer tracks the isBroadcast and isBounded status for streams.
The former was being used for communicating broadcast status from the StreamGraphSpec to the planner so that it could write the broadcast input configurations. This is now done using a separate Set of broadcast streamIds in StreamGraphSpec.
The latter was being set by the ApplicationRunner based on a config, and then passed to the planner so that it could write the bounded input configs. This was redundant, so I removed it.

Author: Prateek Maheshwari <pmaheshwari@apache.org>
Author: Prateek Maheshwari <prateekm@utexas.edu>
Author: Prateek Maheshwari <pmaheshwari@linkedin.com>
Author: prateekm <prateekm@utexas.edu>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>, Yi Pan <nickpan47@gmail.com>, Cameron Lee <calee@linkedin.com>

Closes #552 from prateekm/stream-spec-cleanup

2 months agoHandle end of stream in BootstrapChooser
Bharath Kumarasubramanian [Fri, 27 Jul 2018 00:17:51 +0000 (17:17 -0700)] 
Handle end of stream in BootstrapChooser

Handle end of stream envelopes in bootstrap chooser and don't invoke check offsets since offset of end of stream are not comparable.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #582 from bharathkk/end-of-stream

2 months agoNO-JIRA: Change instances of "is catched up" to "is caught up"
Tom Butterwith [Thu, 26 Jul 2018 18:18:04 +0000 (11:18 -0700)] 
NO-JIRA: Change instances of "is catched up" to "is caught up"

Replaces any use of "catched" with "caught"

Author: Tom Butterwith <thomas.butterwith@skyscanner.net>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

This patch had conflicts when merged, resolved by
Committer: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #573 from tbutterwith/NO-TICKET/Fix-grammer-when-job-catches-up

2 months agoSAMZA-1733: Adding containerID to metric header
rmatharu@linkedin.com [Wed, 25 Jul 2018 22:03:02 +0000 (15:03 -0700)] 
SAMZA-1733: Adding containerID to metric header

Adding containerID to MetricsHeader (published by MetricsSnapshotReporter).
It is populated using the value set for the env variable ShellCommandConfig.ENV_EXECUTION_ENV_CONTAINER_ID

Author: rmatharu@linkedin.com <rmatharu@linkedin.com>

Reviewers: Yi Pan<nickpan47@gmail.com>

Closes #572 from rmatharu/containerid

2 months agoSAMZA-1677: Make httpcore and httpclient dependencies consistent
Cameron Lee [Wed, 25 Jul 2018 18:19:11 +0000 (11:19 -0700)] 
SAMZA-1677: Make httpcore and httpclient dependencies consistent

Samza currently depends on httpclient 4.5.2 and httpcore 4.4.5. However, httpclient 4.5.2 also has a direct dependency on httpcore 4.4.4, which is not backwards compatible with httpcore 4.4.5 since some classes were removed (e.g. ThreadSafe/NotThreadSafe annotation classes).

Although this does not currently cause any direct build problems, there may be cases where this conflict introduces transitive dependency conflicts. In addition, this inconsistency can cause confusion in future development if those libraries need to be used.

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

Closes #534 from cameronlee314/httpcore

2 months agoSide inputs: Bug fixes
Bharath Kumarasubramanian [Wed, 25 Jul 2018 05:17:25 +0000 (22:17 -0700)] 
Side inputs: Bug fixes

 - Use the correct regex in Util#getSystemStreamFromNameOrId
 - Handle end of stream during dispatch of message to side input
processor
 - Fix validation to check for presence of side input processor for a
given store instead of looking up the side input processor factory.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #580 from bharathkk/side-input-bugs

2 months agoSAMZA-1784: Update committers.md
Wei Song [Wed, 25 Jul 2018 05:12:41 +0000 (22:12 -0700)] 
SAMZA-1784: Update committers.md

Added "Wei Song" to committer list

Author: Wei Song <wsong@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #579 from weisong44/committer

2 months agoSAMZA-1779: Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
Aditya Toomula [Tue, 24 Jul 2018 21:25:19 +0000 (14:25 -0700)] 
SAMZA-1779: Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #575 from atoomula/bytes1 and squashes the following commits:

855a03d7 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
df4886d8 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
80268fc1 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types

2 months agoSAMZA-1782: Making getTableSpecs API in TableConfigGenerator util class public
Aditya Toomula [Tue, 24 Jul 2018 19:28:55 +0000 (12:28 -0700)] 
SAMZA-1782: Making getTableSpecs API in TableConfigGenerator util class public

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #577 from atoomula/table3

2 months agoSAMZA-1773: Side inputs for local stores
Bharath Kumarasubramanian [Tue, 24 Jul 2018 16:01:09 +0000 (09:01 -0700)] 
SAMZA-1773: Side inputs for local stores

prateekm vjagadish
Please take a look.

I will update the PR with the unit tests for SideInputStorageManager and the util functions.

Author: Bharath Kumarasubramanian <bkumaras@linkedin.com>
Author: Prateek Maheshwari <pmaheshwari@apache.org>
Author: Prateek Maheshwari <pmaheshw@pmaheshw-mn2.linkedin.biz>

Reviewers: Cameron Lee <calee@linkedin.com>, Jagadish Venkatraman <vjagadish1989@gmail.com>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

Closes #570 from bharathkk/side-input-v3

2 months agoSAMZA-1781: Minor: Disable flaky samza-yarn test. Fix tracked in
Jagadish [Tue, 24 Jul 2018 01:35:20 +0000 (18:35 -0700)] 
SAMZA-1781: Minor: Disable flaky samza-yarn test. Fix tracked in

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #576 from vjagadish1989/flakytest-1

2 months agoSAMZA-1775: add some delay before renew under transient EH exception
Hai Lu [Fri, 20 Jul 2018 03:23:30 +0000 (20:23 -0700)] 
SAMZA-1775: add some delay before renew under transient EH exception

There is no delay at all before we renew the partition. This sometimes lead to spam in the log for the following messages:

Received transient exception from EH client. Renew partition receiver for ssp ...

Author: Hai Lu <halu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #571 from lhaiesp/master

2 months agoSAMZA-1774: Support table API in low level
Aditya Toomula [Thu, 19 Jul 2018 22:06:27 +0000 (15:06 -0700)] 
SAMZA-1774: Support table API in low level

Code changes to support table in low level API.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srini P<spunuru@linkedin.com>

Closes #556 from atoomula/table1

2 months agoSAMZA-1755: Fix JsonRelConverter to recursively convert relRecords.
Aditya Toomula [Fri, 13 Jul 2018 22:26:34 +0000 (15:26 -0700)] 
SAMZA-1755: Fix JsonRelConverter to recursively convert relRecords.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srini P<spunuru@linkedin.com>

Closes #558 from atoomula/sql1

2 months agoSAMZA-1730: Adding state valiations in StreamProcessor before any lifecycle operation...
Shanthoosh Venkataraman [Tue, 10 Jul 2018 23:34:58 +0000 (16:34 -0700)] 
SAMZA-1730: Adding state valiations in StreamProcessor before any lifecycle operation and group coordination.

Author: Shanthoosh Venkataraman <santhoshvenkat1988@gmail.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #535 from shanthoosh/abced

2 months agoSAMZA-1761: Reduce runTime of TestZkUtils test from 40s to 800ms.
Shanthoosh Venkataraman [Mon, 9 Jul 2018 19:29:14 +0000 (12:29 -0700)] 
SAMZA-1761: Reduce runTime of TestZkUtils test from 40s to 800ms.

Author: Shanthoosh Venkataraman <santhoshvenkat1988@gmail.com>

Reviewers: Prateek <pmaheshw@linkedin.com>

Closes #565 from shanthoosh/SAMZA-1761

2 months agoSAMZA-1735: Adding ListGaugeMBean, to enable MBean validation
Ray Matharu [Fri, 29 Jun 2018 23:40:21 +0000 (16:40 -0700)] 
SAMZA-1735: Adding ListGaugeMBean, to enable MBean validation

Adding ListGaugeMBean, to enable MBean validation.
Tested with LocalContainerRunner and YARN job.

JIRA SAMZA-1733/ SAMZA-1735.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Cameron Lee <calee@linkedin.com>

Closes #567 from rmatharu/ListGaugeMBean

2 months agoSAMZA-1758: Configuring a timeout for TestRunner to execute the SamzaJob
sanil15 [Thu, 28 Jun 2018 22:48:28 +0000 (15:48 -0700)] 
SAMZA-1758: Configuring a timeout for TestRunner to execute the SamzaJob

Author: sanil15 <sanil.jain15@gmail.com>
Author: Sanil Jain <snjain@linkedin.com>

Reviewers: Bharath Kumarasubramanian <codin.martial@gmail.com>

Closes #563 from Sanil15/SAMZA-1758 and squashes the following commits:

5b198a0f [Sanil Jain] Fixing a bug for preconditions
6d2fd334 [Sanil Jain] Adressing Review
1eb3847c [Sanil Jain] Removing explicit handling of TimeoutException and adding more docs
0a9689c9 [sanil15] Addressing Review, moving tests from SamzaFailureTests, improving doc, adding validation
b79b5628 [sanil15] Using ExceptionUtils to get full stack trace, adding more docs
dd816ff8 [sanil15] Addressing review, using waitForFinish(timeout) to configure a timeout for TestRunner, adding some Failure tests
903c1162 [sanil15] Configuring a timeout for TestRunner to execute the SamzaJob

2 months agoSAMZA-1738: Merge in some minor additions from Linkedin branch
Cameron Lee [Thu, 28 Jun 2018 16:10:44 +0000 (09:10 -0700)] 
SAMZA-1738: Merge in some minor additions from Linkedin branch

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes #549 from cameronlee314/sync_li_trunk

2 months agoSAMZA-1762: Fix Memory link in the Timer Registry Map
xinyuiscool [Tue, 26 Jun 2018 21:35:00 +0000 (14:35 -0700)] 
SAMZA-1762: Fix Memory link in the Timer Registry Map

Found a memory leak in the SystemTimerScheduler which does not remove the timers from scheduledFutures after the timers are fired. This caused memory problem for Samza jobs using TimerFn feature. This patch fixes this issue.

Author: xinyuiscool <xiliu@linkedin.com>

Reviewers: Boris S <boris@apache.org>

Closes #566 from xinyuiscool/SAMZA-1762

3 months agoSAMZA-1753: Added timestamp to Incoming message envelope.
Boris S [Mon, 25 Jun 2018 20:08:50 +0000 (13:08 -0700)] 
SAMZA-1753: Added timestamp to Incoming message envelope.

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: xinyuiscool@apache.org

Closes #559 from sborya/kafkaTS

3 months agoMinor: Changing tests which use resource files to use Class.getResource instead of...
Cameron Lee [Sat, 23 Jun 2018 01:18:06 +0000 (18:18 -0700)] 
Minor: Changing tests which use resource files to use Class.getResource instead of File to get the path to the resource; without this change, these tests fail when run in the IDE

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #562 from cameronlee314/get_resource

3 months agoSAMZA-1718: Simplify management of Zookeeper coordination state
Jagadish [Sat, 23 Jun 2018 01:06:51 +0000 (18:06 -0700)] 
SAMZA-1718: Simplify management of Zookeeper coordination state

1. Currently coordination related state is spread across several Zookeeper classes. There are also back-and-forth flows that exist between the ZkJobCoordinator, ZkControllerImpl, ZkControllerListener and ZkLeaderElector. This PR nukes un-necessary interfaces (and their implementation classes), simplifies state management and unifies state in the ZkJobCoordinator class.

2. Clearly defined life-cycle hooks on events:
- Protocol validations happen once during the lifecycle of a StreamProcessor (instead of each new session)
- New subscriptions to listeners happen at each a new Zk session

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek M<pmaheshw@linkedin.com>

Closes #525 from vjagadish/zk-simplify

3 months agoFixed failed unit tests in TestCachingTable
Wei Song [Sat, 23 Jun 2018 00:29:29 +0000 (17:29 -0700)] 
Fixed failed unit tests in TestCachingTable

This is due to the recent refactoring of table metrics, for some reason running build locally didn't catch these failed tests.

Author: Wei Song <wsong@wsong-mn2.linkedin.biz>
Author: Cameron Lee <calee@linkedin.com>
Author: Jagadish <jvenkatraman@linkedin.com>
Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>, Cameron L<calee@linkedin.com>

Closes #560 from weisong44/table-metrics-fix

3 months agoSAMZA-1748: Standalone failure tests.
Shanthoosh Venkataraman [Fri, 22 Jun 2018 22:53:43 +0000 (15:53 -0700)] 
SAMZA-1748: Standalone failure tests.

In the standalone model, a processor can leave and join the group at any point in time.  This processor reshuffle is referred to as rebalancing which results in task(work) redistribution amongst other available, live processors in the group.

Processor rebalancing in existing standalone integration tests(junit tests) is accomplished through clean shutdown of the processors. However, in real production scenarios, processor rebalancing is triggered through unclean shutdown and full garbage collection(GC) of the processors.

As a part of this patch to cover those scenarios, the following integration tests are added.

1. Force killing the leader processor of the group.
2. Force killing a single follower in the group.
3. Force killing multiple followers in the group.
4. Force killing the leader and a follower in the  group.
5. Suspending and resuming the leader of the group.

Since existing standalone integration tests cover event consumption/production after the re-balancing phase, these new tests will just test the coordination. We'll iterate on this initial suite and add tests whenever necessary.

Author: Shanthoosh Venkataraman <santhoshvenkat1988@gmail.com>

Reviewers: Jagadish V <vjagadish1989@apache.org>

Closes #554 from shanthoosh/standalone_failure_tests

3 months agoSAMZA-1756: System exit calls in ApplicationRunnerMain break ProcessJob and cause...
Cameron Lee [Fri, 22 Jun 2018 22:46:14 +0000 (15:46 -0700)] 
SAMZA-1756: System exit calls in ApplicationRunnerMain break ProcessJob and cause unit tests to get skipped

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #561 from cameronlee314/app_runner_main_exit

3 months agoSAMZA-1648: Integration Test Framework & Collection Stream Impl
sanil15 [Fri, 22 Jun 2018 22:19:50 +0000 (15:19 -0700)] 
SAMZA-1648: Integration Test Framework & Collection Stream Impl

This patch provides the following:
- TestRunner: Tesing Wrapper to run Samza job
- CollectionStream: Acts as a stream descriptor for in memory collections
- CollectionStreamSystem: System associated with a Collection
- StreamUtils: Utilities over streams
- Sample example of tests

Link to SEP: https://cwiki.apache.org/confluence/display/SAMZA/SEP-12%3A+Integration+Test+Framework

Author: sanil15 <sanil.jain15@gmail.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #501 from Sanil15/SAMZA-1648

3 months agoSAMZA-1726: Isolate InMemorySystemFactory to run separately per job
sanil15 [Fri, 22 Jun 2018 19:44:57 +0000 (12:44 -0700)] 
SAMZA-1726: Isolate InMemorySystemFactory to run separately per job

Tested by running the corresponding integration and unit tests

Author: sanil15 <sanil.jain15@gmail.com>

Reviewers: Xinyu Liu <xinyuliu.us@gmail.com>

Closes #532 from Sanil15/SAMZA-1726

3 months agoSAMZA-1751: Refactored metrics for table API
Wei Song [Fri, 15 Jun 2018 23:27:45 +0000 (16:27 -0700)] 
SAMZA-1751: Refactored metrics for table API

Refactored metrics for table API
 - Added TableMetricsUtil that encapsulates required parameters, maintains naming consistency and simplifies metrics creation API for tables.
 - Added metrics to local table
 - Maintained consistency between local, remote and caching table

Author: Wei Song <wsong@wsong-mn2.linkedin.biz>

Reviewers: Peng Du<pdu@linkedin.com>

Closes #555 from weisong44/table-metrics

3 months agoSAMZA-1747: Add metric to measure effectiveness of host-affinity
Jagadish [Thu, 14 Jun 2018 21:04:53 +0000 (14:04 -0700)] 
SAMZA-1747: Add metric to measure effectiveness of host-affinity

We require visibility into how effectively host-affinity performs. The goal is to help easily answer the following questions.
- How effectively is YARN matching my preferred-host requests
- When does Samza fallback to abandoning locality and issuing any-host requests?

design doc: https://docs.google.com/document/d/1oeNKDnG4JIGT2846us-jpnGW_RUjMjPIKDeEUlE_-jg/edit#

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek M<pmaheshw@linkedin.com>

Closes #553 from vjagadish1989/hostaffinity-metrics

3 months agoAdd a new ListGauge metric-type
Ray Matharu [Thu, 14 Jun 2018 04:01:53 +0000 (21:01 -0700)] 
Add a new ListGauge metric-type

This PR introduces a ListGauge type,
A subsequent PR: https://github.com/apache/samza/pull/543 shows how this issued in conjunction with a diagnostics appender for error-tracking and dumpting to kafka via SnapshotReporter.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>, Cameron Lee <calee@linkedin.com>

Closes #541 from rayman7718/listgauge

3 months agoSAMZA-1709: Use lazy creation for SystemAdmins in ApplicationRunners
Cameron Lee [Wed, 13 Jun 2018 23:12:44 +0000 (16:12 -0700)] 
SAMZA-1709: Use lazy creation for SystemAdmins in ApplicationRunners

An instance of SystemAdmins is created when instantiating any AbstractApplicationRunner, but the SystemAdmins is only actually needed for some of the methods for some of the runners. For example, LocalApplicationRunner.kill does not need SystemAdmins, and LocalContainerRunner does not need SystemAdmins for anything.
Doing lazy instantiation allows us to more easily manage the SystemAdmins lifecycle, since it removes the need to add lifecycle hooks for the ApplicationRunner.
This also fixes the lifecycle management for SystemAdmins in ApplicationRunners.

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #512 from cameronlee314/runnner_system_admins

3 months agoSAMZA-1670: When fetching a newest offset for a partition, also prefetch and cache...
Cameron Lee [Wed, 13 Jun 2018 23:04:02 +0000 (16:04 -0700)] 
SAMZA-1670: When fetching a newest offset for a partition, also prefetch and cache the newest offsets for other partitions on the container

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #520 from cameronlee314/partition_metadata

3 months agoSAMZA-1740: Moving SamzaSqlRelRecord to samza-api as it is needed to be used in UDFs
Aditya Toomula [Mon, 11 Jun 2018 17:49:45 +0000 (10:49 -0700)] 
SAMZA-1740: Moving SamzaSqlRelRecord to samza-api as it is needed to be used in UDFs

Please see description in the ticket. Also, implementing equals and hashCode methods for SamzaSqlRelRecord and SamzaSqlRelMessage.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srini P<spunuru@linkedin.com>, Jagadish <jagadish@apache.org>

Closes #545 from atoomula/sql

3 months agoFix loadDefaults error msg.
Boris S [Fri, 8 Jun 2018 21:25:27 +0000 (14:25 -0700)] 
Fix loadDefaults error msg.

SAMZA-1744

Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Xinyu Liu <xinyu@apache.org>

Closes #551 from sborya/loadDefaultsErrorMsg and squashes the following commits:

c3003ad2 [Boris S] Fixed error message
0edf343b [Boris S] Merge branch 'master' of https://github.com/apache/samza
67e611ee [Boris S] Merge branch 'master' of https://github.com/apache/samza
dd39d089 [Boris S] Merge branch 'master' of https://github.com/apache/samza
1ad58d43 [Boris S] Merge branch 'master' of https://github.com/apache/samza
06b1ac36 [Boris Shkolnik] Merge branch 'master' of https://github.com/sborya/samza
5e6f5fb5 [Boris Shkolnik] Merge branch 'master' of https://github.com/apache/samza
010fa168 [Boris S] Merge branch 'master' of https://github.com/apache/samza
bbffb79b [Boris S] Merge branch 'master' of https://github.com/apache/samza
d4620d66 [Boris S] Merge branch 'master' of https://github.com/apache/samza
410ce78b [Boris S] Merge branch 'master' of https://github.com/apache/samza
a31a7aa2 [Boris Shkolnik] reduce debugging from info to debug in KafkaCheckpointManager.java

3 months agoSAMZA-1742: Add metrics reporter parameter to LocalApplicationRunner constructor.
Daniel Nishimura [Fri, 8 Jun 2018 21:02:53 +0000 (14:02 -0700)] 
SAMZA-1742: Add metrics reporter parameter to LocalApplicationRunner constructor.

vjagadish1989 this has already been reviewed and approved by you and cameronlee314 internally. Please approve here. Thanks!

Author: Daniel Nishimura <dnishimura@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #550 from dnishimura/samza-1742-localapplicationrunner-custom-metrics

3 months agoSAMZA-1741: fix issue that EH consumer taking too long to shutdown
Hai Lu [Fri, 8 Jun 2018 17:05:36 +0000 (10:05 -0700)] 
SAMZA-1741: fix issue that EH consumer taking too long to shutdown

1.  lower the shutdown timeout from 1 min to 15 seconds
2. make sure EventHubManagers are shutdown in parallel
3. print a thread dump when we do fail during shutdown

Author: Hai Lu <halu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>, Prateek <pmaheshw@linkedin.com>

Closes #548 from lhaiesp/master

3 months agoConvert a put to to a delete operation in ReadWriteTable and TableWriteFunction when...
Wei Song [Thu, 7 Jun 2018 23:51:30 +0000 (16:51 -0700)] 
Convert a put to to a delete operation in ReadWriteTable and TableWriteFunction when input value is null

Currently, the behavior of putting a null value is inconsistent: it is a delete for RocksDB, and not supported in in-memory store, and on a case-by-case basis for remote tables. It is desirable to unify the behavior. Furthermore, it eases the writing of a change captured stream to a table. A change captured stream contains typically 3 types of events: INSERT, UPDATE and DELETE, and they need to be applied properly when written to a table to produce a correct snapshot. In a change captured stream the payload of a DELETE event is typically is null, and this would result in a delete operation to a table in sendTo() operator.

Author: Wei Song <wsong@wsong-mn2.linkedin.biz>

Closes #547 from weisong44/table-fix

3 months agoMinor: KafkaConfig should treat empty changelog name as no changelog.
Prateek Maheshwari [Wed, 6 Jun 2018 18:00:06 +0000 (11:00 -0700)] 
Minor: KafkaConfig should treat empty changelog name as no changelog.

If a store changelog stream name is empty, treat is as a non-changelogged store instead of throwing an exception.

Author: Prateek Maheshwari <pmaheshwari@linkedin.com>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #546 from prateekm/kafka-changelog

3 months agoSAMZA-929: Set initialDelay in tokenRenewExecutor schedule to 0
Apoorva Sareen [Tue, 5 Jun 2018 18:44:49 +0000 (11:44 -0700)] 
SAMZA-929: Set initialDelay in tokenRenewExecutor schedule to 0

Changed initialDelay in tokenRenewExecutor scheduler to 0 so that it can re-login using the keytab as soon as the application master container starts.  This way even if application master restarts after the delegation token in launcher context has expired, it will be able to use the new token to launch other containers.

Author: Apoorva Sareen <asareen@MacBook-Pro-2.local>

Reviewers: Jagadish<jagadish@apache.org>

Closes #544 from apoorva121/master

3 months agoSAMZA-1736: Add counters and timers for batch get/put/delete operations in KeyValueSt...
Prateek Maheshwari [Fri, 1 Jun 2018 20:17:37 +0000 (13:17 -0700)] 
SAMZA-1736: Add counters and timers for batch get/put/delete operations in KeyValueStorageEngine

Author: Prateek Maheshwari <pmaheshwari@linkedin.com>

Reviewers: Cameron Lee <calee@linkedin.com>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

Closes #539 from prateekm/store-metrics

3 months agoSAMZA-1719: Add caching support to table-api
Peng Du [Thu, 31 May 2018 17:43:30 +0000 (10:43 -0700)] 
SAMZA-1719: Add caching support to table-api

This change adds caching support for Samza tables. This is especially
useful for remote table where the accesses can have high latency for
applications that can tolerate staleness. Caching is added in the form
of a composite table that wraps the actual table and a cache. We reuse
the ReadWriteTable interface for the cache. A simple Guava cache-based
table is provided in this change.

Original PR was inadvertently closed: https://github.com/apache/samza/pull/524

Author: Peng Du <pdu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>, Wei <wsong@linkedin.com>

Closes #531 from pdu-mn1/table-cache

3 months ago[MINOR] Add logging for EventHubs configs
Jagadish [Wed, 30 May 2018 22:51:08 +0000 (15:51 -0700)] 
[MINOR] Add logging for EventHubs configs

prateekm for review

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Prateek M<pmaheshw@linkedin.com>

Closes #540 from vjagadish1989/eh-logging

3 months agoSAMZA-1724: Guarantee exit from ApplicationRunnerMain during deploys
Prateek Maheshwari [Wed, 30 May 2018 17:37:23 +0000 (10:37 -0700)] 
SAMZA-1724: Guarantee exit from ApplicationRunnerMain during deploys

Author: Prateek Maheshwari <pmaheshw@LM-LSNSCDW6508.linkedin.biz>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #538 from prateekm/process-exit