samza.git
3 days agoSAMZA-2061: Make fixes in Samza 1.0 documentation master
pchhokra [Sat, 19 Jan 2019 00:20:07 +0000 (16:20 -0800)] 
SAMZA-2061: Make fixes in Samza 1.0 documentation

Author: pchhokra <pchhokra@linkedin.com>

Reviewers: Shanthoosh Venkataraman <spvenkat@usc.edu>

Closes #877 from PawasChhokra/documentation and squashes the following commits:

d15f2459 [pchhokra] Documentation changes
544d9aa3 [pchhokra] Documentation changes

4 days agoDiagnosticsAppender for log4j2
Ray Matharu [Thu, 17 Jan 2019 22:06:00 +0000 (14:06 -0800)] 
DiagnosticsAppender for log4j2

This PR adds a DiagnosticsAppender for log4j2.
There exists one already for log4j in samza-log4j.

Two points to note
1. The Appender, LogEvent, and Configuration APIs between log4j2 and log4j are completely different.

2. Log4j requires you to extend the AppenderSkeleton, while log4j2 requires extending the AbstractAppender.

So there was very little overlap (and very little value) in creating a base-class for the two appenders, specially given that java doesn't allow multi-inheritance.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Pawas C<pchhokra@linkedin.com>

Closes #882 from rmatharu/log4j2appender

4 days agoSAMZA-2076: RocksDbTableDescriptor should use Long type for TTL
Wei Song [Thu, 17 Jan 2019 21:40:21 +0000 (13:40 -0800)] 
SAMZA-2076: RocksDbTableDescriptor should use Long type for TTL

Samza uses millisec as config value, while in rocksDB it's defined as int32. It's currently defined as integer in RocksDbTableDescriptor, the range isn't large enough to match, and it should be of Long type.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Hai Lu <halu@linkedin.com>

Closes #887 from weisong44/SAMZA-2076

4 days agoy
Wei Song [Thu, 17 Jan 2019 19:56:05 +0000 (11:56 -0800)] 
y

Currently different features such as rate limiting, retries, etc. are implemented together in remote table, the implementation will become more and more complex and error prone as we add more functionality. It will be necessary to separate theses features into their own class/module. This is also a necessary step as we move on to add batching support.

 - Converted to a composition-based implementation
 - The remote table would only provide core functionality and basic metrics
 - Splitted ReadWriteTable into sync and async part (AsyncReadWriteTable)
 - Introduced AsyncRateLimitedTable, AsyncRetriableTable

Author: Wei Song <wsong@linkedin.com>

Reviewers: Jagadish Venkatraman <jvenkatraman@linkedin.com>

Closes #880 from weisong44/SAMZA-2066

4 days agoSAMZA-1965: SAMZA 1.0 DOCUMENTATION FOR TEST Framework
Sanil15 [Thu, 17 Jan 2019 19:17:26 +0000 (11:17 -0800)] 
SAMZA-1965: SAMZA 1.0 DOCUMENTATION FOR TEST Framework

Author: Sanil15 <sanil.jain15@gmail.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #751 from Sanil15/SAMZA-1965

5 days agoSAMZA-2056: Adding a TaskMode in the TaskModel
Ray Matharu [Thu, 17 Jan 2019 18:18:39 +0000 (10:18 -0800)] 
SAMZA-2056: Adding a TaskMode in the TaskModel

This PR adds a TaskMode (an enum) to the TaskModel.
This assignment is persisted to the metastore using a SetTaskModeMapping type.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

Closes #871 from rmatharu/taskmode

6 days agoSAMZA-2073: Do not commit the task offsets when shutting down the SamzaContainer.
Shanthoosh Venkataraman [Wed, 16 Jan 2019 18:26:07 +0000 (10:26 -0800)] 
SAMZA-2073: Do not commit the task offsets when shutting down the SamzaContainer.

SAMZA-1489 added support for committing the offsets of all the  tasks when shutting down a SamzaContainer. Other components in samza such as a CheckpointListener's  aren't developed to account for possibility of commit after the consumers are stopped.

This  unnecessarily results in a unclean shutdown of a samza standalone processor during the rebalancing phase. Here're the sample logs:
```
apache.samza.container.SamzaContainer129d533c to shutdown.
2019/01/15 02:38:09.738 INFO [SamzaContainer] [Samza StreamProcessor Container Thread-0] [hello-brooklin-task] [] Shutting down SamzaContainer.
2019/01/15 02:38:09.738 INFO [SamzaContainer] [Samza StreamProcessor Container Thread-0] [hello-brooklin-task] [] Shutting down consumer multiplexer.
2019/01/15 02:38:09.741 INFO [KafkaProducer] [hello-brooklin-task-i001-auditor] [hello-brooklin-task] [] [Producer clientId=hello-brooklin-task-i001-auditor, transactionalId=nullClosing the Kafka producer with timeoutMillis = 9223370489334886066 ms.
2019/01/15 02:38:09.748 INFO [SamzaContainer] [Samza StreamProcessor Container Thread-0] [hello-brooklin-task] [] Shutting down task instance stream tasks.
2019/01/15 02:38:09.748 INFO [SamzaContainer] [Samza StreamProcessor Container Thread-0] [hello-brooklin-task] [] Shutting down timer executor
2019/01/15 02:38:09.749 INFO [SamzaContainer] [Samza StreamProcessor Container Thread-0] [hello-brooklin-task] [] Committing offsets for all task instances
2019/01/15 02:38:09.753 ERROR [SamzaContainer] [Samza StreamProcessor Container Thread-0] [hello-brooklin-task] [] Caught exception/error while shutting down container.
java.lang.IllegalStateException: This consumer has already been closed.
 at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:1735)
 at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1214)
 at com.linkedin.kafka.liclients.consumer.LiKafkaConsumerImpl.commitOffsets(LiKafkaConsumerImpl.java:311)
 at com.linkedin.kafka.liclients.consumer.LiKafkaConsumerImpl.commitSync(LiKafkaConsumerImpl.java:282)
 at com.linkedin.brooklin.client.BaseConsumerImpl.commit(BaseConsumerImpl.java:136)
```

Since the final commit is not critical, it will be better to not do it as a part of the SamzaContainer shutdown sequence.

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #884 from shanthoosh/remove_task_commit_during_shutdown

6 days agoSAMZA-2072: Update guava to 23.0
xiliu [Wed, 16 Jan 2019 01:54:22 +0000 (17:54 -0800)] 
SAMZA-2072: Update guava to 23.0

Startpoint is relying on an old version of guava, which should be updated to 23.0 for the newer api.

Author: xiliu <xiliu@linkedin.com>

Reviewers: Hai Lu <ihaiesp@apache.org>

Closes #883 from xinyuiscool/SAMZA-2072

6 days agoSAMZA-2068: Separating container launch logic into util class
xiliu [Wed, 16 Jan 2019 00:34:27 +0000 (16:34 -0800)] 
SAMZA-2068: Separating container launch logic into util class

The container launch logic needs to be invoked for beam-runner to run beam containers. This is a small refactoring of LocalContainerRunner.java.

Author: xiliu <xiliu@linkedin.com>

Reviewers: Prateek M <prateekm@apache.org>

Closes #881 from xinyuiscool/SAMZA-2068

6 days agoSAMZA-2070: Upgrade calcite and add tests for array, map and nested records
Srinivasulu Punuru [Tue, 15 Jan 2019 22:06:04 +0000 (14:06 -0800)] 
SAMZA-2070: Upgrade calcite and add tests for array, map and nested records

Upgraded calcite version which adds support for array, map. It also adds basic support for nested records through a DOT Operator. But the DOT operator doesn't work item with its execution engine. because of couple of issues.

1. When there is a structured record, Calcite flattens the nested record through a LogicalProject using RelStructuredTypeFlattener. which generates plan like the one below. So it introduces RexFieldAccess operator to access fields "$4.zip". But RexToLixTranslator doesn't support RexFieldAccess yet.

`
LogicalProject(EXPR$0=[DOT(ITEM($7, 0), 'kind')])
  LogicalProject(__key__=[$0], id=[$1], name=[$2], companyId=[$3], zip=[$4.zip], number=[$4.streetnum.number], selfEmployed=[$5], phoneNumbers=[$6], mapValues=[$7])
    EnumerableTableScan(table=[[testavro, PROFILE]])
`

2. There is no Operator implementation for DOT Operator in the RexImpTable.

Hence disabled the NestedRecord test for now.

Author: Srinivasulu Punuru <spunuru@linkedin.com>

Reviewers: Aditya Toomula <atoomula@linkedin.com>

Closes #873 from srinipunuru/struct.2 and squashes the following commits:

332d6746 [Srinivasulu Punuru] Removed the commented code
2fc02daf [Srinivasulu Punuru] Disabling the assertions in samza sql tests
1a995feb [Srinivasulu Punuru] Tests for array, map and nested records

7 days agoMinor: Disable outdated sonar-scanner in .travis.yml file.
Daniel Nishimura [Tue, 15 Jan 2019 16:27:34 +0000 (08:27 -0800)] 
Minor: Disable outdated sonar-scanner in .travis.yml file.

Causing Travis-CI builds to appear broken on master.

Author: Daniel Nishimura <dnishimura@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #868 from dnishimura/disable-sonar-scanner

7 days agoAdding myself to committers list.
Shanthoosh Venkataraman [Tue, 15 Jan 2019 04:48:12 +0000 (20:48 -0800)] 
Adding myself to committers list.

9 days agoFix typos in core-concepts.md
Jeffrey Dallatezza [Sat, 12 Jan 2019 22:38:16 +0000 (14:38 -0800)] 
Fix typos in core-concepts.md

This addresses some minor grammatical issues.

Author: Jeffrey Dallatezza <jeffreydallatezza@gmail.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #875 from jsdt/master

10 days agoSAMZA-2018: State restore improvements using RocksDB bulk load
Ray Matharu [Sat, 12 Jan 2019 00:23:01 +0000 (16:23 -0800)] 
SAMZA-2018: State restore improvements using RocksDB bulk load

This PR makes the following changes:
* Moves all the state-restore code from TaskStorageManager.scala to ContainerStorageManager (and its internal private java classes).
* Introduces a StoreMode in StorageEngineFactory.getStorageEngine to add a StoreMode enum.
* Changes RocksDB store creation to use that enum and use Rocksdb's bulk load option when creating store in bulk-load mode.
* Changes the ContainerStorageManager to create stores in BulkLoad mode when restoring, then closes such persistent and changelogged stores, and re-opens them in Read-Write mode.
* Adds tests for ContainerStorageManager and changes tests for TaskStorageManager accordingly.

Author: Ray Matharu <rmatharu@linkedin.com>
Author: rmatharu <40646191+rmatharu@users.noreply.github.com>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #843 from rmatharu/refactoringCSM

2 weeks agoSAMZA-1817: Adding pathing jar support for Long Classpath
Sanil15 [Mon, 7 Jan 2019 17:55:23 +0000 (09:55 -0800)] 
SAMZA-1817: Adding pathing jar support for Long Classpath

- To support long classpath with deterministic jar loading we use a concept called pathing jar
- A new archive called pathing.jar is created with a custom manifest
- This custom manifest file contains the deterministic classpath that we construct

Readings:
- [Adding Classes to the JAR File's Classpath
](https://docs.oracle.com/javase/tutorial/deployment/jar/downman.html)
- [How to avoid argument line too long with manifest files
](https://stackoverflow.com/questions/3057841/too-long-line-in-manifest-file-while-trying-to-create-jar)
- [Pathing Jar information: (from tools)
](https://stackoverflow.com/questions/201816/how-to-set-a-long-java-classpath-in-windows)

Author: Sanil15 <sanil.jain15@gmail.com>

Reviewers: Boris Shkolnik <boryas@apache.org>, svenkataraman@linkedin.com

Closes #824 from Sanil15/SAMZA-1817-pathing

2 weeks agoSamzaSQL: Documentation Enhancement (fix broken same-page links)
Shenoda Guirguis [Fri, 4 Jan 2019 23:44:19 +0000 (15:44 -0800)] 
SamzaSQL: Documentation Enhancement (fix broken same-page links)

Author: Shenoda Guirguis <sguirguis@linkedin.com>

Reviewers: atoomula

Closes #870 from shenodaguirguis/docfix

2 weeks agoSAMZA-2047: Fixing udfs that return samzasqlrelrecord type and other such types to...
Aditya Toomula [Wed, 2 Jan 2019 21:35:48 +0000 (13:35 -0800)] 
SAMZA-2047: Fixing udfs that return samzasqlrelrecord type and other such types to work with joins.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: vjagadish1989,weiqingy

Closes #866 from atoomula/joinudf

2 weeks agoDocumentation fixes for Samza SQL
Shenoda Guirguis [Wed, 2 Jan 2019 21:33:24 +0000 (13:33 -0800)] 
Documentation fixes for Samza SQL

Author: Shenoda Guirguis <sguirguis@linkedin.com>

Reviewers: atoomula,vjagadish1989

Closes #865 from shenodaguirguis/docfix

4 weeks agoSAMZA-1985: Startpoint and StartpointManager implementation.
Daniel Nishimura [Fri, 21 Dec 2018 22:11:07 +0000 (14:11 -0800)] 
SAMZA-1985: Startpoint and StartpointManager implementation.

This is the first PR for [SEP-18](https://cwiki.apache.org/confluence/display/SAMZA/SEP-18%3A+Startpoints+-+Manipulating+Starting+Offsets+for+Input+Streams). Please see updated SEP-18 for details.
This PR implements the StartpointManager and Startpoint model and the initial integration with the OffsetManager. The OffsetManager manages the deletion of Startpoints when the initial checkpoint commits happen per task after start-up.

The immediate follow-ons to this PR are:
1. Have the various `JobCoordinators` to re-map the Startpoints appropriately to each task by utilizing the `StartpointManager#groupStartpointsPerTask(SystemStreamPartition, JobModel)` method implemented in this PR. SEP-18 describes this in more detail.
2. Implement `StartpointConsumerVisitor` for each of the provided `SystemConsumer`s.

Author: Daniel Nishimura <dnishimura@linkedin.com>

Reviewers: Jake Maes<jmaes@linkedin.com>, Cameron L<calee@linkedin.com>

Closes #860 from dnishimura/samza-1985-startpoint-manager

4 weeks agoSAMZA-2048: Add guide to run Beam wordcount example
Xinyu Liu [Fri, 21 Dec 2018 18:54:14 +0000 (10:54 -0800)] 
SAMZA-2048: Add guide to run Beam wordcount example

Use the maven archetype to generate the example project for beam wordcount examples. Add the steps to set it up and run the examples.

Author: xiliu <xiliu@linkedin.com>

Reviewers: Hai Lu <lhaiesp@apache.org>

Closes #867 from xinyuiscool/SAMZA-2048

4 weeks agoSAMZA-2018: State restore improvements using RocksDB writebatch API
Ray Matharu [Tue, 18 Dec 2018 21:06:04 +0000 (13:06 -0800)] 
SAMZA-2018: State restore improvements using RocksDB writebatch API

This PR enables the RocksDbKeyValueStore to use the writeBatch API.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Jacob Maes <jmakes@apache.org>, Prateek Maheshwari <pmaheshwari@apache.org>

Closes #864 from rmatharu/writebatch

5 weeks agoSAMZA-2043: Consolidate ReadableTable and ReadWriteTable
Wei Song [Mon, 17 Dec 2018 23:11:27 +0000 (15:11 -0800)] 
SAMZA-2043: Consolidate ReadableTable and ReadWriteTable

So far we've not seen a lot of use in maintaining separate implementation for ReadableTable and ReadWriteTable, which adds quite a bit complexity. Hence consolidating them.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #861 from weisong44/SAMZA-2043

5 weeks agoSAMZA-2041: add hdfs and kinesis descriptor
Hai Lu [Fri, 14 Dec 2018 22:35:45 +0000 (14:35 -0800)] 
SAMZA-2041: add hdfs and kinesis descriptor

Author: Hai Lu <halu@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #857 from lhaiesp/master

5 weeks agoMinor: properly close KafkaCheckpointManager in TestKafkaCheckpointManager
Cameron Lee [Wed, 12 Dec 2018 21:42:48 +0000 (13:42 -0800)] 
Minor: properly close KafkaCheckpointManager in TestKafkaCheckpointManager

Higher versions of KafkaServerTestHarness will validate that certain threads are no longer alive when the test class exits. Eventually, when kafka is upgraded, that validation would fail.

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #856 from cameronlee314/kcm_test

5 weeks agoMinor: Log full thread stacks in thread dumps.
Prateek Maheshwari [Wed, 12 Dec 2018 21:41:48 +0000 (13:41 -0800)] 
Minor: Log full thread stacks in thread dumps.

Author: Prateek Maheshwari <pmaheshwari@apache.org>

Reviewers: Cameron Lee <calee@linkedin.com>, Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes #855 from prateekm/detailed-thread-dump

5 weeks agoUpdate website release instructions, Fix a broken link
Jagadish [Wed, 12 Dec 2018 05:00:14 +0000 (21:00 -0800)] 
Update website release instructions, Fix a broken link

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #858 from vjagadish1989/website-reorg41

5 weeks agoSAMZA-2035: Detect CachingTableDescriptor as remote table.
Aditya Toomula [Tue, 11 Dec 2018 19:16:49 +0000 (11:16 -0800)] 
SAMZA-2035: Detect CachingTableDescriptor as remote table.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: shenodaguirguis,weiqingy

Closes #854 from atoomula/cache and squashes the following commits:

ecc3b93f [Aditya Toomula] SAMZA-2035: Detect CachingTableDescriptor as remote table.
ecc3bdba [Aditya Toomula] Detect CachingTableDescriptor as remote table.

6 weeks agoFlaky test fix
Ray Matharu [Tue, 11 Dec 2018 18:34:55 +0000 (10:34 -0800)] 
Flaky test fix

MockStorageEngine has a static list of incomingMessageEnvelopes, which was a non thread-safe ArrayList.
However in case of parallel restore (recent change), this needs to be a thread-safe list.

This causes a StorageRecoveryTool test to be flaky.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Shanthoosh Venkatraman <svenkata@linkedin.com>

Closes #851 from rmatharu/flakytestfix

6 weeks agoFix more broken links
Jagadish [Tue, 11 Dec 2018 08:02:39 +0000 (00:02 -0800)] 
Fix more broken links

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #853 from vjagadish1989/website-reorg40

6 weeks agoFix links in release documentation
Jagadish [Tue, 11 Dec 2018 07:27:12 +0000 (23:27 -0800)] 
Fix links in release documentation

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #852 from vjagadish1989/website-reorg38

6 weeks agoMinor typos/reword for meetups page
Jagadish [Sun, 9 Dec 2018 05:07:41 +0000 (21:07 -0800)] 
Minor typos/reword for meetups page

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #850 from vjagadish1989/website-reorg37

6 weeks agoAdd videos and descriptions from the last Samza meet-up
Jagadish [Sun, 9 Dec 2018 04:41:23 +0000 (20:41 -0800)] 
Add videos and descriptions from the last Samza meet-up

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #849 from vjagadish1989/website-reorg36

6 weeks agoSAMZA-2028: Samza-SQL Diagnostics: add metrics to Join and Aggregate operators
Shenoda Guirguis [Fri, 7 Dec 2018 23:05:43 +0000 (15:05 -0800)] 
SAMZA-2028: Samza-SQL Diagnostics: add metrics to Join and Aggregate operators

by adding metrics to Join and Aggregate, it concludes the first phase (adding metrics) of Samza-SQL Diagnostics.

Author: Shenoda Guirguis <sguirguis@linkedin.com>

Reviewers: atoomula

Closes #848 from shenodaguirguis/joinmetrics

6 weeks agoSAMZA-1835: Consolidate all processorId generation code.
Shanthoosh Venkataraman [Fri, 7 Dec 2018 02:11:38 +0000 (18:11 -0800)] 
SAMZA-1835: Consolidate all processorId generation code.

Currently, the processorId creation function createProcessorId() is repeated in three different implementation of `JobCoordinator` viz `ZkJobCoordinator`, `PassthroughJobCoordinator`, and `AzureJobCoordinator`.  Here're the few problems that stems from this duplication.

1. `ProcessorId` is passed into the `MetricsReporterFactory` through the factory create method: `MetricsReporter getMetricsReporter(String name, String processorId, Config config);`. Custom `MetricsReporter` implementations currently use the processorId as a component in the generated metric names. Metrics reporters are instantiated from `LocalApplicationRunner` and`processorId` is currently passed in as null to `MetricsReporterFactory.getMetricsReporter`. This corrupts the generated metrics names.
2. `ZkJobCoordinator`, `ZkUtils`,  `ZkLeaderElector` and different downstream components of `LocalApplicationRunner` currently instantiate and manage their private reporters, rather than the sharing common `MetricsRegistry` managed by `LocalApplicationRunner`. Since there is no common namespace and reporter shared by downstream components of `LocalApplicationRunner`,  generating metrics dashboards for standalone is kind of a hassle.

This PR is comprised of the following changes:

1. Moved the processorId generation to `LocalApplicationRunner` and injects the generated `processorId` to all the downstream layers.
2. Deprecated the getProcessorId API in `JobCoordinator` interface.
3. Add the `processorId` and `metricsRegistry` arguments to the `getJobCoordinator` method of `JobCoordinatorFactory` interface.
4. Fixed the unit tests and added unit tests for `LocalApplicationRunner.createProcessorId`.

Author: Shanthoosh Venkataraman <svenkata@linkedin.com>
Author: Shanthoosh Venkataraman <spvenkat@usc.edu>
Author: svenkata <svenkataraman@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #844 from shanthoosh/SAMZA-1835

6 weeks agoSAMZA-2002: SamzaSQL Diagnostics: instrument rest of operators (except join & aggrega...
Shenoda Guirguis [Thu, 6 Dec 2018 18:56:38 +0000 (10:56 -0800)] 
SAMZA-2002: SamzaSQL Diagnostics: instrument rest of operators (except join & aggregate) and at Query level

Second phase of instrumenting SamzaSQL operators to add and maintain metrics. All operators, except join and aggregate, are instrumented to add Processing Time and Input Rate metrics. Whenever output rate could be different (e.g., filter operator) the output rate is also added. At query level, we have Query Latency, and input and output rates.

Author: Shenoda Guirguis <sguirguis@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>, Aditya Toomula <atoomula@linkedin.com>

Closes #831 from shenodaguirguis/addmetrics.3

6 weeks agoSAMZA-2030: Config mock
Boris S [Thu, 6 Dec 2018 17:54:55 +0000 (09:54 -0800)] 
SAMZA-2030: Config mock

Fix getOption of ScalaMapConfig to support mocking.

Author: Boris S <bshkolnik@linkedin.com>
Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #847 from sborya/ConfigMock

6 weeks agoSAMZA-2012: Add API for wiring an external context through to application processing...
Cameron Lee [Thu, 6 Dec 2018 00:41:34 +0000 (16:41 -0800)] 
SAMZA-2012: Add API for wiring an external context through to application processing code

This PR also refactors TestSamzaSqlRemoteTable to be in samza-test instead of samza-sql, since it seems to actually be an integration test. It is useful to move that test in this PR so that tests that may need an external context can be consolidated.

Author: Cameron Lee <calee@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>, Shanthoosh Venkatraman <svenkatr@linkedin.com>

Closes #829 from cameronlee314/external_context

6 weeks agoSAMZA-2019: for 1 partition broadcast topic generate topic#0 config
Boris S [Wed, 5 Dec 2018 22:13:50 +0000 (14:13 -0800)] 
SAMZA-2019: for 1 partition broadcast topic generate topic#0 config

+ address few review comments

Author: Boris S <bshkolnik@linkedin.com>
Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: xiliu <xiliu@linkedin.com>

Closes #846 from sborya/isBroadcast1

6 weeks agoSAMZA-2019: add Is broadcast per stream config
Boris S [Wed, 5 Dec 2018 19:54:44 +0000 (11:54 -0800)] 
SAMZA-2019: add Is broadcast per stream config

Author: Boris S <bshkolnik@linkedin.com>
Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: xinyuiscool <xiliu@linkedin.com>, Prateek M <prateekm@apache.org>

Closes #837 from sborya/isBroadcast

6 weeks agoSAMZA-1973: Unify the TaskNameGrouper interface for yarn and standalone.
Shanthoosh Venkataraman [Wed, 5 Dec 2018 18:56:55 +0000 (10:56 -0800)] 
SAMZA-1973: Unify the TaskNameGrouper interface for yarn and standalone.

This patch consists of the following changes:
* Unify the different methods present in the TaskNameGrouper interface. This will enable us to have a single interface method usable for both the yarn and standalone models.
* Generate locationId aware task assignment to processors in standalone.
* Move the task assignment persistence logic from a custom `TaskNameGrouper` implementation to `JobModelManager`, so that this works for any kind of custom group.
* General code clean up in `JobModelManager`,  `TaskAssignmentManager` and in other samza internal classes.
* Read/write taskLocality of the processors in standalone.
* Updated the existing java docs and added java docs where they were missing.

Testing:
* Fixed the existing unit-tests due to the changes.
* Added new unit tests for the functionality changed added as a part of this patch.
* Tested this patch with a sample job from `hello-samza` project and verified that it works as expected.

Please refer to [SEP-11](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75957309) for more details.

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>
Author: Shanthoosh Venkataraman <svenkata@linkedin.com>
Author: svenkata <svenkataraman@linkedin.com>

Reviewers: Prateek M<pmaheshw@linkedin.com>

Closes #790 from shanthoosh/task_name_grouper_changes

6 weeks agoSAMZA-2026: Refactor remote table API to separate retry policy settings
Wei Song [Tue, 4 Dec 2018 23:51:58 +0000 (15:51 -0800)] 
SAMZA-2026: Refactor remote table API to separate retry policy settings

As per subject, the goal is to make configuration of retry policies consistent with other API's.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Aditya Toomula <atoomula@linkedin.com>

Closes #842 from weisong44/SAMZA-2026

6 weeks agoSAMZA-2021: Adding an API to rel converter to filter out system messages.
Aditya Toomula [Tue, 4 Dec 2018 23:50:07 +0000 (15:50 -0800)] 
SAMZA-2021: Adding an API to rel converter to filter out system messages.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: srinipunuru

Closes #839 from atoomula/system and squashes the following commits:

0dcba87b [Aditya Toomula] Adding an API to rel converter to filter out system messages.
2bee3ba4 [Aditya Toomula] Adding an API to rel converter to filter out system messages.

6 weeks agoSAMZA-2025: InputOperatorImpl should work with filtering InputTransformer
Deepthi Sridharan [Tue, 4 Dec 2018 23:47:25 +0000 (15:47 -0800)] 
SAMZA-2025: InputOperatorImpl should work with filtering InputTransformer

InputOperatorImpl should handle the case where InputTransformer returns null record. It makes having simple filtering operation as part of the transformer easy.

Author: Deepthi Sridharan <desridharan@linkedin.com>

Reviewers: atoomula, prateekm

Closes #841 from DEEPTHIKORAT/tranformer

6 weeks agoMinor fix to some config variable names and accessor methods.
Prateek Maheshwari [Tue, 4 Dec 2018 22:08:57 +0000 (14:08 -0800)] 
Minor fix to some config variable names and accessor methods.

Author: Prateek Maheshwari <pmaheshwari@apache.org>

Reviewers: Jagadish<jagadish@apache.org>

Closes #840 from prateekm/fix-config-names

6 weeks agoSAMZA-1989: SystemStreamGrouper interface change for SEP-5
Shanthoosh Venkataraman [Tue, 4 Dec 2018 21:53:56 +0000 (13:53 -0800)] 
SAMZA-1989: SystemStreamGrouper interface change for SEP-5

Samza users may need to increase the partition count of the input streams of their stateful samza jobs. For example, Kafka needs to limit the maximum size of each partition to scale up its performance. Thus the number of partitions of a Kafka topic needs to be expanded to reduce the partition size if the average byte-in-rate or retention time of the Kafka topic has doubled.

In order to perform a join between streams, stateful jobs generally have to route the partitions from the different input streams to same task of a container. However, when a input stream repartitioning happens, key space of a partition gets redistributed. This will make the stateful jobs to produce erroneous results.

So if the partition count of input stream is increased then the users have to manually purge the changelog topics, local RocksDb state of their stateful jobs. This  results in an increased operational complexity and data loss.

This patch takes a first stab at solving the above problem and is comprised of the following changes:

* Introduce a new group method in `SystemStreamPartitionGrouper` interface to generate task assignment factoring in the partition expansion of input streams.
* Introduced a `StreamPartitionMapper` abstraction to allow the user to plugin the input stream partitioning function.
* Fixed the existing unit tests and added new unit tests to validate the new grouper changes.

In a followup PR shortly, these grouper changes would be integrated with `JobModelManager`(Waiting for PR 790 to be landed for this. It had made significant changes to `JobModelManager`)

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Prateek M<pmaheshw@linkedin.com>, Ray Matharu<rmatharu@linkedin.com>, Daniel Nishimura<dnishimura@linkedin.com>

Closes #803 from shanthoosh/SEP-5

7 weeks agoMoving store test to TestContainerStorageManager from TestSamzaContainer
Ray Matharu [Tue, 4 Dec 2018 02:08:36 +0000 (18:08 -0800)] 
Moving store test to TestContainerStorageManager from TestSamzaContainer

There was a test in TestSamzaContainer that needs to be moved to TestContainerStorageManager because the restore logic is moved there.

Minor change in TestSamzaContainer and ContainerStorageManager

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #838 from rmatharu/storeTest-fix

7 weeks agoSAMZA-2017: Update committer doc
Aditya Toomula [Sat, 1 Dec 2018 01:12:51 +0000 (17:12 -0800)] 
SAMZA-2017: Update committer doc

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: vjagadish1989

Closes #836 from atoomula/new

7 weeks agoSAMZA-2018: State restore improvements
Ray Matharu [Sat, 1 Dec 2018 01:07:23 +0000 (17:07 -0800)] 
SAMZA-2018: State restore improvements

This PR makes the following changes:

* Consumer consolidation to ensure 1 storeConsumer per system, earlier it was 1 consumer per SSP per store.
* Refactoring stores to use ContainerStorageManager with parallelization for restoration, and serial execution of sysConsumers start, stop, register, etc.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #823 from rmatharu/consumerConsolidate

7 weeks agoSAMZA-570: Enabling auto-discovery of regex input topics
Ray Matharu [Sat, 1 Dec 2018 01:06:30 +0000 (17:06 -0800)] 
SAMZA-570: Enabling auto-discovery of regex input topics

This PR makes the following changes

* Enriches StreamPartitionCountMonitor to periodically monitor input-regexes to match to actual inputs and stop the job when a new input stream is discovered.

* Add a new API to SysAdmin to allow listing of all streams, e.g., Kafka-topics. KafkaSysAdmin implementation of this uses KafkaConsumer's listTopics API. (Even if listTopics had 1 million topics with 100 bytes per topic total, temporary memory overhead will be 100 MB).

* Added config job.coordinator.monitor-input-regex.frequency.ms for the monitoring frequency, and job.coordinator.monitor-input-regex.%s for each input system. Users can then choose desired regex for each input system, e.g., job.coordinator.monitor-input-regex.kafka=test-.*.

* We can later enrich RegexTopicGen rewriter to add a monitor-input-regex config to allow periodic jonitoring

* Tested: Unit test for SPCM and tested with test jobs on local grid.

Author: Ray Matharu <rmatharu@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #796 from rmatharu/newtopic-test

7 weeks agoSAMZA-2014: Samza-sql: Support table as both source (for join) and destination in...
Aditya Toomula [Fri, 30 Nov 2018 21:40:05 +0000 (13:40 -0800)] 
SAMZA-2014: Samza-sql: Support table as both source (for join) and destination in the same application

While parsing queries in an application, with in SamzaSqlApplicationConfig, we collect all input sources and output sources from all queries and create descriptors for input sources first followed by output sources. But there could be only one table descriptor instance per table. Writable table is a readable table but vice versa is not true. If we go through input sources, we will end up creating readable table descriptor and would not be able to create writable table descriptor again when we go through output sources (the code will be ugly if we have to achieve this). There are couple of ways to solve this:
- Always make a table readable and writable
- Go through output sources first followed by input sources.

Choosing option 2 as making a table always read-writable does not make sense.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #834 from atoomula/table

7 weeks agoSAMZA-2015: Refactor timer handling in tables to be consistent with stores
Wei Song [Fri, 30 Nov 2018 20:52:59 +0000 (12:52 -0800)] 
SAMZA-2015: Refactor timer handling in tables to be consistent with stores

Currently when timer is disabled, we do not instantiate timer instances for tables, this introduced potential opportunities for NPE in the future. We wanted to refactor to use the same approach used in store implementation based on HighResolutionClock.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #835 from weisong44/SAMZA-2015

7 weeks agoFix minor issue with operator spec graph traversal
Ahmed Abdul Hamid [Fri, 30 Nov 2018 18:33:35 +0000 (10:33 -0800)] 
Fix minor issue with operator spec graph traversal

Remove redundant traversal loop added by mistake during conflict merge.

Author: Ahmed Abdul Hamid <ahabdulh@ahabdulh-mn1.linkedin.biz>

Reviewers: Aditya Toomla <atoomla@linkedin.com>

Closes #833 from ahmedahamid/master

7 weeks agoSAMZA-2013: Account for cycles in graph traversal within Execution Planner
Ahmed Abdul Hamid [Fri, 30 Nov 2018 03:23:46 +0000 (19:23 -0800)] 
SAMZA-2013: Account for cycles in graph traversal within Execution Planner

Author: Ahmed Abdul Hamid <ahabdulh@ahabdulh-mn1.linkedin.biz>

Reviewers: Aditya Toomla <atoomla@linkedin.com>

Closes #832 from ahmedahamid/master

7 weeks agoSAMZA-2010: Handle null value in LocalReadWriteTable.putAll()
Wei Song [Thu, 29 Nov 2018 23:31:58 +0000 (15:31 -0800)] 
SAMZA-2010: Handle null value in LocalReadWriteTable.putAll()

To be consistent with put(), null values in input should be delete operation

Author: Wei Song <wsong@linkedin.com>

Reviewers: Ahmed Abdul Hamid <ahabdulhamid@linkedin.com>

Closes #827 from weisong44/SAMZA-2010

7 weeks agoSAMZA-1638: Recreate SystemProducer on KafkaCheckpointManager.writeCheckpoint failures.
Shanthoosh Venkataraman [Thu, 29 Nov 2018 19:53:39 +0000 (11:53 -0800)] 
SAMZA-1638: Recreate SystemProducer on KafkaCheckpointManager.writeCheckpoint failures.

Retry loop in the existing `KafkaCheckpointManager` implementation retries using the same `SystemProducer` instance on exception and does not recreate it.

When some irrecoverable exceptions occur within the `SystemProducer`, all the subsequent produce message invocations on the `SystemProducer` instance will fail. This had made the entire retry loop on `KafkaCheckpointManager` pointless.

This patch consists of the following changes:
1. This patch addresses the above problem by recreating the `SystemProducer` instance on failure and adds a unit test to verify the functionality.
2. Minor code cleanup in classes: `TestKafkaCheckpointManager` and `KafkaCheckpointManager`.

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>
Author: Shanthoosh Venkataraman <svenkata@linkedin.com>

Reviewers: Dong Lin <lindong28@gmail.com>

Closes #792 from shanthoosh/kafka_checkpoint_manager_fix

7 weeks agoSAMZA-1976: MetadataStore API cleanup.
Shanthoosh Venkataraman [Thu, 29 Nov 2018 17:46:13 +0000 (09:46 -0800)] 
SAMZA-1976: MetadataStore API cleanup.

This PR consists of the following changes:
* Switching all the API methods from using byte[] array as key type to string.
* Fixed `CoordinatorMetadataStore`, `ZkMetadataStore` tests due to the type change of key.

Shortly in a followup PR,  namespace unification for different metadata stored in standalone and YARN model will be done.

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Prateek <prateekm@linkedin.com>

Closes #791 from shanthoosh/metadata_store_api_cleanup

7 weeks agoSAMZA-2007: Samza-sql - Fix Samza-SQL to not expect LogicalTableModify in the Calcite...
Aditya Toomula [Wed, 28 Nov 2018 23:53:30 +0000 (15:53 -0800)] 
SAMZA-2007: Samza-sql - Fix Samza-SQL to not expect LogicalTableModify in the Calcite plan.

This is required for supporting schema evolution without failing the jobs.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #821 from atoomula/modify and squashes the following commits:

17b4b1c1 [Aditya Toomula] SAMZA-2007: Samza-sql - Fix Samza-SQL to not expect LogicalTableModify in the Calcite plan.
65be581a [Aditya Toomula] SAMZA-2007: Samza-sql - Fix Samza-SQL to not expect LogicalTableModify in the Calcite plan.
fb50ee81 [Aditya Toomula] SAMZA-2007: Samza-sql - Fix Samza-SQL to not expect LogicalTableModify in the Calcite plan.
9fff9573 [Aditya Toomula] SAMZA-2007: Samza-sql - Fix Samza-SQL to not expect LogicalTableModify in the Calcite plan.
f3e887c6 [Aditya Toomula] dummy

7 weeks agoSAMZA-2004: Add ability to disable table metrics
Wei Song [Tue, 27 Nov 2018 18:40:34 +0000 (10:40 -0800)] 
SAMZA-2004: Add ability to disable table metrics

For jobs with very high throughput, it is desirable to disable metrics on tables. We would introduce the option to disable all metrics for a table on table descriptor.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Xinyu Liu <xiliu@linkedin.com>

Closes #822 from weisong44/SAMZA-2004-2

8 weeks agoUpdate version to 1.0.0 in docs
Jagadish [Tue, 27 Nov 2018 13:41:40 +0000 (05:41 -0800)] 
Update version to 1.0.0 in docs

8 weeks agoMerge branch 'master' of https://github.com/apache/samza
Jagadish [Tue, 27 Nov 2018 12:00:16 +0000 (04:00 -0800)] 
Merge branch 'master' of https://github.com/apache/samza

8 weeks agoEnsure that DOC pages for older releases are easy to discover
Jagadish [Tue, 27 Nov 2018 11:59:13 +0000 (03:59 -0800)] 
Ensure that DOC pages for older releases are easy to discover

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #820 from vjagadish1989/website-reorg35

8 weeks agoMake older release pages better discoverable
Jagadish [Tue, 27 Nov 2018 11:57:43 +0000 (03:57 -0800)] 
Make older release pages better discoverable

8 weeks agoUse consistent font /heading sizes for all pages
Jagadish [Tue, 27 Nov 2018 11:21:25 +0000 (03:21 -0800)] 
Use consistent font /heading sizes for all pages

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #819 from vjagadish1989/website-reorg34

8 weeks agoUse consistent font /heading sizes for all pages
Jagadish [Tue, 27 Nov 2018 11:18:22 +0000 (03:18 -0800)] 
Use consistent font /heading sizes for all pages

8 weeks agoMerge branch 'master' of https://github.com/apache/samza
Jagadish [Tue, 27 Nov 2018 09:47:09 +0000 (01:47 -0800)] 
Merge branch 'master' of https://github.com/apache/samza

8 weeks agoCommit for website publish for 1.0.0
Jagadish [Tue, 27 Nov 2018 09:46:54 +0000 (01:46 -0800)] 
Commit for website publish for 1.0.0

8 weeks agoClean up docs for standalone
Jagadish [Tue, 27 Nov 2018 08:28:39 +0000 (00:28 -0800)] 
Clean up docs for standalone

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #817 from vjagadish1989/website-reorg32

8 weeks agoMerge branch 'master' of https://github.com/apache/samza
Jagadish [Tue, 27 Nov 2018 08:27:11 +0000 (00:27 -0800)] 
Merge branch 'master' of https://github.com/apache/samza

8 weeks agoClean up standalone docs
Jagadish [Tue, 27 Nov 2018 08:26:06 +0000 (00:26 -0800)] 
Clean up standalone docs

8 weeks agoSAMZA-2006: Removed config from table provider constructor
Wei Song [Mon, 26 Nov 2018 23:32:31 +0000 (15:32 -0800)] 
SAMZA-2006: Removed config from table provider constructor

With the latest API change in Samza 1.0, config can be obtained from Context object during init(), therefore we do not to pass this in the constructor.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@linkedin.com>

Closes #816 from weisong44/SAMZA-2006

8 weeks agoClean-up open source docs for Samza SQL
Jagadish [Mon, 26 Nov 2018 22:20:24 +0000 (14:20 -0800)] 
Clean-up open source docs for Samza SQL

atoomula srinipunuru FYI..

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes #815 from vjagadish1989/website-reorg31

8 weeks agoClean up open-source documentation for Samza SQL
Jagadish [Mon, 26 Nov 2018 22:18:01 +0000 (14:18 -0800)] 
Clean up open-source documentation for Samza SQL

8 weeks agoMerge branch 'master' of https://github.com/apache/samza
Jagadish [Mon, 26 Nov 2018 22:17:14 +0000 (14:17 -0800)] 
Merge branch 'master' of https://github.com/apache/samza

2 months agoSAMZA-1998: Table API refactoring
Wei Song [Wed, 21 Nov 2018 01:22:18 +0000 (17:22 -0800)] 
SAMZA-1998: Table API refactoring

Table API refactoring
     - Removed TableSpec
     - Consolidated configuration generation for tables to table descriptors
     - Refactored constructor so that only local table would require serde's
     - Removed table provider for RocksDB- and in-memory tables, and added LocalTableProvider
     - Updates to unit tests
     - Various refactoring

Author: Wei Song <wsong@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@linkedin.com>

Closes #807 from weisong44/SAMZA-1998

2 months agoSAMZA-1997: Samza-sql diagnostics - instrument project operator
Shenoda Guirguis [Tue, 20 Nov 2018 22:05:47 +0000 (14:05 -0800)] 
SAMZA-1997: Samza-sql diagnostics - instrument project operator

When the user uses Samza-SQL, they use high level declarative language (SQL) for ease and speed of implementation of their Samza Job. Therefore, monitoring the job should provide metrics at this high/logical level. This is the goal of the Samza-SQL diagnostics project. In this first baby-step, we start with instrumenting the Project operator to provide run-time metrics.

Author: Shenoda Guirguis <sguirgui@sguirgui-ld2.linkedin.biz>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>, Aditya Toomula <atoomula@linkedin.com>

Closes #806 from shenodaguirguis/samza-sql-diagnostics

2 months agoSAMZA-2001: Samza-sql: Handle null records in rel converter and in joins
Aditya Toomula [Tue, 20 Nov 2018 21:18:57 +0000 (13:18 -0800)] 
SAMZA-2001: Samza-sql: Handle null records in rel converter and in joins

* Synced some AvroRelConverter fixes from linkedin version.
* Null value handling in AvroRelConverter and Join function.
* Null value handling in Table API StreamTableJoinOperatorImpl class.

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #812 from atoomula/nulljoins

2 months agoSAMZA-1994: Table API: Add missed key lookups metric for table reads
Aditya Toomula [Tue, 20 Nov 2018 21:02:13 +0000 (13:02 -0800)] 
SAMZA-1994: Table API: Add missed key lookups metric for table reads

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>, Wei Song <wsong@linkedin.com>

Closes #811 from atoomula/metric

2 months agoSAMZA-1707: Samza onTimer method triggering before init
Xinyu Liu [Tue, 20 Nov 2018 20:31:56 +0000 (12:31 -0800)] 
SAMZA-1707: Samza onTimer method triggering before init

Currently there was a bug when registering a timer with a very short amount of delay, it might not be invoked since it depends on the creation of the run loop. This patch fixed the problem by double checking the ready timers when run loop is created (listener is registered.)

Author: xinyuiscool <xiliu@linkedin.com>

Reviewers: Prateek M <prateekm@apache.org>

Closes #810 from xinyuiscool/SAMZA-1707

2 months agoSAMZA-1999: Fix NullPointerException when sink is log.outputstream
Weiqing Yang [Mon, 19 Nov 2018 17:59:23 +0000 (09:59 -0800)] 
SAMZA-1999: Fix NullPointerException when sink is log.outputstream

## What changes were proposed in this pull request?
The PR is to fix a bug which throws NullPointerException when sink is log.outputstream

## How was this patch tested?
Pass build and current tests.
Test in Samza SQL shell.

Author: Weiqing Yang <yangweiqing001@gmail.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #808 from weiqingy/SAMZA-1999

2 months agoSAMZA-2000: update contributor page
Hai Lu [Fri, 16 Nov 2018 22:13:03 +0000 (14:13 -0800)] 
SAMZA-2000: update contributor page

Trivial commit to test committer workflow

Author: Hai Lu <halu@linkedin.com>

Reviewers: xiliu <xiliu@apache.org>

Closes #809 from lhaiesp/master

2 months agoSAMZA-1972: Make Operator Timer metrics calculation configurable
xinyuiscool [Fri, 16 Nov 2018 00:01:53 +0000 (16:01 -0800)] 
SAMZA-1972: Make Operator Timer metrics calculation configurable

This patch introduces two changes:
1. Make the timer metrics in OperatorImpl to be optional, and disabled by default. Adding TimerMetrics has quite a big performance impact which affects jobs with large number of operators, so it should be turned on for debugging only.
2. Register operator-level metrics on the container metrics registry. The task level registry has too many metrics which are usually ignored by the users. Having it in the container level will reduce the total amount of metrics published as well as the memory footprint.

Tested by hello-samza and works as expected.

Author: xinyuiscool <xiliu@linkedin.com>

Reviewers: Jagadish V <vjagadish1989@gmail.com>

Closes #805 from xinyuiscool/SAMZA-1972

2 months agoMerge branch 'master' of https://github.com/apache/samza
Jagadish [Wed, 14 Nov 2018 03:17:26 +0000 (19:17 -0800)] 
Merge branch 'master' of https://github.com/apache/samza

2 months agoUpdated API documentation for high and low level APIs.
Prateek Maheshwari [Wed, 14 Nov 2018 02:17:17 +0000 (18:17 -0800)] 
Updated API documentation for high and low level APIs.

vjagadish1989 nickpan47 Please take a look.

Author: Prateek Maheshwari <pmaheshwari@apache.org>

Reviewers: Jagadish<jagadish@apache.org>

Closes #802 from prateekm/api-docs

2 months agoSAMZA-1986: Samza-sql: Use system name along with stream name for streamId
Aditya Toomula [Mon, 12 Nov 2018 19:42:51 +0000 (11:42 -0800)] 
SAMZA-1986: Samza-sql: Use system name along with stream name for streamId

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #800 from atoomula/streamid

2 months agoSAMZA-1988: Properly suffix modules with direct Scala dependencies with the Scala...
Daniel Nishimura [Mon, 12 Nov 2018 19:11:28 +0000 (11:11 -0800)] 
SAMZA-1988: Properly suffix modules with direct Scala dependencies with the Scala version.

List of modules without a Scala version suffix that have direct Scala dependencies and the direct Scala API calls in each module are in the JIRA ticket: https://issues.apache.org/jira/browse/SAMZA-1988

Author: Daniel Nishimura <dnishimura@linkedin.com>

Reviewers: Sanil Jain <snjain@linkedin.com>

Closes #801 from dnishimura/samza-1988-scala-version-suffixes

2 months agoSAMZA-1978: Use samza offset reset value in kafka consumer
Boris S [Sat, 10 Nov 2018 00:23:05 +0000 (16:23 -0800)] 
SAMZA-1978: Use samza offset reset value in kafka consumer

Author: Boris S <bshkolnik@linkedin.com>
Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@apache.org>

Closes #753 from sborya/UseSamazResetInKafka

2 months agoSAMZA-1616: Samza-Sql - Support remote table for stream-table join
Aditya Toomula [Fri, 9 Nov 2018 16:30:59 +0000 (08:30 -0800)] 
SAMZA-1616: Samza-Sql - Support remote table for stream-table join

Author: Aditya Toomula <atoomula@linkedin.com>

Reviewers: Srinivasulu Punuru <spunuru@linkedin.com>

Closes #794 from atoomula/remote

2 months agoSAMZA-1981: Consolidate table descriptors to samza-api
Wei Song [Thu, 8 Nov 2018 22:04:28 +0000 (14:04 -0800)] 
SAMZA-1981: Consolidate table descriptors to samza-api

As per subject, table descriptors moved are
 - LocalTableDescriptor
 - RemoteTableDescriptor
 - HybridTableDescriptor
 - GuavaCacheTableDescriptor
 - CachingTableDescriptor

Author: Wei Song <wsong@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@linkedin.com>

Closes #799 from weisong44/SAMZA-1981

2 months agoSAMZA-1980: Rename LocalStoreBackedTable to LocalTable
Wei Song [Wed, 7 Nov 2018 22:46:17 +0000 (14:46 -0800)] 
SAMZA-1980: Rename LocalStoreBackedTable to LocalTable

As per subject, this is to keep naming of local tables consistent with other table types.

Author: Wei Song <wsong@linkedin.com>

Reviewers: Prateek Maheshwari <pmaheshwari@linkedin.com>

Closes #798 from weisong44/SAMZA-1980 and squashes the following commits:

51c17ff4 [Wei Song] Renamed LocalStoreBackedTable to LocalTable
9c121207 [Wei Song] Merge remote-tracking branch 'upstream/master'
89bfc14c [Wei Song] Merge remote-tracking branch 'upstream/master'
a53e5628 [Wei Song] SAMZA-1964 Make getTableSpec() in RemoteTableDescriptor reentrant
c9e8bf7c [Wei Song] Merge remote-tracking branch 'upstream/master'
7c777fec [Wei Song] Merge remote-tracking branch 'upstream/master'
a06e8ec2 [Wei Song] Merge remote-tracking branch 'upstream/master'
2c679c39 [Wei Song] Merge remote-tracking branch 'upstream/master'
a56c28dc [Wei Song] Merge remote-tracking branch 'upstream/master'
097958c8 [Wei Song] Merge remote-tracking branch 'upstream/master'
05822f0a [Wei Song] Merge remote-tracking branch 'upstream/master'
f7480505 [Wei Song] Merge remote-tracking branch 'upstream/master'
7706ab1f [Wei Song] Merge remote-tracking branch 'upstream/master'
f5731b10 [Wei Song] Merge remote-tracking branch 'upstream/master'
1e5de45a [Wei Song] Merge remote-tracking branch 'upstream/master'
c85604e0 [Wei Song] Merge remote-tracking branch 'upstream/master'
242d8442 [Wei Song] Merge remote-tracking branch 'upstream/master'
ec7d8409 [Wei Song] Merge remote-tracking branch 'upstream/master'
e19b4dc9 [Wei Song] Merge remote-tracking branch 'upstream/master'
8ee78441 [Wei Song] Merge remote-tracking branch 'upstream/master'
1c6a2eae [Wei Song] Merge remote-tracking branch 'upstream/master'
a6c94add [Wei Song] Merge remote-tracking branch 'upstream/master'
41299b5b [Wei Song] Merge remote-tracking branch 'upstream/master'
239a0950 [Wei Song] Merge remote-tracking branch 'upstream/master'
eca00204 [Wei Song] Merge remote-tracking branch 'upstream/master'
51562391 [Wei Song] Merge remote-tracking branch 'upstream/master'
de708f5e [Wei Song] Merge remote-tracking branch 'upstream/master'
df2f8d7b [Wei Song] Merge remote-tracking branch 'upstream/master'
f28b491d [Wei Song] Merge remote-tracking branch 'upstream/master'
4782c61d [Wei Song] Merge remote-tracking branch 'upstream/master'
0440f75f [Wei Song] Merge remote-tracking branch 'upstream/master'
aae0f380 [Wei Song] Merge remote-tracking branch 'upstream/master'
a15a7c9a [Wei Song] Merge remote-tracking branch 'upstream/master'
5cbf9af9 [Wei Song] Merge remote-tracking branch 'upstream/master'
3f7ed71f [Wei Song] Added self to committer list

2 months agoSAMZA-1921: upgrade to use the latest java AdminClient.
Boris S [Wed, 7 Nov 2018 22:39:59 +0000 (14:39 -0800)] 
SAMZA-1921: upgrade to use the latest java AdminClient.

In this PR, I've refactored create/clear streams methods.

Author: Boris S <bshkolnik@linkedin.com>
Author: Boris S <boryas@apache.org>
Author: Boris Shkolnik <bshkolni@linkedin.com>
Author: svenkata <svenkataraman@linkedin.com>

Reviewers: Shanthoosh Venkataraman <svenkataraman@linkedin.com>

Closes #789 from sborya/JavaAdminClient

2 months agoCleanup docs for HDFS connector
Jagadish [Sat, 3 Nov 2018 00:35:20 +0000 (17:35 -0700)] 
Cleanup docs for HDFS connector

Author: Jagadish <jvenkatraman@linkedin.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #793 from vjagadish1989/website-reorg30

2 months agoCleanup docs for HDFS connector
Jagadish [Sat, 3 Nov 2018 00:33:26 +0000 (17:33 -0700)] 
Cleanup docs for HDFS connector

2 months agoSAMZA-1952: StreamPartitionCountMonitor for standalone.
Shanthoosh Venkataraman [Fri, 2 Nov 2018 16:38:30 +0000 (09:38 -0700)] 
SAMZA-1952: StreamPartitionCountMonitor for standalone.

This patch adds the capability to detect the partition change of the input streams of a stateless standalone jobs and trigger a re-balancing phase(which will essentially account for new partitions from input stream and distribute it to the live processors of the group).

Existing partition count detection of input streams is broken in yarn for stateful jobs. This will be addressed for both yarn and standalone as a part of #622

Author: Shanthoosh Venkataraman <spvenkat@usc.edu>

Reviewers: Boris Shkolnik <boryas@apache.org>

Closes #726 from shanthoosh/stream_partition_count_monitor_for_standalone

2 months agoMerge branch 'master' of https://github.com/apache/samza
Jagadish [Thu, 1 Nov 2018 22:38:32 +0000 (15:38 -0700)] 
Merge branch 'master' of https://github.com/apache/samza

2 months agoMerge branch 'master' of https://github.com/apache/samza
Jagadish [Wed, 31 Oct 2018 21:22:21 +0000 (14:22 -0700)] 
Merge branch 'master' of https://github.com/apache/samza

2 months agoy
Boris S [Wed, 31 Oct 2018 21:22:04 +0000 (14:22 -0700)] 
y

Author: Boris S <boryas@apache.org>
Author: Boris S <bshkolnik@linkedin.com>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Ray Matharu <rmatharu@linkedin.com>

Closes #779 from sborya/RemoveGetKafkaSystemConsumerConfig

2 months agoSAMZA-1943 Remove ExtendedSystemAdmin and deprecated getNewestOffsets method.
Boris S [Wed, 31 Oct 2018 20:45:28 +0000 (13:45 -0700)] 
SAMZA-1943 Remove ExtendedSystemAdmin and deprecated getNewestOffsets method.

Author: Boris S <boryas@apache.org>
Author: Boris S <bshkolnik@linkedin.com>
Author: Boris Shkolnik <bshkolni@linkedin.com>

Reviewers: Bharath Kumarasubramanian <bkumarasubramanian@linkedin.com>

Closes #782 from sborya/removeExtendedSystemAdmin

2 months agoJavadoc cleanup for new Application, Descriptor, Context and Table APIs - Part 2
Prateek Maheshwari [Wed, 31 Oct 2018 20:36:54 +0000 (13:36 -0700)] 
Javadoc cleanup for new Application, Descriptor, Context and Table APIs - Part 2

Currently, we don't allow imports for use only in javadocs. This requires using FQNs in link tags, which is not very readable. Checkstyle's UnusedImport rule has an option to allow imports for use in javadoc comments (processJavadocs=true, should be read as "check javadocs for import usage == true").

AFAICT, there's no good way to change the check's properties within a submodule. This PR adds both versions (strict and relaxed) to the checkstyle, and disables the strict validation for samza-api only.

This PR also updates the javadocs to use the class names with imports.

Author: Prateek Maheshwari <pmaheshwari@apache.org>

Reviewers: Cameron Lee <calee@linkedin.com>

Closes #760 from prateekm/javadoc-cleanup

2 months agoSAMZA-1970: Support for physical names in InMemorySystem
Sanil15 [Wed, 31 Oct 2018 19:41:40 +0000 (12:41 -0700)] 
SAMZA-1970: Support for physical names in InMemorySystem

if super is not there, java compiles this to this.withPhysicalName which results in StackOverflows

Author: Sanil15 <sanil.jain15@gmail.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #788 from Sanil15/SAMZA-1970-edit

2 months agoSAMZA-1971: Fix NPE in partition key computation for InMemorySystemProducer
bharathkk [Wed, 31 Oct 2018 19:41:20 +0000 (12:41 -0700)] 
SAMZA-1971: Fix NPE in partition key computation for InMemorySystemProducer

Author: bharathkk <codin.martial@gmail.com>

Reviewers: Jagadish<jagadish@apache.org>

Closes #786 from bharathkk/fix-inmemory-partitionkey-npe