Chitral Verma [Tue, 6 Sep 2022 03:53:42 +0000 (05:53 +0200)] 
Merge pull request #606 from apache/dependabot/maven/service/org.postgresql-postgresql-42.4.1

Bump postgresql from 42.3.1 to 42.4.1 in /service

Chitral Verma [Tue, 6 Sep 2022 03:53:09 +0000 (05:53 +0200)] 
Merge pull request #607 from apache/dependabot/maven/measure/org.postgresql-postgresql-42.4.1

Bump postgresql from 42.3.1 to 42.4.1 in /measure

dependabot[bot] [Sat, 6 Aug 2022 05:55:06 +0000 (05:55 +0000)] 
Bump postgresql from 42.3.1 to 42.4.1 in /measure

Bumps [postgresql]( from 42.3.1 to 42.4.1.
- [Release notes](
- [Changelog](
- [Commits](

- dependency-name: org.postgresql:postgresql
  dependency-type: direct:production

Signed-off-by: dependabot[bot] <>
dependabot[bot] [Sat, 6 Aug 2022 05:53:24 +0000 (05:53 +0000)] 
Bump postgresql from 42.3.1 to 42.4.1 in /service

Bumps [postgresql]( from 42.3.1 to 42.4.1.
- [Release notes](
- [Changelog](
- [Commits](

- dependency-name: org.postgresql:postgresql
  dependency-type: direct:production

Signed-off-by: dependabot[bot] <>
lipzhu [Mon, 24 Jan 2022 05:21:04 +0000 (10:51 +0530)] 
[GRIFFIN-362] Add postgresql and oracle driver into dependencies

**What changes were proposed in this pull request?**
1. Add the Oracle and PostgreSQL JDBC drivers to the measure module's dependencies, following a user report.
2. Update the PostgreSQL JDBC driver in the service module to the latest version.

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Unit Tests

Closes #597 from lipzhu/GRIFFIN-362.

Authored-by: lipzhu <>
Signed-off-by: chitralverma <>
Lipeng Zhu [Mon, 24 Jan 2022 05:06:22 +0000 (10:36 +0530)] 
[GRIFFIN-369] Bug fix for avro format in Spark 2.3.x environment

**What changes were proposed in this pull request?**
The built-in Avro data source was introduced in Spark 2.4.0.
For Griffin, we therefore still need to convert the `avro` format to `com.databricks.spark.avro` in a Spark 2.3.x environment.
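A minimal sketch of that version check, assuming the Spark version is available as a string such as `2.3.4`; `AvroFormatResolver` is a hypothetical helper, not Griffin's actual code. Its result would be passed as the format name to Spark's reader.

```java
// Hypothetical helper: picks the Avro format name that the running Spark version understands.
final class AvroFormatResolver {
    // Short name of Spark's built-in Avro source (Spark 2.4.0 and later).
    static final String BUILTIN_AVRO = "avro";
    // External spark-avro package used before Spark 2.4.0.
    static final String DATABRICKS_AVRO = "com.databricks.spark.avro";

    static String resolve(String format, String sparkVersion) {
        if (!BUILTIN_AVRO.equalsIgnoreCase(format)) {
            return format; // only Avro needs remapping
        }
        String[] parts = sparkVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        // Built-in Avro support starts at 2.4.0; older versions need the Databricks package.
        boolean hasBuiltinAvro = major > 2 || (major == 2 && minor >= 4);
        return hasBuiltinAvro ? BUILTIN_AVRO : DATABRICKS_AVRO;
    }
}
```

Only the `avro` short name is remapped; every other format passes through unchanged.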

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Unit Tests

Closes #598 from lipzhu/GRIFFIN-369.

Lead-authored-by: Lipeng Zhu <>
Co-authored-by: Chitral Verma <>
Co-authored-by: lipzhu <>
Signed-off-by: chitralverma <>
Lipeng Zhu [Thu, 20 Jan 2022 06:52:53 +0000 (14:52 +0800)] 
[GRIFFIN-367] For task GRIFFIN-367, update local deploy document. (#596)

* For task GRIFFIN-367, update local deploy document.

* Update doc.

Chitral Verma [Mon, 4 Oct 2021 15:12:14 +0000 (20:42 +0530)] 
[GRIFFIN-365] Measure Enhancements and Stability fixes (#593)

* [GRIFFIN-365] Update pom.xml with scapegoat and other changes

* [GRIFFIN-365] Remove ban on elasticsearch-spark dependency

* [GRIFFIN-365] Measure enhancements

* [GRIFFIN-365] Fix test cases

* [GRIFFIN-365] Updates to documentation and fix for breaking tests

* [GRIFFIN-365] Revert elasticsearch changes

* Update

* Update

William Guo [Wed, 7 Jul 2021 11:29:49 +0000 (19:29 +0800)] 
Merge pull request #590 from chitralverma/improve-mergepr-script

[GRIFFIN-360] Improvements to

William Guo [Wed, 7 Jul 2021 11:29:20 +0000 (19:29 +0800)] 
Merge pull request #583 from chitralverma/check-stale-pr-and-issues

[GRIFFIN-347] Setup automated workflows for greetings and stale checks

William Guo [Mon, 5 Jul 2021 12:57:43 +0000 (20:57 +0800)] 
Merge pull request #591 from chitralverma/fix-measures

[GRIFFIN-358] Rewrite the Rule/ Measure implementations

chitralverma [Wed, 30 Jun 2021 06:46:34 +0000 (12:16 +0530)] 
[GRIFFIN-358] Fix import

chitralverma [Fri, 11 Jun 2021 23:08:22 +0000 (04:38 +0530)] 
[GRIFFIN-358] Added documentation for SchemaConformanceMeasure

chitralverma [Fri, 11 Jun 2021 22:04:44 +0000 (03:34 +0530)] 
[GRIFFIN-358] Changed Metric output format and fixed test cases

chitralverma [Fri, 11 Jun 2021 19:05:43 +0000 (00:35 +0530)] 
[GRIFFIN-358] Added SchemaConformance measure

chitralverma [Fri, 11 Jun 2021 15:25:39 +0000 (20:55 +0530)] 
[GRIFFIN-358] Error handling and code formatting changes

chitralverma [Fri, 11 Jun 2021 05:55:42 +0000 (11:25 +0530)] 
[GRIFFIN-358] Added sampling option to ProfilingMeasure

chitralverma [Fri, 4 Jun 2021 05:19:43 +0000 (10:49 +0530)] 
[GRIFFIN-358] Fixed breaking test case

chitralverma [Fri, 4 Jun 2021 03:26:54 +0000 (08:56 +0530)] 
[GRIFFIN-358] Added code documentation for all new measures.

chitralverma [Wed, 2 Jun 2021 22:09:03 +0000 (03:39 +0530)] 
[GRIFFIN-358] Added parallelization to MeasureExecutor

chitralverma [Sat, 29 May 2021 18:37:30 +0000 (00:07 +0530)] 
[GRIFFIN-358] Changes structure of Measure

chitralverma [Sat, 29 May 2021 18:22:49 +0000 (23:52 +0530)] 
[GRIFFIN-358] Added test cases for Data pre proc

chitralverma [Fri, 28 May 2021 20:54:40 +0000 (02:24 +0530)] 
[GRIFFIN-358] Updated Configurations for pre proc and batch all measures

chitralverma [Fri, 28 May 2021 20:46:26 +0000 (02:16 +0530)] 
[GRIFFIN-358] Allow users to run old "evaluate.rule" configs as well

chitralverma [Fri, 28 May 2021 19:57:28 +0000 (01:27 +0530)] 
[GRIFFIN-358] Added accuracy measure configuration guide.

chitralverma [Fri, 28 May 2021 02:05:21 +0000 (07:35 +0530)] 
[GRIFFIN-358] Changed 'target' to 'ref' to clear terminology

chitralverma [Sun, 2 May 2021 21:33:52 +0000 (03:03 +0530)] 
[GRIFFIN-358] Added profiling measure configuration guide.

chitralverma [Sun, 2 May 2021 17:39:43 +0000 (23:09 +0530)] 
[GRIFFIN-358] Added measure configuration guide for duplication and sparkSql measures.

chitralverma [Sun, 2 May 2021 17:04:07 +0000 (22:34 +0530)] 
[GRIFFIN-358] Update Duplication Measure to exclude null values

chitralverma [Sun, 2 May 2021 13:03:55 +0000 (18:33 +0530)] 
[GRIFFIN-358] Added general documentation for new dimensions/ measures and completeness measure configuration guide.

chitralverma [Wed, 21 Apr 2021 17:00:45 +0000 (22:30 +0530)] 
[GRIFFIN-347] Removed automation for issues as that's handled by Jira

chitralverma [Wed, 21 Apr 2021 14:56:48 +0000 (20:26 +0530)] 
[GRIFFIN-347] Updated with master

chitralverma [Wed, 21 Apr 2021 14:41:27 +0000 (20:11 +0530)] 
[GRIFFIN-360] Improvements to

chitralverma [Wed, 7 Apr 2021 10:16:59 +0000 (15:46 +0530)] 
[GRIFFIN-358] Fixed breaking test cases

chitralverma [Mon, 5 Apr 2021 17:06:35 +0000 (22:36 +0530)] 
[GRIFFIN-358] Fixed formatting

chitralverma [Mon, 5 Apr 2021 11:37:00 +0000 (17:07 +0530)] 
[GRIFFIN-358] Added ProfilingMeasureTest

chitralverma [Mon, 5 Apr 2021 05:26:41 +0000 (10:56 +0530)] 
[GRIFFIN-358] Added DuplicationMeasureTest

chitralverma [Sun, 4 Apr 2021 17:37:48 +0000 (23:07 +0530)] 
[GRIFFIN-358] Added SparkSqlMeasureTest

chitralverma [Sun, 4 Apr 2021 17:04:07 +0000 (22:34 +0530)] 
[GRIFFIN-358] Added AccuracyMeasureTest

chitralverma [Sun, 4 Apr 2021 11:25:18 +0000 (16:55 +0530)] 
[GRIFFIN-358] Added CompletenessMeasureTest

chitralverma [Sat, 3 Apr 2021 16:59:24 +0000 (22:29 +0530)] 
[GRIFFIN-358] Merge Measure constants

chitralverma [Sat, 3 Apr 2021 15:50:19 +0000 (21:20 +0530)] 
[GRIFFIN-358] New Accuracy Measure

chitralverma [Mon, 29 Mar 2021 18:37:34 +0000 (00:07 +0530)] 
[GRIFFIN-358] Changes to Metric Flush process

chitralverma [Mon, 29 Mar 2021 17:38:25 +0000 (23:08 +0530)] 
[GRIFFIN-358] New Duplication (Distinctness, Uniqueness) Measure

chitralverma [Sun, 28 Mar 2021 17:47:25 +0000 (23:17 +0530)] 
[GRIFFIN-358] New SparkSQL Measure

chitralverma [Sun, 28 Mar 2021 15:49:58 +0000 (21:19 +0530)] 
[GRIFFIN-358] New Profiling Measure

chitralverma [Wed, 24 Mar 2021 13:47:35 +0000 (19:17 +0530)] 
[GRIFFIN-358] Rewrite new measure hierarchy and new completeness measure

chitralverma [Sun, 21 Mar 2021 05:06:08 +0000 (10:36 +0530)] 
[GRIFFIN-358] Rewrite dataset preprocessing as SQL Queries

chitralverma [Tue, 9 Mar 2021 09:03:02 +0000 (14:33 +0530)] 
[GRIFFIN-345] Support cross-version compilation for Scala and Spark dependencies

**What changes were proposed in this pull request?**

_This PR affects only the measure module._

In newer environments, especially cloud environments, the Griffin measure module may face compatibility issues due to its old Scala and Spark versions. To remedy this, the following topics are covered in this ticket:

- Cross-compilation across scala major versions (2.11, 2.12)
- Update Spark Version (2.4+)
- Create maven profiles to build different scala and spark versions
- Changes to build strategy

Apache Spark uses the same process to build for different versions of Scala and Hadoop.

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Via maven build process.

Closes #589 from chitralverma/cross-version-build.

Authored-by: chitralverma <>
Signed-off-by: chitralverma <>
yuxiaoyu [Mon, 7 Dec 2020 03:59:14 +0000 (11:59 +0800)] 
Support http remote conf

In our production practice, many Griffin jobs run on YARN in cluster mode. We upload different conf files to an HTTP file server, and we also provide services that generate specific configurations based on different HTTP URLs.
This PR therefore supports passing HTTP URLs as the conf when submitting Griffin jobs. JSON and file conf modes are unaffected. It has been working well in our production environment for a long time.
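A sketch of how a conf argument could be routed, assuming inline JSON starts with `{` and remote confs use `http://` or `https://`; `ConfSource` and its methods are hypothetical, not the code from this PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper: decides whether a conf argument is inline JSON, an HTTP(S) URL, or a file path.
final class ConfSource {
    enum Kind { JSON, HTTP, FILE }

    static Kind classify(String arg) {
        String s = arg.trim();
        if (s.startsWith("{")) {
            return Kind.JSON; // inline JSON string
        }
        String lower = s.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return Kind.HTTP; // conf served by an HTTP file server or generator service
        }
        return Kind.FILE; // anything else is treated as a file path
    }

    // Returns the raw conf text; HTTP and local-file sources are decoded as UTF-8.
    static String load(String arg) throws IOException {
        String s = arg.trim();
        Kind kind = classify(s);
        if (kind == Kind.JSON) {
            return s;
        }
        if (kind == Kind.HTTP) {
            try (InputStream in = URI.create(s).toURL().openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        return new String(Files.readAllBytes(Paths.get(s)), StandardCharsets.UTF_8);
    }
}
```

Keeping the JSON and file branches first means existing submissions behave exactly as before; only arguments with an HTTP scheme take the new path.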

Author: yuxiaoyu <>

Closes #587 from XiaoyuBD/support_http_url_conf.

Eugene [Wed, 2 Dec 2020 10:14:01 +0000 (03:14 -0700)] 
Fix doc format glitches

Author: Eugene <>

Closes #588 from toyboxman/doc-pr.

Yuepeng [Mon, 9 Nov 2020 06:10:43 +0000 (23:10 -0700)] (tag: griffin-0.6.0-rc1)
[maven-release-plugin] prepare for next development iteration

Yuepeng [Mon, 9 Nov 2020 06:09:19 +0000 (23:09 -0700)] (tag: griffin-0.6.0)
[maven-release-plugin] prepare release griffin-0.6.0

William Guo [Sun, 8 Nov 2020 08:26:32 +0000 (13:56 +0530)] 
Change connectors to connector for datasource

Closes #586 from guoyuepeng/change_connectors_to_connector_for_datasource.

Lead-authored-by: William Guo <>
Co-authored-by: deyiyao <>
Co-authored-by: ahutsunshine <>
Signed-off-by: Chitral Verma <>
William Guo [Mon, 26 Oct 2020 09:35:16 +0000 (17:35 +0800)] 
update angular cli version for release issue

Author: William Guo <>

Closes #585 from guoyuepeng/update_angular_cli_verion_for_release.

William Guo [Mon, 26 Oct 2020 03:14:23 +0000 (11:14 +0800)] 
compaliancy fix

Author: William Guo <>

Closes #584 from guoyuepeng/compaliancy_fix_before_release_0.7.0.

chitralverma [Mon, 21 Sep 2020 05:05:53 +0000 (10:35 +0530)] 
For task GRIFFIN-347, Add workflows

ambition119 [Tue, 15 Sep 2020 01:31:34 +0000 (09:31 +0800)] 
[GRIFFIN-SERVICE] use service tar.gz deploy and start

If we use `java -jar` to start the service, it is inconvenient to modify the configuration file, because any change requires recompiling the jar. Stopping the service is also inconvenient.
This PR provides a `tar.gz` installation method and shell start/stop scripts; it has been deployed and is running in our production environment.

1. ls -al service-0.6.0-SNAPSHOT

2. cd service-0.6.0-SNAPSHOT/

3. ./bin/ start

4. jps
   17860 GriffinWebApplication

5. ./bin/ stop
    stopping 17860 of service ...

Once the service has started, it can be accessed at http://${ip}:8080.

Author: ambition119 <>

Closes #582 from ambition119/service.tar.gz.

wankunde [Mon, 17 Aug 2020 07:41:10 +0000 (13:11 +0530)] 
[GRIFFIN-339] Import griffin tool for debug and run user jobs

With the Griffin tool, users can run DQ jobs from the command line.
This helps users debug and run their own DQ jobs.

Closes #581 from wankunde/measure_tools.

Authored-by: wankunde <>
Signed-off-by: chitralverma <>
chitralverma [Mon, 10 Aug 2020 02:49:42 +0000 (10:49 +0800)] 
[GRIFFIN-305] Standardize sink hierarchy

**What changes were proposed in this pull request?**
Currently, the implementation of `Sinks` in Griffin has the issues below, which this PR aims to fix.
- `Sinks` are based on the recursive MultiSink class, which is itself a sink but is implemented underneath as a `Seq`. This causes ambiguity and isn't very useful, so it has been removed.
- Some unused code, such as `SinkContext`, has been removed.
- Data is converted from the performant DataFrame to an RDD while persisting, in both streaming and batch pipelines. A new method, `sinkBatchRecords`, has been added to operate directly on DataFrames in batch pipelines. Streaming still uses the old implementation, which will be replaced with structured streaming.
- Refactored the methods of `Sink`: `start`/`finish` were renamed to `open`/`close`, and `jobName` being incorrectly passed as `metricName` was fixed.
- Previously, only one sink of a given type could be defined in the env config, which ruled out configuring multiple sinks of the same type, such as HDFS or JDBC. A sink `name` has been added to the env config; the job config uses this name to refer to the sinks it should use.
- Updated all sinks according to the changes above, with some additional changes to ConsoleSink.
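The name-based lookup can be sketched as follows, assuming each env-config sink carries a unique `name` plus a type and options, and the job config lists the sink names it wants; `SinkRegistry` and `SinkParam` are hypothetical, not Griffin's classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SinkRegistry {
    // One sink entry from the env config (hypothetical shape).
    static final class SinkParam {
        final String name;
        final String type;
        final Map<String, String> config;

        SinkParam(String name, String type, Map<String, String> config) {
            this.name = name;
            this.type = type;
            this.config = config;
        }
    }

    private final Map<String, SinkParam> byName = new LinkedHashMap<>();

    // Names must be unique; types may repeat, so two HDFS sinks can coexist.
    SinkRegistry(List<SinkParam> envSinks) {
        for (SinkParam p : envSinks) {
            if (byName.putIfAbsent(p.name, p) != null) {
                throw new IllegalArgumentException("duplicate sink name: " + p.name);
            }
        }
    }

    // Resolves the sink names listed in a job config to their env definitions.
    List<SinkParam> resolve(List<String> jobSinkNames) {
        List<SinkParam> out = new ArrayList<>();
        for (String name : jobSinkNames) {
            SinkParam p = byName.get(name);
            if (p == null) {
                throw new IllegalArgumentException("unknown sink: " + name);
            }
            out.add(p);
        }
        return out;
    }
}
```

Keying by name rather than by type is what allows several sinks of the same type to be configured side by side.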

**Does this PR introduce any user-facing change?**
Yes. As mentioned above, the sink config has changed in env and job configs.

**How was this patch tested?**
Griffin test suite and additional unit test cases

Author: chitralverma <>

Closes #575 from chitralverma/standardize-sink-hierarchy.

Eugene [Sun, 21 Jun 2020 12:13:49 +0000 (05:13 -0700)] 
Fix Unit Test Issue In Measure Test Case

[GRIFFIN-329] Measure unit test cases fail on the condition of no docker image

The unit tests try to download an ES Docker image and then run the dependent cases. If the download fails, some cases abort with exceptions. This revision introduces a new execution flag: unless the Docker image is always available, those cases are excluded.

Author: Eugene <>

Closes #580 from toyboxman/Fix.

chitralverma [Thu, 4 Jun 2020 11:01:34 +0000 (19:01 +0800)] 
[GRIFFIN-326] New Data Connector for Elasticsearch

**What changes were proposed in this pull request?**

This ticket proposes the following changes,
- Deprecate the current implementation in favour of the direct implementation in the official [elasticsearch-hadoop]( library.
- This library is built on the Spark 2.2.x+ DataSource API and thus brings support for filter pushdown, column pruning, unified reads and writes, and additional optimizations.
- Many configuration options are available for ES connectivity, [check here](
- Any filter can be applied as an expression directly on the DataFrame and is pushed down to the source automatically.

**Does this PR introduce any user-facing change?**
Yes. As mentioned above, the old connector has been deprecated and config structure for Elasticsearch data connector has changed now.

**How was this patch tested?**
Griffin test suite and additional unit test cases

Author: chitralverma <>

Closes #569 from chitralverma/new-elastic-search-connector.

DongfangLu [Wed, 13 May 2020 10:50:54 +0000 (18:50 +0800)] 
Upgrade UI packages for jquery

Upgrade jquery

Author: DongfangLu <>

Closes #571 from ludongfang/ui_package_upgrade.

Yu [Wed, 11 Mar 2020 15:15:19 +0000 (23:15 +0800)] 
[GRIFFIN-316] Fix job exception handling

**What changes were proposed in this pull request?**

Currently, a `Try` instance represents the result of a DQ job, whether it succeeded or failed. However, because `Try` is applied only at the outermost level, around the Boolean result, underlying failures are not caught and the job always returns `Success`, even when an exception occurred.

This PR modifies all the underlying execute/doExecute methods of a DQ job to handle exceptions with `Try` instances, so that failures are passed properly to users when something goes wrong.
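Java has no built-in `Try`, so this sketch uses a hypothetical `TryResult` class to show the idea: wrap each step rather than only the outermost Boolean, so an exception thrown inside a step is reported as a failure instead of `Success`. `DqJob` and its step list are illustrative only, not Griffin's code.

```java
import java.util.List;
import java.util.function.Supplier;

// Minimal stand-in for Scala's Try: holds either a value or the exception that prevented it.
final class TryResult<T> {
    final T value;
    final Exception error;

    private TryResult(T value, Exception error) {
        this.value = value;
        this.error = error;
    }

    static <T> TryResult<T> of(Supplier<T> body) {
        try {
            return new TryResult<>(body.get(), null);
        } catch (Exception e) {
            return new TryResult<>(null, e);
        }
    }

    boolean isSuccess() {
        return error == null;
    }
}

// Hypothetical job runner: every step is wrapped, not only the outermost Boolean.
final class DqJob {
    static TryResult<Boolean> execute(List<Supplier<Boolean>> steps) {
        for (Supplier<Boolean> step : steps) {
            TryResult<Boolean> r = TryResult.of(step);
            // Stop at the first step that threw or returned false, keeping its result.
            if (!r.isSuccess() || !Boolean.TRUE.equals(r.value)) {
                return r;
            }
        }
        return TryResult.of(() -> true);
    }
}
```

Because the failing step's result is returned as-is, the caller sees the original exception rather than a generic `false`.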

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Griffin test suite.

Author: Yu <>

Closes #562 from PnPie/exception_catch.

yuxiaoyu [Tue, 11 Feb 2020 12:15:53 +0000 (20:15 +0800)] 
optimize get metric maps in 'MetricWriteStep'

**Why/What changes?**
In 'MetricWriteStep.getMetricMaps()', the DataFrame was transformed to a JSON RDD, collected, and then transformed to Seq[Map].
This is inelegant and hard to understand. A more efficient approach is to collect the DataFrame first and then transform it to Seq[Map] directly.

We have tested it with our DQ cases, and it works well.
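The collect-first approach can be sketched in plain Java, assuming the collected rows are available as value arrays together with the column names (in Spark these would come from `collect()` and the DataFrame schema); `MetricMaps` is a hypothetical helper, not the code from this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetricMaps {
    // Converts collected rows to one map per row, keyed by column name,
    // with no intermediate JSON serialization and parsing.
    static List<Map<String, Object>> toMaps(List<String> columns, List<Object[]> rows) {
        List<Map<String, Object>> out = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            if (row.length != columns.size()) {
                throw new IllegalArgumentException("row width does not match schema");
            }
            Map<String, Object> m = new LinkedHashMap<>(); // preserves column order
            for (int i = 0; i < row.length; i++) {
                m.put(columns.get(i), row[i]);
            }
            out.add(m);
        }
        return out;
    }
}
```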

Author: yuxiaoyu <>

Closes #566 from XiaoyuBD/optimizeMetricWriteGetMaps.

chitralverma [Mon, 10 Feb 2020 09:09:29 +0000 (17:09 +0800)] 
[GRIFFIN-323] Refactor configuration Data Source Connector

**What changes were proposed in this pull request?**

This ticket proposes the following changes,

- remove 'version' from 'DataConnectorParam' as it is not being used anywhere in the codebase.
- change 'connectors' from an array type to a single JSON object. Since a data source named X may only be of one type (hive, file, etc.), the connector field should not be an array.
- rename connectors to connector
- update existing config files and documentation for reference

**Does this PR introduce any user-facing change?**
Yes. As mentioned above, the config structure has changed now.

**How was this patch tested?**
Griffin test suite.

Author: chitralverma <>

Closes #568 from chitralverma/refactor-data-connector-config.

yuxiaoyu [Sat, 8 Feb 2020 08:15:03 +0000 (16:15 +0800)] 
[GRIFFIN-322] Add SQL mode for ES connector

As described in [GRIFFIN-322](, we want to add a SQL mode to the ES connector.

**SQL mode is more effective and user-friendly.**

Current mode config:
{
  "class": "org.apache.griffin.measure.datasource.connector.batch.ElasticSearchGriffinDataConnector",
  "index": "index-xxx",
  "type": "metric",
  "host": "xxxxxxxxxx",
  "port": "xxxx",
  "fields": ["col_a", "col_b", "col_c"],
  "size": 100
}

SQL mode config:
{
  "class": "org.apache.griffin.measure.datasource.connector.batch.ElasticSearchGriffinDataConnector",
  "sql.mode": true,
  "host": "xxxxxxxxxx",
  "port": "xxxx",
  "sql": "select col_a, col_b, col_c from index-xx limit 100"
}

Compared with the current mode, SQL mode also supports field types other than numbers.

Author: yuxiaoyu <>

Closes #567 from XiaoyuBD/enrichEsConnectorAddSqlMode.

chitralverma [Thu, 16 Jan 2020 07:20:21 +0000 (15:20 +0800)] 
[GRIFFIN-317] Define guidelines for Griffin Project Improvement Proposals (GPIP)

**What changes were proposed in this pull request?**

Taking inspiration from Apache Spark, this ticket aims to define guidelines for Griffin Project Improvement Proposals (GPIP).

The purpose of a GPIP is to inform and involve the user community in major improvements to the Apache Griffin codebase throughout the development process, to increase the likelihood that user needs are met.

A GPIP aims to discuss the design and implementation of major features and changes in a collaborative manner. These major features must not be small, incremental, or wide-scoped, as such features can be resolved through the normal Jira process.

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Not Applicable

Author: chitralverma <>

Closes #563 from chitralverma/griffin-pip-template.

neveljkovic [Mon, 6 Jan 2020 09:17:36 +0000 (17:17 +0800)] 
[GRIFFIN-318] Replace all YYYY with yyyy in all user guides and examples
Replace YYYY with yyyy and DD with dd
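Assuming these patterns are interpreted with Java date-format rules (`SimpleDateFormat` and `DateTimeFormatter` agree here), `YYYY` is the week-based year and `DD` is the day of the year, so the wrong letters silently produce wrong dates. A small `java.time` demonstration; `DatePatternDemo` is only a wrapper for the example:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class DatePatternDemo {
    // Formats a date with the given pattern; Locale.US fixes the week rules used by 'Y'.
    static String format(LocalDate date, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(date);
    }
}
```

With US week rules, 2019-12-30 already falls in week 1 of 2020, so `YYYY-MM-dd` prints `2020-12-30`, while `yyyy-MM-DD` turns 5 February 2020 into `2020-02-36`.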

Author: neveljkovic <>

Closes #565 from neveljkovic/GRIFFIN-318.

chitralverma [Mon, 6 Jan 2020 09:12:03 +0000 (17:12 +0800)] 
[GRIFFIN-319] Deprecate old Data Connectors

**What changes were proposed in this pull request?**

This ticket aims to inform users of the deprecated data source connectors.

Deprecated connectors:

- MySqlDataConnector in favour of JDBCBasedDataConnector
- AvroBatchDataConnector in favour of FileBasedDataConnector
- TextDirBatchDataConnector in favour of FileBasedDataConnector

The documentation is also updated corresponding to the new connectors for reference.

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Not Applicable

Author: chitralverma <>

Closes #564 from chitralverma/deprecate-old-data-connectors.

tusharpatil20 [Tue, 31 Dec 2019 02:28:00 +0000 (10:28 +0800)] 
[GRIFFIN-315] Adding JDBC based data connector

**What changes were proposed in this pull request?**

A JDBC-based data connector for reading data from various JDBC-based data sources.

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Griffin test suite.

Author: tusharpatil20 <>

Closes #561 from tusharpatil20/JDBCBased-source-connector.

chitralverma [Wed, 25 Dec 2019 02:39:11 +0000 (10:39 +0800)] 
[GRIFFIN-312] Code Style Standardization

**What changes were proposed in this pull request?**

This PR targets the following:
- fix the various warnings during build and in source code,
- perform code formatting as per a standard style,
- fix scalastyle integration

Since Scalastyle targets Scala source code only, it should be part of the measure module only. The current misconfiguration also suppresses formatting errors.

Scalafmt is used for code formatting.

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Griffin test suite.

Author: chitralverma <>

Closes #560 from chitralverma/code-style-standardization.

tusharpatil [Mon, 16 Dec 2019 13:22:20 +0000 (21:22 +0800)] 
Enum based configs

**What changes were proposed in this pull request?**

All the predefined types (`DQTypes`, `DSLTypes`, `FlattenType`, `OutputType`, `ProcessType`, `SinkType` and `WriteMode`) are matched against config values using regular expressions. This adds unnecessary overhead in terms of execution time and maintainability.

This PR uses predefined enums instead of regular expressions to provide the same functionality.
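An enum-based lookup of this kind might look like the sketch below; the constants are an illustrative subset of DQ types and `fromConfig` is a hypothetical method, not Griffin's exact enum.

```java
import java.util.Locale;

// Illustrative subset of DQ types; the real enum may differ.
enum DqType {
    ACCURACY, PROFILING, UNIQUENESS, DISTINCT, TIMELINESS, COMPLETENESS, UNKNOWN;

    // Case-insensitive lookup of a config value, falling back to UNKNOWN.
    static DqType fromConfig(String value) {
        if (value == null) {
            return UNKNOWN;
        }
        String key = value.trim().toUpperCase(Locale.ROOT);
        for (DqType t : values()) {
            if (t.name().equals(key)) {
                return t;
            }
        }
        return UNKNOWN;
    }
}
```

A config value is normalized once and compared by name, instead of being tested against one regular expression per type, and adding a type only requires a new constant.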

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Griffin test suite.

Author: tusharpatil <tus>
Author: tusharpatil20 <>

Closes #558 from tusharpatil20/enum-based-configs.

wankunde [Fri, 13 Dec 2019 13:25:14 +0000 (21:25 +0800)] 
[GRIFFIN-310] Unified scala code style and enable scala code style checking by default

Griffin has more and more contributors, so we need a unified code style, with code style checking enabled by default.

Author: wankunde <>

Closes #559 from wankunde/scalastyle.

wankunde [Sat, 30 Nov 2019 05:08:12 +0000 (13:08 +0800)] 
[GRIFFIN-301] Update custom data connector to have the same parameters as build-in data connector

Custom data connectors currently take different parameters from the built-in data connectors, which confuses users.

For example:

Author: wankunde <>

Closes #556 from wankunde/custom_data_connector.

chitralverma [Sat, 30 Nov 2019 05:01:30 +0000 (13:01 +0800)] 
[GRIFFIN-304] Eliminate older contexts

**What changes were proposed in this pull request?**

As SparkSession supersedes SQLContext and HiveContext and wraps the SparkContext, there is no need to pass or instantiate these contexts separately. If any of the older contexts are needed, they can be derived from SparkSession.

This issue aims to eliminate dependency on older Contexts in favour of SparkSession.

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Griffin test suite.

Author: chitralverma <>

Closes #557 from chitralverma/eliminate-older-contexts.

wankunde [Mon, 25 Nov 2019 11:32:00 +0000 (19:32 +0800)] 
Bug fix for reflecting a custom sink object

Bug fix for reflecting a custom sink object.

Author: wankunde <>

Closes #551 from wankunde/custom_sink.

chitralverma [Thu, 21 Nov 2019 01:26:44 +0000 (09:26 +0800)] 
[GRIFFIN-297] Allow support for additional file based data sources

**What changes were proposed in this pull request?**

This PR extends file-based data source support beyond Avro and Text to various other formats (Parquet, ORC, etc.).

 - Allows users to specify additional file based data sources like Parquet, CSV, TSV, ORC etc.
 - Allows data to be read directly from stand-alone files as well as directories present in both local/ distributed file systems.
 - Allows users to specify schema directly through options (useful for CSV/ TSV types).

A sample config looks like this:

{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "options": {
          "k1": "v1",
          "k2": "v2"
        },
        "paths": [
          ...
        ]
      }
    }
  ]
}

**Does this PR introduce any user-facing change?**

**How was this patch tested?**
Griffin test suite. Some additional unit tests have also been added.

Author: chitralverma <>

Closes #555 from chitralverma/allow_file_based_batch_connectors.

wankunde [Sat, 16 Nov 2019 03:11:34 +0000 (11:11 +0800)] 
[GRIFFIN-298] add CompletenessExpr2DQSteps test case

Add test cases for the CompletenessExpr2DQSteps transform, along with some small code optimizations.

Author: wankunde <>

Closes #550 from wankunde/CompletenessExpr2DQSteps.

wankunde [Thu, 14 Nov 2019 01:07:33 +0000 (09:07 +0800)] 
[GRIFFIN-299] Add oracle jdk8 support in travis build phase

As Ubuntu Xenial has become the default Travis CI build environment, our build may fail.

The workaround is to either add `dist: trusty` to your .travis.yml file or use `openjdk8`.

Author: wankunde <>

Closes #552 from wankunde/travis.

wankunde [Fri, 1 Nov 2019 13:56:28 +0000 (21:56 +0800)] 
[GRIFFIN-295] Limit the memory used by test case

The container memory size in Travis is 3G, but our test cases always use more than 3G of memory, so a `Cannot allocate memory` error is thrown.

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000fe980000, 23592960, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 23592960 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/travis/build/apache/griffin/measure/hs_err_pid11948.log
# [ timer expired, abort... ]

There are two kinds of programs in our tests: the main Maven program, and the tests run by maven-surefire-plugin and scalatest-maven-plugin.
If memory is unlimited, the test cases, especially Spark jobs, will occupy as much memory as possible.

Spark jobs do not free memory until a full GC occurs, even after the Spark context has been stopped, so we need to limit the memory used by the test cases.

We can limit the memory used by Maven by setting `export MAVEN_OPTS="-Xmx1024m -XX:ReservedCodeCacheSize=128m"`, and limit the memory used by the Spark job tests by configuring maven-surefire-plugin and scalatest-maven-plugin.

For example:
Before the limit, the Maven program occupied 1.5G of memory and the Spark job occupied 1.8G.

After the limit, the Maven program occupied 1G of memory and the Spark job occupied 1G.

Author: wankunde <>

Closes #546 from wankunde/testcase_memory_limit.

2 years ago[GRIFFIN-294] bugfix for completeness enumeration wrong sql
Zhao Li [Fri, 1 Nov 2019 13:36:02 +0000 (21:36 +0800)] 
[GRIFFIN-294] bugfix for completeness enumeration wrong sql

If 'hive_none' is the only value in the enumeration list, the generated SQL is wrong. This change updates the code to fix the bug.
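The commit does not show the broken SQL. One plausible failure mode, assumed here purely for illustration, is emitting an empty `IN ()` list once the `hive_none` entry (taken here to stand for NULL) is separated from the other values. The hypothetical `EnumerationPredicate` below avoids that by emitting `IN (...)` only when literal values remain; it is not Griffin's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Griffin's code: builds a SQL predicate that flags a
// column value as incomplete when it appears in an enumeration list.
// Assumption: the token "hive_none" in the list stands for NULL.
final class EnumerationPredicate {
    static final String NULL_TOKEN = "hive_none";

    private EnumerationPredicate() {}

    static String build(String column, List<String> values) {
        boolean matchNull = false;
        List<String> literals = new ArrayList<>();
        for (String v : values) {
            if (NULL_TOKEN.equals(v)) {
                matchNull = true;
            } else {
                // Escape single quotes so each value is a valid SQL string literal.
                literals.add("'" + v.replace("'", "''") + "'");
            }
        }
        List<String> parts = new ArrayList<>();
        if (!literals.isEmpty()) {
            // Only emit IN (...) when there is at least one literal: "IN ()" is invalid SQL.
            parts.add("`" + column + "` IN (" + String.join(", ", literals) + ")");
        }
        if (matchNull) {
            parts.add("`" + column + "` IS NULL");
        }
        if (parts.isEmpty()) {
            return "false"; // empty enumeration: nothing is flagged
        }
        return "(" + String.join(" OR ", parts) + ")";
    }
}
```

For a list containing only `hive_none`, this yields `` (`name` IS NULL) `` rather than an `IN` clause with no values.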

Author: Zhao Li <>
Author: Zhao <>

Closes #545 from LittleZhao/griffin-294.

2 years agofix tests failure
ahutsunshine [Thu, 24 Oct 2019 11:30:35 +0000 (19:30 +0800)] 
fix tests failure

1. Fix Hive connection failure
2. Improve test run time

Author: ahutsunshine <>

Closes #543 from ahutsunshine/master.

2 years agoremove -Xms500m -Xmx1g -XX:MaxPermSize=256m temporarily
William Guo [Tue, 22 Oct 2019 13:30:00 +0000 (21:30 +0800)] 
remove -Xms500m -Xmx1g -XX:MaxPermSize=256m temporarily

Author: William Guo <>

Closes #544 from guoyuepeng/fix_maven_memory_opt.

2 years ago[GRIFFIN-293][SERVICE] livy.need.queue=true
neveljkovic [Mon, 14 Oct 2019 12:14:56 +0000 (20:14 +0800)] 
[GRIFFIN-293][SERVICE] livy.need.queue=true

This is how we fixed the issue described in
The solution is deployed on our servers and works OK.
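The linked issue is not reproduced here. As a hedged sketch, assuming `livy.need.queue` is a boolean switch that controls whether a YARN queue is sent to Livy, the batch request body could be assembled as below. `LivyBatchBody`, the `false` default and the `queue` argument are illustrative assumptions; `file` and `queue` are genuine optional fields of Livy's `POST /batches` request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch, not Griffin's code: the "livy.need.queue" property
// decides whether the YARN queue is included in the Livy batch request body.
final class LivyBatchBody {
    private LivyBatchBody() {}

    static Map<String, Object> build(Properties props, String file, String queue) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("file", file);
        // Assumption: the switch defaults to false when the property is missing.
        boolean needQueue = Boolean.parseBoolean(props.getProperty("livy.need.queue", "false"));
        if (needQueue && queue != null) {
            // Livy's POST /batches accepts an optional "queue" (YARN queue name).
            body.put("queue", queue);
        }
        return body;
    }
}
```

Keeping the queue behind a switch lets clusters without a queue policy omit the field entirely.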

Author: neveljkovic <>

Closes #541 from neveljkovic/griffin-293.

2 years ago[GRIFFIN-289] New feature for griffin COMPLETENESS dq type
‘Zhao [Thu, 10 Oct 2019 15:13:27 +0000 (23:13 +0800)] 
[GRIFFIN-289] New feature for griffin COMPLETENESS dq type

As described in GRIFFIN-289, this adds two new ways to check for incomplete records: regular expression and enumeration.

Adds 'error.confs' to the DQ JSON file. Each JSON object in the 'error.confs' list is the configuration for one column.

If 'error.confs' is absent, the old 'incompleteness' process is used, which keeps existing JSON files compatible.

Adds unit tests for the new JSON format.
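A minimal in-memory sketch of the two per-column checks, assuming a value counts as incomplete when it matches the configured regular expression or appears in the enumeration, and that null is always incomplete. `IncompletenessCheck` is hypothetical; it uses `Matcher.find()`, which mirrors the partial-match behaviour of SQL `RLIKE`, and that choice is also an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Griffin's code: one column configuration with an
// optional regex check and an optional enumeration check.
final class IncompletenessCheck {
    private final Pattern regex;       // null when no regex is configured
    private final List<String> values; // enumerated "incomplete" values, may be empty

    IncompletenessCheck(String regex, List<String> values) {
        this.regex = (regex == null) ? null : Pattern.compile(regex);
        this.values = (values == null) ? Collections.<String>emptyList() : values;
    }

    boolean isIncomplete(String value) {
        if (value == null) {
            return true; // assumption: a missing value is always incomplete
        }
        // find() matches anywhere in the value, like SQL RLIKE.
        if (regex != null && regex.matcher(value).find()) {
            return true;
        }
        return values.contains(value);
    }
}
```

For example, `new IncompletenessCheck("^\\s*$", Arrays.asList("N/A"))` flags blank strings and `"N/A"` but not `"abc"`.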

Author: ‘Zhao <>
Author: Zhao Li <>

Closes #538 from LittleZhao/griffin-289.

3 years agoadd placeholder for cron expression
jasonliaoxiaoge [Thu, 26 Sep 2019 14:53:27 +0000 (22:53 +0800)] 
add placeholder for cron expression

Add a placeholder for the cron expression, because Java Quartz cron syntax is a little different from Linux crontab.
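The main differences: Quartz expressions start with a seconds field, require `?` in exactly one of the day-of-month and day-of-week fields, and number Sunday as 1 where crontab uses 0 or 7. The hypothetical `CronToQuartz` helper below converts simple 5-field crontab expressions under those rules and deliberately rejects numeric day-of-week values rather than guessing; it illustrates the difference and is not the placeholder logic from this commit.

```java
// Hypothetical helper, not Griffin's code: converts a 5-field crontab
// expression to the 6-field Quartz form (seconds first, "?" in one day field).
final class CronToQuartz {
    private CronToQuartz() {}

    static String convert(String crontab) {
        String[] f = crontab.trim().split("\\s+");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + crontab);
        }
        String minute = f[0], hour = f[1], dom = f[2], month = f[3], dow = f[4];
        // crontab numbers Sunday 0 (or 7), Quartz numbers it 1: refuse to guess.
        if (dow.matches(".*\\d.*")) {
            throw new IllegalArgumentException("use day names such as MON-FRI: " + dow);
        }
        if ("*".equals(dow)) {
            dow = "?";
        } else if ("*".equals(dom)) {
            dom = "?";
        } else {
            // Quartz does not support restricting both day fields at once.
            throw new IllegalArgumentException("cannot combine day-of-month and day-of-week");
        }
        return String.join(" ", "0", minute, hour, dom, month, dow);
    }
}
```

`convert("30 2 * * *")` returns `"0 30 2 * * ?"`, and `convert("0 9 * * MON-FRI")` returns `"0 0 9 ? * MON-FRI"`.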

Author: jasonliaoxiaoge <>

Closes #503 from jasonliaoxiaoge/master.

3 years ago[GRIFFIN-290] Fix bug for submitting job to livy
wankunde [Tue, 17 Sep 2019 23:29:05 +0000 (07:29 +0800)] 
[GRIFFIN-290] Fix bug for submitting job to livy

When Griffin submits multiple DQ jobs to Livy, the HTTP parameter `name` is always `griffin`,
so Livy rejects them as duplicate session names.

Job request:

[owner: null, request: [proxyUser: None, file: hdfs://nameservice-standby/user/kun.wan/measure-0.6.0-SNAPSHOT.jar,
args: {
"spark" : { "log.level" }
"sinks" : [ { .... }
"griffin.checkpoint" : [ ]
"measure.type" : "griffin",
"id" : 5202,
"name" : "spu_null_check",
"owner" : "test",
"description" : "check null value for store and category",
"deleted" : false,
"timestamp" : 1568195100000,
"dq.type" : "PROFILING",
"sinks" : [ "ELASTICSEARCH", "HDFS" ],
"process.type" : "BATCH",
"rule.description" :
"data.sources" : [ { .... }
"evaluate.rule" :
"measure.type" : "griffin"
},raw,raw, driverMemory: 1g, executorMemory: 6g, executorCores: 2, numExecutors: 6, queue: root.users.kun_dot_wan, name: griffin]]

Livy response:

400 Bad Request
[Date:"Thu, 12 Sep 2019 10:00:00 GMT", Content-Type:"application/json;charset=utf-8", Content-Length:"47", Server:"Jetty(9.3.24.v20180605)"]
{"msg":"Duplicate session name: Some(griffin)"}
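Any scheme that gives each submission a distinct `name` avoids this clash. The hypothetical `LivySessionName` helper below combines the measure name, the schedule timestamp and a short random suffix; the naming format is an assumption and not necessarily what this commit implemented.

```java
import java.util.UUID;

// Hypothetical sketch, not Griffin's code: derive a distinct Livy session name
// per submission so Livy does not answer "Duplicate session name".
final class LivySessionName {
    private LivySessionName() {}

    static String forJob(String measureName, long timestampMillis) {
        // Measure name + schedule timestamp keep the name readable; the random
        // suffix keeps it unique even if the same job is submitted twice.
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return measureName + "_" + timestampMillis + "_" + suffix;
    }
}
```

For the request above this would give names such as `spu_null_check_1568195100000_` followed by eight hex characters.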

Author: wankunde <>

Closes #534 from wankunde/livy_bug.

3 years ago[GRIFFIN-291] Relocate HttpClient code in measure jar using shade plugin
wankunde [Sat, 14 Sep 2019 07:45:39 +0000 (15:45 +0800)] 
[GRIFFIN-291] Relocate HttpClient code in measure jar using shade plugin

Different projects use different versions of httpclient, which easily leads to conflicts with other components.

We can shade the httpclient classes into the measure jar, and the conflicts disappear.
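Relocation renames classes from an original package prefix to a shaded one, so the httpclient copy bundled in the measure jar no longer collides with the version another component puts on the classpath. The hypothetical `Relocation` class below is a simplified model of that renaming rule only; the `griffin.shaded` prefix is an assumption, and the real plugin also rewrites every bytecode reference and resource path, which this sketch does not.

```java
// Hypothetical, simplified model of a shade relocation rule: class names under
// the original package prefix move under the shaded prefix; others are untouched.
final class Relocation {
    private final String pattern;       // original package prefix, e.g. "org.apache.http"
    private final String shadedPattern; // new prefix inside the shaded jar

    Relocation(String pattern, String shadedPattern) {
        this.pattern = pattern;
        this.shadedPattern = shadedPattern;
    }

    // Returns the relocated class name, or the input unchanged if it is outside the prefix.
    String apply(String className) {
        if (className.equals(pattern) || className.startsWith(pattern + ".")) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className;
    }
}
```

With the assumed prefix, `org.apache.http.client.HttpClient` becomes `griffin.shaded.org.apache.http.client.HttpClient`, while `java.lang.String` is left alone.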

Author: wankunde <>

Closes #535 from wankunde/httpclient.

3 years ago[GRIFFIN-288] optimize hdfs sink
wankunde [Fri, 13 Sep 2019 14:17:03 +0000 (22:17 +0800)] 
[GRIFFIN-288] optimize hdfs sink

When records are sunk to HDFS, the job may run out of memory if the result is huge:

19/09/06 18:52:39 INFO LineBufferedStream: 19/09/06 18:52:39 ERROR sink.HdfsSink: Java heap space
19/09/06 18:52:39 INFO LineBufferedStream: java.lang.OutOfMemoryError: Java heap space
19/09/06 18:52:39 INFO LineBufferedStream:      at java.util.Arrays.copyOf(
19/09/06 18:52:39 INFO LineBufferedStream:      at java.lang.AbstractStringBuilder.ensureCapacityInternal(
19/09/06 18:52:39 INFO LineBufferedStream:      at java.lang.AbstractStringBuilder.append(
19/09/06 18:52:39 INFO LineBufferedStream:      at java.lang.StringBuilder.append(
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:364)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:357)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:323)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:325)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
19/09/06 18:52:39 INFO LineBufferedStream:      at$apache$griffin$measure$sink$HdfsSink$$sinkRecords2Hdfs(HdfsSink.scala:191)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.sink.HdfsSink.sinkRecords(HdfsSink.scala:133)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.sink.MultiSinks$$anonfun$sinkRecords$1.apply(MultiSinks.scala:63)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.sink.MultiSinks$$anonfun$sinkRecords$1.apply(MultiSinks.scala:61)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.collection.immutable.List.foreach(List.scala:392)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.sink.MultiSinks.sinkRecords(MultiSinks.scala:61)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.step.write.RecordWriteStep.execute(RecordWriteStep.scala:49)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.step.transform.SparkSqlTransformStep.doExecute(SparkSqlTransformStep.scala:40)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.step.transform.TransformStep$class.execute(TransformStep.scala:72)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.step.transform.SparkSqlTransformStep.execute(SparkSqlTransformStep.scala:27)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.step.transform.TransformStep$$anonfun$2$$anonfun$apply$1.apply$mcV$sp(TransformStep.scala:51)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.step.transform.TransformStep$$anonfun$2$$anonfun$apply$1.apply(TransformStep.scala:50)
19/09/06 18:52:39 INFO LineBufferedStream:      at org.apache.griffin.measure.step.transform.TransformStep$$anonfun$2$$anonfun$apply$1.apply(TransformStep.scala:50)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
19/09/06 18:52:39 INFO LineBufferedStream:      at scala.concurrent.impl.Future$
19/09/06 18:52:39 INFO LineBufferedStream:      at java.util.concurrent.ThreadPoolExecutor.runWorker(
19/09/06 18:52:39 INFO LineBufferedStream:      at java.util.concurrent.ThreadPoolExecutor$
19/09/06 18:52:39 INFO LineBufferedStream:      at
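The trace shows the OOM inside `mkString`, that is, while all records are being joined into one `StringBuilder` before anything is written. A sketch of the general remedy, assuming the records can be consumed as an iterator: write them line by line through a buffered writer, so memory stays proportional to one record plus the buffer. `StreamingRecordWriter` is hypothetical; with HDFS, `out` would typically be an `OutputStreamWriter` wrapping the stream returned by `FileSystem.create`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

// Hypothetical sketch, not Griffin's code: stream records one line at a time
// instead of building a single huge String with mkString("\n").
final class StreamingRecordWriter {
    private StreamingRecordWriter() {}

    static long writeLines(Iterator<String> records, Writer out) throws IOException {
        long count = 0;
        BufferedWriter writer = new BufferedWriter(out);
        while (records.hasNext()) {
            writer.write(records.next());
            writer.write('\n'); // fixed separator, independent of the platform
            count++;
        }
        writer.flush(); // flush but do not close: the caller owns the underlying stream
        return count;
    }
}
```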

Author: wankunde <>

Closes #533 from wankunde/hdfssink.

3 years ago[GRIFFIN-286] Remove spark-testing-base dependency jar
wankunde [Mon, 9 Sep 2019 00:28:19 +0000 (08:28 +0800)] 
[GRIFFIN-286] Remove spark-testing-base dependency jar

We currently use the spark-testing-base jar to test Spark jobs in the measure module, but this jar may conflict with the Spark version (CDH Spark, Spark AE) or the Scala version (it supports only a few Scala versions for a given Spark version).

So I suggest removing this dependency.

Author: wankunde <>

Closes #531 from wankunde/remoteSparkTestBase.

3 years agoMade UI tests run without errors
Simon George [Wed, 4 Sep 2019 13:38:54 +0000 (21:38 +0800)] 
Made UI tests run without errors

These changes allow the UI tests to execute without errors when run using "npm test"

Author: Simon George <>

Closes #529 from simegeorge/fix-ui-tests.

3 years ago[GRIFFIN-279] Upgrade Spring boot to 2.1.7.RELEASE
Johnnie [Mon, 2 Sep 2019 23:40:33 +0000 (07:40 +0800)] 
[GRIFFIN-279] Upgrade Spring boot to 2.1.7.RELEASE

As Spring Boot 1.x reached end of life on Aug 1st, 2019, it would be great to migrate to 2.1.x.

Below is the announcement

The migration guide is

Author: Johnnie <>

Closes #528 from joohnnie/GRIFFIN-279.

3 years agoMerge pull request #530 from aleksgor/GRIFFIN-AGORSHKOV
Lionel Liu [Thu, 29 Aug 2019 14:13:46 +0000 (22:13 +0800)] 
Merge pull request #530 from aleksgor/GRIFFIN-AGORSHKOV

Add Mysql, Cassandra and Elasticsearch connectors.

3 years agoMerge pull request #527 from joohnnie/GRIFFIN-280
Lionel Liu [Thu, 29 Aug 2019 14:13:24 +0000 (22:13 +0800)] 
Merge pull request #527 from joohnnie/GRIFFIN-280

GRIFFIN-280 update travis config to start griffin docker container

3 years agoRemove redundant dependency. 530/head
Gorshkov Aleksey [Tue, 27 Aug 2019 13:43:20 +0000 (16:43 +0300)] 
Remove redundant dependency.

3 years agoMerge branch 'master' into GRIFFIN-AGORSHKOV
aleksgor [Tue, 27 Aug 2019 12:23:57 +0000 (15:23 +0300)] 
Merge branch 'master' into GRIFFIN-AGORSHKOV

3 years agoAdd Mysql, Cassandra and Elasticsearch connectors.
Gorshkov Aleksey [Tue, 27 Aug 2019 12:20:03 +0000 (15:20 +0300)] 
Add Mysql, Cassandra and Elasticsearch connectors.

3 years ago[GRIFFIN-283] Move sink steps into TransformStep
wankunde [Mon, 26 Aug 2019 23:42:13 +0000 (07:42 +0800)] 
[GRIFFIN-283] Move sink steps into TransformStep

Treat sink steps as part of a transform step, so we can keep the focus on the transform step code.
Also, sink steps and other transform steps can be executed concurrently.
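Griffin's transform steps already run work in Scala futures (see the `TransformStep` and `scala.concurrent.impl.Future` frames in the GRIFFIN-288 stack trace), so attaching a sink to its transform step lets the sink and other downstream steps proceed in parallel once their input is ready. A Java sketch of that dependency shape using `CompletableFuture`; `ConcurrentSteps` and the step bodies are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Griffin's code: a sink step and another transform
// step both depend only on one transform's result, so they can run concurrently.
final class ConcurrentSteps {
    private ConcurrentSteps() {}

    static List<String> run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The transform step produces a result that two later steps depend on.
            CompletableFuture<String> transform =
                    CompletableFuture.supplyAsync(() -> "metrics", pool);
            // Both follow-up steps start once the transform completes and may
            // run at the same time on the pool.
            CompletableFuture<String> sink =
                    transform.thenApplyAsync(result -> "sunk:" + result, pool);
            CompletableFuture<String> downstream =
                    transform.thenApplyAsync(result -> "derived:" + result, pool);
            return Arrays.asList(sink.join(), downstream.join());
        } finally {
            pool.shutdown();
        }
    }
}
```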

Author: wankunde <>

Closes #526 from wankunde/sink2.

3 years agoremove unused plugin 527/head
Johnnie [Mon, 26 Aug 2019 23:41:41 +0000 (16:41 -0700)] 
remove unused plugin