[SPARK-22389][SQL] data source v2 partitioning reporting interface
author: Wenchen Fan <wenchen@databricks.com>, Mon, 22 Jan 2018 23:21:09 +0000 (15:21 -0800)
committer: gatorsmile <gatorsmile@gmail.com>, Mon, 22 Jan 2018 23:21:09 +0000 (15:21 -0800)
commit: 51eb750263dd710434ddb60311571fa3dcec66eb
tree: d62945bf6f5fc9ae41ccab5020c061fda7b28f4e
parent: 76b8b840ddc951ee6203f9cccd2c2b9671c1b5e8

## What changes were proposed in this pull request?

This PR adds a new interface, `SupportsReportPartitioning`, that allows a data source to report how its output is partitioned, so that Spark can avoid an unnecessary shuffle.

The design is very similar to Spark's internal distribution/partitioning framework. Spark defines a `Distribution` interface and concrete implementations of it (such as `ClusteredDistribution`), and asks the data source to report a `Partitioning`. The `Partitioning` then tells Spark whether it can satisfy a given `Distribution`.
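To make the contract concrete, here is a minimal, self-contained sketch in Java. The types are simplified stand-ins modeled on the files listed in this commit (`Distribution`, `ClusteredDistribution`, `Partitioning`, `SupportsReportPartitioning`). The method names `numPartitions`, `satisfy` and `outputPartitioning`, as well as the helpers `KeyedPartitioning` and `needsShuffle`, are assumptions for illustration and are not Spark's exact API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical stand-ins for the interfaces this commit adds;
// the real Spark signatures may differ.
public class PartitioningSketch {

  /** A requirement on how rows must be placed across partitions. */
  interface Distribution {}

  /** Rows with equal values in clusteredColumns must land in the same partition. */
  static final class ClusteredDistribution implements Distribution {
    final String[] clusteredColumns;

    ClusteredDistribution(String... clusteredColumns) {
      this.clusteredColumns = clusteredColumns;
    }
  }

  /** How a data source describes the layout its output already has. */
  interface Partitioning {
    int numPartitions();

    /** True if the existing layout already meets the required distribution. */
    boolean satisfy(Distribution distribution);
  }

  /** Mix-in a reader implements to report its partitioning (method name assumed). */
  interface SupportsReportPartitioning {
    Partitioning outputPartitioning();
  }

  /** Hypothetical layout: rows are placed by the values of partitionColumns. */
  static final class KeyedPartitioning implements Partitioning {
    private final int numPartitions;
    private final List<String> partitionColumns;

    KeyedPartitioning(int numPartitions, String... partitionColumns) {
      this.numPartitions = numPartitions;
      this.partitionColumns = Arrays.asList(partitionColumns);
    }

    @Override
    public int numPartitions() {
      return numPartitions;
    }

    @Override
    public boolean satisfy(Distribution distribution) {
      if (!(distribution instanceof ClusteredDistribution) || partitionColumns.isEmpty()) {
        // Unknown requirement or no keys: be conservative and answer "not satisfied".
        return false;
      }
      Set<String> required =
          new HashSet<>(Arrays.asList(((ClusteredDistribution) distribution).clusteredColumns));
      // Rows that agree on the clustering columns C also agree on any subset P of C,
      // so partitioning by P (with P a subset of C) already co-locates them.
      return required.containsAll(partitionColumns);
    }
  }

  /** Simplified Spark-side check: shuffle unless the reported partitioning suffices. */
  static boolean needsShuffle(Object reader, Distribution required) {
    if (reader instanceof SupportsReportPartitioning) {
      return !((SupportsReportPartitioning) reader).outputPartitioning().satisfy(required);
    }
    return true; // nothing reported, so an arbitrary layout must be assumed
  }

  public static void main(String[] args) {
    SupportsReportPartitioning reader = () -> new KeyedPartitioning(2, "a");
    System.out.println(needsShuffle(reader, new ClusteredDistribution("a", "b"))); // false
    System.out.println(needsShuffle(reader, new ClusteredDistribution("b")));      // true
    System.out.println(needsShuffle(new Object(), new ClusteredDistribution("a"))); // true
  }
}
```

In this sketch the data source never sees Spark's internal partitioning classes; it only answers yes or no for each required `Distribution`. Answering `false` is the conservative choice, since the caller then falls back to a shuffle.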

## How was this patch tested?

New test in `DataSourceV2Suite`, backed by the new `JavaPartitionAwareDataSource` test source.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20201 from cloud-fan/partition-reporting.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java [new file with mode: 0644]
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Distribution.java [new file with mode: 0644]
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Partitioning.java [new file with mode: 0644]
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsReportPartitioning.java [new file with mode: 0644]
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourcePartitioning.scala [new file with mode: 0644]
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala
sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaPartitionAwareDataSource.java [new file with mode: 0644]
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala