[SPARK-18663][SQL] Simplify CountMinSketch aggregate implementation
author Reynold Xin <rxin@databricks.com>
Fri, 2 Dec 2016 05:38:52 +0000 (21:38 -0800)
committer Reynold Xin <rxin@databricks.com>
Fri, 2 Dec 2016 05:38:52 +0000 (21:38 -0800)
commit d3c90b74edecc527ee468bead41d1cca0b667668
tree 1b64571522c38155e472e0da58dac55907a22225
parent a5f02b00291e0a22429a3dca81f12cf6d38fea0b

## What changes were proposed in this pull request?
SPARK-18429 introduced a count-min sketch aggregate function for SQL, but its implementation and tests are more complicated than needed. This patch simplifies the test cases and removes support for data types that don't have clear equality semantics:

1. Removed support for floating point and decimal types.

2. Removed the heavy randomized tests. The underlying CountMinSketch implementation already had good coverage from its own randomized tests, and SPARK-18429 only adds an aggregate function wrapper around CountMinSketch. There is no need for randomized tests at three different levels of the implementation.
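
The description does not spell out what "clear equality semantics" means. One plausible reading, offered here as an assumption rather than the author's stated rationale, is that floating point and decimal values can be "equal" under one comparison and "different" under another, so a sketch keyed on such values has no single obvious notion of "the same item". The class and method names below are hypothetical and use only the Java standard library, not Spark's CountMinSketch API:

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not part of the Spark code base.
class EqualitySemanticsDemo {
    // Numeric comparison: IEEE 754 treats 0.0 and -0.0 as equal, and NaN as unequal to itself.
    static boolean numericEquals(double a, double b) {
        return a == b;
    }

    // Bit-level comparison: roughly what a hash over the raw representation would see.
    static boolean bitwiseEquals(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    // BigDecimal.equals compares both value and scale.
    static boolean decimalEquals(String a, String b) {
        return new BigDecimal(a).equals(new BigDecimal(b));
    }

    // BigDecimal.compareTo compares numeric value only.
    static boolean decimalSameValue(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(numericEquals(0.0, -0.0));               // true
        System.out.println(bitwiseEquals(0.0, -0.0));               // false
        System.out.println(numericEquals(Double.NaN, Double.NaN));  // false
        System.out.println(bitwiseEquals(Double.NaN, Double.NaN));  // true
        System.out.println(decimalEquals("1.0", "1.00"));           // false
        System.out.println(decimalSameValue("1.0", "1.00"));        // true
    }
}
```

Integral, string, and binary keys do not have this ambiguity, which is consistent with dropping only the floating point and decimal types.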

## How was this patch tested?
Much of the change consists of simplifying test cases.

Author: Reynold Xin <rxin@databricks.com>

Closes #16093 from rxin/SPARK-18663.
common/sketch/src/main/java/org/apache/spark/util/sketch/CountMinSketch.java
common/sketch/src/main/java/org/apache/spark/util/sketch/CountMinSketchImpl.java
common/sketch/src/test/scala/org/apache/spark/util/sketch/CountMinSketchSuite.scala
project/MimaExcludes.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountMinSketchAgg.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentileSuite.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountMinSketchAggSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
sql/core/src/test/scala/org/apache/spark/sql/CountMinSketchAggQuerySuite.scala