[SPARK-24250][SQL] support accessing SQLConf inside tasks
author Wenchen Fan <wenchen@databricks.com>
Sat, 19 May 2018 10:51:02 +0000 (18:51 +0800)
committer Wenchen Fan <wenchen@databricks.com>
Sat, 19 May 2018 10:51:02 +0000 (18:51 +0800)
commit dd37529a8dada6ed8a49b8ce50875268f6a20cba
tree 450a2b1392045a477173d7ee1738a9fe742c0b6f
parent 434d74e337465d77fa49ab65e2b5461e5ff7b5c7

## What changes were proposed in this pull request?

Previously, in #20136, we decided to forbid tasks from accessing `SQLConf`, because it did not work on the executor side and always returned the default conf value. In #21190 we fixed the check and all the places that violated it.

Currently, the pattern for accessing configs on the executor side is to read the configs on the driver side and then reference the variables holding the config values inside the RDD closure, so that they are serialized and shipped to the executors. For example:
```
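// Driver side: read the config value once (getXXX stands for any typed getter).
val someConf = conf.getXXX
child.execute().mapPartitions { iter =>
  // Executor side: someConf was captured by the closure and serialized with it.
  if (someConf == ...) ...
  ...
}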
```

However, this pattern is hard to apply when the config has to be propagated through a long call stack. `DataType.sameType` is one example: see how many changes were needed in #21190.
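To make the call-stack problem concrete, here is a minimal sketch in plain Scala. The tiny type model (`Ty`, `Field`, `Struct`, `IntTy`), the `Compare` object, and the `caseSensitive` flag are all hypothetical and are not taken from Spark's `DataType` code. They only illustrate how a driver-side value has to be added to every signature along a recursive call path.

```scala
// Hypothetical miniature type model, for illustration only (not Spark's DataType).
sealed trait Ty
case object IntTy extends Ty
case class Field(name: String, ty: Ty)
case class Struct(fields: List[Field]) extends Ty

object Compare {
  // The flag is read once by the caller and must then be forwarded
  // explicitly through every level of the recursion.
  def sameType(a: Ty, b: Ty, caseSensitive: Boolean): Boolean = (a, b) match {
    case (Struct(fa), Struct(fb)) =>
      fa.length == fb.length &&
        fa.zip(fb).forall { case (x, y) => sameField(x, y, caseSensitive) }
    case _ => a == b
  }

  // A helper deeper in the stack needs the same extra parameter.
  private def sameField(x: Field, y: Field, caseSensitive: Boolean): Boolean = {
    val namesMatch =
      if (caseSensitive) x.name == y.name else x.name.equalsIgnoreCase(y.name)
    namesMatch && sameType(x.ty, y.ty, caseSensitive)
  }
}
```

Every new helper on the path gains one more parameter, and every caller has to be updated to supply it, which is the cost this PR tries to avoid.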

Code generation is even worse. I tried it locally, and propagating configs to the code generators would require changing a large number of files.

This PR proposes to allow tasks to access `SQLConf`. The idea is to save all the SQL configs to the job properties when a SQL execution is triggered. On the executor side, we rebuild the `SQLConf` from the job properties.
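The idea can be sketched without Spark as a round trip through `java.util.Properties`. Everything below is a simplified assumption for illustration: `DriverConf`, `ReadOnlyConf`, `ConfPropagation`, and the `spark.sql.` key prefix are hypothetical names, not the classes or filtering rules this patch actually adds.

```scala
import java.util.Properties

// Hypothetical stand-in for the driver-side, mutable session conf.
final class DriverConf {
  private val settings = scala.collection.mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = { settings.update(key, value) }
  def getAll: Map[String, String] = settings.toMap
}

// Hypothetical read-only view that a task rebuilds from the job properties.
final class ReadOnlyConf(props: Properties) {
  def get(key: String, default: String): String = {
    val value = props.getProperty(key)
    if (value == null) default else value
  }
}

object ConfPropagation {
  // Driver side: copy the SQL configs into the properties that travel with the job.
  // The prefix filter is an assumption made for this sketch.
  def toJobProperties(conf: DriverConf, prefix: String = "spark.sql."): Properties = {
    val props = new Properties()
    conf.getAll.foreach { case (k, v) =>
      if (k.startsWith(prefix)) props.setProperty(k, v)
    }
    props
  }

  // Executor side: rebuild a conf from whatever properties arrived with the task.
  def rebuild(props: Properties): ReadOnlyConf = new ReadOnlyConf(props)
}
```

With this shape, code deep in a call stack can look a value up from the rebuilt conf instead of receiving it as a parameter, and keys that were never set on the driver fall back to the supplied default.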

## How was this patch tested?

A new test suite, `ExecutorSideSQLConfSuite`.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #21299 from cloud-fan/config.
core/src/main/scala/org/apache/spark/TaskContextImpl.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/ReadOnlySQLConf.scala [new file with mode: 0644]
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala [new file with mode: 0644]