iceberg.git
8 hours ago Ensure the default value of hive.in.test to avoid overwriting (#5844) [master]
Liang-Chi Hsieh [Tue, 27 Sep 2022 22:00:22 +0000 (15:00 -0700)] 
Ensure the default value of hive.in.test to avoid overwriting (#5844)

12 hours ago Docs: Update output of expire_snapshots procedure (#5866)
Kunni [Tue, 27 Sep 2022 17:14:14 +0000 (01:14 +0800)] 
Docs: Update output of expire_snapshots procedure (#5866)

12 hours ago Docs: Make it clear metadata tables support time travel in Spark (#4709)
Mingliang Liu [Tue, 27 Sep 2022 17:11:19 +0000 (10:11 -0700)] 
Docs: Make it clear metadata tables support time travel in Spark (#4709)

14 hours ago Spark 3.2, 3.3: Fix nullability propagation for MergeRows node (#5679)
Prashant Singh [Tue, 27 Sep 2022 15:50:31 +0000 (21:20 +0530)] 
Spark 3.2, 3.3: Fix nullability propagation for MergeRows node (#5679)

18 hours ago Core: Ignore TestManifestCaching#testWeakFileIOReferenceCleanUp (#5865)
Eduard Tudenhöfner [Tue, 27 Sep 2022 11:31:20 +0000 (13:31 +0200)] 
Core: Ignore TestManifestCaching#testWeakFileIOReferenceCleanUp (#5865)

A flaky test that we'll disable for now.

Issue opened https://github.com/apache/iceberg/issues/5861

37 hours ago API: Support setting table statistics (#5794)
Piotr Findeisen [Mon, 26 Sep 2022 16:36:04 +0000 (18:36 +0200)] 
API: Support setting table statistics (#5794)

Implements `Transaction.updateStatistics` API.

37 hours ago core: Provide mechanism to cache manifest file content (#4518)
rizaon [Mon, 26 Sep 2022 16:10:27 +0000 (23:10 +0700)] 
core: Provide mechanism to cache manifest file content (#4518)

* Core: Add CONTENT_CACHES in ManifestFiles.java

* Fix kryo serialization failure for HadoopFileIO

* Add DEBUG log if ContentCache is not created

* Rename properties related to manifest caching

* Fix small string mistakes in testWeakFileIOReferenceCleanUp

* Clarify config documentation and change access modifier

* Fix checkstyle and catch UnsupportedOperationException

37 hours ago Core: Add strict-mode property to JDBC Catalog (#5830)
Eduard Tudenhöfner [Mon, 26 Sep 2022 16:05:13 +0000 (18:05 +0200)] 
Core: Add strict-mode property to JDBC Catalog (#5830)

3 days ago AWS: Add socket connection timeout for Apache Http Builder (#5787)
Rushan Jiang [Sat, 24 Sep 2022 23:12:01 +0000 (19:12 -0400)] 
AWS: Add socket connection timeout for Apache Http Builder (#5787)

4 days ago Python: Remove version modifier (#5835)
Fokko Driesprong [Sat, 24 Sep 2022 05:45:15 +0000 (07:45 +0200)] 
Python: Remove version modifier (#5835)

4 days ago Build: Bump Rat to 0.15 (#5839)
Fokko Driesprong [Sat, 24 Sep 2022 00:14:59 +0000 (02:14 +0200)] 
Build: Bump Rat to 0.15 (#5839)

4 days ago Python: Bump pre-commit (#5842)
Fokko Driesprong [Sat, 24 Sep 2022 00:14:33 +0000 (02:14 +0200)] 
Python: Bump pre-commit (#5842)

4 days ago API, Core: Remove deprecated methods from Snapshot API (#5734)
Eduard Tudenhöfner [Fri, 23 Sep 2022 21:58:11 +0000 (23:58 +0200)] 
API, Core: Remove deprecated methods from Snapshot API (#5734)

This also refactors the Snapshot API to track v1 manifest locations and lazily load them when needed, thus removing the need for `FileIO` in the different parsers

4 days ago Python: Add license checker to the source distribution (#5840)
Fokko Driesprong [Fri, 23 Sep 2022 21:49:46 +0000 (23:49 +0200)] 
Python: Add license checker to the source distribution (#5840)

* Python: Add license checker to the source distribution

* Missing comma

4 days ago Python: Add NOTICE to source dist and wheel (#5843)
Fokko Driesprong [Fri, 23 Sep 2022 18:30:28 +0000 (20:30 +0200)] 
Python: Add NOTICE to source dist and wheel (#5843)

```
➜  python git:(master) ✗ poetry build
Building pyiceberg (0.1.0.dev0)
  - Building sdist
  - Built pyiceberg-0.1.0.dev0.tar.gz
  - Building wheel
  - Built pyiceberg-0.1.0.dev0-py3-none-any.whl
➜  python git:(master) ✗ cd dist
➜  dist git:(master) ✗ tar -xf pyiceberg-0.1.0.tar.gz
tar: Error opening archive: Failed to open 'pyiceberg-0.1.0.tar.gz'
➜  dist git:(master) ✗ tar -xf pyiceberg-0.1.0.dev0.tar.gz
➜  dist git:(master) ✗ cd pyiceberg-0.1.0.dev0
➜  pyiceberg-0.1.0.dev0 git:(master) ✗ ls -lah | grep -i notice
-rw-r--r--   1 fokkodriesprong  staff   251B Jun 14 21:52 NOTICE
```

4 days ago AWS: update AWS Integration Test to fix false positives (#5784)
Rushan Jiang [Fri, 23 Sep 2022 16:41:48 +0000 (12:41 -0400)] 
AWS: update AWS Integration Test to fix false positives (#5784)

4 days ago Python: Add the Makefile to the source distribution (#5838)
Fokko Driesprong [Fri, 23 Sep 2022 15:55:41 +0000 (17:55 +0200)] 
Python: Add the Makefile to the source distribution (#5838)

This makes testing easier.

```
➜  python git:(fd-add-makefile-to-sdist) ✗ poetry build
Building pyiceberg (0.1.0.dev0)
  - Building sdist
  - Built pyiceberg-0.1.0.dev0.tar.gz
  - Building wheel
  - Built pyiceberg-0.1.0.dev0-py3-none-any.whl
➜  python git:(fd-add-makefile-to-sdist) ✗ cd dist
➜  dist git:(fd-add-makefile-to-sdist) ✗ tar -xf pyiceberg-0.1.0.dev0.tar.gz
➜  dist git:(fd-add-makefile-to-sdist) ✗ cd pyiceberg-0.1.0.dev0
➜  pyiceberg-0.1.0.dev0 git:(fd-add-makefile-to-sdist) ✗ find . | grep Makefile
./Makefile
➜  pyiceberg-0.1.0.dev0 git:(fd-add-makefile-to-sdist) ✗ head Makefile
```

5 days ago Add REST Servlet/Server Implementations (#5781)
Daniel Weeks [Fri, 23 Sep 2022 02:54:08 +0000 (19:54 -0700)] 
Add REST Servlet/Server Implementations (#5781)

* Add REST servlet implementation

* Update accessors

* Add util for form decoding and tests

* Update test dependency

* Spotless

5 days ago REST: implement handling of OAuth error responses followup (#5820)
Bryan Keller [Fri, 23 Sep 2022 02:51:43 +0000 (19:51 -0700)] 
REST: implement handling of OAuth error responses followup (#5820)

* WIP error handling for OAuth

* cleanup

* tests

* handle non-oauth errors in oauth

* add comment

* allow null fields

* more tests

* more cleanup

* remove unneeded precondition checks

* Fix test

* use assert4j

* Keep client API the same

* Keep ErrorResponse the same

* PR feedback

* handle invalid error response body format

5 days ago Python: Test if version is PEP440 compliant (#5834)
Fokko Driesprong [Thu, 22 Sep 2022 23:35:15 +0000 (01:35 +0200)] 
Python: Test if version is PEP440 compliant (#5834)

5 days ago Python: Add BoundBooleanExpressionVisitor for bound expressions (#5303)
Samuel Redai [Thu, 22 Sep 2022 23:33:30 +0000 (19:33 -0400)] 
Python: Add BoundBooleanExpressionVisitor for bound expressions (#5303)

Co-authored-by: Fokko Driesprong <fokko@apache.org>

5 days ago Core: Do not copy stats because a scan filter is true (#5815)
Manu Zhang [Thu, 22 Sep 2022 21:21:59 +0000 (05:21 +0800)] 
Core: Do not copy stats because a scan filter is true (#5815)

5 days ago Build: Apply spotless on integration modules as well (#5827)
Eduard Tudenhöfner [Thu, 22 Sep 2022 16:32:09 +0000 (18:32 +0200)] 
Build: Apply spotless on integration modules as well (#5827)

This was unintentionally forgotten. Usually spotless is able to infer
all Java files from the source sets. However, the way we defined the
source sets doesn't seem to work with spotless, so
we were defining the relevant modules via `target`.

5 days ago Python: Include the tests in source distribution (#5829)
Fokko Driesprong [Thu, 22 Sep 2022 15:09:43 +0000 (17:09 +0200)] 
Python: Include the tests in source distribution (#5829)

Adds the tests to the source distribution so we
can run them when validating a release.

5 days ago Build: Add the path to the Action yaml (#5828)
Fokko Driesprong [Thu, 22 Sep 2022 15:09:26 +0000 (17:09 +0200)] 
Build: Add the path to the Action yaml (#5828)

The path was invalid, but after fixing it, the Action did
not run because the Action yaml isn't part of the paths. I think it would
be good to add this as well 👍

5 days ago Build: Fix Python mkdocs path (#5821)
Fokko Driesprong [Thu, 22 Sep 2022 07:43:24 +0000 (09:43 +0200)] 
Build: Fix Python mkdocs path (#5821)

5 days ago Core: Use JsonUtil.generate in ErrorResponseParser (#5816)
Eduard Tudenhöfner [Thu, 22 Sep 2022 07:39:34 +0000 (09:39 +0200)] 
Core: Use JsonUtil.generate in ErrorResponseParser (#5816)

This was fixed by #5698, but that change was then reverted by #5810.

6 days ago Github: Update issue template with latest release (#5818)
Fokko Driesprong [Wed, 21 Sep 2022 20:02:23 +0000 (22:02 +0200)] 
Github: Update issue template with latest release (#5818)

6 days ago Python: Split Python docs (#5727)
Fokko Driesprong [Wed, 21 Sep 2022 19:30:52 +0000 (21:30 +0200)] 
Python: Split Python docs (#5727)

* Python: Split Python docs

This PR will split the Python docs into a separate site. The main reason
for this is that the docs are part of the Java release, which is not in
sync with the Python release cycle. This means there is a high probability
that the docs do not match the current version of the code.

This will publish the docs to Github pages, by pushing this to the `gh-pages`
branch. We can set up an alias from Apache, and point pyiceberg.apache.org to
the github pages endpoint.

I also tried readthedocs, but I found it not straightforward. Mostly because
they have a build process on their end that will pull the code, and build the
docs. This involves another pipeline that we have to monitor, and we have to
set up webhooks. I am a simple man, and I like simple things, therefore I went
for mkdocs. This can push the docs to github pages in a single command:
https://www.mkdocs.org/user-guide/deploying-your-docs/#project-pages

Considerations:

- Decided to keep it to a single page for now, we can break it out into different
  pages later on. Let me know what you think of this.
- We build the docs now when we push to master, probably we'll change this
  later to trigger on tags.
- I've removed the Python docs from the other docs to avoid confusion and make sure
  that we have a single source of truth.

An example is shown here: https://fokko.github.io/incubator-iceberg/
(Once this is merged, I'll remove that one)

Closes #363
Closes #3283

* Comments

6 days ago API,Core: Add scan planning metrics for skipped data/delete files (#5788)
Eduard Tudenhöfner [Wed, 21 Sep 2022 08:56:20 +0000 (10:56 +0200)] 
API,Core: Add scan planning metrics for skipped data/delete files (#5788)

6 days ago Core: Reduce duplicated code in JSON Parsers (#5802)
Eduard Tudenhöfner [Wed, 21 Sep 2022 07:23:15 +0000 (09:23 +0200)] 
Core: Reduce duplicated code in JSON Parsers (#5802)

7 days ago Core: Serialize statistics files in TableMetadata (#5799)
Piotr Findeisen [Wed, 21 Sep 2022 00:00:50 +0000 (02:00 +0200)] 
Core: Serialize statistics files in TableMetadata (#5799)

7 days ago Python: Fix docstring (#5804)
Fokko Driesprong [Tue, 20 Sep 2022 23:58:24 +0000 (01:58 +0200)] 
Python: Fix docstring (#5804)

7 days ago API: Remove unneeded class variable (#5805)
Fokko Driesprong [Tue, 20 Sep 2022 23:57:39 +0000 (01:57 +0200)] 
API: Remove unneeded class variable (#5805)

7 days ago Python: Remove duplicate dispatch types (#5808)
Fokko Driesprong [Tue, 20 Sep 2022 23:54:39 +0000 (01:54 +0200)] 
Python: Remove duplicate dispatch types (#5808)

7 days ago API/Core: Make ScanReport and its related classes Immutable (#5780)
Eduard Tudenhöfner [Tue, 20 Sep 2022 23:53:24 +0000 (01:53 +0200)] 
API/Core: Make ScanReport and its related classes Immutable (#5780)

* Use true immutable objects that are type-safe, thread-safe, null-safe
* Get builder classes for free

This is relying on https://immutables.github.io/ (Apache License 2.0), which allows generating immutable objects and builders via annotation processing.
* Immutable objects are serialization ready (including JSON and its binary forms)
* Supports lazy, derived and optional attributes
* Immutable objects are constructed once, in a consistent state, and can be safely shared
  * Will fail if mandatory attributes are missing
  * Cannot be sneakily modified when passed to other code
* Immutable objects are naturally thread-safe and can therefore be safely shared among threads
  * No excessive copying
  * No excessive synchronization
* Object definitions are pleasant to write and read
  * No boilerplate setter and getters
  * No ugly IDE-generated hashCode, equals and toString methods that end up being stored in source control.

Note that we are specifically preventing people from using Jackson-related annotations (`@JsonSerialize` & `@JsonDeserialize`) in order to avoid potential runtime classpath dependency issues where annotations can be missing and lead to different behavior.
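The properties listed above (fail on missing mandatory attributes, no modification after construction, generated equals/hashCode/toString) can be illustrated with a loose Python analogy. This is a sketch using the standard library's frozen dataclasses, not the Immutables annotation processor that the commit relies on; `ScanSummary` and its fields are hypothetical names, not classes from the Iceberg codebase.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ScanSummary:
    """Hypothetical value object used only for this illustration."""

    table_name: str
    snapshot_id: int


def try_modify(summary: ScanSummary) -> bool:
    """Return True if the object rejected an attempted modification."""
    try:
        summary.snapshot_id = 2  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False


def try_incomplete() -> bool:
    """Return True if construction fails when a mandatory field is missing."""
    try:
        ScanSummary("db.tbl")  # type: ignore[call-arg]
    except TypeError:
        return True
    return False
```

Equality, hashing, and the string representation are generated from the field definitions, so none of that boilerplate has to be written or kept in source control.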

7 days ago Flink 1.14&1.15 backport: Set custom Hadoop configuration (#5775)
Kunni [Tue, 20 Sep 2022 20:37:12 +0000 (04:37 +0800)] 
Flink 1.14&1.15 backport: Set custom Hadoop configuration (#5775)

7 days ago Revert "REST: implement handling of OAuth error responses (#5698)" (#5810)
Daniel Weeks [Tue, 20 Sep 2022 19:50:35 +0000 (12:50 -0700)] 
Revert "REST: implement handling of OAuth error responses (#5698)" (#5810)

This reverts commit c293af2f1d962f44bb5b67a20ef3c67bd5823a38.

7 days ago Python: Handle optional Avro fields in conversion. (#5796)
Joshua Robinson [Tue, 20 Sep 2022 18:07:36 +0000 (20:07 +0200)] 
Python: Handle optional Avro fields in conversion. (#5796)

Found by processing the fields of a manifest entry with an empty split_offsets field.

For pos_to_dict, check if values is None before processing as list,
struct, or dict. Added unit tests to verify.

Thanks to @fokko for the fix.
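
The None check described above can be sketched roughly as follows. This is a minimal illustration, not the actual pyiceberg code: `convert_optional` and the string type tags are hypothetical names chosen for this example.

```python
from typing import Any, Optional


def convert_optional(value: Optional[Any], kind: str) -> Optional[Any]:
    """Convert a decoded value, passing None through unchanged.

    `kind` is a hypothetical tag ("list", "struct" or "dict") standing in
    for the real schema type; it is an assumption of this sketch.
    """
    # Optional fields can decode to None, so return early before
    # treating the value as a container.
    if value is None:
        return None
    if kind == "list":
        return list(value)
    if kind == "struct":
        # Represent a struct as a position -> value mapping.
        return dict(enumerate(value))
    if kind == "dict":
        return dict(value)
    return value
```

Without the early return, the container branches would fail on None with a TypeError.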

8 days ago Python: Fine-tune the API (#5672)
Fokko Driesprong [Tue, 20 Sep 2022 02:11:08 +0000 (04:11 +0200)] 
Python: Fine-tune the API (#5672)

8 days ago AWS: Allow users to set the assume role session name (#5765)
Rushan Jiang [Tue, 20 Sep 2022 01:53:16 +0000 (21:53 -0400)] 
AWS: Allow users to set the assume role session name (#5765)

8 days ago REST: implement handling of OAuth error responses (#5698)
Bryan Keller [Mon, 19 Sep 2022 22:56:08 +0000 (15:56 -0700)] 
REST: implement handling of OAuth error responses (#5698)

* WIP error handling for OAuth

* cleanup

* tests

* handle non-oauth errors in oauth

* add comment

* allow null fields

* more tests

* more cleanup

* remove unneeded precondition checks

* Fix test

* use assert4j

8 days ago Python: PyArrow support for S3/S3A with properties (#5747)
Joshua Robinson [Mon, 19 Sep 2022 13:36:54 +0000 (15:36 +0200)] 
Python: PyArrow support for S3/S3A with properties (#5747)

8 days ago Python: Remove the pre-validators (#5686)
Fokko Driesprong [Mon, 19 Sep 2022 06:39:22 +0000 (08:39 +0200)] 
Python: Remove the pre-validators (#5686)

I would like to remove the pre-validators because they are confusing.

Mostly because in the pre-validators the defaults and aliases aren't
applied, so we have to check all the cases. Removing those requires
setting defaults, and checking them afterward.
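
A rough sketch of the difference, in plain Python rather than the validation library used by the project: a check that runs before aliases and defaults are applied has to handle every raw spelling and the missing case, while a check that runs afterward sees one normalized shape. The field name, alias, default, and allowed values below are hypothetical.

```python
from typing import Any, Dict

# Hypothetical alias and default, chosen only for this illustration.
ALIASES = {"format-version": "format_version"}
DEFAULTS = {"format_version": 1}


def normalize(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply aliases first, then fill in defaults for missing keys."""
    renamed = {ALIASES.get(key, key): value for key, value in raw.items()}
    return {**DEFAULTS, **renamed}


def check_before(raw: Dict[str, Any]) -> None:
    """Check on raw input: must cover the alias, the plain name, and absence."""
    if "format-version" in raw:
        version = raw["format-version"]
    elif "format_version" in raw:
        version = raw["format_version"]
    else:
        version = DEFAULTS["format_version"]
    if version not in (1, 2):
        raise ValueError(f"Unsupported format version: {version}")


def check_after(data: Dict[str, Any]) -> None:
    """Check on normalized input: only one shape to handle."""
    version = data["format_version"]
    if version not in (1, 2):
        raise ValueError(f"Unsupported format version: {version}")
```

Here `check_after(normalize(raw))` covers the same cases as `check_before(raw)` with a single branch.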

8 days ago Python: Add docker-compose for s3 tests (#5750)
Fokko Driesprong [Mon, 19 Sep 2022 06:38:19 +0000 (08:38 +0200)] 
Python: Add docker-compose for s3 tests (#5750)

This PR adds a docker-compose.yml with the right configuration to
run the s3 tests

8 days ago Python: Set the Iceberg Spec version (#5766)
Fokko Driesprong [Mon, 19 Sep 2022 06:37:20 +0000 (08:37 +0200)] 
Python: Set the Iceberg Spec version (#5766)

We pass in the version of the response that we expect for a certain
version. If we change anything in the spec in the future, we can
maintain backward compatibility until the version is bumped.

9 days ago Build: Bump spotless-plugin-gradle from 6.10.0 to 6.11.0 (#5786)
dependabot[bot] [Sun, 18 Sep 2022 17:29:39 +0000 (19:29 +0200)] 
Build: Bump spotless-plugin-gradle from 6.10.0 to 6.11.0 (#5786)

Bumps [spotless-plugin-gradle](https://github.com/diffplug/spotless) from 6.10.0 to 6.11.0.
- [Release notes](https://github.com/diffplug/spotless/releases)
- [Changelog](https://github.com/diffplug/spotless/blob/main/CHANGES.md)
- [Commits](https://github.com/diffplug/spotless/compare/gradle/6.10.0...gradle/6.11.0)

---
updated-dependencies:
- dependency-name: com.diffplug.spotless:spotless-plugin-gradle
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

9 days ago Build: Bump actions/stale from 5.1.1 to 5.2.0 (#5785)
dependabot[bot] [Sun, 18 Sep 2022 17:29:04 +0000 (19:29 +0200)] 
Build: Bump actions/stale from 5.1.1 to 5.2.0 (#5785)

Bumps [actions/stale](https://github.com/actions/stale) from 5.1.1 to 5.2.0.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/stale/compare/v5.1.1...v5.2.0)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

10 days ago AWS: Refactor util methods for applying AWS clients configurations (#5684)
Rushan Jiang [Sun, 18 Sep 2022 00:32:26 +0000 (20:32 -0400)] 
AWS: Refactor util methods for applying AWS clients configurations (#5684)

11 days ago Python: Bump fsspec and s3fs to 2022.8.2 (#5757)
Fokko Driesprong [Fri, 16 Sep 2022 17:09:25 +0000 (10:09 -0700)] 
Python: Bump fsspec and s3fs to 2022.8.2 (#5757)

11 days ago Build: Relocate httpclient5 dependency for runtime jars (#5761)
Ajantha Bhat [Fri, 16 Sep 2022 17:00:47 +0000 (22:30 +0530)] 
Build: Relocate httpclient5 dependency for runtime jars (#5761)

12 days ago Python: Add CLI command to list files (#5690)
Fokko Driesprong [Fri, 16 Sep 2022 03:21:22 +0000 (20:21 -0700)] 
Python: Add CLI command to list files (#5690)

This makes it easy to check the FileIO:

```
> pyiceberg files nyc.taxis
Snapshots: nyc.taxis
└── Snapshot 5937117119577207079, schema 0: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro
    └── Manifest: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/metadata/94656c4f-4c66-4600-a4ca-f30377300527-m0.avro
        └── Datafile: file:/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/nyc.db/taxis/data/00003-4-a245d9ee-8462-4a08-8cbc-26b8b33b9377-00001.parquet
```

12 days ago Spark: Exclude Scala library files from JAR (#5754)
Ajantha Bhat [Thu, 15 Sep 2022 18:10:42 +0000 (23:40 +0530)] 
Spark: Exclude Scala library files from JAR (#5754)

* Spark: Fix runtime jars packaging scala library files

* apply review comments

12 days ago JdbcCatalog don't override namespace location if set (#5737)
Daniel Weeks [Thu, 15 Sep 2022 15:47:40 +0000 (08:47 -0700)] 
JdbcCatalog don't override namespace location if set (#5737)

* JdbcCatalog don't override namespace location if set

* Add namespace location tests

* Spotless

12 days ago Build: Fix names and jobs (#5749)
Fokko Driesprong [Thu, 15 Sep 2022 13:42:15 +0000 (06:42 -0700)] 
Build: Fix names and jobs (#5749)

13 days ago Build - Move global Spark 2.4 dependency in version.props to Spark 2.4 subproject...
Kyle Bendickson [Thu, 15 Sep 2022 05:27:41 +0000 (22:27 -0700)] 
Build - Move global Spark 2.4 dependency in version.props to Spark 2.4 subproject (#5759)

13 days ago Build - Remove unused global flink dependency from versions.props (#5758)
Kyle Bendickson [Thu, 15 Sep 2022 05:24:43 +0000 (22:24 -0700)] 
Build - Remove unused global flink dependency from versions.props (#5758)

13 days ago AWS: Preload S3 client in GlueCatalog For LakeFormation enabled tables (#5756)
Xiaoxuan [Wed, 14 Sep 2022 20:23:10 +0000 (13:23 -0700)] 
AWS: Preload S3 client in GlueCatalog For LakeFormation enabled tables (#5756)

13 days ago Python: Make Get Properties CLI options consistent. (#5736)
Dhruv Pratap [Wed, 14 Sep 2022 15:08:37 +0000 (11:08 -0400)] 
Python: Make Get Properties CLI options consistent. (#5736)

Consistent with Set and Remove CLI options

* Python: Make Get Properties CLI options consistent with Set and Remove CLI Options
* Python: Explicitly specify command names in the annotations and not infer it from the method name to fix F811 lint issues of duplicate method declaration.

2 weeks ago API: Use hashCode instead of hash (#5751)
Fokko Driesprong [Tue, 13 Sep 2022 21:05:20 +0000 (14:05 -0700)] 
API: Use hashCode instead of hash (#5751)

Noticed this in the output, and decided to fix it right away:
```
/Users/fokkodriesprong/Desktop/iceberg/api/src/main/java/org/apache/iceberg/transforms/UnknownTransform.java:89: warning: [ObjectsHashCodeUnnecessaryVarargs] java.util.Objects.hash(non-varargs) should be
replaced with java.util.Objects.hashCode(value) to avoid unnecessary varargs array allocations.
    return Objects.hash(transform);
                       ^
```

2 weeks ago Python: Bump pydantic from 1.10.1 to 1.10.2 (#5744)
dependabot[bot] [Tue, 13 Sep 2022 16:53:04 +0000 (09:53 -0700)] 
Python: Bump pydantic from 1.10.1 to 1.10.2 (#5744)

Bumps [pydantic](https://github.com/pydantic/pydantic) from 1.10.1 to 1.10.2.
- [Release notes](https://github.com/pydantic/pydantic/releases)
- [Changelog](https://github.com/pydantic/pydantic/blob/main/HISTORY.md)
- [Commits](https://github.com/pydantic/pydantic/compare/v1.10.1...v1.10.2)

---
updated-dependencies:
- dependency-name: pydantic
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

2 weeks ago Build: Bump fastavro from 1.6.0 to 1.6.1 in /python (#5745)
dependabot[bot] [Tue, 13 Sep 2022 16:00:46 +0000 (09:00 -0700)] 
Build: Bump fastavro from 1.6.0 to 1.6.1 in /python (#5745)

Bumps [fastavro](https://github.com/fastavro/fastavro) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/fastavro/fastavro/releases)
- [Changelog](https://github.com/fastavro/fastavro/blob/master/ChangeLog)
- [Commits](https://github.com/fastavro/fastavro/compare/1.6.0...1.6.1)

---
updated-dependencies:
- dependency-name: fastavro
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

2 weeks ago Flink: Fixed an issue where Flink1.13 batch entry was not accurate (#5731)
xzw_deepnova [Fri, 9 Sep 2022 07:41:27 +0000 (15:41 +0800)] 
Flink: Fixed an issue where Flink1.13 batch entry was not accurate (#5731)

2 weeks ago Docs: Add snapshot references metadata table (#5725)
Rajarshi Sarkar [Thu, 8 Sep 2022 18:16:58 +0000 (23:46 +0530)] 
Docs: Add snapshot references metadata table (#5725)

2 weeks ago API: Add estimatedRowsCount to ScanTask (#5720)
Anton Okolnychyi [Thu, 8 Sep 2022 14:56:29 +0000 (16:56 +0200)] 
API: Add estimatedRowsCount to ScanTask (#5720)

2 weeks ago Flink: Fixed an issue where Flink1.14 batch entry was not accurate (#5716)
xzw_deepnova [Thu, 8 Sep 2022 14:00:11 +0000 (22:00 +0800)] 
Flink: Fixed an issue where Flink1.14 batch entry was not accurate (#5716)

2 weeks ago Nessie: Prevent accidental deletion of referenced files (#5718)
Ajantha Bhat [Thu, 8 Sep 2022 11:40:37 +0000 (17:10 +0530)] 
Nessie: Prevent accidental deletion of referenced files (#5718)

Files that are still referenced by other branches/tags

2 weeks ago Python:Fix FileIO fallback to pyarrow when s3fs not present. (#5717)
Joshua Robinson [Wed, 7 Sep 2022 10:27:03 +0000 (12:27 +0200)] 
Python:Fix FileIO fallback to pyarrow when s3fs not present. (#5717)

2 weeks ago Dell: Add document. (#4993)
Xia [Wed, 7 Sep 2022 06:03:28 +0000 (14:03 +0800)] 
Dell: Add document. (#4993)

3 weeks ago Flink: Fixed an issue where Flink batch entry was not accurate (#5642)
xzw_deepnova [Wed, 7 Sep 2022 04:43:08 +0000 (12:43 +0800)] 
Flink: Fixed an issue where Flink batch entry was not accurate (#5642)

3 weeks ago Python: Pass through the location for the FileIO (#5709)
Fokko Driesprong [Tue, 6 Sep 2022 14:00:28 +0000 (16:00 +0200)] 
Python: Pass through the location for the FileIO (#5709)

This way we can determine which FileIO to load.

3 weeks ago Build: Upgrade to Gradle 7.5.1 (#5278)
Christopher Lambert [Tue, 6 Sep 2022 02:11:43 +0000 (04:11 +0200)] 
Build: Upgrade to Gradle 7.5.1 (#5278)

See https://docs.gradle.org/7.5.1/release-notes.html

Upgraded by running the following command twice and manually
re-applying iceberg customizations:

./gradlew wrapper --gradle-version 7.5.1 --distribution-type bin

3 weeks ago Build: Enforce logging conventions with errorprone (#5528)
Christopher Lambert [Mon, 5 Sep 2022 17:38:12 +0000 (19:38 +0200)] 
Build: Enforce logging conventions with errorprone (#5528)

These errorprone checks are just warnings by default:
https://github.com/palantir/gradle-baseline/blob/4.0.0/baseline-error-prone/src/main/java/com/palantir/baseline/errorprone/Slf4jThrowable.java#L41
https://github.com/palantir/gradle-baseline/blob/4.0.0/baseline-error-prone/src/main/java/com/palantir/baseline/errorprone/LoggerEnclosingClass.java#L41
https://github.com/palantir/gradle-baseline/blob/4.0.0/baseline-error-prone/src/main/java/com/palantir/baseline/errorprone/PreferStaticLoggers.java#L43

Since the codebase has no violations we should increase the severity to
ERROR and enforce these conventions automatically going forward.

3 weeks ago Python: Remove fs_properties pass-through (#5689)
Samuel Redai [Mon, 5 Sep 2022 16:37:21 +0000 (12:37 -0400)] 
Python: Remove fs_properties pass-through (#5689)

3 weeks ago Build: Update ORC to 1.8.0 (#5699)
William Hyun [Mon, 5 Sep 2022 16:36:15 +0000 (09:36 -0700)] 
Build: Update ORC to 1.8.0 (#5699)

3 weeks ago Build: Bump jmh-gradle-plugin from 0.6.6 to 0.6.7 (#5700)
dependabot[bot] [Mon, 5 Sep 2022 16:34:05 +0000 (09:34 -0700)] 
Build: Bump jmh-gradle-plugin from 0.6.6 to 0.6.7 (#5700)

Bumps jmh-gradle-plugin from 0.6.6 to 0.6.7.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 weeks ago Build: Bump jackson-annotations from 2.13.3 to 2.13.4 (#5702)
dependabot[bot] [Mon, 5 Sep 2022 16:33:18 +0000 (09:33 -0700)] 
Build: Bump jackson-annotations from 2.13.3 to 2.13.4 (#5702)

Bumps [jackson-annotations](https://github.com/FasterXML/jackson) from 2.13.3 to 2.13.4.
- [Release notes](https://github.com/FasterXML/jackson/releases)
- [Commits](https://github.com/FasterXML/jackson/commits)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 weeks ago Build: Bump pytest from 7.1.2 to 7.1.3 in /python (#5703)
dependabot[bot] [Mon, 5 Sep 2022 16:32:32 +0000 (09:32 -0700)] 
Build: Bump pytest from 7.1.2 to 7.1.3 in /python (#5703)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.1.2 to 7.1.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.1.2...7.1.3)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

3 weeks ago Core: Avoid useless metadata retries (#5696)
Ryan Blue [Mon, 5 Sep 2022 16:32:01 +0000 (09:32 -0700)] 
Core: Avoid useless metadata retries (#5696)

3 weeks ago Build: Update Avro to 1.11.1 (#5483)
Eduard Tudenhöfner [Mon, 5 Sep 2022 16:26:29 +0000 (18:26 +0200)] 
Build: Update Avro to 1.11.1 (#5483)

3 weeks ago API, Core: Include Expression filter in ScanReport (#5705)
Eduard Tudenhöfner [Mon, 5 Sep 2022 16:11:12 +0000 (18:11 +0200)] 
API, Core: Include Expression filter in ScanReport (#5705)

3 weeks ago Flink: fix missing generic types for some IcebergSource$Builder methods (#5697)
Steven Zhen Wu [Sun, 4 Sep 2022 08:36:59 +0000 (01:36 -0700)] 
Flink: fix missing generic types for some IcebergSource$Builder methods (#5697)

3 weeks ago Spark: Add custom metric for number of deletes applied by a SparkScan (#4588)
Wing Yew Poon [Sat, 3 Sep 2022 06:07:35 +0000 (23:07 -0700)] 
Spark: Add custom metric for number of deletes applied by a SparkScan (#4588)

3 weeks ago API: Remove source type from Transform (#5601)
Ryan Blue [Fri, 2 Sep 2022 21:00:15 +0000 (14:00 -0700)] 
API: Remove source type from Transform (#5601)

3 weeks ago Core: Add CommitStateUnknownException handling to REST (#5694)
Ryan Blue [Fri, 2 Sep 2022 19:50:39 +0000 (12:50 -0700)] 
Core: Add CommitStateUnknownException handling to REST (#5694)

3 weeks ago Docs: Update partitions metadata table (#5662)
lvyanquan [Fri, 2 Sep 2022 17:34:18 +0000 (01:34 +0800)] 
Docs: Update partitions metadata table (#5662)

3 weeks ago Python: Fix issues with optional dependencies (#5687)
Fokko Driesprong [Fri, 2 Sep 2022 15:26:57 +0000 (17:26 +0200)] 
Python: Fix issues with optional dependencies (#5687)

3 weeks ago Docs: Show most recent AWS SDK version (#5661)
Prashant Singh [Fri, 2 Sep 2022 06:23:36 +0000 (11:53 +0530)] 
Docs: Show most recent AWS SDK version (#5661)

Co-authored-by: Prashant Singh <psinghvk@amazon.com>

3 weeks ago Spark: Fix stats in rewrite metadata action (#5691)
Ryan Blue [Fri, 2 Sep 2022 06:20:57 +0000 (23:20 -0700)] 
Spark: Fix stats in rewrite metadata action (#5691)

* Core: Don't show dropped fields from the partition spec

* Use projection instead

* Use StructProjection in SparkDataFile.

Co-authored-by: Fokko Driesprong <fokko@apache.org>

3 weeks ago Parquet: Close zstd input stream early to avoid memory pressure (#5681)
Bryan Keller [Thu, 1 Sep 2022 22:54:53 +0000 (15:54 -0700)] 
Parquet: Close zstd input stream early to avoid memory pressure (#5681)

3 weeks ago Core, AWS: Fix Kryo serialization failure for FileIO (#5437)
Prashant Singh [Thu, 1 Sep 2022 22:37:16 +0000 (04:07 +0530)] 
Core, AWS: Fix Kryo serialization failure for FileIO (#5437)

3 weeks ago Core: Support deleting tables without metadata files (#5510)
ChenLiang [Thu, 1 Sep 2022 19:48:32 +0000 (03:48 +0800)] 
Core: Support deleting tables without metadata files (#5510)

3 weeks ago Core: Fix exception handling in BaseTaskWriter (#5683)
Ryan Blue [Thu, 1 Sep 2022 19:44:19 +0000 (12:44 -0700)] 
Core: Fix exception handling in BaseTaskWriter (#5683)

* Core: Fix exception handling in BaseTaskWriter.

* Fix state check.

3 weeks ago [Python] FsspecFileIO that wraps any fsspec filesystem (#5332)
Samuel Redai [Thu, 1 Sep 2022 09:34:12 +0000 (05:34 -0400)] 
[Python] FsspecFileIO that wraps any fsspec filesystem (#5332)

3 weeks ago Spark 3.2: Add row-based changelog reader (#5682)
Yufei Gu [Thu, 1 Sep 2022 03:49:10 +0000 (20:49 -0700)] 
Spark 3.2: Add row-based changelog reader (#5682)

3 weeks ago Spark 3.3: Add row-based changelog reader (#5578)
Yufei Gu [Wed, 31 Aug 2022 22:31:59 +0000 (15:31 -0700)] 
Spark 3.3: Add row-based changelog reader (#5578)

3 weeks ago AWS: fix wrong config key for useArnRegionEnabled in AssumeRoleAwsClientFactory ...
Rushan Jiang [Wed, 31 Aug 2022 17:50:47 +0000 (13:50 -0400)] 
AWS: fix wrong config key for useArnRegionEnabled in AssumeRoleAwsClientFactory (#5680)

3 weeks ago Python: Reassign schema/partition-spec/sort-order IDs (#5627)
Fokko Driesprong [Wed, 31 Aug 2022 17:28:25 +0000 (19:28 +0200)] 
Python: Reassign schema/partition-spec/sort-order IDs (#5627)

* Python: Reassign schema/partition-spec/sort-order ids

When creating a new schema

Resolves #5468

* Convert into pre-order

* Small docstring improvements

* Fix order of the structs/lists/maps

4 weeks ago Python: Include PyYaml as a dependency (#5674)
Fokko Driesprong [Tue, 30 Aug 2022 17:01:19 +0000 (19:01 +0200)] 
Python: Include PyYaml as a dependency (#5674)

Currently it is missing:

```
root@88de3a02961f:/# pip install "git+https://github.com/apache/iceberg.git#subdirectory=python[pyarrow]"^C
root@88de3a02961f:/# pyiceberg
Traceback (most recent call last):
  File "/usr/local/bin/pyiceberg", line 5, in <module>
    from pyiceberg.cli.console import run
  File "/usr/local/lib/python3.9/site-packages/pyiceberg/cli/console.py", line 30, in <module>
    from pyiceberg.catalog import Catalog, load_catalog
  File "/usr/local/lib/python3.9/site-packages/pyiceberg/catalog/__init__.py", line 37, in <module>
    from pyiceberg.utils.config import Config, merge_config
  File "/usr/local/lib/python3.9/site-packages/pyiceberg/utils/config.py", line 21, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'
```

4 weeks ago API: Add API changes for table statistics tracking (#5021)
Piotr Findeisen [Tue, 30 Aug 2022 16:01:37 +0000 (18:01 +0200)] 
API: Add API changes for table statistics tracking (#5021)