parquet-format.git
8 months agoPARQUET-2110: Fix Typos in LogicalTypes.md master
Jin Cong Ho [Wed, 19 Jan 2022 23:43:19 +0000 (23:43 +0000)] 
PARQUET-2110: Fix Typos in LogicalTypes.md

15 months agoDocument dictionary page position (#177)
Gabor Szadovszky [Thu, 24 Jun 2021 11:56:23 +0000 (13:56 +0200)] 
Document dictionary page position (#177)

17 months agoPARQUET-2016: Reference column_order field from column indexes (#173)
Gabor Szadovszky [Thu, 22 Apr 2021 11:14:34 +0000 (13:14 +0200)] 
PARQUET-2016: Reference column_order field from column indexes (#173)

17 months agoPrepare for next development iteration
Antoine Pitrou [Wed, 14 Apr 2021 15:18:36 +0000 (17:18 +0200)] 
Prepare for next development iteration

17 months agoPARQUET-2019: Remove outdate KEYS file (#174)
Antoine Pitrou [Wed, 7 Apr 2021 16:47:31 +0000 (18:47 +0200)] 
PARQUET-2019: Remove outdate KEYS file (#174)

The reference KEYS files are in the SVN Parquet repositories ("dev" and "release").

17 months ago[maven-release-plugin] prepare for next development iteration
Antoine Pitrou [Wed, 7 Apr 2021 12:50:34 +0000 (14:50 +0200)] 
[maven-release-plugin] prepare for next development iteration

17 months ago[maven-release-plugin] prepare release apache-parquet-format-2.9.0-rc0 apache-parquet-format-2.9.0 apache-parquet-format-2.9.0-rc0
Antoine Pitrou [Wed, 7 Apr 2021 12:50:21 +0000 (14:50 +0200)] 
[maven-release-plugin] prepare release apache-parquet-format-2.9.0-rc0

17 months agoPARQUET-2015: Update changelog for 0.29.0 (#172)
Antoine Pitrou [Wed, 7 Apr 2021 12:16:29 +0000 (14:16 +0200)] 
PARQUET-2015: Update changelog for 0.29.0 (#172)

17 months agoPARQUET-1930: Bump Apache Thrift to 0.13 (#162)
Fokko Driesprong [Wed, 7 Apr 2021 08:30:52 +0000 (10:30 +0200)] 
PARQUET-1930: Bump Apache Thrift to 0.13 (#162)

17 months agoPARQUET-2013: Replace back "should" with "must" (#171)
Antoine Pitrou [Sun, 4 Apr 2021 22:43:18 +0000 (00:43 +0200)] 
PARQUET-2013: Replace back "should" with "must" (#171)

17 months agoPARQUET-2013: [Format] Mention that ConvertedType is deprecated (#169)
Antoine Pitrou [Sun, 4 Apr 2021 09:25:20 +0000 (11:25 +0200)] 
PARQUET-2013: [Format] Mention that ConvertedType is deprecated (#169)

Also slight wording improvements, and replace a "must" with "should" for writing the ConvertedType field.

17 months agoPARQUET-2011: Use "unit" for timestamp parameter, not "precision" (#161)
tanuja5 [Wed, 31 Mar 2021 15:51:31 +0000 (21:21 +0530)] 
PARQUET-2011: Use "unit" for timestamp parameter, not "precision" (#161)

The written spec shouldn't diverge from the Thrift definitions.

17 months agoPARQUET-1969: Test by GithubAction (#166)
Gabor Szadovszky [Tue, 30 Mar 2021 11:38:24 +0000 (13:38 +0200)] 
PARQUET-1969: Test by GithubAction (#166)

18 months agoPARQUET-1996: [Format] Add interoperable LZ4 codec, deprecate existing LZ4 codec...
Antoine Pitrou [Thu, 11 Mar 2021 19:55:02 +0000 (20:55 +0100)] 
PARQUET-1996: [Format] Add interoperable LZ4 codec, deprecate existing LZ4 codec (#168)

2 years agoPARQUET-1892: Update Thrift comment to explain CRC calculation in encrypted columns...
ggershinsky [Thu, 30 Jul 2020 07:04:31 +0000 (10:04 +0300)] 
PARQUET-1892: Update Thrift comment to explain CRC calculation in encrypted columns (#160)

2 years agoPARQUET-1862: fix comment mistake of DataPageHeaderV2 (#159)
Liam [Thu, 14 May 2020 09:20:42 +0000 (17:20 +0800)] 
PARQUET-1862: fix comment mistake of DataPageHeaderV2 (#159)

Statistics in DataPageHeaderV2 should be about the page, not column chunk.

2 years agoPARQUET-1777: add Parquet logos
Julien Le Dem [Tue, 28 Jan 2020 05:37:16 +0000 (06:37 +0100)] 
PARQUET-1777: add Parquet logos

Make sure you have checked _all_ steps below.

### Jira

- [x] My PR addresses [Parquet-1777 Jira](https://issues.apache.org/jira/browse/PARQUET-1777)
  - https://issues.apache.org/jira/browse/PARQUET-1777
  - no extra dependencies

### Commits

- [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] No new functionality

Closes #157 from julienledem/add_logos and squashes the following commits:

0fa4af7 <Julien Le Dem>  add license headers
af066d0 <Julien Le Dem>  add Parquet logos

Authored-by: Julien Le Dem <julien@apache.org>
Signed-off-by: Uwe L. Korn <uwelk@xhochy.com>

2 years agoPrepare for next development iteration
Gabor Szadovszky [Mon, 13 Jan 2020 09:12:01 +0000 (10:12 +0100)] 
Prepare for next development iteration

2 years ago[maven-release-plugin] prepare for next development iteration
Gabor Szadovszky [Mon, 16 Dec 2019 13:21:44 +0000 (14:21 +0100)] 
[maven-release-plugin] prepare for next development iteration

2 years ago[maven-release-plugin] prepare release apache-parquet-format-2.8.0-rc0 apache-parquet-format-2.8.0 apache-parquet-format-2.8.0-rc0
Gabor Szadovszky [Mon, 16 Dec 2019 13:21:26 +0000 (14:21 +0100)] 
[maven-release-plugin] prepare release apache-parquet-format-2.8.0-rc0

2 years agoPARQUET-1714: Update CHANGES.md & update dev scm
Gabor Szadovszky [Mon, 16 Dec 2019 11:45:29 +0000 (12:45 +0100)] 
PARQUET-1714: Update CHANGES.md & update dev scm

Updated the SCM developer connection to the 'git@' format so no
username/password authentication is required for creating a release

2 years agoPARQUET-1708: Fix Thrift compiler warning (#156)
Jiajia Li [Wed, 11 Dec 2019 10:40:47 +0000 (18:40 +0800)] 
PARQUET-1708: Fix Thrift compiler warning (#156)

2 years agoPARQUET-1622: Add BYTE_STREAM_SPLIT encoding (#144)
martinradev [Tue, 3 Dec 2019 08:34:53 +0000 (08:34 +0000)] 
PARQUET-1622: Add BYTE_STREAM_SPLIT encoding (#144)

The patch extends the format to add the BYTE_STREAM_SPLIT
encoding and adds documentation for it.
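
For readers unfamiliar with the encoding, here is a minimal sketch of the idea. It is an illustration, not the patch's code. It assumes little-endian float32 values, where byte k of every value is scattered into stream k and the streams are then concatenated. The function names are hypothetical.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Scatter byte k of each value into stream k, then concatenate the streams."""
    width = struct.calcsize(fmt)  # 4 for float32, 8 for "<d" (float64)
    raw = [struct.pack(fmt, v) for v in values]
    return b"".join(bytes(r[k] for r in raw) for k in range(width))


def byte_stream_split_decode(data, fmt="<f"):
    """Gather byte k of value j from position k * count + j."""
    width = struct.calcsize(fmt)
    count = len(data) // width
    return [
        struct.unpack(fmt, bytes(data[k * count + j] for k in range(width)))[0]
        for j in range(count)
    ]


encoded = byte_stream_split_encode([1.0, 2.0])
print(encoded.hex())  # 0000000080003f40
print(byte_stream_split_decode(encoded))  # [1.0, 2.0]
```

The output has the same size as the input. Any benefit comes from a later compression step, which tends to see more repetition once similar bytes sit next to each other.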

2 years agoPARQUET-1687: Update release process (#155)
Gabor Szadovszky [Tue, 12 Nov 2019 15:56:35 +0000 (16:56 +0100)] 
PARQUET-1687: Update release process (#155)

Update prepare-release.sh to create RC tags and keep using the current
RC version with SNAPSHOT for development.
Update source-release.sh to retrieve the hash of the RC tag.
Create the new script finalize-release to create the final release tag
and update the development version.

2 years agoPARQUET-1672: [DOC] Fix broken link in README.md (#154)
Tarek [Mon, 7 Oct 2019 10:25:35 +0000 (11:25 +0100)] 
PARQUET-1672: [DOC] Fix broken link in README.md (#154)

3 years ago[maven-release-plugin] prepare for next development iteration
Ryan Blue [Wed, 25 Sep 2019 16:19:01 +0000 (09:19 -0700)] 
[maven-release-plugin] prepare for next development iteration

3 years ago[maven-release-plugin] prepare release apache-parquet-format-2.7.0 apache-parquet-format-2.7.0
Ryan Blue [Wed, 25 Sep 2019 16:18:41 +0000 (09:18 -0700)] 
[maven-release-plugin] prepare release apache-parquet-format-2.7.0

3 years agoPARQUET-1608: Fix URL in pom.xml
Ryan Blue [Wed, 25 Sep 2019 16:18:05 +0000 (09:18 -0700)] 
PARQUET-1608: Fix URL in pom.xml

3 years agoPARQUET-1608: Update developerConnection to github.
Ryan Blue [Wed, 25 Sep 2019 16:14:41 +0000 (09:14 -0700)] 
PARQUET-1608: Update developerConnection to github.

3 years agoPARQUET-1651: Typos in parquet.thrift (#152)
Gary Fredericks [Mon, 16 Sep 2019 07:32:24 +0000 (02:32 -0500)] 
PARQUET-1651: Typos in parquet.thrift (#152)

3 years agoPARQUET-1608: Release Parquet format 2.7.0 (#151)
Jim Apple [Fri, 13 Sep 2019 04:24:42 +0000 (21:24 -0700)] 
PARQUET-1608: Release Parquet format 2.7.0 (#151)

3 years agoPARQUET-1591: Remove @author tags (#137)
Fokko Driesprong [Mon, 26 Aug 2019 23:28:05 +0000 (01:28 +0200)] 
PARQUET-1591: Remove @author tags (#137)

3 years agoPARQUET-1630: Update Bloom filter format (#146)
Chen, Junjie [Mon, 26 Aug 2019 23:27:32 +0000 (07:27 +0800)] 
PARQUET-1630: Update Bloom filter format (#146)

3 years agoPARQUET-1630: Loosen size restrictions on Bloom filters (#150)
Jim Apple [Wed, 21 Aug 2019 16:47:42 +0000 (09:47 -0700)] 
PARQUET-1630: Loosen size restrictions on Bloom filters (#150)

This patch uses a range reduction trick to produce a pseudorandom
number within an index without using the modulo operator '%', which is
often very slow.

The oldest reference I know to this trick is Kenneth A. Ross's IBM
research report from 2006, "Efficient Hash Probes on Modern
Processors", available at
https://domino.research.ibm.com/library/cyberdig.nsf/papers/DF54E3545C82E8A585257222006FD9A2/$File/rc24100.pdf
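
A minimal sketch of the range reduction described above, given as an illustration rather than the spec's wording. It assumes the 32 most-significant bits of a 64-bit hash select the block. The function name `block_index` is hypothetical.

```python
def block_index(hash64: int, num_blocks: int) -> int:
    """Map a 64-bit hash to a block index in [0, num_blocks) without '%'.

    The top 32 bits of the hash are treated as a fraction of 2**32 and
    scaled by num_blocks: a multiply and a shift replace the modulo.
    """
    if not 0 <= hash64 < 2**64:
        raise ValueError("hash64 must fit in 64 bits")
    top32 = hash64 >> 32
    return (top32 * num_blocks) >> 32


print(block_index(0x0000000000000000, 10))  # 0
print(block_index(0x8000000000000000, 10))  # 5
print(block_index(0xFFFFFFFF00000000, 10))  # 9
```

Because `top32` is always less than 2**32, the result is always less than `num_blocks`, and `num_blocks` does not need to be a power of two.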

3 years agoPARQUET-1630: add empty compression union for Bloom filter (#149)
Jim Apple [Tue, 13 Aug 2019 15:49:12 +0000 (08:49 -0700)] 
PARQUET-1630: add empty compression union for Bloom filter (#149)

Right now no compression methods are supported. For more on Bloom
filter compression, see Michael Mitzenmacher's "Compressed Bloom
Filters",
https://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/cbf2.pdf

3 years agoPARQUET-1630: Clarify the Bloom filter algorithm (#147)
Jim Apple [Fri, 9 Aug 2019 00:18:56 +0000 (17:18 -0700)] 
PARQUET-1630: Clarify the Bloom filter algorithm (#147)

The specific questions answered from
https://lists.apache.org/thread.html/82d2e50f8c1007720564c5dc64aeae7947e949f3954a83436dc36760@%3Cdev.parquet.apache.org%3E
are

How is the bloom filter block selected from the 32 most-significant
bits of the hash function? These details must be in the spec and
not in papers linked from the spec.

How is the number of blocks determined? From the overall filter size?

I think that the exact procedure for a lookup in each block should be
covered in a section, followed by a section for how to perform a
lookup in the multi-block filter. The wording also needs to be cleaned up
so that it is always clear whether the filter that is referenced is a
block or the multi-block filter.

The spec should give more detail on how to choose the number of blocks
and on false positive rates. The sentence with “11.54 bits for each
distinct value inserted into the filter” is vague: is this the
multi-block filter? Why is a 1% false-positive rate “recommended”?

I think it is okay to use 0.5% as each block’s false-positive rate,
but then this should state how to achieve an overall false-positive
rate as a function of the number of distinct values.

3 years agoPARQUET-1627: Update specification so that legacy timestamp logical types can be...
Nándor Kollár [Thu, 8 Aug 2019 07:54:30 +0000 (09:54 +0200)] 
PARQUET-1627: Update specification so that legacy timestamp logical types can be written for local semantics as well (#148)

3 years agoPARQUET-1619: Encryption format changes (#142)
ggershinsky [Wed, 24 Jul 2019 16:54:12 +0000 (19:54 +0300)] 
PARQUET-1619: Encryption format changes (#142)

3 years agoPARQUET-1625: Align Bloom filter definition in parquet thrift with its spec (#145)
Chen, Junjie [Wed, 17 Jul 2019 07:38:15 +0000 (15:38 +0800)] 
PARQUET-1625: Align Bloom filter definition in parquet thrift with its spec (#145)

3 years agoPARQUET-1609: Specify which xxhash carefully (#143)
Jim Apple [Fri, 12 Jul 2019 09:26:22 +0000 (02:26 -0700)] 
PARQUET-1609: Specify which xxhash carefully (#143)

The hash function "xxhash" is actually a number of different hash
functions including xxHash, XXH64, XXH32, and XXH3. Additionally,
these hash functions accept "seeds", as most modern hash functions do,
including MurmurHash variants.

This patch specifies that the BloomFilter hash function default is
XXH64 with a seed of 0. It omits the confusing note about the ISA and
different variants of xxHash, since XXH64 is apparently
architecture-independent.

3 years agoPARQUET-1617: Add more detail to Bloom filter spec (#140)
Chen, Junjie [Wed, 10 Jul 2019 10:32:46 +0000 (18:32 +0800)] 
PARQUET-1617: Add more detail to Bloom filter spec (#140)

3 years agoPARUQET-1609: Update to use xxHash as hash strategy (#139)
Chen, Junjie [Thu, 4 Jul 2019 15:34:02 +0000 (23:34 +0800)] 
PARUQET-1609: Update to use xxHash as hash strategy (#139)

3 years agoPARQUET-1610: Fix minor typo (#82)
Bachar Wehbi [Tue, 25 Jun 2019 20:22:25 +0000 (22:22 +0200)] 
PARQUET-1610: Fix minor typo (#82)

3 years agoPARQUET-1610: Minor grammatical fixes (#132)
Umayah Abdennabi [Tue, 25 Jun 2019 20:20:03 +0000 (13:20 -0700)] 
PARQUET-1610: Minor grammatical fixes (#132)

3 years agoPARQUET-1590: Add Java 11 to Travis (#136)
Fokko Driesprong [Mon, 17 Jun 2019 07:17:10 +0000 (09:17 +0200)] 
PARQUET-1590: Add Java 11 to Travis (#136)

3 years agoPARQUET-1579: Add Github PR template (#135)
Fokko Driesprong [Thu, 13 Jun 2019 16:45:24 +0000 (18:45 +0200)] 
PARQUET-1579: Add Github PR template (#135)

3 years agoPARQUET-1592: rename bloom filter hash (#138)
Chen, Junjie [Thu, 13 Jun 2019 16:44:48 +0000 (00:44 +0800)] 
PARQUET-1592: rename bloom filter hash (#138)

3 years agoPARQUET-1588: Bump to Apache Thrift 0.12.0 (#133)
Fokko Driesprong [Wed, 12 Jun 2019 08:46:20 +0000 (10:46 +0200)] 
PARQUET-1588: Bump to Apache Thrift 0.12.0 (#133)

3 years agoPARQUET-1589: Bump Java to 1.8 (#134)
Fokko Driesprong [Tue, 11 Jun 2019 13:46:29 +0000 (15:46 +0200)] 
PARQUET-1589: Bump Java to 1.8 (#134)

3 years agoPARQUET-1585: Update old external links in the code base (#131)
Zoltan Ivanfi [Fri, 24 May 2019 12:43:02 +0000 (14:43 +0200)] 
PARQUET-1585: Update old external links in the code base (#131)

3 years agoPARQUET-1572: Clarify the definition of timestamp types (#130)
Zoltan Ivanfi [Fri, 24 May 2019 11:49:31 +0000 (13:49 +0200)] 
PARQUET-1572: Clarify the definition of timestamp types (#130)

3 years agoPARQUET-1561: Removed erroneous brackets from documentation. (#129)
d-becker [Fri, 3 May 2019 11:15:08 +0000 (13:15 +0200)] 
PARQUET-1561: Removed erroneous brackets from documentation. (#129)

3 years agoPARQUET-1561: Inconsistencies in the Delta Encoding specification (#128)
d-becker [Wed, 24 Apr 2019 15:53:39 +0000 (17:53 +0200)] 
PARQUET-1561: Inconsistencies in the Delta Encoding specification (#128)

* Addressed most points in PARQUET-1561. We should add examples containing
multiple miniblocks and blocks.

* Defined ULEB128 and zigzag encoding.

* Padding bits and bytes should be zero, but readers should not rely on it.
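
As background for the ULEB128 and zigzag definitions mentioned above, here is a short sketch of both encodings as they are commonly defined. It is an illustration, not text from the specification. The function names are hypothetical and a 64-bit signed range is assumed for zigzag.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Interleave signed values as 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def uleb128_encode(value: int) -> bytes:
    """Emit 7 bits per byte, low group first; the high bit marks continuation."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def uleb128_decode(data: bytes):
    """Return (value, number of bytes consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated ULEB128 value")


print(uleb128_encode(624485).hex())  # e58e26
print([zigzag_encode(n) for n in (0, -1, 1, -2)])  # [0, 1, 2, 3]
```

Zigzag keeps small negative numbers small as unsigned values, so they still fit in few ULEB128 bytes.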

3 years agoPARQUET-1554: Compilation error when upgrading Scrooge version (#127)
Nándor Kollár [Wed, 3 Apr 2019 09:33:27 +0000 (11:33 +0200)] 
PARQUET-1554: Compilation error when upgrading Scrooge version (#127)

A comment used to be javadoc-style (/**) without an attribute following it, which led to a compilation failure.

3 years agoPARQUET-1539: Clarify CRC checksum in page header (#126)
Boudewijn Braams [Tue, 5 Mar 2019 13:26:48 +0000 (14:26 +0100)] 
PARQUET-1539: Clarify CRC checksum in page header (#126)

3 years agoPARQUET-1487: Do not write original type for timezone-agnostic timestamps (#125)
Zoltan Ivanfi [Wed, 9 Jan 2019 12:35:34 +0000 (13:35 +0100)] 
PARQUET-1487: Do not write original type for timezone-agnostic timestamps (#125)

Clarify in the comments that we should only map the new TIMESTAMP type
to the old TIMESTAMP_MILLIS or TIMESTAMP_MICROS types when the semantics
match (UTC normalized and the precision matches).
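
The rule in this commit message can be sketched as a small lookup. This is only an illustration of the rule as stated here. The function name, the argument names, and the string constants are hypothetical, and the later PARQUET-1627 change further up this log relaxes the rule for local semantics.

```python
from typing import Optional


def legacy_converted_type(is_adjusted_to_utc: bool, unit: str) -> Optional[str]:
    """Return the old converted type to write, or None if the semantics differ.

    Hypothetical helper: only UTC-normalized timestamps whose unit has a
    legacy counterpart are mapped; everything else gets no legacy type.
    """
    if not is_adjusted_to_utc:
        return None
    legacy = {"MILLIS": "TIMESTAMP_MILLIS", "MICROS": "TIMESTAMP_MICROS"}
    return legacy.get(unit)


print(legacy_converted_type(True, "MILLIS"))   # TIMESTAMP_MILLIS
print(legacy_converted_type(False, "MICROS"))  # None
print(legacy_converted_type(True, "NANOS"))    # None
```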

3 years agoPARQUET-1462: Allow specifying new development version in prepare-release.sh (#116)
Zoltan Ivanfi [Tue, 4 Dec 2018 13:28:10 +0000 (14:28 +0100)] 
PARQUET-1462: Allow specifying new development version in prepare-release.sh (#116)

Before this change, prepare-release.sh only took the release version as a
parameter; the new development version was requested interactively for each
individual pom.xml file, which made the process tedious.

3 years agoPARQUET-1437: Misleading comment in parquet.thrift (#115)
Zoltan Ivanfi [Tue, 30 Oct 2018 09:32:36 +0000 (10:32 +0100)] 
PARQUET-1437: Misleading comment in parquet.thrift (#115)

The documentation for list<ColumnOrder> column_orders stated that "Each
sort order corresponds to one column, determined by its position in the
list, matching the position of the column in the schema."

However, in reality, while the order of elements in these two
lists (schema and sort order) is the same, only leaf nodes are
represented in the list of sort orders, so the positions do not match.
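
A small sketch of the mismatch, using a made-up nested schema. The tuple representation and the helper names are assumptions for illustration only. The flattened schema lists every node depth-first, while the sort-order list has one entry per leaf.

```python
# Each node is (name, children); a node with no children is a leaf column.
schema = (
    "root",
    [
        ("id", []),
        ("address", [("street", []), ("city", [])]),
        ("score", []),
    ],
)


def flatten(node):
    """All node names in depth-first order, as in the flattened schema list."""
    name, children = node
    names = [name]
    for child in children:
        names.extend(flatten(child))
    return names


def leaves(node):
    """Only the leaf names, in the same depth-first order."""
    name, children = node
    if not children:
        return [name]
    names = []
    for child in children:
        names.extend(leaves(child))
    return names


print(flatten(schema))  # ['root', 'id', 'address', 'street', 'city', 'score']
print(leaves(schema))   # ['id', 'street', 'city', 'score']
print(flatten(schema).index("score"), leaves(schema).index("score"))  # 5 3
```

The relative order is identical in both lists, but `score` sits at position 5 in the schema and position 3 among the leaves.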

3 years agoPARQUET-41: Add Bloom filter (#112)
Chen, Junjie [Fri, 12 Oct 2018 00:55:08 +0000 (08:55 +0800)] 
PARQUET-41: Add Bloom filter (#112)

* PARQUET-41: Add Bloom filter

* Grammar and structure tweaking for Bloom filter prose.

3 years agoPARQUET-1433: Parquet-format doesn't compile with Thrift 0.10.0 (#111)
nandorKollar [Fri, 5 Oct 2018 11:24:57 +0000 (13:24 +0200)] 
PARQUET-1433: Parquet-format doesn't compile with Thrift 0.10.0 (#111)

3 years ago[maven-release-plugin] prepare for next development iteration
Nandor Kollar [Thu, 27 Sep 2018 14:31:24 +0000 (16:31 +0200)] 
[maven-release-plugin] prepare for next development iteration

3 years ago[maven-release-plugin] prepare release apache-parquet-format-2.6.0 apache-parquet-format-2.6.0
Nandor Kollar [Thu, 27 Sep 2018 14:31:14 +0000 (16:31 +0200)] 
[maven-release-plugin] prepare release apache-parquet-format-2.6.0

3 years agoPARQUET-1424: Update CHANGES.md
Nandor Kollar [Thu, 27 Sep 2018 13:38:21 +0000 (15:38 +0200)] 
PARQUET-1424: Update CHANGES.md

3 years agoPARQUET-1429: Turn off DocLint on parquet-format (#108)
nandorKollar [Thu, 27 Sep 2018 14:22:38 +0000 (16:22 +0200)] 
PARQUET-1429: Turn off DocLint on parquet-format (#108)

The code generated by Thrift had several issues found by DocLint, which caused the attach-javadocs goal to fail when using Java 8.

3 years agoPARQUET-1428: Move columnar encryption into its feature branch 107/head
Nandor Kollar [Wed, 26 Sep 2018 11:49:14 +0000 (13:49 +0200)] 
PARQUET-1428: Move columnar encryption into its feature branch

Revert "PARQUET-1227: Thrift crypto metadata structures (#94)"

This reverts commit 518e206c3e6586b76e8315d5f62a8666ed62fa90.

3 years agoPARQUET-1428: Move columnar encryption into its feature branch
Nandor Kollar [Wed, 26 Sep 2018 11:49:07 +0000 (13:49 +0200)] 
PARQUET-1428: Move columnar encryption into its feature branch

Revert "PARQUET-1398: move iv_prefix to Algorithms (#103)"

This reverts commit c4a4ef22c99435ae069eb41e2977844e57dcfc37.

3 years agoPARQUET-1428: Move columnar encryption into its feature branch
Nandor Kollar [Wed, 26 Sep 2018 11:48:58 +0000 (13:48 +0200)] 
PARQUET-1428: Move columnar encryption into its feature branch

Revert "PARQUET-1401: optional RowGroup fields for handling hidden columns (#104)"

This reverts commit 677ed8ea23c60e5e42a4c537a454544884525593.

4 years agoPARQUET-1400: Deprecate parquet-mr related code in parquet-format (#105)
Gabor Szadovszky [Mon, 24 Sep 2018 12:16:42 +0000 (14:16 +0200)] 
PARQUET-1400: Deprecate parquet-mr related code in parquet-format (#105)

4 years agoPARQUET-1387: Nanosecond precision time and timestamp - parquet-format (#102)
nandorKollar [Tue, 28 Aug 2018 12:57:19 +0000 (14:57 +0200)] 
PARQUET-1387: Nanosecond precision time and timestamp - parquet-format (#102)

4 years agoPARQUET-1401: optional RowGroup fields for handling hidden columns (#104)
ggershinsky [Tue, 28 Aug 2018 12:56:59 +0000 (15:56 +0300)] 
PARQUET-1401: optional RowGroup fields for handling hidden columns (#104)

4 years agoPARQUET-1398: move iv_prefix to Algorithms (#103)
ggershinsky [Tue, 28 Aug 2018 12:56:23 +0000 (15:56 +0300)] 
PARQUET-1398: move iv_prefix to Algorithms (#103)

4 years agoPARQUET-1227: Thrift crypto metadata structures (#94)
ggershinsky [Mon, 23 Jul 2018 12:48:06 +0000 (15:48 +0300)] 
PARQUET-1227: Thrift crypto metadata structures (#94)

New Thrift structures for Parquet modular encryption.

4 years agoPARQUET-1351: Fix Travis builds by using trusty without thrift NodeJS and PHP (#100)
nandorKollar [Wed, 18 Jul 2018 17:01:12 +0000 (19:01 +0200)] 
PARQUET-1351: Fix Travis builds by using trusty without thrift NodeJS and PHP (#100)

* Use trusty image for Travis CI
* Compile Thrift without NodeJS and PHP. Looks like these are not present in the travis VM, and are not needed for Parquet.

4 years agoPARQUET-1312: Improve logical types documentation (#98)
nandorKollar [Mon, 25 Jun 2018 06:27:55 +0000 (08:27 +0200)] 
PARQUET-1312: Improve logical types documentation (#98)

4 years agoPARQUET-1266: LogicalTypes union in parquet-format doesn't include UUID
Nandor Kollar [Thu, 5 Apr 2018 08:01:42 +0000 (10:01 +0200)] 
PARQUET-1266: LogicalTypes union in parquet-format doesn't include UUID

4 years agoPARQUET-1294: Update release scripts for the new Apache policy
Gabor Szadovszky [Thu, 10 May 2018 14:22:24 +0000 (16:22 +0200)] 
PARQUET-1294: Update release scripts for the new Apache policy

4 years agoPARQUET-1290: clarify run lengths for RLE encoding (#96)
Tim Armstrong [Mon, 7 May 2018 16:51:05 +0000 (09:51 -0700)] 
PARQUET-1290: clarify run lengths for RLE encoding (#96)

4 years ago[maven-release-plugin] prepare for next development iteration
Zoltan Ivanfi [Thu, 29 Mar 2018 13:47:20 +0000 (15:47 +0200)] 
[maven-release-plugin] prepare for next development iteration

4 years ago[maven-release-plugin] prepare release apache-parquet-format-2.5.0 apache-parquet-format-2.5.0
Zoltan Ivanfi [Thu, 29 Mar 2018 13:47:01 +0000 (15:47 +0200)] 
[maven-release-plugin] prepare release apache-parquet-format-2.5.0

4 years agoRevert "[maven-release-plugin] prepare release apache-parquet-format-2.5.0"
Zoltan Ivanfi [Thu, 29 Mar 2018 13:44:08 +0000 (15:44 +0200)] 
Revert "[maven-release-plugin] prepare release apache-parquet-format-2.5.0"

This reverts commit a5b842613309a60b59d07af5d02a76c00e9ef2ac.

4 years ago[maven-release-plugin] prepare release apache-parquet-format-2.5.0
Zoltan Ivanfi [Thu, 29 Mar 2018 13:24:10 +0000 (15:24 +0200)] 
[maven-release-plugin] prepare release apache-parquet-format-2.5.0

4 years agoPARQUET-1234: Update CHANGES.md.
Gabor Szadovszky [Mon, 26 Mar 2018 13:27:06 +0000 (15:27 +0200)] 
PARQUET-1234: Update CHANGES.md.

4 years agoPARQUET-1260: Add Zoltan Ivanfi's code signing key to the KEYS file (#91)
Zoltan Ivanfi [Thu, 29 Mar 2018 13:22:22 +0000 (15:22 +0200)] 
PARQUET-1260: Add Zoltan Ivanfi's code signing key to the KEYS file (#91)

4 years agoPARQUET-1258: Update scm developer connection to github (#90)
Gabor Szadovszky [Wed, 28 Mar 2018 13:57:37 +0000 (15:57 +0200)] 
PARQUET-1258: Update scm developer connection to github (#90)

After moving to gitbox the old apache repo is not working anymore.
The pom.xml had to be updated accordingly.

4 years agoPARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)
Gabor Szadovszky [Mon, 26 Mar 2018 13:00:04 +0000 (15:00 +0200)] 
PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)

Describe handling of the ambiguous min/max statistics for FLOAT/DOUBLE.

4 years agoPARQUET-1242: parquet.thrift refers to wrong releases for the new compressions
Zoltan Ivanfi [Fri, 23 Mar 2018 13:55:52 +0000 (14:55 +0100)] 
PARQUET-1242: parquet.thrift refers to wrong releases for the new compressions

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #87 from zivanfi/PARQUET-1242 and squashes the following commits:

33cb102 [Zoltan Ivanfi] PARQUET-1242: parquet.thrift refers to wrong releases for the new compressions

4 years agoMerge pull request #89 from timarmstrong/master
Lars Volker [Fri, 23 Mar 2018 00:00:06 +0000 (17:00 -0700)] 
Merge pull request #89 from timarmstrong/master

Update Encodings.md with RLE_DICTIONARY

4 years agoMerge pull request #86 from lekv/p323
Lars Volker [Thu, 22 Mar 2018 22:24:24 +0000 (15:24 -0700)] 
Merge pull request #86 from lekv/p323

PARQUET-323: Mark INT96 as deprecated

4 years agoUpdate Encodings.md with RLE_DICTIONARY 89/head
Tim Armstrong [Thu, 22 Mar 2018 21:40:47 +0000 (14:40 -0700)] 
Update Encodings.md with RLE_DICTIONARY

RLE_DICTIONARY is never mentioned in Encodings.md yet is the recommended
enum value to use in Parquet 2.0.

4 years agoPARQUET-1236: Align version of slf4j-api
1028332163 [Wed, 21 Mar 2018 15:26:58 +0000 (16:26 +0100)] 
PARQUET-1236: Align version of slf4j-api

https://issues.apache.org/jira/browse/PARQUET-1236

Author: 1028332163 <1028332163@qq.com>

Closes #85 from PandaMonkey/master and squashes the following commits:

158f082 [1028332163] align version of slf4j-api

4 years agoPARQUET-323: Mark INT96 as deprecated 86/head
Lars Volker [Tue, 13 Mar 2018 00:33:30 +0000 (17:33 -0700)] 
PARQUET-323: Mark INT96 as deprecated

Closes #49

4 years agoPARQUET-1201: Implement page indexes
Gabor Szadovszky [Tue, 13 Feb 2018 16:08:44 +0000 (17:08 +0100)] 
PARQUET-1201: Implement page indexes

Added helper methods to read/write ColumnIndex and OffsetIndex objects.

Author: Gabor Szadovszky <gabor.szadovszky@cloudera.com>

Closes #81 from gszadovszky/PARQUET-1201 and squashes the following commits:

573dada [Gabor Szadovszky] PARQUET-1201: Implement page indexes

4 years agoPARQUET-1197: Log rat failures
Gabor Szadovszky [Thu, 18 Jan 2018 16:05:11 +0000 (17:05 +0100)] 
PARQUET-1197: Log rat failures

Author: Gabor Szadovszky <gabor.szadovszky@cloudera.com>

Closes #80 from gszadovszky/PARQUET-1197 and squashes the following commits:

c97db9d [Gabor Szadovszky] PARQUET-1197: Log rat failures

4 years agoPARQUET-1065: Deprecate type-defined sort ordering for INT96 type
Zoltan Ivanfi [Thu, 11 Jan 2018 14:08:45 +0000 (15:08 +0100)] 
PARQUET-1065: Deprecate type-defined sort ordering for INT96 type

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #77 from zivanfi/PARQUET-1065 and squashes the following commits:

b5a2117 [Zoltan Ivanfi] PARQUET-1065: Deprecate type-defined sort ordering for INT96 type

4 years agoPARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings
Wes McKinney [Wed, 10 Jan 2018 03:04:57 +0000 (22:04 -0500)] 
PARQUET-1171: Clarify scope of usage for RLE, BIT_PACKED encodings

See related discussions on mailing list, JIRA

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #79 from wesm/PARQUET-1171 and squashes the following commits:

185348e [Wes McKinney] Fix typo
f29b38c [Wes McKinney] Add notes to indicate scope of usage for RLE, BIT_PACKED encodings

4 years agoPARQUET-1064: Deprecate type-defined sort ordering for INTERVAL type.
Zoltan Ivanfi [Tue, 9 Jan 2018 14:48:00 +0000 (15:48 +0100)] 
PARQUET-1064: Deprecate type-defined sort ordering for INTERVAL type.

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #76 from zivanfi/PARQUET-1064 and squashes the following commits:

0ff7b14 [Zoltan Ivanfi] PARQUET-1064: Fixed typo.
5599951 [Zoltan Ivanfi] PARQUET-1064: Deprecate type-defined sort ordering for INTERVAL type.

4 years agoPARQUET-1156: Address dev/merge_parquet_pr.py problems.
Zoltan Ivanfi [Tue, 9 Jan 2018 14:44:48 +0000 (15:44 +0100)] 
PARQUET-1156: Address dev/merge_parquet_pr.py problems.

Identical to my change in parquet-mr, which already got approved and merged.

Author: Zoltan Ivanfi <zi@cloudera.com>

Closes #78 from zivanfi/PARQUET-1156 and squashes the following commits:

518faef [Zoltan Ivanfi] PARQUET-1156: Address dev/merge_parquet_pr.py problems.

4 years agoPARQUET-1145: Add license to .gitignore
Lars Volker [Mon, 13 Nov 2017 12:56:08 +0000 (13:56 +0100)] 
PARQUET-1145: Add license to .gitignore

Also removes .gitignore from the RAT whitelist.

Author: Lars Volker <lv@cloudera.com>

Closes #75 from lekv/license and squashes the following commits:

04523ef [Lars Volker] Also add license to .travis.yml
ce471fd [Lars Volker] PARQUET-1145: Add license to .gitignore

4 years ago[maven-release-plugin] prepare for next development iteration
Ryan Blue [Tue, 17 Oct 2017 19:25:34 +0000 (12:25 -0700)] 
[maven-release-plugin] prepare for next development iteration

4 years ago[maven-release-plugin] prepare release apache-parquet-format-2.4.0 apache-parquet-format-2.4.0
Ryan Blue [Tue, 17 Oct 2017 19:25:18 +0000 (12:25 -0700)] 
[maven-release-plugin] prepare release apache-parquet-format-2.4.0