GH-45937: [C++][Parquet] support to encode, write and validate variant#50252
GH-45937: [C++][Parquet] support to encode, write and validate variant#50252HuaHuaY wants to merge 3 commits into
Conversation
c6667ff to
b186aeb
Compare
| ::arrow::internal::Executor* executor_; | ||
|
|
||
| bool write_time_adjusted_to_utc_; | ||
| bool variant_validation_enabled_; |
There was a problem hiding this comment.
Why this belongs to ArrowWriterProperties but not WriterProperties? If users the low-level parquet writer without Arrow API, serialized variant values cannot be validated any more?
| PREFIX | ||
| "arrow-canonical-extensions") | ||
|
|
||
| add_subdirectory(variant) |
There was a problem hiding this comment.
Should these files be relocated to cpp/src/parquet/arrow/variant instead (just like what we did for geospatial types)? I think cpp/src/arrow/extension is for metadata of extension type and array. If ARROW_PARQUET is OFF, we don't need even these files to be compiled.
| return field->type()->storage_id() == Type::BINARY || | ||
| field->type()->storage_id() == Type::LARGE_BINARY; | ||
|
|
||
| bool IsSupportedPrimitiveTypedValue(const std::shared_ptr<DataType>& type) { |
There was a problem hiding this comment.
It seems that some Arrow primitive types are missing from here: https://arrow.apache.org/docs/format/CanonicalExtensions.html#primitive-type-mappings
| namespace parquet::arrow::internal { | ||
|
|
||
| PARQUET_EXPORT | ||
| ::arrow::Status ValidateVariants(const ::arrow::ChunkedArray& data, |
There was a problem hiding this comment.
Why making them internal? It seems useful if users want to validate values on their side.
| #include "arrow/status.h" | ||
| #include "arrow/util/visibility.h" | ||
|
|
||
| namespace arrow::extension { |
There was a problem hiding this comment.
It seems that all types defined in this file are not related to arrow so perhaps we can move them to cpp/src/parquet/variant folder and use namespace parquet or namesapce parquet::variant?
| namespace arrow::extension { | ||
|
|
||
| enum class VariantBasicType : uint8_t { | ||
| Primitive = 0, |
There was a problem hiding this comment.
Why not adding k prefix like kPrimitive?
Rationale for this change
This PR supports:
typed_valuearrayThis PR does not support:
typed_valueshredded data from thevaluearrayWhat changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
variant_validation_enabled_inArrowWriterProperties.cpp/src/arrow/extension/variant/.