forked from prestodb/presto
Add configs to support the TTL of alluxio SDK cache #268
Open · beinan wants to merge 7,003 commits into twitter-forks:master from beinan:presto_local_cache_ttl (base: master)
Conversation
PartitionedOutput plan node may change the order of input columns. In these cases, when translating PartitionedOutput to PartitionAndSerialize we need to add a ProjectNode that will reorder the columns.
SqlVarbinary needs to be converted to Java's byte[] as test type in order to match the results in test suite.
…e the metric same as prestodb side.
Also, do not poll intermediate tasks for results. These tasks write results to shuffle and therefore cannot return anything.
Reviewed By: bigfootjon Differential Revision: D44454761 fbshipit-source-id: 054af6bed6f50fa4cffeee00db0c9374ec466f08
Currently the DetachedNativeExecutionProcess class is defined in the test folder, as it was only being used for test execution. But now that we want to support Presto-on-Spark in localMode, we need to be able to talk to an already running CPP process during e2e query execution. Move it out into src to enable this.
This reverts commit eaf0203.
Simplify query plan when input is empty.
This rule is replaced by SimplifyPlanWithEmptyInput rule.
Summary: Pull Request resolved: prestodb#19504. The scope of this PR is to address a couple of correlated enhancements for Decimal types, listed below:
1) Introduced `TypeKind::HUGEINT` (`HugeintType`) with `int128_t` as its CPP type. The support is limited to the needs of the LongDecimalType.
2) Removed the `TypeKind::SHORT_DECIMAL` and `TypeKind::LONG_DECIMAL` enum values and replaced them with the Velox types ShortDecimalType (based on BigintType) and LongDecimalType (based on HugeintType). This change replaces the Decimal CPP types `UnscaledShortDecimal` with `int64_t` and `UnscaledLongDecimal` with `int128_t`.
3) Removed the `SHORT_DECIMAL` and `LONG_DECIMAL` APIs and replaced them with `DECIMAL`.
The above changes mean data of decimal types is stored in memory using 64-bit/128-bit integers, e.g. `FlatVector<int64_t>`, `FlatVector<int128_t>`. Each individual value is an unscaled decimal value; a variant similarly holds an unscaled value using 64-bit/128-bit integers. The Decimal type must be present to interpret the decimal semantics. Earlier, the decimal limit check was enabled in UnscaledXXXDecimal only in debug mode, which won't catch overflows in production (release builds). The new approach is to explicitly call `DecimalUtil::valueInRange(int128_t)` wherever a decimal value is computed; see usage in SumAggregate.h and DecimalArithmetic.cpp. This is what the Presto Java implementation does as well. Vector functions must use `DECIMAL` in the signature. Simple functions require a unique type for ShortDecimal and LongDecimal to bind the appropriate implementation; therefore, simple functions are not supported for Decimal types. See facebookincubator/velox#4069. X-link: facebookincubator/velox#4434 Reviewed By: mbasmanova Differential Revision: D45443908 Pulled By: Yuhta fbshipit-source-id: 4bb1d1d870a666aa0c8811840131fe236e328043
Summary: X-link: facebookincubator/velox#4797. Three more fixes to UnsafeRow serialization to make it compatible with Spark Java:
- The first integer, which describes the row size, needs to be 32 bits, not 64.
- This integer needs to be serialized in big-endian order. Curiously, the remaining integers within the UnsafeRow itself are little-endian.
- The input buffer allocated needs to be initialized to zero, since not all portions of it will be initialized by the UnsafeRow serialization code.
Reviewed By: mbasmanova Differential Revision: D45446862 fbshipit-source-id: 0961b9a27f367803bb1da149729128c0a6dbc15f
The PartitionAndSerialize operator used to include the row size in the serialized row: row size | <UnsafeRow>. This caused the row size to be serialized twice, as ShuffleWrite::collect was adding the row size again. This change stops including the row size in the serialized row produced by the PartitionAndSerialize operator.
After adding the empty-table optimization, tpc-ds query plans that have empty table inputs will change. Edit the plans here to reflect that change.
Previously, the JSON-based function definition file used a static path tied to the current code repo structure, which raised a file-not-found error when the function registration method was imported/reused in other modules. This PR changes the file path to a relative path based on the current class loader's resource path.
This helps provide a hook for PrestoSparkNativeTaskExecutoryFactory to shut down the native process.
Split E2E tests and run them in parallel. Also create a separate run for Spark tests. There are 5 jobs that are run in parallel with this change. Please look at each job for test counts and failures. Co-authored-by: Michael Shang <[email protected]>
Add broadcast read support for file based broadcast by adding new type of exchange source - BroadcastExchangeSource. BroadcastExchangeSource reads data from files specified in split location. Format of split location that this exchange source can handle: batch://<taskid>?broadcastInfo={fileInfos:[<fileInfo>]}
Add new CI image.
Add -DFOLLY_HAVE_INT128_T=ON to the centos setup script.
Add known warnings to Linux.
Advance Velox.
Add support for aggregations over sorted inputs.
Co-authored-by: Masha Basmanova <[email protected]> Co-authored-by: Deepak Majeti <[email protected]>
Remove identity projection below a project node.
Add hive.allow-drop-table permission to the test java runner to avoid access denied error when dropping test tables in TestPrestoSparkNativeGeneralQueries#testDecimalRangeFilters test suite. Re-enabled the TestPrestoSparkNativeGeneralQueries#testDecimalRangeFilters suite as well.
Add ccache to speed up build. Co-authored-by: Michael Shang <[email protected]>
Summary: In the HTTPClient, callbacks are scheduled on an eventBase. HTTPClient is kept alive using a shared_ptr, but it contains a raw pointer to a MemoryPool. This MemoryPool may be freed if the Task is aborted earlier but a callback executes much later. We see crashes related to this when the batch cluster is under heavy load. The fix here is to keep a shared_ptr to the MemoryPool instead of a raw pointer.
```
== NO RELEASE NOTE ==
```
Pull Request resolved: prestodb#19865 Reviewed By: xiaoxmeng Differential Revision: D46674355 Pulled By: pranjalssh fbshipit-source-id: 9b53deb6357ff87b8e1a992f3205d0ce9d79c05c
Join output has a restriction that output from left input should be before output from right input. Fix the randomize null join key optimizer here to keep this order.
As the title says, the left-side input should come before the right-side input in the join output.
Map $internal$json_string_to_array/map/row_cast to cast(json_parse(x) as array/map/row). Also, remove invalid mappings: row_constructor -> in, isnull -> in.
Task creation involves translating the query plan into a set of operator pipelines. ShuffleWrite operator creation used to include creating the ShuffleWriter, which may take a long time (60s or longer) and cause the create-or-update-task RPC to time out. Move ShuffleWriter creation into ShuffleWrite::addInput to avoid timing out on the create-or-update-task request.
This commit updates Drift version to 1.36 and Netty version to 4.1.92.Final to add support for TLS 1.3.
This reverts commit 5039ce5.
beinan force-pushed the presto_local_cache_ttl branch from 1ad6c42 to 6edab30 on June 21, 2023 18:45.
Add support for the TTL of Alluxio SDK cache