From b719046dd7df87d0528fcf2aed1c1c870ae0fc87 Mon Sep 17 00:00:00 2001 From: Masha Basmanova Date: Tue, 9 Apr 2024 03:00:23 -0700 Subject: [PATCH] Add support for DECIMAL types to ArgumentTypeFuzzer (#9373) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Summary: Pull Request resolved: https://github.com/facebookincubator/velox/pull/9373 ArgumentTypeFuzzer serves two purposes. 1. Given a generic function signature, generate a random valid return type. 2. Given a generic function signature and a concrete return type, generate a list of valid argument types. Consider a signature of the map_keys function: ``` map(K, V) -> array(K) ``` This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’. Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it. To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method. ``` ArgumentTypeFuzzer fuzzer(signature, rng) auto returnType = fuzzer.fuzzReturnType() ``` To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method. ``` ArgumentTypeFuzzer fuzzer(signature, returnType, rng) if (fuzzer.fuzzArgumentTypes()) { auto argumentTypes = fuzzer.argumentTypes(); } ``` Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature. This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types. Consider a signature of least function: ``` (decimal(p, s),...) -> decimal(p, s) ``` This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7). Consider slightly different function: between ``` (decimal(p, s), decimal(p, s)) -> boolean ``` This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. Let’s also look at a the signature of the ‘floor’ function: ``` (decimal(p, s)) -> decimal(rp, 0) ``` This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint: ``` rp = min(38, p - s + min(s, 1)) ``` The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables. It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables. To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range. This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method. Reviewed By: bikramSingh91 Differential Revision: D55772808 fbshipit-source-id: 708f202fc7270aaeaa59e28175aea5147b4a7981 --- velox/expression/ReverseSignatureBinder.cpp | 4 +- velox/expression/ReverseSignatureBinder.h | 17 +- .../tests/ArgumentTypeFuzzerTest.cpp | 246 ++++++++++++++++++ .../tests/utils/ArgumentTypeFuzzer.cpp | 113 +++++++- .../tests/utils/ArgumentTypeFuzzer.h | 14 +- 5 files changed, 383 insertions(+), 11 deletions(-) diff --git a/velox/expression/ReverseSignatureBinder.cpp b/velox/expression/ReverseSignatureBinder.cpp index b21715fef36d..495920e6bf1f 100644 --- a/velox/expression/ReverseSignatureBinder.cpp +++ b/velox/expression/ReverseSignatureBinder.cpp @@ -39,7 +39,9 @@ bool ReverseSignatureBinder::tryBind() { if (hasConstrainedIntegerVariable(signature_.returnType())) { return false; } - return SignatureBinderBase::tryBind(signature_.returnType(), returnType_); + tryBindSucceeded_ = + SignatureBinderBase::tryBind(signature_.returnType(), returnType_); + return tryBindSucceeded_; } } // namespace facebook::velox::exec diff --git a/velox/expression/ReverseSignatureBinder.h b/velox/expression/ReverseSignatureBinder.h index 02b92d217cf4..087adc4dbe77 100644 --- a/velox/expression/ReverseSignatureBinder.h +++ b/velox/expression/ReverseSignatureBinder.h @@ -40,15 +40,28 @@ class ReverseSignatureBinder : private SignatureBinderBase { /// tryBind() and only if tryBind() returns true. If a type variable is not /// determined by tryBind(), it maps to a nullptr. const std::unordered_map& bindings() const { + VELOX_CHECK( + tryBindSucceeded_, "tryBind() must be called first and succeed"); return typeVariablesBindings_; } + /// Return the integer bindings produced by 'tryBind'. This function should be + /// called after 'tryBind' and only if 'tryBind' returns true. + const std::unordered_map& integerBindings() const { + VELOX_CHECK( + tryBindSucceeded_, "tryBind() must be called first and succeed"); + return integerVariablesBindings_; + } + private: - /// Return whether there is a constraint on an integer variable in type - /// signature. + // Return whether there is a constraint on an integer variable in type + // signature. bool hasConstrainedIntegerVariable(const TypeSignature& type) const; const TypePtr returnType_; + + // True if 'tryBind' has been called and succeeded. False otherwise. + bool tryBindSucceeded_{false}; }; } // namespace facebook::velox::exec diff --git a/velox/expression/tests/ArgumentTypeFuzzerTest.cpp b/velox/expression/tests/ArgumentTypeFuzzerTest.cpp index e139ae0c4f7a..595a09ccaf8d 100644 --- a/velox/expression/tests/ArgumentTypeFuzzerTest.cpp +++ b/velox/expression/tests/ArgumentTypeFuzzerTest.cpp @@ -403,4 +403,250 @@ TEST_F(ArgumentTypeFuzzerTest, orderableConstraint) { } } +TEST_F(ArgumentTypeFuzzerTest, fuzzDecimalArgumentTypes) { + auto fuzzArgumentTypes = [](const exec::FunctionSignature& signature, + const TypePtr& returnType) { + std::mt19937 seed{0}; + ArgumentTypeFuzzer fuzzer{signature, returnType, seed}; + bool ok = fuzzer.fuzzArgumentTypes(kMaxVariadicArgs); + VELOX_CHECK( + ok, + "Signature: {}, Return type: {}", + signature.toString(), + returnType->toString()); + return fuzzer.argumentTypes(); + }; + + // Argument type must match return type. + auto signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .build(); + + auto argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7)); + ASSERT_EQ(1, argTypes.size()); + EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString()); + + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .argumentType("decimal(p,s)") + .variableArity() + .build(); + argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7)); + ASSERT_LE(1, argTypes.size()); + for (const auto& argType : argTypes) { + EXPECT_EQ(DECIMAL(10, 7)->toString(), argType->toString()); + } + + // Argument type can be any decimal. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("boolean") + .argumentType("decimal(p,s)") + .build(); + + argTypes = fuzzArgumentTypes(*signature, BOOLEAN()); + ASSERT_EQ(1, argTypes.size()); + EXPECT_TRUE(argTypes[0]->isDecimal()); + + // Argument type can be any decimal with scale 30. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .returnType("boolean") + .argumentType("decimal(p,30)") + .build(); + + argTypes = fuzzArgumentTypes(*signature, BOOLEAN()); + ASSERT_EQ(1, argTypes.size()); + EXPECT_TRUE(argTypes[0]->isDecimal()); + EXPECT_EQ(30, getDecimalPrecisionScale(*argTypes[0]).second); + + // Another way to specify fixed scale. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s", "3") + .returnType("boolean") + .argumentType("decimal(p,s)") + .build(); + + argTypes = fuzzArgumentTypes(*signature, BOOLEAN()); + ASSERT_EQ(1, argTypes.size()); + EXPECT_TRUE(argTypes[0]->isDecimal()); + EXPECT_EQ(3, getDecimalPrecisionScale(*argTypes[0]).second); + + // Argument type can be any decimal with precision 3. + signature = exec::FunctionSignatureBuilder() + .integerVariable("s") + .returnType("boolean") + .argumentType("decimal(3,s)") + .build(); + + argTypes = fuzzArgumentTypes(*signature, BOOLEAN()); + ASSERT_EQ(1, argTypes.size()); + EXPECT_TRUE(argTypes[0]->isDecimal()); + EXPECT_EQ(3, getDecimalPrecisionScale(*argTypes[0]).first); + + // Another way to specify fixed precision. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p", "30") + .integerVariable("s") + .returnType("boolean") + .argumentType("decimal(p,s)") + .build(); + + argTypes = fuzzArgumentTypes(*signature, BOOLEAN()); + ASSERT_EQ(1, argTypes.size()); + EXPECT_TRUE(argTypes[0]->isDecimal()); + EXPECT_EQ(30, getDecimalPrecisionScale(*argTypes[0]).first); + + // Multiple arguments. All must be the same as return type. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .argumentType("decimal(p,s)") + .argumentType("decimal(p,s)") + .build(); + + argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7)); + ASSERT_EQ(3, argTypes.size()); + EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString()); + EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[1]->toString()); + EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[2]->toString()); + + // Multiple arguments. Some have fixed precision, scale or both. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .argumentType("decimal(p,10)") + .argumentType("decimal(12,s)") + .argumentType("decimal(2,1)") + .build(); + + argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7)); + ASSERT_EQ(4, argTypes.size()); + EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString()); + EXPECT_EQ(DECIMAL(10, 10)->toString(), argTypes[1]->toString()); + EXPECT_EQ(DECIMAL(12, 7)->toString(), argTypes[2]->toString()); + EXPECT_EQ(DECIMAL(2, 1)->toString(), argTypes[3]->toString()); + + // Multiple pairs of precision and scale variables. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p1") + .integerVariable("s1") + .integerVariable("p2") + .integerVariable("s2") + .returnType("bigint") + .argumentType("decimal(p1,s1)") + .argumentType("decimal(p2,s2)") + .argumentType("decimal(p1,s1)") + .argumentType("decimal(p2,s2)") + .build(); + argTypes = fuzzArgumentTypes(*signature, BIGINT()); + ASSERT_EQ(4, argTypes.size()); + EXPECT_TRUE(argTypes[0]->isDecimal()); + EXPECT_TRUE(argTypes[1]->isDecimal()); + EXPECT_EQ(argTypes[0]->toString(), argTypes[2]->toString()); + EXPECT_EQ(argTypes[1]->toString(), argTypes[3]->toString()); +} + +TEST_F(ArgumentTypeFuzzerTest, fuzzDecimalReturnType) { + auto fuzzReturnType = [](const exec::FunctionSignature& signature) { + std::mt19937 seed{0}; + ArgumentTypeFuzzer fuzzer{signature, seed}; + return fuzzer.fuzzReturnType(); + }; + + // Return type can be any decimal. + auto signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .build(); + + auto returnType = fuzzReturnType(*signature); + EXPECT_TRUE(returnType->isDecimal()); + + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .argumentType("decimal(p,s)") + .variableArity() + .build(); + + returnType = fuzzReturnType(*signature); + EXPECT_TRUE(returnType->isDecimal()); + + // Return type can be any decimal with scale 3. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(p,3)") + .argumentType("decimal(p,s)") + .build(); + + returnType = fuzzReturnType(*signature); + EXPECT_TRUE(returnType->isDecimal()); + EXPECT_EQ(3, getDecimalPrecisionScale(*returnType).second); + + // Another way to specify that scale must be 3. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s", "3") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .build(); + + returnType = fuzzReturnType(*signature); + EXPECT_TRUE(returnType->isDecimal()); + EXPECT_EQ(3, getDecimalPrecisionScale(*returnType).second); + + // Return type can be any decimal with precision 22. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(22,s)") + .argumentType("decimal(p,s)") + .build(); + + returnType = fuzzReturnType(*signature); + EXPECT_TRUE(returnType->isDecimal()); + EXPECT_EQ(22, getDecimalPrecisionScale(*returnType).first); + + // Another way to specify that precision must be 22. + signature = exec::FunctionSignatureBuilder() + .integerVariable("p", "22") + .integerVariable("s") + .returnType("decimal(p,s)") + .argumentType("decimal(p,s)") + .build(); + + returnType = fuzzReturnType(*signature); + EXPECT_TRUE(returnType->isDecimal()); + EXPECT_EQ(22, getDecimalPrecisionScale(*returnType).first); + + // Return type can only be DECIMAL(10, 7). + signature = exec::FunctionSignatureBuilder() + .integerVariable("p") + .integerVariable("s") + .returnType("decimal(10,7)") + .argumentType("decimal(p,s)") + .build(); + + returnType = fuzzReturnType(*signature); + EXPECT_EQ(DECIMAL(10, 7)->toString(), returnType->toString()); +} + } // namespace facebook::velox::test diff --git a/velox/expression/tests/utils/ArgumentTypeFuzzer.cpp b/velox/expression/tests/utils/ArgumentTypeFuzzer.cpp index 44c8a0a69bf3..1b32b5c94674 100644 --- a/velox/expression/tests/utils/ArgumentTypeFuzzer.cpp +++ b/velox/expression/tests/utils/ArgumentTypeFuzzer.cpp @@ -27,6 +27,9 @@ namespace facebook::velox::test { std::string typeToBaseName(const TypePtr& type) { + if (type->isDecimal()) { + return "decimal"; + } return boost::algorithm::to_lower_copy(std::string{type->kindName()}); } @@ -35,6 +38,92 @@ std::optional baseNameToTypeKind(const std::string& typeName) { return tryMapNameToTypeKind(kindName); } +namespace { + +bool isDecimalBaseName(const std::string& typeName) { + auto normalized = boost::algorithm::to_lower_copy(typeName); + + return normalized == "decimal"; +} + +/// Returns true only if 'str' contains digits. +bool isPositiveInteger(const std::string& str) { + return !str.empty() && + std::find_if(str.begin(), str.end(), [](unsigned char c) { + return !std::isdigit(c); + }) == str.end(); +} + +} // namespace + +void ArgumentTypeFuzzer::determineUnboundedIntegerVariables( + const exec::TypeSignature& type) { + if (!isDecimalBaseName(type.baseName())) { + return; + } + + VELOX_CHECK_EQ(2, type.parameters().size()) + + const auto& precision = type.parameters()[0].baseName(); + const auto& scale = type.parameters()[1].baseName(); + + // Bind 'name' variable, if not already bound, using 'constant' constraint + // ('name'='123'). Return bound value if 'name' is already bound or was + // successfully bound to a constant value. Return std::nullopt otherwise. + auto tryFixedBinding = [&](const auto& name) -> std::optional { + auto it = variables().find(name); + if (it == variables().end()) { + VELOX_CHECK( + isPositiveInteger(name), + "Precision and scale of a decimal type must refer to a variable " + "or specify a position integer constant: {}", + name) + return std::stoi(name); + } + + if (integerBindings_.count(name) > 0) { + return integerBindings_[name]; + } + + if (isPositiveInteger(it->second.constraint())) { + const auto value = std::stoi(it->second.constraint()); + integerBindings_[name] = value; + return value; + } + + return std::nullopt; + }; + + std::optional p = tryFixedBinding(precision); + std::optional s = tryFixedBinding(scale); + + if (p.has_value() && s.has_value()) { + return; + } + + if (s.has_value()) { + p = std::max(ShortDecimalType::kMinPrecision, s.value()); + if (p < LongDecimalType::kMaxPrecision) { + p = p.value() + rand32(0, LongDecimalType::kMaxPrecision - p.value()); + } + + integerBindings_[precision] = p.value(); + return; + } + + if (p.has_value()) { + s = rand32(0, p.value()); + integerBindings_[scale] = s.value(); + return; + } + + p = rand32(1, LongDecimalType::kMaxPrecision); + s = rand32(0, p.value()); + + integerBindings_[precision] = p.value(); + integerBindings_[scale] = s.value(); +} + void ArgumentTypeFuzzer::determineUnboundedTypeVariables() { for (auto& [variableName, variableInfo] : variables()) { if (!variableInfo.isTypeParameter()) { @@ -74,6 +163,7 @@ bool ArgumentTypeFuzzer::fuzzArgumentTypes(uint32_t maxVariadicArgs) { return false; } bindings_ = binder.bindings(); + integerBindings_ = binder.integerBindings(); } else { for (const auto& [name, _] : signature_.variables()) { bindings_.insert({name, nullptr}); @@ -81,13 +171,16 @@ bool ArgumentTypeFuzzer::fuzzArgumentTypes(uint32_t maxVariadicArgs) { } determineUnboundedTypeVariables(); + for (const auto& argType : signature_.argumentTypes()) { + determineUnboundedIntegerVariables(argType); + } for (auto i = 0; i < formalArgsCnt; i++) { TypePtr actualArg; if (formalArgs[i].baseName() == "any") { actualArg = randType(); } else { actualArg = exec::SignatureBinder::tryResolveType( - formalArgs[i], variables(), bindings_); + formalArgs[i], variables(), bindings_, integerBindings_); VELOX_CHECK(actualArg != nullptr); } argumentTypes_.push_back(actualArg); @@ -114,15 +207,23 @@ TypePtr ArgumentTypeFuzzer::fuzzReturnType() { "Only fuzzing uninitialized return type is allowed."); determineUnboundedTypeVariables(); - if (signature_.returnType().baseName() == "any") { + determineUnboundedIntegerVariables(signature_.returnType()); + + const auto& returnType = signature_.returnType(); + + if (returnType.baseName() == "any") { returnType_ = randType(); - return returnType_; } else { returnType_ = exec::SignatureBinder::tryResolveType( - signature_.returnType(), variables(), bindings_); - VELOX_CHECK_NE(returnType_, nullptr); - return returnType_; + returnType, variables(), bindings_, integerBindings_); } + + VELOX_CHECK_NOT_NULL(returnType_); + return returnType_; +} + +int32_t ArgumentTypeFuzzer::rand32(int32_t min, int32_t max) { + return boost::random::uniform_int_distribution(min, max)(rng_); } } // namespace facebook::velox::test diff --git a/velox/expression/tests/utils/ArgumentTypeFuzzer.h b/velox/expression/tests/utils/ArgumentTypeFuzzer.h index 540a353221bf..f21ced70c521 100644 --- a/velox/expression/tests/utils/ArgumentTypeFuzzer.h +++ b/velox/expression/tests/utils/ArgumentTypeFuzzer.h @@ -65,10 +65,18 @@ class ArgumentTypeFuzzer { return signature_.variables(); } - /// Bind each type variable that is not determined by the return type to a - /// randomly generated type. + // Returns random integer between min and max inclusive. + int32_t rand32(int32_t min, int32_t max); + + // Bind each type variable that is not determined by the return type to a + // randomly generated type. void determineUnboundedTypeVariables(); + // Bind integer variables used in specified decimal(p,s) type signature to + // randomly generated values if not already bound. + // Noop if 'type' is not a decimal type signature. + void determineUnboundedIntegerVariables(const exec::TypeSignature& type); + TypePtr randType(); /// Generates an orderable random type, including structs, and arrays. @@ -83,6 +91,8 @@ class ArgumentTypeFuzzer { /// Bindings between type variables and their actual types. std::unordered_map bindings_; + std::unordered_map integerBindings_; + /// RNG to generate random types for unbounded type variables when necessary. std::mt19937& rng_; };