Add support for DECIMAL types to ArgumentTypeFuzzer (facebookincubato…

…r#9373) Summary: Pull Request resolved: facebookincubator#9373 ArgumentTypeFuzzer serves two purposes. 1. Given a generic function signature, generate a random valid return type. 2. Given a generic function signature and a concrete return type, generate a list of valid argument types. Consider a signature of the map_keys function: ``` map(K, V) -> array(K) ``` This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’. Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it. To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method. ``` ArgumentTypeFuzzer fuzzer(signature, rng) auto returnType = fuzzer.fuzzReturnType() ``` To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method. ``` ArgumentTypeFuzzer fuzzer(signature, returnType, rng) if (fuzzer.fuzzArgumentTypes()) { auto argumentTypes = fuzzer.argumentTypes(); } ``` Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature. This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types. Consider a signature of least function: ``` (decimal(p, s),...) -> decimal(p, s) ``` This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7). Consider slightly different function: between ``` (decimal(p, s), decimal(p, s)) -> boolean ``` This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. Let’s also look at a the signature of the ‘floor’ function: ``` (decimal(p, s)) -> decimal(rp, 0) ``` This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint: ``` rp = min(38, p - s + min(s, 1)) ``` The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables. It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables. To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range. This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method. Reviewed By: bikramSingh91 Differential Revision: D55772808 fbshipit-source-id: 708f202fc7270aaeaa59e28175aea5147b4a7981
acvictor · Apr 9, 2024 · b719046 · b719046
1 parent 650eb6a
commit b719046
Show file tree

Hide file tree

Showing 5 changed files with 383 additions and 11 deletions.
diff --git a/velox/expression/ReverseSignatureBinder.cpp b/velox/expression/ReverseSignatureBinder.cpp
@@ -39,7 +39,9 @@ bool ReverseSignatureBinder::tryBind() {
   if (hasConstrainedIntegerVariable(signature_.returnType())) {
     return false;
   }
-  return SignatureBinderBase::tryBind(signature_.returnType(), returnType_);
+  tryBindSucceeded_ =
+      SignatureBinderBase::tryBind(signature_.returnType(), returnType_);
+  return tryBindSucceeded_;
 }
 
 } // namespace facebook::velox::exec
diff --git a/velox/expression/ReverseSignatureBinder.h b/velox/expression/ReverseSignatureBinder.h
@@ -40,15 +40,28 @@ class ReverseSignatureBinder : private SignatureBinderBase {
   /// tryBind() and only if tryBind() returns true. If a type variable is not
   /// determined by tryBind(), it maps to a nullptr.
   const std::unordered_map<std::string, TypePtr>& bindings() const {
+    VELOX_CHECK(
+        tryBindSucceeded_, "tryBind() must be called first and succeed");
     return typeVariablesBindings_;
   }
 
+  /// Return the integer bindings produced by 'tryBind'. This function should be
+  /// called after 'tryBind' and only if 'tryBind' returns true.
+  const std::unordered_map<std::string, int>& integerBindings() const {
+    VELOX_CHECK(
+        tryBindSucceeded_, "tryBind() must be called first and succeed");
+    return integerVariablesBindings_;
+  }
+
  private:
-  /// Return whether there is a constraint on an integer variable in type
-  /// signature.
+  // Return whether there is a constraint on an integer variable in type
+  // signature.
   bool hasConstrainedIntegerVariable(const TypeSignature& type) const;
 
   const TypePtr returnType_;
+
+  // True if 'tryBind' has been called and succeeded. False otherwise.
+  bool tryBindSucceeded_{false};
 };
 
 } // namespace facebook::velox::exec
diff --git a/velox/expression/tests/ArgumentTypeFuzzerTest.cpp b/velox/expression/tests/ArgumentTypeFuzzerTest.cpp
@@ -403,4 +403,250 @@ TEST_F(ArgumentTypeFuzzerTest, orderableConstraint) {
   }
 }
 
+TEST_F(ArgumentTypeFuzzerTest, fuzzDecimalArgumentTypes) {
+  auto fuzzArgumentTypes = [](const exec::FunctionSignature& signature,
+                              const TypePtr& returnType) {
+    std::mt19937 seed{0};
+    ArgumentTypeFuzzer fuzzer{signature, returnType, seed};
+    bool ok = fuzzer.fuzzArgumentTypes(kMaxVariadicArgs);
+    VELOX_CHECK(
+        ok,
+        "Signature: {}, Return type: {}",
+        signature.toString(),
+        returnType->toString());
+    return fuzzer.argumentTypes();
+  };
+
+  // Argument type must match return type.
+  auto signature = exec::FunctionSignatureBuilder()
+                       .integerVariable("p")
+                       .integerVariable("s")
+                       .returnType("decimal(p,s)")
+                       .argumentType("decimal(p,s)")
+                       .build();
+
+  auto argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
+  ASSERT_EQ(1, argTypes.size());
+  EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString());
+
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .variableArity()
+                  .build();
+  argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
+  ASSERT_LE(1, argTypes.size());
+  for (const auto& argType : argTypes) {
+    EXPECT_EQ(DECIMAL(10, 7)->toString(), argType->toString());
+  }
+
+  // Argument type can be any decimal.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("boolean")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
+  ASSERT_EQ(1, argTypes.size());
+  EXPECT_TRUE(argTypes[0]->isDecimal());
+
+  // Argument type can be any decimal with scale 30.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .returnType("boolean")
+                  .argumentType("decimal(p,30)")
+                  .build();
+
+  argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
+  ASSERT_EQ(1, argTypes.size());
+  EXPECT_TRUE(argTypes[0]->isDecimal());
+  EXPECT_EQ(30, getDecimalPrecisionScale(*argTypes[0]).second);
+
+  // Another way to specify fixed scale.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s", "3")
+                  .returnType("boolean")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
+  ASSERT_EQ(1, argTypes.size());
+  EXPECT_TRUE(argTypes[0]->isDecimal());
+  EXPECT_EQ(3, getDecimalPrecisionScale(*argTypes[0]).second);
+
+  // Argument type can be any decimal with precision 3.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("s")
+                  .returnType("boolean")
+                  .argumentType("decimal(3,s)")
+                  .build();
+
+  argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
+  ASSERT_EQ(1, argTypes.size());
+  EXPECT_TRUE(argTypes[0]->isDecimal());
+  EXPECT_EQ(3, getDecimalPrecisionScale(*argTypes[0]).first);
+
+  // Another way to specify fixed precision.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p", "30")
+                  .integerVariable("s")
+                  .returnType("boolean")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
+  ASSERT_EQ(1, argTypes.size());
+  EXPECT_TRUE(argTypes[0]->isDecimal());
+  EXPECT_EQ(30, getDecimalPrecisionScale(*argTypes[0]).first);
+
+  // Multiple arguments. All must be the same as return type.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
+  ASSERT_EQ(3, argTypes.size());
+  EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString());
+  EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[1]->toString());
+  EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[2]->toString());
+
+  // Multiple arguments. Some have fixed precision, scale or both.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .argumentType("decimal(p,10)")
+                  .argumentType("decimal(12,s)")
+                  .argumentType("decimal(2,1)")
+                  .build();
+
+  argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
+  ASSERT_EQ(4, argTypes.size());
+  EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString());
+  EXPECT_EQ(DECIMAL(10, 10)->toString(), argTypes[1]->toString());
+  EXPECT_EQ(DECIMAL(12, 7)->toString(), argTypes[2]->toString());
+  EXPECT_EQ(DECIMAL(2, 1)->toString(), argTypes[3]->toString());
+
+  // Multiple pairs of precision and scale variables.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p1")
+                  .integerVariable("s1")
+                  .integerVariable("p2")
+                  .integerVariable("s2")
+                  .returnType("bigint")
+                  .argumentType("decimal(p1,s1)")
+                  .argumentType("decimal(p2,s2)")
+                  .argumentType("decimal(p1,s1)")
+                  .argumentType("decimal(p2,s2)")
+                  .build();
+  argTypes = fuzzArgumentTypes(*signature, BIGINT());
+  ASSERT_EQ(4, argTypes.size());
+  EXPECT_TRUE(argTypes[0]->isDecimal());
+  EXPECT_TRUE(argTypes[1]->isDecimal());
+  EXPECT_EQ(argTypes[0]->toString(), argTypes[2]->toString());
+  EXPECT_EQ(argTypes[1]->toString(), argTypes[3]->toString());
+}
+
+TEST_F(ArgumentTypeFuzzerTest, fuzzDecimalReturnType) {
+  auto fuzzReturnType = [](const exec::FunctionSignature& signature) {
+    std::mt19937 seed{0};
+    ArgumentTypeFuzzer fuzzer{signature, seed};
+    return fuzzer.fuzzReturnType();
+  };
+
+  // Return type can be any decimal.
+  auto signature = exec::FunctionSignatureBuilder()
+                       .integerVariable("p")
+                       .integerVariable("s")
+                       .returnType("decimal(p,s)")
+                       .argumentType("decimal(p,s)")
+                       .build();
+
+  auto returnType = fuzzReturnType(*signature);
+  EXPECT_TRUE(returnType->isDecimal());
+
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .variableArity()
+                  .build();
+
+  returnType = fuzzReturnType(*signature);
+  EXPECT_TRUE(returnType->isDecimal());
+
+  // Return type can be any decimal with scale 3.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("decimal(p,3)")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  returnType = fuzzReturnType(*signature);
+  EXPECT_TRUE(returnType->isDecimal());
+  EXPECT_EQ(3, getDecimalPrecisionScale(*returnType).second);
+
+  // Another way to specify that scale must be 3.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s", "3")
+                  .returnType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  returnType = fuzzReturnType(*signature);
+  EXPECT_TRUE(returnType->isDecimal());
+  EXPECT_EQ(3, getDecimalPrecisionScale(*returnType).second);
+
+  // Return type can be any decimal with precision 22.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("decimal(22,s)")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  returnType = fuzzReturnType(*signature);
+  EXPECT_TRUE(returnType->isDecimal());
+  EXPECT_EQ(22, getDecimalPrecisionScale(*returnType).first);
+
+  // Another way to specify that precision must be 22.
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p", "22")
+                  .integerVariable("s")
+                  .returnType("decimal(p,s)")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  returnType = fuzzReturnType(*signature);
+  EXPECT_TRUE(returnType->isDecimal());
+  EXPECT_EQ(22, getDecimalPrecisionScale(*returnType).first);
+
+  // Return type can only be DECIMAL(10, 7).
+  signature = exec::FunctionSignatureBuilder()
+                  .integerVariable("p")
+                  .integerVariable("s")
+                  .returnType("decimal(10,7)")
+                  .argumentType("decimal(p,s)")
+                  .build();
+
+  returnType = fuzzReturnType(*signature);
+  EXPECT_EQ(DECIMAL(10, 7)->toString(), returnType->toString());
+}
+
 } // namespace facebook::velox::test