Skip to content

Commit

Permalink
Make wrapped C++ functions pickleable (#30099)
Browse files Browse the repository at this point in the history
* Add `test_*_repr()` to test behavior with different Python versions.

* Adjust expected repr for PyPy

* Adjust another expected repr for PyPy

* Try again: undo mistaken adjustment for PyPy

* Give up on test_pytypes test_capsule_with_name_repr (not sufficiently important); PyPy still generates 2 different kinds of errors: test_print failure on macOS with Python 3.8; Python 3.9, 3.10 have no leading `<`

* `_wrapped_simple_callable` proof of concept

* Add `module_::def_as_native()`

* Resolve PyPy `TypeError: cannot create weak reference to builtin_function_or_method object`

* Replace `PyCapsule` with `function_record_PyObject`.

* function_record_PyTypeObject: Replace C++20 designated initializers with full list of values.

* Introduce `PYBIND11_DETAIL_FUNCTION_RECORD_ABI_ID` and use along with `PYBIND11_PLATFORM_ABI_ID_V4` to version `function_record_PyTypeObject` `tp_name`

* Move `std::once_flag` out of `inline` function (in hopes that that fixes flaky behavior of test_gil_scoped.py). IncludeCleaner fixes.

* `tp_vectorcall` was introduced only with Python 3.8

* clang-tidy auto-fixes

* Disable `-Wmissing-field-initializers`. Guard `PyType_Ready(&function_record_PyTypeObject)` also with a simple `static bool first_call`

* Give up on the `std::call_once` idea, for Python 3.6 compatibility (it works with all other Python versions). Instead call `function_record_PyTypeObject_PyType_Ready()` from `get_internals()`.

* Add `__reduce_ex__` to `function_record_PyTypeObject`. Add `_pybind11_detail_function_record_import_helper` (proof of concept).

* Move `function_record_PyTypeObject_PyType_Ready()` call in `get_internals()` so that it is always called when `get_internals()` is called the first time.

* gcc 4.8.5 and 7.5.0 reject `PYBIND11_WARNING_DISABLE_GCC("-Wmissing-field-initializers")`

* `function_record_PyTypeObject_PyType_Ready()`, `get_pybind11_detail_function_record_pickle_helper()` call-once initializations triggered from `cpp_function::initialize_generic()`

* gcc 4.8.5 and 7.5.0 reject `PYBIND11_WARNING_DISABLE_GCC("-Wcast-function-type")`

* Python 3.6, 3.7: Skip `get_pybind11_detail_function_record_pickle_helper()` call-once initialization triggered from `cpp_function::initialize_generic()`

* New version of `_function_record_pickle_helper`, using `collections.namedtuple`

* Explicit `str(tup_obj[1])` to fix 🐍 3 • centos:7 • x64 segfault

* Factor out detail/function_record_pyobject.h

* Use PYBIND11_NAMESPACE_BEGIN/END for function_record_PyTypeObject_methods

* Factor out function_record_PyTypeObject_methods::tp_name_impl, mainly to stop clang-format from breaking up a string literal.

* Simplify implementation of UNEXPECTED CALL functions.

* Factor out `detail::get_scope_module()`

* IncludeCleaner fixes (Google toolchain).

* Comment out unreachable code (to resolve MSVC Werrors).

* Use built-in `eval()` instead of `function_record_pickle_helper()`

Much simpler!

(Note that the `function_record_pickle_helper()` code is NOT removed in this commit.)

This approach was discovered in an attempt to solve the problem that
stubgen picks up `_function_record_pickle_helper_v1`.

For example (tensorflow_text/core/pybinds/tflite_registrar.pyi):

```diff
+from typing import Any
+
+def _function_record_pickle_helper_v1(*args, **kwargs) -> Any: ...
```

* Remove `function_record_pickle_helper()`

* Mark `internals::function_record_capsule_name` as OBSOLETE.

* Add comment pointing to #30099

* Archive experimental code from video meet with @rainwoodman 2024-02-15

* Add a pickle roundtrip test starting with `m.simple_callable.__self__` and a long comment to explain the unusual behavior.

* PyPy does not have `m.simple_callable.__self__`

* Change "UNUSUAL" comment as suggested by @rainwoodman (only very slightly differently as suggested).
  • Loading branch information
Ralf W. Grosse-Kunstleve authored Feb 20, 2024
1 parent f468b2c commit de89591
Show file tree
Hide file tree
Showing 10 changed files with 290 additions and 64 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@ set(PYBIND11_HEADERS
include/pybind11/detail/cross_extension_shared_state.h
include/pybind11/detail/descr.h
include/pybind11/detail/dynamic_raw_ptr_cast_if_possible.h
include/pybind11/detail/function_record_pyobject.h
include/pybind11/detail/init.h
include/pybind11/detail/internals.h
include/pybind11/detail/native_enum_data.h
Expand Down
1 change: 1 addition & 0 deletions include/pybind11/attr.h
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,7 @@ struct argument_record {

/// Internal data structure which holds metadata about a bound function (signature, overloads,
/// etc.)
#define PYBIND11_DETAIL_FUNCTION_RECORD_ABI_ID "v1" // PLEASE UPDATE if the struct is changed.
struct function_record {
function_record()
: is_constructor(false), is_new_style_constructor(false), is_stateless(false),
Expand Down
205 changes: 205 additions & 0 deletions include/pybind11/detail/function_record_pyobject.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
// Copyright (c) 2024 The Pybind Development Team.
// All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

// For background see the description of PR google/pywrapcc#30099.

#pragma once

#include "../attr.h"
#include "../pytypes.h"
#include "common.h"

#include <cstring>

PYBIND11_NAMESPACE_BEGIN(PYBIND11_NAMESPACE)
PYBIND11_NAMESPACE_BEGIN(detail)

struct function_record_PyObject {
PyObject_HEAD
function_record *cpp_func_rec;
};

PYBIND11_NAMESPACE_BEGIN(function_record_PyTypeObject_methods)

PyObject *tp_new_impl(PyTypeObject *type, PyObject *args, PyObject *kwds);
PyObject *tp_alloc_impl(PyTypeObject *type, Py_ssize_t nitems);
int tp_init_impl(PyObject *self, PyObject *args, PyObject *kwds);
void tp_dealloc_impl(PyObject *self);
void tp_free_impl(void *self);

static PyObject *reduce_ex_impl(PyObject *self, PyObject *, PyObject *);

PYBIND11_WARNING_PUSH
#if defined(__GNUC__) && __GNUC__ >= 8
PYBIND11_WARNING_DISABLE_GCC("-Wcast-function-type")
#endif
static PyMethodDef tp_methods_impl[]
= {{"__reduce_ex__", (PyCFunction) reduce_ex_impl, METH_VARARGS | METH_KEYWORDS, nullptr},
{nullptr, nullptr, 0, nullptr}};
PYBIND11_WARNING_POP

// Note that this name is versioned.
constexpr char tp_name_impl[]
= "pybind11_detail_function_record_" PYBIND11_DETAIL_FUNCTION_RECORD_ABI_ID
"_" PYBIND11_PLATFORM_ABI_ID_V4;

PYBIND11_NAMESPACE_END(function_record_PyTypeObject_methods)

// Designated initializers are a C++20 feature:
// https://en.cppreference.com/w/cpp/language/aggregate_initialization#Designated_initializers
// MSVC rejects them unless /std:c++20 is used (error code C7555).
PYBIND11_WARNING_PUSH
PYBIND11_WARNING_DISABLE_CLANG("-Wmissing-field-initializers")
#if defined(__GNUC__) && __GNUC__ >= 8
PYBIND11_WARNING_DISABLE_GCC("-Wmissing-field-initializers")
#endif
static PyTypeObject function_record_PyTypeObject = {
PyVarObject_HEAD_INIT(nullptr, 0)
/* const char *tp_name */ function_record_PyTypeObject_methods::tp_name_impl,
/* Py_ssize_t tp_basicsize */ sizeof(function_record_PyObject),
/* Py_ssize_t tp_itemsize */ 0,
/* destructor tp_dealloc */ function_record_PyTypeObject_methods::tp_dealloc_impl,
/* Py_ssize_t tp_vectorcall_offset */ 0,
/* getattrfunc tp_getattr */ nullptr,
/* setattrfunc tp_setattr */ nullptr,
/* PyAsyncMethods *tp_as_async */ nullptr,
/* reprfunc tp_repr */ nullptr,
/* PyNumberMethods *tp_as_number */ nullptr,
/* PySequenceMethods *tp_as_sequence */ nullptr,
/* PyMappingMethods *tp_as_mapping */ nullptr,
/* hashfunc tp_hash */ nullptr,
/* ternaryfunc tp_call */ nullptr,
/* reprfunc tp_str */ nullptr,
/* getattrofunc tp_getattro */ nullptr,
/* setattrofunc tp_setattro */ nullptr,
/* PyBufferProcs *tp_as_buffer */ nullptr,
/* unsigned long tp_flags */ Py_TPFLAGS_DEFAULT,
/* const char *tp_doc */ nullptr,
/* traverseproc tp_traverse */ nullptr,
/* inquiry tp_clear */ nullptr,
/* richcmpfunc tp_richcompare */ nullptr,
/* Py_ssize_t tp_weaklistoffset */ 0,
/* getiterfunc tp_iter */ nullptr,
/* iternextfunc tp_iternext */ nullptr,
/* struct PyMethodDef *tp_methods */ function_record_PyTypeObject_methods::tp_methods_impl,
/* struct PyMemberDef *tp_members */ nullptr,
/* struct PyGetSetDef *tp_getset */ nullptr,
/* struct _typeobject *tp_base */ nullptr,
/* PyObject *tp_dict */ nullptr,
/* descrgetfunc tp_descr_get */ nullptr,
/* descrsetfunc tp_descr_set */ nullptr,
/* Py_ssize_t tp_dictoffset */ 0,
/* initproc tp_init */ function_record_PyTypeObject_methods::tp_init_impl,
/* allocfunc tp_alloc */ function_record_PyTypeObject_methods::tp_alloc_impl,
/* newfunc tp_new */ function_record_PyTypeObject_methods::tp_new_impl,
/* freefunc tp_free */ function_record_PyTypeObject_methods::tp_free_impl,
/* inquiry tp_is_gc */ nullptr,
/* PyObject *tp_bases */ nullptr,
/* PyObject *tp_mro */ nullptr,
/* PyObject *tp_cache */ nullptr,
/* PyObject *tp_subclasses */ nullptr,
/* PyObject *tp_weaklist */ nullptr,
/* destructor tp_del */ nullptr,
/* unsigned int tp_version_tag */ 0,
/* destructor tp_finalize */ nullptr,
#if PY_VERSION_HEX >= 0x03080000
/* vectorcallfunc tp_vectorcall */ nullptr,
#endif
};
PYBIND11_WARNING_POP

static bool function_record_PyTypeObject_PyType_Ready_first_call = true;

inline void function_record_PyTypeObject_PyType_Ready() {
if (function_record_PyTypeObject_PyType_Ready_first_call) {
if (PyType_Ready(&function_record_PyTypeObject) < 0) {
throw error_already_set();
}
function_record_PyTypeObject_PyType_Ready_first_call = false;
}
}

inline bool is_function_record_PyObject(PyObject *obj) {
if (PyType_Check(obj) != 0) {
return false;
}
PyTypeObject *obj_type = Py_TYPE(obj);
// Fast path (pointer comparison).
if (obj_type == &function_record_PyTypeObject) {
return true;
}
// This works across extension modules. Note that tp_name is versioned.
if (strcmp(obj_type->tp_name, function_record_PyTypeObject.tp_name) == 0) {
return true;
}
return false;
}

inline function_record *function_record_ptr_from_PyObject(PyObject *obj) {
if (is_function_record_PyObject(obj)) {
return ((detail::function_record_PyObject *) obj)->cpp_func_rec;
}
return nullptr;
}

inline object function_record_PyObject_New() {
auto *py_func_rec = PyObject_New(function_record_PyObject, &function_record_PyTypeObject);
if (py_func_rec == nullptr) {
throw error_already_set();
}
py_func_rec->cpp_func_rec = nullptr; // For clarity/purity. Redundant in practice.
return reinterpret_steal<object>((PyObject *) py_func_rec);
}

PYBIND11_NAMESPACE_BEGIN(function_record_PyTypeObject_methods)

// Guard against accidents & oversights, in particular when porting to future Python versions.
inline PyObject *tp_new_impl(PyTypeObject *, PyObject *, PyObject *) {
pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_new_impl");
// return nullptr; // Unreachable.
}

inline PyObject *tp_alloc_impl(PyTypeObject *, Py_ssize_t) {
pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_alloc_impl");
// return nullptr; // Unreachable.
}

inline int tp_init_impl(PyObject *, PyObject *, PyObject *) {
pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_init_impl");
// return -1; // Unreachable.
}

// The implementation needs the definition of `class cpp_function`.
void tp_dealloc_impl(PyObject *self);

inline void tp_free_impl(void *) {
pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_free_impl");
}

inline PyObject *reduce_ex_impl(PyObject *self, PyObject *, PyObject *) {
// Deliberately ignoring the arguments for simplicity (expected is `protocol: int`).
const function_record *rec = function_record_ptr_from_PyObject(self);
if (rec == nullptr) {
pybind11_fail(
"FATAL: function_record_PyTypeObject reduce_ex_impl(): cannot obtain cpp_func_rec.");
}
if (rec->name != nullptr && rec->name[0] != '\0' && rec->scope
&& PyModule_Check(rec->scope.ptr()) != 0) {
object scope_module = get_scope_module(rec->scope);
if (scope_module) {
return make_tuple(reinterpret_borrow<object>(PyEval_GetBuiltins())["eval"],
make_tuple(str("__import__('importlib').import_module('")
+ scope_module + str("')")))
.release()
.ptr();
}
}
set_error(PyExc_RuntimeError, repr(self) + str(" is not pickleable."));
return nullptr;
}

PYBIND11_NAMESPACE_END(function_record_PyTypeObject_methods)

PYBIND11_NAMESPACE_END(detail)
PYBIND11_NAMESPACE_END(PYBIND11_NAMESPACE)
1 change: 1 addition & 0 deletions include/pybind11/detail/internals.h
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,7 @@ struct internals {
# if PYBIND11_INTERNALS_VERSION > 4
// Note that we have to use a std::string to allocate memory to ensure a unique address
// We want unique addresses since we use pointer equality to compare function records
// OBSOLETE: google/pywrapcc#30099
std::string function_record_capsule_name = internals_function_record_capsule_name;
# endif

Expand Down
11 changes: 2 additions & 9 deletions include/pybind11/functional.h
Original file line number Diff line number Diff line change
Expand Up @@ -96,15 +96,8 @@ struct type_caster<std::function<Return(Args...)>> {
auto *cfunc_self = PyCFunction_GET_SELF(cfunc.ptr());
if (cfunc_self == nullptr) {
PyErr_Clear();
} else if (isinstance<capsule>(cfunc_self)) {
auto c = reinterpret_borrow<capsule>(cfunc_self);

function_record *rec = nullptr;
// Check that we can safely reinterpret the capsule into a function_record
if (detail::is_function_record_capsule(c)) {
rec = c.get_pointer<function_record>();
}

} else {
function_record *rec = function_record_ptr_from_PyObject(cfunc_self);
while (rec != nullptr) {
if (rec->is_stateless
&& same_type(typeid(function_type),
Expand Down
74 changes: 32 additions & 42 deletions include/pybind11/pybind11.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#pragma once

#include "detail/class.h"
#include "detail/function_record_pyobject.h"
#include "detail/init.h"
#include "detail/native_enum_data.h"
#include "detail/smart_holder_sfinae_hooks_only.h"
Expand All @@ -20,6 +21,7 @@
#include "options.h"
#include "typing.h"

#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
Expand Down Expand Up @@ -528,20 +530,11 @@ class cpp_function : public function {
if (rec->sibling) {
if (PyCFunction_Check(rec->sibling.ptr())) {
auto *self = PyCFunction_GET_SELF(rec->sibling.ptr());
if (!isinstance<capsule>(self)) {
chain = detail::function_record_ptr_from_PyObject(self);
if (chain && !chain->scope.is(rec->scope)) {
/* Never append a method to an overload chain of a parent class;
instead, hide the parent's overloads in this case */
chain = nullptr;
} else {
auto rec_capsule = reinterpret_borrow<capsule>(self);
if (detail::is_function_record_capsule(rec_capsule)) {
chain = rec_capsule.get_pointer<detail::function_record>();
/* Never append a method to an overload chain of a parent class;
instead, hide the parent's overloads in this case */
if (!chain->scope.is(rec->scope)) {
chain = nullptr;
}
} else {
chain = nullptr;
}
}
}
// Don't trigger for things like the default __init__, which are wrapper_descriptors
Expand All @@ -561,21 +554,14 @@ class cpp_function : public function {
= reinterpret_cast<PyCFunction>(reinterpret_cast<void (*)()>(dispatcher));
rec->def->ml_flags = METH_VARARGS | METH_KEYWORDS;

capsule rec_capsule(unique_rec.release(),
detail::get_function_record_capsule_name(),
[](void *ptr) { destruct((detail::function_record *) ptr); });
detail::function_record_PyTypeObject_PyType_Ready(); // Call-once initialization.
object py_func_rec = detail::function_record_PyObject_New();
((detail::function_record_PyObject *) py_func_rec.ptr())->cpp_func_rec
= unique_rec.release();
guarded_strdup.release();

object scope_module;
if (rec->scope) {
if (hasattr(rec->scope, "__module__")) {
scope_module = rec->scope.attr("__module__");
} else if (hasattr(rec->scope, "__name__")) {
scope_module = rec->scope.attr("__name__");
}
}

m_ptr = PyCFunction_NewEx(rec->def, rec_capsule.ptr(), scope_module.ptr());
object scope_module = detail::get_scope_module(rec->scope);
m_ptr = PyCFunction_NewEx(rec->def, py_func_rec.ptr(), scope_module.ptr());
if (!m_ptr) {
pybind11_fail("cpp_function::cpp_function(): Could not allocate function object");
}
Expand Down Expand Up @@ -604,9 +590,9 @@ class cpp_function : public function {
// chain.
chain_start = rec;
rec->next = chain;
auto rec_capsule
= reinterpret_borrow<capsule>(((PyCFunctionObject *) m_ptr)->m_self);
rec_capsule.set_pointer(unique_rec.release());
auto *py_func_rec
= (detail::function_record_PyObject *) PyCFunction_GET_SELF(m_ptr);
py_func_rec->cpp_func_rec = unique_rec.release();
guarded_strdup.release();
} else {
// Or end of chain (normal behavior)
Expand Down Expand Up @@ -680,6 +666,8 @@ class cpp_function : public function {
}
}

friend void detail::function_record_PyTypeObject_methods::tp_dealloc_impl(PyObject *);

/// When a cpp_function is GCed, release any memory allocated by pybind11
static void destruct(detail::function_record *rec, bool free_strings = true) {
// If on Python 3.9, check the interpreter "MICRO" (patch) version.
Expand Down Expand Up @@ -729,13 +717,11 @@ class cpp_function : public function {
/// Main dispatch logic for calls to functions bound using pybind11
static PyObject *dispatcher(PyObject *self, PyObject *args_in, PyObject *kwargs_in) {
using namespace detail;
assert(isinstance<capsule>(self));
const function_record *overloads = function_record_ptr_from_PyObject(self);
assert(overloads != nullptr);

/* Iterator over the list of potentially admissible overloads */
const function_record *overloads = reinterpret_cast<function_record *>(
PyCapsule_GetPointer(self, get_function_record_capsule_name())),
*current_overload = overloads;
assert(overloads != nullptr);
const function_record *current_overload = overloads;

/* Need to know how many arguments + keyword arguments there are to pick the right
overload */
Expand Down Expand Up @@ -1209,6 +1195,17 @@ class cpp_function : public function {

PYBIND11_NAMESPACE_BEGIN(detail)

PYBIND11_NAMESPACE_BEGIN(function_record_PyTypeObject_methods)

// This implementation needs the definition of `class cpp_function`.
inline void tp_dealloc_impl(PyObject *self) {
auto *py_func_rec = (function_record_PyObject *) self;
cpp_function::destruct(py_func_rec->cpp_func_rec);
py_func_rec->cpp_func_rec = nullptr;
}

PYBIND11_NAMESPACE_END(function_record_PyTypeObject_methods)

/// Instance creation function for all pybind11 types. It only allocates space for the
/// C++ object, but doesn't call the constructor -- an `__init__` function must do that.
extern "C" inline PyObject *pybind11_object_new(PyTypeObject *type, PyObject *, PyObject *) {
Expand Down Expand Up @@ -2288,14 +2285,7 @@ class class_ : public detail::generic_type {
if (!func_self) {
throw error_already_set();
}
if (!isinstance<capsule>(func_self)) {
return nullptr;
}
auto cap = reinterpret_borrow<capsule>(func_self);
if (!detail::is_function_record_capsule(cap)) {
return nullptr;
}
return cap.get_pointer<detail::function_record>();
return detail::function_record_ptr_from_PyObject(func_self.ptr());
}
};

Expand Down
Loading

0 comments on commit de89591

Please sign in to comment.