Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special handling of __setstate__ to support alternative pickle mechanisms. #30094

Merged
merged 7 commits into from
Jan 29, 2024

Conversation

rwgk
Copy link
Contributor

@rwgk rwgk commented Jan 26, 2024

Description

This PR works around a subtle problem introduced with pybind/pybind11@a380ed9 in May 2016. That change hard-wired that __setstate__ is treated as a constructor. Unfortunately that is incompatible with pickle mechanisms similar to what was established with Boost.Python back in 2002:

A very similar mechanism was adopted for PyCLIF, e.g.:

To support alternative pickle mechanisms with full backward compatibility, this PR introduces special handling of __setstate__[non-constructor] to preempt the condition added with pybind/pybind11@a380ed9. The logic change in pybind11.h is very straightforward, please see there for details. (The change might look bigger than it is because the code was slightly refactored for readability.)

Note for completeness: The Boost.Python pickle mechanism works safely with any pickle protocol version (see added test), while the py::pickle mechanism requires pickle protocol >= 2 to avoid undefined behavior (as documented here).

See also: Simplification in PyCLIF-pybind11 code generator enabled by this PR.

Piggy-backed: The added tests in tests/test_methods_and_attributes.py are strictly speaking unrelated (it could be moved to a separate PR). However, it was work related to the metaclass conflict error that triggered the work on this PR.

Suggested changelog entry:

@rwgk rwgk marked this pull request as ready for review January 26, 2024 16:05
rec->name = guarded_strdup("");
} else if (std::strcmp(rec->name, "@__setstate__") == 0) {
// See google/pywrapcc#30094 for background.
rec->name = guarded_strdup(rec->name + 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider typing the string __setstate__ explicitly to decouple the user provided descriptive method name "spec", and the python member name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

rec->name = guarded_strdup(rec->name ? rec->name : "");
if (rec->name == nullptr) {
rec->name = guarded_strdup("");
} else if (std::strcmp(rec->name, "@__setstate__") == 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the @ name following some precedence? If not I would suggest using a more descriptive name, e.g., introducing a namespace and some sort of binding protocol version tags, like, pybind:2:__setstate__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After turning this over in my mind I settled on __setstate__[non-constructor].

The ideas behind it are:

  • Expressive, as you suggested (I copy-pasted "non-constructor" from one of your comments).
  • It's immediately obvious that this cannot possibly be a method name that appears in the Python API. IOW it's a strong hint that it's special.
  • It looks a bit like a typing annotation (good or bad?).
  • I feel it's better not to include "pybind" (binding systems come and go, in the long run, conventions stays because of client code relying on them).

@@ -60,6 +63,47 @@ void wrap(py::module m) {

} // namespace exercise_trampoline

namespace exercise_getinitargs_getstate_setstate {

// Mechanism similar to what was established with Boost.Python:
Copy link

@rainwoodman rainwoodman Jan 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to actually describe the protocol here rather than just linking to the origin. My stab at this based on your PR description is that:

Test binding __set_state__ to a non-constructor method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take another look. I changed the comment here significantly.

For the description of the protocol I'm pointing to https://www.boost.org/doc/libs/1_31_0/libs/python/doc/v2/pickle.html now (the page is actually from 2002). The main point is that __setstate__[non-constructor] is to support something that is long established.

.def("__getinitargs__",
[](const StoreTwoWithState &self) { return py::tuple(py::cast(self.GetInitArgs())); })
.def("__getstate__", &StoreTwoWithState::GetState)
.def(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused that both reduce_ex and setstate are provided in the same python class. I thought this would completely bypass setstate during unpickling?

Copy link
Contributor Author

@rwgk rwgk Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to the comment further up:

// The code below exercises the pickle protocol intentionally exactly as used
// in PyCLIF, to ensure that pybind11 stays fully compatible with existing
// client code relying on the long-established protocol. (It is impractical to
// change all such client code.)

The __reduce_ex__ does indeed not necessarily have to call __getinitargs__ and __getstate__, but I feel reducing the protocol as used in the wild just for the test would be a loss.

BTW: __reduce__ (Boost.Python) / __reduce_ex__ (PyCLIF) is the trick to make pickling robust for any Python version and pickle protocol. In theory __getinitargs__ and __getstate__ are not needed or could be replaced by another protocol, but in 2002 I decided to emulate what I found at that time in the Python documentation (for native types).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This is to make a class that supports all different versions of the pickle protocol. My Python experience only started around 2010, so I do lack that part of background.

@rwgk
Copy link
Contributor Author

rwgk commented Jan 29, 2024

Thanks @rainwoodman!

@rwgk rwgk merged commit 80c9ee6 into google:main Jan 29, 2024
147 checks passed
@rwgk rwgk deleted the setstate_pywrapcc branch January 29, 2024 22:14
@rwgk rwgk added the could be backported Could be backported to pybind11 master label Jan 29, 2024
@rwgk rwgk changed the title Special handling of @__setstate__ to support alternative pickle mechanisms. Special handling of __setstate__ to support alternative pickle mechanisms. Jan 29, 2024
rwgk pushed a commit to google/clif that referenced this pull request Jan 30, 2024
…eplace ad hoc workaround.

See the description of google/pybind11clif#30094 for background.

Note for completeness: This CL is an improvement in its own right, but the original trigger was to remove a blocker for child cl/600951417 (Use the default Python `type` as `pybind11::metaclass` for PyCLIF-wrapped classes). The ad hoc workaround replaced in this CL does not work with the default Python `type` as metaclass.

PyCLIF-pybind11 TGPs using this change: b/287289622#comment54

TGP testing of this CL in isolation is not required: The only TAP project exercising this change is clif.pybind11.

PiperOrigin-RevId: 602505662
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
could be backported Could be backported to pybind11 master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants