Panic in FlowControl::send_data (self.window_size >= sz as usize) #607

Open
nox opened this issue Mar 3, 2022 · 9 comments

@nox
Contributor

nox commented Mar 3, 2022

Application Specific Information:
assertion failed: self.window_size >= sz as usize

Thread 0 Crashed:
0   std                             0x5616f493ed64      std::sys_common::backtrace::__rust_end_short_backtrace (backtrace.rs:139)
1   <unknown>                       0x5616f49407e9      rust_begin_unwind (panicking.rs:498)
2   core                            0x5616f3628131      core::panicking::panic_fmt (panicking.rs:107)
3   core                            0x5616f362807d      core::panicking::panic (panicking.rs:48)
4   h2                              0x5616f477fa60      h2::proto::streams::flow_control::FlowControl::send_data (flow_control.rs:176)
5   h2                              0x5616f39efd4f      [inlined] h2::proto::streams::prioritize::Prioritize::pop_frame::{{closure}} (prioritize.rs:765)
6   tracing                         0x5616f39efd4f      tracing::span::Span::in_scope (span.rs:1100)
7   h2                              0x5616f36454e6      h2::proto::streams::prioritize::Prioritize::pop_frame (prioritize.rs:763)
8   h2                              0x5616f363a563      h2::proto::streams::prioritize::Prioritize::poll_complete (prioritize.rs:505)
9   h2                              0x5616f39dca72      [inlined] h2::proto::streams::send::Send::poll_complete (send.rs:301)
10  h2                              0x5616f39dca72      [inlined] h2::proto::streams::streams::Inner::poll_complete (streams.rs:852)
11  h2                              0x5616f39dca72      h2::proto::streams::streams::Streams<T>::poll_complete (streams.rs:181)
12  h2                              0x5616f39264e1      h2::proto::connection::Connection<T>::poll (connection.rs:266)
13  h2                              0x5616f383e761      [inlined] h2::server::Connection<T>::poll_closed (server.rs:504)
14  h2                              0x5616f383e761      h2::server::Connection<T>::poll_accept (server.rs:416)
15  hyper                           0x5616f3633b6d      [inlined] hyper::proto::h2::server::Serving<T>::poll_server (server.rs:268)
16  hyper                           0x5616f3633b6d      hyper::proto::h2::server::Server<T>::poll (server.rs:202)
17  hyper                           0x5616f382c015      [inlined] hyper::server::conn::ProtoServer<T>::poll (conn.rs:1135)
18  hyper                           0x5616f382c015      hyper::server::conn::upgrades::UpgradeableConnection<T>::poll (conn.rs:1337)
19  core                            0x5616f3b2ba9b      [inlined] core::pin::Pin<T>::poll (future.rs:119)
20  oxy_http_server                 0x5616f3b2ba9b      [inlined] oxy_http_server::http_impl::Http1And2::serve_requests_from_connection::{{closure}}::{{closure}} (select.rs:505)
21  tokio                           0x5616f3b2ba9b      tokio::future::poll_fn::PollFn<T>::poll (poll_fn.rs:38)
22  oxy_http_server                 0x5616f37b7eae      [inlined] oxy_http_server::http_impl::Http1And2::serve_requests_from_connection::{{closure}} (http_impl.rs:133)
23  core                            0x5616f37b7eae      core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
24  core                            0x5616f37a85c0      [inlined] core::pin::Pin<T>::poll (future.rs:119)
25  oxy_http_server                 0x5616f37a85c0      [inlined] oxy_http_server::HttpServer<T>::serve::{{closure}} (lib.rs:99)
26  core                            0x5616f37a85c0      core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
27  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::core::CoreStage<T>::poll (core.rs:151)
28  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::harness::poll_future::{{closure}} (harness.rs:461)
29  core                            0x5616f3b76b29      [inlined] core::panic::unwind_safe::AssertUnwindSafe<T>::call_once (unwind_safe.rs:271)
30  std                             0x5616f3b76b29      [inlined] std::panicking::try::do_call (panicking.rs:406)
31  std                             0x5616f3b76b29      [inlined] std::panicking::try (panicking.rs:370)
32  std                             0x5616f3b76b29      [inlined] std::panic::catch_unwind (panic.rs:133)
33  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::harness::poll_future (harness.rs:449)
34  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::harness::Harness<T>::poll_inner (harness.rs:98)
35  tokio                           0x5616f3b76b29      tokio::runtime::task::harness::Harness<T>::poll (harness.rs:53)
36  tokio                           0x5616f47e2f3f      [inlined] tokio::runtime::task::raw::RawTask::poll (raw.rs:70)
37  tokio                           0x5616f47e2f3f      [inlined] tokio::runtime::task::LocalNotified<T>::run (mod.rs:343)
38  tokio                           0x5616f47e2f3f      [inlined] tokio::runtime::thread_pool::worker::Context::run_task::{{closure}} (worker.rs:443)
39  tokio                           0x5616f47e2f3f      [inlined] tokio::coop::with_budget::{{closure}} (coop.rs:102)
40  std                             0x5616f47e2f3f      [inlined] std::thread::local::LocalKey<T>::try_with (local.rs:399)
41  std                             0x5616f47e2f3f      std::thread::local::LocalKey<T>::with (local.rs:375)
42  tokio                           0x5616f47dce7c      [inlined] tokio::coop::with_budget (coop.rs:95)
43  tokio                           0x5616f47dce7c      [inlined] tokio::coop::budget (coop.rs:72)
44  tokio                           0x5616f47dce7c      tokio::runtime::thread_pool::worker::Context::run_task (worker.rs:419)
45  tokio                           0x5616f47dbeb7      tokio::runtime::thread_pool::worker::Context::run
46  tokio                           0x5616f47c7745      [inlined] tokio::runtime::thread_pool::worker::run::{{closure}} (worker.rs:371)
47  tokio                           0x5616f47c7745      tokio::macros::scoped_tls::ScopedKey<T>::set (scoped_tls.rs:61)
48  tokio                           0x5616f47db88c      tokio::runtime::thread_pool::worker::run (worker.rs:368)
49  tokio                           0x5616f47ddf6f      [inlined] tokio::runtime::thread_pool::worker::Launch::launch::{{closure}} (worker.rs:347)
50  tokio                           0x5616f47ddf6f      [inlined] tokio::runtime::blocking::task::BlockingTask<T>::poll (task.rs:42)
51  tokio                           0x5616f47ddf6f      [inlined] tokio::runtime::task::core::CoreStage<T>::poll::{{closure}} (core.rs:161)
52  tokio                           0x5616f47ddf6f      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut (unsafe_cell.rs:14)
53  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::core::CoreStage<T>::poll (core.rs:151)
54  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::harness::poll_future::{{closure}} (harness.rs:461)
55  core                            0x5616f47d2308      [inlined] core::panic::unwind_safe::AssertUnwindSafe<T>::call_once (unwind_safe.rs:271)
56  std                             0x5616f47d2308      [inlined] std::panicking::try::do_call (panicking.rs:406)
57  std                             0x5616f47d2308      [inlined] std::panicking::try (panicking.rs:370)
58  std                             0x5616f47d2308      [inlined] std::panic::catch_unwind (panic.rs:133)
59  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::harness::poll_future (harness.rs:449)
60  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::harness::Harness<T>::poll_inner (harness.rs:98)
61  tokio                           0x5616f47d2308      tokio::runtime::task::harness::Harness<T>::poll (harness.rs:53)
62  tokio                           0x5616f47bd126      [inlined] tokio::runtime::task::raw::RawTask::poll (raw.rs:70)
63  tokio                           0x5616f47bd126      [inlined] tokio::runtime::task::UnownedTask<T>::run (mod.rs:379)
64  tokio                           0x5616f47bd126      tokio::runtime::blocking::pool::Inner::run (pool.rs:264)
65  tokio                           0x5616f47c1f87      [inlined] tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} (pool.rs:244)
66  std                             0x5616f47c1f87      std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:123)
67  std                             0x5616f47e3552      [inlined] std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} (mod.rs:484)
68  core                            0x5616f47e3552      [inlined] core::panic::unwind_safe::AssertUnwindSafe<T>::call_once (unwind_safe.rs:271)
69  std                             0x5616f47e3552      [inlined] std::panicking::try::do_call (panicking.rs:406)
70  std                             0x5616f47e3552      [inlined] std::panicking::try (panicking.rs:370)
71  std                             0x5616f47e3552      [inlined] std::panic::catch_unwind (panic.rs:133)
72  std                             0x5616f47e3552      [inlined] std::thread::Builder::spawn_unchecked::{{closure}} (mod.rs:483)
73  core                            0x5616f47e3552      core::ops::function::FnOnce::call_once{{vtable.shim}} (function.rs:227)
74  alloc                           0x5616f49464c3      [inlined] alloc::boxed::Box<T>::call_once (boxed.rs:1694)
75  alloc                           0x5616f49464c3      [inlined] alloc::boxed::Box<T>::call_once (boxed.rs:1694)
76  std                             0x5616f49464c3      std::sys::unix::thread::Thread::new::thread_start (thread.rs:106)
77  <unknown>                       0x7faf8b7dbea7      start_thread
78  <unknown>                       0x7faf8b59cdef      clone
79  <unknown>                       0x0                 <unknown>
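For readers unfamiliar with the failing check: the following is not h2's actual code, just a minimal standalone sketch of the invariant behind the assertion. Every outgoing DATA frame is debited against both the per-stream and the connection-level send windows, and the panic fires when the connection-level window no longer covers the frame.

// Not h2's code: a minimal sketch of the invariant behind the assertion.
struct SendWindows {
    connection: u32, // send window shared by every stream on the connection
    stream: u32,     // send window for one stream
}

impl SendWindows {
    fn send_data(&mut self, frame_len: u32) {
        // The connection-level half of this check is what fails in the
        // crashes above (window_size < sz).
        assert!(frame_len <= self.stream, "stream window exhausted");
        assert!(frame_len <= self.connection, "connection window exhausted");
        self.stream -= frame_len;
        self.connection -= frame_len;
    }
}

fn main() {
    let mut w = SendWindows { connection: 65_535, stream: 65_535 };
    w.send_data(8_192); // ok: both windows cover the frame
    // A later frame larger than the remaining connection window would panic,
    // which is the shape of the failure reported in this issue.
}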
@xujihui1985

Hi @nox, do you have any clue about this?

@Gowee

Gowee commented Sep 13, 2023

I encounter the same issue occasionally with v0.3.21. To catch more useful info, I've manually added more logging inside h2, but it is not easy to reproduce.

Does the panic indicate a programming error outside the h2 library, or a possible internal bug?

@mstyura
Contributor

mstyura commented Sep 6, 2024

The issue still happens on version 0.4.6:
(screenshot of the panic stack trace)

Update: it also happens in the context of a client connection:

          Crashed: Thread: SIGABRT  0x0000000000000000
#00 pc 0x8c9b4 libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)
#01 pc 0x8c984 libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)
#02 pc 0x4c47cc lib.so (panic_abort::__rust_start_panic::abort::h9b26335161c4d44c [lib.rs:49]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#03 pc 0x4c47c0 lib.so (panic_abort.__rust_start_panic [lib.rs:44]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#04 pc 0x4c0b38 lib.so (std.panicking.rust_panic [panicking.rs:847]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#05 pc 0x4c085c lib.so (std::panicking::rust_panic_with_hook::h9a9a95261333cc0c [panicking.rs:817]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#06 pc 0x4c05c0 lib.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h3e3954ede52e3e1a [panicking.rs:656]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#07 pc 0x4bf874 lib.so (std::sys_common::backtrace::__rust_end_short_backtrace::hcdd454926dbe14e7 [backtrace.rs:171]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#08 pc 0x4c0360 lib.so (rust_begin_unwind [panicking.rs:652]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#09 pc 0x4d1350 lib.so (core::panicking::panic_fmt::hd43de60a0f4a7faa [panicking.rs:72]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#10 pc 0x4d13d0 lib.so (core::panicking::panic::hfb07255d1d63f30c [panicking.rs:146]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#11 pc 0x46e634 lib.so (h2::proto::streams::flow_control::FlowControl::send_data::h0bf7134d7505f503 [flow_control.rs:181]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#12 pc 0x2622e8 lib.so (h2::proto::streams::prioritize::Prioritize::pop_frame::h2c183c7b2e675956 [prioritize.rs:797]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#13 pc 0x2291d8 lib.so (_$LT$hyper..proto..h2..client..H2ClientFuture$LT$B$C$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h8f0713c06de2f6ae [prioritize.rs:536]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#14 pc 0x2b48e4 lib.so (tokio::runtime::task::raw::poll::hb485f18e92e021d4 [core.rs:328]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#15 pc 0x4a7d08 lib.so (tokio::task::local::LocalSet::tick::h5a0aab602b50c9c1 [raw.rs:201]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#16 pc 0x4a7f00 lib.so (_$LT$tokio..task..local..LocalSet$u20$as$u20$core..future..future..Future$GT$::poll::he9c83869361e6673 [local.rs:876]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#17 pc 0x2b9c70 lib.so (tokio::runtime::runtime::Runtime::block_on_inner::h1543425d0e361b5f [future.rs:123]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#18 pc 0x26c744 lib.so (std::sys_common::backtrace::__rust_begin_short_backtrace::h26fe551535da8cb4 [runtime.rs:335]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#19 pc 0x273534 lib.so (core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h2f8529abcb194436 [mod.rs:542]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#20 pc 0x4c1eb4 lib.so (std::sys::pal::unix::thread::Thread::new::thread_start::h772a29ac11c0cc47 [boxed.rs:2022]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#21 pc 0xfaecc libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)
#22 pc 0x8e1f0 libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)

mstyura added a commit to mstyura/h2 that referenced this issue Oct 24, 2024
mstyura added a commit to mstyura/h2 that referenced this issue Oct 24, 2024
@mstyura
Contributor

mstyura commented Nov 7, 2024

I've changed basically the same assert in code that executes earlier, and added more context to the assert message to see the actual values that trigger it:

// There *must* be enough connection-level
// capacity at this point.
assert!(
    len <= self.flow.window_size(),
    "data len out of connection flow capacity: len={len}, csz={csz}, sz={sz}, eos={eos}, window={window}, available={available}, requested={requested}, buffered={buffered}",
    csz = self.flow.window_size(),
    sz = sz,
    eos = frame.is_end_stream(),
    window = stream_capacity,
    available = stream.send_flow.available(),
    requested = stream.requested_send_capacity,
    buffered = stream.buffered_send_data
);

The assert is hit several times a day; so far I've got these:

data len out of connection flow capacity: len=6720, csz=3528, sz=6720, eos=false, window=6720, available=6720, requested=8192, buffered=6720
data len out of connection flow capacity: len=8192, csz=1809, sz=8192, eos=false, window=10001, available=10001, requested=16384, buffered=10001
data len out of connection flow capacity: len=8192, csz=3392, sz=8192, eos=false, window=13872, available=13872, requested=16384, buffered=13872
data len out of connection flow capacity: len=6880, csz=5017, sz=6880, eos=false, window=6880, available=6880, requested=6880, buffered=6880
data len out of connection flow capacity: len=8192, csz=5456, sz=8192, eos=false, window=13648, available=13648, requested=16384, buffered=13648
data len out of connection flow capacity: len=8192, csz=7121, sz=8192, eos=false, window=15313, available=15313, requested=16384, buffered=15313
data len out of connection flow capacity: len=8192, csz=0, sz=8192, eos=false, window=8192, available=8192, requested=16384, buffered=8192
data len out of connection flow capacity: len=8192, csz=2638, sz=8192, eos=false, window=10830, available=10830, requested=16384, buffered=10830
data len out of connection flow capacity: len=8192, csz=6756, sz=8192, eos=false, window=14948, available=14948, requested=16384, buffered=14948
data len out of connection flow capacity: len=8192, csz=6756, sz=8192, eos=false, window=14948, available=14948, requested=16384, buffered=14948
data len out of connection flow capacity: len=3008, csz=1252, sz=3008, eos=false, window=3008, available=3008, requested=3008, buffered=3008

There are some additional assertion failures I've observed on a modified version of h2 with an assert in flow control:

assert!(
    self.window_size.0 >= sz as i32,
    "size out of window capacity sz={}; window={}; available={}",
    sz,
    self.window_size,
    self.available
);
size out of window capacity sz=8192; window=3899; available=9083
size out of window capacity sz=8192; window=1879; available=8192
size out of window capacity sz=8192; window=6078; available=8192
size out of window capacity sz=8192; window=3657; available=8192
size out of window capacity sz=8192; window=1232; available=8192
size out of window capacity sz=4914; window=0; available=4914
size out of window capacity sz=8192; window=5544; available=8192

mstyura added a commit to mstyura/h2 that referenced this issue Nov 12, 2024
@mstyura
Contributor

mstyura commented Nov 18, 2024

I've enriched the assert even more; here is some data from crashes:

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(5997), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(310458), available: Window(14189), stream_id: StreamId(73), direction: Send }, sz=8192, eos=false, window=14189, requested=16384, buffered=13801

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(0), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(8192), available: Window(8192), stream_id: StreamId(137), direction: Send }, sz=8192, eos=false, window=8192, requested=16384, buffered=8192

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(3781), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(180224), available: Window(11973), stream_id: StreamId(123), direction: Send }, sz=8192, eos=false, window=11973, requested=16384, buffered=11973

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(4662), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(450206), available: Window(12854), stream_id: StreamId(297), direction: Send }, sz=8192, eos=false, window=12854, requested=16384, buffered=12854

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(7156), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(196608), available: Window(13880), stream_id: StreamId(63), direction: Send }, sz=8192, eos=false, window=13880, requested=16384, buffered=13880

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(279), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(647168), available: Window(8471), stream_id: StreamId(727), direction: Send }, sz=8192, eos=false, window=8471, requested=16384, buffered=8471

@mstyura
Contributor

mstyura commented Nov 19, 2024

Hello @Noah-Kennedy,

Apologies for reaching out here as well, but I came across your message in Discord regarding this issue a few days ago. I’m not sure if you saw my question there, so I hope you don’t mind if I repeat it here: did you uncover any insights into the issue after your investigation?

From your message, it seems that you identified the problem as occurring during graceful shutdown. Could you please share any additional details? Perhaps with more context, it would be possible to create a stress test to replicate the issue.

So far, I’ve gathered enriched assertion messages from crashes, but they haven’t provided much useful information beyond indicating that the assertion fails on connections with moderately long lifespans (based on the stream ID value). I’m also working on collecting core dumps, though I’m unsure if they’ll yield more actionable insights.

I’ve tried several artificial stress tests, similar to hammer.rs, and attempted to multiplex 1,000 concurrent requests via a single H2 connection backed by tokio::io::DuplexStream, but haven’t managed to reproduce the issue. I also experimented with sending RST_STREAM responses to potentially disrupt window size accounting, but this didn’t lead to any breakthroughs either.

Any pointers or insights you could share would be greatly appreciated. Thank you in advance!
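For reference, below is roughly the kind of stress test described above: many concurrent requests multiplexed over one h2 connection backed by tokio::io::duplex. It is an untested sketch with made-up sizes and request counts (it assumes the h2, http, bytes, and tokio crates), not the exact harness used.

use bytes::Bytes;
use http::{Request, Response};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In-memory transport standing in for a real socket.
    let (client_io, server_io) = tokio::io::duplex(64 * 1024);

    // Server: accept every stream and answer with a small body.
    tokio::spawn(async move {
        let mut conn = h2::server::handshake(server_io).await.unwrap();
        while let Some(accepted) = conn.accept().await {
            let (_request, mut respond) = accepted.unwrap();
            tokio::spawn(async move {
                let mut tx = respond.send_response(Response::new(()), false).unwrap();
                tx.send_data(Bytes::from(vec![0u8; 1024]), true).unwrap();
            });
        }
    });

    // Client: drive the connection on its own task.
    let (send_request, connection) = h2::client::handshake(client_io).await?;
    tokio::spawn(async move {
        let _ = connection.await;
    });

    // Fire many concurrent requests through the same connection.
    let mut tasks = Vec::new();
    for _ in 0..1_000 {
        let sr = send_request.clone();
        tasks.push(tokio::spawn(async move {
            // Wait until the connection can take another stream.
            let mut sr = sr.ready().await.unwrap();
            let request = Request::builder()
                .uri("https://example.com/")
                .body(())
                .unwrap();
            let (response, _request_body) = sr.send_request(request, true).unwrap();
            let mut body = response.await.unwrap().into_body();
            while let Some(chunk) = body.data().await {
                // Release capacity so the peer's send windows are replenished.
                let n = chunk.unwrap().len();
                let _ = body.flow_control().release_capacity(n);
            }
        }));
    }
    for t in tasks {
        t.await?;
    }
    Ok(())
}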

@seanmonstar
Member

I notice there are a couple of TODOs in there around "proper error handling" of self.flow.assign_capacity() (in a couple of spots, actually). Could that be related?

@mstyura
Contributor

mstyura commented Nov 19, 2024

@seanmonstar, I’ve also noticed that there are some TODOs related to flow-control window handling. However, since I don’t have a complete picture of how this was originally intended to work, I’m unsure whether these TODOs could actually manifest as issues in real-world scenarios.

On my end, I’m now considering trying a custom version of h2 where I add expect calls in the TODO areas and convert some debug_assert! calls into assert! around flow control. This might help identify whether the issue is actually triggered earlier in the lifetime of the connection.

@seanmonstar
Member

Looking a bit closer, could it be related to a connection adjusting the SETTINGS_INITIAL_WINDOW_SIZE? It's possible the increase or decrease might not be fully accounted for with h2's "assigned capacity"...
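For anyone digging into that theory, here is an illustration of the RFC 9113 §6.9.2 rule (plain integers, not h2's code or types): a SETTINGS_INITIAL_WINDOW_SIZE change is applied as a delta to every open stream's send window, possibly driving it negative, while the connection-level window is left untouched, so stream and connection accounting can legitimately diverge after such a change.

// Illustrative only: the window adjustment rule from RFC 9113 §6.9.2.
fn adjust_stream_window(current: i64, old_initial: u32, new_initial: u32) -> i64 {
    // The delta applies to every open stream's send window and may make it
    // negative; the sender must then wait for WINDOW_UPDATEs before sending.
    current + (new_initial as i64 - old_initial as i64)
}

fn main() {
    // A stream that already consumed 40_000 bytes of a 65_535-byte window...
    let window = 65_535i64 - 40_000;
    // ...goes negative when the peer shrinks the initial window size.
    assert_eq!(adjust_stream_window(window, 65_535, 16_384), -23_616);
    // The connection-level window is NOT adjusted by this setting.
}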
