Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consumer NACK with the specified sequence number packet but the retransmission fails #906

Open
mengbieting opened this issue Sep 9, 2022 · 14 comments
Milestone

Comments

@mengbieting
Copy link

Hello, I want to inquire about a problem that NACK is sent but no retransmission succeed.

The scenario is as follows: SDK pushes the stream to mediasoup server1, and mediasoup server2 receives this stream from mediasoup server1 and then restores the packet to a frame (mediasoup server2 as consumer and receive data through pipetransport).

When I set the packet loss rate to 10% (the rtt is small) in test enviroment, I found that mediasoup server2 would sometimes request a packet with a certain sequence number until it finally reached the maximum number of times, and mediasoup server1 received the NACK of this sequence number packet message, but the packet buffer of this sequence number was not found in the send buffer of the consumer.

I continued to debug it and found that the problem is:

Suppose mediasoup server1 received RTP packets with sequence numbers 1, 2, 3, 4, and then received RTX packets with original sequence numbers 5 (the RTP packets with sequence numbers greater than 5 have not been received now), and later received RTP packet with sequence number 7. Because the sequence is out of order, nackGenerator will generate the NACK message of the seq=6 packet (the packet with the sequence number 5 will not be required to retransmit because it has been marked in the recoverList), but when the RTX packet with the original sequence number 5 is received, this sequence number is not in the In the nackList, so the packet with sequence number 5 was discarded and was not forwarded to the consumer (mediasoup server2). The seq=6 packet was finally retransmitted to the mediasoup server2 successfully, but the seq=5 packet was discarded, even if the mediasoup server2 kept requesting retransmission, it could not be retransmitted.

image

I compared the processing logic of NackModule::OnReceivedPacket in nack_module.cc in webrtc, and found that in this case, webrtc just inserts the packet into the recoverList, does not require retransmission, but it does not discard the packet, and It is to continue to stuff data into the packetBuffer in RtpVideoStreamReceiver.

image

I would like to ask, what is the purpose of not forwarding this packet to consumers here? Is it better to return true here, thanks

@jmillan
Copy link
Member

jmillan commented Sep 9, 2022

Thanks for the detailed observations.

Next week some of us won't be fully online, but we'll check at this as soon as possible.

@mengbieting
Copy link
Author

thank you~

@penguinol
Copy link
Contributor

@jmillan @mengbieting
We also noticed this issue. libwebrtc uses rtx for bandwidth estimate, so rtx packets may arrived earlier than rtp
packets.
Espically when there is packets loss, we will insert the rtx packet into recovered list and drop it, and will never sending for this packet.

And there is another situaltion:
Currently we will not send nack immediatley, but wait a while, we may receive rtx packet before sending nack.

if (SeqManager<uint16_t>::IsSeqLowerThan(seq, this->lastSeq))
{
auto it = this->nackList.find(seq);
// It was a nacked packet.
if (it != this->nackList.end())
{
MS_DEBUG_DEV(
"NACKed packet received [ssrc:%" PRIu32 ", seq:%" PRIu16 ", recovered:%s]",
packet->GetSsrc(),
packet->GetSequenceNumber(),
isRecovered ? "true" : "false");
auto retries = it->second.retries;
this->nackList.erase(it);
if (retries != 0)
return true;
else
return false;
}

So maybe we should always return true here.

@mengbieting
Copy link
Author

thanks for reply~
Do you mean when mediasoup receive rtx packet before sending nack, maybe we should return true in NackGenerator::ReceivePacket if isRecovered deal logic? because if the rtx packet is received before the nack is sent, the code will enter the processing logic of NackGenerator::ReceivePacket if isRecovered

image

@penguinol
Copy link
Contributor

penguinol commented Sep 13, 2022

There are two situaltions:

  1. RTX seq num >= max RTP seq num. Seq num is not in nack list. will return false here:

    // Do not let a packet pass if it's newer than last seen seq and came via
    // RTX.
    return false;

  2. RTX seq num < max RTP seq num. Seq num is in nack list, but has not been sent yet, will return false here

    if (retries != 0)
    return true;
    else
    return false;

Return true is a temporary way to fix the problem, but i'm not sure whether there is any side-effect.
In generally, we don't want to send duplicate packets to receivers.
Need more time to think about it.

@penguinol
Copy link
Contributor

The first situaltion is not recoverable, because we have inserted the seq num to recoveredList, and will never send a nack for this packet again.
The second situaltion is recoverable, because we will send nack later.

@mengbieting
Copy link
Author

thanks, yes, the second situaltion is recoverable, even if the nack has not been sent yet, we will send the nack later. After receiving the rtx packet again, it will be recovered and forwarded to the consumer, consumer will not need to nack this packet all the time.
But the first case is unrecoverable. In this case, if we do not recovered and forwarding the packet to the consumer, the consumer will keep nacking the packet until the maximum number of times, so it may be a better decision to restore the packet to the consumer in this case

@HuHeng
Copy link

HuHeng commented Sep 14, 2022

why mediasoup received rtx who's origin seq number greater than the max RTP sequence number?

@penguinol
Copy link
Contributor

Because libwebrtc use rtx for bandwidth estimate, it may send rtx packet without receiving NACK.
Such as:
Libwebrtc sent rtp packets with seq num 1, 2, 3 and then sent rtx packet with rtp seq num 3 for bandwidth estimate.
Rtp packet with seq num 3 may lost beacuse of networker problem, but rtx packet with rtp seq num 3 arrived.
So finally we got rtp packets with seq num 1, 2 and rtx packet with seq num 3.

@penguinol
Copy link
Contributor

penguinol commented Sep 28, 2022

thanks for reply~ Do you mean when mediasoup receive rtx packet before sending nack, maybe we should return true in NackGenerator::ReceivePacket if isRecovered deal logic? because if the rtx packet is received before the nack is sent, the code will enter the processing logic of NackGenerator::ReceivePacket if isRecovered

image

I did some tests, if return True here, we may send duplicate rtp packets to the consumer.

penguinol added a commit to penguinol/mediasoup that referenced this issue Sep 28, 2022
libwebrtc may send duplicate rtx packets to estimate bandwidth. When there is packet loss, we may receive rtx packet with seq higher than rtp packet.
@mengbieting
Copy link
Author

Hi, the scenario you describe is when the rtx packet arrives before the rtp packet with the corresponding seq number, and the rtp packet is not really lost, in this case, the modification of return true will cause mediasoup to send duplicate rtp packet the consumer, right? I applied this modification to my project before, and found that the caton rate on the consumer side has decrease compared with the same conditions, but I didn't notice the problem that duplicate seq number packets may be sent by mediasoup, thank you point out this problem, if I have a better way to modify it, I will submit it to this issue again, thanks~

@penguinol
Copy link
Contributor

There are two situations:

  1. rtp packet arrived after rtx packet because of out of order.
  2. libwebrtc may send serveral rtx packets with same rtp seq num to fill the bandwidth.

#912 considered the second situation.

@ibc
Copy link
Member

ibc commented Oct 17, 2022

We don't forget about this, it's just that we are focused on some incoming huge changes and prefer to handle this PR after them.

@Romantic-LiXuefeng
Copy link

Because libwebrtc use rtx for bandwidth estimate, it may send rtx packet without receiving NACK. Such as: Libwebrtc sent rtp packets with seq num 1, 2, 3 and then sent rtx packet with rtp seq num 3 for bandwidth estimate. Rtp packet with seq num 3 may lost beacuse of networker problem, but rtx packet with rtp seq num 3 arrived. So finally we got rtp packets with seq num 1, 2 and rtx packet with seq num 3.

Such as: Libwebrtc sent rtp packets with seq num 1, 2, 3 and then sent rtx packet with rtp seq num 3 for bandwidth estimate.

Hello @penguinol , Does I miss something? In probe state, Pacer deliver the the media rtp packets as probe packet directly. only padding packets in rtx. What's the situaltion sent the rtx packet with seq_num 3?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
6 participants