Fix reconnection timeout handler not working in the token provider phase #3513

nuno-vieira · 2024-11-28T10:38:53Z

🔗 Issue Links

Fixes https://linear.app/stream/issue/IOS-78

🎯 Goal

Fixes the reconnection timeout handler not working in the token provider phase.

🛠 Implementation

The main change is that the reconnectionTimeoutHandler was moved from the ConnectionRecoveryHandler to the ChatClient so that it can handle the token provider phase. For this to work, a couple of things required some changes. There are comments in the PR explaining each change.
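To make the ownership change easier to picture, here is a minimal sketch, not the SDK code. `ManualTimer`, `SketchClient`, `Status`, and the synchronous `tokenProvider` closure are hypothetical stand-ins, and cancelling the timeout once connected is an assumption. The point it illustrates is that the timeout starts before the token is requested, so it also covers the token provider phase.

```swift
/// Hypothetical one-shot timer that is fired by hand, so the sketch is deterministic.
final class ManualTimer {
    private var action: (() -> Void)?
    func schedule(_ action: @escaping () -> Void) { self.action = action }
    func cancel() { action = nil }
    /// Simulates the timeout elapsing. Fires at most once per schedule.
    func fire() {
        let pending = action
        action = nil
        pending?()
    }
}

enum Status: Equatable { case initialized, connecting, connected, disconnected }

/// Hypothetical client that owns the timeout (assumption: simplified, synchronous).
final class SketchClient {
    private(set) var status: Status = .initialized
    private(set) var tokenRequests = 0
    private let timeout: ManualTimer
    private var timedOut = false

    init(timeout: ManualTimer) { self.timeout = timeout }

    /// The timeout is scheduled before the token is requested.
    func connect(tokenProvider: () -> String?) {
        status = .initialized
        timedOut = false
        timeout.schedule { [weak self] in self?.handleTimeout() }
        fetchTokenAndConnect(tokenProvider)
    }

    /// Used for the first attempt and for every retry after a token failure.
    func fetchTokenAndConnect(_ tokenProvider: () -> String?) {
        guard !timedOut else { return }               // no retries after the timeout
        tokenRequests += 1
        guard tokenProvider() != nil else { return }  // failure: a retry may follow
        status = .connecting
        // A real client would open the WebSocket here; the sketch jumps ahead.
        status = .connected
        timeout.cancel()
    }

    private func handleTimeout() {
        timedOut = true
        status = .disconnected
    }
}

let timer = ManualTimer()
let client = SketchClient(timeout: timer)
client.connect(tokenProvider: { nil })    // token provider fails
client.fetchTokenAndConnect({ nil })      // a retry fails as well
timer.fire()                              // timeout elapses during the token phase
client.fetchTokenAndConnect({ "token" })  // ignored, the client already gave up
```

With the timeout owned by the connection recovery layer instead, nothing would stop the retries until a WebSocket connection had been attempted.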

Flow diagram to explain why we need to set the state to .initialized again whenever we call connect in ChatClient:

Before

stateDiagram-v2
    [*] --> ChatClient.init
    ChatClient.init--> ChatClient.Connect
    ChatClient.Connect --> TokenProvider
    
    TokenProvider --> ReconnectionTimeout: Fails
    TokenProvider --> WebSocket.Connect: Success
    
     ReconnectionTimeout --> disconnected
     disconnected --> ChatClient.Connect
    
    note right of WebSocket.Connect
        State reported: .connecting
    end note
    
    note right of ChatClient.init
        State reported: .initialized
    end note
    
    note right of disconnected
        State reported: .disconnected
    end note
    
    note left of disconnected
        On the second cycle, the state is not reported
        Because it did not change from .disconnected
    end note

After

stateDiagram-v2
    [*] --> ChatClient.init
    ChatClient.init--> ChatClient.Connect
    ChatClient.Connect --> TokenProvider
    
    TokenProvider --> ReconnectionTimeout: Fails
    TokenProvider --> WebSocket.Connect: Success
    
     ReconnectionTimeout --> disconnected
     disconnected --> ChatClient.Connect
    
    note right of WebSocket.Connect
        State reported: .connecting
    end note
    
    note right of ChatClient.Connect
        State reported: .initialized
    end note
    
    note right of disconnected
        State reported: .disconnected
    end note

🧪 Manual Testing Notes

Open the Demo App Configuration
Set up the token refresh details and make it fail, for example, 20 times
Set a reconnection timeout of 15 seconds
Connect a user
Wait 15 seconds
The user should be disconnected automatically
The console should no longer report "Token refresh failed"
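Outside the Demo App, the same failure scenario could be reproduced with a token provider that fails a fixed number of times. This is only a sketch: `SketchTokenProvider`, `TokenRefreshError`, and `makeFailingTokenProvider` are hypothetical names, and the token is a plain `String` rather than the SDK's token type.

```swift
struct TokenRefreshError: Error {}

/// Hypothetical provider shape: calls back with a token string or an error.
typealias SketchTokenProvider = (@escaping (Result<String, Error>) -> Void) -> Void

/// Returns a provider that fails the first `failures` calls and succeeds afterwards.
func makeFailingTokenProvider(failures: Int, token: String) -> SketchTokenProvider {
    var remainingFailures = failures
    return { completion in
        if remainingFailures > 0 {
            remainingFailures -= 1
            completion(.failure(TokenRefreshError()))
        } else {
            completion(.success(token))
        }
    }
}

// Usage: 20 failures followed by one success.
let provider = makeFailingTokenProvider(failures: 20, token: "demo-token")
var failureCount = 0
var receivedToken: String?
for _ in 0..<21 {
    provider { result in
        switch result {
        case .success(let token): receivedToken = token
        case .failure: failureCount += 1
        }
    }
}
```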

☑️ Contributor Checklist

I have signed the Stream CLA (required)
This change should be manually QAed
Changelog is updated with client-facing changes
Changelog is updated with new localization keys
New code is covered by unit tests
Comparison screenshots added for visual changes
Affected documentation updated (docusaurus, tutorial, CMS)

nuno-vieira · 2024-11-28T10:42:24Z

Sources/StreamChat/Repositories/ConnectionRepository.swift

+    func initialize() {
+        webSocketClient?.initialize()
+    }


This needs to be introduced. Otherwise, we won't report the status of the new connection to the delegates. This is especially important when using ChatClient as a singleton. This is the problem:

1. connectUser() (connectionStatus = .initialized). Keep in mind that it only goes to .connecting after the token provider finishes.

2. timeout() (connectionStatus = .disconnected). The disconnected status is reported ✅

3. connectUser() again (connectionStatus == .disconnected). The connecting status is not reported ❌, because the status did not change.

Since we don't reset the connection status, it stays at .disconnected, so neither .connecting nor .initialized is reported when manually reconnecting.
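The effect can be reproduced with a small sketch, assuming delegates are only notified when the status value actually changes. `StatusReporter`, `ConnectionStatus`, and `connectAndTimeOut` are hypothetical names, not SDK types.

```swift
enum ConnectionStatus: Equatable {
    case initialized, connecting, connected, disconnected
}

final class StatusReporter {
    /// Every status that was actually delivered to observers.
    private(set) var reported: [ConnectionStatus] = []

    var status: ConnectionStatus = .initialized {
        didSet {
            // Observers are only notified when the value really changes.
            guard status != oldValue else { return }
            reported.append(status)
        }
    }
}

/// Simulates connectUser() with a token provider that keeps failing until the timeout.
func connectAndTimeOut(_ reporter: StatusReporter, resetFirst: Bool) {
    if resetFirst { reporter.status = .initialized }  // the reset added by initialize()
    // The token provider fails, so .connecting is never set.
    reporter.status = .disconnected                   // the timeout fires
}

let withoutReset = StatusReporter()
connectAndTimeOut(withoutReset, resetFirst: false)
connectAndTimeOut(withoutReset, resetFirst: false)
// withoutReset.reported is [.disconnected]: the second cycle reports nothing.

let withReset = StatusReporter()
connectAndTimeOut(withReset, resetFirst: true)
connectAndTimeOut(withReset, resetFirst: true)
// withReset.reported is [.disconnected, .initialized, .disconnected].
```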

when you call connectUser again, shouldn't it move to initialized / connecting already? Because there should be a status change when you start connecting again.

I've added more details to point 1. The problem here is that we only set connectionState = .connecting after the token phase finishes. So, if the token provider keeps failing, we never set the websocket to connecting and it stays at disconnected on the second try, which means we won't report any connection state change, since it will be stuck at disconnected.

This approach actually makes sense: before the connect() call is made to the WebSocket engine, the state is initialized. So, when the token provider fails and we have not connected the WebSocket engine, we need to reset the state to initialized (since the timeout action will report disconnected).

Overall overview:

stateDiagram-v2
    [*] --> initialize
    initialize --> ChatClient.Connect
    ChatClient.Connect --> TokenProvider

    TokenProvider --> ReconnectionTimeout: Fails
    TokenProvider --> WebSocket.Connect: Success

    ReconnectionTimeout --> disconnected
    disconnected --> ChatClient.Connect

    note right of WebSocket.Connect
        State reported: .connecting
    end note

    note right of initialize
        State reported: .initialized
    end note

    note right of disconnected
        State reported: .disconnected
    end note

    note left of disconnected
        On the second cycle, the state is not reported
        Because it did not change from .disconnected
    end note


Ideally it would be nice to see .connecting as the state just after calling connect(). As a SDK user, who does not know internals, that would make sense. That said, it might make sense to do this separately (because we have this distinction at the moment where .initialized is used during the token fetch phase).

Especially because documentation says (one typo there):

/// The initial state meaning that there was no atempt to connect yet.
case initialized

That is confusing for the SDK user.

My view is that it is OK in this PR to go for .initialized, because this is how the status has been reported previously, but then in a follow-up PR actually change this to .connecting because that would make more sense for SDK users (thinking about splitting because no idea what side-effects it could have and that should be tested thoroughly).

Yes, exactly, I had the same thinking as Toomas. Basically, this part:
"The problem here is that we only set connectionState = .connecting after the token phase finishes.", doesn't feel correct. We are connecting (we are trying to get a token), but that's not reflected anywhere. To reduce risks, I'm fine to merge it like this as well, but yeah, we should make this more explicit.

Yes, changing this would be very risky at the moment. That is why I decided not to do it, and I remember I had a couple of issues when trying it.

@laevandus I changed that comment since it is internal either way. I tried changing the state to .connecting but this breaks other stuff, and so for now I prefer to not risk it. At most, adding another internal state, like .fetchingToken or something like this, would be better and not cause any impact on the rest of the logic. But for now, I think this is more than enough 👍

Agree that we should try to tackle in another issue & PR.

nuno-vieira · 2024-11-28T10:42:49Z

Sources/StreamChat/Repositories/ConnectionRepository.swift

-        if connectionId == nil {
-            if source == .userInitiated {
-                log.warning("The client is already disconnected. Skipping the `disconnect` call.")
-            }
-            completion()
-            return
-        }
-


This is an unnecessary optimization that just complicates things. For example, when we time out, we don't have the connectionId, so webSocketClient?.disconnect won't be called, which means no connection status changes will be reported.
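A simplified sketch of the difference, assuming the connectionId is still nil when the timeout fires during the token phase. `SketchConnectionRepository` and `SketchWebSocketClient` are hypothetical stand-ins that only count calls.

```swift
final class SketchWebSocketClient {
    private(set) var disconnectCalls = 0
    func disconnect() { disconnectCalls += 1 }
}

final class SketchConnectionRepository {
    /// Stays nil until the WebSocket handshake completes.
    var connectionId: String?
    let webSocketClient = SketchWebSocketClient()

    /// Old behavior: return early when there is no connection id.
    func disconnectWithGuard(completion: () -> Void) {
        if connectionId == nil {
            completion()
            return
        }
        webSocketClient.disconnect()
        completion()
    }

    /// New behavior: always forward the call, so the state change can be reported.
    func disconnect(completion: () -> Void) {
        webSocketClient.disconnect()
        completion()
    }
}

// A timeout in the token phase: no connection id exists yet.
let oldRepo = SketchConnectionRepository()
oldRepo.disconnectWithGuard {}   // the WebSocket client never hears about it

let newRepo = SketchConnectionRepository()
newRepo.disconnect {}            // the WebSocket client is told to disconnect
```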

yes, this makes sense.

nuno-vieira · 2024-11-28T10:43:47Z

Sources/StreamChat/WebSocketClient/WebSocketClient.swift

+        switch connectionState {
+        case .initialized, .disconnected, .disconnecting:
+            connectionState = .disconnected(source: source)
+        case .connecting, .waitingForConnectionId, .connected:
+            connectionState = .disconnecting(source: source)


When disconnecting, if we are already disconnected, we should not set the state to disconnecting, since the engine won't do anything and so it won't report disconnected afterwards. For this reason, we need to instantly report disconnected in this scenario.

this seems tricky, previously it seems we were also processing events. Can you explain a bit more why we delete that part?
Also, the switch itself is a bit strange to me, especially the second case. If you are disconnected, and you are connecting, why should it go disconnecting?

I did not change the processing events part. The github diffing is not very clear, but the only thing changed here is the report of connectionState

Also, the switch itself is a bit strange to me, especially the second case. If you are disconnected, and you are connecting, why should it go disconnecting?

Where does this happen? We report disconnected if we were not connected previously, so there's nothing to disconnect. That is why we instantly report that we are disconnected. If we are connecting, waiting for the connection ID, or connected, then we need to call disconnect on the engine, so the engine will eventually report the disconnected state.

It is quite clear actually:

When the state is initialized, disconnected, or disconnecting, and the user disconnects, we instantly report disconnected, since the WebSocket engine won't report anything because it is already disconnected.

When the state is connected, connecting or waiting for connection, we need to call the engine.disconnect() and the websocket delegate will report the disconnected state. This is why here we report disconnecting instead of disconnected.
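The two cases can be written as a pure function. This is a sketch with simplified stand-in types (`SketchState`, `DisconnectSource`, `stateAfterDisconnectRequest`); the SDK's real state enum carries more cases and associated values.

```swift
enum DisconnectSource: Equatable { case userInitiated, systemInitiated }

enum SketchState: Equatable {
    case initialized
    case connecting
    case waitingForConnectionId
    case connected
    case disconnecting(source: DisconnectSource)
    case disconnected(source: DisconnectSource)
}

/// Returns the state to report right after disconnect is requested.
func stateAfterDisconnectRequest(from state: SketchState, source: DisconnectSource) -> SketchState {
    switch state {
    case .initialized, .disconnected, .disconnecting:
        // The engine has nothing to tear down and will not call back,
        // so disconnected is reported immediately.
        return .disconnected(source: source)
    case .connecting, .waitingForConnectionId, .connected:
        // The engine is active; it reports disconnected once it has finished.
        return .disconnecting(source: source)
    }
}
```

For example, a user-initiated disconnect from `.initialized` yields `.disconnected(source: .userInitiated)`, while the same request from `.connected` yields `.disconnecting(source: .userInitiated)`.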

yeah, now after seeing the diagram and the explanations, it makes more sense, thanks.

Stream-SDK-Bot · 2024-11-28T10:43:53Z

SDK Size

`title`	`develop`	`branch`	`diff`	`status`
StreamChat	7.06 MB	7.06 MB	+1 KB	🟢
StreamChatUI	4.96 MB	4.96 MB	0 KB	🟢

Stream-SDK-Bot · 2024-11-28T10:49:24Z

SDK Performance

`target`	`metric`	`benchmark`	`branch`	`performance`	`status`
MessageList	Hitches total duration	10 ms	6.68 ms	33.2% 🔼	🟢
	Duration	2.6 s	2.55 s	1.92% 🔼	🟢
	Hitch time ratio	4 ms per s	2.62 ms per s	34.5% 🔼	🟢
	Frame rate	75 fps	78.33 fps	4.44% 🔼	🟢
	Number of hitches	1	0.8	20.0% 🔼	🟢

Stream-SDK-Bot · 2024-11-28T11:22:57Z

SDK Size

`title`	`develop`	`branch`	`diff`	`status`
StreamChat	7.08 MB	7.08 MB	+1 KB	🟢
StreamChatUI	4.96 MB	4.96 MB	0 KB	🟢

martinmitrevski

More eyes would be needed here, some things are not entirely clear to me. @laevandus good if you also have a look.

martinmitrevski · 2024-11-28T15:48:22Z

Sources/StreamChat/Repositories/ConnectionRepository.swift

+    func initialize() {
+        webSocketClient?.initialize()
+    }


when you call connectUser again, shouldn't it move to initialized / connecting already? Because there should be a status change when you start connecting again.

martinmitrevski · 2024-11-28T15:50:11Z

Sources/StreamChat/Repositories/ConnectionRepository.swift

-        if connectionId == nil {
-            if source == .userInitiated {
-                log.warning("The client is already disconnected. Skipping the `disconnect` call.")
-            }
-            completion()
-            return
-        }
-


yes, this makes sense.

martinmitrevski · 2024-11-28T15:56:06Z

Sources/StreamChat/WebSocketClient/WebSocketClient.swift

+        switch connectionState {
+        case .initialized, .disconnected, .disconnecting:
+            connectionState = .disconnected(source: source)
+        case .connecting, .waitingForConnectionId, .connected:
+            connectionState = .disconnecting(source: source)


this seems tricky, previously it seems we were also processing events. Can you explain a bit more why we delete that part?
Also, the switch itself is a bit strange to me, especially the second case. If you are disconnected, and you are connecting, why should it go disconnecting?

laevandus · 2024-11-29T07:37:55Z

Sources/StreamChat/ChatClient.swift

        authenticationRepository.clearTokenProvider()
-        authenticationRepository.cancelTimers()
+        authenticationRepository.reset()


Should reconnectionTimeoutHandler?.stop() be called here as well or it does not matter?

Thinking about the case of calling disconnect quickly after calling connect.

It does not really matter, because this is triggered by the reconnectionTimeoutHandler, which has repeats: false, so it will already be stopped by the time it reaches here.

martinmitrevski

Ok, let's merge it with the current state. But first, @testableapple should show this PR some love.

laevandus · 2024-11-29T12:18:11Z

✅
Tested the branch and develop. In develop it did not disconnect after 15 seconds, but in this PR it correctly disconnected and the channel list updated (shimmering stopped).

github-actions · 2024-11-29T12:22:58Z

	1 Warning
⚠️	The changes should be manually QAed before the Pull Request will be merged

Generated by 🚫 Danger


sonarqubecloud · 2024-12-03T09:30:31Z

Quality Gate passed

Issues
168 New issues
0 Accepted issues

Measures
0 Security Hotspots
83.7% Coverage on New Code
0.1% Duplication on New Code

See analysis details on SonarQube Cloud


nuno-vieira added 6 commits November 27, 2024 17:49

Move reconnection timeout handler to ChatClient

5098e45

Remove unnecessary check

62c1f3d

Do not stop connection handler on timeout

d86ae18

Fix LLC Tests compilation

6a5d3bf

Fix unit test failures

dce022f

Add missing test coverage

cadfdf7

nuno-vieira added 🐞 Bug An issue or PR related to a bug 🌐 SDK: StreamChat (LLC) Tasks related to the StreamChat LLC SDK labels Nov 28, 2024

nuno-vieira requested a review from a team as a code owner November 28, 2024 10:38

nuno-vieira commented Nov 28, 2024

Update CHANGELOG.md

1b47595

Fix Xcode 14 build

f43af9b

martinmitrevski reviewed Nov 28, 2024

laevandus reviewed Nov 29, 2024

martinmitrevski approved these changes Nov 29, 2024

nuno-vieira added the 🤞 Ready For QA A PR that is Ready for QA label Nov 29, 2024

laevandus approved these changes Nov 29, 2024

Update docs for internal .initialized state

bac6e84

nuno-vieira and others added 2 commits November 29, 2024 12:54

Merge branch 'develop' into fix/reconnection-timeout-handler-not-work…

0ed332f

…ing-on-the-token-provider

Merge branch 'develop' into fix/reconnection-timeout-handler-not-work…

4081aec

…ing-on-the-token-provider

testableapple added 🟢 QAed A PR that was QAed and removed 🤞 Ready For QA A PR that is Ready for QA labels Dec 3, 2024

testableapple merged commit 7941389 into develop Dec 3, 2024
12 of 14 checks passed

testableapple deleted the fix/reconnection-timeout-handler-not-working-on-the-token-provider branch December 3, 2024 09:34

laevandus pushed a commit that referenced this pull request Dec 3, 2024

Fix reconnection timeout handler not working in the token provider ph…

80755a9

…ase (#3513)

laevandus mentioned this pull request Dec 3, 2024

4.68.0 Release #3521

Merged
