
Driver churns ports & connections if the cluster isn't answering #42

Open
levkk opened this issue Sep 8, 2021 · 3 comments

Comments


levkk commented Sep 8, 2021

If we try to create a connection pool to a cluster that is not answering, the connection pool keeps trying to connect until it exhausts all available outgoing ports. For testing, we artificially limited the driver to 1000 ports for the shard-aware connection pool.

This is what we got after a couple minutes:

1631141762.557 [WARN] (connection_pool.cpp:378:void datastax::internal::core::ConnectionPool::on_reconnect(datastax::internal::core::DelayedConnector*)): Connection pool was unable to reconnect to host 52.72.17.40 because of the following error: Connection timeout
terminate called after throwing an instance of 'std::runtime_error'
  what():  ShardPortCalculator: cannot find free outgoing port
Aborted (core dumped)

The driver needs to reuse outgoing ports or re-connect more gracefully.

@levkk levkk changed the title Core dump in connection failed logic Driver churns ports/connections if the cluster isn't answering Sep 8, 2021
@levkk levkk changed the title Driver churns ports/connections if the cluster isn't answering Driver churns ports & connections if the cluster isn't answering Sep 8, 2021
@Lorak-mmk
Collaborator

This is caused by ShardPortCalculator not having any way to mark ports as free. To fix it, it would help to have a way to reproduce this issue. Could you please give more specific steps? I tried giving the driver a small range of ports (10-100) and then blocking access to port 19042 of one of the nodes using iptables (tried with DROP and REJECT). I tried doing this both before starting the driver and after it had established a session and was doing work, but couldn't get this error :(

@Lorak-mmk
Collaborator

@levkk

@levkk
Author

levkk commented Feb 13, 2022

I think DROP will make the driver hang until the TCP keep-alive expires. Try rejecting the connection explicitly, or, better yet, just shut down Scylla on the other end and run the test again.
