Intermittent Timeout Exceptions after upgrading to Aerospike (5.6->6.0) and client (4.7.2->5.3.0) #74

Open
kuskmen opened this issue Nov 28, 2022 · 3 comments


kuskmen commented Nov 28, 2022

Hello, after upgrading the server and client to the versions above, we started seeing intermittent Aerospike.Timeout errors on random write operations, ranging from full record creates to a simple update of an int bin (used as a boolean). Most, if not all, settings are at their defaults on both the server and the client.

What we noticed is that the client throws Timeout exceptions, but the server metrics in Grafana show no sign of them, and the server logs contain only info-level entries.

We've checked all the suggestions in https://support.aerospike.com/s/article/Warning-write-fail-queue-too-deep for possible answers, but to no avail.

We noticed that most of the time, when a timeout occurs, the server logs this message: https://docs.aerospike.com/reference/server-log#1663663594

With all that said, could the client have timeout issues in 5.3.0 as well, given that we see no indication from the server of requests timing out?

The client configuration is also pretty straightforward:

return services.AddSingleton<IAsyncClient>(
    new AsyncClient(
        new AsyncClientPolicy
        {
            // Delay commands that exceed the async command limit instead of rejecting them.
            asyncMaxCommandAction = MaxCommandAction.DELAY,
        },
        hosts));
}

Everything else is the default.

@BrianNichols
Member

The server log only shows timeouts that occurred on the server side (from receiving the command to sending the response). The client timeout covers the full round trip, from sending the command to receiving the response. In the great majority of cases, the client initiates the timeout. The TimeoutException message starts with either "Client timeout" or "Server timeout".
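As a rough illustration of that distinction, the sketch below classifies a timeout by the prefix of its exception message. It relies only on the two prefixes mentioned above; the `TimeoutOrigin` enum and `TimeoutClassifier` helper are hypothetical names introduced for this example and are not part of the Aerospike client.

```csharp
using System;

// Hypothetical helper, not part of the Aerospike client library.
public enum TimeoutOrigin { Client, Server, Unknown }

public static class TimeoutClassifier
{
    // Classifies a timeout message by its documented prefix.
    // Assumption: the message text is passed in unchanged from the exception.
    public static TimeoutOrigin Classify(string message)
    {
        if (message == null)
        {
            return TimeoutOrigin.Unknown;
        }
        if (message.StartsWith("Client timeout", StringComparison.Ordinal))
        {
            return TimeoutOrigin.Client;
        }
        if (message.StartsWith("Server timeout", StringComparison.Ordinal))
        {
            return TimeoutOrigin.Server;
        }
        return TimeoutOrigin.Unknown;
    }
}
```

One way to use it would be to call `TimeoutClassifier.Classify(e.Message)` inside the `catch` block that handles the client's timeout exception and log the result next to the failing operation, so client-initiated and server-initiated timeouts can be counted separately.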

I'm not aware of any premature timeout issues with the latest C# client. There is one outstanding latency issue, but it only applies when LDAP servers are included in the Aerospike server configuration.

I suggest opening an Enterprise support case for this issue.

Member

BrianNichols commented Dec 1, 2022

I have recently learned that in server 6.0 there can be performance degradation for queries that do not return much data. The reason is that server 6.0 switched to the new partition-based query protocol for clients that support it. The old query protocol may have shorter latency, but it could return duplicate records or fail to return records when the cluster is in migration. The new partition-based query protocol eliminates the duplicate/missing records, but may result in longer latency. This applies to queries only, so it might not be applicable to your case.

Author

kuskmen commented Jan 29, 2023

Just to draw further attention to this issue: a different team at our company is experiencing the same problem with the Node.js client on a completely different, brand-new server setup (Aerospike in K8s, still 6.0+). So it's beginning to feel more and more like a server issue.
