TF_E_CLIENT_IPC error on Linux (TF version 4.4.4.0)

Answered

Hello,

One of our customers is receiving the error TF_E_CLIENT_IPC from TF_RequestLease calls on some machines, which prevents the process from launching our software. This forces the users to retry the process multiple times until the lease request eventually succeeds. The licensing system itself seems to work fine, since other machines can fetch a lease and load our software successfully. The error is intermittent: it occurs only on some machines, and only some of the time.

The customer runs both our software and the licensing server on a RHEL 7 installation.
The TF version (library and server) is the latest available on your website: 4.4.4.0.
We can confirm that the server address and port are configured correctly in their environment, on both the succeeding and the failing machines.

We saw a post describing the same issue we are reporting, but that one was under 4.1.9.0. We also saw an older post on this topic which ultimately resulted in a fix released with 4.1.9.0. Since we are on 4.4.4.0, we expect the cause of our issue lies elsewhere.

We checked the documentation: this error relates to the "interprocess communication facilities" that are "needed to coordinate between multiple client-program instances running in the same operating-system session". Our understanding is that some mechanism internal to the TF library is failing.

Could you help us troubleshoot this issue on the client side?
Could you tell us which conditions can lead to this problem?

We need your help to fix this problem as soon as possible because it is affecting our client's workflow: they have to monitor the processes and manually restart the failing ones.

Thank you!

Hello again,

I’m writing to follow up on this thread regarding the TF_E_CLIENT_IPC error on Linux.
In addition to what we described in the first comment:

- we can confirm that the licensing server has available licenses to lease.
- we have verified that all client machines can successfully connect to the licensing server.

We haven’t yet received any update from you, and this delay is causing downtime for our end client. Could you please let us know whether there has been any progress, or when we might expect a resolution?

Your assistance would be appreciated, as we need to minimize further impact on our client’s operations.
Thank you in advance for your help.

Are you using the static libraries or the dynamic libraries?

Hello,
We are using the dynamic libraries.

This issue is forcing the users to retry the process multiple times until the lease request eventually works.

Hmmm… that's interesting.

We haven't been able to reproduce this yet. Have you? If so, giving us a reproducible set of conditions would be most helpful.

It is interesting that the customer is getting the error intermittently. Are they starting many instances of your app at once? Or are different parts of your app (whether separate processes or within the same process) each requesting the lease?

Hello,

We’ve investigated the issue further but, unfortunately, have not been able to reproduce it on our end. Based on information from the client, a few environmental factors might be contributing to the behavior:

- They are running on a cluster of machines.
- Multiple parallel processes are active on their systems, and these may be invoking our application or other TurboFloat-based applications concurrently.

In response to your specific questions:

- To the best of our knowledge, they are not launching multiple instances of our application on the same machine at the same time. That said, due to the cluster configuration, it’s possible that two distinct processes might be allocated to the same machine concurrently.
- We’ve reviewed our codebase and can confirm that the license lease request is made from a single, centralized location. There are no redundant or duplicate calls to the leasing system.

Given the complexity of the client's setup, do you have any suggestions for further testing or scenarios we could attempt to better replicate the issue?

Thanks!

Answer

Honestly, we haven't been able to reproduce this on the latest released version (nor on the next version currently in testing). Our latest released version is very robust to bad user behavior, but obviously it's not perfect.

We'll see if we can create some extremely bad environments to try to reproduce this, but honestly we need more information. If you're successful in reproducing this please let us know so we can reproduce it on our end and then work out either a workaround to the environment or, if it's a genuine bug in our code, a fix.

Hello Wyatt,

Thanks for your reply.

Could you please share the conditions or scenarios that trigger the TF_E_CLIENT_IPC error in the TurboFloat client library? In the header it’s described as:

“An error occurred while the TurboFloat client library was trying
to use interprocess communication facilities. These are needed to
coordinate between multiple client-program instances running in
the same operating-system session.”

A brief list of possible causes could help us reproduce and diagnose the issue.
Thank you!

Honestly, it's not something that's really debuggable by the end-user. Understanding what they're doing is more useful than giving them a list of everything that could possibly go wrong.

Honestly, it sounds like a contention issue. That is, they're requesting the lease many times within the same millisecond, and some of the processes requesting the lease get TF_E_CLIENT_IPC.

But it's hard to know without more information or without a way to reproduce this.