Post

4 followers Follow
1
Avatar

How can I make sure that because of data rebalancing there are no onRegion functions running concurrently on the same object?

Let's say I have a running onRegion function on a single entry that performs updates on an external data source.
Then data rebalancing happens and the entry moves to another node and simultaneously another onRegion function gets executed on the same entry.
I want to provide some event handling mechanism on external sources and I need to ensure I guarantee correct order of events. I do not want to run that second function until the first one has finished.

Any suggestions?

Thanks in advance,
Hovhannes

Hovhannes Antonyan

Please sign in to leave a comment.

7 comments

0
Avatar

Unless you are executing your function with hasResult returning false or asynchronously executing the function (spinning off a thread in the execute method), then the function won't be processing the same key simultaneously in two different nodes. Depending on what is returned from isHA, the function may process the key twice, though. The first time FunctionContext isPossibleDuplicate will return false; the second time it will return true. That boolean is the best indication the key may have already been processed.

Barry Oglesby 0 votes
0
Avatar

That's great to hear, this is exactly what I need. Yes functions are HA and hasResult returning true.
Ok so gemfire won't execute parallel onregion calls on the same key on two different nodes.
But then another question arises, would it execute parallel calls on the same key on the same node?

Hovhannes Antonyan 0 votes
0
Avatar

I just want to be sure I clearly understand your question. I think I was answering a different question. I was answering wrt the same thread executing the function twice because of a rebalance. If you're asking whether two separate threads can execute a function on the same key at the same time, then yes that definitely can happen. If optimizeForWrite returns false, then two functions can be operating on the same key simultaneously in two different nodes - one on the primary; one on the secondary. If optimizeForWrite returns true, then two functions can be operating on the same key simultaneously in the same node - the primary. None of this has to do with rebalancing, though. If you need to prevent this, then you can use either transactions or the Distributed Lock Service.

Barry Oglesby 0 votes
0
Avatar

No in my case optimizeForWrite is true. I will rephrase my question to make it more clear:
I made an onRegion call to key X, and let's say I suspended the function execution via breakpoint. Then I triggered a rebalance and have invoked the same function on key X. What will happen?
Will I have two function executions running on two different nodes (one on old node, one on new node)?

I want to provide bulk event handling mechanism for the entries in region, I do not want to use listeners because it is per each entry and I want to cache and trigger them by bulk, so I want to use onRegion calls. But then the question is how should I handle data rebalancing? Are the distributed locks the only way to achieve this? I do not want to use distributed lock service because from another point of view I do not want one node to depend on other.

Hovhannes Antonyan 0 votes
0
Avatar

If you transaction data rebalanced between begin and commit it will through transaction data rebalanced exception. So in this case the first transaction will fail.

Shuvro Das 0 votes
0
Avatar

Thanks for the answer.
Hmm, I think this will not help me because the functions will still be executed in parallel, even though one of them in the end will throw an exception. However since in the implementation of onRegion call I try to synchronize external data sources, making concurrent calls to that source would be an issue for me.

Hovhannes Antonyan 0 votes
0
Avatar

The AsyncEventListener might be a better way to write to a back-end data source. Thats pretty much what it is designed for. It is an asynchronous API for processing batches of events as they occur. If its configured as parallel, it'll process primary events in each member. Unfortunately, you can still have simultaneous processing of the same event in two different members when a rebalance occurs.

One thing you could do is to write your own rebalance function instead of using gfsh that does:

  • notifies all members before the rebalance
  • does the rebalance
  • notifies all members after the rebalance

The notification could be done with either a Region and CacheListener or another Function. I used a Function in my test. In either case, each member would set a volatile rebalanceInProgress boolean to true before the rebalance and false after the rebalance. The AsyncEventListener would check the value of that boolean, and if true, get the Distributed Lock on each key, all the local buckets or even a single lock. This would guarantee only one member is processing an event at a time. If the boolean is false, then no locking would be necessary.

You'd also want to set startup-recovery-delay="-1" so that new members wouldn't automatically recover redundancy or primaries. All that would be controlled by the function.

Barry Oglesby 0 votes