Apache NiFi processor that acts like a barrier to synchronize multiple flow files


I'm evaluating NiFi for our ETL process. I want to build the following flow: fetch a lot of data from a SQL database -> split it into chunks of 1000 records each -> count the error records in each chunk -> count the total number of error records -> if it exceeds a threshold, fail the process -> else save each chunk to the database.
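To make the intended logic concrete, here is a minimal sketch in plain Python, outside of NiFi. The names `split_into_chunks`, `count_errors`, `run_pipeline`, `is_valid`, `max_errors` and `save` are hypothetical and only illustrate the flow described above; the one value taken from the question is the chunk size of 1000.

```python
from typing import Callable, List, Sequence

CHUNK_SIZE = 1000  # chunk size from the question


def split_into_chunks(records: Sequence, size: int = CHUNK_SIZE) -> List[Sequence]:
    """Split the fetched records into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_errors(chunk: Sequence, is_valid: Callable[[object], bool]) -> int:
    """Count the records in one chunk that fail validation."""
    return sum(1 for record in chunk if not is_valid(record))


def run_pipeline(records: Sequence,
                 is_valid: Callable[[object], bool],
                 max_errors: int,
                 save: Callable[[Sequence], None]) -> bool:
    """Validate every chunk first, then save only if the total stays within the threshold."""
    chunks = split_into_chunks(records)
    # Every chunk is counted before anything is saved; this is the "barrier".
    total_errors = sum(count_errors(chunk, is_valid) for chunk in chunks)
    if total_errors > max_errors:
        return False  # threshold exceeded: fail without saving anything
    for chunk in chunks:
        save(chunk)
    return True
```

The point of the sketch is the ordering: no chunk is saved until the error count for all chunks is known, which is exactly what is hard to express once validation runs concurrently.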

The problem I can't resolve is how to wait until all chunks are validated. If, for example, I have 5 validation tasks working concurrently, I need some kind of barrier that waits until all chunks are processed and only then runs the error count processor, because I don't want to save invalid data and then delete it if the threshold is reached.

The other question I have is whether it is possible to run the validation processor on multiple nodes in parallel and still be able to wait until all of them have completed.

One solution is to use an ExecuteScript processor as a "relief valve" that holds a simple count in memory, triggered off of the first receipt of a flowfile with a specific attribute value (store this in local/cluster state as a map whose key is the attribute value and whose value is the count). Once that value reaches a threshold, you can generate a new flowfile and route it to the success relationship, containing the attribute value that has finished. In that case, send the other results (the flowfiles that need to be batched) to a MergeContent processor and set the minimum batching size to whatever you like. The follow-on processor to the valve should have its scheduling strategy set to Event Driven so that it runs only when it receives a flowfile from the valve.
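The counting logic of such a valve can be sketched as follows. This is a plain-Python simulation, not NiFi API code: the class `CountingValve`, its `arrive` method, the `batch_id` argument (standing in for the flowfile attribute value) and the `expected` count are all assumptions made for illustration, and an in-process dictionary stands in for the local/cluster state map.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


class CountingValve:
    """Counts arrivals per batch id and releases once the expected count is reached."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Maps batch id -> {"seen": arrivals so far, "errors": summed error count}.
        self._state: Dict[str, Dict[str, int]] = {}

    def arrive(self, batch_id: str, expected: int, errors: int) -> Optional[int]:
        """Record one validated chunk.

        Returns None while chunks are still outstanding, and the total error
        count for the batch exactly once, when the last chunk arrives.
        """
        with self._lock:
            entry = self._state.setdefault(batch_id, {"seen": 0, "errors": 0})
            entry["seen"] += 1
            entry["errors"] += errors
            if entry["seen"] < expected:
                return None
            del self._state[batch_id]  # batch finished, drop its state
            return entry["errors"]


def demo() -> List[int]:
    """Simulate 5 concurrent validation tasks reporting to one valve."""
    valve = CountingValve()
    chunk_errors = [3, 0, 7, 1, 4]  # made-up per-chunk error counts
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = list(pool.map(
            lambda e: valve.arrive("batch-42", len(chunk_errors), e),
            chunk_errors,
        ))
    # Only the last arrival, whichever thread that is, gets a non-None result.
    return [r for r in results if r is not None]
```

Whichever task finishes last receives the total, so the threshold check can run once per batch regardless of the order in which chunks complete. In a cluster the dictionary would have to live in shared state rather than in one process's memory, which is what the "local/cluster state" remark above refers to.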

