Apache NiFi processor that acts like a barrier to synchronize multiple flow files
I'm evaluating NiFi for our ETL process. I want to build the following flow: fetch a lot of data from a SQL database -> split it into chunks of 1000 records each -> count the error records in each chunk -> count the total number of error records -> if it exceeds a threshold, fail the process -> else save each chunk to the database.
The problem I can't resolve is how to wait until all chunks are validated. If, for example, I have 5 validation tasks working concurrently, I need some kind of barrier to wait until all chunks are processed, and only after that run the error count processor, because I don't want to save invalid data and then delete it if the threshold is reached.
The other question I have is whether there is any possibility to run the validation processor on multiple nodes in parallel and still be able to wait until they have all completed.
One solution would be to use the ExecuteScript processor as a "relief valve" that holds a simple count in memory, triggered off of the first receipt of a flowfile with a specific attribute value (store it in local/cluster state as a map whose key is the attribute value and whose value is the count). Once that count reaches a threshold, you can generate a new flowfile and route it to the success relationship, containing the attribute value that has finished. In that case, send the other results (the flowfiles that need to be batched) to a MergeContent processor, and set the minimum batching size to whatever you like. The follow-on processor to the valve should have its scheduling strategy set to Event Driven so that it only runs when it receives a flowfile from the valve.
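As a rough sketch of just the counting logic behind such a valve (plain Python, not the NiFi scripting API): the names `CountingBarrier`, `arrive`, `batch_id`, and `expected` are made up for illustration, and the in-memory dictionary stands in for the local/cluster state map that a real script would use.

```python
from collections import defaultdict


class CountingBarrier:
    """Signals completion once the expected number of chunks has arrived."""

    def __init__(self):
        # Maps batch id (the attribute value) -> number of chunks seen so far.
        # Assumption: a real processor would keep this in NiFi state instead.
        self.counts = defaultdict(int)

    def arrive(self, batch_id, expected):
        """Record one chunk; return True only when the batch is complete."""
        self.counts[batch_id] += 1
        if self.counts[batch_id] >= expected:
            del self.counts[batch_id]  # reset so the id can be reused
            return True
        return False


barrier = CountingBarrier()
# Five validated chunks of the hypothetical batch "load-42" arrive in turn.
released = [barrier.arrive("load-42", expected=5) for _ in range(5)]
print(released)  # [False, False, False, False, True]
```

Only the fifth arrival returns `True`, which is the point at which the valve would emit its "finished" flowfile. The sketch assumes the expected chunk count is known up front, for example carried as an attribute on each chunk.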