Update and Overwrite
Use the Update function to replace an entire record in a data source.
In contrast, the UpdateIf and Patch functions modify one or more values in a record and leave the other values unchanged. When Update targets a collection, the entire record must match; because collections allow duplicate records, multiple records might match.
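The difference between replacing a record and merging changes into it can be modelled in Python by treating a collection as a list of dictionaries. This is a minimal sketch of the semantics under that assumption, not the Power Apps API: the names `update` and `patch` and the `all_copies` flag are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def update(collection: List[Record], old: Record, new: Record,
           all_copies: bool = False) -> int:
    """Replace records equal to `old` with a copy of `new`.

    Fields missing from `new` are dropped, because the whole record is
    replaced. Returns the number of records replaced.
    """
    replaced = 0
    for i, rec in enumerate(collection):
        if rec == old:  # the entire record must match
            collection[i] = dict(new)
            replaced += 1
            if not all_copies:  # hypothetical flag: stop after the first duplicate
                break
    return replaced


def patch(collection: List[Record], old: Record, changes: Record) -> int:
    """Merge `changes` into the first record equal to `old`; other fields are kept."""
    for i, rec in enumerate(collection):
        if rec == old:
            collection[i] = {**rec, **changes}
            return 1
    return 0


items = [{"id": 1, "name": "a", "qty": 2}, {"id": 1, "name": "a", "qty": 2}]
update(items, {"id": 1, "name": "a", "qty": 2}, {"id": 1, "name": "b"})
# items[0] is now {"id": 1, "name": "b"}; the duplicate in items[1] is untouched
```

Note how `update` loses the `qty` field entirely, while `patch` would have preserved it.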
The erstwhile implementation of DistCp has its share of quirks and drawbacks, both in its usage and in its extensibility and performance.
The purpose of the DistCp refactor was to fix these shortcomings, enabling it to be used and extended programmatically.
New paradigms have been introduced to improve runtime and setup performance, while retaining the legacy behaviour as the default.
This document aims to describe the design of the new DistCp, its spanking new features, their optimal use, and any deviance from the legacy implementation.
For HDFS, both the source and destination must be running the same version of the protocol or use a backwards-compatible protocol; see Copying Between Versions.
After a copy, it is recommended that you generate and cross-check listings of the source and destination to verify that the copy was truly successful.
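A listing cross-check could look like the following Python sketch. It assumes both trees are reachable as local or mounted paths and compares relative paths and file sizes only; on HDFS the listings would be produced differently (for example with `hdfs dfs -ls -R`), and matching sizes do not prove identical contents. The function names are hypothetical.

```python
import os
from typing import Dict, List


def listing(root: str) -> Dict[str, int]:
    """Map each file's path, relative to `root`, to its size in bytes."""
    result: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            result[rel] = os.path.getsize(full)
    return result


def compare_listings(src: Dict[str, int], dst: Dict[str, int]) -> Dict[str, List[str]]:
    """Report files missing from, extra in, or differing in size at the destination."""
    common = src.keys() & dst.keys()
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "size_mismatch": sorted(p for p in common if src[p] != dst[p]),
    }


# Usage (paths are placeholders): all three lists empty means the listings agree.
# report = compare_listings(listing("/mnt/source"), listing("/mnt/target"))
```

Separating listing from comparison keeps the comparison reusable when the listings come from another tool.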
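The -update and -overwrite options warrant special attention, because their handling of source paths differs subtly from the default: when either option is specified, the contents of each source directory are copied into the target, rather than the source directory itself. For example, copying /source/first and /source/second to /target by default creates /target/first and /target/second, whereas with -update or -overwrite the files from both directories land directly under /target.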
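The path difference can be illustrated with a small Python sketch that computes where each source file would land. It models only the directory layout (source directories recreated under the target by default, only their contents copied with -update or -overwrite), not the decisions those options make about skipping or replacing existing files. `target_paths` is a hypothetical helper, not part of DistCp.

```python
import posixpath
from typing import Dict, List


def target_paths(sources: Dict[str, List[str]], target: str,
                 update_or_overwrite: bool) -> List[str]:
    """Compute destination file paths for a DistCp-style copy.

    `sources` maps each source directory to file paths relative to it.
    By default each source directory is recreated under `target`; with
    -update or -overwrite only its contents are copied into `target`.
    """
    paths = []
    for src_dir, files in sources.items():
        if update_or_overwrite:
            base = target
        else:
            base = posixpath.join(target, posixpath.basename(src_dir.rstrip("/")))
        for rel in files:
            paths.append(posixpath.join(base, rel))
    return paths


srcs = {"/source/first": ["1", "2"], "/source/second": ["10", "20"]}
print(target_paths(srcs, "/target", False))
# ['/target/first/1', '/target/first/2', '/target/second/10', '/target/second/20']
print(target_paths(srcs, "/target", True))
# ['/target/1', '/target/2', '/target/10', '/target/20']
```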
The UpdateIf function evaluates the condition for each record and modifies every record for which the result is true.
To specify a modification, use a change record that contains new property values.
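The per-record evaluation can be sketched in Python, again modelling a table as a list of dictionaries. This is an illustrative model, not Power Fx: `update_if` is a hypothetical name, and allowing callables in the change record is an assumed stand-in for property values computed from each record.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def update_if(data: List[Record], condition: Callable[[Record], bool],
              change: Record) -> int:
    """Apply `change` to every record for which `condition` is true.

    A value in `change` may be a callable, evaluated against the original
    record. Returns the number of records modified.
    """
    modified = 0
    for i, rec in enumerate(data):
        if condition(rec):
            new_values = {k: (v(rec) if callable(v) else v) for k, v in change.items()}
            data[i] = {**rec, **new_values}  # unlisted fields keep their values
            modified += 1
    return modified


ice_cream = [{"flavor": "Chocolate", "qty": 100}, {"flavor": "Vanilla", "qty": 200}]
update_if(ice_cream, lambda r: r["qty"] > 175, {"qty": lambda r: r["qty"] + 10})
# ice_cream[1]["qty"] is now 210; Chocolate is unchanged
```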
When these functions cannot be delegated to the data source, only the first portion of the data source is retrieved, and the function is then applied to that portion alone. A blue dot appears at authoring time to remind you of this limitation and to suggest switching to delegable alternatives where possible.
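Because the consequence is easy to miss, here is a Python sketch of it. The function name and the `row_limit` parameter are hypothetical stand-ins for however many rows the app actually retrieves; the point is only that matching records beyond the retrieved portion are never touched.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def non_delegated_update_if(source: List[Record],
                            condition: Callable[[Record], bool],
                            change: Record, row_limit: int) -> int:
    """Model a non-delegable update: only the first `row_limit` rows are fetched.

    Rows past the limit are never evaluated, so matching records there are
    silently left unchanged.
    """
    modified = 0
    for i, rec in enumerate(source[:row_limit]):
        if condition(rec):
            source[i] = {**rec, **change}
            modified += 1
    return modified


rows = [{"id": n, "done": False} for n in range(5)]
non_delegated_update_if(rows, lambda r: r["id"] >= 3, {"done": True}, row_limit=4)
# row 3 is updated, but row 4 also matches and is missed
```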
DistCp version 2 (distributed copy) is a tool used for large inter- and intra-cluster copying.
Because DistCp uses both Map/Reduce and the FileSystem API, issues in either of them, or in the interaction between them, could adversely and silently affect the copy.