3 Essential Questions to Address When Building an Incremental Data Load Script with API | by Daniel Khoa Le | June 2024

Now let's say you've already fetched a bunch of records by making API requests with the parameters mentioned above, it's time for you to decide how you want to write them to the destination table.

Answer: Merge/Dedup mode (recommended)

This question refers to the choice of Write disposition either Sync modeThe immediate answer is that since you are looking to load your data incrementally, you will probably choose to write the extracted data in append mode or merge mode (also known as deduplication mode).

However, let's step back a bit to examine our options more closely and determine which method is best suited for incremental loading.

Below are the most popular writing layouts.

overwrite/replace: delete all existing records in the target tables and then insert the extracted records.
attach: simply add extracted records to the target tables.
go after: Insert new

records and updates(**) existing records. merge_key primary_key How do we know which records are new?: Typically, we'll use a primary key to determine this. If you're using DLT, your merge strategy can be more sophisticated than that, including distinguishing between and dedup_sort (one is used for merging and one is used for deduplication before merging)either

(which records will be deleted with the same key in the deduplication process) I'll leave that part for another tutorial.(**) This is a simple explanation, if you want to know more about how dlt handles this merging strategy, read more

here

. Here is an example to help us understand the results of different writing arrangements.

On 06/19/2024: We performed the first synchronization. source application

Image by the author destination database

Data uploaded to our No matter which synchronization strategy you choose, the table at the destination is

from the source table. updated_atImage by the author updated_at Saved state of

= 2024–06–03, which is the last between the 2 records we synchronized.

source application

On 2024.06.2: We performed the second synchronization.

Data in

Image by the author
Changes in the source table:
Record id=1 (sales figure) was updated.

Record id=2 was deleted. updated_atRecord id=3 was inserted.

In this sync, we ONLY pull records with the

> 2024–06–03 (state saved since last sync). Therefore, we will only extract records id=1 and id=3. Since record id=2 was removed from the source data, there is no way for us to recognize this change. With the second sync, you will now see the difference between the writing strategies. destination database

Data uploaded to our

Scenario 1: Overwrite

Image by the author The destination table will be overwritten with the 2 records extracted this time.

Scenario 2: Append

Image by the author The 2 extracted records will be added to the target table, existing records will not be affected.

Scenario 3: Merge or deduplication

Image by the author The 2 extracted records with id=1 and 3 will replace the existing records in the destination. This process is called merging or deduplication. The record id=2 in the destination table remains untouched.

Conclusions:The deduplication strategy can be effective in the incremental data loading process, but if your table is very large, this deduplication process can take a considerable amount of time.

3 Essential Questions to Address When Building an Incremental Data Load Script with API | by Daniel Khoa Le | June 2024

Technical Terrence Team

1 stock I would like to buy in case of a stock market correction

Leave a Reply Cancel reply

Recommended.

28 years later will hit theaters next summer

From the Perceptron to Adaline. Setting the foundations right | by Pan Cretan | Nov, 2023

EURAUD takes a step lower this morning

Rarible adds support for Tezos and Polygon NFTs

Ethereum Price Has 100 SMA But Needs to Clear It for More Profits

Categories

Important Links

3 Essential Questions to Address When Building an Incremental Data Load Script with API | by Daniel Khoa Le | June 2024

Answer: Merge/Dedup mode (recommended)

Related

Technical Terrence Team

1 stock I would like to buy in case of a stock market correction

Leave a Reply Cancel reply

Recommended.

28 years later will hit theaters next summer

From the Perceptron to Adaline. Setting the foundations right | by Pan Cretan | Nov, 2023

EURAUD takes a step lower this morning

Rarible adds support for Tezos and Polygon NFTs

Ethereum Price Has 100 SMA But Needs to Clear It for More Profits

Categories

Important Links

Get daily news updates to your inbox!