A stream is a Snowflake object that provides change data capture (CDC) capabilities by tracking the changes made to a table. It records DML changes to the table (inserts, updates, and deletes) along with metadata about each change.

Types of Streams:
A standard table stream tracks all DML changes to the source table: inserts, updates, and deletes (including table truncates).
An append-only table stream tracks row inserts only; update and delete operations (including table truncates) are not recorded. Because it returns only the appended rows, an append-only stream can be a better choice than a standard stream for extract, load, transform (ELT) and similar scenarios that depend exclusively on row inserts. Append-only table streams can be combined with Snowflake tasks for continuous ELT workflows that process recently changed table rows. A sketch of both stream types, and of a task consuming one, follows below.

Stale Streams:
A stream becomes stale if its offset is positioned at a point earlier than the table's data retention period. To avoid staleness, ensure that data from the stream is consumed regularly; consuming stream data advances the stream offset, which prevents the stream from becoming stale. ENABLE […]
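A minimal sketch of the two stream types and a task that consumes one (the table, stream, task, and warehouse names here are hypothetical, and the orders_history target table is assumed to exist):

-- Hypothetical source table
CREATE OR REPLACE TABLE orders (id INT, amount NUMBER, status STRING);

-- Standard stream: records inserts, updates, and deletes
CREATE OR REPLACE STREAM orders_std_stream ON TABLE orders;

-- Append-only stream: records row inserts only
CREATE OR REPLACE STREAM orders_insert_stream ON TABLE orders APPEND_ONLY = TRUE;

-- Task that runs only when the stream has unconsumed rows;
-- the INSERT consumes the stream, advancing its offset and
-- keeping the stream from becoming stale
CREATE OR REPLACE TASK load_orders_task
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_insert_stream')
AS
  INSERT INTO orders_history
    SELECT id, amount, status FROM orders_insert_stream;

-- Tasks are created suspended and must be resumed to run
ALTER TASK load_orders_task RESUME;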
Overview of Time Travel feature in Snowflake
Snowflake Time Travel enables us to perform the following tasks:
● Accessing data that has been changed or deleted at any point within a defined retention period.
● Restoring tables, schemas, and databases that have been dropped.
● Creating clones of tables, schemas, and databases at or before specific points in the past.
These tasks can be performed only within the data retention period; once it ends, the data is moved into Snowflake Fail-safe.

Time Travel SQL Extensions:
To access or clone historical data with Time Travel, we use the AT or BEFORE clause, which can be specified in SELECT statements and CREATE … CLONE commands. To pinpoint the exact historical data you wish to access, the following parameters are used with these clauses:
● TIMESTAMP
● OFFSET (time difference in seconds from the present time)
● STATEMENT (identifier for a statement, e.g. query ID)

Restoring Objects:
A dropped table, schema, or database that has not yet been purged from the system (i.e. not yet moved into Snowflake Fail-safe) can be restored using the UNDROP command. UNDROP restores the object to its most recent state before it was dropped.

Enabling and Disabling Time Travel:
Time Travel is automatically enabled in Snowflake with a 1-day data retention period by default […]
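A minimal sketch of these extensions (the table names, timestamp, and query ID below are hypothetical placeholders):

-- Query a table as it was one hour ago (OFFSET is in seconds)
SELECT * FROM orders AT (OFFSET => -3600);

-- Query a table as of a specific point in time
SELECT * FROM orders AT (TIMESTAMP => '2023-05-01 10:00:00'::TIMESTAMP_LTZ);

-- Clone a table as it existed just before a given statement ran
CREATE TABLE orders_restored CLONE orders
  BEFORE (STATEMENT => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');

-- Restore a dropped table from Time Travel
UNDROP TABLE orders;

-- Adjust the data retention period (in days) for a table
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;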
Overview of Continuous and Automated Data loading in Snowflake using Snowpipe
Snowpipe is Snowflake’s continuous data ingestion service. Snowpipe loads data from files as soon as they are available in a staging area, based on the COPY statement defined in a referenced pipe. The COPY statement identifies the source location of the data files (e.g. an AWS S3 bucket) and the target table into which Snowpipe loads the data.

The following mechanisms are available for detecting and loading the staged files:
● Automating Continuous Data Loading Using Cloud Messaging – Automated data loading uses event notifications (e.g. AWS SQS) for cloud storage to inform Snowpipe of the arrival of new data files to load. Snowpipe then copies the files into a queue, from which they are loaded into the target table in a continuous, serverless fashion based on the parameters defined in the pipe. A sketch of this approach follows below.
● Calling Snowpipe REST Endpoints to Load Data – Your client application calls a public REST endpoint with a list of data filenames and a referenced pipe name. If new data files matching the list are discovered in the stage, they are queued for loading. Snowflake-provided compute resources load data from the queue into […]
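A minimal sketch of the cloud-messaging approach, assuming an S3 bucket whose event notifications are wired to the SQS queue Snowflake provides (the stage, pipe, table names, and bucket URL are hypothetical; stage credentials or a storage integration are omitted):

-- External stage pointing at the S3 bucket holding the data files
CREATE OR REPLACE STAGE raw_stage
  URL = 's3://my-bucket/landing/'
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Pipe with AUTO_INGEST enabled: new-file event notifications
-- trigger the COPY statement defined here
CREATE OR REPLACE PIPE raw_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_orders
  FROM @raw_stage;

-- The SQS queue ARN to configure in the bucket's event
-- notifications appears in the notification_channel column
SHOW PIPES LIKE 'raw_pipe';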