Snowflake provides multiple options for loading data. The best choice depends on the volume of data to load and the frequency of loading.
Bulk Loading Using the COPY Command – This option loads batches of data from files already available in cloud storage, or copies (i.e. stages) data files from a local machine to an internal (i.e. Snowflake-managed) cloud storage location before loading the data into tables with the COPY command.
Bulk loading relies on user-provided virtual warehouses, which are specified in the COPY statement. Users are required to size the warehouse appropriately to accommodate expected loads.
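A minimal sketch of this flow, assuming an internal stage named `my_stage`, a warehouse named `load_wh`, a target table `my_table`, and a local CSV file (all names are illustrative):

```sql
-- Create an internal (Snowflake-managed) stage to hold the files.
CREATE STAGE IF NOT EXISTS my_stage;

-- Upload a local file to the internal stage (run from the SnowSQL client;
-- PUT is a client-side command, not available in the web UI worksheet).
PUT file:///tmp/mydata.csv @my_stage;

-- Select the user-provided warehouse that will do the load work,
-- then load the staged file into the target table.
USE WAREHOUSE load_wh;
COPY INTO my_table
  FROM @my_stage/mydata.csv
  FILE_FORMAT = (TYPE = 'CSV', SKIP_HEADER = 1);
```

Because the COPY statement runs on the warehouse selected here, sizing `load_wh` to match the expected file volume is the user's responsibility.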
Continuous Loading Using Snowpipe – This option is designed to load small volumes of data (i.e. micro-batches) and make them incrementally available for analysis. Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion, ensuring users have the latest results as soon as the raw data is available.
Snowpipe uses compute resources provided by Snowflake (i.e. a serverless compute model). These Snowflake-provided resources are automatically resized and scaled up or down as required, and are charged and itemized using per-second billing. Data ingestion is therefore charged based on actual usage.
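A hedged sketch of a Snowpipe definition, assuming an external stage `my_ext_stage` and target table `my_table` (hypothetical names). With `AUTO_INGEST = TRUE`, the pipe relies on cloud-provider event notifications (e.g. S3 event messages) to learn about new files, which requires additional setup not shown here:

```sql
-- A pipe wraps a COPY statement; Snowflake runs it with serverless
-- compute whenever new files arrive in the stage.
CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO my_table
    FROM @my_ext_stage
    FILE_FORMAT = (TYPE = 'JSON');
```

Unlike bulk loading, no warehouse is specified: Snowflake provisions and bills the compute per second, as described above.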
Note that it is not always necessary to load data before running queries. Snowflake's external tables feature enables querying data stored in external cloud storage without first loading it into Snowflake. This is especially beneficial for customers who already have large amounts of data stored externally but only want to query a portion of it, for example, the most recent data. Users can create materialized views over subsets of this data for improved query performance.
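The external-table pattern can be sketched as follows, again with hypothetical names (`my_ext_stage`, `ext_events`) and Parquet files assumed; external-table data is exposed through a single `VALUE` variant column:

```sql
-- Define an external table over files in external cloud storage;
-- the data stays in place and is read at query time.
CREATE EXTERNAL TABLE ext_events
  LOCATION = @my_ext_stage/events/
  FILE_FORMAT = (TYPE = 'PARQUET')
  AUTO_REFRESH = TRUE;

-- Materialize only the subset that is queried often
-- (e.g. recent data) for better performance.
CREATE MATERIALIZED VIEW recent_events AS
  SELECT VALUE:event_id::STRING AS event_id,
         VALUE:ts::TIMESTAMP_NTZ AS ts
  FROM ext_events
  WHERE VALUE:ts::TIMESTAMP_NTZ >= DATEADD(day, -7, CURRENT_TIMESTAMP());
```

Queries against `recent_events` then hit the materialized subset rather than rescanning the external files.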
DataTerrain, with years of experience and reliable experts, is ready to assist. We have served more than 200 customers in the US and over 70 customers worldwide. We are flexible with working hours and do not require long-term binding contracts.