A large international company with many operating regions requires data to be shared bi-directionally among all offices (head office to regional offices and regional offices among themselves). This company is a Snowflake account holder with European operations deployed in Microsoft Azure (single region) while North American regional offices are using AWS (single region) as their deployment cloud. This setup is required to comply with Personal Identifiable Information (PII) regulations in some of the European countries. The corporate head office is in Europe.
How can this data be shared bi-directionally, while MINIMIZING costs?
Use data replication everywhere to reduce costs associated with same-region sharing.
Use the PUT command to move files to an Amazon S3 bucket and Azure Blobs, and use an external file management application to move files within the corporate VPC.
Move all the Snowflake accounts to a single region, and implement data sharing.
Use bi-directional data sharing among offices in the same region and replication among offices across the continents.
According to the Snowflake documentation1, data sharing is a feature that allows sharing selected objects in a database in one account with other accounts in the same organization, without copying or transferring any data. Data sharing is supported across regions and across cloud platforms, but it requires enabling account database replication for both the source and target accounts2. Data replication is a feature that allows replicating objects from a source account to one or more target accounts in the same organization, providing read-only access for the replicated objects. Data replication is also supported across regions and across cloud platforms, but it incurs additional storage costs for the replicated data2. Therefore, the best way to share data bi-directionally among all offices, while minimizing costs, is to use data sharing among offices in the same region, which does not require replication or additional storage, and use replication among offices across the continents, which provides near real-time access to the shared data. Option A is incorrect because using data replication everywhere would increase the costs associated with additional storage and compute resources for the replicated data. Option B is incorrect because using the PUT command to move files to an Amazon S3 bucket and Azure Blobs, and using an external file management application to move files within the corporate VPC, would not leverage the benefits of Snowflake’s data sharing and replication features, and would also incur additional costs and complexity for data transfer and synchronization. Option C is incorrect because moving all the Snowflake accounts to a single region would violate the PII regulations in some of the European countries, and would also incur additional costs and complexity for data migration and consolidation.
In which scenario will use of an external table simplify a data pipeline?
When accessing a Snowflake table from a relational database
When accessing a Snowflake table from an external database within the same region
When continuously writing data from a Snowflake table to external storage
When accessing a Snowflake table that references data files located in cloud storage
According to the Introduction to External Tables documentation, an external table is a Snowflake feature that allows you to query data stored in an external stage as if the data were inside a table in Snowflake. The external stage is not part of Snowflake, so Snowflake does not store or manage the stage. This simplifies the data pipeline by eliminating the need to load the data into Snowflake before querying it. External tables can access data stored in any format that the COPY INTO