Wednesday, November 22, 2023

Near real-time and snapshots in Azure Synapse Link for Dataverse

The Azure Synapse Link for Dataverse documentation contains a section about Access near real-time data and read-only snapshot data but it does not really explain why you want to use one or the other. 

When you open an Azure Synapse SQL Serverless LakeDB in SQL Server Management Studio you see a clear distinction between the two versions of the table data - whereas in Azure Synapse Studio there is no obvious distinction besides the name you will see the "account" table the "account_partitioned" view:

  • Near real time data: external table for all the underlying CSV files exported by the Azure Synapse Lin for Dataverse sync engine. There is a soft SLA for the data to be present in these tables within 15 minutes
  • Snapshot data/partitioned views: views on top of the near-real time data which are updated on an hourly interval.

In most scenarios, it best to do queries against these partitioned views since you will avoid read conflicts and you are sure that a full transaction has been written on the CSV files in Azure Data Lake storage. 

A typical exception that you might receive when doing queries directly against the "tables" is "https://`[datalakestoragegen2name]$$/[lakedbname]/[tablename/Snapshot/2023-05_1684234580/2023-05.csv" does not exist or you don't have file access rights)" but this also depends on your specific context. If you have a lot of create, updates or deletes on Dataverse tables this might happen more regularly. Even though, the partitioned views are update on an hourly basis - it might be that the Synapse Link engine is just refreshing the views at the same point that you perform a query, which will give you a similar exception but the changes that this occurs are more rare.

You can check the last sync timestamp and sync status in the Power Platform maker portal (see screenshot below)

For the moment, you will also have to manually check the monitoring page (which can be quite tedious if you have a lot of environments) but there is an item in the Microsoft release planner "Receive notifications about the state of Azure Synapse Link for Dataverse" which is apparently in public preview but I haven't seen it in for  environments (not in the and also not in  I have access to. 

It is also not easy to see if something went wrong with the refresh of the partitioned views - up until now the easiest way to find out is running a SQL query -  select name,create_date from sys.views order by create_date desc against the LakeDB.

No comments: