Showing posts with label asyncoperation. Show all posts
Showing posts with label asyncoperation. Show all posts

Sunday, August 11, 2024

Unmasking hidden issues in Dataverse: the surprising role of event expander operations in System job logs - down the rabbit hole we go

 As outlined in Large AsyncOperationBase increase in Dataverse/Dynamics 365 CE: the canary in the coalmine - an increase in the amount of storage consumed by system jobs (visible in the Power Platform Admin Center storage capacity report) is a tell-tale for a Dataverse or Dynamics CRM environment which has some hidden problems.


The AsyncOperationBase table keeps a record of all asynchronous jobs (Async plugins, async workflows, internal Microsoft jobs, etc ...) which are running in the background processing data in your environment. If you have a lot of failed or cancelled jobs, there might be an issue with plugins or workflows. If you see a lot of awaiting resources jobs, there might be a big data load happening on your environment (or maybe an infinite loop).

During a periodic review of the System jobs health status, we noticed that there were a lot more jobs in status "Waiting for resources" then we were used too (More than 200.000). We raised a Microsoft support ticket for this and we got an update on this (redacted version below):

"Microsoft is rolling out a new way of how audit logs are being written in station 4 (EMEA) in a deferred manner. Entities representing these deferred operations are created in the AsyncOperationBase table. 

These operations, while rolled up in the AsyncOperationBase table, execute outside of the Async Service and are not meant to be interpreted as additional backlog that the Async Service needs to process. These operations have no negative impact on System Job throughput. When the Audit operation has been fully processed outside of the Async Service, these operations will be removed from the AsyncOperationBase table. Event Expander Operation jobs are used as part of this new audit functionality, these are important jobs to ensure auditing is not lost. 

These jobs are however processed by a separate service, so they do not affect async throughput, etc. in any way.  Seeing a lot of these jobs (operation type 92 - event expander operation) is not an issue, as these are constantly churning in order to write audit history. If you have custom reporting in place to monitor system jobs - you should exclude AsyncOperationType 92"

Monday, May 16, 2022

Large AsyncOperationBase increase in Dataverse/Dynamics 365 CE: the canary in the coalmine

In normal circumstances, you probably rarely look at the System Job views in your  Dynamics 365 environment. But if you see a sudden and unexpected (large) increase in the storage used by the AsyncOperationBase table or AsyncOperation in File storage is like a canary in the coalmine - something is wrong in your Dynamics365/Dataverse environment.


System jobs (or Async Operations) are tasks that Dynamics performs in the background that don't have to be completed straight away. When such an asynchronous task is being setup, Dynamics creates a row in the AsyncOperationBase table and puts in all the details about when to run, what function/plugin/cloud flow to call, parameters to pass etc... 

All of this information used to be stored in columns of the AsyncOperationBase table but a row could  get really large (KB or MBs) depending on the actual task being scheduled.  Part of this payload has now been moved out of the database and stored into files on Azure Blob Storage and the database row now has a reference to the blob file.  You can follow up on the numbers in the capacity details for an instance in the Power Platform Admin Center (PPAC) - where you will now also see AsyncOperation in the file section after this architectural change from Microsoft. (Also see Storage capacity management for Dynamics 365/Dataverse - how to track storage evolution with Power BI and New Microsoft Dataverse storage capacity (Microsoft docs))




Dynamics 365 has a recurring system job to delete successfully completed system jobs  - see screenshot below - when an asyncoperation record is cleared from the database, the linked file storage is also cleared up.


One of the reasons you might see a large increase in storage consumed is because of a large amount of failed or cancelled system jobs.  The system jobs status (internal name statecode, possible values = ready, suspended, locked and completed)  and status reason ( internal name statuscode, possible values for completed jobs = succeeded, cancelled and failed)  can be used to identify possible issues.

I really like the SQL 4 CDS XrmToolBox add-in to troubleshoot issues with system jobs - it allows you to write simple SQL queries to identify system jobs causing issues. 

I also use  SQL4CDS for further analysis when troubleshooting async plugins using SQL queries e.g. select top 100 * from plugintracelog where typename like 'JOPX.CRM.Plugins.Notification%' but there are also other tools in XrmToolBox which you can use