Sunday, August 11, 2024

Unmasking hidden issues in Dataverse: the surprising role of event expander operations in System job logs - down the rabbit hole we go

As outlined in "Large AsyncOperationBase increase in Dataverse/Dynamics 365 CE: the canary in the coalmine", an increase in the amount of storage consumed by system jobs (visible in the storage capacity report of the Power Platform Admin Center) is a tell-tale sign that a Dataverse or Dynamics CRM environment has hidden problems.


The AsyncOperationBase table keeps a record of all asynchronous jobs (async plugins, async workflows, internal Microsoft jobs, etc.) that run in the background and process data in your environment. If you have a lot of failed or cancelled jobs, there might be an issue with plugins or workflows. If you see a lot of jobs in status "Waiting for resources", there might be a big data load happening on your environment (or maybe an infinite loop).
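The rules of thumb above can be turned into a small triage helper. The following is a minimal sketch, not an official tool: the statuscode values (0 for "Waiting for resources", 31 for failed, 32 for cancelled) are assumptions that should be verified against your own environment's metadata, and the function name and threshold are hypothetical.

```python
from collections import Counter

# Assumed statuscode values of the asyncoperation table;
# verify them against your environment's metadata before relying on them.
WAITING_FOR_RESOURCES = 0
FAILED = 31
CANCELED = 32


def triage(status_codes, threshold=1000):
    """Return hints based on how many jobs sit in each status.

    status_codes: iterable of statuscode values, one per system job.
    threshold: hypothetical cut-off above which a count is suspicious.
    """
    counts = Counter(status_codes)
    hints = []
    if counts[FAILED] + counts[CANCELED] > threshold:
        hints.append("check plugins and workflows")
    if counts[WAITING_FOR_RESOURCES] > threshold:
        hints.append("check for a large data load or an infinite loop")
    return hints
```

The right threshold depends on the normal job volume of your environment, so treat the default as a placeholder and tune it against a period you know was healthy.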

During a periodic review of the health of our system jobs, we noticed far more jobs in status "Waiting for resources" than we were used to (more than 200,000). We raised a Microsoft support ticket and received the following reply (redacted version below):

"Microsoft is rolling out a new way of how audit logs are being written in station 4 (EMEA) in a deferred manner. Entities representing these deferred operations are created in the AsyncOperationBase table. 

These operations, while rolled up in the AsyncOperationBase table, execute outside of the Async Service and are not meant to be interpreted as additional backlog that the Async Service needs to process. These operations have no negative impact on System Job throughput. When the Audit operation has been fully processed outside of the Async Service, these operations will be removed from the AsyncOperationBase table. Event Expander Operation jobs are used as part of this new audit functionality, these are important jobs to ensure auditing is not lost. 

These jobs are however processed by a separate service, so they do not affect async throughput, etc. in any way.  Seeing a lot of these jobs (operation type 92 - event expander operation) is not an issue, as these are constantly churning in order to write audit history. If you have custom reporting in place to monitor system jobs - you should exclude AsyncOperationType 92"
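If you do have custom monitoring of system jobs, the exclusion recommended in the reply could look roughly like the sketch below. It assumes that each job is available as a dictionary with `operationtype` and `statuscode` keys (the column names of the asyncoperation table); the function names are hypothetical, and the OData filter string is only an illustration of what such a query condition might look like.

```python
from collections import Counter

# Operation type 92 (event expander operation), as named in the support reply.
EVENT_EXPANDER_OPERATION_TYPE = 92


def count_jobs_by_status(jobs, excluded_types=(EVENT_EXPANDER_OPERATION_TYPE,)):
    """Count jobs per statuscode, skipping the excluded operation types."""
    counts = Counter()
    for job in jobs:
        if job.get("operationtype") in excluded_types:
            continue  # audit-related jobs handled outside the Async Service
        counts[job.get("statuscode")] += 1
    return counts


def build_filter(status_code, excluded_types=(EVENT_EXPANDER_OPERATION_TYPE,)):
    """Build an OData-style filter that leaves out the excluded operation types."""
    clauses = [f"statuscode eq {status_code}"]
    clauses += [f"operationtype ne {t}" for t in excluded_types]
    return " and ".join(clauses)
```

Filtering on the server side with a condition like the one `build_filter` produces keeps the large volume of event expander jobs out of the result set altogether, whereas `count_jobs_by_status` is useful when the jobs have already been exported.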
