In this walkthrough, I will show how you can perform exploratory data analysis on stock market data using Azure Synapse serverless SQL pools. To keep things simple, I will focus on daily quotes for the S&P 500.
The S&P 500 (short for Standard & Poor's 500) tracks the performance of 500 large companies listed on exchanges in the United States. The composition of the S&P 500 is typically rebalanced four times per year. The S&P 500 is a capitalization-weighted index, meaning that stocks with a higher market capitalization have a larger impact on changes in the index: a stock whose market capitalization makes up 5% of the index total receives a 5% weight, so a 1% move in that stock moves the index by 0.05% (see Top 10 S&P 500 stocks by index weight).
I downloaded all daily data for the S&P 500 stock market index (ticker symbol ^GSPC) from Yahoo Finance using the historical data tab in CSV format. The CSV file contains the date, open, high, low, close, volume, dividends and stock splits for the S&P 500 from December 1927 until now (the index in its current form was only created in 1957, and the dividends and stock splits columns are not relevant for an index). I downloaded the file manually, but take a look at Using Python and Pandas Datareader to retrieve financial data part 3: Yahoo Finance and Using the yFinance Python package to download financial data from Yahoo Finance for ways to automate retrieving data from Yahoo Finance using Python.
Serverless SQL Pools in Azure Synapse
Serverless SQL pool is an auto-scale SQL query engine that is built into Azure Synapse - as the term serverless indicates, you don't need to worry about provisioning the underlying hardware or software resources. Serverless SQL pool uses a pay-per-use model, so you are only charged for the data processed by the queries you run. Like the Synapse dedicated SQL pool, the serverless SQL pool distributes processing across multiple nodes using a scale-out architecture (check out the Microsoft research publication Polaris: the distributed SQL engine in Azure Synapse for an in-depth discussion).
Synapse serverless SQL enables you to query external data stored in Azure Storage (including Azure Data Lake Storage Gen1 and Gen2), Cosmos DB and Dataverse. The data remains stored in Azure storage in a supported file format (CSV, JSON, Parquet or Delta Lake) and query processing is handled by the Synapse SQL engine.
Walkthrough: analyzing S&P 500 data with Synapse serverless SQL
In this post I will not show you how to set up Azure Synapse - take a look at Quickstart: Create a Synapse Workspace for a detailed walkthrough - the Microsoft Learn learning paths that I added in the references are also quite useful.
In this post, I will primarily be using SQL to analyze the data, but this is a matter of preference (coming from a coding background, I would normally reach for Python for exploratory data analysis).
After downloading the data, you will need to upload the CSV file to the Azure Data Lake Storage account associated with your Synapse workspace (you can also use a different Azure storage account).
The OPENROWSET(BULK ...) function allows you to access files in Azure storage. The SP500.csv file has a header row specifying the different columns in use - it contains all daily ticker data since December 1927. I am using PARSER_VERSION = '2.0' since it is more performant, but it has some limitations (see the Arguments section in Microsoft's OPENROWSET documentation) - also check out How do Synapse serverless SQL pools deal with different file schemas (or schema evolution) part 1 CSV for some interesting info on how schema changes are handled.
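As a minimal sketch of what such a query can look like (the storage account URL and container name are placeholders you would replace with your own), the following reads the most recent rows straight from the CSV file:

```sql
-- Query the raw CSV directly from the data lake; with PARSER_VERSION '2.0'
-- and HEADER_ROW = TRUE, column names are inferred from the header row.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/SP500.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS quotes
ORDER BY quotes.Date DESC;
```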
If you will be querying the data frequently, it might make more sense to create an external table over the file, or to use a CETAS statement (CREATE EXTERNAL TABLE AS SELECT) to materialize the results of a query in the data lake, ready for querying. In the Synapse Studio data hub, you can simply right-click on a file and select the option to create an external table.
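As an aside, a minimal CETAS sketch might look like the following - the data source, file format and path names are illustrative assumptions, not something Studio generates - converting the CSV to Parquet so that later queries scan a more efficient format:

```sql
-- Illustrative names: MyDataLake, ParquetFileFormat and the paths are assumptions.
CREATE EXTERNAL DATA SOURCE MyDataLake
WITH (LOCATION = 'https://<storageaccount>.dfs.core.windows.net/<container>');

CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);

-- CETAS: writes the SELECT results to the LOCATION folder as Parquet files
-- and creates an external table on top of them.
CREATE EXTERNAL TABLE dbo.SP500Parquet
WITH (
    LOCATION = 'curated/sp500/',
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFileFormat
)
AS
SELECT *
FROM OPENROWSET(
    BULK 'raw/SP500.csv',
    DATA_SOURCE = 'MyDataLake',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS quotes;
```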
Next, select the database and the name of the table. Create the external table by selecting "Use SQL Script", since you will need to adapt the generated script to skip the header row when reading the data. For CSV files you have the option to infer column names.
You will need to modify the generated script for creating the external file format so that it skips the header row. You can still modify the database in which you want to create the external table (1), and I added a line to indicate that the external file contains a header row, so reading data should start on row 2 (2). Once you understand the script, it is also possible to modify it to use wildcards, so that you can read from multiple files in multiple folders.
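A sketch of what the adapted script can end up looking like (the format and table names are again illustrative, and the column list assumes the Yahoo Finance CSV layout described earlier):

```sql
-- FIRST_ROW = 2 is the added line: it tells the parser to skip the header row.
CREATE EXTERNAL FILE FORMAT SP500CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2
    )
);

CREATE EXTERNAL TABLE dbo.SP500 (
    [Date]   DATE,
    [Open]   FLOAT,
    [High]   FLOAT,
    [Low]    FLOAT,
    [Close]  FLOAT,
    [Volume] BIGINT
)
WITH (
    -- A wildcard such as 'raw/sp500/*.csv' would read multiple files at once.
    LOCATION = 'raw/SP500.csv',
    DATA_SOURCE = MyDataLake,      -- the data source created in the earlier sketch
    FILE_FORMAT = SP500CsvFormat
);
```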
Now let's try out some queries in Azure Synapse Studio: