In the meantime, Microsoft has released detailed Power Platform request usage information in the Power Platform admin center (preview). For integration application users you need to look at the Non-licensed user report, which shows request usage per day for non-licensed users together with the total entitlement for the tenant. Unfortunately, for environments with a lot of integrations you might need to resort to some Excel skills or Power BI to make sense of the data (I am currently working in a tenant where the CSV export file contains more than 100,000 lines).
In a previous post I showed how you can download ticker data from Yahoo Finance using the yfinance Python package. I have now updated the Jupyter notebook code sample using yfinance to also show how you can retrieve additional information (sector, industry, trailing and forward earnings per share, etc.). The Ticker class in the yfinance library contains the info property, which returns a dictionary object (a collection of key-value pairs where each key is associated with a value) that allows you to access specific information about an asset.
Since I wanted to know how fast data retrieval would be, I also included the %%time magic command. Wall clock time measures how much real time has elapsed, while CPU time measures how many (milli)seconds the CPU was actually busy.
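To give an idea of what this looks like, here is a minimal sketch of such a lookup (the info keys used here, such as trailingEps and forwardEps, are assumptions based on what yfinance typically returns; in a notebook you would put %%time on the first line of the cell to get the timing output):

```python
# Minimal sketch: retrieving the info dictionary for a single asset.
# Assumes yfinance is installed (pip install yfinance); the exact keys
# available in info can differ per asset and per yfinance version.
import yfinance as yf

ticker = yf.Ticker("AAPL")
info = ticker.info  # dictionary (key-value pairs) describing the asset

# dict.get() avoids a KeyError when a field is missing for a given asset
print(info.get("sector"))
print(info.get("industry"))
print(info.get("trailingEps"))
print(info.get("forwardEps"))
```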
Yahoo Finance contains data about stocks, exchange-traded funds (ETFs), mutual funds and stock market indices. The information that you can retrieve for each of these differs, so it is safest to check the quoteType in your code. The example below retrieves information about Apple stock, the iShares MSCI ACWI UCITS ETF (Acc) and a thematic mutual fund from KBC.
I also included a code snippet which shows how to retrieve this information for multiple assets and convert it into a Pandas dataframe, along the lines of the sketch below.
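A minimal sketch, with the tickers below as hypothetical stand-ins for the assets mentioned above:

```python
# Minimal sketch: collecting info for multiple assets into a dataframe.
# The symbols are hypothetical examples; fields such as sector are only
# populated for certain quote types (e.g. equities).
import yfinance as yf
import pandas as pd

symbols = ["AAPL", "ISAC.L"]
rows = []
for symbol in symbols:
    info = yf.Ticker(symbol).info
    rows.append({
        "symbol": symbol,
        "quoteType": info.get("quoteType"),  # EQUITY, ETF, MUTUALFUND, ...
        "sector": info.get("sector"),
        "trailingEps": info.get("trailingEps"),
    })

df = pd.DataFrame(rows)
print(df)
```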
In this walkthrough, I will show how you can perform exploratory data analysis on stock market data using Azure Synapse serverless SQL pools. To simplify things I will just focus on daily quotes for the S&P 500.
The S&P 500 (short for Standard & Poor's 500) tracks the performance of 500 large companies listed on exchanges in the United States. The composition of the S&P 500 is typically rebalanced four times per year. The S&P 500 is a capitalization-weighted index, meaning that stocks with a higher market capitalization have a larger impact on changes in the index (see Top 10 S&P 500 stocks by index weight).
Serverless SQL pool is an auto-scaling SQL query engine built into Azure Synapse. As the term serverless indicates, you don't need to worry about provisioning underlying hardware or software resources. Serverless SQL pool uses a pay-per-use model, so you are only charged for the queries you actually run to process data. Like the Synapse dedicated SQL pool, serverless SQL pool distributes processing across multiple nodes using a scale-out architecture (check out the Microsoft research publication Polaris: the distributed SQL engine in Azure Synapse for an in-depth discussion).
Synapse serverless SQL enables you to query external data stored in Azure Storage (including Azure Data Lake Storage Gen1 and Gen2), Cosmos DB and Dataverse. The data remains stored in Azure storage in a supported file format (CSV, JSON, Parquet or Delta) and query processing is handled by the Synapse SQL engine.
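As a quick illustration, an ad-hoc OPENROWSET query over a CSV file in the data lake could look like the sketch below (the storage account, container and file name are placeholders):

```sql
-- Minimal sketch: ad-hoc query over a CSV file with serverless SQL.
-- HEADER_ROW = TRUE requires PARSER_VERSION = '2.0'.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/sp500.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS [result];
```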
Walkthrough: analyzing S&P 500 data with Synapse serverless SQL
In this post, I will be primarily using SQL to analyze the data, but this is a matter of preference (having a coding background, I would normally prefer Python for exploratory data analysis).
After you have downloaded the data, you will need to upload the CSV file to the Azure Data Lake Storage account associated with your Synapse workspace (you can also use a different Azure storage account).
If you will be using the data quite frequently, it might make more sense to create an external table (see also CETAS, CREATE EXTERNAL TABLE AS SELECT) so that a dataset pointing to the data residing in the data lake is ready for querying. In the Synapse Studio data hub, you can simply right-click a file and select the option to create an external table.
Next, select the database and a name for the table. You will need to create the external table via "Use SQL script", since you will need to adapt the generated script to skip the header row when reading the data. For CSV files you also have the option to infer column names.
You will need to modify the generated script for creating the external file format so that it skips the header row. You can still change the database in which you want to create the external table (1), and I added a line to indicate that the external file contains a header row, so reading data should start on row 2 (2). Once you understand the script, it is also possible to modify it to use wildcards so that you can read from multiple files in multiple folders, as in the sketch below.
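A sketch of what the adapted script could look like (the file format, table and data source names, as well as the column list, are assumptions; the FIRST_ROW option is what skips the header):

```sql
-- Minimal sketch: external file format that starts reading on row 2,
-- plus an external table over the CSV file(s). Assumes an external data
-- source (here called SP500DataLake) has already been created.
CREATE EXTERNAL FILE FORMAT CsvSkipHeaderFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2  -- (2) skip the header row
    )
);

CREATE EXTERNAL TABLE dbo.sp500 (  -- (1) database/schema of your choice
    [date]   DATE,
    [open]   FLOAT,
    [high]   FLOAT,
    [low]    FLOAT,
    [close]  FLOAT,
    [volume] BIGINT
)
WITH (
    LOCATION = 'sp500/*.csv',  -- wildcard: read multiple files in a folder
    DATA_SOURCE = SP500DataLake,
    FILE_FORMAT = CsvSkipHeaderFormat
);
```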
Now let's try out some queries in Azure Synapse Studio:
Let's get all closing prices from 2020 onwards ([date] > '2020-01-01'). You will notice that you can also visualize the results with some basic charts directly in Synapse Studio.
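Assuming the external table created above, this is a straightforward filter:

```sql
-- Closing prices since the start of 2020
SELECT [date], [close]
FROM dbo.sp500
WHERE [date] > '2020-01-01'
ORDER BY [date];
```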
Which were the years with the largest percentage difference between the highest and lowest close for the S&P 500? No surprises here: the Wall Street crash of 1929 followed by the Great Depression of the 1930s, the financial crisis of 2007-2008 and the COVID-19 crash of 2020 all show up in the top 10.
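A sketch of such a query, grouping by year and comparing the highest and lowest close:

```sql
-- Years with the largest percentage range between highest and lowest close
SELECT TOP 10
    YEAR([date]) AS [year],
    (MAX([close]) - MIN([close])) * 100.0 / MIN([close]) AS pct_range
FROM dbo.sp500
GROUP BY YEAR([date])
ORDER BY pct_range DESC;
```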
Which were the days with the largest difference between the day's closing price and the previous day's closing price, i.e. the days on which the market crashed? In this example I used the SQL LAG() function. Besides the 1930s we also see Black Monday (19 October 1987) with a 20% decline in the S&P 500, which triggered a global sell-off (take a look at this Black Monday documentary (YouTube) with footage of traders still working on the market floor).
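A sketch of the LAG() approach against the same table:

```sql
-- Largest one-day percentage declines versus the previous close
SELECT TOP 10 [date], pct_change
FROM (
    SELECT [date],
           ([close] - LAG([close]) OVER (ORDER BY [date])) * 100.0
               / LAG([close]) OVER (ORDER BY [date]) AS pct_change
    FROM dbo.sp500
) AS daily
WHERE pct_change IS NOT NULL  -- the first row has no previous close
ORDER BY pct_change ASC;
```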
You can also use common table expressions (CTEs) to work with temporary named result sets for more complex queries and data manipulations (see Introduction to the SQL With clause if you are new to CTEs). In the example below I want to find the 3-day trend for the S&P 500. The idea behind this query is to create a three-day trend variable for any given row: if the closing price on a day is greater than the closing price on the previous day, that day is assigned +1, otherwise it is assigned -1 (the minx_close column). If the majority of the previous 3 days are positive values, the trend is positive; otherwise the trend is negative. (Example taken from Coursera: Introduction to Trading, Machine Learning & GCP.)
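A sketch of that idea (the column is simply called direction here, rather than following the course's naming):

```sql
-- Three-day trend: +1/-1 per day, then a majority vote over the
-- previous three days using a window frame inside a CTE.
WITH daily_moves AS (
    SELECT [date], [close],
           CASE WHEN [close] > LAG([close]) OVER (ORDER BY [date])
                THEN 1 ELSE -1 END AS direction  -- first row defaults to -1
    FROM dbo.sp500
)
SELECT [date], [close],
       CASE WHEN SUM(direction) OVER (ORDER BY [date]
                ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) > 0
            THEN 'up' ELSE 'down' END AS three_day_trend
FROM daily_moves
ORDER BY [date];
```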
As seen in this post, Synapse serverless SQL is quite useful for data professionals in different situations: data engineers can use it to explore data in the data lake and optimize data transformations, while data scientists and data analysts can use it to quickly carry out exploratory data analysis (EDA). Take a look at Build data analytics solutions using Azure Synapse serverless SQL pools (Microsoft Learn) if you want to learn more. In an upcoming post I will show how easy it is to consume data from Azure Synapse serverless SQL in Power BI.