Thursday, July 23, 2015
Dynamics CRM 2015 development and NuGet
If you need the Dynamics CRM 2015 assemblies, you can search for “crm 2015 client” in the NuGet Package Manager. If you need the Dynamics CRM 2013 assemblies, you will need to use the Package Manager Console as outlined in Using Nuget for Dynamics CRM Development Part 1: Nuget basics and useful links.
Tuesday, June 16, 2015
Combining Dynamics CRM Online and Power BI Preview
When I started playing around with Power BI Preview, I was surprised that it seemed to do things quite differently from Power BI for Office 365, since I thought it was simply the next release of the existing Power BI for Office 365 offering. Apparently this is not the case.
Power BI Preview seems to be quite different from Power BI for Office 365 - for a detailed description of differences check out Power BI vs Power BI Preview: what’s the difference – here’s a quick summary:
- Power BI for Office 365 is based on technologies such as Excel and SharePoint and is an integrated part of Office 365, whereas Power BI Preview is built on a separate platform.
- Power BI Preview uses the browser and the Power BI Designer as design tools for creating dashboards and reports, whereas Power BI for Office 365 mainly relies on Excel as a design tool.
- Power BI Preview also exposes an API which allows you to push data into the Power BI service; for more information check out the Power BI Developer Center. For a good introduction, check out Developing for Power BI Overview (Video). I think this is the key enabler for real-time analytics on your data. To stay up to date, make sure that you follow the Power BI Development blog.
- Power BI Preview has some new data visualizations available such as single number card tiles, combo charts, funnel charts, gauge charts, filled maps and tree maps (Check out Visualization types available in Power BI Reports)
If you check out the official documentation Use Power BI with Microsoft Dynamics Online (Technet) – it seems to focus on the new Power BI Preview but the Microsoft Dynamics CRM templates for Power BI that you can download for free from PinPoint - listed in the second section of the page - seem to be based on Power BI for Office 365. (Use Google Chrome to see the download link – I did not see it when using Internet Explorer 11)
When you actually try to use it in practice together with Dynamics CRM Online you will however encounter some serious limitations which are hopefully resolved by the summer release:
- The Microsoft Dynamics CRM content pack for Power BI Preview only exposes a limited set of 10 entities and associated measures; please vote for Dynamics CRM custom field and entity support if you think this should be extended. There is a workaround where you export your Dynamics CRM Online data to Excel and then use the Excel data in Power BI (see screenshot below for the available entity sets).
- At the moment it is not yet possible to pass filters to Power BI dashboards, which seems like an essential requirement for truly embedded analytics in Dynamics CRM using Power BI. You can however vote for this feature on the Microsoft Power BI Support site – Pass filters in URL.
- To make matters worse, it is currently not possible to embed Power BI Preview into Dynamics CRM at all, a feature which is available in Power BI for Office 365.
My guess is that the way forward will be Power BI Preview (or call it Power BI 2.0) and that it will replace Power BI for Office 365; you already see it appearing in the license management section of Office 365 (see screenshot below). But for now it is still a preview and no specific release date has been announced, so go for Power BI for Office 365 at the moment.
References:
- Connecting to on-premises organizational data from Power BI
- Microsoft Dynamics CRM content pack for Power BI
- Microsoft CRM Online & Power BI Tutorial (Power BI for Office 365)
- Power BI vs Power BI Preview: what’s the difference
- Power BI Workbook Size Limitations
Tuesday, June 09, 2015
Getting to grips with Dynamics CRM releases, updates and build numbers
Updates and improvements to Dynamics CRM are released twice a year, in what is commonly referred to as the spring and fall release (see Microsoft Dynamics CRM – Roadmap for 2015). Given Microsoft's new “Cloud first” credo, these updates can be cloud-only releases, as was the case with the Spring 2015 (Carina) release. For Dynamics CRM Online you are required to be on the current version (n) or the prior version (n-1), but you have the choice to skip an update (see Manage Dynamics CRM Online Updates). Dynamics CRM on-premises follows the standard lifecycle that you are accustomed to (see Microsoft Dynamics Support Lifecycle Policy FAQ and Microsoft Product Lifecycle Search for Dynamics CRM).
To make things a little more interesting, the Dynamics CRM product team seems to have chosen stars and constellations as code names for the different releases. Code names of the same genre are also used for products closely related to Dynamics CRM, such as Dynamics Marketing, Social Engagement and Parature Knowledgebase.
Recently Microsoft also changed the naming conventions for its updates and explained the version/build numbers it is using now and for future releases (check out New naming conventions for Microsoft Dynamics CRM updates). The tables below summarize the different versions for the moment. As outlined in Greg Olsen's blog post Microsoft Dynamics CRM 2015 Roadmap, the next version of Dynamics CRM is code named Ara. Another interesting tidbit: “Not confirmed by Microsoft, but it is likely that On-Premises installations will have to wait for the CRM ‘ARA’ release during the Fall Wave in order to get the Carina new features and others.”
| Product Name | Version description | Version number | Release or Update | Code Name |
| Microsoft Dynamics CRM Online | Fall ‘13 | 6.0.0 | Major release | Orion |
| Microsoft Dynamics CRM Online | Fall ‘13 | 6.0.1 | Incremental Update | - |
| Microsoft Dynamics CRM Online | Fall ‘13 | 6.0.2 | Incremental Update | - |
| Microsoft Dynamics CRM Online | Spring ‘14 | 6.1.0 | Minor release | Leo |
| Microsoft Dynamics CRM Online | 2015 Update (Fall ‘14) | 7.0.0 | Major release | Vega |
| Microsoft Dynamics CRM Online | 2015 Update 1 (Spring ‘15) | 7.1.0 | Minor release | Carina |
| Microsoft Dynamics CRM Online | t.b.d. | t.b.d. | t.b.d. | Ara |
| Product Name | Version description | Version number | Release or Update | Code Name |
| Microsoft Dynamics CRM (on premise) | 2013 | 6.0.0 | Major release | Orion |
| Microsoft Dynamics CRM (on premise) | 2013 UR1 | 6.0.1 | Incremental Update | - |
| Microsoft Dynamics CRM (on premise) | 2013 UR2 | 6.0.2 | Incremental Update | - |
| Microsoft Dynamics CRM (on premise) | 2013 SP1 | 6.1.0 | Minor release | Leo |
| Microsoft Dynamics CRM (on premise) | 2015 | 7.0.0 | Major release | Vega |
| Microsoft Dynamics CRM (on premise) | 2015 Update 0.1 | 7.0.1 | Minor release | Carina |
| Microsoft Dynamics CRM (on premise) | t.b.d. | t.b.d. | t.b.d. | Ara |
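The Online rows follow a simple pattern: X.0.0 for a major release, X.Y.0 for a minor release and X.Y.Z for an incremental update. The sketch below encodes that pattern as read from the Online table; the helper name `classify_version` is hypothetical, it is not an official Microsoft rule, and because real build numbers carry a fourth build component it simply ignores everything after the first three parts.

```python
def classify_version(version: str) -> str:
    """Classify a Dynamics CRM version string by its major.minor.update pattern.

    Hypothetical helper: only the first three components are used, so a
    four-part build number is treated like its major.minor.update prefix.
    """
    _, minor, update = (int(part) for part in version.split(".")[:3])
    if minor == 0 and update == 0:
        return "Major release"
    if update == 0:
        return "Minor release"
    return "Incremental Update"


for v in ["6.0.0", "6.0.2", "6.1.0", "7.0.0", "7.1.0"]:
    print(v, classify_version(v))
```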
References:
- Dynamics CRM – Get ready for the next release
- Manage Dynamics CRM Online updates
- Customer Driven Updates in CRM Online 2015 Update 1 (Video)
- Microsoft Dynamics CRM 2015, 2013, and 2011 updates: release dates, build numbers and collateral
- Quick tip: determine the version of Microsoft Dynamics CRM
- Dynamics CRM Roadmap – Spring 2015 and beyond
- A few notes on Convergence 2015 Announcements
- Microsoft Dynamics CRM – Roadmap for 2015
Thursday, April 09, 2015
SharePoint Server 2013 and business intelligence scenarios
With all the emphasis on Microsoft Power BI, people seem to forget that there are still other options for setting up a business intelligence solution based on SharePoint for those of you who can’t go all in on a cloud solution (because of regulations, corporate policies or other reasons). Don’t get me wrong – I do believe that if you are standardized on Microsoft you should follow their “Cloud First” credo. Listed below are a number of links to get you started.
- Configure AdventureWorks for Business Intelligence Solutions with SharePoint 2013
- Create a connection to a data model for Power View
- Power View in SharePoint Server: Create, save and print reports
- Explore the Adventure Works Multidimensional Model by using Power View
- SharePoint Server 2013 BI – Interactive Reports using Power View in Excel 2013
- Install Reporting Services SharePoint Mode for SharePoint 2013
- Supported combinations of SharePoint and Reporting Services Server and Add-in (SQL Server 2012)
- Channel 9 Video – How to create modern BI solutions using Microsoft SharePoint Server 2013, PowerPivot and Power View in Excel 2013
SharePoint Deep Dive exploration: explaining duplicate detection in SharePoint Server 2013
- SharePoint Deep Dive exploration: SharePoint alerting
- SharePoint Deep Dive exploration: looking into the SharePoint userinfo table
SharePoint Server can detect near duplicates of documents and will take this into account when displaying search results. In this post I will delve a little deeper into the underlying techniques being used. An important thing to keep in mind is that the way that duplicate documents are identified has evolved and changed in the different versions of SharePoint.
SharePoint Server 2007 detected duplicates using a common technique called "shingling". This generic technique allows you to identify duplicates or near duplicates of documents (or web pages). Shingling has been widely used in different types of systems and software to identify spam and plagiarism or to enforce copyright protection. A shingle, more commonly referred to as a q-gram, is a contiguous subsequence of tokens taken from a document.
So if you want to see whether two documents are similar, you can look at how many shingles they have in common. You do, however, need to determine how long each subsequence of tokens should be; typically a value of 4 is used. This is formalized as S(d,w), the set of distinct shingles of width w contained in a document d. For example, for the line “a rose is a rose is a rose” with w=4 we get the following shingles: “a rose is a”, “rose is a rose” and “is a rose is”. To compare the similarity between two sets, e.g. S(doc1) and S(doc2), the sets of distinct shingles of document1 and document2, you can use the Jaccard similarity index (or resemblance index): J = |S(doc1) ∩ S(doc2)| / |S(doc1) ∪ S(doc2)|. A Jaccard index of 0 means that the documents are completely dissimilar, whereas 1 points to identical documents. This would, however, mean that we need to calculate the similarity index for each pair of documents, which is quite an intensive task; to speed up processing, a form of hashing is used (for more details take a look at the explanation about near duplicates and shingling).
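To make S(d,w) and the Jaccard index concrete, here is a minimal Python sketch. It assumes simple lower-casing and whitespace tokenization (the actual SharePoint tokenizer is not described in this post), and the function names `shingles` and `jaccard` are illustrative.

```python
def shingles(text: str, w: int = 4) -> set[str]:
    """S(d, w): the set of distinct contiguous w-token shingles of a text."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    return {" ".join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard (resemblance) index |A ∩ B| / |A ∪ B|; 1.0 means identical sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


print(sorted(shingles("a rose is a rose is a rose")))
# ['a rose is a', 'is a rose is', 'rose is a rose']

doc1 = shingles("the quick brown fox jumps over the lazy dog")
doc2 = shingles("the quick brown fox jumps over the sleepy dog")
print(jaccard(doc1, doc2))  # 0.5: 4 shared shingles out of 8 distinct ones
```

Comparing every pair of documents like this grows quadratically with the number of documents, which is why a hashing shortcut is used in practice.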
As items in SharePoint 2007 were indexed, these hashes were stored in the search database. It is not really clear from the documentation whether these hashes related only to the content of an item or to its properties as well (although this blog, Microsoft Office SharePoint Server 2007: Duplicate search results, states that they are based only on the content of a document). So in SharePoint Server 2007 these hashes were stored in the MSSDuplicateHashes table.
In SharePoint Server 2013 these hashes are no longer stored in the MSSDuplicateHashes table but in the DocumentSignature managed property; this is documented in the article Customizing search results in SharePoint 2013. The next screenshot shows that although the document title and some metadata differ for the 5 documents, there are only 2 distinct document signatures. This indicates that the signature is calculated using only the content of documents, not the metadata or the file name (Content By Search web parts don’t seem to use duplicate trimming). The document signature actually contains 4 checksums, and if one of the four matches a checksum of another document, the document is treated as a duplicate. This also means that when SharePoint search encounters a document from which it is unable to extract the actual contents, it is probably unable to do proper duplicate trimming.
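How SharePoint 2013 computes the four checksums is not documented in the sources above, so the following is only a sketch under an assumption: it uses MinHash (the minimum of a salted hash over all shingles, once per checksum) as a stand-in for the four checksums, hashes only the document body, and then applies the matching rule described above, where a single matching checksum is enough to flag a duplicate. `signature` and `is_duplicate` are hypothetical names.

```python
import hashlib


def shingles(text: str, w: int = 4) -> set[str]:
    """Distinct contiguous w-token shingles of the document body."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def signature(body: str, n_checksums: int = 4):
    """Hypothetical stand-in for DocumentSignature: n MinHash values over the body.

    Only the content is hashed; title, file name and other metadata are ignored.
    Returns None when no content could be extracted (too few tokens).
    """
    sh = shingles(body)
    if not sh:
        return None
    checksums = []
    for seed in range(n_checksums):
        # Salting the hash input with the seed yields n different hash functions.
        checksums.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in sh
        ))
    return tuple(checksums)


def is_duplicate(sig_a, sig_b) -> bool:
    """Rule described in the post: one matching checksum marks a duplicate."""
    if sig_a is None or sig_b is None:
        return False  # nothing to compare, so no trimming is possible
    return any(a == b for a, b in zip(sig_a, sig_b))


docs = {
    "Contract.docx": "this agreement is entered into by the parties named below",
    "Contract (copy).docx": "this agreement is entered into by the parties named below",
    "Scan.pdf": "",  # e.g. an image-only PDF with no extractable text
}
sigs = {name: signature(body) for name, body in docs.items()}
print(is_duplicate(sigs["Contract.docx"], sigs["Contract (copy).docx"]))  # True
print(sigs["Scan.pdf"])  # None
```

With an ideal hash function, each MinHash value matches with probability equal to the Jaccard index J of the two shingle sets, so under this assumption the any-of-four rule flags a pair with probability 1 − (1 − J)^4. For J = 0.5 that is already about 0.94, which illustrates why such a rule can feel coarse.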
Since SharePoint Server 2013 search result web parts have duplicate trimming activated by default and SharePoint 2013 uses a quite coarse algorithm for determining duplicates, you may see some unexpected results. Luckily, after installing the SharePoint 2013 Cumulative Update of July 2014 you have the option to deactivate duplicate trimming in the query builder settings.
Another way to accomplish the same thing is by changing the settings for grouping of results. As outlined in Customizing search results in SharePoint 2013, duplicate removal of search results is part of grouping. So if you choose to group on DocumentSignature, you are able to show near duplicates (where one of the 4 checksums differs) but still omit the “complete” duplicates.
But the most elegant solution is the one outlined by Elio in View duplicate results in SharePoint 2013 Search Center via Javascript, which allows you to change the “duplicate trimming” setting of the web part using JavaScript, so that your end users can decide for themselves whether or not they want to trust the SharePoint duplicate trimming algorithm.
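The difference between default trimming and grouping on DocumentSignature can be sketched with made-up signatures. This assumes, as described above, that trimming drops a result when any of its four checksums matches an already-shown result, while grouping only collapses results whose complete signature is identical; the checksum values and the function names are purely illustrative.

```python
from collections import defaultdict

# Hypothetical 4-checksum signatures (not real SharePoint values).
results = [
    ("Report v1.docx", (11, 22, 33, 44)),
    ("Report copy.docx", (11, 22, 33, 44)),  # complete duplicate of v1
    ("Report v2.docx", (11, 22, 99, 44)),    # near duplicate: one checksum differs
    ("Budget.xlsx", (55, 66, 77, 88)),       # unrelated document
]


def trim_any_match(items):
    """Default-style trimming: drop an item if any checksum matches a kept item."""
    kept = []
    for name, sig in items:
        if not any(any(a == b for a, b in zip(sig, k_sig)) for _, k_sig in kept):
            kept.append((name, sig))
    return [name for name, _ in kept]


def group_by_signature(items):
    """Grouping on the full signature: only identical signatures collapse."""
    groups = defaultdict(list)
    for name, sig in items:
        groups[sig].append(name)
    return [names[0] for names in groups.values()]


print(trim_any_match(results))
# ['Report v1.docx', 'Budget.xlsx']
print(group_by_signature(results))
# ['Report v1.docx', 'Report v2.docx', 'Budget.xlsx']
```

The near duplicate "Report v2.docx" disappears under any-match trimming but survives when grouping on the whole signature, while the exact copy is hidden in both cases.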
Thursday, April 02, 2015
Big Data and Internet of Things (IOT) links
Just a quick roundup of some interesting links to articles, whitepapers and videos on Big Data and IoT. I would be amazed if you haven’t heard of Big Data, but you might still want to take a look at these introductory blog posts, which mainly cover Big Data from a Microsoft perspective.
- Microsoft Big Data – Introducing Windows Azure HDInsight
- Microsoft Big Data – looking into the HDInsight Emulator
- Big Data – getting to the V that really matters
- Microsoft Big Data - Overview of Apache Hadoop components in HDInsight, from Ambari to Zookeeper
Other Big Data and Internet of Things (IOT) links:
- Ten examples of IoT and Big Data working well together – the success or failure of the Internet of Things hinges on big data – says Brian Hopkins, an analyst with Forrester Research
- The internet of things and big data: unlocking the power
- Why big data matters to Boeing and what it means for your next flight
- Our favorite 40+ Big Data use cases – what’s yours?
- How Google is using big data and machine learning to aid drug discovery
- Whitepaper: Process automation and IoT – Yokogawa’s approach
- How Pfizer is using Big Data to Power Patient Care
- Platform ecosystems will be the revolutionary foundation for IoT (Accenture Technology Vision 2015)
- Making the Internet of Things – TweetHeart – A NeoPixel Heart that is twitter sensitive
- Leveraging the Internet of Everything to create a better customer experience
- Lab of Things enables research and teaching
- Bosch pools Industry 4.0 expertise in the “Connected Industry” innovation cluster
- Data Science for IoT: the role of hardware in analytics
- The analytics of things (Deloitte)
- The Internet of Things: Cities as a Platform (Sogeti labs)
- Making the Internet of Things (Part 1) – Exploring the littoral space
- Internet of Things or Things on the Internet?
- IT Tomorrow – Internet of Things (Video in Dutch)
- 6 predictions for the $125 billion Big Data analytics market in 2015
Tuesday, March 31, 2015
Overview of Apache Hadoop components in HDInsight, from Ambari to Zookeeper
- Ambari – provides a provisioning, monitoring and management layer on top of Apache Hadoop clusters. It offers a web interface for easy management as well as a REST API.
- Flume – allows you to collect, aggregate and move large volumes of streaming data into HDFS in a fault tolerant fashion.
- HBase – provides NoSQL database functionality on top of HDFS. It is a columnar store, which provides fast access to large quantities of data. HBase tables can have billions of rows, and these rows can have an almost unlimited number of columns.
- HCatalog – provides a tabular abstraction on top of HDFS. Pig, Hive and MapReduce use this layer to make it easier to work with files in Hadoop. HCatalog has been merged into the Hive project; Hive uses it much like a master database. For more details check out Apache HCatalog – a table management layer that exposes Hive metadata to other Hadoop applications.
- Hive – allows you to perform data warehouse operations using HiveQL. HiveQL is a SQL-like language and provides an abstraction layer on top of MapReduce. Hive allows you to use Hive tables to project a schema onto the data (schema on read). Through the use of HiveQL you can view your data as a table and create queries just as you would in a normal database, with support for selects, filters, group by, equi-joins, etc. Hive inherits schema and location information from HCatalog. Hive acts as a bridge to many BI products which expect tabular data. One of the recent developments around Hive is the Stinger initiative, whose main aim is to deliver performance improvements while keeping SQL compatibility.
- Kafka – a fast, scalable, durable and fault-tolerant messaging system. It is commonly used together with Storm and HBase for stream processing, website activity tracking, metrics collection and monitoring, or log aggregation. It provides functionality similar to AMQP, JMS or Azure Event Hubs.
- Mahout – the goal of Mahout is to build scalable machine learning libraries. The main machine learning use cases Apache Mahout supports are recommender systems (people who buy x also buy y), classification (assigning data to discrete categories, e.g. is a credit card transaction fraudulent or not) and clustering (grouping unstructured data without any training data). For more details take a look at Introducing Mahout (IBM).
- Oozie – enables you to create repeatable, dynamic workflows for tasks to be performed in a Hadoop cluster. An Oozie workflow can include Sqoop transfers, Hive jobs, HDFS commands, MapReduce jobs, etc. Oozie will submit the jobs but MapReduce will execute them. Oozie also has built-in callback and polling mechanisms to check the status of jobs.
- Pegasus – provides large-scale graph mining capabilities by offering important graph mining algorithms such as degree calculation, PageRank calculation, random walk with restart (RWR), etc. Most graph mining algorithms have limited scalability and support up to millions of nodes; Pegasus scales to billion-node graphs. Graphs (also referred to as networks) are everywhere in real life, from web pages and social networks to biological networks and many more. Finding patterns, rules, etc. within these networks allows you to rank web pages (or documents), measure viral marketing, discover disease patterns, etc. The details of Pegasus can be found in the white paper Pegasus: a peta-scale graph mining system – implementation and observations.
- Pig – was developed to make data analysis on Hadoop easier. It is made up of two components: a high-level scripting language (which is called Pig Latin, although most people just refer to it as Pig) and an execution environment. Pig Latin is a procedural language which allows you to build data flows; it contains a number of built-in User Defined Functions (UDFs) to manipulate data. These UDFs allow you to ingest data from files, streams or other sources, make selections and transform the data. Finally, Pig stores the results back into HDFS. Pig scripts are translated into a series of MapReduce jobs that are run on Apache Hadoop. Users can create their own functions or invoke code in other languages such as JRuby, Jython and Java. Pig gives you more control over and optimization of the data flow than Hive does.
- RHadoop – is a collection of R packages that allow users to manage and analyze data with Hadoop in R, including the creation of map-reduce jobs. Check out Step-by-step guide to setting up an R-Hadoop system and Using RHadoop to predict website visitors to get started with some hands-on examples.
- Storm – a distributed real-time computation system; it supports a set of common stream analytics operations and provides guaranteed message processing with support for transactions. It was originally created by Nathan Marz (see History of Apache Storm and lessons learned), who came up with the term Lambda architecture for a generic, scalable and fault-tolerant data processing architecture.
- SQOOP – was built to transfer data from relational structured data stores (such as SQL Server, MySQL or Oracle) to Apache Hadoop and vice versa. Because Sqoop can handle database metadata, it is able to perform type-safe data movement using the data types specified in the metadata.
- Zookeeper – manages and stores configuration information. It is responsible for managing and mediating conflicting updates across your Hadoop cluster.
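Hive, Pig and Oozie all end up handing work to MapReduce. To illustrate what such a job does, here is a single-process Python simulation of the classic word count, roughly what a HiveQL query grouping by word with COUNT(*) boils down to. It uses no Hadoop API; `mapper`, `shuffle`, `reducer` and `run_job` are illustrative names for the phases, and on a real cluster the shuffle is performed by the framework across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort phase: group all emitted values by key, sorted by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: aggregate all values emitted for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain map, shuffle and reduce over the input lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())


print(run_job(["hive and pig", "pig runs on hadoop"]))
# {'and': 1, 'hadoop': 1, 'hive': 1, 'on': 1, 'pig': 2, 'runs': 1}
```

Because mappers only see one record at a time and reducers only see one key at a time, both phases can be spread over many machines, which is what lets higher-level tools like Hive and Pig scale.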
Thursday, March 26, 2015
People insights – data-driven insights regarding people
Whereas marketing, sales and finance departments have been using advanced analytics for quite a while, HR still seems to be in one of the early maturity phases of analytics usage. This view seems to be shared by CEOs: in a recent study, CEOs gave their HR departments a 5.9 (out of 10) for their analytical skills (see CEO niet overtuigd van analytische skills HR, “CEO not convinced of HR’s analytical skills”).
Although HR controls a lot of data (and needs to keep it up to date), it does not seem to be able to use this data to provide strategic advice to the board of directors. HR can only deliver truly added value by providing data-driven insights regarding people that are both compelling to business leaders and actionable by HR. This view is also nicely outlined by consultancy firm Inostix in their HR Analytics Value Pyramid (see The HR Analytics Value Pyramid (Part 3)). To make sure that the HR team stays current and viable, it will need to adopt a whole new set of skills, of which analytics is just one (see The reskilled HR team – transform HR professionals into skilled business consultants and the capability gap across the 2015 Human Capital Trends).
In a number of upcoming posts I will delve a little deeper into this topic and will show some practical examples of how you can realize some quick wins without a huge upfront investment.
Related links:
- What we learned about HR Analytics in 2014
- 17 differences between HR Metrics and Predictive HR Analytics
- Datafication of human capital
- Top 72 HR Analytics Influencers Part 3
- Business need to make better use of analytics to predict what they need than just recruiting
- Sink or swim: a tidal wave of technology is shaping HR
- How important is data analytics to the future of HR?
- Six takeaways from the HR Analytics Innovation Summit
- Is HR ready for the big data and analytics revolution?
- Making the business case for predictive talent analytics
- Leveraging predictive analytics to avoid a major point of hiring failure
SharePoint Saturday 2015 : How to build your own Delve, combining machine learning, big data and SharePoint
BIWUG is organizing the fifth edition of SharePoint Saturday Belgium – this year in Antwerp – for more information check out the site http://www.spsevents.org/city/Antwerp/Antwerp2015/ . Here is the excerpt of the session I will be delivering.
How to build your own Delve: combining machine learning, big data and SharePoint
You are experiencing the benefits of machine learning every day through product recommendations on Amazon & Bol.com, credit card fraud prevention, etc. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and next we will explore how we can combine tools such as Windows Azure HDInsight, R and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
Related posts:
- Microsoft Azure Machine Learning – the power to predict
- Data science dojo – Beginning AzureML video series
- Big Data – Beyond the hype, getting to the V that really matters
- Microsoft Big Data – Introducing Windows Azure HDInsight
Wednesday, March 04, 2015
BIWUG session on advanced integration between SharePoint Online and Yammer
On the 19th of March BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1903 – we have planned a great speaker and an interesting session
Advanced integration between SharePoint Online and Yammer using Yammer Apps (Speaker: Stephane Eyskens, SharePoint Technical Architect - http://www.silver-it.com/ )
First things first: the session will start by describing the steps required to bind an Office 365 tenant to an enterprise domain, how to federate on-premises users with Office 365 in order to have SSO in place, and how to bind Yammer to the Office 365 tenant. Next, developers will learn how to leverage the Yammer App Model to build deeper integration between SPO (+on-prem) and Yammer. Business scenarios such as leveraging Yammer's Open Graph in SPO workflows and associating Yammer Groups with SPO team sites (& groups) will be covered. Security aspects will be discussed as well: from acting on behalf of a user with his consent to impersonating the user completely, we'll see how to manage tokens and discuss some best practices.
Intended audience: The session is primarily intended for developers.
Key benefits: After this session, developers should have a clear view of how to go beyond the OOTB Yammer App integration with SPO and what Open Graph is all about.
Thanks also to Xylos for hosting this session.
Monday, March 02, 2015
Resetting content index in SharePoint Server 2013: why and how
Don’t just reset your search index in a production environment since this will also impact the analytics processing component (Read Reset the index in SharePoint Server 2013). Listed below is the syntax for the PowerShell command (the snippet below assumes that you only have one SearchServiceApplication)
# Assumes a single Search Service Application; Reset(disableAlerts, ignoreUnreachableServer)
(Get-SPEnterpriseSearchServiceApplication).Reset($true, $true)
The SearchServiceApplication.Reset method takes two parameters: public void Reset(bool disableAlerts, bool ignoreUnreachableServer). I would recommend always setting disableAlerts to true; the value for the second parameter will depend on your specific case. If you get a timeout when using the PowerShell cmdlet, you can use the steps outlined in SharePoint 2013 Content Index Reset Timeout; they worked for me.
Friday, February 13, 2015
Mindful apps – putting people at the center supported by data
Most of the characteristics outlined in the comparison between traditional and mindful apps (see table above) are not revolutionary, but there is one important key message.
Mindful apps will allow us to assess and compare options in a decision context, they will allow us to quickly respond to events and make the best decision given a specific context, and they will provide us with “extended intelligence” by understanding and recognizing patterns within the data at hand. We as humans are good at problem solving, pattern recognition, identifying outliers, making creative leaps and incorporating new information when making decisions. We should be able to focus on these high-end tasks by being freed from laborious and menial tasks which can be automated.
There are four different trends which will impact how these mindful apps will be shaped:
- User context matters – make it personal. When we make decisions or work within the context of specific processes, there are a lot of parameters which determine how we react or how we make decisions; these parameters should be integrated into the decision framework driving mindful apps. Our calendar, the availability of colleagues to reach out to, input from communications (e-mail, messaging or other formats), information that we capture from blogs, social networks such as LinkedIn or open data sources, together with the information available within your organization, should be filtered and at your fingertips. Machine learning and cognitive algorithms will drive the second machine age (a term coined by Brynjolfsson from MIT), but we are only at the start of discovering how these algorithms can drive the future workplace for information workers.
- Mobile shapes our expectations. Mobile apps and the user experience they provide are shaping how we see an ideal enterprise application as well. Mindful apps should strive to combine beauty, simplicity and purpose to create an experience that delights us and is effortless to use. Mobile apps are easy to understand: when people use a good app for the first time, they intuitively grasp the most important features, so why can’t we do the same for enterprise apps? Simplicity rules. The apps should also incorporate the necessary logic to evolve as the user grows more comfortable with them and explores more advanced functionality. Apps should learn people’s preferences over time and show the interface which is best suited for the task at hand.
- (Big) data and advanced analytics are the driving force. There is a lot of hype and confusion around the term Big Data, but one thing is for sure: storage and processing costs have dropped significantly in the last decade. When you combine this with the rise of new storage platforms such as Hadoop, NoSQL datastores such as HBase and Cassandra, and new data processing frameworks such as Apache Drill, Dremel and Spark, new opportunities arise to support users in their decision-making processes. While there is a lot of emphasis on the 4 Vs (Volume, Velocity, Variety and Veracity), there is one more V that you have to think about: Value (also see Big Data beyond the hype, getting to the V that really matters).
- Cloud will lead the way. A lot of the innovation which will enable this next generation of apps is coming out of the datacenters of Google, Amazon, LinkedIn, Microsoft, Yahoo, etc., but most organizations don’t have the same capacity (nor the same financial resources) as these internet giants. Luckily, the economies of scale offered by the cloud allow solution providers to offer you a data infrastructure which can scale from prototype size to production environments able to handle huge amounts of data. The major cloud players (IBM, Microsoft, Amazon and Google) all seem to be making big bets on building out the data analytics platform of the future, and this competition will drive prices further down. It will also force them to focus on more innovative solutions that allow them to differentiate themselves from the competition.
The future is already here — it's just not very evenly distributed. (William Gibson)
Thursday, February 05, 2015
Microsoft Azure Machine Learning–the power to predict
The best definition for Machine Learning – in my opinion – is from the excellent book “Introduction to Machine Learning (MIT Press 2014, Ethem Alpaydin)” (Use it as a reference – this is not an easy “how to” book)
The goal of machine learning is to program computers to use example data or past experience to solve a given problem.
In general when we want to solve a problem on a computer, we need an algorithm to transform using a set of instruction into an output. Unfortunately for some problems we do not know how to program such algorithms – such as for e-mail spam detection or predicting customer behavior. In most cases we have the input and output available e.g. a set of e-mails for which some are marked as spam. Based on this data, we would like a computer (or machine) to automatically extract the algorithm necessary to perform the classification. The algorithm does not need to be perfect but needs to be a good and useful approximation.
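To make the spam example concrete: instead of hand-writing rules, we let the machine estimate them from labelled examples. The sketch below uses a tiny multinomial naive Bayes classifier with add-one (Laplace) smoothing, which is just one of many possible algorithms and not one that the book or this post prescribes; the training e-mails are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Estimate per-label word counts and label frequencies from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for the given text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / total)  # prior P(label)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train(training)
print(classify(model, "win cheap money"))   # spam
print(classify(model, "meeting tomorrow"))  # ham
```

Nothing in `classify` encodes what spam looks like; the decision comes entirely from the example data, which is the point of the definition above.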
The term machine learning is tightly coupled to the domain of analytics (or data science – see Data Scientist: the sexiest job of the 21st century ). Analytics is concerned with the discovery and extraction of useful business patterns or mathematical decision models from a specific data set. For this a number techniques can be used, depending on the practitioners background they will probably favor a technique from their respective domain:
- Regression, General Linear Models (GLMs), decision trees, etc. (originating in the statistics domain)
- Machine learning algorithms such as support vector machines, neural networks, Bayesian methods, etc. (originating in the computer science domain)
But why should you care about machine learning? I think the picture below shows how the focus is shifting from traditional reporting (hindsight) to more advanced predictive and prescriptive analytics (foresight). This provides businesses with more added value, but it also requires business intelligence specialists to acquire new competencies such as machine learning and data mining. Examples vary across industries, but in general predictive analytics has the potential to change the way businesses make decisions. (For a more in-depth look, definitely pick up Predictive Analytics – The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel.)
Microsoft Azure Machine Learning distinguishes itself from other platforms and tools through a number of characteristics:
- Lets you build predictive models using only a web browser, by composing modules on a visual canvas (called Machine Learning Studio) without having to write code (although you can use R code snippets if you want). You can start quickly from existing sample experiments/models, or share your own experiments.
- Supports collaboration with anyone, anywhere in the world, using just your browser.
- Available as a cloud service, eliminating upfront costs for hardware resources.
- The different modules allow you to author an end-to-end machine learning workflow, from reading data to training and validating your predictive model.
- Ability to deploy models as web services. You can quickly operationalize your models by converting them into web services, and you even have the ability to monetize your machine learning models using Azure Data Market.
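To make the web service point more concrete, here is a minimal Python sketch of building a scoring request for such a service. It assumes the request/response JSON format used by the sample code that Machine Learning Studio generated at the time (an `Inputs` object with `ColumnNames` and `Values`, an empty `GlobalParameters` object, and the API key sent as a bearer token); treat that format as an assumption and check the API help page generated for your own web service. The URL, API key, the input name `input1` and the column names are hypothetical placeholders. The sketch only builds the request; actually sending it requires a real endpoint.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real values from your web service's API help page.
SCORING_URL = "https://example.invalid/score"
API_KEY = "<your-api-key>"

def build_scoring_request(column_names, rows, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a POST request in the assumed request/response format."""
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # API key sent as a bearer token
        },
        method="POST",
    )

# Hypothetical input row for a model with two feature columns
request = build_scoring_request(["age", "income"], [["42", "55000"]])
# Against a real endpoint you would send it with urllib.request.urlopen(request)
# and parse the JSON response with json.load.
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live service.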
References:
- Microsoft Azure Machine Learning product page
- Machine Learning Blog
- Introducing Microsoft Azure Machine Learning (TechEd Europe 2014 recording)
- Microsoft Learning on Azure (AzureConf 2014 recording)
- Extensibility and R Support in the Azure Machine Learning Platform
- How to upload an R package to Azure Machine Learning
- Vowpal Wabbit Modules in AzureML
- Predict What’s Next: How to get started with Machine Learning Part 1
- Predict What’s Next: How to get started with Machine Learning Part 2
- AzureML : a short introduction
Wednesday, January 28, 2015
BIWUG session–imec Share - an Office 365 customer case
I have uploaded yesterday's presentation to SlideShare – check it out.
Listed below are also a number of supporting links:
- Office365 Developer Patterns and Practices (GitHub)
- SharePoint Online: software boundaries and limits
- How to avoid getting throttled or blocked in SharePoint Online
- Give feedback about the Office Developer platform (including Office 365) – please vote for the ability to modify location-based metadata defaults using CSOM
- Get early access to new features in Office 365 and provide feedback with Uservoice
- Office 365 roadmap
- SharePoint Online Client Components SDK (June release)
- App parts/iframes are not the only solution – check out JavaScript injection in SharePoint Online – Office 365 Developer Patterns and Practices
- Transforming your SharePoint Full Trust Code to the SharePoint App Model
Wednesday, January 21, 2015
SharePoint Saturday Belgium 2015– Call for speakers
On April 18th 2015, BIWUG (www.biwug.be) is organizing the fifth edition of SharePoint Saturday Belgium. We invite you to submit a session for this year's event using this link: http://www.spsevents.org/city/Antwerp/Antwerp2015. You can submit multiple sessions. The call for speakers closes at the end of the day on February 18th.
SharePoint Saturday Belgium 2015 will take place in Antwerp – for more details check out http://www.spsevents.org/city/Antwerp/Antwerp2015. If you have any questions or remarks, do not hesitate to contact me.
Monday, January 05, 2015
Data Science Dojo– Beginning AzureML video series
An interesting video series to start with if you want to learn how to use Microsoft Azure Machine Learning (AzureML):
- Beginning Azure ML Part 1 – Importing Data, accessing and creating a new experiment
- Beginning Azure ML Part 2 – Reading external data sources
- Beginning Azure ML Part 3 – Data exploration and visualization
- Beginning Azure ML Part 4 - Preprocessing data part I: casting and renaming columns
- Beginning Azure ML Part 5 – Preprocessing data part II: scrub missing values and project columns
- Beginning Azure ML Part 6 - Feature engineering and R script
- Beginning Azure ML Part 7 – Building your First Model
- Beginning Azure ML Part 8 – Run and fine-tune multiple models
- Beginning Azure ML Part 9 – Deploying your first predictive model as a web service
- Beginning Azure ML Part 10 – Using R API to obtain predictions from your web service
- Beginning Azure ML Part 11 – Using Python API to obtain predictions from your web service
Monday, December 15, 2014
Introducing Azure Stream Analytics
Azure Stream Analytics, which is currently in preview, is a fully managed real-time stream analytics service that aims to provide highly resilient, low-latency and scalable complex event processing of streaming data for scenarios such as the Internet of Things, command and control (of devices) and real-time Business Intelligence on streaming data.
Although it might look similar to Amazon Kinesis, it seems to distinguish itself by aiming to increase developer productivity: you author streaming jobs in a SQL-like language that specifies the necessary transformations, and it provides a range of operators that are quite useful for defining time-based operations such as windowed aggregations (check out the Stream Analytics Query Language Reference for more information). Listed below is an example taken from the documentation which finds all toll booths that have served more than 3 vehicles in the last 5 minutes (see Sliding Window – slides by an epsilon and produces output at the occurrence of an event):
SELECT DateAdd(minute,-5,System.TimeStamp) AS WinStartTime, System.TimeStamp AS WinEndTime, TollId, COUNT(*)
FROM Input TIMESTAMP BY EntryTime
GROUP BY TollId, SlidingWindow(minute, 5)
HAVING COUNT(*) > 3
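For readers more at home in general-purpose code than in Stream Analytics, the following Python sketch mimics what the query above computes: for every arriving event it counts the events for the same TollId in the preceding 5 minutes and reports the booths whose count exceeds 3. It is an illustration under my own assumptions (events arrive sorted by EntryTime, the window is treated as (t - 5 min, t], and output is only produced when an event arrives), not how Stream Analytics is implemented; the function name and sample data are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def busy_toll_booths(events, window=timedelta(minutes=5), threshold=3):
    """events: (entry_time, toll_id) pairs sorted by entry_time.
    For each arriving event, count the events for the same toll booth in the
    window (entry_time - window, entry_time] and yield
    (win_start, win_end, toll_id, count) when the count exceeds threshold."""
    recent = defaultdict(deque)  # toll_id -> entry times still inside the window
    for entry_time, toll_id in events:
        times = recent[toll_id]
        times.append(entry_time)
        # Slide the window: drop entries at or before the window start
        while times[0] <= entry_time - window:
            times.popleft()
        if len(times) > threshold:
            yield entry_time - window, entry_time, toll_id, len(times)

# Hypothetical sample stream: (minutes after 09:00, TollId)
start = datetime(2014, 12, 15, 9, 0)
stream = [(start + timedelta(minutes=m), toll) for m, toll in
          [(0, 1), (1, 1), (2, 2), (3, 1), (4, 1), (6, 1), (7, 2)]]
for row in busy_toll_booths(stream):
    print(row)
```

With this sample stream, booth 1 is reported once, at 09:04, when its fourth vehicle within five minutes arrives; by 09:06 the 09:00 and 09:01 entries have slid out of the window, so the count drops back to 3.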
References
- Introducing Azure Stream Analytics: Processing on Near-Realtime Data (TechEd Europe 2014 recording)
- Microsoft adds IoT Streaming Analytics, Data Production and Workflow Services to Azure
- Telemetry and Data Flow at Hyper-Scale: Azure Event Hub
Thursday, December 11, 2014
SharePoint deep dive exploration: SharePoint alerting
This is the second in a series of blog posts on SharePoint Server 2013, in which we explore how e-mail alerting works in SharePoint 2013. For part 1, take a look at SharePoint deep dive exploration: looking into the SharePoint UserInfo table.
If you need to know more about how alerts work at the lowest level, you should take a look at the SharePoint 2003 Database tables documentation; for alerts, this documentation still seems to be valid. SharePoint stores the list of events for which users have requested alerts in the EventCache table. Compared to SharePoint 2003, there are some extra fields available (marked in bold). For some of the fields I did not find a description; these are marked with a question mark in the table below.
The other tables that are manipulated by the SharePoint alert framework are EventLog, ImmedSubscriptions, SchedSubscriptions and EventSubsMatches (for an in-depth discussion, also take a look at Working with search alerts in SharePoint 2010). Every event is recorded in these tables, but since the EventType and EventData columns contain the most data, they are only filled in when the list has at least one subscription.
So how does this work? There is a SharePoint timer job, called the “Immediate Alerts” job, which is scheduled to run every 5 minutes. It picks up the necessary event information and processes it (in batches of 10,000). If you see issues with alerts not being sent out, I recommend taking a look at SharePoint Scheduled Alerts Deconstructed.
| Column Name | Description |
| --- | --- |
| EventTime | Time when the event record was inserted into the database |
| SiteId | ID of the site, available from the AllSites table |
| WebId | ID of the web, available from the AllWebs table |
| ListId | ID of the list in which the monitored item appears |
| ItemId | ID of the item that raised the event |
| DocId | ID of the document that raised the event |
| Guid0 | ? |
| Int0 | ? |
| Int1 | ? |
| ContentTypeId | ? |
| ItemName | Full name of the item |
| ItemFullUrl | Full path of the item |
| EventType | ItemAdded (1), ItemModified (2), ItemDeleted (4), DiscussionAdded (16), DiscussionModified (32), DiscussionDeleted (64), DiscussionClosed (128), DiscussionActivated (256), … |
| ObjectType | |
| ModifiedBy | User name of the person who raised the event |
| TimeLastModified | Time when the event occurred |
| EventData | The binary large object (BLOB) containing all of the field changes with the old and new values for an item |
| ACL | The ACL for the item at the time it is edited |
| DocClientId | |
| CorrelationId | |
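The EventType values in the table are all powers of two, which suggests that they can be combined as a bit mask. Assuming that is the case (my assumption, not something the table states), a small Python helper can decode a raw EventType value read from the EventCache table into readable names. Only the values listed in the table are included, unknown bits are ignored, and the enum and function names are my own, not SharePoint identifiers.

```python
from enum import IntFlag

class AlertEventType(IntFlag):
    """EventType values from the EventCache table above (names are my own)."""
    ITEM_ADDED = 1
    ITEM_MODIFIED = 2
    ITEM_DELETED = 4
    DISCUSSION_ADDED = 16
    DISCUSSION_MODIFIED = 32
    DISCUSSION_DELETED = 64
    DISCUSSION_CLOSED = 128
    DISCUSSION_ACTIVATED = 256

def describe_event_type(value):
    """Return the names of the known flags set in a raw EventType value.
    Assumes the values combine as a bit mask; bits not listed above are ignored."""
    return [flag.name for flag in AlertEventType if value & flag]

print(describe_event_type(1))  # ['ITEM_ADDED']
print(describe_event_type(6))  # ['ITEM_MODIFIED', 'ITEM_DELETED']
```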
I started looking into these tables because a client reported that, after we migrated their environment from SharePoint 2007 to 2013, all e-mail alerts being sent out contained the wrong link. One of the first things I did was sit next to a user who was adding documents in SharePoint, and I noticed something strange: the user uploaded a document and, when filling in the extra metadata, immediately changed the name of the document.
After looking into how alerting works, I still could not explain why the links had been sent out correctly in 2007, because this should have failed there as well. So I used this PowerShell script to export all the e-mail alerts/subscriptions that users had in SharePoint, noticed that most of the alerts were on just a couple of libraries, and then I found it.
In SharePoint 2007, these libraries had “require check out” enabled by default. This means that when the user uploaded and renamed the document, it was not yet visible to other users and the alert was not sent out. If checkout is not required, files are immediately visible and the immediate “New Item Added” alert fires; this was the behavior they were seeing in 2013.
So “require checkout” is an interesting workaround to keep a file invisible until it is explicitly checked in. Because they changed the file properties (and even the file name) before the file was visible to users, the New Item alert was never triggered, and users were only notified through the “Changed Item” alert when the file was checked in.
We had deactivated “require check out” because it conflicts with co-authoring, but apparently they never used co-authoring in the specific libraries on which these alerts were set. So the moral of the story: don't just activate or change a feature because it is available in a new version, but first look at how people are actually using it.
References:
- Use Windows PowerShell to update alerts in SharePoint 2013
- SharePoint scheduled alerts deconstructed
- Working with search alerts in SharePoint 2010