CTRL+ALT+JOPX

Tuesday, June 16, 2015

Combining Dynamics CRM Online and Power BI Preview

I am a strong believer in the concept of “in-context analytics” and, as I outlined in Mindful apps – putting people at the center supported by data, I consider analytics and business intelligence to be essential in providing business value. So I was quite interested when I first learned about the Power BI preview with its built-in support for Dynamics CRM Online (for a great write-up about it, check out Previewing the New Power BI Experience with Dynamics CRM).

When I started playing around with it I was surprised that it seemed to do things quite differently from Power BI for Office 365 since I thought it was simply the next release of the existing Power BI for Office 365 offering. Apparently this is not the case.

Power BI Preview seems to be quite different from Power BI for Office 365 - for a detailed description of differences check out Power BI vs Power BI Preview: what’s the difference – here’s a quick summary:

Power BI for Office 365 is based on technologies such as Excel and SharePoint and is an integrated part of Office 365, whereas Power BI Preview is built on a separate platform.
Power BI Preview uses the browser and the Power BI Designer as design tools for creating dashboards and reports, whereas Power BI for Office 365 mainly relies on Excel as a design tool.
Power BI Preview also exposes an API which allows you to push data into the Power BI service – for more information check out the Power BI Developer Center. For a good introduction check out Developing for Power BI Overview (Video). I think this is the key enabler for real-time analytics on your data. To stay up to date, make sure that you follow the Power BI Development blog.
Power BI Preview has some new data visualizations available such as single number card tiles, combo charts, funnel charts, gauge charts, filled maps and tree maps (Check out Visualization types available in Power BI Reports)

If you check out the official documentation Use Power BI with Microsoft Dynamics Online (Technet) – it seems to focus on the new Power BI Preview but the Microsoft Dynamics CRM templates for Power BI that you can download for free from PinPoint - listed in the second section of the page - seem to be based on Power BI for Office 365. (Use Google Chrome to see the download link – I did not see it when using Internet Explorer 11)

When you actually try to use it in practice together with Dynamics CRM Online you will however encounter some serious limitations which are hopefully resolved by the summer release:

The Microsoft Dynamics CRM content pack for Power BI preview only exposes a limited set of 10 entities and associated measures – please vote for Dynamics CRM custom field and entity support if you think this should be extended. There is a workaround: export your Dynamics CRM Online data to Excel and then use the Excel data in Power BI (see the screenshot below for the available entity sets).
At the moment it is not possible to pass filters to the Power BI dashboards, which seems like an essential requirement for truly embedded analytics in Dynamics CRM using Power BI – you can however vote for this feature on the Microsoft Power BI Support site – Pass filters in URL.
To make matters even worse, it simply is not possible for the moment to embed Power BI Preview at all into Dynamics CRM – a feature which is available in Power BI for Office 365.

My guess is that the way forward will be Power BI Preview (or name it Power BI 2.0) and it will replace Power BI for Office 365 – you already see it appearing in the license management section of Office 365 (see screenshot below). But for the moment it is still a Preview and no specific release date has been made available so go for Power BI for Office 365 at the moment.


Technorati tags: Power+BI,visualization,Office365,analytics,CRM,Dynamics+CRM

Tuesday, June 09, 2015

Getting to grips with Dynamics CRM releases, updates and build numbers

For a couple of weeks now I have been working with Dynamics CRM. One of the things which is always challenging when starting to learn a new product is getting to understand the different versions and the changes between versions. When Microsoft was still on a 3-year release cycle for their products, this was quite easy to understand, but most Microsoft products are now on a much more frequent release schedule and Dynamics CRM is no exception.
Updates and improvements to Dynamics CRM are released twice a year – in what is commonly referred to as the spring and fall release – see Microsoft Dynamics CRM – Roadmap for 2015. Given the new “Cloud first” credo of Microsoft these updates can be a cloud only release as was the case with the Spring 2015 (Carina) release. For Dynamics CRM Online you are required to be on the current version ( n ) or the prior version ( n-1 ) but you have the choice to skip an update – see Manage Dynamics CRM Online Updates. Dynamics CRM on premise follows the standard lifecycle that you are accustomed to (see Microsoft Dynamics Support Lifecycle Policy FAQ and Microsoft Product Lifecycle Search for Dynamics CRM)

To make things a little more interesting, the Dynamics CRM product team seems to have chosen to use stars and constellations as code names for the different releases. Code names of the same genre are also used for products closely related to Dynamics CRM, such as Dynamics Marketing, Social Engagement and Parature Knowledgebase.
Recently Microsoft also changed the naming conventions for their updates and explained the version/build numbers that they are using now and for future releases – check out New naming conventions for Microsoft Dynamics CRM updates. The tables below summarize the different versions for the moment. As outlined in Greg Olsen's blog post – Microsoft Dynamics CRM 2015 Roadmap – the next version of Dynamics CRM is code named Ara. Another interesting tidbit from that post: “Not confirmed by Microsoft, but it is likely that On-Premises installations will have to wait for the CRM ‘ARA’ release during the Fall Wave in order to get the Carina new features and others.”

Product Name	Version description	Version number	Release or Update	Code Name
Microsoft Dynamics CRM Online	Fall ‘13	6.0.0	Major release	Orion
Microsoft Dynamics CRM Online	Fall ‘13	6.0.1	Incremental Update	-
Microsoft Dynamics CRM Online	Fall ‘13	6.0.2	Incremental Update	-
Microsoft Dynamics CRM Online	Spring ‘14	6.1.0	Minor release	Leo
Microsoft Dynamics CRM Online	2015 Update (Fall ‘14)	7.0.0	Major release	Vega
Microsoft Dynamics CRM Online	2015 Update 1 (Spring ‘15)	7.1.0	Minor release	Carina
Microsoft Dynamics CRM Online	t.b.d.	t.b.d.	t.b.d.	Ara

Table 1. Releases Microsoft Dynamics CRM Online

Product Name	Version description	Version number	Release or Update	Code Name
Microsoft Dynamics CRM (on premise)	2013	6.0.0	Major release	Orion
Microsoft Dynamics CRM (on premise)	2013 UR1	6.0.1	Incremental Update	-
Microsoft Dynamics CRM (on premise)	2013 UR2	6.0.2	Incremental Update	-
Microsoft Dynamics CRM (on premise)	2013 SP1	6.1.0	Minor release	Leo
Microsoft Dynamics CRM (on premise)	2015	7.0.0	Major release	Vega
Microsoft Dynamics CRM (on premise)	2015 Update 0.1	7.0.1	Incremental Update	-
Microsoft Dynamics CRM (on premise)	t.b.d.	t.b.d.	t.b.d.	Ara

Table 2. Releases Microsoft Dynamics CRM (on premise)
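Reading the rows of Table 1, the version number appears to follow a major.minor.update pattern: a change in the first number marks a major release, a change in the second a minor release, and a change in the third an incremental update. The sketch below only encodes that reading of the table; the helper name `classify_version` is hypothetical and the rule is inferred from the rows listed here, not taken from official documentation.

```python
def classify_version(version):
    """Classify a 'major.minor.update' string using the pattern seen in Table 1.

    Assumption: the rule is inferred from the table rows, not from Microsoft docs.
    """
    major, minor, update = (int(part) for part in version.split("."))
    if update != 0:
        return "Incremental Update"
    if minor != 0:
        return "Minor release"
    return "Major release"


for v in ("6.0.0", "6.0.2", "6.1.0", "7.0.0", "7.1.0"):
    print(v, "->", classify_version(v))
```

Each version number used in the loop comes from Table 1, so the printed labels can be compared directly with the "Release or Update" column.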

Thursday, April 09, 2015

SharePoint Server 2013 and business intelligence scenarios

With all the emphasis on Microsoft Power BI – people seem to forget that there still are some other options for setting up a business intelligence solution based on SharePoint available for those of you who can’t go all in for a cloud solution (because of regulations, corporate policies or other reasons). Don’t get me wrong – I do believe that if you are standardized on Microsoft you should follow their “Cloud First” credo. Listed below are a number of links to get you started.

Technorati tags: SharePoint,Business+Intelligence,BI,Reporting+Services

SharePoint Deep Dive exploration: explaining duplicate detection in SharePoint Server 2013

This is the third post in a series of posts which try to delve a little deeper in the inner workings of SharePoint - for the previous post check out:

SharePoint Server can detect near duplicates of documents and will take this into account when displaying search results. In this post I will delve a little deeper into the underlying techniques being used. An important thing to keep in mind is that the way that duplicate documents are identified has evolved and changed in the different versions of SharePoint.

SharePoint Server 2007 detected duplicates using a commonly used technique called "shingling". This is a generic technique which allows you to identify duplicates or near duplicates of documents (or webpages). Shingling has been widely used in different types of systems and software to identify spam and plagiarism or to enforce copyright protection. A shingle – which is more commonly referred to as a q-gram – is a contiguous subsequence of tokens taken from a document.
So if you want to see if two documents are similar, you can do this by looking at how many shingles they have in common. You do need to decide how long your subsequence of tokens should be – typically a value of 4 is used. This is formalized as S(d,w), the set of distinct shingles of width w contained in a document d. For example, for the line “a rose is a rose is a rose” with w=4 we get the following shingles: “a rose is a”, “rose is a rose” and “is a rose is”. If you want to compare the similarity between two sets, e.g. S(doc1) and S(doc2), the sets of distinct shingles of document1 and document2, you can use the Jaccard similarity index (or resemblance index): the number of shingles the two sets share divided by the number of distinct shingles in both sets combined. A Jaccard index of 0 means that the documents are completely dissimilar, whereas 1 points to identical documents. This would however mean that we need to calculate the similarity index for each pair of documents, which is quite an intensive task, so to speed up processing a form of hashing is used (for more details take a look at the explanation about near duplicates and shingling).
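The shingle and Jaccard definitions above can be sketched in a few lines of Python. This is only an illustration of the generic technique, not SharePoint's implementation: the function names `shingles` and `jaccard`, the whitespace tokenization and the second example sentence are assumptions made for the sketch, and the hashing step is left out.

```python
def shingles(text, w=4):
    """Return S(d, w): the set of distinct w-token shingles in the text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


doc1 = "a rose is a rose is a rose"
doc2 = "a rose is a flower is a rose"  # hypothetical near duplicate

s1 = shingles(doc1)
s2 = shingles(doc2)

print(sorted(s1))       # the three distinct shingles of doc1
print(jaccard(s1, s2))  # one shared shingle out of seven distinct ones
```

Changing a single word in the middle of the sentence already leaves only one of seven shingles in common, which shows how sensitive the measure is to the chosen width w.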

As items in SharePoint 2007 were indexed, these hashes were stored in the search database, in the MSSDuplicateHashes table. It is not really clear from the documentation whether these hashes only relate to the content of an item or to the properties as well (although this blog - Microsoft Office SharePoint Server 2007: Duplicate search results - states that they are based only on the content of a document).

In SharePoint Server 2013 these hashes are not stored in the MSSDuplicateHashes table anymore but in the DocumentSignature – this is documented in the article Customizing search results in SharePoint 2013. In the next screenshot I have used the and you will notice that although the document title and some metadata are different for the 5 documents, there are only 2 distinct document signatures. This indicates that the shingle is only calculated using the content of documents and not the metadata or the file name (Content By Search web parts don’t seem to use duplicate trimming). The document signature actually contains 4 checksums and if one of the four matches with another document, the document is treated as a duplicate. This also means that when SharePoint search encounters a document for which it is unable to extract the actual contents, it probably is not able to do proper duplicate trimming.

Since SharePoint Server 2013 search result web parts have duplicate trimming activated and SharePoint 2013 is using a quite coarse algorithm for determining a duplicate, you will see some unexpected results. Luckily after installing the SharePoint 2013 Cumulative Update July 2014 you will have the option to de-activate duplicate trimming within the query builder settings.

Another way to accomplish the same thing is by changing the settings for grouping of results. As outlined in Customizing search results in SharePoint 2013, duplicate removal of search results is a part of grouping. So if you specify to group on DocumentSignature, you would be able to show near duplicates (if one of the 4 checksums is different) but still omit the “complete” duplicates.

But the most elegant solution is the one outlined by Elio in View duplicate results in SharePoint 2013 Search Center via Javascript, which allows you to change the “duplicate trimming” setting of the web part using JavaScript – allowing your end users to decide for themselves whether or not they want to trust the SharePoint duplicate trimming algorithm.

Thursday, April 02, 2015

Big Data and Internet of Things (IOT) links

Just a quick roundup of some interesting links to articles, whitepapers and videos on Big Data and IoT. I would be amazed if you haven’t heard of Big Data, but you might still want to take a look at these introductory blog posts, which mainly cover Big Data from a Microsoft perspective.

Other Big Data and Internet of Things (IOT) links:

Technorati tags: big+data,IOT,analytics,Microsoft,HDInsight,IBM

Tuesday, March 31, 2015

Overview of Apache Hadoop components in HDInsight, from Ambari to Zookeeper

A couple of months ago I wrote a first post about Microsoft Big Data – Introducing Windows Azure HDInsight. In this post I will delve a little deeper into the different components which are used in HDInsight. This is not an exhaustive list of components but it lists a number of components which you might encounter when working on your first big data project using Microsoft Azure HDInsight.

Ambari – provides a provisioning, monitoring and management layer on top of Apache Hadoop clusters. It provides a web interface for easy management as well as a REST API.
Flume – allows you to collect, aggregate and move large volumes of streaming data into HDFS in a fault tolerant fashion.
HBase – provides NoSQL database functionality on top of HDFS. It is a columnar store, which provides fast access to large quantities of data. HBase tables can have billions of rows and these rows can have an almost unlimited number of columns.
HCatalog – provides a tabular abstraction on top of HDFS. Pig, Hive and MapReduce use this layer to make it easier to work with files in Hadoop. HCatalog has been merged into the Hive project. Hive uses it kind of like a master database. For more details check out Apache HCatalog – a table management layer that exposes Hive metadata to other Hadoop applications.
Hive – allows you to perform data warehouse operations using HiveQL. HiveQL is a SQL-like language and provides an abstraction layer on top of MapReduce. Hive allows you to use Hive tables to project a schema onto the data (schema on read). Through the use of HiveQL you can view your data as a table and create queries just as you would in a normal database, with support for selects, filters, group by, equi-joins, etc. Hive inherits schema and location information from HCatalog. Hive will act as a bridge to many BI products which expect tabular data. One of the recent developments around Hive is the Stinger initiative – its main aim is to deliver performance improvements while keeping SQL compatibility.
Kafka – is a fast, scalable, durable and fault-tolerant messaging system. It is commonly used together with Storm and HBase for stream processing, website activity tracking, metrics collection and monitoring or log aggregation. It provides functionality similar to AMQP, JMS or Azure Event Hub.
Mahout – the goal of Mahout is to build scalable machine learning libraries. The main machine learning use cases Apache Mahout supports are recommender systems (people who buy x also buy y), classification (assigning data to discrete categories, e.g. is a credit card transaction fraudulent or not) and clustering (grouping unstructured data without any training data). For more details take a look at Introducing Mahout (IBM).
Oozie – enables you to create repeatable, dynamic workflows for tasks to be performed in a Hadoop cluster. An Oozie workflow can include Sqoop transfers, Hive jobs, HDFS commands, MapReduce jobs, etc. Oozie will submit the jobs but MapReduce will execute them. Oozie also has built-in callback and pollback mechanisms to check for the status of jobs.
Pegasus – provides large scale graph mining capabilities by offering important graph mining algorithms such as degree calculation, PageRank calculation, random walk with restart (RWR), etc. Most graph mining algorithms have limited scalability: they support up to millions of nodes. Pegasus targets billion-node graphs. Graphs (also referred to as networks) are everywhere in real life, from web pages and social networks to biological networks and many more. Finding patterns, rules, etc. within these networks allows you to rank web pages (or documents), measure viral marketing, discover disease patterns, etc. The details of Pegasus can be found in the white paper Pegasus: a peta-scale graph mining system – implementation and observations.
Pig – was developed to make data analysis on Hadoop easier. It is made up of two components: a high level scripting language (which is called Pig Latin, but most people just refer to it as Pig) and an execution environment. Pig Latin is a procedural language which allows you to build data flows; it contains a number of built-in User Defined Functions (UDFs) to manipulate data. These UDFs allow you to ingest data from files, streams or other sources, make selections and transform the data. Finally Pig will store the results back into HDFS. Pig scripts are translated into a series of MapReduce jobs that are run on Apache Hadoop. Users can create their own functions or invoke code in other languages such as JRuby, Jython and Java. Pig gives you more control over, and more room to optimize, the flow of the data than Hive does.
RHadoop – is a collection of R packages that allow users to manage and analyze data with Hadoop in R, including the creation of map-reduce jobs. Check out Step-by-step guide to setting up an R-Hadoop system and Using RHadoop to predict website visitors to get started with some hands-on examples.
Storm – is a distributed real-time computation system. It supports a set of common stream analytics operations and provides guaranteed message processing with support for transactions. It was originally created by Nathan Marz (see History of Apache Storm and lessons learned) – the guy who came up with the term Lambda architecture for a generic, scalable and fault tolerant data processing architecture.
Sqoop – was built to transfer data from relational structured data stores (such as SQL Server, MySQL or Oracle) to Apache Hadoop and vice versa. Because Sqoop can handle database metadata, it is able to perform type-safe data movement using the data types specified in the metadata.
Zookeeper – manages and stores configuration information. It is responsible for managing and mediating conflicting updates across your Hadoop cluster.
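Several of the components above (Hive, Pig, Oozie) end up submitting MapReduce jobs, so it helps to have the programming model in mind. The following is a single-process sketch of the map, shuffle and reduce phases for a word count; it does not use any Hadoop API, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["pig hive pig", "hive oozie"])
print(counts)
```

On a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data between them; here everything runs in one process purely to show the data flow.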

Technorati tags: hadoop,mahout,storm,big+data,pig,hive,oozie,zookeeper,apache

Thursday, March 26, 2015

People insights – data driven insights regarding people

Whereas marketing and sales as well as financial departments have been using advanced analytics for quite a while, it seems that HR is still in one of the early maturity phases of analytics usage. This is a view which seems to be shared by CEOs. In a recent study CEOs gave their HR department a 5.9 (out of 10) for their analytical skills. (See CEO niet overtuigd van analytische skills HR )

Whereas HR controls a lot of data (and needs to keep it up to date), it does not seem to be able to use this data to provide strategic advice to the board of directors. HR can only deliver truly added value by providing data-driven insights regarding people that are both compelling to business leaders and actionable by HR. This is a view which is also quite nicely outlined by consultancy firm Inostix in their HR Analytics Value Pyramid (See The HR Analytics Value Pyramid (Part 3) ). To make sure that the HR team stays current and viable, it will need to adopt a whole new set of skills, of which analytics is just one (See The reskilled HR team – transform HR professionals into skilled business consultants and the capability gap across the 2015 Human Capital Trends).

In a number of upcoming posts I will delve a little deeper into this topic and will show some practical examples of how you can realize some quick wins without a huge upfront investment.

Related links:

SharePoint Saturday 2015 : How to build your own Delve, combining machine learning, big data and SharePoint

BIWUG is organizing the fifth edition of SharePoint Saturday Belgium – this year in Antwerp – for more information check out the site http://www.spsevents.org/city/Antwerp/Antwerp2015/ . Here is the excerpt of the session I will be delivering.

How to build your own Delve: combining machine learning, big data and SharePoint

You are experiencing the benefits of machine learning every day through product recommendations on Amazon & Bol.com, credit card fraud prevention, etc. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and next we will explore how we can combine tools such as Windows Azure HDInsight, R and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.

Technorati tags: SharePoint,big+data,azureml,machine+learning

Wednesday, March 04, 2015

BIWUG session on advanced integration between SharePoint Online and Yammer

On the 19th of March BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1903 – we have planned a great speaker and an interesting session

Advanced integration between SharePoint Online and Yammer using Yammer Apps (Speaker: Stephane Eyskens, SharePoint Technical Architect - http://www.silver-it.com/ )

First things first, the session will start by describing the steps required to bind an Office 365 Tenant with an Enterprise Domain, how to federate on-premises users with Office 365 in order to have SSO in place and how to bind Yammer to the Office 365 Tenant. Next, developers will learn how to leverage the Yammer App Model in order to build deeper integration between SPO (+ on-prem) and Yammer. Business scenarios such as leveraging Yammer's Open Graph in SPO Workflows and associating Yammer Groups to SPO Team sites (& groups) will be covered. Security aspects will be discussed as well: from acting on behalf of a user with their consent to impersonating them completely, we'll see how to manage tokens and discuss some best practices.

Intended audience: The session is primarily intended for developers.

Key benefits: After this session, developers should have a good visibility on how to go beyond the OOTB Yammer App integration with SPO and what Open Graph is all about.

Also thanks to Xylos for hosting this session

Technorati tags: BIWUG,community,sharepoint,sharepoint+online,Office365,Yammer,Development

Monday, March 02, 2015

Resetting content index in SharePoint Server 2013: why and how

When you are developing against SharePoint Server 2013 search, you might be forced to reset the search index. You can do this using the SharePoint user interface through the screen shown below or using PowerShell. I prefer to use PowerShell since resetting through the user interface seems to give me timeouts, especially when the index is quite large. One of the reasons why you might need to reset your content index is that your Search Service Application got into an unhealthy state because of insufficient disk space (see Fixing the Search Service after the Index Drive fills), but I also noticed that when you are working on your development machine and are making lots of changes to the search schema, it can be useful to reset the search index for your changes to be picked up. If you want to reset it using the user interface, go to the Search Administration screen of the Search Service Application and select the “Index Reset” option underneath the crawling section of the left menu.

Don’t just reset your search index in a production environment since this will also impact the analytics processing component (Read Reset the index in SharePoint Server 2013). Listed below is the syntax for the PowerShell command (the snippet below assumes that you only have one SearchServiceApplication)

(Get-SPEnterpriseSearchServiceApplication).Reset($true,$true)

The SearchServiceApplication.Reset method takes two parameters: public void Reset(bool disableAlerts, bool ignoreUnreachableServer). I would recommend setting disableAlerts to true. The value for the second parameter will depend on your specific case. If you also get a timeout when using the PowerShell cmdlet, you can use the steps outlined in SharePoint 2013 Content Index Reset Timeout – they worked for me.

Technorati tags: SharePoint,administration,SP2013,SPS2013,Microsoft

Friday, February 13, 2015

Mindful apps – putting people at the center supported by data

When preparing for my session The future of business process apps – a Microsoft perspective last year, I got inspired by this great article The future of enterprise apps: moving beyond workflows to mindflows – which introduced the concept of mindful apps. The core message is that if we want to automate the last mile, we have to analyze how people work day in and day out and start our system/application design with people at the center. One of the quotes mentioned in the article is from Bill Murphy (CTO of Blackstone, one of the largest investment funds worldwide) – “We aim to take away as much of the stress as possible from easy stuff, by automating the routine and mundane actions, and give users more time to focus on the higher-end pieces of what they need to do.”

Most of the characteristics which are outlined in the comparison between traditional and mindful apps are not revolutionary (See table above) but there is one important key message.
Mindful apps will allow us to assess and compare options in decision context, they will allow us to quickly respond to events and make the best decision given a specific context and will provide us with “extended intelligence” by understanding and recognizing patterns within the data at hand. We as humans are good at problem solving, pattern recognition, identifying outliers, making creative leaps and incorporating new information when making decisions. We should be able to focus on these high end tasks by being freed from laborious and menial tasks which can be automated.

There are 4 different trends which will impact how these mindful apps will be shaped:

User context matters – make it personal. When we make decisions or work within the context of specific processes, there are a lot of parameters which determine how we react or how we make decisions – these parameters should be integrated into the decision framework driving mindful apps. Our calendar, availability of colleagues to reach out to, input from communications (using e-mail, messaging or other formats), information that we capture from blogs, social networks such as LinkedIn or open data sources together with available information within our organization should be filtered and at our fingertips. Machine learning and cognitive algorithms will drive the second machine age (a term coined by Brynjolfsson from MIT) but we are only at the start of how these algorithms can drive the future workplace for information workers.
Mobile shapes our expectations. Mobile apps and the user experience they provide are shaping how we see an ideal enterprise application as well. Mindful apps should strive to combine beauty, simplicity and purpose to create an experience that delights us and that is effortless to use. Mobile apps are easy to understand: when people use a good app for the first time, they intuitively grasp the most important features. Why can’t we do the same for enterprise apps? Simplicity rules. The apps should also incorporate the necessary logic to evolve as the user grows more comfortable with their use and explores more advanced functionality. Apps should learn people’s preferences over time and show the interface which is best suited for the task at hand.
(Big) data and advanced analytics are the driving force. There is a lot of hype and confusion around the term Big Data but one thing is for sure – storage costs and processing costs have dropped significantly in the last decade. When you combine this with the rise of new storage platforms such as Hadoop, NoSQL datastores such as HBase, Cassandra, etc. and new data processing frameworks such as Apache Drill, Dremel, Spark, etc., new opportunities arise to support users in their decision making processes. While there is a lot of emphasis on the 4 Vs (Volume, Velocity, Variety and Veracity), there is one more V that you have to think about, and that is Value (Also see Big Data beyond the hype, getting to the V that really matters).
Cloud will lead the way. A lot of the innovation which will enable this next generation of apps is coming out of the datacenters of Google, Amazon, LinkedIn, Microsoft, Yahoo, etc., but most organizations don’t have the same capacity (nor the same financial resources) as these internet giants. Luckily the economies of scale offered by the cloud allow solution providers to provide you with a data infrastructure which can scale from prototype size to production environments able to handle huge amounts of data. The major cloud players – IBM, Microsoft, Amazon and Google – all seem to be making big bets on building out the data analytics platform of the future and this competition will drive prices further down. This competition will also force them to focus on more innovative solutions which allow them to differentiate themselves.

The best example of where we, as consumers, see the power of Big Data, analytics, machine learning and the cloud is mobile. The three major players (Microsoft, Apple and Google) are relying quite heavily on cloud computing power and huge data stores to provide the experience of digital assistants. Microsoft is currently working on Cortana (which has been released in a number of countries worldwide), Apple was definitely the trendsetter with Siri and Google has Google Now.

The future is already here — it's just not very evenly distributed. (William Gibson)

Technorati tags: machine+learning,analytics,big+data,collaboration,vision,apps,nwow,hnw,engaged+workplace,future+workplace

Thursday, February 05, 2015

Microsoft Azure Machine Learning–the power to predict

Microsoft Azure Machine Learning provides Machine Learning as a Service (on Microsoft Azure) and allows you to make your own applications more intelligent. Microsoft Azure Machine Learning initially started as an incubation project in Microsoft Research (codename Passau) and is part of the overall Microsoft Data Platform.
The best definition for Machine Learning – in my opinion – is from the excellent book “Introduction to Machine Learning (MIT Press 2014, Ethem Alpaydin)” (Use it as a reference – this is not an easy “how to” book)
The goal of machine learning is to program computers to use example data or past experience to solve a given problem.
In general when we want to solve a problem on a computer, we need an algorithm: a set of instructions that transforms an input into an output. Unfortunately for some problems we do not know how to program such algorithms – such as for e-mail spam detection or predicting customer behavior. In most cases we have the input and output available, e.g. a set of e-mails of which some are marked as spam. Based on this data, we would like a computer (or machine) to automatically extract the algorithm necessary to perform the classification. The algorithm does not need to be perfect but needs to be a good and useful approximation.
The term machine learning is tightly coupled to the domain of analytics (or data science – see Data Scientist: the sexiest job of the 21st century ). Analytics is concerned with the discovery and extraction of useful business patterns or mathematical decision models from a specific data set. For this a number of techniques can be used; depending on their background, practitioners will probably favor a technique from their respective domain:

Regression, General Linear Models (GLMs), decision trees, etc. (originating from the statistics domain)
Machine learning algorithms such as support vector machines, neural networks, Bayesian methods, etc. (originating from the computer science domain)

If we focus specifically on machine learning, we make a distinction between supervised and unsupervised learning. In supervised learning we try to find a mapping between a set of input variables and a specific output variable, using a set of known values to train a specific model. In unsupervised learning we try to find patterns in the input data without such an output variable.
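As a small illustration of the unsupervised case, the sketch below groups numbers into clusters without being told which group any of them belongs to. It is a bare-bones one-dimensional k-means in plain Python. The order amounts and the starting centers are assumptions made up for this example.

```python
def k_means_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move every center
    to the mean of the values assigned to it. Repeat a fixed number of times."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Keep the old center when no value was assigned to it
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical order amounts; note that no labels are supplied
order_amounts = [12, 15, 14, 13, 98, 102, 95, 105]
centers, groups = k_means_1d(order_amounts, centers=[0.0, 50.0])
print(centers)  # [13.5, 100.0]
print(groups)   # [[12, 15, 14, 13], [98, 102, 95, 105]]
```

The algorithm discovers a "small orders" and a "large orders" group on its own. In the supervised case you would instead hand it examples that already carry the answer, such as mails marked as spam.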

But why should you care about machine learning? I think the picture below shows how the focus is shifting from traditional reporting (hindsight) to more advanced predictive and prescriptive analytics (foresight), which will provide business with more added value but also requires new competencies from business intelligence specialists, such as machine learning and data mining. Examples across industries vary, but in general predictive analytics has the potential to change the way businesses make decisions (for a more in-depth look, definitely pick up Predictive Analytics – The power to predict who will click, buy, lie or die by Eric Siegel).

Microsoft Azure Machine Learning distinguishes itself from other platforms and tools by a number of different characteristics:

Allows you to jointly build predictive models from anywhere in the world using only a web browser, by making use of a visual composition canvas (called Machine Learning Studio) in which you combine modules without having to write code (although you can use R code snippets if you want). You can start quickly from existing sample experiments/models or you can share your own data experiments.
Work collaboratively with anyone from anywhere using just your browser.
Available as a cloud service, eliminating upfront costs for hardware resources.
The different modules allow you to author an end-to-end machine learning workflow, from reading data to training and validating your predictive model.
Ability to deploy models as web services. You can quickly operationalize your models by converting them into web services and you even have the ability to monetize your machine learning models using Azure Data Market.

The start location is the Microsoft Azure Machine Learning homepage - https://studio.azureml.net/ which contains a number of user guides as well as training videos - http://azure.microsoft.com/en-us/documentation/videos/index/?services=machine-learning . Another great way to get started is by looking at the different Azure Machine Learning Samples - http://azure.microsoft.com/en-us/documentation/services/machine-learning/models/ such as Azure Machine Learning Sample: Credit risk prediction (predict whether an applicant is a good credit risk based on the German Credit Card UCI dataset) and a clustering algorithm to identify similar companies from companies in the S&P 500, using text in published Wikipedia articles for these companies.

Technorati tags: azureml,azure,windows,microsoft,predictive+analytics,datascience,machine+learning

Wednesday, January 28, 2015

BIWUG session–imec Share - an Office 365 customer case

I have uploaded the presentation from yesterday on slideshare – check it out

imec Share - An Office 365 customer case from Joris Poelmans

Listed below are a number of supporting links:

Office365 Developer Patterns and Practices (GitHub)
SharePoint Online: software boundaries and limits
How to avoid getting throttled or blocked in SharePoint Online
Give feedback about the Office Developer platform (including Office 365) – please vote for the ability to modify location based metadata defaults using CSOM
Get early access to new features in Office 365 and provide feedback with Uservoice
Office 365 roadmap
SharePoint Online Client Components SDK (June release)
App parts/iframes are not the only solution – check out Javascript injection in SharePoint Online – Office 365 Developer Patterns and Practices
Transforming your SharePoint Full Trust Code to the SharePoint App Model

Technorati tags: Office365,SharePoint+Online,realdolmen,biwug

Wednesday, January 21, 2015

SharePoint Saturday Belgium 2015– Call for speakers

On April 18th 2015 BIWUG (www.biwug.be) is organizing its fifth edition of SharePoint Saturday Belgium. We invite you to submit a session for this year's SharePoint Saturday Belgium using this link - http://www.spsevents.org/city/Antwerp/Antwerp2015 . It is possible to submit multiple sessions. We will close the call for speakers on February 18th EOD.

SharePoint Saturday Belgium 2015 will take place in Antwerp – for more details check out http://www.spsevents.org/city/Antwerp/Antwerp2015. If you have any questions or remarks, do not hesitate to contact me.

Technorati tags: SharePoint+saturday,biwug,community,sharepoint,microsoft

Monday, January 05, 2015

Data Science Dojo– Beginning AzureML video series

Interesting video series to start with if you want to learn how you can use Microsoft Azure Machine Learning (AzureML)

Technorati tags: Machine+learning,windows+azure,azureML,azure,microsoft,predictive+analytics

Monday, December 15, 2014

Introducing Azure Stream Analytics

Azure Stream Analytics, which is currently in preview, is a fully managed real-time stream analytics service that aims at providing highly resilient, low-latency and scalable complex event processing of streaming data for scenarios such as Internet of Things, command and control (of devices) and real-time Business Intelligence on streaming data.

Although it might look similar to Amazon Kinesis, it seems to distinguish itself by aiming to increase developer productivity: you author streaming jobs using a SQL-like language to specify the necessary transformations, and it provides a range of operators which are quite useful for defining time-based operations such as windowed aggregations (check out Stream Analytics Query Language Reference for more information). Listed below is an example taken from the documentation which finds all toll booths that have served more than 3 vehicles in the last 5 minutes (see Sliding Window, which slides by an epsilon and produces output at the occurrence of an event):

SELECT DateAdd(minute,-5,System.TimeStamp) AS WinStartTime, System.TimeStamp AS WinEndTime, TollId, COUNT(*) 
FROM Input TIMESTAMP BY EntryTime
GROUP BY TollId, SlidingWindow(minute, 5)
HAVING COUNT(*) > 3

This SQL-like language allows non-developers to build stream processing solutions through the Azure Portal. It makes it easy to filter, project, aggregate and join streams, to combine static data (master data) with streaming data and to detect patterns within the data streams without developer intervention.
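To get a feel for what the sliding window query above computes, here is a rough Python simulation of the same idea. This is only a sketch and not how Azure Stream Analytics is implemented: it evaluates the window only when a new event arrives, it assumes the events are already sorted by entry time, and it assumes the window excludes its start and includes its end. The function name and the sample events are made up for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # SlidingWindow(minute, 5)
THRESHOLD = 3                  # HAVING COUNT(*) > 3

def busy_toll_booths(events):
    """Yield (window_start, window_end, toll_id, count) whenever a toll booth
    has served more than THRESHOLD vehicles in the WINDOW ending at an event.

    `events` is an iterable of (entry_time, toll_id), sorted by entry_time.
    """
    recent = defaultdict(deque)  # toll_id -> entry times still inside the window
    for entry_time, toll_id in events:
        times = recent[toll_id]
        times.append(entry_time)
        # Drop entries that are no longer inside (window_start, window_end]
        while times[0] <= entry_time - WINDOW:
            times.popleft()
        if len(times) > THRESHOLD:
            yield entry_time - WINDOW, entry_time, toll_id, len(times)

start = datetime(2014, 12, 15, 8, 0, 0)
sample = [
    (start + timedelta(seconds=0), 1),
    (start + timedelta(seconds=30), 1),
    (start + timedelta(seconds=60), 2),
    (start + timedelta(seconds=90), 1),
    (start + timedelta(seconds=120), 1),  # 4th vehicle at booth 1 within 5 minutes
    (start + timedelta(seconds=600), 1),  # much later: the window has emptied again
]
results = list(busy_toll_booths(sample))
for window_start, window_end, toll_id, count in results:
    print(window_start, window_end, toll_id, count)
```

With this sample only booth 1 is reported, once, with a count of 4. Compare the amount of bookkeeping here with the four-line query: that difference is the productivity argument for the SQL-like language.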

Azure Stream Analytics leverages cloud elasticity to scale up or scale down the number of resources on demand thereby providing a distributed, scale out architecture with very low startup costs. You will only pay for the resources you use and have the ability to add resources as needed. Pricing is calculated based on the volume of data processed by the streaming job (in GB) and the number of Streaming Units that you are using. Streaming Units provide the scale out mechanism for Azure Stream Analytics and provide a maximum throughput of 1MB/sec. Pricing starts as low as €0.0004/GB and €0.012/hr per streaming unit (roughly equivalent to less than 10€/month). It also integrates seamlessly with other services such as Azure Event Hub, Azure Machine Learning, Azure Storage and Azure SQL databases.
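The "less than 10€/month" figure can be checked with a quick calculation. The sketch below uses the two prices quoted above and assumes a single streaming unit that runs non-stop for a 30-day month. The 100 GB data volume is just an example figure.

```python
PRICE_PER_GB = 0.0004      # EUR per GB processed (price quoted above)
PRICE_PER_SU_HOUR = 0.012  # EUR per streaming unit per hour (price quoted above)
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, job running continuously

def monthly_cost(streaming_units, gb_processed):
    """Estimated monthly cost in EUR for a streaming job."""
    compute = streaming_units * HOURS_PER_MONTH * PRICE_PER_SU_HOUR
    data = gb_processed * PRICE_PER_GB
    return compute + data

print(f"{monthly_cost(1, 100):.2f} EUR")  # 8.68 EUR
```

The streaming unit accounts for 8.64 € of that amount, so at these volumes the cost is dominated by the number of streaming units and not by the data processed.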

Technorati tags: azure,stream+Analytics,microsoft,cloud,big+data

Thursday, December 11, 2014

SharePoint deep dive exploration: SharePoint alerting

This is the second in a series of blog posts on SharePoint Server 2013, in which we will explore how e-mail alerting works in SharePoint 2013. For part 1, take a look at SharePoint deep dive exploration: looking into the SharePoint UserInfo table.

If you need to know more about how alerts work at the lowest level, you should take a look at the SharePoint 2003 Database tables documentation – for alerts this documentation still seems to be valid. SharePoint stores the list of events for which users have requested alerts in the EventCache table – in comparison to SharePoint 2003 there are some extra fields available (marked in bold). For some of the fields I did not find a description.

The other tables which are manipulated by the SharePoint alert framework are EventLog, ImmedSubscriptions, SchedSubscriptions and EventSubsMatches (for an in-depth discussion also take a look at Working with search alerts in SharePoint 2010). Every event is recorded in these tables, but since the EventType and EventData columns contain the most data, these are only filled in when the list has at least one subscription.

So how does this work? There is a SharePoint timer job, called the “Immediate Alerts” job, which is scheduled to run every 5 minutes. It picks up the necessary event information and processes it (in batches of 10,000). If you see issues with alerts not being sent out, I recommend taking a look at SharePoint Scheduled Alerts Deconstructed.

Column Name	Description
EventTime	Time when the event record was inserted into the database
SiteId	ID of the site, available from the AllSites table
WebId	ID of the web, available from the AllWebs table
ListId	ID of the list in which the monitored item appears
ItemId	ID of the item that raised the event
DocId	ID of the document that raised the event
Guid0	?
Int0	?
Int1	?
ContentTypeId	?
ItemName	Full name of the item
ItemFullUrl	Full path of the item
EventType	ItemAdded (1), ItemModified (2), ItemDeleted (4), DiscussionAdded (16), DiscussionModified (32), DiscussionDeleted (64), DiscussionClosed (128), DiscussionActivated (256), …
ObjectType
ModifiedBy	User name of the person who raised the event
TimeLastModified	Time when the event occurred
EventData	The binary large object (BLOB) containing all of the field changes with the old and new values for an item
ACL	The ACL for the item at the time it is edited
DocClientId
CorrelationId
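The EventType values in the table are all powers of two, which suggests that they can be combined as bit flags. That is my assumption, and I did not verify it against the database. Under that assumption, a raw EventType value could be decoded with a small helper like the one below. The class, the function name and the sample values are made up for illustration, and only the values listed in the table are included.

```python
from enum import IntFlag

class EventType(IntFlag):
    """Event type values as listed in the EventCache table above."""
    ITEM_ADDED = 1
    ITEM_MODIFIED = 2
    ITEM_DELETED = 4
    DISCUSSION_ADDED = 16
    DISCUSSION_MODIFIED = 32
    DISCUSSION_DELETED = 64
    DISCUSSION_CLOSED = 128
    DISCUSSION_ACTIVATED = 256

def describe(value):
    """List the names of the known flags that are set in a raw EventType value."""
    return [flag.name for flag in EventType if value & flag]

print(describe(1))  # ['ITEM_ADDED']
print(describe(3))  # ['ITEM_ADDED', 'ITEM_MODIFIED']
```

Values that are not in the table (the list ends with "…") are simply ignored by this helper.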

The reason why I started looking into these tables is that I got feedback from a client that all e-mail alerts which were being sent out had the wrong link in them after we migrated their environment from SharePoint 2007 to 2013. One of the first things that I did was actually sit next to the user who was adding documents in SharePoint, and then I noticed something strange. The user uploaded a document and, when they needed to fill in extra metadata, they immediately changed the name of the document.

After looking into how alerting works I still did not get an explanation for why the links were sent out correctly before in 2007 – because this should have failed as well. So I used this PowerShell script to create an export of all the e-mail alerts/subscriptions that users had in SharePoint and I noticed that most of the alerts were on just a couple of libraries and then I found it.

In SharePoint 2007, they had “require check out” set by default on these libraries. This means that when the user uploaded and renamed the document, it was not yet visible to other users and the alert was not sent out. If check out is not required, the files are immediately visible and the “New Item Added” immediate alert is fired – this was the behavior that they were seeing in 2013.

So the “require checkout” is an interesting workaround to prevent a file from being visible before it is explicitly checked in. Since they were changing the file properties (and even the filename) before the file is visible to users, the New Item alerts would not trigger and users would only be notified of the “Changed Item” alert when the file was checked in.

The reason why we deactivated “require check out” was that it would conflict with co-authoring, but apparently they would never use this feature for the specific libraries on which these alerts were set. So the moral of the story: don’t just activate or change a specific functionality because it is available in a new version, but first look at how people are actually using it.

Technorati tags: sharepoint,alerts,sharepoint+2013,troubleshooting

BIWUG on blueprint for large scale SharePoint projects and display templates

On the 16th of December BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1612 because there are some great sessions planned.

SharePoint Factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury, Senior EIM Consultant at CGI Belgium responsible for CGI Belgium’s Microsoft Competency Centre, and the Digital Transformation Portfolio)

Large Notes 2 SharePoint transformations require a standardized approach in development and project management in order to assure delivery on time and in quality. The SharePoint Factory has been developed to allow parallel development of applications and to support all stages of the development process with standardized quality gates, test procedures and templates, for example requirements analysis templates. Essentially, the SharePoint Factory can be compared to an assembly line in the automotive industry. This approach is combined with a SharePoint PM as a Service offering, which is a blueprint for the management of large scale SharePoint projects and provides a specific PM process with SharePoint centric artefacts, checklists and documents. The approach has been developed within a 6.500 person day project in Germany and has already been published in the German .net Magazin, SharePoint Kompendium and the Dutch DIWUG Magazine.

Take your display template skills to the next level (Speaker: Elio Struyf, senior SharePoint consultant at Ventigrate - http://www.eliostruyf.com/)

Once you know how search display templates work and how they can be created, it is rather easy to enhance the overall experience of your sites compared with previous versions of SharePoint. In this session I will take you to the next level of display templates, where you will learn to add grouping, sorting, loading more results, and more. This session is aimed at people who already have a basic understanding of what search display templates are and how they can be created.

18:00 - 18:30 ... Welcome and snack

18:30 - 19:30 ... SharePoint factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury)

19:30 - 19:45 ... Break

19:45 - 20:45 ... Take your display template skills to the next level ( Speaker: Elio Struyf )

20:45 - … ... SharePint!

Technorati tags: SharePoint,biwug,community,microsoft