Wednesday, August 19, 2015

Using Microsoft Power BI Desktop to build Dynamics CRM Online reports Part 2

In the previous post I showed a very simple example of how you can create a report in Power BI Desktop – in this post I will show you how to extend this simple example. First we will create a new dataset based on Opportunity data in Dynamics CRM Online, but this time we will add extra columns by specifying them in the OData query (https://[yourtenantname].crm4.dynamics.com/xrmservices/2011/OrganizationData.svc/OpportunitySet?$select=CustomerId,EstimatedValue,SalesStage). Next we will expand the columns in the same way that we did in Part 1.

If you have worked with Dynamics CRM you probably know the concept of an Option Set (a.k.a. pick list), which lets you define a set of available choices for a specific field. Dynamics CRM stores the integer value (not the label) in its database. SalesStage is an example of such an Option Set, and you will notice that the OData query also exposes the integer value.



There are two ways of getting the labels back for the Option Set. One is simply using the “Replace Values” function in Power BI. The other is a more dynamic method using the PickListMappingSet, as outlined in Gotchas when using Power Query to retrieve Dynamics CRM Data – Part 2. Here is a brief summary of the steps:
  • Retrieve the PickListMappingSet and expand the ColumnMappingId column
  • Duplicate the PickListMappingSet and rename it to SalesStage (giving it the name of the Option Set makes the whole thing easier to understand)
  • Filter the ColumnMappingId.Name column to only include SalesStage values

  • Finally, merge the values of the SalesStage data source with the OpportunitySet data source by selecting Merge Queries (in the Combine section). You will probably notice that not all of the rows can be matched. This is because some of the records contain null values, so you should decide up front how you are going to clean up your data for these types of input errors. Also keep in mind that option sets that you create yourself are not exposed in the PickListMappingSet, so for those you will need to use “Replace Values”. You can vote on Connect for Make user created option sets also available through the PickListMappingSet odata table

  • After the merge a new column of type “Table” is added; click to expand it and keep the “SourceValue” column
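The PickListMappingSet approach is essentially a filter followed by a left outer join. The Python sketch below mimics those steps outside Power BI on hypothetical in-memory rows. `SourceValue` and `ColumnMappingId.Name` come from the steps above, while the integer key column `PickListValue`, the sample labels and the helper names are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical rows standing in for PickListMappingSet after expanding
# ColumnMappingId; "PickListValue" is an assumed name for the integer key.
PICKLIST_MAPPINGS = [
    {"ColumnMappingId.Name": "SalesStage", "PickListValue": 0, "SourceValue": "Qualify"},
    {"ColumnMappingId.Name": "SalesStage", "PickListValue": 1, "SourceValue": "Develop"},
    {"ColumnMappingId.Name": "StatusCode", "PickListValue": 1, "SourceValue": "In Progress"},
]


def option_set_labels(mappings: list[dict], option_set: str) -> dict[int, str]:
    """Filter the mappings to one option set (the 'Filter' step) and index them by value."""
    return {
        m["PickListValue"]: m["SourceValue"]
        for m in mappings
        if m["ColumnMappingId.Name"] == option_set
    }


def merge_labels(rows: list[dict], column: str, labels: dict[int, str],
                 overrides: Optional[dict[int, str]] = None) -> list[dict]:
    """Left outer join: every row is kept, unmatched or null values get None.

    `overrides` plays the role of "Replace Values" for custom option sets
    that PickListMappingSet does not expose.
    """
    lookup = {**labels, **(overrides or {})}
    return [{**row, f"{column}.SourceValue": lookup.get(row.get(column))} for row in rows]


# Hypothetical opportunities; the second one has no sales stage.
opportunities = [
    {"CustomerId.Name": "Contoso", "SalesStage": 1},
    {"CustomerId.Name": "Fabrikam", "SalesStage": None},
]
merged = merge_labels(opportunities, "SalesStage",
                      option_set_labels(PICKLIST_MAPPINGS, "SalesStage"))
```

Unmatched or null values come back as None, which is exactly the point where you have to decide how to clean up the data; the `overrides` dictionary stands in for “Replace Values” on option sets you created yourself.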



Finally, I grouped the rows on CustomerId and SalesStage, sorted them by estimated revenue and filtered to keep the top 50 rows. Next I used a simple bar chart to display the data. I also played around with the colors of the data labels; check out the references listed below for some helpful links about color formatting.



References:

Monday, August 17, 2015

Using Microsoft Power BI Desktop to build Dynamics CRM Online reports Part 1


A couple of weeks ago I wrote a post about Combining Dynamics CRM Online and Power BI Preview, but since then a lot of exciting things have been released and announced. One of these is that Power BI Designer has been rebranded to Power BI Desktop and a lot of new functionality has been added. Power BI Desktop basically ties together Power Query, Power Pivot and Power View in a standalone application, removing the constraint of having to use Excel 2013 to design visualizations, but it also extends the existing functionality quite significantly.
The first thing that you need to do when you want to build reports is to get at the data. Microsoft Power BI Desktop supports a huge number of data sources, but the one I’m interested in is Dynamics CRM Online.


The Dynamics CRM Online data source actually uses the CRM OData endpoint. To find the exact URL, go to Settings>Customizations>Developer resources; it should look something like https://[yourtenantname].crm4.dynamics.com/XRMServices/2011/OrganizationData.svc/ . I also encourage you to install Dynamics XRM Tools 2015 since it contains an OData Query Designer tool which is quite useful.



The first step is deciding which columns you will need in your data model. I strongly recommend removing unneeded columns up front by specifying in the OData query which columns you need. This decreases the volume of data to be processed by Power BI Desktop, and it is easier to do this up front. So if, for example, you want to show the top opportunities based on estimated revenue, you can use the following query:
https://[yourtenantname].crm4.dynamics.com/xrmservices/2011/OrganizationData.svc/OpportunitySet?$select=CustomerId,EstimatedValue
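If you build these queries repeatedly (Part 2 adds SalesStage to the same query), a small helper keeps the $select list explicit. This is a minimal Python sketch that only assembles the URL string shown above; the function name is hypothetical, the tenant name contoso is a placeholder and the crm4 host is simply taken from the example URL.

```python
def odata_select_url(tenant: str, entity_set: str, columns: list[str],
                     host: str = "crm4.dynamics.com") -> str:
    """Build an OData query URL that only returns the listed columns."""
    if not columns:
        raise ValueError("specify at least one column to keep the payload small")
    base = f"https://{tenant}.{host}/xrmservices/2011/OrganizationData.svc"
    # $select takes a comma-separated list of attribute (schema) names
    return f"{base}/{entity_set}?$select={','.join(columns)}"


url = odata_select_url("contoso", "OpportunitySet", ["CustomerId", "EstimatedValue"])
```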

You will notice that the actual values are not being displayed – this is because both CustomerId and EstimatedValue contain complex values, which you can expand by clicking the expand icon in the column header.
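To see what the expand step does, here is a Python sketch that flattens the complex values by hand. The payload shape (a verbose JSON `d`/`results` wrapper, CustomerId as an entity reference with a `Name`, EstimatedValue as a money object with a string `Value`) is an assumption about what the CRM 2011 OData endpoint returns, and the sample records are made up.

```python
import json
from decimal import Decimal

# Assumed (verbose JSON) shape of one page of OpportunitySet results; sample data only.
SAMPLE_PAYLOAD = """
{"d": {"results": [
  {"CustomerId": {"Id": "1", "LogicalName": "account", "Name": "Contoso"},
   "EstimatedValue": {"Value": "12500.00"}},
  {"CustomerId": {"Id": "2", "LogicalName": "account", "Name": "Fabrikam"},
   "EstimatedValue": {"Value": null}}
]}}
"""


def expand_rows(payload: str) -> list[dict]:
    """Flatten the complex CustomerId and EstimatedValue values into plain columns."""
    expanded = []
    for row in json.loads(payload)["d"]["results"]:
        customer = row.get("CustomerId") or {}
        money = row.get("EstimatedValue") or {}
        raw = money.get("Value")
        expanded.append({
            "CustomerId.Name": customer.get("Name"),
            # Decimal avoids float rounding on currency amounts
            "EstimatedValue.Value": Decimal(raw) if raw is not None else None,
        })
    return expanded


rows = expand_rows(SAMPLE_PAYLOAD)
```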



Since we only need aggregated data (not the individual opportunities) for the different top customers, we are going to group the data by CustomerId and sum the estimated revenue.



Afterwards you can limit the data by only retrieving the top 20 rows.
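Grouping on CustomerId, summing the estimated revenue and keeping the top N rows can be sketched in Python as follows. The rows are assumed to be already expanded into flat dictionaries with the hypothetical keys CustomerId.Name and EstimatedValue.Value; opportunities without an estimated value are skipped, similar to how a Sum aggregation ignores nulls.

```python
from collections import defaultdict
from decimal import Decimal


def top_groups(rows: list[dict], n: int = 20,
               keys: tuple[str, ...] = ("CustomerId.Name",),
               value: str = "EstimatedValue.Value") -> list[tuple[tuple, Decimal]]:
    """Group rows on `keys`, sum `value`, sort descending and keep the top n groups."""
    totals: dict[tuple, Decimal] = defaultdict(Decimal)
    for row in rows:
        amount = row.get(value)
        if amount is None:
            continue  # ignore opportunities without an estimated value
        totals[tuple(row.get(k) for k in keys)] += amount
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical, already expanded opportunities.
opportunities = [
    {"CustomerId.Name": "Contoso", "EstimatedValue.Value": Decimal("100")},
    {"CustomerId.Name": "Fabrikam", "EstimatedValue.Value": Decimal("250")},
    {"CustomerId.Name": "Contoso", "EstimatedValue.Value": Decimal("200")},
    {"CustomerId.Name": "Litware", "EstimatedValue.Value": None},
]
top = top_groups(opportunities, n=2)
```

Passing a second key (for example a SalesStage label column) and `n=50` gives the kind of grouping used in Part 2.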



Finally we need to visualize the data on a report. In editing mode, we drop a bar chart control on the designer surface and define the data elements which need to be displayed.



Power BI Desktop also has a wide variety of display options that you can configure for your visualization, such as the options for the X and Y axis (show labels, start and end values), the colors to use for the data labels, and which display unit to use for the data labels (including precision). One thing that I’m still missing, though, is the option to show the actual values instead of display units for values below 1,000.



Finally, to make your report available to other users you will need to publish it. You can publish it either to Pyramid Analytics Server (an on-premises alternative to PowerBI.com which was announced at the end of July) or to PowerBI.com.


Fixing Windows 10 installation error: We couldn’t create a new partition or locate an existing one

Last week I installed Windows 10 on an Acer Iconia Tab W500 and I encountered an error while trying to create a new partition. The Iconia Tab does not have a DVD drive so the easiest way to start is by creating a bootable USB with the Windows USB/DVD Download Tool – next you will need to boot off this USB drive and you will see the Windows 10 installation screens appear.

First you have to select the language to install, the time and currency format and the keyboard input. Next click Install now. At this point you have the option to upgrade your existing OS or do a clean install. I selected Custom: Install Windows Only (advanced) since I did not need to keep the existing files and applications.


Next you need to decide where you are going to install Windows 10.



I removed both partitions I had, but when I tried to create a new partition I got the error “We couldn’t create a new partition or locate an existing one. For more information, see the Setup log files”. Workarounds such as removing the extra SD card or unplugging the USB stick did not resolve the issue. Luckily the steps detailed in the blog post Error: "We couldn't create a new partition or locate an existing one. For more information, see the Setup log files." when you try to install Windows 8 (CP) are still valid for Windows 10:
  1. Close the setup window and select “Repair>Advanced Tools”
  2. Go to the command window
  3. Start DISKPART.
  4. Type LIST DISK and identify your SSD disk number (from 0 to n disks).
  5. Type SELECT DISK <n> where <n> is your SSD disk number.
  6. Type CLEAN
  7. Type CREATE PARTITION PRIMARY
  8. Type ACTIVE
  9. Type FORMAT FS=NTFS QUICK
  10. Type ASSIGN
  11. Type EXIT twice (once to exit DiskPart, and once more to close the command prompt)
Afterwards just reboot and start the setup again – you will now be able to use the newly created partition.

Thursday, August 06, 2015

Tips and tricks for using search in SharePoint 2013

As I outlined in About intent, recall, relevance and precision of search solutions, building a good search solution is quite difficult. In this post I will focus on what content publishers and content “searchers” can do to make search in SharePoint 2013 work more efficiently.
Content publishers – I want my content to be found:
  • Use meaningful file names and titles for the documents that you add in SharePoint, by default SharePoint will show the filename (also check out Understanding title information shown in SharePoint 2013 search results and how to make it work better ) – use “_” (underscores) when you want to combine multiple words in the filename.
  • Use Promoted results to push results to the top of your search results page for specific keywords which are used within your organization. You can use the search logs as a starting point, but you can also survey your users to see which documents they look for on a daily basis.
  • Add metadata (also referred to as document properties or attributes of a document). Metadata allows an author to attach supplemental information to a document without touching its actual contents. When you use a file system you typically also have this kind of “meta-information” available, such as Created Date, Modified Date, Author etc. A file system however does not allow you to add additional metadata to documents, which is why people revert to creating a folder hierarchy where the folder names are used to describe the documents. SharePoint provides an alternative by allowing you to add metadata which can be used to sort, group and filter documents stored in a document library (take a look at Using folders in SharePoint document libraries : some guidance and tips about metadata and folders in SharePoint). Metadata is not only used for browsing and views, however: SharePoint search also uses it to determine relevancy and push documents higher in search results.
  • Publish documents which are relevant to a lot of people near your root site. Document location, file types, authoritative pages, and content language are things you can manipulate to improve a document’s relevance. The URL depth is quite important – the more slashes “/” (deeper in the site structure) in the URL of a document, the less valuable it is considered to be. The click distance between the document and what is called an authoritative page is also important. By default the home page URL of your SharePoint site is considered to be an authoritative page, but you can configure this yourself (see Configure authoritative pages in SharePoint 2013).
Content searchers - I am looking for content using search:
  • Use AND and OR as well as other search operators to limit or expand your search results. Always write these operators in capitals, otherwise they will be ignored.
  • Use wildcards to search for documents: you can add a “*” at the end of a term (but not at the beginning). In SharePoint 2013 you can even search for everything by just entering * in the search box, and then use the refiners and sorting to narrow down the results.
  • Use property searches to find documents when you know a word in the title or the filename. If you search for “filename:holiday*” (without a space after the colon) it will search for all documents with holiday in the filename. You can also search for specific types of documents by using filetype, e.g. filetype:docx searches for all Word documents.
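The searcher tips above boil down to a few rules for Keyword Query Language (KQL) strings: operators in capitals, trailing wildcards only, and no whitespace around the colon in a property restriction. The helper functions below are hypothetical and only build the text you would type into the search box; they do not call SharePoint.

```python
def prop(name: str, value: str) -> str:
    """Property restriction such as filename:holiday*; no whitespace around the colon."""
    if value.startswith("*") and value != "*":
        raise ValueError("wildcards are only supported at the end of a term")
    return f"{name}:{value}"


def combine(*terms: str, operator: str = "AND") -> str:
    """Join terms with AND/OR in capitals; terms containing spaces are parenthesised."""
    op = operator.upper()
    if op not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(f"({t})" if " " in t else t for t in terms)


query = combine(prop("filename", "holiday*"), prop("filetype", "docx"))
```

Upper-casing the operator inside `combine` means a lowercase "or" can never slip through and be treated as a plain search term.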
It is also important to keep in mind that it is quite easy to completely tailor the user experience of the SharePoint search center to make it look exactly as users want, so check out the links below. But sometimes simple things, such as the steps recommended in SharePoint 2013 Search: removing the junk from your search trunk, already yield great results.

References

Wednesday, August 05, 2015

Using SharePoint Online to store Dynamics CRM Online Documents

Integration between SharePoint and Dynamics CRM has been around since Dynamics CRM 2011. This integration allows you to link documents to Dynamics CRM entities while the documents are stored in SharePoint instead of the CRM database. It is actually a must-have for Dynamics CRM Online, since there is a built-in storage limit which initially is 5 GB and which increases by 2.5 GB per 20 licensed users. You can buy additional storage, but it is more expensive than storing the documents in SharePoint Online (see Microsoft Dynamics CRM Online : Service Description (whitepaper) for more details).

With the introduction of Dynamics CRM Online 2015 Update 1 there has been an important change: the Dynamics CRM list component for Microsoft SharePoint has been deprecated. Initially Dynamics CRM integrated with SharePoint using the Dynamics CRM list component for Microsoft SharePoint, but with Dynamics CRM 2013 SP1 server-based integration was introduced.
For the time being you still have two different options to integrate Dynamics CRM Online and SharePoint Online, one using the Dynamics CRM list component for Microsoft SharePoint and the other using server-based SharePoint integration, but the list component will no longer be supported after one of the next updates.

In this post I will explain what the server-based SharePoint integration in Dynamics CRM Online looks like from an end-user perspective (for more details about how to set it up, check out Setup CRM online to use SharePoint Online as well as Enable SharePoint document management for specific entities) and what you should watch out for. For a deep-dive technical description I recommend that you read SharePoint Integration Reloaded – Part 1.
In this example I have document management enabled for a custom entity called policy (this is an example from the Traviata CRM solution for Insurance Carriers) and document folders are automatically created based on a specific entity (this is something you will need to configure up front).


The automatic folder creation is actually a very useful feature, since you can also use these SharePoint folders to secure the documents (keep in mind that security/authorization needs to be configured separately in SharePoint Online), and it is also a way to overcome the list view limit in SharePoint Online (for an interesting discussion around folders using Dynamics CRM see Scalability considerations for CRM/SharePoint integration). The first time that you attach a document to a CRM record, a folder will be created based on the related entities. In our example we have an insurance policy for our contact/party Kristof, so it will create a folder structure based on the party entity.




Now you can upload your documents into SharePoint directly from within the CRM user interface. In the past this component used an iFrame, but technically it is now completely integrated.



You can directly upload documents to SharePoint within this interface but you can also create Office documents directly from within the interface using the Office Web Apps.



You can also work from within SharePoint and see the different documents from within the SharePoint document library.




You might also have noticed that there is a GUID in the folder name. From a developer perspective this seems like an interesting addition, although as an end user I don’t like it (also check out the comments section of the post CRM 2013 and SharePoint integration new feature for more debate around this “feature”). Unfortunately there are still some downsides to the server-based integration, which are clearly indicated in Important considerations for server-based SharePoint integration as well as New server to server integration with CRM Online and SharePoint provides much more limited functionality. Besides document management integration there are, however, some other interesting options such as integrating Dynamics CRM, SharePoint and OneNote, and I expect a lot more to become available in the upcoming releases of Dynamics CRM. For an overview of integration points check out the Integration Guide: Microsoft Dynamics CRM Online and Office 365.









Thursday, July 23, 2015

Dynamics CRM 2015 development and NuGet

To be honest I hadn’t used NuGet a lot in the past, but since I took up Dynamics CRM I decided to give it a try. For those of you who don’t know NuGet, it is a package manager for the Microsoft development platform and it has been integrated into Visual Studio from Visual Studio 2010 onwards. It allows you to add libraries (in the form of NuGet packages) to your own solutions; it will make the necessary config changes, add references etc. for you, and it will help you keep your projects up to date with the latest release of the SDK assemblies and tools.

If you need the Dynamics CRM 2015 assemblies you can search for “crm 2015 client” in the NuGet package manager. If you need the Dynamics CRM 2013 assemblies, you will need to use the Package Manager Console as outlined in Using Nuget for Dynamics CRM Development Part 1: Nuget basics and useful links.




Tuesday, June 16, 2015

Combining Dynamics CRM Online and Power BI Preview

I am a strong believer in the concept of “in-context analytics” and, as I outlined in Mindful apps – putting people at the center supported by data, I consider analytics and business intelligence to be essential in providing business value. So I was quite interested when I first learned about the Power BI preview with its built-in support for Dynamics CRM Online (for a great write-up about it check out Previewing the New Power BI Experience with Dynamics CRM).

When I started playing around with it I was surprised that it seemed to do things quite differently from Power BI for Office 365 since I thought it was simply the next release of the existing Power BI for Office 365 offering. Apparently this is not the case.

Power BI Preview seems to be quite different from Power BI for Office 365 - for a detailed description of differences check out Power BI vs Power BI Preview: what’s the difference – here’s a quick summary:
  • Power BI for Office 365 is based on technologies such as Excel and SharePoint and is an integrated part of Office 365, whereas Power BI Preview is built on a separate platform. 
  • Power BI Preview uses the browser and the Power BI Designer as design tools for creating dashboards and reports, whereas Power BI for Office 365 mainly relies on Excel as a design tool.
  • Power BI Preview also exposes an API which allows you to push data into the Power BI service – for more information check out the Power BI Developer Center. For a good introduction check out Developing for Power BI Overview (Video). This is something which I think is the key enabler for real-time analytics on your data. To stay up to date make sure that you follow the Power BI Development blog
  • Power BI Preview has some new data visualizations available such as single number card tiles, combo charts, funnel charts, gauge charts, filled maps and tree maps (Check out Visualization types available in Power BI Reports)
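To give an idea of what pushing data into the service looks like, the sketch below builds (but does not send) an HTTP request with Python's standard library that appends rows to a table. The v1.0 URL root, the /datasets/{id}/tables/{name}/rows path and the {"rows": [...]} body follow the Power BI REST API as I understand it (the 2015 preview used a "beta" version segment instead); the dataset ID, table name, row fields and token are placeholders, so check the Power BI Developer Center before relying on it.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: URL root of the Power BI REST API (the 2015 preview used "beta").
API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_push_rows_request(dataset_id: str, table_name: str,
                            rows: list[dict], access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST that appends rows to a table in a push dataset."""
    url = f"{API_ROOT}/datasets/{dataset_id}/tables/{quote(table_name)}/rows"
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )


# Placeholder values; a real call needs an Azure AD access token.
request = build_push_rows_request(
    "00000000-0000-0000-0000-000000000000", "Opportunities",
    [{"Customer": "Contoso", "EstimatedValue": 12500}], "<access-token>")
# urllib.request.urlopen(request) would actually send it.
```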


If you check out the official documentation Use Power BI with Microsoft Dynamics Online (Technet), it seems to focus on the new Power BI Preview, but the Microsoft Dynamics CRM templates for Power BI that you can download for free from PinPoint (listed in the second section of the page) seem to be based on Power BI for Office 365 (use Google Chrome to see the download link; I did not see it when using Internet Explorer 11).

When you actually try to use it in practice together with Dynamics CRM Online you will however encounter some serious limitations which are hopefully resolved by the summer release:



My guess is that the way forward will be Power BI Preview (or call it Power BI 2.0) and that it will replace Power BI for Office 365; you already see it appearing in the license management section of Office 365 (see screenshot below). But for the moment it is still a preview and no specific release date has been announced, so go for Power BI for Office 365 for now.




Tuesday, June 09, 2015

Getting to grips with Dynamics CRM releases, updates and build numbers

For a couple of weeks now I have been working with Dynamics CRM. One of the things which is always challenging when starting to learn a new product is understanding the different versions and the changes between versions. When Microsoft was still on a 3-year release cycle for their products, this was quite easy to understand, but most Microsoft products are now on a much more frequent release schedule and Dynamics CRM is no exception.
Updates and improvements to Dynamics CRM are released twice a year, in what are commonly referred to as the spring and fall releases (see Microsoft Dynamics CRM – Roadmap for 2015). Given Microsoft’s new “Cloud first” credo, an update can be a cloud-only release, as was the case with the Spring 2015 (Carina) release. For Dynamics CRM Online you are required to be on the current version (n) or the prior version (n-1), but you have the choice to skip an update (see Manage Dynamics CRM Online Updates). Dynamics CRM on premise follows the standard lifecycle that you are accustomed to (see Microsoft Dynamics Support Lifecycle Policy FAQ and Microsoft Product Lifecycle Search for Dynamics CRM).



To make things a little more interesting, the Dynamics CRM product team seems to have chosen stars and constellations as code names for the different releases. Code names of the same genre are also used for products closely related to Dynamics CRM, such as Dynamics Marketing, Social Engagement and Parature Knowledgebase.
Recently Microsoft also changed the naming conventions for their updates and explained the version/build numbers that they are using now and will use for future releases (check out New naming conventions for Microsoft Dynamics CRM updates). The tables below summarize the different versions at the moment. As outlined in Greg Olsen’s blog post Microsoft Dynamics CRM 2015 Roadmap, the next version of Dynamics CRM is code named Ara. Another interesting tidbit: “Not confirmed by Microsoft, but it is likely that On-Premises installations will have to wait for the CRM ‘ARA’ release during the Fall Wave in order to get the Carina new features and others.”
Product Name | Version description | Version number | Release or Update | Code Name
Microsoft Dynamics CRM Online | Fall ’13 | 6.0.0 | Major release | Orion
Microsoft Dynamics CRM Online | Fall ’13 | 6.0.1 | Incremental Update | -
Microsoft Dynamics CRM Online | Fall ’13 | 6.0.2 | Incremental Update | -
Microsoft Dynamics CRM Online | Spring ’14 | 6.1.0 | Minor release | Leo
Microsoft Dynamics CRM Online | 2015 Update (Fall ’14) | 7.0.0 | Major release | Vega
Microsoft Dynamics CRM Online | 2015 Update 1 (Spring ’15) | 7.1.0 | Minor release | Carina
Microsoft Dynamics CRM Online | t.b.d. | t.b.d. | t.b.d. | Ara
Table 1. Releases of Microsoft Dynamics CRM Online
Product Name | Version description | Version number | Release or Update | Code Name
Microsoft Dynamics CRM (on premise) | 2013 | 6.0.0 | Major release | Orion
Microsoft Dynamics CRM (on premise) | 2013 UR1 | 6.0.1 | Incremental Update | -
Microsoft Dynamics CRM (on premise) | 2013 UR2 | 6.0.2 | Incremental Update | -
Microsoft Dynamics CRM (on premise) | 2013 SP1 | 6.1.0 | Minor release | Leo
Microsoft Dynamics CRM (on premise) | 2015 | 7.0.0 | Major release | Vega
Microsoft Dynamics CRM (on premise) | 2015 Update 0.1 | 7.0.1 | Minor release | Carina
Microsoft Dynamics CRM (on premise) | t.b.d. | t.b.d. | t.b.d. | Ara
Table 2. Releases of Microsoft Dynamics CRM (on premise)


Thursday, April 09, 2015

SharePoint Server 2013 and business intelligence scenarios

With all the emphasis on Microsoft Power BI, people seem to forget that there are still other options for setting up a SharePoint-based business intelligence solution for those of you who can’t go all in on a cloud solution (because of regulations, corporate policies or other reasons). Don’t get me wrong: I do believe that if you are standardized on Microsoft you should follow their “Cloud First” credo. Listed below are a number of links to get you started.

SharePoint Deep Dive exploration: explaining duplicate detection in SharePoint Server 2013

This is the third post in a series of posts which try to delve a little deeper into the inner workings of SharePoint. For the previous posts, check out:



SharePoint Server can detect near duplicates of documents and will take this into account when displaying search results. In this post I will delve a little deeper into the underlying techniques being used. An important thing to keep in mind is that the way that duplicate documents are identified has evolved and changed in the different versions of SharePoint.

SharePoint Server 2007 detected duplicates using a common technique called "shingling". This is a generic technique which allows you to identify duplicates or near duplicates of documents (or web pages). Shingling has been widely used in different types of systems and software to identify spam or plagiarism, or to enforce copyright protection. A shingle, which is more commonly referred to as a q-gram, is a contiguous subsequence of tokens taken from a document.
So if you want to see whether two documents are similar, you can look at how many shingles they have in common. You do however need to determine how long your subsequence of tokens needs to be; typically a value of 4 is used. This is formalized as S(d,w), the set of distinct shingles of width w contained in a document d. For example, for the line “a rose is a rose is a rose” with w=4, we get the following shingles: “a rose is a”, “rose is a rose” and “is a rose is”. If you want to compare the similarity between two sets, e.g. S(doc1) and S(doc2), the sets of distinct shingles of document1 and document2, you can use the Jaccard similarity index (or resemblance index), defined as |S(doc1) ∩ S(doc2)| / |S(doc1) ∪ S(doc2)|. A Jaccard index of 0 means that the documents are completely dissimilar, whereas 1 points to identical documents. This would however mean that we need to calculate the similarity index for each pair of documents, which would be quite an intensive task; to speed up processing, a form of hashing is used (for more details take a look at the explanation about near duplicates and shingling).
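The shingling and Jaccard steps described above can be sketched in Python as follows, together with a MinHash estimate as one common example of the hashing used to avoid comparing full shingle sets for every pair. This illustrates the general technique only; it is not SharePoint’s implementation, and the seeded BLAKE2 hashing and the number of hash functions are my own choices.

```python
import hashlib


def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """S(d, w): the set of distinct contiguous w-token subsequences of a document."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: 0.0 for disjoint sets, 1.0 for identical ones."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed: int, shingle: tuple[str, ...]) -> int:
    # one seeded 64-bit hash per "hash function" in the MinHash signature
    data = f"{seed}|{' '.join(shingle)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set: set, num_hashes: int = 64) -> list[int]:
    """Keep only the minimum hash per seed instead of the whole shingle set."""
    if not shingle_set:
        raise ValueError("document has no shingles (fewer than w tokens)")
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of positions where the minima agree approximates the Jaccard index."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


rose = shingles("a rose is a rose is a rose")
flower = shingles("a rose is a flower")
```

Here `rose` contains exactly the three shingles listed above, and `jaccard(rose, flower)` is 1/4 because the two sets share only “a rose is a” out of four distinct shingles.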



As items in SharePoint 2007 were indexed, these hashes were stored in the search database. It is not really clear from the documentation whether these hashes relate only to the content of an item or to its properties as well (although this blog post, Microsoft Office SharePoint Server 2007: Duplicate search results, states that they are based only on the content of a document). So in SharePoint Server 2007 these hashes were stored in the MSSDuplicateHashes table.

In SharePoint Server 2013 these hashes are no longer stored in the MSSDuplicateHashes table but in the DocumentSignature managed property; this is documented in the article Customizing search results in SharePoint 2013. In the next screenshot I have used the and you will notice that although the document title and some metadata are different for the 5 documents, there are only 2 distinct document signatures. This indicates that the signature is calculated using only the content of documents and not the metadata or the file name (Content By Search web parts don’t seem to use duplicate trimming). The document signature actually contains 4 checksums, and if one of the four matches with another document, the document is treated as a duplicate. This also means that when SharePoint search encounters a document for which it is unable to extract the actual contents, it is probably not able to do proper duplicate trimming.


Since duplicate trimming is activated in the SharePoint Server 2013 search results web parts and SharePoint 2013 uses quite a coarse algorithm to determine duplicates, you will see some unexpected results. Luckily, after installing the SharePoint 2013 Cumulative Update of July 2014 you have the option to deactivate duplicate trimming within the query builder settings.
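If you query search through the REST API rather than through a web part, duplicate trimming can also be switched off per query. The Python sketch below only builds the URL; the `trimduplicates` parameter is part of the SharePoint 2013 search REST API, while the function name and site URL are placeholders and the doubling of single quotes follows the usual string-literal convention (an assumption worth verifying on your farm).

```python
from urllib.parse import quote


def search_query_url(site_url: str, query_text: str, trim_duplicates: bool = True) -> str:
    """Build a /_api/search/query URL, optionally with duplicate trimming disabled."""
    # querytext is passed as a quoted string literal; single quotes are doubled
    literal = query_text.replace("'", "''")
    url = f"{site_url.rstrip('/')}/_api/search/query?querytext='{quote(literal)}'"
    if not trim_duplicates:
        # ask the search service not to collapse documents it considers duplicates
        url += "&trimduplicates=false"
    return url


site = "https://contoso.sharepoint.com/sites/hr/"
url = search_query_url(site, "budget", trim_duplicates=False)
```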



Another way to accomplish the same thing is by changing the settings for grouping of results. As outlined in Customizing search results in SharePoint 2013, duplicate removal of search results is a part of grouping. So if you specify to group on DocumentSignature, you would be able to show near duplicates (if one of the 4 checksums is different) but still omit the “complete” duplicates.
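The difference between default trimming and grouping on the exact DocumentSignature can be sketched in Python as follows. The signature is modelled as a hypothetical tuple of four integers compared position by position; the real encoding of DocumentSignature is not covered here, so treat this as an illustration of the two rules rather than SharePoint’s code.

```python
from typing import NamedTuple

# Hypothetical model: four checksums per document, compared position by position.
Signature = tuple[int, int, int, int]


class Result(NamedTuple):
    title: str
    signature: Signature


def shares_a_checksum(a: Signature, b: Signature) -> bool:
    """Default trimming rule as described: one matching checksum is enough."""
    return any(x == y for x, y in zip(a, b))


def trim_duplicates(results: list[Result]) -> list[Result]:
    """Keep a result only if it shares no checksum with a result already kept."""
    kept: list[Result] = []
    for r in results:
        if not any(shares_a_checksum(r.signature, k.signature) for k in kept):
            kept.append(r)
    return kept


def group_on_signature(results: list[Result]) -> list[Result]:
    """Grouping on the full signature: only exact duplicates are collapsed."""
    seen: set[Signature] = set()
    kept: list[Result] = []
    for r in results:
        if r.signature not in seen:
            seen.add(r.signature)
            kept.append(r)
    return kept


results = [
    Result("Proposal v1", (1, 2, 3, 4)),
    Result("Proposal v1 (copy)", (1, 2, 3, 4)),
    Result("Proposal v2", (1, 9, 9, 9)),  # near duplicate: first checksum matches
    Result("Budget", (5, 6, 7, 8)),
]
```

With the default rule “Proposal v2” disappears because one checksum matches, whereas grouping on the full signature keeps it and only drops the exact copy.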



But the most elegant solution is the one outlined by Elio in View duplicate results in SharePoint 2013 Search Center via Javascript, which lets you change the “duplicate trimming” setting of the web part using JavaScript, so that your end users can decide for themselves whether or not they want to trust the SharePoint duplicate trimming algorithm.

Thursday, April 02, 2015

Big Data and Internet of Things (IOT) links

 

Just a quick roundup of some interesting links to articles, whitepapers and videos on Big Data and IoT. I would be amazed if you haven’t heard of Big Data, but you might still want to take a look at these introductory blog posts, which mainly cover Big Data from a Microsoft perspective.

Other Big Data and Internet of Things (IOT) links:

Tuesday, March 31, 2015

Overview of Apache Hadoop components in HDInsight, from Ambari to Zookeeper

A couple of months ago I wrote a first post about Microsoft Big Data – Introducing Windows Azure HDInsight. In this post I will delve a little deeper into the different components which are used in HDInsight. This is not an exhaustive list of components but it lists a number of components which you might encounter when working on your first big data project using Microsoft Azure HDInsight.


  • Ambari – provides a provisioning, monitoring and management layer on top of Apache Hadoop clusters. It provides a web interface for easy management as well as a REST API.
  • Flume – allows you to collect, aggregate and move large volumes of streaming data into HDFS in a fault tolerant fashion.
  • HBase – provides NoSQL database functionality on top of HDFS. It is a columnar store, which provides fast access to large quantities of data. HBase tables can have billions of rows, and these rows can have an almost unlimited number of columns.
  • HCatalog – provides a tabular abstraction on top of HDFS. Pig, Hive and MapReduce use this layer to make it easier to work with files in Hadoop. HCatalog has been merged into the Hive project; Hive uses it kind of like a master database. For more details check out Apache HCatalog – a table management layer that exposes Hive metadata to other Hadoop applications.
  • Hive – allows you to perform data warehouse operations using HiveQL. HiveQL is a SQL-like language and provides an abstraction layer on top of MapReduce. Hive allows you to use Hive tables to project a schema onto the data (schema on read). Through the use of HiveQL you can view your data as a table and create queries just as you would in a normal database, with support for selects, filters, group by, equi-joins, etc. Hive inherits schema and location information from HCatalog. Hive acts as a bridge to many BI products which expect tabular data. One of the recent developments around Hive is the Stinger initiative, whose main aim is to deliver performance improvements while keeping SQL compatibility.
  • Kafka – is a fast, scalable, durable and fault-tolerant messaging system. It is commonly used together with Storm and HBase for stream processing, website activity tracking, metrics collection and monitoring, or log aggregation. It provides functionality similar to AMQP, JMS or Azure Event Hub.
  • Mahout – the goal of Mahout is to build scalable machine learning libraries. The main machine learning use cases Apache Mahout supports are recommender systems (people who buy x also buy y), classification (assigning data to discrete categories, e.g. is a credit card transaction fraudulent or not) and clustering (grouping unstructured data without any training data). For more details take a look at Introducing Mahout (IBM).
  • Oozie – enables you to create repeatable, dynamic workflows for tasks to be performed in a Hadoop cluster. An Oozie workflow can include Sqoop transfers, Hive jobs, HDFS commands, MapReduce jobs, etc. Oozie submits the jobs but MapReduce executes them. Oozie also has built-in callback and polling mechanisms to check the status of jobs.
  • Pegasus – provides large-scale graph mining capabilities by offering important graph mining algorithms such as degree calculation, PageRank calculation, random walk with restart (RWR), etc. Most graph mining algorithms have limited scalability and support up to millions of nodes; Pegasus scales to billion-node graphs. Graphs (also referred to as networks) are everywhere in real life, ranging from web pages to social networks, biological networks and many more. Finding patterns, rules, etc. within these networks allows you to rank web pages (or documents), measure viral marketing, discover disease patterns, etc. The details of Pegasus can be found in the white paper Pegasus: a peta-scale graph mining system – implementation and observations.
  • Pig – was developed to make data analysis on Hadoop easier. It is made up of two components: a high-level scripting language (called Pig Latin, although most people just refer to it as Pig) and an execution environment. Pig Latin is a procedural language which allows you to build data flows; it contains a number of built-in functions to manipulate data. These functions allow you to ingest data from files, streams or other sources, make selections and transform the data. Finally Pig stores the results back into HDFS. Pig scripts are translated into a series of MapReduce jobs that are run on Apache Hadoop. Users can create their own functions (User Defined Functions, or UDFs) or invoke code in other languages such as JRuby, Jython and Java. Pig gives you more control over, and more room to optimize, the flow of the data than Hive does.
  • RHadoop – is a collection of R packages that allow users to manage and analyze data with Hadoop in R, including the creation of map-reduce jobs. Check out Step-by-step guide to setting up an R-Hadoop system and Using RHadoop to predict website visitors to get started with some hands-on examples.
  • Storm – is a distributed real-time computation system. It supports a set of common stream analytics operations and provides guaranteed message processing with support for transactions. It was originally created by Nathan Marz (see History of Apache Storm and lessons learned), who came up with the term Lambda architecture for a generic, scalable and fault-tolerant data processing architecture.
  • Sqoop – was built to transfer data from relational structured data stores (such as SQL Server, MySQL or Oracle) to Apache Hadoop and vice versa. Because Sqoop can handle database metadata, it is able to perform type-safe data movement using the data types specified in the metadata.
  • ZooKeeper – manages and stores configuration information. It is responsible for managing and mediating conflicting updates across your Hadoop cluster.
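Hive and Pig both hide MapReduce behind a higher-level language. As a rough illustration of what a query like SELECT page, COUNT(*) FROM logs GROUP BY page turns into, here is a minimal single-machine sketch in plain Python of the map, shuffle and reduce phases. It is not Hadoop, Hive or Pig code: the log format, the logs table and all function names are hypothetical and only serve to show the data flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample data: one "user,page" record per log line.
LOG_LINES = [
    "alice,/home",
    "bob,/products",
    "alice,/products",
    "carol,/home",
    "bob,/home",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (page, 1) pair for every log line."""
    for line in lines:
        _user, page = line.split(",", 1)
        yield page, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the values per key, which is what COUNT(*) amounts to here."""
    return {key: sum(values) for key, values in groups.items()}


def page_view_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Rough equivalent of: SELECT page, COUNT(*) FROM logs GROUP BY page"""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    for page, views in page_view_counts(LOG_LINES).items():
        print(page, views)
```

In Pig Latin the same data flow would read roughly as a GROUP logs BY page followed by a FOREACH ... GENERATE group, COUNT(logs). On a real cluster the map and reduce phases run in parallel across many nodes and the shuffle moves the intermediate pairs over the network, which is exactly the plumbing Hive and Pig generate for you.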

Thursday, March 26, 2015

People insights – data-driven insights regarding people

Whereas marketing, sales and finance departments have been using advanced analytics for quite a while, HR still seems to be in one of the early maturity phases of analytics usage. This view seems to be shared by CEOs: in a recent study, CEOs gave their HR departments a 5.9 (out of 10) for their analytical skills (see CEO niet overtuigd van analytische skills HR, Dutch for “CEO not convinced of HR’s analytical skills”).

Although HR controls a lot of data (and needs to keep it up to date), it does not seem able to use this data to provide strategic advice to the board of directors. HR can only deliver truly added value by providing data-driven insights regarding people that are both compelling to business leaders and actionable by HR. This view is also nicely outlined by consultancy firm Inostix in their HR Analytics Value Pyramid (see The HR Analytics Value Pyramid (Part 3)). To make sure that the HR team stays current and viable, it will need to adopt a whole new set of skills, of which analytics is just one (see The reskilled HR team – transform HR professionals into skilled business consultants and the capability gap across the 2015 Human Capital Trends).

In a number of upcoming posts I will delve a little deeper into this topic and will show some practical examples of how you can realize some quick wins without a huge upfront investment.


SharePoint Saturday 2015: How to build your own Delve, combining machine learning, big data and SharePoint

BIWUG is organizing the fifth edition of SharePoint Saturday Belgium, this year in Antwerp. For more information check out the site http://www.spsevents.org/city/Antwerp/Antwerp2015/. Here is the abstract of the session I will be delivering.

How to build your own Delve: combining machine learning, big data and SharePoint

You experience the benefits of machine learning every day through product recommendations on Amazon and Bol.com, credit card fraud prevention, etc. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and next we will explore how we can combine tools such as Windows Azure HDInsight, R and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.


Wednesday, March 04, 2015

BIWUG session on advanced integration between SharePoint Online and Yammer

On the 19th of March BIWUG (www.biwug.be) is organizing its next session. Don’t forget to register for BIWUG1903; we have planned a great speaker and an interesting session.

Advanced integration between SharePoint Online and Yammer using Yammer Apps (Speaker: Stephane Eyskens, SharePoint Technical Architect - http://www.silver-it.com/ )

First things first: the session will start by describing the steps required to bind an Office 365 tenant to an enterprise domain, how to federate on-premises users with Office 365 in order to have SSO in place, and how to bind Yammer to the Office 365 tenant. Next, developers will learn how to leverage the Yammer App Model in order to build deeper integration between SPO (+on-prem) and Yammer. Business scenarios such as leveraging Yammer's Open Graph in SPO workflows and associating Yammer Groups with SPO team sites (& groups) will be covered. Security aspects will be discussed as well: from acting on behalf of a user with their consent to impersonating them completely, we'll see how to manage tokens and discuss some best practices.

Intended audience: The session is primarily intended for developers.

Key benefits: After this session, developers should have good visibility into how to go beyond the OOTB Yammer App integration with SPO and what Open Graph is all about.

Thanks also to Xylos for hosting this session.

Monday, March 02, 2015

Resetting content index in SharePoint Server 2013: why and how

When you are developing against SharePoint Server 2013 search, you might be forced to reset the search index. You can do this through the SharePoint user interface (see the screen shown below) or using PowerShell. I prefer PowerShell since resetting through the user interface tends to give me timeouts, especially when the index is quite large. One reason to reset your content index is when your Search Service Application has ended up in an unhealthy state because of insufficient disk space (see Fixing the Search Service after the Index Drive fills). I also noticed that when you are working on your development machine and making lots of changes to the search schema, it can be useful to reset the search index for your changes to be picked up. If you want to reset the index using the user interface, go to the Search Administration screen of the Search Service Application and select the “Index Reset” option underneath the Crawling section of the left menu.



Don’t just reset your search index in a production environment, since this will also impact the analytics processing component (see Reset the index in SharePoint Server 2013). Listed below is the syntax for the PowerShell command (the snippet assumes that you only have one Search Service Application):

(Get-SPEnterpriseSearchServiceApplication).Reset($true,$true)

The SearchServiceApplication.Reset method takes two parameters: public void Reset(bool disableAlerts, bool ignoreUnreachableServer). I would recommend setting disableAlerts to true; the value for the second parameter will depend on your specific case. If you also get a timeout when using the PowerShell cmdlet, you can use the steps outlined in SharePoint 2013 Content Index Reset Timeout – they worked for me.

Friday, February 13, 2015

Mindful apps – putting people at the center supported by data

When preparing for my session The future of business process apps – a Microsoft perspective last year, I got inspired by the great article The future of enterprise apps: moving beyond workflows to mindflows, which introduced the concept of mindful apps. The core message is that if we want to automate the last mile, we have to analyze how people work day in and day out and start our system/application design with people at the center. One of the quotes mentioned in the article is from Bill Murphy (CTO of Blackstone, one of the largest investment funds worldwide): “We aim to take away as much of the stress as possible from easy stuff, by automating the routine and mundane actions, and give users more time to focus on the higher-end pieces of what they need to do.”


Most of the characteristics outlined in the comparison between traditional and mindful apps (see table above) are not revolutionary, but there is one key message.
Mindful apps will allow us to assess and compare options in a decision context, respond quickly to events and make the best decision given a specific context, and they will provide us with “extended intelligence” by understanding and recognizing patterns within the data at hand. We as humans are good at problem solving, pattern recognition, identifying outliers, making creative leaps and incorporating new information when making decisions. We should be able to focus on these high-end tasks by being freed from laborious and menial tasks which can be automated.




There are four different trends which will impact how these mindful apps will be shaped:
  • User context matters – make it personal. When we make decisions or work within the context of specific processes, there are a lot of parameters which determine how we react or how we make decisions; these parameters should be integrated into the decision framework driving mindful apps. Our calendar, the availability of colleagues to reach out to, input from communications (e-mail, messaging or other formats), information that we capture from blogs, social networks such as LinkedIn or open data sources, together with the available information within your organization, should be filtered and at your fingertips. Machine learning and cognitive algorithms will drive the second machine age (a term coined by Erik Brynjolfsson and Andrew McAfee of MIT), but we are only at the start of seeing how these algorithms can drive the future workplace for information workers.
  • Mobile shapes our expectations. Mobile apps and the user experience they provide are shaping how we see an ideal enterprise application as well. Mindful apps should strive to combine beauty, simplicity and purpose to create an experience that delights us and is effortless to use. Mobile apps are easy to understand: when people use a good app for the first time, they intuitively grasp the most important features. Why can’t we do the same for enterprise apps? Simplicity rules. Apps should also incorporate the logic necessary to evolve as users grow more comfortable with them and explore more advanced functionality. Apps should learn people’s preferences over time and show the interface best suited for the task at hand.
  • (Big) data and advanced analytics are the driving force. There is a lot of hype and confusion around the term Big Data, but one thing is for sure: storage and processing costs have dropped significantly in the last decade. When you combine this with the rise of new storage platforms such as Hadoop and NoSQL datastores such as HBase and Cassandra, and new data processing frameworks such as Apache Drill, Dremel and Spark, new opportunities arise to support users in their decision-making processes. While there is a lot of emphasis on the 4 Vs (Volume, Velocity, Variety and Veracity), there is one more V that you have to think about: Value (also see Big Data beyond the hype, getting to the V that really matters).
  • Cloud will lead the way. A lot of the innovation which will enable this next generation of apps is coming out of the datacenters of Google, Amazon, LinkedIn, Microsoft, Yahoo, etc., but most organizations don’t have the capacity (nor the financial resources) of these internet giants. Luckily the economies of scale offered by the cloud allow solution providers to provide you with a data infrastructure which can scale from prototype size to production environments able to handle huge amounts of data. The major cloud players (IBM, Microsoft, Amazon and Google) all seem to be making big bets on building out the data analytics platform of the future, and this competition will drive prices further down. It will also force them to focus on more innovative solutions which allow them to differentiate themselves from the competition.
The best example of where we, as consumers, see the power of Big Data, analytics, machine learning and the cloud is mobile. The three major players (Microsoft, Apple and Google) rely quite heavily on cloud computing power and huge data stores to provide the experience of digital assistants. Microsoft is currently working on Cortana (which has been released in a number of countries worldwide), Apple was definitely the trendsetter with Siri, and Google has Google Now.




The future is already here — it's just not very evenly distributed. (William Gibson)