Tuesday, August 19, 2014

Must read whitepapers on Enterprise Social Collaboration (Microsoft)

  • Enterprise Social Collaboration Progression Model (June 2013) - Microsoft and the Eller MBA program at the University of Arizona jointly developed the Social Collaboration Progression Model, which outlines six stages of social collaboration and their triggers, obstacles, and impacts. The six stages identified in this model form a progression that can be used to ascertain an organization's current state in the social collaboration paradigm. The paper identifies the prerequisites for moving to a chosen target stage, the obstacles that must be overcome, and the impacts of such a transition. It also gives a high-level view of how social collaboration applies at the divisional level within an organization.
  • Social computing in the enterprise – building a social computing strategy (December 2012) - This document describes the Microsoft vision for social computing in the enterprise and explains how to build an effective social computing strategy. It is designed to help C-level executives and enterprise architects appreciate the value of enterprise social computing, understand the Microsoft vision for enterprise social computing, and grasp what’s involved in building a strategy for social computing in the enterprise.
  • Explore enterprise social scenarios (June 2014) - Understand common scenarios for enterprise social that can be built with Microsoft products, including Yammer, Office 365, SharePoint Server, Lync, and Microsoft Dynamics CRM.


Monday, August 18, 2014

BIWUG–Call for new board members


During our last board meeting we received the news that both Patrick and Jethro have decided to give up their roles as board members. We'd like to thank them both for the efforts they put into BIWUG and SharePoint Saturday. Special thanks go to Patrick for re-launching the user group three years ago and contributing to the well-oiled machine that the board is at this moment.

So, we're again looking for members who'd like to take up a more active role inside our community. If you're interested in taking up some responsibilities, please let us know by sending me a DM on Twitter (@jopxtwits) or contact me by e-mail at joris.poelmans@biwug.be. The members who sent us a response last time are automatically considered (if you aren't interested any more, please let us know).

Again, thanks to Patrick and Jethro, and good luck to them with their new challenges.

Kind regards,

The BIWUG team


Thursday, August 07, 2014

Updating SharePoint 2013 user profiles using C#

I recently wrote a console program which needed to update some user profile properties for specific users. Unfortunately, the code gave me the following error when trying to create an SPServiceContext and access the UserProfileManager:

Microsoft.Office.Server.UserProfiles.UserProfileApplicationNotAvailableException was unhandled   HResult=-2146233088
Message=UserProfileApplicationNotAvailableException_Logging :: UserProfileApplicationProxy.ApplicationProperties ProfilePropertyCache does not have ….

To get this to work, make sure that the user account under which you run the console app has Full Control on the User Profile Service Application (see screenshot below). Go to SharePoint Central Administration > Manage Service Applications, select the User Profile Service Application and click Permissions in the ribbon.

Solve image rendering problems in Internet Explorer 11 on HP ZBooks

I recently got a new HP ZBook with Windows 8.1 and Internet Explorer 11 installed. While browsing different websites (Yammer, Twitter, Google, etc.) I noticed that images were not rendering properly (see screenshot below).

To resolve this you have to change the Accelerated graphics settings in Internet Explorer. Go to Tools > Internet Options, click the Advanced tab, and under Accelerated graphics check the option Use software rendering instead of GPU rendering.

Monday, August 04, 2014

Big Data – Beyond the hype, getting to the V that really matters

One of my favourite cartoons on big data, by Tom Fishburne, is shown below. As the saying goes, “Bad humor is an evasion of reality, good humor is an acceptance of it”, and the cartoon reveals an interesting fact about big data: even though it currently sits at the top of Gartner’s hype cycle (see Gartner’s 2013 hype cycle for emerging technologies, August 2013), there is still a lot of confusion out there. So let’s first try to get some clarity on the concept.

There are a lot of definitions of Big Data, but the one which is still widely used was coined by Gartner:
Big data is high Volume, high Velocity, and/or high Variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization (Gartner, The importance of Big Data: A definition, June 2012)

This 3V definition is actually not new: it was first coined by Doug Laney in February 2001 when he wrote about 3D data management (see Deja VVVu: Others claiming Gartner’s Construct for Big Data). In most definitions a fourth V (for veracity) is added, and Gartner has recently released a report which goes one step further and talks about the 12 dimensions of Big Data or Extreme Information Management (EIM). Let’s delve a little deeper into these 4 Vs.

Volume – constantly moving target
The size of the data that needs to be processed seems to be a constantly moving target. Big data, which was initially characterized as a few dozen TB in a single dataset, has now evolved to several PB, and the volume keeps increasing. The current idea is that data is characterized as big when its size breaks the barriers of traditional relational database management systems and when its volume prevents processing it in a cost-effective and fast enough manner.

There are a number of factors driving this tremendous growth. We currently live in an age where most information is “born digital”: it is created, by a person or a machine, specifically for digital use. Key examples are email and text messaging, GPS location data, metadata associated with phone calls (so-called CDRs or Call Detail Records), data associated with most commercial transactions (credit card swipes, bar code reads), data associated with portal access (key cards or ID badges), toll-road access, traffic cameras, but also, increasingly, data from cars, televisions and appliances, the so-called “Internet of Things”. IDC estimated that there existed 2.8 zettabytes (ZB) of data in the world in 2012, where one ZB = 1 billion TB. 90% of it was created in the past 2 years (IDC Digital Universe study, 2012).

In the near future, the amount of data will only increase, with the majority of this increase being driven by machine-generated data from sensors, RFID chips, NFC communications and other appliances. According to Cisco CTO Padmasree Warrior, we currently have 13 billion devices connected to the internet; this will increase to 50 billion in 2020 (see some predictions about the Internet of Things and wearable tech from Pew Research for more details).

Data capture has become nearly instantaneous in this digital age thanks to new customer interaction points and technologies such as web sites, social media and smartphone apps, but we are also still capturing data from traditional sources such as ATMs, point-of-sale devices and other transactional systems. These kinds of rapid updates present new challenges to information systems. If you need to react to information in real time, traditional data processing technology simply will not suffice. Data is in most cases only valuable when it is processed in real time and acted upon. Custom-tailored experiences like Amazon’s recommendation engine or personalized promotions are the new norm.

Variety of data
Much of the data is “unstructured”, meaning that it doesn’t fit neatly into the columns and rows of a standard relational database. The current estimate is that 85% of all data is unstructured. Most social data is unstructured (such as book reviews on Amazon, blog comments, videos on YouTube, podcasts or tweets), but clickstream data and sensor data from cars, ships, RFID tags and smart energy meters are also prime examples of unstructured data.

Connected devices that track your every heartbeat and know if you are sleeping or running hold the promise to usher in an era of personalized medicine. The debate about whether the “Quantified self” is the holy grail of personalized medicine or just hype is still ongoing.

Veracity is all about the accuracy or “truth” of the information being collected. Since you will be unlocking and integrating data from external sources which you don’t control, you will need to verify it. Data quality and data integrity are more important than ever. I will delve a little deeper into this topic in a future blog post.
As outlined in Big data opportunities in vertical industries (Gartner, 2013), the challenges and the opportunities differ by industry. But in the end it is always about the value of the data.

Value of data
Approaching Big Data as a data management challenge is very one-sided. It’s not really important to know how many PB or ZB of data your business has accumulated; the issue is how to get value out of the data. The key here is analytics. Analytics is what makes big data come alive. But the nature of big data will require us to change the way we analyze this data. Traditional reporting and historical analytics will not suffice and are often not suited for big data. You will need to look at predictive analytics, text analysis, data mining, machine learning, etc.

One of the most popular aspects of Big Data today is the realm of predictive analytics. This embraces a wide variety of techniques from statistics, modeling, machine learning, data mining, etc. These tools can be used to analyze historical and current data and make reliable projections about future or otherwise unknown events. This means exploiting patterns within the data to identify anomalies or unusual areas. These anomalies can represent risks (e.g. fraud, propensity to churn) or business opportunities such as cross-sell and up-sell targets, credit scoring optimization or insurance underwriting.
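To make the idea of spotting anomalies a bit more concrete, here is a minimal sketch of one of the simplest statistical techniques: flagging values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The function name, the threshold of 2.0 and the transaction amounts are all made up for illustration; real fraud or churn models are far more elaborate than this.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The threshold of 2.0 is an arbitrary choice for this illustration.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical daily transaction amounts for one customer (invented data).
daily_amounts = [52.0, 48.5, 50.1, 49.9, 51.2, 47.8, 50.5, 49.0, 250.0, 51.0]
anomalies = find_anomalies(daily_amounts)
print(anomalies)
```

With this invented data only the 250.0 transaction is flagged. Whether such an outlier is a risk (possible fraud) or an opportunity (a customer ready for an up-sell) is a business question that the statistics alone cannot answer.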

Still, a lot of challenges remain. According to the results of Steria’s Business Intelligence Maturity Audit, performed with 600 different companies in 20 different countries, only 7% of European companies consider Big Data to be relevant. On the other hand, McKinsey predicts an estimated 600 billion USD revenue shift by 2020 to companies that use Big Data effectively (Source: McKinsey, 2013, Game changers: five opportunities for US growth and renewal). In general, companies seem to struggle: 56% of companies say getting value out of big data is a challenge and 33% say they are challenged to integrate data across multiple sources.

Thursday, July 31, 2014

Microsoft Big Data– looking into the HDInsight Emulator

As outlined in a previous post, Introducing Windows Azure HDInsight, HDInsight is a pay-as-you-go solution for Hadoop-based big data processing running on Windows Azure. But you don’t even have to use Azure to develop and test your code: Microsoft has an HDInsight Emulator that runs locally on your machine and simulates the HDInsight environment in Azure.

The HDInsight emulator is installed through the Microsoft Web Platform Installer (WebPI) and requires a 64-bit version of Windows (Windows 7 SP1, Windows Server 2008 R2 SP1, Windows 8 or Windows Server 2012). It also seems to install without issues on Windows 8.1.

After you have installed it you will see three shortcuts on your desktop: Hadoop command line, Hadoop Name Node status and Hadoop MapReduce Status. You should also see three folders created on your local disk (Hadoop, HadoopFeaturePackSetup and HadoopInstallFiles) as well as a number of new Windows services. The HDInsight emulator is actually a single-node HDInsight cluster that includes both the name node and a data node and uses the local CPU for compute.

One of the sample MapReduce jobs which is included is WordCount. To try it out we will first download the complete works of Shakespeare from Project Gutenberg. The file is called pg100.txt and we copy it into the folder c:\hadoop\hadoop-1.1.0-SNAPSHOT. Next, just follow the steps as outlined in Run a word count MapReduce job. For most HDFS-specific file commands use hadoop fs -<command name>, where most of the commands mirror Linux system commands. For a full list of supported Hadoop file system commands, type hadoop fs at the Hadoop command prompt; for a full explanation check out the Hadoop FS Shell Guide. The Hadoop command line shell can also run other Hadoop applications such as Hive, Pig, HCatalog, Sqoop, Oozie, etc. To actually run the WordCount sample job on the HDInsight emulator run:

hadoop jar hadoop-examples.jar wordcount /user/hdiuser/input /user/hdiuser/output

Once the job is submitted you can track its progress using the MapReduce status web page. When the job is finished, the results are typically stored in a file named part-r-NNNNN, where NNNNN is the file counter.
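Once you have copied the result file to your local disk, a few lines of Python are enough to inspect it. The sketch below assumes the default text output of the WordCount sample, one word and its count per line separated by a tab; the function names and the sample lines are made up for illustration and are not real results from pg100.txt.

```python
def parse_wordcount_lines(lines):
    """Parse 'word<TAB>count' lines into a dict of word -> count."""
    counts = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        # Split on the last tab, in case the key itself contains a tab.
        word, _, count = line.rpartition("\t")
        counts[word] = int(count)
    return counts


def top_words(counts, n=3):
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Invented sample lines in the assumed output format.
sample_output = ["and\t5\n", "be\t3\n", "the\t7\n", "to\t3\n"]
counts = parse_wordcount_lines(sample_output)
print(top_words(counts, n=2))
```

To use it on a real result you would replace sample_output with an open file handle, for example open("part-r-00000", encoding="utf-8"), assuming that is the name of the file you copied locally.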

A second sample which is also included estimates Pi using Monte Carlo simulation. Check out this excellent video explaining the principle behind estimating Pi using Monte Carlo simulation. For an explanation of how to use this with an actual HDInsight cluster, check out the Pi estimator Hadoop sample in HDInsight.

hadoop jar hadoop-examples.jar pi "16","10000000"
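The principle behind this sample is easy to try out without Hadoop. The sketch below is my own simplified illustration, not the code of the Hadoop sample: it throws random points into the unit square, counts how many land inside the quarter circle of radius 1, and multiplies that fraction by 4. The two arguments mirror those of the command above (number of maps and samples per map), but the function names and the seeds are assumptions, and it uses Python's ordinary pseudo-random generator, which may differ from the sampling scheme of the Hadoop sample.

```python
import random


def count_hits(num_samples, seed):
    """The 'map' step: count random points that fall inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_maps, samples_per_map, base_seed=42):
    """The 'reduce' step: sum the hits of all maps and derive the estimate."""
    total_hits = sum(
        count_hits(samples_per_map, base_seed + i) for i in range(num_maps)
    )
    total_samples = num_maps * samples_per_map
    # Area of quarter circle / area of unit square = pi / 4.
    return 4.0 * total_hits / total_samples


if __name__ == "__main__":
    # Far fewer samples than in the Hadoop command, to keep it quick.
    print(estimate_pi(num_maps=16, samples_per_map=10_000))
```

Each call to count_hits is independent of the others, which is exactly why this problem spreads so well over many nodes: only a single number per map has to travel back to the reducer.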


Wednesday, July 30, 2014

Microsoft Big Data - Introducing Windows Azure HDInsight

Somebody once said - if you're going to stick around in this business, you have to have the ability to reinvent yourself, whether consciously or unconsciously.

This is the first in a series of blog posts about Big Data from a Microsoft perspective. I have always used my blog as a notebook, as it helps me to get a clearer view on different topics. I hope that you stay with me on this journey into the exciting world of big data.

One of the challenges of working with big data is that the volume and the expected growth of the data can be quite hard to predict. When starting with Big Data, a cloud platform is an ideal way to start given its pay-per-use model and flexible scalability. Another thing to consider is that Big Data technology evolves quite rapidly, and cloud providers such as Microsoft will evolve along with it, giving you the opportunity to work with the latest technology. So if you are just getting started and you have a Microsoft background, Windows Azure HDInsight might be a good place to start. Also remember that if you have an MSDN account you are eligible for monthly Azure credits of up to 150 USD.

Microsoft worked together with Hortonworks to build their Hadoop-based big data solution, the Hortonworks Data Platform (HDP). It exists in 3 different flavors:
  • HDInsight is an Apache Hadoop-based distribution running in the cloud on Azure. Apache Hadoop is an open source framework that supports data-intensive distributed applications. It uses HDFS storage to enable applications to work with 1000s of nodes and petabytes of data using a scale-out model.
  • Hortonworks Data Platform (HDP) for Windows is a complete installation package which can be installed on Windows Servers running on-premises or on virtual machines running in the cloud.
  • Microsoft Analytics Platform System (formerly called Microsoft PDW)
In this post we will focus on Windows Azure HDInsight. HDInsight 3.1, which contains HDP 2.1, is currently the default version for new Hadoop clusters (always check the Microsoft HDInsight release notes for the latest version info). At its core HDInsight provides the HDFS/MapReduce software framework, but related projects such as Pig, Hive, Oozie, Sqoop and Mahout are also included.

HDFS (Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware and is highly fault tolerant by nature. HDFS grew out of the work Doug Cutting and Mike Cafarella did on the Nutch search project around 2005, shortly before Cutting joined Yahoo, and was inspired by the Google GFS white paper (see an interview with Doug Cutting, the founder of Hadoop (April 2014) and How Yahoo spawned Hadoop, the future of Big Data). In Hadoop, a cluster of servers stores the data using HDFS: each node in the cluster is a data node and contains an HDFS data store and an execution engine. The cluster is managed by a server called the name node.

This distributed file system however poses some challenges for the processing of data, and this is where the MapReduce paradigm comes in, which was also inspired by Google (MapReduce: Simplified Data Processing on Large Clusters, 2004). The term itself refers to the two basic computations in distributed computing: map (determining where the data is located in the different nodes, moving the work to these nodes and producing intermediate results there) and reduce (bringing the intermediate results back together and combining them). These MapReduce functions are typically written in Java, but you can use Hadoop Streaming to plug in other languages.
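To get a feeling for the two steps, here is a small in-memory sketch of a word count written in the map/reduce style. It runs on a single machine and involves no Hadoop at all; the function names (map_phase, shuffle, reduce_phase) are my own and only illustrate the flow of key/value pairs, with the shuffle step standing in for the grouping that the framework does between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all intermediate values for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


lines = ["to be or not to be", "that is the question"]
result = word_count(lines)
print(result)
```

In a real cluster each call to map_phase would run on the node that holds that part of the input, and only the small intermediate pairs would travel over the network to the reducers.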

There are a number of advantages of using HDInsight:
  • You can quickly spin up a Hadoop cluster using the Azure Portal or using Windows PowerShell
  • You only pay for what you use. When your Hadoop processing jobs are complete, you can deprovision the cluster and retain the data, because Azure HDInsight uses Azure Blob storage as the default file system, which allows you to store data outside of the HDFS cluster.
  • Microsoft provides deep integration into the rest of their BI stack such as PowerPivot, Power View, Excel, etc.
  • HDInsight is exposed using familiar interfaces for Microsoft developers such as a .NET SDK (see for example Submit Hive jobs using HDInsight .NET SDK) and PowerShell
That’s it for now. In the next post I will delve a little deeper into how you can set up your first HDInsight cluster and the different architecture components, and I will look into the HDInsight emulator.



Wednesday, June 25, 2014

Driving sustainable user adoption for SharePoint and Yammer – Part I

A while ago I did a presentation at the Future of Business is sharing event on user adoption of collaboration technologies within companies. A good user adoption strategy (which is not the same as change management) is key in getting a collaborative environment accepted within a company.

Although SharePoint has been hugely successful in the past few years, there is still a gap in satisfaction between IT pros and business managers: SharePoint met the expectations of 73% of the former, and of 62% of the latter (Source: Microsoft SharePoint faces tough future, Forrester says). In my opinion, this is because in most organizations user adoption is just an afterthought. People often confuse user adoption with training, so a typical reaction is “let’s send our end users to some training”, probably a technical training about site collections, versioning, web parts, etc. And what happens? Five weeks after the training they fall back into the old way of working. Training is important, as it allows people to make the first jump, but it has to be contextual: if you have built a solution around managing projects with SharePoint, make a specific training on the benefits that people will get when using SharePoint.

But user adoption really is about people getting to know your solution, understanding it and using it in a correct manner. In the end the success of a deployment such as Office 365/Yammer or SharePoint Online is measured by sustained user adoption. Why the emphasis on “sustained”? A study by the Altimeter Group about usage of social computing tools found that after an initial spike in enthusiasm and usage, you typically see a gradual decline in usage until only limited groups within your company are using the solution [Altimeter Group – making the business case for Enterprise Social Networks]. This applies to collaboration tools in general.

Getting beyond the early adopters of a technology solution is not easy. In a perfect world, with only IT consultants (just kidding :-)), everything would work magically. In the real world, “Build it and they will come” simply does not work. One of the main reasons is that people are fundamentally resistant to change, so you will need to put some effort into explaining to them why they need to change. A new idea or way of working will only survive and become self-sustaining if it gets adopted by a critical mass of users (typically you will need more than 50% adoption).

The two most important things to focus on are Why and What: why do you need SharePoint (or Yammer, Yambla, …), and what are the business problems you are going to solve or mitigate? People know about file shares and work with them on a daily basis, and then along comes this new product, SharePoint. Until you can explain to them how to use it in their daily work and routines, they will not adopt it. Social, Yammer: why? What is the added value of using the SharePoint newsfeed or Yammer groups to someone in accounting? Don’t push a certain feature if you can’t answer why people would need it and how they can use it.

It is interesting to see that even after you have deployed a file sharing and collaboration solution, people still send out e-mails with attachments instead of a link to the file. There are two killer applications in an enterprise, Excel and e-mail. And even with companies advocating zero-email (but struggling to actually make it happen), I don’t see e-mail disappearing anytime soon. So instead of banning it, embrace it, integrate it into your solution and use it in a more efficient way.
In a next post I will talk about how to leverage a user adoption team to make your collaboration platform deployment a success.
“Success starts with deployment, it does not end with deployment”: deployment is necessary but not sufficient.

Future of business is sharing - IMEC Share, innovate, collaborate and excel