Monday, December 15, 2014

Introducing Azure Stream Analytics

Azure Stream Analytics, which is currently in preview, is a fully managed real-time stream analytics service that aims to provide highly resilient, low-latency and scalable complex event processing of streaming data for scenarios such as the Internet of Things, command and control (of devices) and real-time Business Intelligence.

Although it might look similar to Amazon Kinesis, it distinguishes itself by aiming to increase developer productivity: you author streaming jobs using a SQL-like language to specify the necessary transformations, and it provides a range of operators that are quite useful for defining time-based operations such as windowed aggregations (check out the Stream Analytics Query Language Reference for more information). Listed below is an example taken from the documentation which finds all toll booths that have served more than three vehicles in the last five minutes (see Sliding Window – it slides by an epsilon and produces output at the occurrence of an event).

SELECT DateAdd(minute,-5,System.TimeStamp) AS WinStartTime, System.TimeStamp AS WinEndTime, TollId, COUNT(*) 
FROM Input TIMESTAMP BY EntryTime
GROUP BY TollId, SlidingWindow(minute, 5)
HAVING COUNT(*) > 3

This SQL-like language allows non-developers to build stream processing solutions through the Azure Portal: it makes it easy to filter, project, aggregate and join streams, combine static (master) data with streaming data and detect patterns within the data streams without developer intervention.

 

Azure Stream Analytics leverages cloud elasticity to scale the number of resources up or down on demand, thereby providing a distributed, scale-out architecture with very low startup costs. You only pay for the resources you use and have the ability to add resources as needed. Pricing is calculated based on the volume of data processed by the streaming job (in GB) and the number of Streaming Units that you are using. Streaming Units provide the scale-out mechanism for Azure Stream Analytics; each unit provides a maximum throughput of 1 MB/s. Pricing starts as low as €0.0004/GB and €0.012/hr per streaming unit (roughly equivalent to less than €10/month per unit). It also integrates seamlessly with other services such as Azure Event Hubs, Azure Machine Learning, Azure Storage and Azure SQL Database.
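A quick back-of-the-envelope check on that figure: €0.012/hr × 24 h × 30 days ≈ €8.64 per streaming unit per month, which is indeed below the quoted €10/month (the per-GB data processing charge comes on top of this).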

References



Thursday, December 11, 2014

SharePoint deep dive exploration: SharePoint alerting

This is the second in a series of blog posts on SharePoint Server 2013 in which we will explore how e-mail alerting works in SharePoint 2013. For part 1 – take a look at SharePoint deep dive exploration: looking into the SharePoint UserInfo table.

If you need to know more about how alerts work at the lowest level you should take a look at the SharePoint 2003 Database tables documentation – for alerts this documentation still seems to be valid. SharePoint stores the list of events for which users have requested alerts in the EventCache table – in comparison to SharePoint 2003 there are some extra fields available. For some of the fields I did not find a proper description – these are marked with a question mark in the table below.

The other tables which are manipulated by the SharePoint alert framework are EventLog, ImmedSubscriptions, SchedSubscriptions and EventSubsMatches (for an in-depth discussion also take a look at Working with search alerts in SharePoint 2010). Every event is recorded in these tables, but since the EventType and EventData columns contain the most data, these are only filled in when the list has at least one subscription.

So how does this work? There actually is a SharePoint timer job called the “Immediate Alerts” job, which is scheduled to run every 5 minutes. It picks up the necessary event information and processes it in batches of 10,000. If you see issues with alerts not being sent out, I recommend taking a look at SharePoint Scheduled Alerts Deconstructed.

Column Name – Description
EventTime – Time when the event record was inserted into the database
SiteId – ID of the site, available from the AllSites table
WebId – ID of the web, available from the AllWebs table
ListId – ID of the list in which the monitored item appears
ItemId – ID of the item that raised the event
DocId – ID of the document that raised the event
Guid0 – ?
Int0 – ?
Int1 – ?
ContentTypeId – ?
ItemName – Full name of the item
ItemFullUrl – Full path of the item
EventType – ItemAdded (1), ItemModified (2), ItemDeleted (4), DiscussionAdded (16), DiscussionModified (32), DiscussionDeleted (64), DiscussionClosed (128), DiscussionActivated (256), …
ObjectType – ?
ModifiedBy – User name of the person who raised the event
TimeLastModified – Time when the event occurred
EventData – The binary large object (BLOB) containing all of the field changes with the old and new values for an item
ACL – The ACL for the item at the time it is edited
DocClientId – ?
CorrelationId – ?
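As an aside – alert subscriptions themselves are created through the UI or through the public server object model rather than by touching these tables directly. A minimal, illustrative C# sketch (site URL, library name and account are assumptions, not taken from the scenario above):

using System;
using Microsoft.SharePoint;

class AlertSketch
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://intranet"))      // assumed site URL
        using (SPWeb web = site.OpenWeb())
        {
            // Subscribe a user to immediate "new item" alerts on a document library.
            SPUser user = web.EnsureUser(@"CONTOSO\jdoe");        // assumed account
            SPAlert alert = user.Alerts.Add();
            alert.Title = "New documents";
            alert.AlertType = SPAlertType.List;
            alert.List = web.Lists["Documents"];                  // assumed library name
            alert.EventType = SPEventType.Add;
            alert.AlertFrequency = SPAlertFrequency.Immediate;
            alert.Update();

            // Enumerate the existing subscriptions on this web – handy when troubleshooting.
            foreach (SPAlert existing in web.Alerts)
            {
                Console.WriteLine("{0} - {1} - {2}", existing.Title, existing.User.LoginName, existing.AlertFrequency);
            }
        }
    }
}

Immediate alerts created this way end up in the ImmedSubscriptions table and are picked up by the “Immediate Alerts” timer job described above.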

The reason I started looking into these tables is that I got feedback from a client that all e-mail alerts being sent out had the wrong link in them after we migrated their environment from SharePoint 2007 to 2013. One of the first things I did was sit next to a user who was adding documents in SharePoint, and I noticed something strange: the user uploaded a document and, when they needed to fill in extra metadata, they immediately changed the name of the document.

After looking into how alerting works I still did not have an explanation for why the links were sent out correctly in 2007 – because this should have failed there as well. So I used this PowerShell script to create an export of all the e-mail alerts/subscriptions that users had in SharePoint, and I noticed that most of the alerts were on just a couple of libraries – and then I found it.

In SharePoint 2007, they had “require check out” set by default on these libraries – this means that when the user uploaded and renamed the document, it was not yet visible to other users and the alert was not sent out. If check-out is not required, the files are immediately visible and the “New Item Added” immediate alert is fired – this was the behavior they were seeing in 2013.

So “require check out” is an interesting workaround to prevent a file from being visible before it is explicitly checked in. Since they were changing the file properties (and even the filename) before the file was visible to other users, the New Item alerts would not trigger and users would only be notified through the “Changed Item” alert when the file was checked in.

The reason we deactivated “require check out” was that it would conflict with co-authoring, but apparently they never used that feature on the specific libraries for which these alerts were set. So the moral of the story: don’t just activate or change functionality because it is available in a new version – first look at how people are actually using it.

References:

 

BIWUG on blueprint for large scale SharePoint projects and display templates

 

On the 16th of December BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1612 because there are some great sessions planned.

SharePoint Factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury, Senior EIM Consultant at CGI Belgium responsible for CGI Belgium’s Microsoft Competency Centre, and the Digital Transformation Portfolio)

Large Notes 2 SharePoint transformations do require a standardized approach in development and project management in order to assure delivery in time and quality. The SharePoint Factory has been developed to allow parallel development of applications and to support all stages of the development process through standardized quality gates, test procedures and templates, for example requirements analysis templates. Essentially, the SharePoint Factory can be compared to an assembly line in the automotive industry. This approach is combined with a SharePoint PM as a Service offering, which is a blueprint for the management of large scale SharePoint projects and provides a specific PM process with SharePoint-centric artefacts, checklists and documents. The approach has been developed within a 6,500 person-day project in Germany and has already been published in the German .net Magazin, SharePoint Kompendium and the Dutch DIWUG Magazine.

Take your display template skills to the next level (Speaker: Elio Struyf, senior SharePoint consultant at Ventigrate -  http://www.eliostruyf.com/)

Once you know how search display templates work and how they can be created, it is rather easy to enhance the overall experience of your sites compared with previous versions of SharePoint. In this session I will take you to the next level of display templates, where you will learn to add grouping, sorting, loading more results, and more. This session focuses on people who already have a basic understanding of what search display templates are and how they can be created.

18:00 - 18:30 ... Welcome and snack

18:30 - 19:30 ... SharePoint factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury)

19:30 - 19:45 ... Break

19:45 - 20:45 ... Take your display template skills to the next level ( Speaker: Elio Struyf )

20:45 - …      ... SharePint!

 


Tuesday, November 25, 2014

Get early access to new features in Office 365 and provide feedback with Uservoice

Microsoft is offering a new First Release program. If you opt in to this new First Release program, you get to test new features for Office 365, SharePoint Online and Exchange Online a couple of weeks before they roll out to everyone else. To activate it go to Office 365 Admin Center > Service Settings > Updates. You will get a warning stating that activation of new features might take up to 24 hours to complete – so be patient.



Over the last couple of months a number of interesting new functional modules such as Delve, Yammer Groups and the new App Launcher have been pre-released on Office 365 – some of these might only become visible after you have activated First Release. Remember that there is also an Office 365 for business public roadmap available (at office.com/roadmap) where you can see which functionality is being rolled out and which is under development. For more information check out the links below.

Also remember that you can always use the Office Developer Platform Uservoice (http://officespdev.uservoice.com/) to give feedback and request changes. You can submit your feedback for a specific change and encourage others who you know to support these changes by voting for them. If you want to give feedback with regards to InfoPath – there is a Microsoft Office Forms vNext User Voice (http://officeforms.uservoice.com/) as well.

References

Tuesday, November 18, 2014

Understanding Azure Event Hubs–ingesting data at scale

Azure Event Hubs is an extension of the existing Azure Service Bus that provides hyper-scalable stream ingestion capabilities. It allows different producers (devices and sensors – possibly in the tens of thousands) to send continuous streams of data without interruption. There are a number of scenarios in which you typically see this kind of streaming sensor data: future-oriented scenarios such as connected cars and smart cities, but also more common ones such as application telemetry or industrial automation.

Event Hubs scaling is defined by Throughput Units (TUs), which are kind of like a pre-allocation of resources. A single TU is able to handle up to 1 MB/s or 1,000 events per second for writes and up to 2 MB/s for read operations. Load in an Event Hub is spread by creating partitions; these partitions allow for parallel processing on both the producer and consumer side. Next to support for common messaging patterns such as competing consumers, it provides data retention policies with up to 84 GB of event storage included per day. The current release supports up to 32 partitions, but you can log a support call to increase this up to 1,000 partitions. Since a partition is allocated at most one TU, this would allow for up to 1 GB/s of data ingest per Event Hub. Messages can be sent to an Event Hub publisher endpoint via HTTPS or AMQP 1.0; consumers retrieve messages using AMQP 1.0.
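To give an idea of what sending an event looks like from .NET, here is a minimal, hedged C# sketch using the EventHubClient from the Azure Service Bus SDK – the connection string, event hub name and payload are assumptions:

using System;
using System.Text;
using Microsoft.ServiceBus.Messaging; // from the Azure Service Bus NuGet package

class EventHubSendSketch
{
    static void Main()
    {
        // Assumed: a Service Bus connection string with send rights and an event hub called "telemetry".
        string connectionString = "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=...";
        EventHubClient client = EventHubClient.CreateFromConnectionString(connectionString, "telemetry");

        string payload = "{ \"deviceId\": \"sensor-42\", \"temperature\": 21.5 }";
        EventData eventData = new EventData(Encoding.UTF8.GetBytes(payload))
        {
            // Events with the same partition key are routed to the same partition,
            // which preserves ordering per device.
            PartitionKey = "sensor-42"
        };

        client.Send(eventData);
        client.Close();
    }
}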

Building such an application architecture yourself is quite challenging, and Event Hubs allows you to leverage the elasticity of the cloud and a pay-per-use model to get started quite rapidly. Whereas current scaling of these types of systems is oriented at tens of thousands of units, expectations are that this number will increase quite rapidly. Gartner expects the number of installed IoT units to grow to 26 billion by 2020; other estimates even point at 40 billion IoT units (Internet of Things by the Numbers: estimates and forecasts).

References:

Monday, November 17, 2014

Webinar: What’s new on the Microsoft Azure Data Platform

On Thursday the 20th of November I will be delivering a webinar on the new capabilities in the Microsoft Azure Data Platform. With the recent addition of three new services – Azure Stream Analytics, Azure Data Factory and Azure Event Hubs – Microsoft is making progress in building the best cloud platform for big data solutions as well as for enabling the Internet of Things (IoT). These additions will allow you to process, manage and orchestrate data from Internet of Things (IoT) devices and sensors and turn this data into valuable insights for your business.

The above-mentioned new services extend Microsoft's existing big data offering based on HDInsight and Azure Machine Learning. HDInsight is Microsoft's offering of Hadoop functionality on Microsoft Azure. It simplifies the setup and configuration of a Hadoop cluster by offering it as an elastic service. Azure Machine Learning is a new Microsoft Azure-based tool that helps organizations build predictive models using built-in machine learning algorithms, all from a web console.

In this webinar I will show what the key capabilities of these different components are, how they fit together and how you can leverage them in your own solutions.

Register for this free webinar “What’s new on the Microsoft Azure Data Platform” and get up to speed in less than one hour.

Wednesday, November 12, 2014

BIWUG on apps for SharePoint Server 2010 and data driven collaboration

 

On the 26th of November BIWUG is organizing our next session – don’t forget to register for BIWUG2611 because there are some great sessions planned.

Writing apps on SharePoint Server 2010 (Speaker: Akshay Koul, SharePoint CoOrdinator at Self, http://www.akshaykoul.com )

The session is geared towards developers/advanced users and explains how you can write enterprise-level applications on SharePoint 2010 without any server-side code. We will go through real-life applications and discuss the mechanisms used, the provisioning process, debugging techniques as well as best practices. The applications written are fully compatible with Office 365/SharePoint Online and SharePoint Server 2013.

Preparing for the upcoming (r)evolution from User Adoption to Data-Driven Collaboration (Speaker: Peter Van Hees, MVP Office 365/Collaboration architect, http://petervanhees.com )

As Consultants we (try to) listen to our customer, (try to) address the requirements ... and finally (try to) deploy the solution. This seems like an easy job, but in reality Collaboration projects - and especially SharePoint or Yammer implementations - are a little more challenging. The fast adoption of cloud computing has introduced a new currency for license-based software: User Engagement. If you can’t engage your users, your revenue stream will start to spiral downwards. It should be obvious that Office 365 (and all of its individual components) are not exempt. We all need to focus on the post deployment!

This story bears its roots in my hands-on experience while trying to launch Yammer initiatives. It seems that everyone agrees that Yammer is a wonderful and viral service ... yet, the conversations seem to flatline in most organizations. We will review how you should (already) be addressing User Adoption now; but, more importantly, we will spend more time looking into the stars … a future where Data-Driven Collaboration will take User Engagement to the next level. This isn't a story about Delve. It's about ensuring you integrate data in all your projects to prepare for the future. The age of smart software …

18:00 - 18:30 ... Welcome and snack

18:30 - 19:30 ... Writing apps on SharePoint Server 2010 (Speaker: Akshay Koul)

19:30 - 19:45 ... Break

19:45 - 20:45 ... Preparing for the upcoming (r)evolution from User Adoption to Data-Driven Collaboration( Speaker: Peter Van Hees )

20:45 - …      ... SharePint!

Tuesday, October 14, 2014

Getting Virtualbox to work on Windows 8.1

Quick tip for those of you who want to install VirtualBox on Windows 8.1 – use one of the older versions: VirtualBox-4.3.12-93733-Win.exe worked for me (download location: Download VirtualBox Old builds). More recent versions seem to crash when you try to start a virtual image – see screenshot below.




If you are already using Hyper-V you will also need to create a dual boot, since VirtualBox is not compatible with Hyper-V. You can do this using the commands listed below from an administrative command prompt (as outlined in this blog post from Scott Hanselman – Switch easily between VirtualBox and Hyper-V with a BCDEdit boot entry in Windows 8.1).
C:\>bcdedit /copy {current} /d "No Hyper-V" 
The entry was successfully copied to {ff-23-113-824e-5c5144ea}. 

C:\>bcdedit /set {ff-23-113-824e-5c5144ea} hypervisorlaunchtype off 
The operation completed successfully.

When booting you will be provided with an option to boot with Hyper-V support or without Hyper-V support.



Tuesday, September 23, 2014

SharePoint deep dive exploration: looking into the SharePoint UserInfo table

A couple of months ago we encountered some issues after upgrading a SharePoint 2007 environment to SharePoint 2013 using a migration tool. One of the symptoms was that information about users was displayed incorrectly. This led us to look into how SharePoint stores user information inside its databases.

The user information displayed in a Created By or Modified By field in SharePoint is not retrieved directly from Active Directory, but from an internal SharePoint table called the UserInfo table.


Not all users in Active Directory are immediately added to this table. When a user is explicitly added to a site collection using security settings, he or she is added to the UserInfo table. Another way user info gets created in this table is when a user is granted access through an Active Directory group and visits the site for the first time.
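A third way to get an entry into the table is to add the user programmatically – SPWeb.EnsureUser resolves the account and adds it to the site collection's user information list if it isn't there yet. A minimal sketch (site URL and account are assumptions):

using System;
using Microsoft.SharePoint;

class UserInfoSketch
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://intranet"))   // assumed site collection URL
        using (SPWeb web = site.RootWeb)
        {
            // Resolves the account against Active Directory and adds it to the
            // user information list (the UserInfo table) if no entry exists yet.
            SPUser user = web.EnsureUser(@"CONTOSO\jdoe");     // assumed account
            Console.WriteLine("{0} (ID {1})", user.Name, user.ID);
        }
    }
}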

Users who are deleted from a site collection will still be found in the UserInfo table, but with the bDeleted flag set to True (1). When the people picker queries the UserInfo table, it will not include users with bDeleted set to 1. The All People page (/_layouts/people.aspx?Membershipgroupid=0) will also only list users where bDeleted equals 0. This also means that even when people leave your organization and their Active Directory account is disabled (or removed), the Created By and Modified By columns will still display the name of the user. The general recommendation is to leave this mechanism as it was designed, but there are border-case scenarios in which you want to delete users – if so, take a look at Delete users and clean up the user information list in SharePoint.

There also are two different timer jobs which synchronize information from the User Profile Service Application to all site collections:
  • User Profile to SharePoint Quick Synchronization – runs by default every 5 minutes – synchronizes information for users recently added to a site collection
  • User Profile to SharePoint Full Synchronization – runs by default every hour
(For a full list of all out-of-the-box timer jobs in SharePoint Server 2013 check out SharePoint Server 2013 – Timer Job reference.) These jobs will only synchronize users for which the tp_IsActive flag is true (1) in the UserInfo table. The reason for this is performance, since synchronizing all users would be quite resource intensive. tp_IsActive is set to true when a user first visits a site collection or when he or she is explicitly granted Contribute permissions on a site.
References:

Thursday, September 18, 2014

SharePoint Server 2013–Error on manage user properties page–your search encountered an error

Last week, when I wanted to modify some user properties in a SharePoint Server 2013 environment, I got a very strange error: “Your search encountered an error. If the problem persists, contact the portal site administrator”.

I first checked search but saw no errors there – it seems that the error description does not really guide you in the correct direction. The solution was starting the Forefront Identity Manager Service (Run command > services.msc) as outlined in the blog post UserProfile Service, Managed properties are not available.

Monday, September 15, 2014

BIWUG SharePoint Quiz 2014

BIWUG invites you to the second SharePoint Quiz: have a great evening while learning some interesting stuff!

The rules are similar to last year:
•Registration is required
•Choose your own team
•Up to 3 members for each team
•Share your team name using #biwug #SPQuizBE
•Unique and funny SharePoint related team name is a bonus
•Female members is a plus!
•40 questions, 4 rounds
•No cheating allowed
•Post a question on twitter with #SPQuizBE and boost your chances to get the highest score

Register for the BIWUG Quiz 2014

PS Don’t forget to check out our completely redesigned BIWUG website

Wednesday, September 10, 2014

Office 365 urges you to stay up-to-date with Internet Explorer

A couple of weeks ago I first noticed a new warning message in Office 365 stating:
“Beginning January 12, 2016, only the most recent version of Internet Explorer available for a supported operating system will receive technical support and security updates. We will work to update and clarify our Office 365 system requirements soon and communicate this to you via Message Center.”



This is actually quite a bold statement from Microsoft, because if you take a look at the August 2014 browser market share (http://www.w3counter.com/globalstats.php), the usage of previous versions of Internet Explorer is still quite high, with Internet Explorer 11 having only a slightly higher market share than Internet Explorer 8 and 9.



If you take a look at the Office 365 System Requirements (also updated mid August) you will also notice the following statement:
Office 365 is designed to work with the following software:
  • The current or immediately previous version of Internet Explorer or Firefox, or the latest version of Chrome or Safari.
  • Any version of Microsoft Office in mainstream support.
For the moment this means that as Office 365 continues to evolve and new functionality is added you will encounter more and more issues with older versions of Internet Explorer such as version 8.0 and 9.0. I’m wondering whether this will not drive more users to Google Chrome and Mozilla Firefox instead of the latest version of Internet Explorer.

Tuesday, August 19, 2014

Must read whitepapers on Enterprise Social Collaboration (Microsoft)

  • Enterprise Social Collaboration Progress Model (June 2013) - Microsoft and the Eller MBA program at the University of Arizona jointly developed the Social Collaboration Progression Model that outlines six stages of social collaboration and their triggers, obstacles, and impacts. The six phases identified in this model represent a progression that can be used to ascertain an organization's current state in the social collaboration paradigm. The paper identifies the prerequisites that are necessary to move to a chosen target stage, the obstacles that must be overcome, and the impacts of such a transition. The paper also addresses a high-level view of how social collaboration applies to the divisional levels within an organization.
  • Social computing in the enterprise – building a social computing strategy  (December 2012) - This document describes the Microsoft vision for social computing in the enterprise and explains how to build an effective social computing strategy. It is designed to help C level executives and enterprise architects appreciate the value of enterprise social computing, understand the Microsoft vision for enterprise social computing, and grasp what’s involved in building a strategy for social computing in the enterprise.
  • Explore enterprise social scenarios (June 2014) - Understand common scenarios for enterprise social that can be built with Microsoft products, including Yammer, Office 365, SharePoint Server, Lync, and Microsoft Dynamics CRM.

 

Monday, August 18, 2014

BIWUG–Call for new board members

 

During our last board meeting we received the news that both Patrick and Jethro decided to give up their role as board member. We'd like to thank them both for the efforts they put into BIWUG and SharePoint Saturday. Special thanks go to Patrick for re-launching the user group three years ago and contributing to the well oiled machine that the board is at this moment.

So, we're again looking for members who'd like to take up a more active role inside our community. If you're interested in taking up some responsibilities, please let us know by sending me a DM using Twitter (@jopxtwits) or contact me by e-mail joris.poelmans@biwug.be . The members who sent us a response last time are automatically considered (if you aren't interested any more, please let us know).

Again, thanks to Patrick and Jethro and good luck in their new challenges.

Kind regards,

The BIWUG team


Thursday, August 07, 2014

Updating SharePoint 2013 user profiles using C#

I recently wrote a console program which needed to update some user profile properties for specific users – unfortunately the code gave me the following error when trying to create an SPServiceContext and access the UserProfileManager.

Microsoft.Office.Server.UserProfiles.UserProfileApplicationNotAvailableException was unhandled   HResult=-2146233088
Message=UserProfileApplicationNotAvailableException_Logging :: UserProfileApplicationProxy.ApplicationProperties ProfilePropertyCache does not have ….


To get this to work make sure that the user account under which you are running the console app has Full control on the User Profile Service Application – see screenshot below. Go to SharePoint Central Administration > Manage Service Applications > Select the User Profile Service Application and click Permissions in the ribbon.
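For reference, a minimal sketch of the kind of console code involved (site URL, account and property name are assumptions); once the account running it has Full Control on the User Profile Service Application, this pattern runs without the exception:

using System;
using Microsoft.SharePoint;
using Microsoft.Office.Server.UserProfiles;

class ProfileUpdateSketch
{
    static void Main()
    {
        // Run this under an account that has Full Control on the User Profile Service Application.
        using (SPSite site = new SPSite("http://intranet"))                        // assumed site URL
        {
            SPServiceContext context = SPServiceContext.GetContext(site);
            UserProfileManager profileManager = new UserProfileManager(context);

            UserProfile profile = profileManager.GetUserProfile(@"CONTOSO\jdoe");  // assumed account
            profile["AboutMe"].Value = "Updated from a console application";       // assumed property
            profile.Commit();
        }
    }
}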



Solve image rendering problems in Internet Explorer 11 on HP ZBooks

I recently got a new HP ZBook with Windows 8.1 and Internet Explorer 11 installed. While browsing to different websites (Yammer, Twitter, Google,etc …) I noticed that images were not properly rendering (see screenshot below).


To resolve this you have to change the accelerated graphics settings in Internet Explorer. Go to Tools > Internet Options, click the Advanced tab and then, under Accelerated graphics, check “Use software rendering instead of GPU rendering”.

Monday, August 04, 2014

Big Data – Beyond the hype, getting to the V that really matters

One of my favourite cartoons on big data, by Tom Fishburne, is shown below, and – as the saying goes, “bad humor is an evasion of reality, good humor is an acceptance of it” – it reveals an interesting fact about big data. Even though big data currently sits at the top of Gartner’s hype cycle (see Gartner’s 2013 hype cycle for emerging technologies, August 2013), there is still a lot of confusion out there. So let’s first try to get some clarity on the concept.

There are a lot of definitions of Big Data, but the one which is still widely used was coined by Gartner:
Big data is high Volume, high Velocity, and/or high Variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization (Gartner, The importance of Big Data: A definition, June 2012)

This 3V definition is actually not new; it was first coined by Doug Laney in February 2001 when he wrote about 3D Data management (see Deja VVVu: Others claiming Gartner’s Construct for Big Data). In most definitions a fourth V (for veracity) is added, and Gartner has recently released a report which goes one step further and talks about the 12 dimensions of Big Data, or Extreme Information Management (EIM). Let’s delve a little deeper into these 4 Vs.

Volume – constantly moving target
The size of the data that needs to be processed seems to be a constantly moving target. Big data, which was initially characterized as a few dozen TB in a single dataset, has now evolved to several PB, and the volume keeps increasing. The current idea is that data is characterized as big when its size breaks barriers for traditional relational database management systems and when the volume prohibits processing it in a cost-effective and fast enough manner.

There are a number of factors driving this tremendous growth. We currently live in an age where most information is “born digital”: it is created, by a person or a machine, specifically for digital use. Key examples are e-mail and text messaging, GPS location data, metadata associated with phone calls (so-called CDRs or Call Detail Records), data associated with most commercial transactions (credit card swipes, bar code reads), data associated with portal access (key cards or ID badges), toll-road access and traffic cameras, but also, increasingly, data from cars, televisions and appliances – the so-called “Internet of Things”. IDC estimated that there existed 2.8 zettabytes (ZB) – where one ZB = 1 billion TB – of data in the world in 2012, and that 90% of it was created in the past two years (IDC Digital Universe study, 2012).

In the near future, the amount of data will only increase, with the majority of this increase being driven by machine-generated data from sensors, RFID chips, NFC communications and other appliances. According to Cisco CTO Padmasree Warrior, we currently have 13 billion devices connected to the internet; this will increase to 50 billion by 2020 (see Some predictions about the Internet of Things and wearable tech from Pew Research for more details).

Velocity
Data capture has become nearly instantaneous in this digital age thanks to new customer interaction points and technologies such as websites, social media and smartphone apps, but we are also still capturing data from traditional sources such as ATMs, point-of-sale devices and other transactional systems. These kinds of rapid updates present new challenges to information systems. If you need to react to information in real time, traditional data processing technology simply will not suffice. Data is in most cases only valuable when it is processed in real time and acted upon. Custom-tailored experiences like Amazon’s recommendation engine or personalized promotions are the new norm.

Variety of data
Much of the data is “unstructured”, meaning that it doesn’t fit neatly into the columns and rows of a standard relational database. The current estimate is that 85% of all data is unstructured. Most social data is unstructured (book reviews on Amazon, blog comments, videos on YouTube, podcasts, tweets, …), but clickstream data and sensor data from cars, ships, RFID tags and smart energy meters are also prime examples of unstructured data.

Connected devices that track your every heartbeat and know if you are sleeping or running hold the promise to usher in an era of personalized medicine.  The debate about whether the “Quantified self” is the holy grail of personalized medicine or just a hype is still ongoing.

Veracity
Veracity is all about the accuracy or “truth” of the information being collected – since you will be unlocking and integrating data from external sources which you don’t control, you will need to verify it. Data quality and data integrity are more important than ever. I will delve a little deeper into this topic in a future blog post.
As outlined in Big data opportunities in vertical industries (Gartner, 2013), the challenges and the opportunities differ by industry. But in the end it is always about the value of the data.



Value of data
Approaching Big Data as a data management challenge is very one-sided. It’s not really important to know how many PB or ZB of data your business has accumulated; the issue is how to get value out of the data. The key here is analytics. Analytics is what makes big data come alive. But the nature of big data will require us to change the way we analyze this data. Traditional reporting and historical analytics will not suffice and are often not suited for big data. You will need to look at predictive analytics, text analysis, data mining, machine learning, etc.

One of the most popular aspects of Big Data today is the realm of predictive analytics. This embraces a wide variety of techniques from statistics, modeling, machine learning, data mining and more. These tools can be used to analyze historical and current data and make reliable projections about future or otherwise unknown events. This means exploiting patterns within the data to identify anomalies or unusual areas. These anomalies can represent risks (e.g. fraud detection, propensity to churn, …) or business opportunities such as cross-sell and up-sell targets, credit scoring optimization or insurance underwriting.

Still, a lot of challenges remain: according to the results of Steria’s Business Intelligence Maturity Audit, performed with 600 companies in 20 countries, only 7% of European companies consider Big Data to be relevant. On the other hand, McKinsey predicts a 600 billion USD estimated revenue shift by 2020 to companies that use Big Data effectively (Source: McKinsey, 2013, Game changers: five opportunities for US growth and renewal). In general, companies seem to struggle: 56% say getting value out of big data is a challenge and 33% say they are challenged to integrate data across multiple sources.

Thursday, July 31, 2014

Microsoft Big Data– looking into the HDInsight Emulator

As outlined in a previous post, Introducing Windows Azure HDInsight, HDInsight is a pay-as-you-go solution for Hadoop-based big data processing running on Windows Azure. But you don’t even have to use Azure to develop and test your code. Microsoft has an HDInsight Emulator that runs on your machine and simulates the HDInsight environment in Azure locally.

The HDInsight Emulator is installed through the Microsoft Web Platform Installer (WebPI) and requires a 64-bit version of Windows (Windows 7 SP1, Windows Server 2008 R2 SP1, Windows 8 or Windows Server 2012). It also seems to install without issues on Windows 8.1.


After you have installed it you will see three shortcuts on your desktop: Hadoop command line, Hadoop Name Node status and Hadoop MapReduce Status. You should also see three folders created on your local disk (Hadoop, HadoopFeaturePackSetup and HadoopInstallFiles) as well as a number of new Windows services being added. The HDInsight Emulator is actually a single-node HDInsight cluster including both the name node and the data node, and it uses the local CPU for compute.



One of the sample MapReduce jobs included is WordCount. To try it out, we first download the complete works of Shakespeare from Project Gutenberg. The file is called pg100.txt and we copy it into the folder c:\hadoop\hadoop-1.1.0-SNAPSHOT – next, just follow the steps outlined in Run a word count MapReduce job. For HDFS-specific file commands use hadoop fs -<command name>, where most of the commands resemble Linux file system commands. For a full list of supported Hadoop file system commands, type hadoop fs at the Hadoop command prompt; for a full explanation check out the Hadoop FS Shell Guide. The Hadoop command line shell can also run other Hadoop applications such as Hive, Pig, HCatalog, Sqoop, Oozie, etc. To actually run the WordCount sample job on the HDInsight Emulator run:

hadoop jar hadoop-examples.jar wordcount /user/hdiuser/input /user/hdiuser/output


Once the job is submitted you can track its progress using the Hadoop MapReduce Status web page. When the job is finished, the results are typically stored in a file named part-r-NNNNN, where NNNNN is a counter for the reducer that produced the file.


A second sample which is also included estimates pi using Monte Carlo simulation – check out this excellent video explaining the principle behind estimating pi with Monte Carlo simulation. For an explanation of how to use this with an actual HDInsight cluster, check out the Pi estimator Hadoop sample in HDInsight:

hadoop jar hadoop-examples.jar pi "16","10000000"


References:

Wednesday, July 30, 2014

Microsoft Big Data - Introducing Windows Azure HDInsight


Somebody once said - if you're going to stick around in this business, you have to have the ability to reinvent yourself, whether consciously or unconsciously.

This is the first in a series of blog posts about Big Data from a Microsoft perspective. I have always used my blog as a notebook as it helped me to get a clearer view on different topics. I hope that you stay with me in this journey in the exciting world of big data.

One of the challenges of working with big data is that the volume and its expected growth can be quite hard to predict. When starting with Big Data, a cloud platform is an ideal way to begin given its pay-per-use model and flexible scalability. Another thing to consider is that Big Data technology evolves quite rapidly, and cloud providers such as Microsoft evolve along with it, giving you the opportunity to work with the latest technology. So if you are just getting started and have a Microsoft background, Windows Azure HDInsight might be a good place to start. Also remember that if you have an MSDN subscription you are eligible for monthly Azure credits of up to 150 USD.

Microsoft worked together with Hortonworks to build their Hadoop-based big data solution, the Hortonworks Data Platform (HDP). It exists in 3 different flavors:
  • HDInsight is an Apache Hadoop-based distribution running in the cloud on Azure. Apache Hadoop is an open source framework that supports data-intensive distributed applications. It uses HDFS storage to enable applications to work with 1000s of nodes and petabytes of data using a scale-out model.
  • Hortonworks Data Platform (HDP) for Windows is a complete installation package which can be installed on Windows Servers running on-premise or on virtual machines running in the cloud.
  • Microsoft Analytics Platform System (formerly called Microsoft PDW)
In this post we will focus on Windows Azure HDInsight. HDInsight 3.1 contains HDP 2.1 and is currently the default version for new Hadoop clusters (always check the Microsoft HDInsight release notes for the latest version info). At its core HDInsight provides the HDFS/MapReduce software framework, but related projects such as Pig, Hive, Oozie, Sqoop and Mahout are also included.

HDFS (Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware and is highly fault tolerant by nature. HDFS was developed by Doug Cutting and Mike Cafarella for the Nutch search project around 2005 and was inspired by the Google GFS white paper (see an interview with Doug Cutting, the founder of Hadoop (April 2014) and How Yahoo spawned Hadoop, the future of Big Data). In Hadoop, a cluster of servers stores the data using HDFS; each node in the cluster is a data node and contains an HDFS data store and an execution engine. The cluster is managed by a server called the name node.



This distributed file system, however, poses some challenges for the processing of data, and this is where the MapReduce paradigm comes in, which was also inspired by Google (MapReduce: Simplified Data Processing on Large Clusters, 2004). The term itself refers to the two basic computations in distributed computing: map (the work is moved to the nodes where the data is located and applied to the local data) and reduce (the intermediate results are brought back together and aggregated). These MapReduce functions are typically written in Java, but you can use Hadoop Streaming to plug in other languages.




There are a number of advantages of using HDInsight:
  • You can quickly spin up a Hadoop cluster using the Azure Portal or using Windows PowerShell
  • You only pay for what you use. When your Hadoop processing jobs are complete, you can deprovision the cluster and retain the data, because Azure HDInsight uses Azure Blob storage as the default file system, which allows you to store data outside of the HDFS cluster (see the sketch after this list).
  • Microsoft provides deep integration into the rest of their BI stack such as PowerPivot, Powerview, Excel, etc. …
  • HDInsight is exposed using familiar interfaces for Microsoft developers such as a .NET SDK (see for example Submit Hive jobs using HDInsight .NET SDK) and PowerShell
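Because the default file system is backed by Azure Blob storage, you can stage input data with the regular Azure Storage SDK before the cluster even exists – a hedged C# sketch, in which the connection string, container name and paths are assumptions:

using System.IO;
using Microsoft.WindowsAzure.Storage;       // WindowsAzure.Storage NuGet package
using Microsoft.WindowsAzure.Storage.Blob;

class BlobUploadSketch
{
    static void Main()
    {
        // Assumed: the storage account/container that the HDInsight cluster is (or will be) attached to.
        string connectionString = "DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=...";
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient blobClient = account.CreateCloudBlobClient();

        CloudBlobContainer container = blobClient.GetContainerReference("mycluster");
        container.CreateIfNotExists();

        // The blob then shows up in the cluster under
        // wasb://mycluster@mystorage.blob.core.windows.net/example/data/pg100.txt
        CloudBlockBlob blob = container.GetBlockBlobReference("example/data/pg100.txt");
        using (FileStream stream = File.OpenRead(@"C:\data\pg100.txt"))
        {
            blob.UploadFromStream(stream);
        }
    }
}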
That’s it for now – in a next post I will delve a little deeper into how you can set up your first HDInsight cluster and its different architecture components, and I will look into the HDInsight Emulator.



References:


Wednesday, June 25, 2014

Driving sustainable user adoption for SharePoint and Yammer – Part I

A while ago I did a presentation at the Future of Business is sharing event on user adoption of collaboration technologies within companies. A good user adoption strategy – this is not the same as change management – is key in getting a collaborative environment accepted within a company.

Although SharePoint has been hugely successful in the past few years, there is still a gap in satisfaction between IT pros and business managers – SharePoint met the expectations of 73% of the former and of 62% of the latter (Source: Microsoft SharePoint faces tough future, Forrester says). In my opinion, this is because in most organizations user adoption is just an afterthought. People often confuse user adoption with training, so a typical reaction is “let’s send our end users to some training” – probably a technical training about site collections, versioning, web parts, etc. – and what happens five weeks after the training? They fall back into the old way of working. Training is important – it allows people to make the first jump – but it has to be contextual: if you have built a solution around managing projects with SharePoint, create a specific training on the benefits that people will get when using it.

But user adoption really is about people getting to know your solution, understanding it and using it in a correct manner. In the end the success of a deployment such as Office 365/Yammer or SharePoint Online is measured by sustained user adoption. Why the emphasis on “sustained”? A study by the Altimeter Group about the usage of social computing tools found that after an initial spike in enthusiasm and usage, you typically see a gradual decline until only limited groups within your company are using the solution [Altimeter Group – making the business case for Enterprise Social Networks] – and this applies to collaboration tools in general.


Getting beyond the early adopters of a technology solution is not easy. In a perfect world – with only IT consultants (just kidding :-)) – everything would work magically; in the real world, “build it and they will come” simply does not work. One of the main reasons is that people are fundamentally resistant to change, so you will need to put some effort into explaining why they need to change. If a new idea or way of working is initiated, it will only survive and become self-sustaining if it gets adopted by a critical mass of users (typically you will need at least 50% adoption).

The two most important things to focus on are Why and What. Why do you need SharePoint (or Yammer, Yambla, …) – and what are the business problems you are going to solve or mitigate? People know file shares and work with them on a daily basis, and then along comes this new product – SharePoint – but until you can explain how they should use it in their daily work and routines, they will not adopt it. Social, Yammer – why? What is the added value of using the SharePoint newsfeed or Yammer groups for someone in accounting? Don’t push a certain feature if you can’t answer why people would need it and how they can use it.


It is interesting to see that even after you have deployed a file sharing and collaboration solution, people still send out e-mails with attachments instead of a link to the file. There are two killer applications in an enterprise: Excel and e-mail. And even with companies advocating zero e-mail (but struggling to actually make it happen), I don’t see e-mail disappearing anytime soon. So instead of banning it, embrace it, integrate it into your solution and use it in a more efficient way.
In a next post I will talk about how to leverage a user adoption team to make your collaboration platform deployment a success.
“Success starts with deployment, it does not end with deployment” – deployment is necessary but not sufficient.

Monday, June 23, 2014

Ten indispensable tools for SharePoint 2013 developers

  1. CamlDesigner 2013 – provides you with a graphical interface which allows you to build CAML queries for single lists as well as queries that can be executed with SPSiteDataQuery. You can also get code snippets for the server-side object model, the .NET client-side object model, the JavaScript client-side object model and REST. This tool has been developed by Karine Bosch and Andy Van Steenbergen – two of the board members of BIWUG (Belgian SharePoint User Group).
  2. SharePoint Manager 2013 – is a SharePoint object model explorer. It enables you to browse every site on the local farm and view every property.
  3. ULSViewer  - there are other tools out there – check out SharePoint ULS log viewer tool comparison and verdict
  4. SharePoint 2013 Client Browser – requires no installation –simply unzip the exe – allows you to explore the SharePoint object hierarchy.
  5. Fiddler – is a web debugging tool which allows you to investigate all HTTP traffic (REST calls and XML or JSON responses). It also has some built in features to profile app performance and spot bottlenecks – also check out Fiddler PowerToy – Part 2 : HTTP performance
  6. SharePoint 2013 Search Query Tool v2 – can be used to query, test and debug search queries, for both SharePoint 2013 on-premise and SharePoint Online. I also use it for examining and tuning the ranking of search results. See Understanding item ranking in SharePoint 2013 search for more details.
  7. SPCop Community Edition – this is a Visual Studio extension which analyzes your SharePoint code; it was created by Matthias Einig and is based on the SharePoint Code Analysis Framework.
  8. SPFastDeploy – Visual Studio extension which allows to push individual files for deployment in SharePoint apps without requiring you to do a full re-deploy every time there is a change. It makes developers a lot more productive using the SharePoint 2013 hosted app model. Excellent tool built by Steve Curran.
  9. SharePoint Color Palette Tool – the new SharePoint 2013 theme model (also called composable looks) allows you to brand your SharePoint 2013 environment in a new way. One of the key components of a composable look is a .spcolor file which defines the color elements. The color palette tool is  a free utility that enables you to develop spcolor files interactively
  10. REST client plugin for Google Chrome – Excellent tool for creating REST requests – some developers might however favour the Postman REST plugin for Google Chrome. (A quick C# sketch of such a request follows this list.)
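To illustrate the kind of request these REST tools (items 5 and 10) help you craft, here is a minimal C# sketch against the SharePoint 2013 REST API – the site URL is an assumption and the call runs under the current Windows user:

using System;
using System.Net;
using System.Net.Http;

class RestSketch
{
    static void Main()
    {
        // Assumed: an on-premise site that accepts Windows authentication.
        var handler = new HttpClientHandler { Credentials = CredentialCache.DefaultNetworkCredentials };
        using (var client = new HttpClient(handler))
        {
            // SharePoint 2013 returns JSON when asked for odata=verbose.
            client.DefaultRequestHeaders.Accept.ParseAdd("application/json;odata=verbose");

            string url = "http://intranet/_api/web/lists?$select=Title,ItemCount";
            string json = client.GetStringAsync(url).Result;
            Console.WriteLine(json);
        }
    }
}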

Thursday, June 05, 2014

Understanding item ranking in SharePoint 2013 search

SharePoint 2013 allows you to view the result of the ranking model by retrieving the rankdetail managed property. The rankdetail property will only be returned when there are fewer than 100 results in the returned search result set.

There is, however, an interesting tool hidden inside SharePoint which outlines the rankdetail calculation and which I found via Explain rank in SharePoint 2013 search. SharePoint 2013 contains a built-in application page called /_layouts/15/explainrank.aspx which accepts two mandatory parameters:
  • q – which contains the query
  • d – which specifies the path of the item for which you want to see the rankdetail
You can download a specific Explainrank search display template which incorporates a link to the explainrank.aspx page with the required parameters. An alternative which will also work on Office 365/SharePoint Online is the Mavention Search Ranking app. My current favorite is the SharePoint 2013 Search Query Tool, which also allows you to show the rank details and the rank calculation.
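If you prefer code over tooling, the rank detail can also be requested through the search server object model – a hedged C# sketch in which the site URL and query term are assumptions (and remember that RankDetail is only returned for result sets of fewer than 100 rows):

using System;
using System.Linq;
using Microsoft.Office.Server.Search.Query;
using Microsoft.SharePoint;

class RankDetailSketch
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://intranet"))    // assumed site URL
        {
            KeywordQuery query = new KeywordQuery(site)
            {
                QueryText = "contract",                        // assumed query term
                RowLimit = 50                                  // keep the result set under 100 rows
            };
            query.SelectProperties.Add("RankDetail");

            SearchExecutor executor = new SearchExecutor();
            ResultTableCollection results = executor.ExecuteQuery(query);
            ResultTable relevantResults = results.Filter("TableType", KnownTableTypes.RelevantResults).FirstOrDefault();

            if (relevantResults != null)
            {
                foreach (var row in relevantResults.ResultRows)
                {
                    Console.WriteLine(row["Path"]);
                    Console.WriteLine(row["RankDetail"]);
                }
            }
        }
    }
}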




For a deep dive explanation of all the fine details of search ranking you should definitely take a look at customizing ranking models to improve search relevance in SharePoint 2013. On a very high level relevance is determined by two different types of parameters:
  • Dynamic ranking: ranking parameters linked to the search term which is being used
  • Static ranking: ranking parameters which are applied independently of the search query. When you look at the standard ranking algorithm used when doing a content search, the following static parameters are taken into account:
    • ClickDistance: the number of clicks between an authoritative page and items in the index.
    • QueryLogClicks, QueryLogSkips and LastClicks: click-through behavior is used to see whether results are considered relevant by users.
    • EventRate: activity tracking of usage events (clicks or views) – items with high usage activity get a higher activity rank score than less popular items. This is activity on items in SharePoint outside of the search pages.
    • URLDepth: documents with a longer URL are considered to be less relevant.
    • InternalFileType: SharePoint 2013 prioritizes some files differently based on the file type – this is the current ranking (PPT, DOC, HTML, ListItem, Image, Message, XLS, TXT, XML). The most significant difference with 2010 is that PowerPoint gets a relatively higher weight in 2013 and Excel a lower one.
    • Language: some languages seem to be favoured – I’m still looking into the details of this.
One of the surprises with SharePoint 2013 search is that it does not take into account the freshness of results – Mikael Svenson wrote an interesting post about this in  Adding freshness boost to SharePoint Online and this functionality is now also available in the SharePoint 2013 Search Query Tool



References:

The end of SharePoint autohosted apps

Up until a couple of weeks ago you basically had three options to develop a SharePoint app, as shown in the figure below. One of the options – autohosted apps – was however only available in preview mode and therefore not recommended for production use. Mid May, Microsoft pulled the plug on this one – see Update on Autohosted Apps Preview Program.



As outlined by Andrew in Update on Autohosted Apps Preview Program, this is probably a good thing, because although the deployment model was quite flexible, you were never sure what the actual limitations were with regard to allocated CPU time, data out, storage and memory usage. So for now, we have to wait and see what enhancements become available for provider-hosted SharePoint apps to make them as flexible in deployment as autohosted apps, while still providing you with sufficient control and troubleshooting capabilities.




References: