Thursday, April 02, 2015

Big Data and Internet of Things (IOT) links

 

Just a quick roundup of some interesting links to articles, whitepapers and videos on Big Data and IoT. I would be amazed if you haven’t heard of Big Data – but you might still want to take a look at these introductory blog posts, which mainly cover Big Data from a Microsoft perspective.


Tuesday, March 31, 2015

Overview of Apache Hadoop components in HDInsight, from Ambari to Zookeeper

A couple of months ago I wrote a first post about Microsoft Big Data – Introducing Windows Azure HDInsight. In this post I will delve a little deeper into the different components which are used in HDInsight. This is not an exhaustive list of components but it lists a number of components which you might encounter when working on your first big data project using Microsoft Azure HDInsight.


  • Ambari – provides a provisioning, monitoring and management layer on top of Apache Hadoop clusters. It provides a web interface for easy management as well as a REST API.
  • Flume – allows you to collect, aggregate and move large volumes of streaming data into HDFS in a fault tolerant fashion.
  • HBase – provides NoSQL database functionality on top of HDFS. It is a columnar store, which provides fast access to large quantities of data. HBase tables can have billions of rows and these rows can have an almost unlimited number of columns.
  • HCatalog – provides a tabular abstraction on top of HDFS. Pig, Hive and MapReduce use this layer to make it easier to work with files in Hadoop. HCatalog has been merged into the Hive project. Hive uses it somewhat like a master database. For more details check out Apache HCatalog – a table management layer that exposes Hive metadata to other Hadoop applications.
  • Hive – allows you to perform data warehouse operations using HiveQL. HiveQL is a SQL-like language and provides an abstraction layer on top of MapReduce. Hive allows you to use Hive tables to project a schema onto the data (schema on read). Through the use of HiveQL you can view your data as a table and create queries just as you would in a normal database, with support for selects, filters, group by, equi-joins, etc. Hive inherits schema and location information from HCatalog. Hive will act as a bridge to many BI products which expect tabular data. One of the recent developments around Hive is the Stinger initiative – its main aim is to deliver performance improvements while keeping SQL compatibility.
  • Kafka – is a fast, scalable, durable and fault-tolerant messaging system. It is commonly used together with Storm and HBase for stream processing, website activity tracking, metrics collection and monitoring or log aggregation. It provides functionality similar to AMQP, JMS or Azure Event Hubs.
  • Mahout – the goal of Mahout is to build scalable machine learning libraries. The main machine learning use cases Apache Mahout supports are recommender systems (people who buy x also buy y), classification (assigning data to discrete categories, e.g. is a credit card transaction fraudulent or not) and clustering (grouping unstructured data without any training data). For more details take a look at Introducing Mahout (IBM).
  • Oozie – enables you to create repeatable, dynamic workflows for tasks to be performed in a Hadoop cluster. An Oozie workflow can include Sqoop transfers, Hive jobs, HDFS commands, MapReduce jobs, etc. Oozie will submit the jobs but MapReduce will execute them. Oozie also has built-in callback and pollback mechanisms to check for the status of jobs.
  • Pegasus – provides large scale graph mining capabilities by offering important graph mining algorithms such as degree calculation, PageRank calculation, random walk with restart (RWR), etc. Most graph mining algorithms have limited scalability: they support up to millions of nodes. Pegasus targets billion-node graphs. Graphs (also referred to as networks) are everywhere in real life, ranging from web pages and social networks to biological networks and many more. Finding patterns, rules, etc. within these networks allows you to rank web pages (or documents), measure viral marketing, discover disease patterns, etc. The details of Pegasus can be found in the white paper Pegasus: a peta-scale graph mining system – implementation and observations.
  • Pig – was developed to make data analysis on Hadoop easier. It is made up of two components: a high level scripting language (which is called Pig Latin, but most people just refer to it as Pig) and an execution environment. Pig Latin is a procedural language which allows you to build data flows; it contains a number of built-in User Defined Functions (UDFs) to manipulate data. These UDFs allow you to ingest data from files, streams or other sources, make selections and transform the data. Finally Pig will store the results back into HDFS. Pig scripts are translated into a series of MapReduce jobs that are run on Apache Hadoop. Users can create their own functions or invoke code in other languages such as JRuby, Jython and Java. Pig gives you more control over, and more room to optimize, the flow of the data than Hive does.
  • RHadoop – is a collection of R packages that allow users to manage and analyze data with Hadoop in R, including the creation of map-reduce jobs. Check out Step-by-step guide to setting up an R-Hadoop system and Using RHadoop to predict website visitors to get started with some hands-on examples.
  • Storm – is a distributed real-time computation system. It supports a set of common stream analytics operations and provides guaranteed message processing with support for transactions. It was originally created by Nathan Marz (see History of Apache Storm and lessons learned) – the guy who came up with the term Lambda architecture for a generic, scalable and fault tolerant data processing architecture.
  • Sqoop – was built to transfer data from relational structured data stores (such as SQL Server, MySQL or Oracle) to Apache Hadoop and vice versa. Because Sqoop can handle database metadata, it is able to perform type-safe data movement using the data types specified in the metadata.
  • Zookeeper – manages and stores configuration information. It is responsible for managing and mediating conflicting updates across your Hadoop cluster.
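
Several of the components above (Hive, Pig, Oozie) end up handing work to MapReduce. As a rough illustration only, the sketch below mimics the map, shuffle and reduce phases of a word count in plain Python, entirely in memory. The function names are made up for this example and nothing here uses a Hadoop or HDInsight API.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases, as a MapReduce job would do across a cluster."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop runs MapReduce", "big clusters"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data between them; a HiveQL query with a GROUP BY is, conceptually, translated into jobs of this shape.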

Thursday, March 26, 2015

People insights – data-driven insights regarding people

Whereas marketing and sales as well as financial departments have been using advanced analytics for quite a while, it seems that HR is still in one of the early maturity phases of analytics usage. This is a view which seems to be shared by CEOs. In a recent study CEOs gave their HR department a 5.9 (out of 10) for their analytical skills (see CEO niet overtuigd van analytische skills HR).

Whereas HR controls a lot of data (and needs to keep it up to date), it does not seem to be able to use this data to provide strategic advice to the board of directors. HR can only deliver truly added value by providing data-driven insights regarding people that are both compelling to business leaders and actionable by HR. This is a view which is also quite nicely outlined by consultancy firm Inostix in their HR Analytics Value Pyramid (see The HR Analytics Value Pyramid (Part 3)). To make sure that the HR team stays current and viable, it will need to adopt a whole new set of skills, of which analytics is just one (see The reskilled HR team – transform HR professionals into skilled business consultants and the capability gap across the 2015 Human Capital Trends).

In a number of upcoming posts I will delve a little deeper into this topic and will show some practical examples of how you can realize some quick wins without a huge upfront investment.


SharePoint Saturday 2015 : How to build your own Delve, combining machine learning, big data and SharePoint

BIWUG is organizing the fifth edition of SharePoint Saturday Belgium – this year in Antwerp – for more information check out the site http://www.spsevents.org/city/Antwerp/Antwerp2015/ . Here is the excerpt of the session I will be delivering.

How to build your own Delve: combining machine learning, big data and SharePoint

You are experiencing the benefits of machine learning every day through product recommendations on Amazon & Bol.com, credit card fraud prevention, etc. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions and next we will explore how we can combine tools such as Windows Azure HDInsight, R and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.


Wednesday, March 04, 2015

BIWUG session on advanced integration between SharePoint Online and Yammer

On the 19th of March BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1903 – we have planned a great speaker and an interesting session

Advanced integration between SharePoint Online and Yammer using Yammer Apps (Speaker: Stephane Eyskens, SharePoint Technical Architect - http://www.silver-it.com/ )

First things first, the session will start by describing the steps required to bind an Office 365 Tenant with an Enterprise Domain, how to federate on-premises users with Office 365 in order to have SSO in place and how to bind Yammer to the Office 365 Tenant. Next, developers will learn how to leverage the Yammer App Model in order to build deeper integration between SPO (+ on-prem) and Yammer. Business scenarios such as leveraging Yammer's Open Graph in SPO Workflows and associating Yammer Groups to SPO Team sites (& groups) will be covered. Security aspects will be discussed as well: from acting on behalf of a user with their consent to impersonating them completely, we'll see how to manage tokens and discuss some best practices.

Intended audience: The session is primarily intended for developers.

Key benefits: After this session, developers should have a good view of how to go beyond the OOTB Yammer App integration with SPO and what Open Graph is all about.

Also thanks to Xylos for hosting this session

Monday, March 02, 2015

Resetting content index in SharePoint Server 2013: why and how

When you are developing against SharePoint Server 2013 search, you might be forced to reset the search index. You can do this using the SharePoint user interface through the screen shown below or using PowerShell. I prefer to use PowerShell since resetting through the user interface seems to give me timeouts, especially when the index is quite large. One of the reasons why you might be required to reset your content index is that your Search Service Application got into an unhealthy state because of insufficient disk space (see Fixing the Search Service after the Index Drive fills), but I also noticed that when you are working on your development machine and are making lots of changes to the search schema, it might also be useful to reset the search index for your changes to be picked up. If you want to reset it using the user interface, go to the Search Administration screen of the Search Service Application and select the “Index Reset” option underneath the crawling section of the left menu.



Don’t just reset your search index in a production environment since this will also impact the analytics processing component (Read Reset the index in SharePoint Server 2013). Listed below is the syntax for the PowerShell command (the snippet below assumes that you only have one SearchServiceApplication)

(Get-SPEnterpriseSearchServiceApplication).Reset($true,$true)

The SearchServiceApplication.Reset method takes two parameters: public void Reset(bool disableAlerts, bool ignoreUnreachableServer). I would recommend setting disableAlerts to true. The value for the second parameter will depend on your specific case. If you also get a timeout when using the PowerShell command, you can use the steps outlined in SharePoint 2013 Content Index Reset Timeout – they worked for me.

Friday, February 13, 2015

Mindful apps – putting people at the center supported by data

When preparing for my session The future of business process apps – a Microsoft perspective  last year I got inspired by this great article The future of enterprise apps: moving beyond workflows to mindflows – which introduced the concept of mindful apps. The core message is that if we want to automate the last mile we have to analyze how people work day in and day out and start our system/application design with people at the center. One of the quotes which is mentioned in the article is from Bill Murphy (CTO of Blackstone one of the largest investment funds worldwide) – “We aim to take away as much of the stress as possible from easy stuff, by automating the routine and mundane actions, and give users more time to focus on the higher-end pieces of what they need to do.”


Most of the characteristics which are outlined in the comparison between traditional and mindful apps are not revolutionary (see table above) but there is one important key message.
Mindful apps will allow us to assess and compare options in decision context, they will allow us to quickly respond to events and make the best decision given a specific context and will provide us with “extended intelligence” by understanding and recognizing patterns within the data at hand. We as humans are good at problem solving, pattern recognition, identifying outliers, making creative leaps and incorporating new information when making decisions. We should be able to focus on these high end tasks by being freed from laborious and menial tasks which can be automated.




There are 4 different trends which will impact how these mindful apps will be shaped:
  • User context matters – make it personal. When we make decisions or work within the context of specific processes, there are a lot of parameters which determine how we react or how we make decisions – these parameters should be integrated into the decision framework driving mindful apps. Our calendar, availability of colleagues to reach out to, input from communications (using e-mail, messaging or other formats), information that we capture from blogs, social networks such as LinkedIn or open data sources together with available information within your organization should be filtered and at your fingertips. Machine learning and cognitive algorithms will drive the second machine age (a term coined by Brynjolfsson from MIT) but we are only at the start of how these algorithms can drive the future workplace for information workers.
  • Mobile shapes our expectations. Mobile apps and the user experience they provide are shaping how we see an ideal enterprise application as well. Mindful apps should strive to combine beauty, simplicity and purpose to create an experience that delights us and that is effortless to use. Mobile apps are easy to understand: when people use a good app for the first time, they intuitively grasp the most important features. Why can’t we do the same for enterprise apps? Simplicity rules. The apps should also incorporate the necessary logic to evolve as the user grows more comfortable with them and starts exploring more advanced functionality. Apps should learn people’s preferences over time and show the interface which is best suited for the task at hand.
  • (Big) data and advanced analytics are the driving force. There is a lot of hype and confusion around the term Big Data but one thing is for sure – storage costs and processing costs have dropped significantly in the last decade. When you combine this with the rise of new storage platforms such as Hadoop, NoSQL datastores such as HBase, Cassandra, etc. and new data processing frameworks such as Apache Drill, Dremel, Spark, etc., new opportunities arise to support users in their decision making processes. While there is a lot of emphasis on the 4 Vs (Volume, Velocity, Variety and Veracity), there is one more V that you have to think about, and that is Value (also see Big Data beyond the hype, getting to the V that really matters).
  • Cloud will lead the way. A lot of the innovation which will enable this next generation of apps is coming out of the datacenters of Google, Amazon, LinkedIn, Microsoft, Yahoo, etc., but most organizations don’t have the available capacity (nor the same financial resources) as these internet giants. Luckily the economies of scale which are offered by the cloud allow solution providers to provide you with a data infrastructure which can scale from prototype size to production environments able to handle huge amounts of data. The major cloud players – IBM, Microsoft, Amazon and Google – all seem to be making big bets on building out the data analytics platform of the future, and this competition will drive prices further down. It will also force them to focus on more innovative solutions which allow them to differentiate themselves from the competition.
The best examples where we, as consumers, see the power of Big Data, Analytics, Machine Learning and the cloud appear on mobile. The three major players (Microsoft, Apple and Google) are relying quite heavily on cloud computing power and huge data stores to provide the experience of digital assistants. Microsoft is currently working on Cortana (which has been released in a number of countries worldwide), Apple was definitely the trendsetter with Siri and Google has Google Now.




The future is already here — it's just not very evenly distributed. (William Gibson)



Thursday, February 05, 2015

Microsoft Azure Machine Learning–the power to predict

Microsoft Azure Machine Learning provides Machine Learning as a Service (on Microsoft Azure) and allows you to make your own applications more intelligent. Microsoft Azure Machine Learning was initially started as an incubation project in Microsoft Research (codename Passau) and is part of the overall Microsoft Data Platform.
The best definition for Machine Learning – in my opinion – is from the excellent book “Introduction to Machine Learning (MIT Press 2014, Ethem Alpaydin)” (Use it as a reference – this is not an easy “how to” book)
The goal of machine learning is to program computers to use example data or past experience to solve a given problem.
In general, when we want to solve a problem on a computer, we need an algorithm that transforms an input into an output using a set of instructions. Unfortunately, for some problems we do not know how to program such algorithms – such as for e-mail spam detection or predicting customer behavior. In most cases we have the input and output available, e.g. a set of e-mails of which some are marked as spam. Based on this data, we would like a computer (or machine) to automatically extract the algorithm necessary to perform the classification. The algorithm does not need to be perfect but needs to be a good and useful approximation.
The term machine learning is tightly coupled to the domain of analytics (or data science – see Data Scientist: the sexiest job of the 21st century). Analytics is concerned with the discovery and extraction of useful business patterns or mathematical decision models from a specific data set. For this a number of techniques can be used; depending on the practitioner's background they will probably favor a technique from their respective domain:
  • Regression, General Linear Models (GLMs), decision trees, etc. (originated out of the statistics domain)
  • Machine learning algorithms such as support vector machines, neural networks, Bayesian methods, … (originated out of the computer science domain)
If we focus specifically on machine learning we make a distinction between supervised learning where we try to find a mapping between a set of input variables and a specific output variable using a set of values to train a specific model and unsupervised learning where we try to find patterns in the input data.
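
To make the distinction a bit more tangible, here is a minimal sketch in plain Python (not Azure Machine Learning). It assumes a made-up one-dimensional feature, the number of suspicious words in an e-mail. The supervised part learns one average value per known label and classifies new mails by the nearest average; the unsupervised part receives the same numbers without labels and simply splits them into two groups. All function names and data are hypothetical.

```python
from statistics import mean


def train_nearest_centroid(samples, labels):
    """Supervised: learn one centroid (mean value) per known label."""
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(samples, iterations=10):
    """Unsupervised: split unlabeled 1-D data into two groups (k-means, k=2)."""
    low, high = min(samples), max(samples)  # initial cluster centres
    near_low, near_high = list(samples), []
    for _ in range(iterations):
        near_low = [x for x in samples if abs(x - low) <= abs(x - high)]
        near_high = [x for x in samples if abs(x - low) > abs(x - high)]
        if not near_high:
            break  # all samples are identical, nothing to split
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)


if __name__ == "__main__":
    suspicious_words = [0, 1, 1, 7, 8, 9]
    marked_as = ["ham", "ham", "ham", "spam", "spam", "spam"]

    model = train_nearest_centroid(suspicious_words, marked_as)
    print(predict(model, 6))            # uses the labels it was trained on
    print(two_means(suspicious_words))  # finds groups without any labels
```

The supervised model can name its answer ("spam" or "ham") because it was trained on labeled examples; the clustering only tells you that two groups exist and leaves their interpretation to you.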

But why should you care about machine learning? I think the picture below shows you how the focus is shifting from traditional reporting (hindsight) to more advanced predictive and prescriptive analytics (foresight), which will provide business with more added value but also requires new competencies from business intelligence specialists, such as machine learning and data mining. Examples across industries vary, but in general predictive analytics has the potential to change the way businesses make decisions (for a more in-depth look, definitely pick up Predictive Analytics – The power to predict who will click, buy, lie or die by Eric Siegel).



Microsoft Azure Machine Learning distinguishes itself from other platforms and tools by a number of different characteristics:
  • Allows you to jointly build predictive models from anywhere in the world using only a web browser, by making use of a visual composition canvas (called Machine Learning Studio) using modules without requiring you to write code (although you can use R code snippets if you want). You can start quickly from existing sample experiments/models or you can share your own data experiments.
  • Collaborate with anyone from anywhere using just your browser.
  • Available as a cloud service, eliminating upfront costs for hardware resources.
  • The different modules allow you to author an end-to-end machine learning workflow, from reading data to training and validating your predictive model.
  • Ability to deploy models as web services. You can quickly operationalize your models by converting them into web services and you even have the ability to monetize your machine learning models using Azure Data Market.
The start location is the Microsoft Azure Machine Learning homepage - https://studio.azureml.net/ which contains a number of user guides as well as training videos - http://azure.microsoft.com/en-us/documentation/videos/index/?services=machine-learning . Another great way to get started is by looking at the different Azure Machine Learning Samples - http://azure.microsoft.com/en-us/documentation/services/machine-learning/models/ such as Azure Machine Learning Sample: Credit risk prediction (predict whether an applicant is a good credit risk based on the German Credit Card UCI dataset) and a clustering algorithm to identify similar companies from companies in the S&P 500, using text in published Wikipedia articles for  these companies.





Wednesday, January 21, 2015

SharePoint Saturday Belgium 2015– Call for speakers

On April 18th 2015 BIWUG (www.biwug.be) is organizing its fifth edition of SharePoint Saturday Belgium. We invite you to submit a session  for this year's SharePoint Saturday Belgium using this link  - http://www.spsevents.org/city/Antwerp/Antwerp2015 . It is possible to submit multiple sessions. We will close the call for speakers on February 18th EOD.

SharePoint Saturday Belgium 2015 will take place in Antwerp – for more details check out http://www.spsevents.org/city/Antwerp/Antwerp2015.  If you have any questions or remarks, do not hesitate to contact me.


 

Monday, December 15, 2014

Introducing Azure Stream Analytics

Azure Stream Analytics, which is currently in preview, is a fully managed real-time stream analytics service that aims at providing highly resilient, low latency, and scalable complex event processing of streaming data for scenarios such as Internet of Things, command and control (of devices) and real-time Business Intelligence on streaming data.

Although it might look similar to Amazon Kinesis, it seems to distinguish itself by aiming to increase developer productivity by enabling you to author streaming jobs using a SQL-like language to specify necessary transformations and it provides a range of operators which are quite useful to define time-based operations such as windowed aggregations (Check out Stream Analytics Query Language Reference for more information) – listed below is an example taken from the documentation which finds all toll booths which have served more than 3 vehicles in the last 5 minutes (See Sliding Window – slides by an epsilon and produces output at the occurrence of an event)

SELECT DateAdd(minute,-5,System.TimeStamp) AS WinStartTime, System.TimeStamp AS WinEndTime, TollId, COUNT(*) 
FROM Input TIMESTAMP BY EntryTime
GROUP BY TollId, SlidingWindow(minute, 5)
HAVING COUNT(*) > 3
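
To get a feel for what the sliding window in this query computes, here is a small Python sketch of the same idea. It is an illustration under simplifying assumptions, not how Azure Stream Analytics is implemented: events arrive sorted by entry time, the window is only evaluated when a new event arrives (the service also produces output when events leave the window), and the function name and tuple layout are invented for this example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def busy_toll_booths(events, window=timedelta(minutes=5), threshold=3):
    """Yield (window_start, window_end, toll_id, count) whenever a booth has
    seen more than `threshold` vehicles in the `window` ending at an event.

    `events` is an iterable of (entry_time, toll_id) sorted by entry_time.
    """
    recent = defaultdict(deque)  # toll_id -> entry times still inside the window
    for entry_time, toll_id in events:
        times = recent[toll_id]
        times.append(entry_time)
        # Drop entries that have fallen out of the window (start is exclusive).
        while times[0] <= entry_time - window:
            times.popleft()
        if len(times) > threshold:
            yield entry_time - window, entry_time, toll_id, len(times)


if __name__ == "__main__":
    start = datetime(2014, 12, 15, 12, 0)
    stream = [(start + timedelta(minutes=m), 1) for m in (0, 1, 2, 3, 10)]
    for row in busy_toll_booths(stream):
        print(row)
```

The GROUP BY TollId corresponds to keeping one queue per toll booth, and HAVING COUNT(*) > 3 corresponds to the threshold check.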

This SQL-like language allows non-developers to build stream processing solutions through the Azure Portal. It makes it easy to filter, project, aggregate and join streams, combine static data (master data) with streaming data and detect patterns within the data streams without developer intervention.

 

Azure Stream Analytics leverages cloud elasticity to scale up or scale down the number of resources on demand thereby providing a distributed, scale out architecture with very low startup costs. You will only pay for the resources you use and have the ability to add resources as needed. Pricing is calculated based on the volume of data processed by the streaming job (in GB) and the number of Streaming Units that you are using. Streaming Units provide the scale out mechanism for Azure Stream Analytics and provide a maximum throughput of 1MB/sec. Pricing starts as low as €0.0004/GB and €0.012/hr per streaming unit (roughly equivalent to less than 10€/month). It also integrates seamlessly with other services such as Azure Event Hub, Azure Machine Learning, Azure Storage and Azure SQL databases.
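
The "less than 10€/month" figure can be checked with a bit of arithmetic. The Python sketch below uses only the two preview prices quoted above; the assumptions are mine: a 30-day month, a single Streaming Unit running continuously, and 1 GB counted as 1000 MB. The function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def monthly_cost_eur(gb_processed, streaming_units=1, hours=30 * 24,
                     price_per_gb=0.0004, price_per_unit_hour=0.012):
    """Estimate a monthly bill in euro from the two prices quoted above."""
    volume_cost = gb_processed * price_per_gb
    compute_cost = streaming_units * hours * price_per_unit_hour
    return volume_cost + compute_cost


if __name__ == "__main__":
    # One Streaming Unit kept running for the whole month, no data processed.
    print(f"compute only: {monthly_cost_eur(gb_processed=0):.2f} EUR")

    # Same unit fed at its maximum of 1 MB/sec for the whole month.
    max_gb = SECONDS_PER_MONTH / 1000  # assumption: 1 GB = 1000 MB
    print(f"saturated:    {monthly_cost_eur(gb_processed=max_gb):.2f} EUR")
```

The streaming unit itself comes to 720 hours x €0.012 = €8.64 per month, and even a fully saturated unit (about 2,592 GB) adds only about one euro in volume charges, so the total stays under €10.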




Thursday, December 11, 2014

SharePoint deep dive exploration: SharePoint alerting

This is the second in a series of blog posts on SharePoint Server 2013 in which we will explore how e-mail alerting works in SharePoint 2013. For part 1 – take a look at SharePoint deep dive exploration: looking into the SharePoint UserInfo table.

If you need to know more about how alerts work at the lowest level you should take a look at SharePoint 2003 Database tables documentation – for alerts this documentation still seems to be valid. SharePoint stores the list of events for which users have requested alerts in the EventCache table – in comparison to SharePoint 2003 there are some extra fields available (marked in bold). For some of the fields I did not find a

The other tables which are manipulated by the SharePoint alert framework are EventLog, ImmedSubscriptions, SchedSubscriptions and EventSubsMatches (for an in-depth discussion also take a look at Working with search alerts in SharePoint 2010). Every event is recorded in these tables, but since the EventType and EventData columns will contain the most data, these are only filled in when the list has at least one subscription.

So how does this work? There actually is a SharePoint timer job, called the “Immediate Alerts” job, which is scheduled to run every 5 minutes. This will pick up the necessary event information and will process it (in batches of 10.000). If you see issues with alerts not being sent out, I recommend you take a look at SharePoint Scheduled Alerts Deconstructed.

Column Name         Description
EventTime           Time when the event record was inserted into the database
SiteId              ID of the site, available from the AllSites table
WebId               ID of the web, available from the AllWebs table
ListId              ID of the list in which the monitored item appears
ItemId              ID of the item that raised the event
DocId               ID of the document that raised the event
Guid0               ?
Int0                ?
Int1                ?
ContentTypeId       ?
ItemName            Full name of the item
ItemFullUrl         Full path of the item
EventType           ItemAdded (1), ItemModified (2), ItemDeleted (4), DiscussionAdded (16), DiscussionModified (32), DiscussionDeleted (64), DiscussionClosed (128), DiscussionActivated (256), …
ObjectType
ModifiedBy          User name of the person who raised the event
TimeLastModified    Time when the event occurred
EventData           The binary large object (BLOB) containing all of the field changes with the old and new values for an item
ACL                 The ACL for the item at the time it is edited
DocClientId
CorrelationId
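
The EventType values in the table are all powers of two, which suggests (this is my assumption, the table itself does not say so) that they can be combined as bit flags. Under that assumption, the Python sketch below decodes a raw EventType integer into readable names. The class and member names are invented for this example and only cover the values listed above.

```python
from enum import IntFlag


class AlertEventType(IntFlag):
    """Hypothetical names mirroring the EventType values in the table above."""
    ITEM_ADDED = 1
    ITEM_MODIFIED = 2
    ITEM_DELETED = 4
    DISCUSSION_ADDED = 16
    DISCUSSION_MODIFIED = 32
    DISCUSSION_DELETED = 64
    DISCUSSION_CLOSED = 128
    DISCUSSION_ACTIVATED = 256


def describe_event_type(value):
    """Return the names of all known flags set in a raw EventType integer."""
    return [flag.name for flag in AlertEventType if value & flag.value]


if __name__ == "__main__":
    print(describe_event_type(2))      # a single event type
    print(describe_event_type(1 | 2))  # two event types combined
```

Bits that are not in the table (such as 8) are simply ignored by this sketch rather than guessed at.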

The reason why I started looking into these tables was that I got feedback from a client that all e-mail alerts which were being sent out had the wrong link in them after we migrated their environment from SharePoint 2007 to 2013. One of the first things that I did was actually sit next to the user who was adding documents in SharePoint and then I noticed something strange. The user uploaded a document and when they needed to fill in extra metadata, they immediately changed the name of the document.

After looking into how alerting works I still did not get an explanation for why the links were sent out correctly before in 2007 – because this should have failed as well. So I used this PowerShell script to create an export of all the e-mail alerts/subscriptions that users had in SharePoint and I noticed that most of the alerts were on just a couple of libraries and then I found it.

In SharePoint 2007, they had “require check out” set by default on these libraries – this means that when the user uploaded and renamed the document, it was not yet visible to other users and the alert was not sent out. If checkout is not required then the files are immediately visible and the “New Item Added” immediate alert is fired – this was the behavior that they were seeing in 2013.

So the “require checkout” is an interesting workaround to prevent a file from being visible before it is explicitly checked in. Since they were changing the file properties (and even the filename) before the file is visible to users, the New Item alerts would not trigger and users would only be notified of the “Changed Item” alert when the file was checked in.

The reason why we deactivated “require check out” was that it would conflict with co-authoring, but apparently they would never use this feature for the specific libraries for which these alerts were set. So the moral of the story: don’t just activate or change a specific functionality because it is available in a new version, but first look at how people are actually using it.


 

BIWUG on blueprint for large scale SharePoint projects and display templates

 

On the 16th of December BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1612 because there are some great sessions planned.

SharePoint Factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury, Senior EIM Consultant at CGI Belgium responsible for CGI Belgium’s Microsoft Competency Centre, and the Digital Transformation Portfolio)

Large Notes 2 SharePoint transformations do require a standardized approach in development and project management in order to assure delivery on time and with quality. The SharePoint Factory has been developed to allow parallel development of applications and support all stages of the development process by having standardized quality gates, test procedures and templates, for example requirements analysis templates. Essentially, the SharePoint Factory can be compared to an assembly line in the automotive industry. This approach is combined with a SharePoint PM as a Service offering which is a blueprint for the management of large scale SharePoint projects and provides a specific PM process with SharePoint-centric artefacts, checklists and documents. The approach has been developed within a 6.500 person day project in Germany and has already been published in German .net Magazin, SharePoint Kompendium and Dutch DIWUG Magazine.

Take your display template skills to the next level (Speaker: Elio Struyf, senior SharePoint consultant at Ventigrate -  http://www.eliostruyf.com/)

Once you know how search display templates work and how they can be created, it is rather easy to enhance the overall experience of your sites compared with previous versions of SharePoint. In this session I will take you to the next level of display templates, where you will learn to add grouping, sorting, loading more results, and more. This session focuses on people that already have a basic understanding of what search display templates are, and how they can be created.

18:00 - 18:30 ... Welcome and snack

18:30 - 19:30 ... SharePoint Factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury)

19:30 - 19:45 ... Break

19:45 - 20:45 ... Take your display template skills to the next level (Speaker: Elio Struyf)

20:45 - …      ... SharePint!

 


Tuesday, November 25, 2014

Get early access to new features in Office 365 and provide feedback with Uservoice

Microsoft is offering a new First Release program. If you opt in to this program, you get to test the new features for Office 365, SharePoint Online, and Exchange Online a couple of weeks before they roll out to everyone else. To activate it, go to Office 365 Admin Center > Service Settings > Updates. You will get a warning stating that activation of new features might take up to 24 hours to complete – so be patient.



Over the last couple of months, a number of interesting new features such as Delve, Yammer Groups and the new App launcher have been pre-released on Office 365; some of these might only become visible after you have activated First Release. Remember that there is also an Office 365 for business public roadmap (at office.com/roadmap) where you can see which functionality is being rolled out and which is under development. For more information check out the links below.

Also remember that you can always use the Office Developer Platform Uservoice (http://officespdev.uservoice.com/) to give feedback and request changes. You can submit your feedback for a specific change and encourage people you know to support it by voting for it. If you want to give feedback with regard to InfoPath, there is a Microsoft Office Forms vNext User Voice (http://officeforms.uservoice.com/) as well.

References

Tuesday, November 18, 2014

Understanding Azure Event Hubs–ingesting data at scale

Azure Event Hubs is an extension of the existing Azure Service Bus which provides hyper-scalable stream ingestion capabilities. It allows different producers (devices and sensors, possibly tens of thousands of them) to send continuous streams of data without interruption. You typically see this kind of streaming data in future-oriented scenarios such as connected cars and smart cities, but also in more common scenarios such as application telemetry or industrial automation.

Event Hubs scaling is defined by Throughput Units (TUs), which are a kind of pre-allocation of resources. A single TU is able to handle up to 1 MB/s or 1,000 events per second for writes and 2 MB/s for read operations. Load in an Event Hub is spread over partitions, which allow for parallel processing on both the consumer and the producer side. Next to support for common messaging scenarios such as competing consumers, it also provides data retention policies with up to 84 GB of event storage per day. The current release supports up to 32 partitions, but you can log a support call to increase this to up to 1,000 partitions. Since a partition is allocated at most 1 TU, this would allow for 1 GB/s of data ingest per Event Hub. Messages can be sent to an Event Hub publisher endpoint via HTTPS or AMQP 1.0; consumers can retrieve messages using AMQP 1.0.
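The capacity arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the per-TU limits are hard-coded assumptions (1 MB/s or 1,000 events per second in, 2 MB/s out, where 1 MB/s is the write figure that makes 1,000 partitions come out at the 1 GB/s mentioned above), and the function names are hypothetical, not part of any Azure SDK.

```python
import math

# Assumed per-Throughput-Unit limits (see the paragraph above); these are
# illustrative constants, not values read from an Azure API.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0


def max_ingress_mb_per_s(partitions: int) -> float:
    """Upper bound on ingest if every partition gets its maximum of 1 TU."""
    return partitions * INGRESS_MB_PER_TU


def throughput_units_needed(ingress_mb_per_s: float,
                            events_per_s: float,
                            egress_mb_per_s: float) -> int:
    """Smallest number of TUs that covers all three limits at once."""
    return max(
        1,
        math.ceil(ingress_mb_per_s / INGRESS_MB_PER_TU),
        math.ceil(events_per_s / INGRESS_EVENTS_PER_TU),
        math.ceil(egress_mb_per_s / EGRESS_MB_PER_TU),
    )


if __name__ == "__main__":
    # Default limit of 32 partitions versus the raised limit of 1,000.
    print(max_ingress_mb_per_s(32), "MB/s")
    print(max_ingress_mb_per_s(1000), "MB/s")
    # Many small messages: the events-per-second limit is the bottleneck.
    print(throughput_units_needed(0.5, 3500, 1.0), "TUs")
```

The second function shows why sizing is driven by whichever limit is hit first: a workload of many small events can need more TUs than its byte volume alone would suggest.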

Building such an application architecture yourself is quite challenging, and Event Hubs allows you to leverage the elasticity of the cloud and a pay-per-use model to get started quite rapidly. Whereas current scaling of this type of system is oriented at tens of thousands of units, expectations are that this number will increase quite rapidly. Gartner expects the number of installed IoT units to increase to 26 billion by 2020; other estimates even point at 40 billion IoT units (Internet of Things by the Numbers: estimates and forecasts).

References:

Monday, November 17, 2014

Webinar: What’s new on the Microsoft Azure Data Platform

On Thursday the 20th of November I will be delivering a webinar on the new capabilities in the Microsoft Azure Data Platform. With the recent addition of three new services - Azure Stream Analytics, Azure Data Factory and Azure Event Hubs - Microsoft is making progress in building the best cloud platform both for big data solutions and for enabling the Internet of Things (IoT). These additions will allow you to process, manage and orchestrate data from IoT devices and sensors and turn this data into valuable insights for your business.

The above-mentioned new services extend Microsoft's existing big data offering based on HDInsight and Azure Machine Learning. HDInsight is Microsoft's offering of Hadoop functionality on Microsoft Azure. It simplifies the setup and configuration of a Hadoop cluster by offering it as an elastic service. Azure Machine Learning is a new Microsoft Azure-based tool that helps organizations build predictive models using built-in machine learning algorithms, all from a web console.

In this webinar I will show the key capabilities of these different components, how they fit together and how you can leverage them in your own solutions.

Register for this free webinar “What’s new on the Microsoft Azure Data Platform” and get up to speed in less than one hour.

Wednesday, November 12, 2014

BIWUG on apps for SharePoint Server 2010 and data driven collaboration

 

On the 26th of November BIWUG is organizing our next session – don’t forget to register for BIWUG2611 because there are some great sessions planned.

Writing apps on SharePoint Server 2010 (Speaker: Akshay Koul, SharePoint CoOrdinator at Self, http://www.akshaykoul.com )

The session is geared towards developers and advanced users and explains how you can write enterprise-level applications on SharePoint 2010 without any server-side code. We will go through real-life applications and discuss the mechanisms used, the provisioning process, debugging techniques as well as best practices. The applications written are fully compatible with Office 365/SharePoint Online and SharePoint Server 2013.

Preparing for the upcoming (r)evolution from User Adoption to Data-Driven Collaboration (Speaker: Peter Van Hees, MVP Office 365/Collaboration architect, http://petervanhees.com )

As Consultants we (try to) listen to our customer, (try to) address the requirements ... and finally (try to) deploy the solution. This seems like an easy job, but in reality Collaboration projects - and especially SharePoint or Yammer implementations - are a little more challenging. The fast adoption of cloud computing has introduced a new currency for license-based software: User Engagement. If you can’t engage your users, your revenue stream will start to spiral downwards. It should be obvious that Office 365 (and all of its individual components) are not exempt. We all need to focus on the post deployment!

This story has its roots in my hands-on experience while trying to launch Yammer initiatives. It seems that everyone agrees that Yammer is a wonderful and viral service ... yet the conversations seem to flatline in most organizations. We will review how you should (already) be addressing User Adoption now; but, more importantly, we will spend more time looking into the stars … a future where Data-Driven Collaboration will take User Engagement to the next level. This isn't a story about Delve. It's about ensuring you integrate data in all your projects to prepare for the future. The age of smart software …

18:00 - 18:30 ... Welcome and snack

18:30 - 19:30 ... Writing apps on SharePoint Server 2010 (Speaker: Akshay Koul)

19:30 - 19:45 ... Break

19:45 - 20:45 ... Preparing for the upcoming (r)evolution from User Adoption to Data-Driven Collaboration (Speaker: Peter Van Hees)

20:45 - …      ... SharePint!

Tuesday, October 14, 2014

Getting VirtualBox to work on Windows 8.1

Quick tip for those of you who want to install VirtualBox on Windows 8.1: use one of the older versions. VirtualBox-4.3.12-93733-Win.exe worked for me (download location: Download VirtualBox Old builds). More recent versions seem to crash when you try to start a virtual image – see screenshot below.




If you are already using Hyper-V you will also need to create a second boot entry, since VirtualBox is not compatible with Hyper-V. You can do this using the commands listed below from an administrative command prompt (as outlined in this blog post from Scott Hanselman – Switch easily between VirtualBox and Hyper-V with a BCDEdit boot entry in Windows 8.1):
C:\>bcdedit /copy {current} /d "No Hyper-V" 
The entry was successfully copied to {ff-23-113-824e-5c5144ea}. 

C:\>bcdedit /set {ff-23-113-824e-5c5144ea} hypervisorlaunchtype off 
The operation completed successfully.

When booting you will be provided with an option to boot with Hyper-V support or without Hyper-V support.



Tuesday, September 23, 2014

SharePoint deep dive exploration: looking into the SharePoint UserInfo table

A couple of months ago we encountered some issues after upgrading a SharePoint 2007 environment to SharePoint 2013 using a migration tool. One of the symptoms was that information about users was displayed incorrectly. This led us to look into how SharePoint stores user information inside its databases.

The user information displayed in a Created By or Modified By field in SharePoint is not retrieved directly from Active Directory; it is retrieved from an internal SharePoint table called the UserInfo table.


Users in Active Directory are not immediately added to this table. When a user is explicitly added to a site collection using security settings, that user is added to the UserInfo table. Another way a row is created in this table is when a user is granted access through an Active Directory group and visits the site for the first time.

Users who are deleted from a site collection will still be found in the UserInfo table, but with the flag bDeleted set to True (1). When the people picker queries the UserInfo table, it will not include users with bDeleted set to 1. The All People page (/_layouts/people.aspx?Membershipgroupid=0) will also only list users where bDeleted equals 0. This also means that even when people leave your organization and their Active Directory account is disabled (or removed), the Created By and Modified By columns will still display the name of the user. The general recommendation is to leave this mechanism as it was designed, but there are edge-case scenarios in which you want to delete users – if so, you can take a look at Delete users and clean up user information list in SharePoint.
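The soft-delete behaviour described above can be illustrated with a small in-memory model. This is a sketch only, not SharePoint code: the `UserInfoRow` class, the function names and the sample logins are all hypothetical, and the real UserInfo table should never be queried or modified directly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserInfoRow:
    """Hypothetical stand-in for one row of the UserInfo table."""
    login: str
    display_name: str
    b_deleted: bool = False  # mirrors the bDeleted flag described above


def remove_from_site_collection(rows: List[UserInfoRow], login: str) -> None:
    """Deleting a user only flags the row; the row itself stays in the table."""
    for row in rows:
        if row.login == login:
            row.b_deleted = True


def people_picker(rows: List[UserInfoRow]) -> List[str]:
    """The people picker (and the All People page) skip flagged rows."""
    return [row.login for row in rows if not row.b_deleted]


def created_by_name(rows: List[UserInfoRow], login: str) -> Optional[str]:
    """Created By / Modified By still resolve, because the row still exists."""
    for row in rows:
        if row.login == login:
            return row.display_name
    return None


if __name__ == "__main__":
    table = [UserInfoRow("CONTOSO\\alice", "Alice"),
             UserInfoRow("CONTOSO\\bob", "Bob")]
    remove_from_site_collection(table, "CONTOSO\\bob")
    print(people_picker(table))
    print(created_by_name(table, "CONTOSO\\bob"))
```

The point of the model is that deletion changes what is listed, not what can be resolved: old documents keep showing the author's name because the row is only flagged.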

There are also two different timer jobs which synchronize information from the User Profile Service Application to all site collections:
  • User Profile to SharePoint Quick Synchronization – runs every 5 minutes by default – synchronizes information for users recently added to a site collection
  • User Profile to SharePoint Full Synchronization – runs every hour by default
(For a full list of all out-of-the-box timer jobs in SharePoint Server 2013, check out SharePoint Server 2013 – Timer Job reference.) These jobs will only synchronize users where the tp_IsActive flag is true (1) in the UserInfo table. The reason for this is performance, since synchronizing all users would be quite resource intensive. tp_IsActive is set to true when a user first visits a site collection or when the user is explicitly granted Contribute permissions on a site.
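The tp_IsActive rule can be sketched the same way. Again this is a hypothetical in-memory model, not the actual timer job: the class, the function names and the `explicit` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserInfoRow:
    """Hypothetical stand-in for one row of the UserInfo table."""
    login: str
    tp_is_active: bool = False  # mirrors the tp_IsActive flag


def record_first_visit(row: UserInfoRow) -> None:
    """A first visit to the site collection activates the user."""
    row.tp_is_active = True


def grant_permission(row: UserInfoRow, level: str, explicit: bool) -> None:
    """Only an explicit Contribute grant activates the user in this model."""
    if explicit and level == "Contribute":
        row.tp_is_active = True


def rows_for_profile_sync(rows: List[UserInfoRow]) -> List[str]:
    """The synchronization jobs skip every row that is not active."""
    return [row.login for row in rows if row.tp_is_active]


if __name__ == "__main__":
    alice = UserInfoRow("CONTOSO\\alice")
    bob = UserInfoRow("CONTOSO\\bob")
    carol = UserInfoRow("CONTOSO\\carol")
    record_first_visit(alice)
    grant_permission(bob, "Contribute", explicit=True)
    # carol never visits and has no explicit grant, so she is not synced.
    print(rows_for_profile_sync([alice, bob, carol]))
```

Filtering on the flag keeps the job's work proportional to the users who actually use the site collection rather than to everyone who could.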
References: