- Ambari – provides a provisioning, monitoring and management layer on top of Apache Hadoop clusters. It offers a web interface for easy management as well as a REST API.
- Flume – allows you to collect, aggregate and move large volumes of streaming data into HDFS in a fault-tolerant fashion.
- HBase – provides NoSQL database functionality on top of HDFS. It is a columnar store which provides fast access to large quantities of data. HBase tables can have billions of rows, and these rows can have an almost unlimited number of columns.
- HCatalog – provides a tabular abstraction on top of HDFS. Pig, Hive and MapReduce use this layer to make it easier to work with files in Hadoop. HCatalog has been merged into the Hive project, and Hive uses it as a kind of master (metadata) database. For more details check out Apache HCatalog – a table management layer that exposes Hive metadata to other Hadoop applications.
- Hive – allows you to perform data warehouse operations using HiveQL, a SQL-like language which provides an abstraction layer on top of MapReduce. Hive allows you to use Hive tables to project a schema onto the data (schema on read). Through HiveQL you can view your data as a table and create queries just as you would in a normal database, with support for selects, filters, group by, equi-joins, etc. Hive inherits schema and location information from HCatalog. Hive acts as a bridge to many BI products which expect tabular data. One of the recent developments around Hive is the Stinger initiative, whose main aim is to deliver performance improvements while keeping SQL compatibility.
- Kafka – is a fast, scalable, durable and fault-tolerant messaging system. It is commonly used together with Storm and HBase for stream processing, website activity tracking, metrics collection and monitoring, or log aggregation. It provides functionality similar to AMQP, JMS or Azure Event Hubs.
- Mahout – the goal of Mahout is to build scalable machine learning libraries. The main machine learning use cases Apache Mahout supports are recommender systems (people who buy x also buy y), classification (assigning data to discrete categories, e.g. is a credit card transaction fraudulent or not) and clustering (grouping unstructured data without any training data). For more details take a look at Introducing Mahout (IBM).
- Oozie – enables you to create repeatable, dynamic workflows for tasks to be performed in a Hadoop cluster. An Oozie workflow can include Sqoop transfers, Hive jobs, HDFS commands, MapReduce jobs, etc. Oozie submits the jobs, but MapReduce executes them. Oozie also has built-in callback and polling mechanisms to check the status of jobs.
- Pegasus – provides large-scale graph mining capabilities by offering important graph mining algorithms such as degree calculation, PageRank calculation, random walk with restart (RWR), etc. Most graph mining algorithms have limited scalability and support only up to millions of nodes, whereas Pegasus scales to billion-node graphs. Graphs (also referred to as networks) are everywhere in real life, ranging from web pages to social networks, biological networks and many more. Finding patterns, rules, etc. within these networks allows you to rank web pages (or documents), measure viral marketing, discover disease patterns, etc. The details of Pegasus can be found in the white paper Pegasus: a peta-scale graph mining system – implementation and observations.
- Pig – was developed to make data analysis on Hadoop easier. It is made up of two components: a high-level scripting language (called Pig Latin, although most people just refer to it as Pig) and an execution environment. Pig Latin is a procedural language which allows you to build data flows; it contains a number of built-in functions to manipulate data. These functions allow you to ingest data from files, streams or other sources, make selections and transform the data. Finally, Pig stores the results back into HDFS. Pig scripts are translated into a series of MapReduce jobs that are run on Apache Hadoop. Users can create their own User Defined Functions (UDFs) or invoke code in other languages such as JRuby, Jython and Java. Pig gives you more control over, and more optimization of, the flow of the data than Hive does.
- RHadoop – is a collection of R packages that allow users to manage and analyze data with Hadoop in R, including the creation of MapReduce jobs. Check out Step-by-step guide to setting up an R-Hadoop system and Using RHadoop to predict website visitors to get started with some hands-on examples.
- Storm – a distributed real-time computation system. It supports a set of common stream analytics operations and provides guaranteed message processing with support for transactions. It was originally created by Nathan Marz (see History of Apache Storm and lessons learned), who came up with the term Lambda architecture for a generic, scalable and fault-tolerant data processing architecture.
- Sqoop – was built to transfer data from structured relational data stores (such as SQL Server, MySQL or Oracle) to Apache Hadoop and vice versa. Because Sqoop can handle database metadata, it is able to perform type-safe data movement using the data types specified in the metadata.
- ZooKeeper – manages and stores configuration information. It is responsible for managing and mediating conflicting updates across your Hadoop cluster.
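Pegasus runs its graph algorithms as MapReduce jobs on Hadoop. The snippet below is not Pegasus code; it is a minimal single-machine sketch of one of the algorithms it offers, PageRank computed by power iteration, just to show what is being calculated. The function name, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a small in-memory directed graph.

    edges: list of (source, target) pairs. Nodes without out-links
    (dangling nodes) spread their rank evenly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")])
    for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 3))
```

In this small example node c ends up with the highest score because it receives links from both b and d, while d, which nobody links to, keeps only the baseline (1 - 0.85) / 4 = 0.0375. Pegasus expresses the same kind of iteration as repeated matrix-vector multiplications so that it can be distributed over billions of nodes.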
Tuesday, March 31, 2015
Overview of Apache Hadoop components in HDInsight, from Ambari to Zookeeper
Thursday, March 26, 2015
People insights– data driven insights regarding people
Whereas marketing, sales and finance departments have been using advanced analytics for quite a while, HR still seems to be in one of the early maturity phases of analytics usage. CEOs seem to share this view: in a recent study, CEOs gave their HR department a 5.9 (out of 10) for its analytical skills (see CEO niet overtuigd van analytische skills HR, Dutch for “CEO not convinced of HR’s analytical skills”).
Although HR controls a lot of data (and needs to keep it up to date), it does not seem able to use this data to provide strategic advice to the board of directors. HR can only deliver truly added value by providing data-driven insights regarding people that are both compelling to business leaders and actionable by HR. This view is also nicely outlined by the consultancy firm Inostix in their HR Analytics Value Pyramid (see The HR Analytics Value Pyramid (Part 3)). To make sure that the HR team stays current and viable, it will need to adopt a whole new set of skills, of which analytics is just one (see The reskilled HR team – transform HR professionals into skilled business consultants and the capability gap across the 2015 Human Capital Trends).
In a number of upcoming posts I will delve a little deeper into this topic and will show some practical examples of how you can realize some quick wins without a huge upfront investment.
Related links:
- What we learned about HR Analytics in 2014
- 17 differences between HR Metrics and Predictive HR Analytics
- Datafication of human capital
- Top 72 HR Analytics Influencers Part 3
- Business need to make better use of analytics to predict what they need than just recruiting
- Sink or swim: a tidal wave of technology is shaping HR
- How important is data analytics to the future of HR?
- Six takeaways from the HR Analytics Innovation Summit
- Is HR ready for the big data and analytics revolution?
- Making the business case for predictive talent analytics
- Leveraging predictive analytics to avoid a major point of hiring failure
SharePoint Saturday 2015: How to build your own Delve, combining machine learning, big data and SharePoint
BIWUG is organizing the fifth edition of SharePoint Saturday Belgium, this year in Antwerp. For more information check out the site http://www.spsevents.org/city/Antwerp/Antwerp2015/. Here is the abstract of the session I will be delivering.
How to build your own Delve: combining machine learning, big data and SharePoint
You experience the benefits of machine learning every day through product recommendations on Amazon and Bol.com, credit card fraud prevention, etc. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and next we will explore how we can combine tools such as Windows Azure HDInsight, R and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
Related posts:
- Microsoft Azure Machine Learning – the power to predict
- Data science dojo – Beginning AzureML video series
- Big Data – Beyond the hype, getting to the V that really matters
- Microsoft Big Data – Introducing Windows Azure HDInsight
Wednesday, March 04, 2015
BIWUG session on advanced integration between SharePoint Online and Yammer
On the 19th of March BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1903. We have planned a great speaker and an interesting session.
Advanced integration between SharePoint Online and Yammer using Yammer Apps (Speaker: Stephane Eyskens, SharePoint Technical Architect - http://www.silver-it.com/ )
First things first: the session will start by describing the steps required to bind an Office 365 tenant to an enterprise domain, how to federate on-premises users with Office 365 in order to have SSO in place, and how to bind Yammer to the Office 365 tenant. Next, developers will learn how to leverage the Yammer App Model to build deeper integration between SPO (+ on-premises) and Yammer. Business scenarios such as leveraging Yammer's Open Graph in SPO workflows and associating Yammer groups with SPO team sites (and groups) will be covered. Security aspects will be discussed as well: from acting on behalf of a user with their consent to impersonating them completely, we'll see how to manage tokens and discuss some best practices.
Intended audience: The session is primarily intended for developers.
Key benefits: After this session, developers should have good visibility on how to go beyond the out-of-the-box (OOTB) Yammer app integration with SPO and on what Open Graph is all about.
Thanks also to Xylos for hosting this session.
Monday, March 02, 2015
Resetting content index in SharePoint Server 2013: why and how
Don’t just reset your search index in a production environment, since this will also impact the analytics processing component (read Reset the index in SharePoint Server 2013). Listed below is the syntax for the PowerShell command (the snippet assumes that you have only one SearchServiceApplication):
(Get-SPEnterpriseSearchServiceApplication).Reset($true,$true)
The SearchServiceApplication.Reset method takes two parameters: public void Reset(bool disableAlerts, bool ignoreUnreachableServer). I would recommend setting disableAlerts to true so that search alerts are deactivated during the reset. The value for the second parameter will depend on your specific case. If you get a timeout when running this command, you can use the steps outlined in SharePoint 2013 Content Index Reset Timeout – they worked for me.
Friday, February 13, 2015
Mindful apps – putting people at the center supported by data
Most of the characteristics outlined in the comparison between traditional and mindful apps (see table above) are not revolutionary, but there is one important key message.
Mindful apps will allow us to assess and compare options in a decision context, to respond quickly to events and make the best decision given a specific context, and they will provide us with “extended intelligence” by understanding and recognizing patterns within the data at hand. We as humans are good at problem solving, pattern recognition, identifying outliers, making creative leaps and incorporating new information when making decisions. We should be able to focus on these high-end tasks by being freed from laborious and menial tasks which can be automated.
There are four different trends which will impact how these mindful apps will be shaped:
- User context matters – make it personal. When we make decisions or work within the context of specific processes, there are a lot of parameters which determine how we react or how we make decisions; these parameters should be integrated into the decision framework driving mindful apps. Our calendar, the availability of colleagues to reach out to, input from communications (e-mail, messaging or other formats), and information that we capture from blogs, social networks such as LinkedIn or open data sources, together with the information available within your organization, should be filtered and at your fingertips. Machine learning and cognitive algorithms will drive the second machine age (a term coined by Brynjolfsson from MIT), but we are only at the start of seeing how these algorithms can drive the future workplace for information workers.
- Mobile shapes our expectations. Mobile apps and the user experience they provide are also shaping how we see an ideal enterprise application. Mindful apps should strive to combine beauty, simplicity and purpose to create an experience that delights us and is effortless to use. Mobile apps are easy to understand: when people use a good app for the first time, they intuitively grasp the most important features. Why can’t we do the same for enterprise apps? Simplicity rules. Apps should also incorporate the necessary logic to evolve as the user grows more comfortable with them and explores more advanced functionality. Apps should learn people’s preferences over time and show the interface which is best suited for the task at hand.
- (Big) data and advanced analytics are the driving force. There is a lot of hype and confusion around the term Big Data, but one thing is for sure: storage and processing costs have dropped significantly in the last decade. When you combine this with the rise of new storage platforms such as Hadoop, NoSQL datastores such as HBase, Cassandra, etc., and new data processing frameworks such as Apache Drill, Dremel, Spark, etc., new opportunities arise to support users in their decision-making processes. While there is a lot of emphasis on the 4 Vs (Volume, Velocity, Variety and Veracity), there is one more V that you have to think about: Value (also see Big Data beyond the hype, getting to the V that really matters).
- Cloud will lead the way. A lot of the innovation which will enable this next generation of apps is coming out of the datacenters of Google, Amazon, LinkedIn, Microsoft, Yahoo, etc., but most organizations don’t have the same capacity (nor the same financial resources) as these internet giants. Luckily, the economies of scale offered by the cloud allow solution providers to offer you a data infrastructure which can scale from prototype size to production environments able to handle huge amounts of data. The major cloud players (IBM, Microsoft, Amazon and Google) all seem to be making big bets on building out the data analytics platform of the future, and this competition will drive prices further down. It will also force them to focus on more innovative solutions which allow them to differentiate themselves from the competition.
The future is already here — it's just not very evenly distributed. (William Gibson)
Thursday, February 05, 2015
Microsoft Azure Machine Learning–the power to predict
The best definition for Machine Learning – in my opinion – is from the excellent book “Introduction to Machine Learning (MIT Press 2014, Ethem Alpaydin)” (Use it as a reference – this is not an easy “how to” book)
The goal of machine learning is to program computers to use example data or past experience to solve a given problem.
In general, when we want to solve a problem on a computer, we need an algorithm: a set of instructions that transforms an input into an output. Unfortunately, for some problems we do not know how to program such an algorithm, for example e-mail spam detection or predicting customer behavior. In many of these cases we do have example inputs and outputs available, e.g. a set of e-mails of which some are marked as spam. Based on this data, we would like a computer (or machine) to automatically extract the algorithm necessary to perform the classification. The algorithm does not need to be perfect, but it does need to be a good and useful approximation.
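To make the spam example concrete, here is a minimal sketch of “learning from examples”: a naive Bayes classifier (one of the Bayesian methods mentioned in this post, chosen here only because it fits in a few lines) that estimates word frequencies from labelled e-mails instead of relying on hand-written rules. The function names and the tiny training set are made up for illustration; a real spam filter would need far more data and better text processing.

```python
import math
from collections import Counter


def train_naive_bayes(examples):
    """Learn from (text, label) pairs and return a classify(text) function."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training examples
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    total_examples = sum(label_counts.values())

    def classify(text):
        best_label, best_score = None, -math.inf
        for label, count in label_counts.items():
            # log P(label) + sum of log P(word | label), with add-one smoothing
            score = math.log(count / total_examples)
            total_words = sum(word_counts[label].values())
            for word in text.lower().split():
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocabulary))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return classify


if __name__ == "__main__":
    training = [
        ("win a free prize now", "spam"),
        ("cheap pills free offer", "spam"),
        ("meeting agenda for monday", "ham"),
        ("please review the project report", "ham"),
    ]
    classify = train_naive_bayes(training)
    print(classify("free prize offer"))
    print(classify("project meeting on monday"))
```

Nobody wrote a rule saying that “free” indicates spam; the classifier derives that approximation from the labelled examples, which is exactly the point of the definition above.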
The term machine learning is tightly coupled to the domain of analytics (or data science; see Data Scientist: the sexiest job of the 21st century). Analytics is concerned with the discovery and extraction of useful business patterns or mathematical decision models from a specific data set. A number of techniques can be used for this; depending on their background, practitioners will probably favor a technique from their own domain:
- Regression, General Linear Models (GLMs), decision trees, etc. (originating from the statistics domain)
- Machine learning algorithms such as support vector machines, neural networks, Bayesian methods, etc. (originating from the computer science domain)
But why should you care about machine learning? I think the picture below shows how the focus is shifting from traditional reporting (hindsight) to more advanced predictive and prescriptive analytics (foresight), which provides businesses with more added value but also requires business intelligence specialists to acquire new competencies such as machine learning and data mining. Examples vary across industries, but in general predictive analytics has the potential to change the way businesses make decisions (for a more in-depth look, definitely pick up Predictive Analytics – The power to predict who will click, buy, lie or die by Eric Siegel).
Microsoft Azure Machine Learning distinguishes itself from other platforms and tools by a number of different characteristics:
- Allows you to jointly build predictive models from anywhere in the world, using only a web browser, by means of a visual composition canvas (called Machine Learning Studio) and modules, without requiring you to write code (although you can use R code snippets if you want). You can start quickly from existing sample experiments/models or share your own data experiments.
- Collaborative work together with anyone from anywhere using just your browser.
- Available as a cloud service, eliminating upfront costs for hardware resources.
- The different modules allow you to author an end-to-end machine learning workflow, from reading data to training and validating your predictive model.
- Ability to deploy models as web services. You can quickly operationalize your models by converting them into web services, and you even have the ability to monetize your machine learning models using Azure Data Market.
References:
- Microsoft Azure Machine Learning product page
- Machine Learning Blog
- Introducing Microsoft Azure Machine Learning (TechEd Europe 2014 recording)
- Microsoft Learning on Azure (AzureConf 2014 recording)
- Extensibility and R Support in the Azure Machine Learning Platform
- How to upload an R package to Azure Machine Learning
- Vowpal Wabbit Modules in AzureML
- Predict What’s Next: How to get started with Machine Learning Part 1
- Predict What’s Next: How to get started with Machine Learning Part 2
- AzureML : a short introduction
Wednesday, January 28, 2015
BIWUG session–imec Share - an Office 365 customer case
I have uploaded the presentation from yesterday to SlideShare – check it out.
Listed below are also a number of supporting links:
- Office365 Developer Patterns and Practices (GitHub)
- SharePoint Online: software boundaries and limits
- How to avoid getting throttled or blocked in SharePoint Online
- Give feedback about the Office Developer platform (including Office 365) – please vote for the ability to modify location-based metadata defaults using CSOM
- Get early access to new features in Office 365 and provide feedback with Uservoice
- Office 365 roadmap
- SharePoint Online Client Components SDK (June release)
- App parts/iframes are not the only solution – check out Javascript injection in SharePoint Online – Office 365 Developer Patterns and Practices
- Transforming your SharePoint Full Trust Code to the SharePoint App Model
Wednesday, January 21, 2015
SharePoint Saturday Belgium 2015– Call for speakers
On April 18th 2015 BIWUG (www.biwug.be) is organizing its fifth edition of SharePoint Saturday Belgium. We invite you to submit a session for this year's SharePoint Saturday Belgium using this link: http://www.spsevents.org/city/Antwerp/Antwerp2015. It is possible to submit multiple sessions. We will close the call for speakers on February 18th EOD.
SharePoint Saturday Belgium 2015 will take place in Antwerp – for more details check out http://www.spsevents.org/city/Antwerp/Antwerp2015. If you have any questions or remarks, do not hesitate to contact me.
Monday, January 05, 2015
Data Science Dojo– Beginning AzureML video series
An interesting video series to start with if you want to learn how to use Microsoft Azure Machine Learning (AzureML):
- Beginning Azure ML Part 1 – Importing Data, accessing and creating a new experiment
- Beginning Azure ML Part 2 – Reading external data sources
- Beginning Azure ML Part 3 – Data exploration and visualization
- Beginning Azure ML Part 4 - Preprocessing data part I: casting and renaming columns
- Beginning Azure ML Part 5 – Preprocessing data part II: scrub missing values and project columns
- Beginning Azure ML Part 6 - Feature engineering and R script
- Beginning Azure ML Part 7 – Building your First Model
- Beginning Azure ML Part 8 – Run and fine-tune multiple models
- Beginning Azure ML Part 9 – Deploying your first predictive model as a web service
- Beginning Azure ML Part 10 – Using R API to obtain predictions from your web service
- Beginning Azure ML Part 11 – Using Python API to obtain predictions from your web service
Monday, December 15, 2014
Introducing Azure Stream Analytics
Azure Stream Analytics, which is currently in preview, is a fully managed real-time stream analytics service that aims to provide highly resilient, low-latency and scalable complex event processing of streaming data for scenarios such as the Internet of Things, command and control (of devices) and real-time business intelligence on streaming data.
Although it might look similar to Amazon Kinesis, it seems to distinguish itself by aiming to increase developer productivity: you author streaming jobs using a SQL-like language to specify the necessary transformations, and it provides a range of operators which are quite useful for defining time-based operations such as windowed aggregations (check out the Stream Analytics Query Language Reference for more information). Listed below is an example taken from the documentation which finds all toll booths that have served more than 3 vehicles in the last 5 minutes (see Sliding Window – slides by an epsilon and produces output at the occurrence of an event):
SELECT DateAdd(minute,-5,System.TimeStamp) AS WinStartTime, System.TimeStamp AS WinEndTime, TollId, COUNT(*)
FROM Input TIMESTAMP BY EntryTime
GROUP BY TollId, SlidingWindow(minute, 5)
HAVING COUNT(*) > 3
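To make the semantics of this query more tangible, the sketch below mimics it in plain Python: for every incoming event it counts how many vehicles the same toll booth served in the 5 minutes up to and including that event, and emits a row (window start, window end, TollId, count) when the count exceeds 3. It is a simplified approximation of SlidingWindow that is evaluated only when an event arrives; the function name and the (toll_id, entry_time) tuple format are assumptions, not part of Stream Analytics.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def toll_booth_alerts(events, threshold=3):
    """Yield (win_start, win_end, toll_id, count) for every event after which a
    toll booth has served more than `threshold` vehicles in the window
    (win_end - 5 minutes, win_end].

    `events` is an iterable of (toll_id, entry_time) tuples ordered by entry_time.
    """
    recent = defaultdict(deque)  # toll_id -> entry times still inside the window
    for toll_id, entry_time in events:
        times = recent[toll_id]
        times.append(entry_time)
        # Evict entries that are 5 minutes or more older than the current event;
        # the current event itself always stays, so the deque never empties here.
        while times[0] <= entry_time - WINDOW:
            times.popleft()
        if len(times) > threshold:
            yield entry_time - WINDOW, entry_time, toll_id, len(times)


if __name__ == "__main__":
    start = datetime(2014, 12, 15, 8, 0)
    minute = timedelta(minutes=1)
    events = [
        (1, start), (2, start), (1, start + minute),
        (1, start + 2 * minute), (1, start + 3 * minute),
        (2, start + 6 * minute), (1, start + 9 * minute),
    ]
    for row in toll_booth_alerts(events):
        print(row)
```

With this sample input only toll booth 1 is reported, once, when its fourth vehicle arrives within the 5-minute window; the declarative query expresses the same idea without any of the bookkeeping.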
References
- Introducing Azure Stream Analytics: Processing on Near-Realtime Data (TechEd Europe 2014 recording)
- Microsoft adds IoT Streaming Analytics, Data Production and Workflow Services to Azure
- Telemetry and Data Flow at Hyper-Scale: Azure Event Hub
Thursday, December 11, 2014
SharePoint deep dive exploration: SharePoint alerting
This is the second in a series of blog posts on SharePoint Server 2013, in which we explore how e-mail alerting works in SharePoint 2013. For part 1, take a look at SharePoint deep dive exploration: looking into the SharePoint UserInfo table.
If you need to know more about how alerts work at the lowest level, you should take a look at the SharePoint 2003 Database tables documentation; for alerts this documentation still seems to be valid. SharePoint stores the list of events for which users have requested alerts in the EventCache table. In comparison to SharePoint 2003 there are some extra fields available (marked in bold). For some of the fields I did not find a description (marked with ? in the table below).
The other tables which are manipulated by the SharePoint alert framework are EventLog, ImmedSubscriptions, SchedSubscriptions and EventSubsMatches (for an in-depth discussion, also take a look at Working with search alerts in SharePoint 2010). Every event is recorded in these tables, but since the EventType and EventData columns contain the most data, these are only filled in when the list has at least one subscription.
So how does this work? There is a SharePoint timer job called the “Immediate Alerts” job, which is scheduled to run every 5 minutes. It picks up the necessary event information and processes it (in batches of 10,000). If you see issues with alerts not being sent out, I recommend taking a look at SharePoint Scheduled Alerts Deconstructed.
| Column Name | Description |
| EventTime | Time when the event record was inserted into the database |
| SiteId | ID of the site, available from the AllSites table |
| WebId | ID of the web, available from the AllWebs table |
| ListId | ID of the list in which the monitored item appears |
| ItemId | ID of the item that raised the event |
| DocId | ID of the document that raised the event |
| Guid0 | ? |
| Int0 | ? |
| Int1 | ? |
| ContentTypeId | ? |
| ItemName | Full name of the item |
| ItemFullUrl | Full path of the item |
| EventType | ItemAdded (1), ItemModified (2), ItemDeleted (4), DiscussionAdded (16), DiscussionModified (32), DiscussionDeleted (64), DiscussionClosed (128), DiscussionActivated (256), … |
| ObjectType | |
| ModifiedBy | User name of the person who raised the event |
| TimeLastModified | Time when the event occurred |
| EventData | The binary large object (BLOB) containing all of the field changes with the old and new values for an item |
| ACL | The ACL for the item at the time it is edited |
| DocClientId | |
| CorrelationId | |
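The EventType codes in the table are all powers of two, which suggests they are bit flags that can be combined. Assuming that is the case, a small helper like the one below can make raw values from the alert tables readable while troubleshooting. The enum and function names are hypothetical, and only the codes listed in the table are included (the list there ends with “…”, so other codes exist).

```python
from enum import IntFlag


class AlertEventType(IntFlag):
    """EventType codes as listed in the table above (not exhaustive)."""
    ITEM_ADDED = 1
    ITEM_MODIFIED = 2
    ITEM_DELETED = 4
    DISCUSSION_ADDED = 16
    DISCUSSION_MODIFIED = 32
    DISCUSSION_DELETED = 64
    DISCUSSION_CLOSED = 128
    DISCUSSION_ACTIVATED = 256


def describe_event_type(value):
    """Return the names of the known flags that are set in an EventType value."""
    return [flag.name for flag in AlertEventType if flag & value]


if __name__ == "__main__":
    print(describe_event_type(1))
    print(describe_event_type(2 | 4))
```

Bits that are not in the table are simply ignored, so an unknown code decodes to an empty list rather than raising an error.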
The reason I started looking into these tables was that I got feedback from a client that all e-mail alerts being sent out contained the wrong link after we migrated their environment from SharePoint 2007 to 2013. One of the first things that I did was sit next to a user who was adding documents in SharePoint, and then I noticed something strange: the user uploaded a document and, when filling in the extra metadata, immediately changed the name of the document.
After looking into how alerting works I still did not have an explanation for why the links were sent out correctly in 2007, because this should have failed there as well. So I used this PowerShell script to create an export of all the e-mail alerts/subscriptions that users had in SharePoint, noticed that most of the alerts were on just a couple of libraries, and then I found it.
In SharePoint 2007, they had “require check out” set by default on these libraries. This means that when the user uploaded and renamed the document, it was not yet visible to other users and the alert was not sent out. If check-out is not required, the files are immediately visible and the “New Item Added” immediate alert is fired; this was the behavior they were seeing in 2013.
So “require check out” is an interesting workaround to prevent a file from being visible before it is explicitly checked in. Since they were changing the file properties (and even the file name) before the file became visible to users, the New Item alert would not trigger, and users would only be notified through the “Changed Item” alert when the file was checked in.
The reason we deactivated “require check out” was that it would conflict with co-authoring, but apparently they never used this feature on the specific libraries for which these alerts were set. So the moral of the story: don’t just activate or change a specific functionality because it is available in a new version, but first look at how people are actually using it.
References:
- Use Windows PowerShell to update alerts in SharePoint 2013
- SharePoint scheduled alerts deconstructed
- Working with search alerts in SharePoint 2010
BIWUG on blueprint for large scale SharePoint projects and display templates
On the 16th of December BIWUG (www.biwug.be) is organizing its next session – don’t forget to register for BIWUG1612 because there are some great sessions planned.
SharePoint Factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury, Senior EIM Consultant at CGI Belgium responsible for CGI Belgium’s Microsoft Competency Centre, and the Digital Transformation Portfolio)
Large Notes-to-SharePoint transformations require a standardized approach to development and project management in order to ensure delivery on time and with quality. The SharePoint Factory has been developed to allow parallel development of applications and to support all stages of the development process with standardized quality gates, test procedures and templates, for example requirements analysis templates. Essentially, the SharePoint Factory can be compared to an assembly line in the automotive industry. This approach is combined with a SharePoint PM as a Service offering, which is a blueprint for the management of large-scale SharePoint projects and provides a specific PM process with SharePoint-centric artefacts, checklists and documents. The approach has been developed within a 6,500 person-day project in Germany and has already been published in the German .net Magazin, SharePoint Kompendium and the Dutch DIWUG Magazine.
Take your display template skills to the next level (Speaker: Elio Struyf, senior SharePoint consultant at Ventigrate - http://www.eliostruyf.com/)
Once you know how search display templates work and how they can be created, it is rather easy to enhance the overall experience of your sites compared with previous versions of SharePoint. In this session I will take you to the next level of display templates, where you will learn to add grouping, sorting, loading more results, and more. This session is aimed at people who already have a basic understanding of what search display templates are and how they can be created.
18:00 - 18:30 ... Welcome and snack
18:30 - 19:30 ... SharePoint factory: a blueprint for large scale SharePoint projects (Speaker: Lana Khoury)
19:30 - 19:45 ... Break
19:45 - 20:45 ... Take your display template skills to the next level ( Speaker: Elio Struyf )
20:45 - … ... SharePint!
Tuesday, November 25, 2014
Get early access to new features in Office 365 and provide feedback with Uservoice
Over the last couple of months, a number of interesting new functional modules such as Delve, Yammer Groups and the new App launcher have been pre-released on Office 365; some of them might only become visible after you have activated First Release. Remember that there is also an Office 365 for business public roadmap available (at office.com/roadmap) where you can see which functionality is being rolled out and which is under development. For more information check out the links below.
Also remember that you can always use the Office Developer Platform Uservoice (http://officespdev.uservoice.com/) to give feedback and request changes. You can submit your feedback for a specific change and encourage others who you know to support these changes by voting for them. If you want to give feedback with regards to InfoPath – there is a Microsoft Office Forms vNext User Voice (http://officeforms.uservoice.com/) as well.
References
Tuesday, November 18, 2014
Understanding Azure Event Hubs–ingesting data at scale
Azure Event Hubs are an extension of the existing Azure Service Bus which provides hyper-scalable stream ingestion capabilities. They allow different producers (devices and sensors, possibly tens of thousands of them) to send continuous streams of data without interruption. You typically see this kind of streaming sensor data in future-oriented scenarios such as connected cars and smart cities, but also in more common scenarios such as application telemetry or industrial automation.
Event Hubs scaling is defined by Throughput Units (TUs), which are a kind of pre-allocation of resources. A single TU can handle up to 1 MB/s or 1000 events per second for writes, and up to 2 MB/s for reads. Load is distributed across the partitions you create for the Event Hub; these partitions allow for parallel processing on both the producer and the consumer side. Besides common messaging scenarios such as competing consumers, Event Hubs also provides data retention policies of up to 84 GB of event storage per day. The current release supports up to 32 partitions, but you can log a support call to increase this to 1000 partitions. Since a partition is allocated at most 1 TU, this allows for 1 GB/s of data ingest per Event Hub. Messages can be sent to an Event Hub publisher endpoint via HTTPS or AMQP 1.0; consumers retrieve messages using AMQP 1.0.
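The capacity figures above can be checked with some quick arithmetic. The Python sketch below assumes that one TU accepts 1 MB/s or 1000 events/s of writes (the ingress figure that makes 1000 partitions add up to 1 GB/s) and that each partition gets at most one TU; the function names are illustrative only, and the limits should be checked against the current Azure documentation. It also shows that the 84 GB figure equals one TU writing at full speed for 24 hours, which suggests retention is counted per TU; treat that as an interpretation rather than a documented rule.

```python
import math

# Assumed per-TU write limits (hedged; verify against current Azure docs).
INGRESS_MB_PER_TU = 1     # MB/s written per throughput unit
EVENTS_PER_TU = 1000      # events/s written per throughput unit
SECONDS_PER_DAY = 24 * 60 * 60


def max_ingress_mb_per_s(partitions: int) -> int:
    """Ingress ceiling when each partition is given its maximum of 1 TU."""
    return partitions * INGRESS_MB_PER_TU


def daily_volume_gb(throughput_units: int) -> float:
    """Data written in 24 hours at full ingress, using 1 GB = 1024 MB."""
    return throughput_units * INGRESS_MB_PER_TU * SECONDS_PER_DAY / 1024


def tus_needed(events_per_s: int, avg_event_bytes: int) -> int:
    """TUs required so that neither the MB/s limit nor the events/s limit is exceeded."""
    mb_per_s = events_per_s * avg_event_bytes / (1024 * 1024)
    return max(math.ceil(mb_per_s / INGRESS_MB_PER_TU),
               math.ceil(events_per_s / EVENTS_PER_TU))


print(max_ingress_mb_per_s(32))    # 32 MB/s with the standard maximum of 32 partitions
print(max_ingress_mb_per_s(1000))  # 1000 MB/s, roughly 1 GB/s
print(daily_volume_gb(1))          # 84.375, i.e. the ~84 GB per day in the text
print(tus_needed(10_000, 512))     # 10: the events/s limit dominates here
```

A sizing helper like `tus_needed` makes the "or" in the TU definition explicit: whichever limit (bandwidth or event count) is reached first determines how many units you need.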
Building such an application architecture is quite challenging, and Event Hubs lets you leverage the elasticity of the cloud and a pay-per-use model to get started quickly. Whereas systems of this type currently scale to tens of thousands of units, this number is expected to grow rapidly: Gartner expects the number of installed IoT units to reach 26 billion by 2020, and other estimates even point to 40 billion IoT units (Internet of Things by the Numbers: estimates and forecasts).
References:
- Introducing Microsoft Azure Event Hubs (00:07:00) by @ClemensV
- Event Hubs Overview (Technet documentation)
- Microsoft Azure Cloud Cover Show – Episode 160: Event Hubs
- Event Hubs Programming Guide (MSDN)
Monday, November 17, 2014
Webinar: What’s new on the Microsoft Azure Data Platform
On Thursday 20th of November I will be delivering a webinar on the new capabilities in the Microsoft Azure Data Platform. With the recent addition of three new services (Azure Stream Analytics, Azure Data Factory and Azure Event Hubs), Microsoft is making progress in building the best cloud platform for big data solutions as well as for enabling the Internet of Things (IoT). These additions allow you to process, manage and orchestrate data from IoT devices and sensors and turn this data into valuable insights for your business.
The above-mentioned new services extend Microsoft's existing big data offering based on HDInsight and Azure Machine Learning. HDInsight is Microsoft's offering of Hadoop functionality on Microsoft Azure; it simplifies the setup and configuration of a Hadoop cluster by offering it as an elastic service. Azure Machine Learning is a new Microsoft Azure-based tool that helps organizations build predictive models using built-in machine learning algorithms, all from a web console.
In this webinar I will show the key capabilities of these components, how they fit together, and how you can leverage them in your own solutions.
Register for this free webinar “What’s new on the Microsoft Azure Data Platform” and get up to speed in less than one hour.
Wednesday, November 12, 2014
BIWUG on apps for SharePoint Server 2010 and data driven collaboration
On the 26th of November BIWUG is organizing our next session – don’t forget to register for BIWUG2611 because there are some great sessions planned.
Writing apps on SharePoint Server 2010 (Speaker: Akshay Koul, SharePoint CoOrdinator at Self, http://www.akshaykoul.com )
The session is geared towards developers and advanced users and explains how you can write enterprise-level applications on SharePoint 2010 without any server-side code. We will go through real-life applications and discuss the mechanisms used, the provisioning process, debugging techniques and best practices. The applications written are fully compatible with Office 365/SharePoint Online and SharePoint Server 2013.
Preparing for the upcoming (r)evolution from User Adoption to Data-Driven Collaboration (Speaker: Peter Van Hees, MVP Office 365/Collaboration architect, http://petervanhees.com )
As consultants, we (try to) listen to our customers, (try to) address the requirements ... and finally (try to) deploy the solution. This seems like an easy job, but in reality collaboration projects, and especially SharePoint or Yammer implementations, are a little more challenging. The fast adoption of cloud computing has introduced a new currency for license-based software: User Engagement. If you can't engage your users, your revenue stream will start to spiral downwards. It should be obvious that Office 365 (and all of its individual components) is not exempt. We all need to focus on the post-deployment phase!
This story has its roots in my hands-on experience trying to launch Yammer initiatives. Everyone seems to agree that Yammer is a wonderful and viral service ... yet the conversations seem to flatline in most organizations. We will review how you should (already) be addressing User Adoption now; but, more importantly, we will spend more time looking into the stars … a future where Data-Driven Collaboration will take User Engagement to the next level. This isn't a story about Delve. It's about ensuring you integrate data in all your projects to prepare for the future. The age of smart software …
18:00 - 18:30 ... Welcome and snack
18:30 - 19:30 ... Writing apps on SharePoint Server 2010 (Speaker: Akshay Koul)
19:30 - 19:45 ... Break
19:45 - 20:45 ... Preparing for the upcoming (r)evolution from User Adoption to Data-Driven Collaboration (Speaker: Peter Van Hees)
20:45 - … ... SharePint!
Tuesday, October 14, 2014
Getting VirtualBox to work on Windows 8.1
If you are already using Hyper-V, you will also need to create a dual-boot entry, since VirtualBox is not compatible with Hyper-V. You can do this using the commands listed below from an administrative command prompt (as outlined in this blog post from Scott Hanselman: Switch easily between VirtualBox and Hyper-V with a BCDEdit boot entry in Windows 8.1).
C:\>bcdedit /copy {current} /d "No Hyper-V"
The entry was successfully copied to {ff-23-113-824e-5c5144ea}.
C:\>bcdedit /set {ff-23-113-824e-5c5144ea} hypervisorlaunchtype off
The operation completed successfully.
When booting you will be provided with an option to boot with Hyper-V support or without Hyper-V support.
Tuesday, September 23, 2014
SharePoint deep dive exploration: looking into the SharePoint UserInfo table
The user information displayed in a Created By or Modified By field in SharePoint is not retrieved directly from Active Directory, but from an internal SharePoint table called the UserInfo table.
Not all users in Active Directory are immediately added to this table. When a user is explicitly added to a site collection using the security settings, they are added to the UserInfo table. A user is also added to this table when they are granted access through an Active Directory group and visit the site for the first time.
Users who are deleted from a site collection will still be found in the UserInfo table, but with the flag bDeleted set to True (1). When the people picker queries the UserInfo table, it excludes users with bDeleted set to 1. The All People page (/_layouts/people.aspx?Membershipgroupid=0) also only lists users where bDeleted equals 0. This also means that even when people leave your organization and their Active Directory account is disabled (or removed), the Created By and Modified By columns will still display their name. The general recommendation is to leave this mechanism as designed, but there are edge cases in which you want to delete users; if so, take a look at Delete users and clean up user information list in SharePoint.
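To make the soft-delete behaviour concrete, here is a small in-memory Python model. It is purely illustrative: the class, method and field names are hypothetical, it does not touch a real SharePoint content database, and it only captures the rules described above. Rows are never removed, only flagged (like bDeleted); Created By/Modified By lookups still resolve flagged users, while the people picker skips them.

```python
from dataclasses import dataclass


@dataclass
class UserInfoRow:
    """Hypothetical, simplified UserInfo row (the real table has many more columns)."""
    user_id: int
    login: str
    display_name: str
    b_deleted: bool = False  # mirrors the bDeleted flag: True once removed from the site collection


class UserInfoModel:
    """Illustrative in-memory model of the soft-delete rules; not a SharePoint API."""

    def __init__(self) -> None:
        self._rows: dict[int, UserInfoRow] = {}
        self._next_id = 1

    def add_user(self, login: str, display_name: str) -> int:
        # Happens when a user is granted access explicitly, or visits for the
        # first time after being granted access through an AD group.
        row = UserInfoRow(self._next_id, login, display_name)
        self._rows[row.user_id] = row
        self._next_id += 1
        return row.user_id

    def remove_from_site(self, user_id: int) -> None:
        # Soft delete: the row stays, only the flag changes.
        self._rows[user_id].b_deleted = True

    def resolve_display_name(self, user_id: int) -> str:
        # Created By / Modified By lookups ignore the flag, so former users still show up.
        return self._rows[user_id].display_name

    def people_picker(self, query: str) -> list[str]:
        # The people picker (like the All People page) only sees rows with bDeleted = 0.
        q = query.lower()
        return [r.display_name for r in self._rows.values()
                if not r.b_deleted and q in r.display_name.lower()]


model = UserInfoModel()
alice = model.add_user("CONTOSO\\alice", "Alice Jansen")
bob = model.add_user("CONTOSO\\bob", "Bob Peeters")
model.remove_from_site(bob)
print(model.people_picker(""))          # ['Alice Jansen']
print(model.resolve_display_name(bob))  # Bob Peeters
```

The key design point the model illustrates is that deletion is a flag rather than a row removal, which is why historical Created By and Modified By values keep resolving to a name.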
There are also two timer jobs which synchronize information from the User Profile Service Application to all site collections:
- User Profile to SharePoint Quick Synchronization – runs every 5 minutes by default – synchronizes information for users recently added to a site collection
- User Profile to SharePoint Full Synchronization – runs every hour by default
References:
- Display name in SharePoint is out of sync
- Delete users and clean up user information list in SharePoint (an edge case; only apply it if you are in the same situation)
Thursday, September 18, 2014
SharePoint Server 2013–Error on manage user properties page–your search encountered an error
Last week, when I wanted to modify some user properties in SharePoint Server 2013, I got a very strange error: “Your search encountered an error. If the problem persists, contact the portal site administrator”.
I first checked search but saw no errors there; the error description does not really point you in the right direction. The solution was to start the Forefront Identity Manager Service (Run > services.msc), as outlined in this blog post: UserProfile Service, Managed properties are not available.