Daily Tech Digest by Kannan Subbiah: July 19, 2016

Cybersecurity control a concern for digital businesses

Gartner predicts that by 2018, 25% of corporate data traffic will bypass enterprise security controls and flow directly to the cloud from mobile devices. With data no longer restricted to data centers, it is important to stop trying to control information and instead determine how it flows, Pratap added. “Finding all sensitive data and tracking all access in all forms will be too onerous for most organizations,” she said. “Each organization will have to manage their ability to do this within the limits of the resources they can commit. From personally identifiable information to sensitive intellectual property, the impact of compromise of such information on the organization needs to be assessed regularly.”

From Pig to Spark: An Easy Journey to Spark for Apache Pig Developers

Pig has a lot of qualities: it is stable, scales very well, and integrates natively with the Hive metastore HCatalog. By describing each step atomically, it minimizes conceptual bugs that you often find in complicated SQL code. But sometimes, Pig has some limitations that make it a poor programming paradigm to fit your needs. The three main limitations are: Pig is a pipeline and doesn’t offer loops or code indirections (IF..THEN) which can sometimes be mandatory in your code. ... Finally, a third Pig limitation is related to input data formats: whereas Pig is good with CSV and HCatalog, it seems a bit less comfortable with reading and processing some other data formats like JSON (through JsonLoader), whereas Spark integrates them natively.

Insurance is ready for an upgrade

Before too long, IoT may enable carriers to become primarily the ensurers of safety and productive use of properties, rather than just the insurers of damages should a loss occur. If IoT detects the imminent failure of a $100 compressor in a $1 million piece of equipment that prevents a $100 million business-interruption loss, an entirely new value chain is created. If carriers don’t seize the moment, outside tech firms could launch IoT platforms that already have an ingrained risk-transfer component, thereby beating insurers at their own game. Nor are life insurers immune to the disruptions caused by enhanced connectivity. More life carriers will likely take the plunge into telematics, including some utilizing a fitness-monitoring device to award points for those who exhibit healthy behaviors, thereby allowing policyholders to earn premium discounts and other rewards while facilitating a richer, more holistic relationship with their insurer.

Introduction to data-science tools in Bluemix – Part 3

A big part of any data science activity is learning how to put the data in a format that helps you gain insight. A common task is looking at the data in time segments, joining them on date patterns or time of year dates. In this recipe we will look at how we transform dates so they can be used as date formats rather than text strings. In addition we will look at joining data frames from multiple data sources. ... You will notice that the date is in the format “MMMM-YY”; this is a concern because the year is not specific. Because I know the data, I have made a rule in this case that everything less than 20 is for the year 2000 and beyond. Everything 20 and above is for the 1900s. The next concern is that I need my date in “YYYY-MM-DD” format and there are no “days” in the source date. I am going to default it to “01”.
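The date rule described above can be sketched in a few lines of plain Python. This is only an illustration, not the code from the Bluemix recipe: it assumes the source values look like "January-05" (full English month name, hyphen, two-digit year), and the names `PIVOT`, `MONTHS`, `parse_month_year` and `to_iso` are hypothetical.

```python
from datetime import date

# Assumption: two-digit years below 20 mean 20YY, 20 and above mean 19YY,
# as stated in the excerpt. The pivot depends on knowing the data.
PIVOT = 20

# Explicit month table so the parsing does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_month_year(text: str) -> date:
    """Turn a 'MMMM-YY' string into a date, defaulting the day to 01."""
    month_name, two_digit_year = text.strip().split("-")
    month = MONTHS[month_name.capitalize()]
    yy = int(two_digit_year)
    century = 2000 if yy < PIVOT else 1900
    return date(century + yy, month, 1)


def to_iso(text: str) -> str:
    """Return the 'YYYY-MM-DD' form of a 'MMMM-YY' string."""
    return parse_month_year(text).isoformat()
```

For example, `to_iso("January-05")` gives "2005-01-01" and `to_iso("March-98")` gives "1998-03-01". Once the column holds real dates rather than text, data frames from different sources can be joined on it.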

Europe Builds a Network for the Internet of Things. Will the Devices Follow?

For growth to accelerate, says de Smit, a few things are necessary. The first is for the KPN network to enable location-based features, which would, for instance, allow a shipping container to be tracked in transit across the country—something expected to go live before the end of 2016. The second is IoT coverage beyond national borders. Siemens, Shimano, and other large companies are very interested in gaining access to IoT networks, but only when there is enough geographic coverage, says de Smit. That may take a few years. KPN is not the only company building out the IoT. SigFox, a French startup, claims its competing wireless grid already covers 340 million people in parts of 22 countries. The company raised well over $100 million in investment in 2015 alone, and is using the money to expand as rapidly as possible.

Red Hat Shoots to Solve Container Storage with Gluster and OpenShift

The integration translates to another option for storing data inside containers. That’s important because, to date, other persistent storage solutions for containers have tended to be clunky. Here’s why: Docker containers are ephemeral. They spin up and down as needed, which is what makes containerized infrastructure so scalable and agile. But it also makes it hard to store data persistently, since you can’t store permanent data inside containers very effectively if the containers themselves are not permanent. Previous attempts to solve this conundrum have centered on creating special containers dedicated to storage, or allowing containerized apps to access storage on the host system.

Organising for Analytics Success - Centralising vs. Decentralising

As we know, the analytics team needs to have an acute understanding of the business and business unit they are working in. To be able to build models and derive insights, it's important that there is some context to the objectives of the business unit as well as the problem the analytics team is solving for. It's based on this premise, then, that many Heads of Analytics (and similar) believe that analytics has to be decentralised. Deploy a Head of Analytics into each business unit, allow them to work alongside the business owners and build insights with specific knowledge of the customer and the product. This structure makes perfect sense. Except when you take into account that there is a distinct lack of skills when it comes to people who can build advanced analytical models; and understand business; and have the ability to lead a team and engage with business.

Chief data officer job stakes claim in data innovation

We forget, but, before big data and analytics became the mainstays, shops would take all of their data out of transactional systems, build a data warehouse, do some data cleansing and run some reports and, maybe, if you were really, really good, that could become the golden copy of your data, which you could send back to your applications. That's what we called the closed loop. It was data warehouse nirvana. But the IT and application development groups would have their release cycles, and the data warehouse group would have its release cycles. Never the two would meet, and they didn't really care about each other. Now, the big data platform has really become the back end of some of the applications, especially for analytics like recommendation engines and applications that measure customers' propensity to buy.

5 steps to avoid overcommitting resources on your IT projects

Maureen Carlson, Partner, Appleseed Partners, says, "Not enough companies are connecting the dots about the impact of resource overcommitment and the ability to deliver on innovation to meet growth objectives. The research shows that companies are working on products or projects that are at risk of delayed delivery because there was not enough capacity to take them on in the first place. Mature organizations are in a position to evaluate capacity in real-time to make critical business tradeoffs and see continued investment in this area as a competitive differentiator." ... PMOs play a crucial role in assisting organizations with strategy and execution and as such must recognize the need for effective resource management and capacity planning.

Has open source become the default business model for enterprise software?

When it comes to building the business, open source and proprietary are the same -- but different. The biggest difference is starting points. The proprietary software company starts with an idea that is refined based on identifying customer pain points and classic gap analysis. With open source, the trigger is less formal, because at the outset, the primary risk is sweat equity. Somebody gets an idea, develops it in the wild, and in place of gap analysis, there's the sink-or-swim process of developer interest going viral. But, ultimately, both need to deliver some unique value-add, scale it, and go to market. There is the neatness, or lack thereof, of the open-source model. Witness the long tail of adoption of Android updates, or the ordered disorder of the Hadoop platform, where each commercial platform has different mixes and matches of open-source projects.

Quote for the day:

"To double your net worth, double your self-worth. Because you will never exceed the height of your self-image." -- Robin Sharma
