June 07, 2015

Video: Parallel Algorithms Reconsidered
In this video, Peter Sanders from Karlsruhe Institute of Technology presents "Parallel Algorithms Reconsidered." From the abstract: "Parallel algorithms were a subject of intensive algorithmic research in the 1980s. This research almost died out in the mid 1990s. In this paper we argue that it is high time to reconsider this subject since a lot of things have changed. First and foremost, parallel processing has moved from a niche application to something mandatory for any performance-critical computer application. We will also point out that even very fundamental results can still be obtained. We give examples and also formulate some open problems."


Privacy Risk Management for Federal Information Systems
This publication introduces a privacy risk management framework (PRMF) for anticipating and addressing privacy risk that results from the processing of personal information in federal information technology systems. In particular, this publication focuses on the development of two key pillars to support application of the PRMF: privacy engineering objectives and a privacy risk model. In so doing, it lays the foundation for the establishment of a common vocabulary to facilitate better understanding of, and communication about, privacy risks and the effective implementation of privacy principles in federal information systems. The set of privacy engineering objectives defined in this document provides a conceptual framework for engineers and system designers to bridge the gap between high-level principles and implementation.


Interview: Mike Lamble, CEO at Clarity Solution Group
“Better, cheaper, faster” is good. Schema-less writes, fitness for all data types, commodity hardware and open source software, limitless scalability – also good. That said, out-of-the-box Hadoop-based Data Lakes are not industrial strength. It’s not as simple as downloading the Hadoop software, installing it on a bunch of servers, loading the Data Lake, unplugging the enterprise Data Warehouse (EDW) and – voila. The reality is that the Data Lake architecture paradigm – a framework for an object-based storage repository that holds data in its native format until needed – oversimplifies the complexity of enabling actionable and sustainable enterprise Hadoop. An effective Hadoop implementation requires a balanced approach that addresses the same considerations conventional analytics programs have grappled with for years: establishing security and governance, controlling costs and supporting numerous use cases.


Google Create Kubernetes-based VM/Docker Image Building Framework
The Google Cloud Platform team have released a technical solution paper and open source reference implementation that describes in detail how to automate image builds via Google Compute Engine (GCE) using open source technology such as Jenkins, Packer, and Kubernetes. The reference implementation can be used as a template to continuously build images for GCE or Docker-based applications. Images are built in a central project, and then may be shared with other projects within an organisation. The Google Cloud Platform blog proposes that ultimately this automated image build process can be integrated as a step in an organisation's continuous integration (CI) pipeline.
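As a rough illustration of the kind of automation the paper describes, a minimal Packer template for baking a GCE image might look like the following. This is a sketch, not the reference implementation itself; the project ID, zone, source image, and package names are all placeholder assumptions:

```json
{
  "builders": [
    {
      "type": "googlecompute",
      "project_id": "central-image-build-project",
      "source_image": "debian-8-jessie-v20150710",
      "zone": "us-central1-a",
      "image_name": "app-image-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": ["sudo apt-get update", "sudo apt-get install -y nginx"]
    }
  ]
}
```

In the CI integration the blog proposes, a Jenkins job would invoke `packer build` on a template like this after each successful build, producing versioned images in the central project that other projects can then consume.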


Why “Agile” and especially Scrum are terrible
Under Agile, technical debt piles up and is not addressed because the business people calling the shots will not see a problem until it’s far too late or, at least, too expensive to fix it. Moreover, individual engineers are rewarded or punished solely based on the completion, or not, of the current two-week “sprint”, meaning that no one looks out five “sprints” ahead. Agile is just one mindless, near-sighted “sprint” after another: no progress, no improvement, just ticket after ticket. ... “Agile” and Scrum glorify emergency. That’s the first problem with them. They’re a reinvention of what the video game industry calls “crunch time”. It’s not sustainable. ... People will tolerate those changes if there’s a clear point ahead when they’ll get their autonomy back.


Big Data and the Future of Business
The point of Big Data is that we can do novel things. One of the most promising ways the data is being put to use is in an area called “machine learning.” It is a branch of artificial intelligence, which is a branch of computer science, but with a healthy dose of math. The idea, simply, is to throw a lot of data at a computer and have it identify patterns that humans wouldn’t see, or make decisions based on probabilities: something humans can do well one case at a time, but that machines couldn’t do until now, and that machines may someday do at a scale humans can never attain. It’s basically a way of getting a computer to do things not by explicitly teaching it what to do, but by having the machine figure things out for itself based on massive quantities of information.
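The "figure things out for itself" idea can be made concrete with a toy sketch. Nothing below comes from the article; it is a minimal nearest-centroid classifier in which no rule is ever written down and the decision boundary emerges entirely from labeled examples:

```python
# Toy illustration of learning from data rather than explicit rules:
# a nearest-centroid classifier. The "model" is just the mean value
# observed for each label; prediction picks the closest centroid.
# (Illustrative sketch only; real systems use far richer models.)

def fit_centroids(samples):
    """samples: list of (value, label) pairs. Returns {label: mean value}."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Assign the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hours of study -> pass/fail. The pass/fail boundary is never coded;
# it is derived from the examples themselves.
training = [(1, "fail"), (2, "fail"), (3, "fail"),
            (7, "pass"), (8, "pass"), (9, "pass")]
model = fit_centroids(training)
print(predict(model, 2.5))  # low values land near the "fail" centroid
print(predict(model, 8.5))  # high values land near the "pass" centroid
```

Swapping in different training data changes the behavior with no change to the code, which is the essential contrast with explicit programming that the excerpt describes.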


Datameer adds governance tools for Hadoop analytics
Data silos are one potential consequence, as are regulatory-compliance risks when sensitive data sets are being used. Datameer’s new governance module is designed to give businesses transparency into their data pipelines while providing IT with tools to audit diligently for compliance with internal and external regulations. New data-profiling tools, for example, let companies find and transparently fix issues like dirty, inconsistent or invalid data at any stage in a complex analytics pipeline. Datameer’s capabilities include data profiling, data statistics monitoring, metadata management and impact analysis. Datameer also supports secure data views and multi-stage analytics pipelines, and it provides LDAP/Active Directory integration, role-based access control, permissions and sharing, integration with Apache Sentry 1.4, and column and row anonymization functions.
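To make "data profiling" concrete, here is a minimal sketch of the idea. This is a generic illustration, not Datameer's API: it scans records against an expected schema and counts missing and type-invalid values per field, so dirty data can be flagged before it propagates through a pipeline:

```python
# Minimal data-profiling sketch (illustrative only, not Datameer's API):
# count missing and type-invalid values per field so dirty or
# inconsistent data can be surfaced at any pipeline stage.

def profile(records, schema):
    """records: list of dicts; schema: {field: expected_type}.
    Returns {field: {"missing": n, "invalid": n, "valid": n}}."""
    report = {field: {"missing": 0, "invalid": 0, "valid": 0}
              for field in schema}
    for record in records:
        for field, expected in schema.items():
            value = record.get(field)
            if value is None or value == "":
                report[field]["missing"] += 1
            elif not isinstance(value, expected):
                report[field]["invalid"] += 1
            else:
                report[field]["valid"] += 1
    return report

rows = [{"id": 1, "amount": 9.99},
        {"id": 2, "amount": "n/a"},   # wrong type -> counted as invalid
        {"id": 3}]                    # amount absent -> counted as missing
print(profile(rows, {"id": int, "amount": float}))
```

A real governance module layers much more on top (lineage, impact analysis, access control), but the per-field report above is the kernel of what "find dirty, inconsistent or invalid data" means in practice.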


How UPS uses analytics to drive down costs
Putting it in perspective, the advanced math around determining an order of delivery is incredible. If you had a 120-stop route and you plotted out how many different ways there are to deliver that 120-stop route, it would be a 199-digit number. It’s so large mathematicians call it a finite number that is unimaginably large. It’s in essence infinite. So our mathematicians had to come up with a method of how to come up with an order of delivery that takes into account UPS business rules, maps, what time we need to be at certain places and customer preferences. It had to be an order of delivery that a driver could actually follow to not only meet all the business needs, but with fewer miles than they’re driving today. And this is on top of the 85 million miles we’ve already reduced.
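The arithmetic in the interview checks out: the number of possible delivery orders for a 120-stop route is 120 factorial (every permutation of the stops), and a quick check confirms it is a 199-digit number:

```python
import math

# The number of possible delivery orders for 120 stops is the number
# of permutations of 120 items, i.e. 120 factorial.
orderings = math.factorial(120)
print(len(str(orderings)))  # -> 199, matching the "199-digit number" claim
```

This is why exhaustive search is hopeless and UPS's mathematicians instead needed heuristics that fold in business rules, maps, time windows, and customer preferences.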


CTO interview: Customer data analytics driving revenue growth at British Medical Journal
The analytics plans have involved investing in a number of tools. Among others, these include Google Analytics and AppDynamics, which are used to monitor user behaviour, as well as a back-office monitoring tool, Cooper said. “We are using that a lot for performance, and to be able to look at not just what an application is doing, but how we are using that to see what people are doing in the application,” she said. ... “Right now we are not in such a mess, but what we have got is so fragmented, and we are just trying to work out what it is we need to track, what is the important data, what do we need to measure, because we have a lot of very industry-specific data models that come with being an academic publisher.”


Safe Big Data
Data privacy has historically concentrated on protecting the systems that manage data rather than the data itself. Since these systems have proven to be vulnerable, a new approach that encapsulates data in cloud-based environments is necessary. New algorithms must also be created to provide better key management and secure key exchanges. Data management concerns itself with secure data storage, secure transaction logs, granular audits and data provenance. This aspect must be concerned with validating and determining the trustworthiness of data. Fine-grained access controls along with end-to-end data protection can be used to verify data and make data management more secure.
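The contrast between protecting the system and protecting the data can be sketched with fine-grained (field-level) access control. This is a generic illustration under assumed roles and field names, not any specific product's mechanism: each field carries its own policy, so even a caller inside the system sees only what their role permits:

```python
# Sketch of fine-grained (field-level) access control: instead of
# trusting the perimeter of the system, each field has its own policy
# and records are filtered per caller role. Illustrative only; the
# roles and fields below are invented for the example.

POLICY = {
    "name":    {"analyst", "auditor"},
    "ssn":     {"auditor"},            # sensitive: auditors only
    "balance": {"analyst", "auditor"},
}

def redact(record, role):
    """Return only the fields the given role is allowed to read."""
    return {field: value for field, value in record.items()
            if role in POLICY.get(field, set())}

row = {"name": "Ada", "ssn": "123-45-6789", "balance": 42.0}
print(redact(row, "analyst"))  # the ssn field is withheld from analysts
```

In a production setting the same idea is enforced cryptographically or by the storage layer (as in the column and row anonymization mentioned for governance tooling), but the per-field policy check is the core of it.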



Quote for the day:

"When you have exhausted all possibilities, remember this: You haven't." -- Thomas Edison