Daily Tech Digest - August 11, 2018

Most scientists would probably agree that prediction and understanding are not the same thing. The reason lies in the origin myth of physics—and arguably, that of modern science as a whole. For more than a millennium, the story goes, people used methods handed down by the Greco-Roman mathematician Ptolemy to predict how the planets moved across the sky. Ptolemy didn’t know anything about the theory of gravity or even that the sun was at the centre of the solar system. His methods involved arcane computations using circles within circles within circles. While they predicted planetary motion rather well, there was no understanding of why these methods worked, and why planets ought to follow such complicated rules. Then came Copernicus, Galileo, Kepler and Newton. Newton discovered the fundamental differential equations that govern the motion of every planet. The same differential equations could be used to describe every planet in the solar system. This was clearly good, because now we understood why planets move.


3 Trends in Organization Design Presenting Opportunities for Leaders

Today, nearly every business has digitized to some extent. Some companies—for example, Uber and Amazon—have used digital solutions to create business models that would have been unimaginable in the 1980s. While not every business needs to be digitized to the same extent as Uber, nearly every business can benefit from exploring the use of artificial intelligence, data and analytics, and other technology to improve capabilities and results not just incrementally but exponentially. Capitalizing on these potentials, however, does require strong leadership and a willingness to change and adapt. You can’t just plug a new technology into an old framework without affecting other aspects of the organization, such as how work is done, how the structure is designed, how metrics are used to drive performance, what skills and talent are needed, and how culture will reinforce strategy. ... Agile is another organization design trend that has its roots in the digital world. It is a way of working that enables a company to respond more quickly to changes in the marketplace, and it can result in a more nimble, resilient organization.


Are You Spending Way Too Much on Software?

Companies are allowing their data to get too complex by independently acquiring or building applications. Each of these applications has thousands to hundreds of thousands of distinctions built into it. For example, every table, column, and other element is another distinction that somebody writing code or somebody looking at screens or reading reports has to know. In a big company, this can add up to millions of distinctions. But in every company I’ve ever studied, there are only a few hundred key concepts and relationships that the entire business runs on. Once you understand that, you realize all of these millions of distinctions are just slight variations of those few hundred important things. In fact, you discover that many of the slight variations aren’t variations at all. They’re really the same things with different names, different structures, or different labels. So it’s desirable to describe those few hundred concepts and relationships in the form of a declarative model that small amounts of code refer to again and again.
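To make the idea of a declarative model concrete, here is a minimal, hypothetical sketch in Python: a handful of shared concept definitions that application code looks up instead of re-creating its own tables, columns, and labels. The concept names, fields, and synonyms are illustrative assumptions, not anything taken from the article.

```python
# A minimal, hypothetical declarative model: a few shared business concepts
# that code refers to repeatedly instead of re-defining its own variations.
CONCEPTS = {
    "customer": {
        "fields": ["customer_id", "name", "email"],
        "synonyms": ["client", "account_holder", "buyer"],
    },
    "order": {
        "fields": ["order_id", "customer_id", "total", "placed_at"],
        "synonyms": ["purchase", "sales_order"],
    },
}

def canonical_concept(name: str) -> str:
    """Map a local variation ("client", "purchase", ...) back to the shared concept."""
    lowered = name.lower()
    for concept, spec in CONCEPTS.items():
        if lowered == concept or lowered in spec["synonyms"]:
            return concept
    raise KeyError(f"no canonical concept for {name!r}")

print(canonical_concept("Client"))    # customer
print(canonical_concept("purchase"))  # order
```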


How do data companies get our data?

Research has shown that more than three in four Android apps contain at least one third-party tracker. Third-party app analytics companies play a crucial role for advertisers and app developers. Though some are used to better understand how users use apps, the vast majority are used for targeted advertising, behavioural analytics, and location tracking. The problem is that there is no real way to opt out of such third-party tracking. In addition to third-party trackers embedded in apps, apps themselves frequently access users’ entire address books, location data, photos and more, sometimes even if you have explicitly turned off access to such data. ... Another major source of data for data companies is surveys – this was at the heart of the 2018 Cambridge Analytica scandal. This includes things such as personality quizzes, online games and tests, and more. When a company asks you to rate a product, your opinion may benefit many other companies. The data company Epsilon, for instance, has created a database called Shopper’s Voice boasting “unique insights you won’t find anywhere else, directly from tens of millions of consumers.”


Banking Giant ING Is Quietly Becoming a Serious Blockchain Innovator

ING is out to prove that startups aren't the only ones that can advance blockchain cryptography. Rather than waiting on the sidelines for innovation to arrive, the Netherlands-based bank is diving headlong into a problem that, it turns out, worries financial institutions as much as average cryptocurrency users. In fact, the bank first made a splash in November of last year by modifying an area of cryptography known as zero-knowledge proofs. Simply put, the code allows someone to prove that they have knowledge of a secret without revealing the secret itself. On their own, zero-knowledge proofs were a promising tool for financial institutions that were intrigued by the benefits of shared ledgers but wary of revealing too much data to their competitors. The technique, previously applied in the cryptocurrency world by zcash, offered banks a way to transfer assets on these networks without tipping their hands or compromising client confidentiality. But ING has come up with a modified version called "zero-knowledge range proofs," which can prove that a number is within a certain range without revealing exactly what that number is.


What is data wrangling and how can you leverage it for your business?

Regardless of how unexciting the process of data wrangling might be, it’s still critical because it makes your data useful. Properly wrangled data can provide value through analysis or be fed into a collaboration and workflow tool to drive downstream action once it’s been conformed to the target form. Conformance, or transforming disparate data elements into the same format, also addresses the problem of siloed data. Siloed data assets cannot “talk” to each other without translating data elements between the different formats, which is often time- or cost-prohibitive. Another benefit of data wrangling is that it can be organized into a standardized and repeatable process that moves and transforms data sources into a common format, which can be reused multiple times. Once your data has been conformed to a standard format, you’re in a position to do some very valuable cross-data-set analytics. Conformance is even more valuable when multiple data sources are wrangled into the same format.
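As a rough illustration of conformance, the sketch below takes two hypothetical feeds that describe the same customers in different shapes and wrangles both into one standard format; the field names and date formats are assumptions invented for the example.

```python
from datetime import datetime

# Two hypothetical source feeds describing the same customers in different shapes.
crm_records = [{"Customer": "Acme Ltd ", "SignupDate": "03/15/2018"}]
billing_records = [{"customer_name": "acme ltd", "signup": "2018-03-15"}]

def conform_crm(rec):
    """Wrangle a CRM record into the common target format."""
    return {
        "customer": rec["Customer"].strip().lower(),
        "signup_date": datetime.strptime(rec["SignupDate"], "%m/%d/%Y").date(),
    }

def conform_billing(rec):
    """Wrangle a billing record into the same target format."""
    return {
        "customer": rec["customer_name"].strip().lower(),
        "signup_date": datetime.strptime(rec["signup"], "%Y-%m-%d").date(),
    }

# Once both feeds share one schema, cross-data-set analytics becomes straightforward.
conformed = [conform_crm(r) for r in crm_records] + [conform_billing(r) for r in billing_records]
print(conformed)
```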


Digital transformation and the law of small numbers

Across industries, there is more downbeat news on digital transformation. A recent study by consulting firm Capgemini and the MIT Center for Digital Business concludes that organizations are struggling to convert their digital investments into business successes. The reasons are illuminating and many: lack of digital leadership skills, and a lack of alignment between IT and business, to name a couple. The study goes on to suggest that companies have underestimated the challenge of digital transformation and that organizations have done a poor job of engaging employees across the enterprise in the digital transformation journey. These findings may sound surprising to technology vendors, all of whom have gone “digital” in anticipation of big rewards from the digital bonanza (at least one global consulting firm has gone so far as to tie senior executive compensation to “digital” revenues). Anecdotally, “digital” revenues are still under 30 percent of total revenues for most technology firms, which further corroborates the findings of market studies on the state of digital transformation.


Containers Are Eating the World

The container delivery workflow is fundamentally different. Dev and ops collaborate to create a single container image, composed of different layers. These layers start with the OS, then add dependencies (each in its own layer), and finally the application artifacts. More important, container images are treated by the software delivery process as immutable images: any change to the underlying software requires a rebuild of the entire container image. Container technology, and Docker images, have made this far more practical than earlier approaches such as VM image construction by using union file systems to compose a base OS image with the application and its dependencies; changes to each layer only require rebuilding that layer. This makes each container image rebuild far cheaper than recreating a full VM image. In addition, well-architected containers only run one foreground process, which dovetails well with the practice of decomposing an application into well-factored pieces, often referred to as microservices.
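A minimal, hypothetical Dockerfile makes the layering concrete: the base image, the dependency layers, and the application artifacts are separate instructions, so a typical code change only invalidates the final layers rather than the whole image. The file names and base image are assumptions, not taken from the article.

```dockerfile
# Hypothetical image: each instruction below produces its own immutable layer.

# OS and language runtime: the base layer.
FROM python:3.6-slim

# Dependency manifest and dependency install: cached layers, rebuilt only
# when requirements.txt itself changes.
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt

# Application artifacts: typically the only layer rebuilt on a code change.
COPY . /app

# One foreground process per container, as described above.
CMD ["python", "/app/main.py"]
```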



How to build a layered approach to security in microservices


Microservices that need addresses across multiple applications make address-based security more complicated. For a different approach, you can group applications that share microservices into a common cluster, based on a common private IP address. Through this approach, all the components within the cluster are capable of addressing each other, but you will still need to expose them for communications outside that private network. If a microservice is broadly used across many applications, you should host it in its own cluster, and its address should be exposed to the enterprise virtual private network or the internet, depending on its scope. Network-based security reduces the chances of an intruder accessing a microservice API, but it won't protect against intrusions launched from within the private network. A Trojan or other hacked application could still gain access at the network level, so you may need to add another level of security in microservices. This is the access control level. Access control relies on the microservice recognizing that a request is from an authentic source.
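As a sketch of that access-control level, the hypothetical service below rejects any request that does not present the expected credential before it reaches a route handler. It assumes Flask and a simple bearer token purely for illustration; a production service would verify signed tokens (for example JWTs) issued by an identity provider.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical shared secret; in practice requests would carry signed tokens
# issued by an identity provider, and the secret would come from a secrets manager.
EXPECTED_TOKEN = "replace-me"

@app.before_request
def check_caller_identity():
    # Network controls limit who can reach the service; this check verifies
    # that the request actually comes from an authentic source.
    if request.headers.get("Authorization") != f"Bearer {EXPECTED_TOKEN}":
        abort(401)

@app.route("/inventory/<item_id>")
def get_inventory(item_id):
    return jsonify({"item": item_id, "available": 42})

if __name__ == "__main__":
    app.run(port=8080)
```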


WhiteSource Launches Free Open Source Vulnerability Checking

After completing a scan of the user's requested libraries, the Vulnerability Checker shows all vulnerabilities detected in the software and the path, indicating which library includes which vulnerability. We also show the CVSS 3.0 score, provide links to references and even supply the suggested fix per the open source community. In the WhiteSource full platform we further provide information regarding whether you are actually making calls to the vulnerable functionality and a full trace analysis to provide insights for faster remediation of all known vulnerabilities (not just the top fifty from the previous month). WhiteSource automates the entire process of open source components management from the selection process, through the approval process and finding and fixing vulnerabilities in real time. It is a SaaS offering priced annually per contributing developer, meaning the number of developers working on the relevant applications. We offer our full platform services free of charge for open source projects.



Quote for the day:

"Your excuses are nothing more than the lies your fears have sold you." -- Robin Sharma

Daily Tech Digest - August 10, 2018


Headline breakthroughs in AI have come fast and furious in recent years, fuelled by the rapid maturing of techniques using deep learning, the success of GPUs at accelerating these compute-hungry tasks, and the availability of open-source libraries like TensorFlow, Caffe, Theano and PyTorch. This has accelerated innovation and experimentation, leading to impressive new products and services from large tech vendors like Google, Facebook, Apple, Microsoft, Uber and Tesla. However, I predict that these emerging AI technologies will be very slow to penetrate other industries. A handful of massive consumer tech companies already have the infrastructure in place to make use of the mountains of data they have access to, but the fact is that most other organisations don’t – and won’t for a while yet. There are two core hurdles to widespread adoption of AI: engineering big data management, and engineering AI pipelines. ... AI engineering competency is the next hurdle – and it’s likely to be many years yet before it becomes widespread across industries beyond the tech giants.


Enterprises should be able to sell their excess internet capacity

The idea is that those with excess data capacity, such as a well-provisioned office or data center that may not be using all of its throughput capacity all of the time — such as during the weekend — allocate that spare bandwidth to Dove’s network. Passing-by data users, such as Internet of Things-based sensors or an individual going about business, would then grab the data they need; payment is then handled seamlessly through blockchain smart contracts. “The Dove application will find the closest Dove-powered hotspot or peer node, negotiate the package deal, and connect automatically,” the company says in a white paper. Dove Network says it intends to supply a 500-yard-plus-range, blockchain-based wireless router to vendors. It’s also talking about longer-range access points in the future. Both solutions will allow relatively few organizations to sign up, yet still blanket urban areas with hotspots, it says. Dove Network further says on its website that it believes internet infrastructure is broken. It reckons half of the world is not connected to the internet, yet 35 percent of paid-for data is never used.


Can SNMP (Still) Be Used to Detect DDoS Attacks?


Polling from the cloud every five seconds might not be the way one wants to build attack detection. And even if one does, it is limited to detecting attacks whose smallest bursts last at least 10 seconds. What to do when the burst is six seconds, or less? The SNMP polling method simply does not scale for the detection of burst attacks, and we need to move away from pulling analytics to real-time, event-based methods. On-box RMON rules with threshold detection, generating SNMP traps, provide one alternative without introducing new technologies or protocols. However, what is possible in terms of detections and triggers for SNMP traps will depend on the capabilities of your device. That said, most network equipment manufacturers provide performance management and streaming analytics that by far exceed the possibilities of SNMP. Now would be a good time to look at those alternatives and implement an on- or off-box automation for attack detection and trigger traffic redirection through API calls to the cloud service.
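The toy calculation below, with made-up traffic numbers, shows why fixed-interval polling struggles with short bursts: averaged over a five-second window, a six-second burst can stay under the alert threshold, while a per-second, on-box check sees it immediately.

```python
# Toy numbers: per-second traffic (Mbps) with a six-second burst at 600 Mbps
# on top of a 100 Mbps baseline.
traffic = [100] * 7 + [600] * 6 + [100] * 17   # 30 seconds of samples

POLL_INTERVAL = 5   # seconds between cloud-side polls
THRESHOLD = 500     # Mbps alert threshold

# Averaged over each five-second polling window, the burst is diluted below
# the threshold and never raises an alert.
for start in range(0, len(traffic), POLL_INTERVAL):
    window = traffic[start:start + POLL_INTERVAL]
    avg = sum(window) / len(window)
    print(f"t={start:2d}s  window average={avg:5.1f} Mbps  alert={avg > THRESHOLD}")

# A per-second, on-box threshold check (the RMON-style alternative) catches it.
print("per-second alerts at t =", [t for t, mbps in enumerate(traffic) if mbps > THRESHOLD])
```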


Hairy artificial skin gives robots a sense of touch

The smart skin includes nanowire sensors made from zinc oxide (ZnO). They are much thinner than human hair (0.2 microns, while hair is around 40 microns), and when they brush against something, they can sense temperature changes and surface variations. These nanowires are covered in a protective coating that makes them resistant to chemicals, extreme temperatures, moisture, and shock, so they can be used in harsh environments. The nanowires and protective coating are bundled together into one sheet of pressure sensing "skin" that can be draped over a robot, so existing robots such as a fleet of industrial arms at a manufacturing plant could be retrofitted with a new sense of touch. While the image of hairy robots is endearing, the skin actually just looks like a sheet of plastic with patches of sensors. The "hairs" are so small that you can't feel them, and they can only be seen under a microscope. The researchers describe their smart skin in a paper published in IEEE Sensors Journal in 2015, and they have now received a patent for their technology. We asked the lead researcher Zeynep Çelik-Butler how this stands out from other smart skin technologies.


Data veracity challenge puts spotlight on trust

This data veracity challenge is one that most businesses have yet to come to grips with. In our Technology Vision for Oracle 2018, 79 percent of the business executives we spoke with agreed that organizations are basing their most critical systems and strategies on data – yet many have not invested in the capabilities to verify the truth within it. If we’re to harness data for the full benefit of businesses and society, then this challenge needs to be addressed head on. In the past year Oracle unveiled its Autonomous Database, which further maintains data purity by – as the name implies – offering total automation and thereby vastly reducing human error. Steps like these are critical, as data services and websites rely on DaaS to properly analyze their data and provide holistic views of customers. To address the data veracity challenge, businesses should focus on three tenets to build confidence: 1) provenance, or verifying the history of data from its origin throughout its life cycle; 2) context, or considering the circumstances around its use; and 3) integrity, or securing and maintaining data.


Numerous OpenEMR Security Flaws Found; Most Patched

The OpenEMR community "is very thankful to Project Insecurity for their report, which led to an improvement in OpenEMR's security," Brady Miller, OpenEMR project administrator, tells ISMG. "Responsible security vulnerability reporting is an invaluable asset for OpenEMR and all open source projects. The OpenEMR community takes security seriously and considered this vulnerability report a high priority since one of the reported vulnerabilities did not require authentication," Miller says. "A patch was promptly released and announced to the community. Additionally, all downstream packages and cloud offerings were patched." So, what's been fixed? "The key vulnerability in this report is the patient portal authentication bypass, which essentially allows a bad actor to bypass authentication and gain access to OpenEMR - if the patient portal is turned on," Miller says. "All the other vulnerabilities require authentication." The patient portal authentication bypass, multiple instances of SQL injection, unrestricted file upload, remote code execution and arbitrary file actions vulnerabilities "were all fixed," he says.
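For readers unfamiliar with the SQL injection class mentioned here, the generic sketch below (plain Python and SQLite, not OpenEMR's actual PHP code) contrasts a query built by string concatenation with a parameterized one; the table and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

user_input = "1 OR 1=1"   # attacker-controlled value

# Vulnerable pattern: the input is spliced into the SQL text, so it changes the query.
leaked = conn.execute(f"SELECT * FROM patients WHERE id = {user_input}").fetchall()
print("string-built query returned", len(leaked), "rows")   # returns every row

# Parameterized pattern: the driver binds the input as a plain value.
safe = conn.execute("SELECT * FROM patients WHERE id = ?", (user_input,)).fetchall()
print("parameterized query returned", len(safe), "rows")    # returns none
```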


What can the enterprise learn from the connected home?


The main driver for enterprise IoT is that the large volumes of data created by connected devices present a huge opportunity. By leveraging the power of analytics – either on a small scale or across large deployments – businesses can gain additional layers of insight into their operations and make improvements. This is exactly what the smart home enables. By using connected products to track energy usage, for example, consumers can learn where they are spending the most money and become more cost-efficient. However, from an enterprise perspective, the challenge comes in being able to efficiently manage and control hundreds or potentially thousands of smart devices. Simply keeping track of the vast swathes of data being generated from devices in a range of different locations and from an assortment of vendors is already a serious issue and is likely to be the biggest IoT challenge IT departments will face in the future. What they don’t want is to have several platforms pulling in different data streams. Not only would this be hugely confusing to manage, the lack of coordination would create a fragmented picture of what is going on across the business.


How API-based integration dissolves SaaS connectivity limits


API integration supports multichannel experiences that improve customer engagement. One example is how integration helps businesses partner with other service providers to offer new capabilities, such as an API model that makes Uber services available on a United Airlines application. APIs also spur revenue growth. For instance, a business's IP [intellectual property] that lies behind firewalls can be exposed as an API to create new revenue channels. Many new-age companies, such as Airbnb and Lyft, leverage the API model to deliver revenue. Traditional companies [in] manufacturing and other [industries] are really applying this to their domain. API-first design provides modernized back-end interfaces that speed integrations. Doing back-end integrations? You can run the APIs within the data center to integrate SaaS and on-premises applications. A good API, a well-designed API, can actually reduce the cost of integration by 50%.


Serverless Still Requires Infrastructure Management


Even though the servers are gone from the serverless picture, this doesn’t mean you can forget about infrastructure configuration altogether. Rather than configuring compute instances and many network related resources, which was commonplace for the traditional IaaS stack, we now need to configure functions, storage buckets and/or tables, APIs, messaging queues/topics and many additional resources to keep everything secured and monitored. When it comes to infrastructure management, serverless architectures usually require more resources to be managed due to the fine-grained nature of serverless stacks. At the same time, without servers in sight, infrastructure configuration can be done as a single stage activity, in contrast with the need to manage IaaS infrastructure separately from the software artifacts running on different kinds of servers. Even with this somewhat simplified way of managing infrastructure resources, one still needs to use specialised tools for defining and applying infrastructure stack configurations. Cloud platform providers offer their proprietary solutions in this area.


5 ways machine learning makes life harder for cybersecurity pros

Machine learning is a form of AI that interprets massive amounts of data, applying algorithms to the material and making predictions from its observations. Common technologies that employ machine learning include facial recognition, speech recognition, translation services, and object recognition. Businesses typically use machine learning for locating and processing large data sets that no human could sort through in a timely manner, if at all. Major companies like Amazon, IBM, Google, and Microsoft use machine learning to improve business functionality. But some organizations are implementing machine learning for a narrower purpose: cybersecurity. While many assume machine learning makes cybersecurity professionals' lives much easier by better tracking security issues, that's not necessarily the case. Just like any new technology, machine learning still has its flaws—problems that turn the tech into more of a headache than a helping hand in the security space.



Quote for the day:


"Making those around you feel invisible is the opposite of leadership." -- Margaret Heffernan


Daily Tech Digest - August 09, 2018

Where low-code development works — and where it doesn’t
In any organization, you will find two kinds of processes: those that are structured and those that are more open-ended. Structured processes, which are typically followed rigorously, account for roughly two-thirds of all operations at an organization. These are generally the “life support” functions of any company or large group—things like leave management, attendance, and procurement. ... To avoid chaos, this workflow should remain consistent from week to week, and even quarter to quarter. Given the clear structure and obvious objectives, these processes can be handled nicely by a low-code solution. But open-ended processes are not so easy to define, and the goals aren’t always as clear. Imagine hosting a one-time event. You may know a little about what the end result should look like, but you can’t predefine the planning process because you don’t orchestrate these events all the time. These undefined processes, like setting an agenda for an offsite meeting, tend to be much more collaborative, and they often evolve organically as inputs from multiple stakeholders shape the space.



Adopt these continuous delivery principles organization-wide


Upper management should advocate for continuous delivery principles and enforce best practices. Once an organization has set up strong CD pipelines and reaps the benefits, resist any efforts to succumb to older, less automated deployment models just because of team conflicts or a lack of oversight. If a group must work closely together but cannot agree on continuous delivery practices, it's critical that upper management understands CD and its importance to software delivery, pushing the continuous agenda forward and encouraging adoption. Regulation is rarely considered a driver of innovation, so before your team adopts continuous delivery practices, understand any regulatory requirements the organization is under. No one wants to put together a CI/CD pipeline then have the legal department shut it down. An auditor needs to be informed about and understand, for example, the automated testing procedure in a continuous delivery pipeline. And the simple fact that it's repeatable does not mean a process adheres to the regulatory rules.


Incomplete visibility a top security failing


While many security teams implement good basic protections around administrative privileges, the report said these low-hanging-fruit controls should be in place at more organisations, with 31% of organisations still not requiring default passwords to be changed, and 41% still not using multifactor authentication for accessing administrative accounts. Organisations can start to build up cyber hygiene by following established best practices such as the Critical Security Controls, a prioritised set of steps maintained by the CIS. Although there are 20 controls, the report said implementing just the top six establishes what CIS calls “cyber hygiene.” “Industry standards are one way to leverage the broader community, which is important with the resource constraints that most organisations experience,” said Tim Erlin, vice-president of product management and strategy at Tripwire. “It’s surprising that so many respondents aren’t using established frameworks to provide a baseline for measuring their security posture. It’s vital to get a clear picture of where you are so you can plan a path forward.”


Political Play: Indicting Other Nations' Hackers

While it's impossible to gain a complete view of these operations, FireEye suggested that they were being run much more carefully. For example, one ongoing campaign appeared to target U.S. engineering and maritime targets, and especially those connected to South China Sea issues. "From what we observed, Chinese state actors can gain access to most firms when they need to," Bryce Boland, CTO for Asia-Pacific at FireEye, told South China Morning Post in April. "It's a matter of when they choose to and also whether or not they steal the information that is within the agreement." Now, of course, the U.S. appears to be trying to bring diplomatic pressure to bear on Russia as U.S. intelligence leaders warn that Moscow's election-interference campaigns have not diminished at all since 2016. "We have been clear in our assessments of Russian meddling in the 2016 election and their ongoing, pervasive efforts to undermine our democracy," Director of National Intelligence Dan Coats said last month


RESTful Architecture 101


When deployed correctly, it provides uniform interoperability between different applications on the internet. The term stateless is a crucial piece of this, as it allows applications to communicate agnostically. A RESTful API service is exposed through a Uniform Resource Locator (URL). This logical name separates the identity of the resource from what is accepted or returned. The URL scheme is defined in RFC 1738. A resource exposed through a RESTful URL must have the capability of being created, requested, updated, or deleted. This sequence of actions is commonly referred to as CRUD. To request and retrieve the resource, a client would issue a Hypertext Transfer Protocol (HTTP) GET request. This is the most common request and is executed every time you type a URL into a browser and hit return, select a bookmark, or click through an anchor reference link. ... An important aspect of a RESTful request is that each request contains enough state to answer the request. This allows for visibility and statelessness on the server, desirable properties for scaling systems up and for identifying what requests are being made.
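A small, hypothetical client-side sketch shows how the CRUD actions map onto HTTP verbs against a RESTful URL; the endpoint, payloads, and response shape are assumptions made for illustration only.

```python
import requests

BASE = "https://api.example.com/notes"   # hypothetical RESTful resource collection

# Create: POST a new representation to the collection.
created = requests.post(BASE, json={"title": "hello", "body": "world"})
note_url = f"{BASE}/{created.json()['id']}"   # assumes the API returns the new id

# Read: GET is the request a browser issues for any URL you type or bookmark.
print(requests.get(note_url).json())

# Update: PUT replaces the stored representation with the one supplied.
requests.put(note_url, json={"title": "hello", "body": "updated"})

# Delete: remove the resource; a later GET should return 404.
requests.delete(note_url)
```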


Oracle's Database Service Offerings Could Be Its Last Best Hope For Cloud Success

All that said, if Oracle could adjust, it has the advantage of having a foothold inside the enterprise. It also claims a painless transition from on-prem Oracle database to its database cloud service, which could be attractive to a company considering a move to the cloud. There is also the autonomous aspect of its cloud database offerings, which promise to be self-tuning and self-healing, with automated maintenance and updates and very little downtime. Carl Olofson, an analyst with IDC who covers the database market, sees Oracle’s database service offerings as critical to its cloud aspirations, but expects business could move slowly here. “Certainly, this development (Oracle’s database offerings) looms large for those whose core systems run on Oracle Database, but there are other factors to consider, including any planned or active investment in SaaS on other cloud platforms, the overall future database strategy, the complexity of moving operations from the datacenter to the cloud


Enterprise IT struggles with DevOps for mainframe


"At companies with core back-end mainframe systems, there are monolithic apps -- sometimes 30 to 40 years old -- operated with tribal knowledge," said Ramesh Ganapathy, assistant vice president of DevOps for Mphasis, a consulting firm in New York whose clients include large banks. "Distributed systems, where new developers work in an Agile manner, consume data from the mainframe. And, ultimately, these companies aren't able to reduce their time to market with new applications." Velocity, flexibility and ephemeral apps have become the norm in distributed systems, while mainframe environments remain their polar opposite: stalwart platforms with unmatched reliability, but not designed for rapid change. The obvious answer would be a migration off the mainframe, but it's not quite so simple. "It depends on the client appetite for risk, and affordability also matters," Ganapathy said. "Not all apps can be modernized -- at least, not quickly; any legacy mainframe modernization will go on for years."


Mitigating Cascading Failure at Lyft


Cascading failure is one of the primary causes of unavailability in high throughput distributed systems. Over the past four years, Lyft has transitioned from a monolithic architecture to hundreds of microservices. As the number of microservices grew, so did the number of outages due to cascading failure or accidental internal denial of service. Today, these failure scenarios are largely a solved problem within the Lyft infrastructure. Every service deployed at Lyft gets throughput and concurrency protection automatically. With some targeted configuration changes to our most critical services, there has been a 95% reduction in load-based incidents that impact the user experience. Before we examine specific failure scenarios and the corresponding protection mechanisms, let's first understand how network defense is deployed at Lyft. Envoy is a proxy that originated at Lyft and was later open-sourced and donated to the Cloud Native Computing Foundation. What separates Envoy from many other load balancing solutions is that it was designed to be deployed in a "mesh" configuration.
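The sketch below illustrates the general idea of concurrency protection with a made-up limiter: requests beyond a fixed in-flight limit are rejected immediately instead of queueing, so an overloaded service fails fast rather than dragging its callers down. This is only a toy illustration of the concept, not Lyft's or Envoy's implementation.

```python
import threading
import time

class ConcurrencyLimiter:
    """Toy limiter: reject work beyond a fixed in-flight limit instead of queueing it."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, request_fn):
        # Fail fast when saturated so an overloaded service cannot drag its
        # callers (and their callers) down with it.
        if not self._slots.acquire(blocking=False):
            return "503 load shed"
        try:
            return request_fn()
        finally:
            self._slots.release()

limiter = ConcurrencyLimiter(max_in_flight=2)

def slow_request():
    time.sleep(0.5)          # simulate a request that holds a slot for a while
    return "200 OK"

results = []
workers = [threading.Thread(target=lambda: results.append(limiter.handle(slow_request)))
           for _ in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(results)   # expect two "200 OK" and three "503 load shed"
```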


Beyond GDPR: ePrivacy could have an even greater impact on mobile


Metadata can be used in privacy-protective ways to develop innovative services that deliver new societal benefits, such as public transport improvements and traffic congestion management. In many cases, pseudonymisation can be applied to metadata to protect the privacy rights of individuals, while also delivering societal benefits. Pseudonymisation of data means replacing any identifying characteristics of data with a pseudonym, or, in other words, a value which does not allow the data subject to be directly identified. The processing of pseudonymised metadata can enable a wide range of smart city applications. For example, during a snow storm, city governments can work with mobile networks to notify connected car owners to remove their cars from a snowplough path. Using pseudonyms, the mobile network can notify owners to move their cars from a street identified by the city, without the city ever knowing the car owners’ identities.


Should we add bugs to software to put off attackers?

The effectiveness of the scheme also hinges on making the bugs non-exploitable but realistic (indistinguishable from “real” ones). For the moment, the researchers have chosen to concentrate their research on the first requirement. The researchers have developed two strategies for ensuring non-exploitability and used them to automatically add thousands of non-exploitable stack- and heap-based overflow bugs to real-world software such as nginx, libFLAC and file. “We show that the functionality of the software is not harmed and demonstrate that our bugs look exploitable to current triage tools,” they noted. Checking whether a bug can be exploited and actually writing a working exploit for it is a time-consuming process and currently can’t be automated effectively. Making attackers waste time on non-exploitable bugs should frustrate them and, hopefully, in time, serve as a deterrent. The researchers are the first to point out the limitations of this approach: the aforementioned need for the software to be “ok” with crashing, the fact that they still have to find a way to make these bugs indistinguishable from those occurring “naturally”



Quote for the day:


"Coaching is unlocking a person's potential to maximize their own performance. It is helping them to learn rather than teaching them." -- John Whitmore


Daily Tech Digest - August 08, 2018

Knowing how the figures work will end up meaning no more than that you will see the writing on the wall more quickly than the average business owner might. But that will mean little if you can’t solve the problem. So how does the huge technological transformation that we are now going through affect the task of running a practice successfully? The obvious answer is, in many and various ways. The app and smartphone combination has completely transformed both the way a firm can get information from clients and the speed with which it can gather that information. Throw in instant messaging and a user base that is increasingly filling up with people who can use both thumbs to tap out replies on their favourite phone – and who do this day in and day out, regardless – and, yes, we are definitely in a different world by comparison with, say, a decade ago. As a firm, if you’re not already taking advantage of this change, well, one worries for you. As Gavin Fell, VP EMEA at Receipt Bank observes, there’s no longer any excuse for clients turning up once a month with a shopping bag full of receipts.


Artificial Intelligence in Singapore: pervasive, powerful and present

People and businesses are unanimous in their opinion that AI will impact our daily lives, and that there are productivity gains to be enjoyed through the adoption of this technology. Overall, we can expect to see a spike in the frequency, flexibility and immediacy of data analysis across industries and applications to drive business decisions. One example is the financial services industry in Singapore, which has been at the forefront of developing and adopting AI technologies across functions in their businesses. AI-based automated chat systems that can interact with customers on personal finance queries in real time are now common in several local banking platforms in Singapore. DBS Bank's AI-driven Virtual Assistant handles over 80 per cent of requests on Facebook Messenger accurately without human intervention ... Such services will ultimately improve service delivery, remove the stress and complexity of manual number crunching, and offer insights at greater speed and accuracy to facilitate quicker decision making in an industry where time is money.


Inside the updated Windows Console

While commands still have many of the same names, and many DOS apps will still run in the Windows console, it's a long way removed from that old text-mode DOS prompt, building on the evolution of the Windows platform. Over the years it's been joined in the Windows console by PowerShell, the default system administration scripting language for Windows and Windows Server, with tools for remote management of both Office 365 and Azure. PowerShell's blue console and color-coded command strings are a long way removed from the old black-and-white DOS window. Its action-oriented command vocabulary is also very different, letting you get and set system settings, building actions into complex scripts that can manage whole fleets of servers. If the Windows command line is a tool for working with a single PC, then PowerShell is a sysadmin's Swiss Army knife for an entire organization full of PCs and servers. Windows 10 recently brought along a third command-line environment — Linux — thanks to the Windows Subsystem for Linux. 


The Galaxy Tab S4 is a great productivity machine precisely because it’s an Android tablet

The desktop experience really does feel a lot like Windows. You can resize the windows of DeX-optimized Android apps. You can launch multiple app windows, and Alt-Tab among them. You can drag and drop content between two compatible apps. You can save shortcuts to the desktop. You can right-click to launch contextual menus. You can navigate a taskbar that lets you see previews of open apps on the left, and system tools like Bluetooth, Volume, and Search on the right. ... There are DeX versions of Microsoft Word, Excel, Outlook, PowerPoint, OneNote, OneDrive and Skype. There’s also DeX support for Adobe Acrobat Reader, Photoshop Lightroom, and Photoshop Express (making my job possible on the road). Nine Mail, my preferred app for secure email, has a DeX version too. Of course, so much work today gets done in web browsers and is executed in the cloud (think about all of Google’s apps, let alone Office 365), so you could argue no one even needs apps for 90 percent of the work we do. Still, from a purely psychological, I’m-happy-in-my-comfort-zone perspective, I embrace what DeX delivers.


What to do when IPv4 and IPv6 policies disagree


An obvious takeaway for network and security administrators is that security policies should be more homogeneously applied to both IPv4 and IPv6, and that the enforcement of security policies on both internet protocols should become part of normal operation and management procedures. It is also advisable for sites that don't currently support IPv6 to apply IPv6 packet filtering policies that are similar to those applied to the IPv4 counterparts. This way, when IPv6 is finally deployed on those sites, the servers and other network elements will not be caught off guard. Recent studies have indicated that mismatches between IPv4 and IPv6 security policies are rather common. Network and security administrators must take action to ensure that the policies applied to both protocols are homogeneous. These common mismatches warrant that, when port scanning a site as part of a penetration test, for example, all of the available addresses must be subject to port scans, as the results for different addresses and different internet protocols may differ.
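A quick way to spot such mismatches during a test is to probe the same port over every IPv4 and IPv6 address a name resolves to and compare the results. The sketch below does this with Python's standard library; the host and port are placeholders, and differing outcomes per protocol would suggest the policies are not homogeneous.

```python
import socket

def check_both_stacks(host, port, timeout=3.0):
    """Probe the same TCP port over every IPv4 and IPv6 address a name resolves to."""
    seen = set()
    for family, _, _, _, sockaddr in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP):
        addr = sockaddr[0]
        if addr in seen:
            continue
        seen.add(addr)
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                print(f"{label} {addr} port {port}: open")
        except OSError as exc:
            print(f"{label} {addr} port {port}: blocked or unreachable ({exc})")

# Hypothetical target; differing results per protocol point to a policy mismatch.
check_both_stacks("www.example.com", 443)
```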


Legal and compliance teams critical to machine learning success

Companies run into problems with ML in a few ways. First, and most dangerous, is the failure to involve legal and compliance teams in the formulation of ML projects. With the rapid evolution of privacy regulations, it’s essential for enterprises to ensure they remain compliant. Another common issue is when companies focus on the technology first. Companies often invest millions of dollars and perhaps years developing a machine learning platform, convinced the organization will derive numerous benefits from different departments flocking to take advantage of it. Unsurprisingly, they don’t get the adoption they expect because they didn’t present a successful use case to their internal customers. A third critical mistake organizations make is not understanding the human part of the equation, that is, failing to adequately train the machine learning engine. It’s essential to use an iterative approach to ensure the ML engine is accurate in its analysis or identification. Failure to do this will undoubtedly lead to a high error rate.


Seagate announces new flash drives for hyperscale markets

The surprising aspect of the Nytros is that they use the SATA interface. SATA is an old interface, a legacy from hard drives, and nowhere near capable of fully utilizing an SSD’s performance. For true parallel throughput, you need a PCI Express or M.2 interface, which are designed specifically for the nature of how flash memory works. “People keep expecting SATA to go away, but SATA is lingering. It’s a very easy way of using your bits. It’s simple, it replaces hard disk drives and still gives 30 times faster performance with the same security and same management [as PCI Express drives], and makes our portfolio a no-brainer for our customers,” said Tony Afshary, director of product management for SSD storage products at Seagate. There are also PCI Express drives, and they bring new features to the table. The new Nytro 5000 for hyperscale data centers doubles the read and write performance of the previous model while adding some NVMe features such as SR-IOV for virtualization, additional namespaces, and support for multiple streams.


How AI and Intelligent Automation Impact HR Practices

Right now, HR employees are buried in transactional work that involves data entry and simple math calculations. Those types of things can be done faster, cheaper and more accurately using Robotic Process Automation (RPA). EY started with a brainstorming session that mapped out current processes and identified opportunities for change. "We probably came up with a half a dozen areas that we felt were not good use of human time, but a very good use of robots [such as] onboarding people, reconciliations for benefits, table batching and validation, travel and expenses, [and] learning and administration," said Fiore. For example, each of EY's 13,000 tax practice employees must attend training that results in certification. The certification needs to be validated, which involves notifying employees and managers and making sure the certification is recorded properly. "There's this whole process where people are pushing emails and spreadsheets for all the training that we do," said Fiore. "It's a team of people that are doing that kind of work we can free up."


Raising the Bar for Ethical Cryptocurrency Mining


Cybercriminals over the years have been using third-party scripts to compel people into getting involved in malicious activities without being aware of it. This was typically observed in the case of Texthelp, when cybercriminals injected a Coinhive script into one of Texthelp’s plugins. This made several U.K.-based government websites take part in malicious cryptomining activities unknowingly. For quite some time, we have been discussing malicious cryptomining. By now, you may be hoping to get some information about what an appropriate cryptomining process should be and whether it is really feasible to practice it decently in a predominantly-malicious environment. This is what we would refer to as ethical cryptomining. People engaged in this use their own systems to decipher complex mathematical problems to validate or process cryptocurrency transactions. Interestingly, as cryptocurrency continues to become more popular and its value witnesses a sharp rise, the complexity of the math problems further rises, demanding more CPU/GPU to be harnessed and prompting miners to opt for more high-end graphics cards.


Getting the most from OneNote, part 2: OneNote 10 is catching up

To make your notes easier to organise, you can tag key paragraphs and then search for all notes with a specific tag. OneNote 2016 has a drop-down list of more than 20 tags on the Home tab. You can apply the first nine tags quickly by typing Ctrl-1 through 9, and you can choose Customize Tags to reorder the existing tags — and create your own custom tags, choosing the tag icon and text formatting they apply. You can move those further up the list to give them keyboard shortcuts, and tags you apply will sync to other devices. However, you'll have to right-click on them and add them as custom tags on each new machine. You can also tag something in OneNote as an Outlook task, complete with an Outlook reminder, and monitor it from both applications. OneNote Online has the same long list of tags as OneNote 2016, but they're not customizable (although custom tags you've added to your notes will show up). Currently, OneNote 10 only has the first nine tags from OneNote 2016 — To Do (which doesn't sync to Outlook), Important, Question, Remember for later, Definition, Highlight, Contact, Address and Phone Number.



Quote for the day:

"Be The Kind Of Leader You Would Want To Follow." -- Gordon TredGold

Daily Tech Digest - August 07, 2018

Disentangling The Data Centre ‘Skills Shortage’ Conundrum?


In principle, a skills shortage appears where there is a mismatch of the capabilities available to role vacancies. We certainly have that, but compounding this issue is the physical lack of people. Genuine skills shortages can be resolved by retraining the people available to work in available roles. It’s a pretty simple equation. Find people, train or retrain and employ them. Vocational and specific training programs resolve this issue in a highly effective manner and should not be discounted as part of a broader response. This is particularly so where existing labour forces are provided with the vocational skills to keep up with changes associated with technology, customer demand or process shifts, for example. Sadly for the data centre sector, we have an underlying labour shortage too. We simply do not have enough people coming into the sector to train into the roles available or to keep up with expected shifts in demand. We have both skills AND labour shortages. Each one demands a different suite of interventions and this is just where the complexity starts.



What’s the difference between a BCMS and a BCP?

Organisations and regulators don’t often agree on how businesses should be run, but lately both have championed the adoption of business continuity – a method that enables organisations to keep functioning during an incident, and address the prevention of and response to disruptions. Business continuity has proved essential in the modern landscape, with the number of cyber attacks on the rise and the amount of information being stored by organisations growing rapidly. But for all the agreement over the importance of business continuity, there is one area of disconnect. Some organisations have adopted a BCMS (business continuity management system) and others a BCP (business continuity plan). This might sound like it’s two names for the same thing, but there’s an important difference. ... It’s possible to have a BCP but not a fully-fledged BCMS. That’s because there are further steps to a BCMS after the plan is in place – namely: developing, testing and reviewing the BCP. Completing these steps obviously involves a bigger investment in time and resources


4 Artificial Intelligence Use Cases That Don’t Require A Data Scientist


Today, your IT operations team likely spends a huge amount of time and mental energy tending to performance thresholds—for example, when an application slows down too much, the system generates an alert. But as the application code, the configurations, or the infrastructure change, the ops team must constantly reset and manage those thresholds. The amount of monitoring data generated is also growing significantly, which means the IT ops team is doing a lot of work just managing logs, which provide the data for setting thresholds. A better way is to put all the web, application, and database performance data, the user experience data, and the log data into one cloud-based data platform. Then let that system—using baseline-setting algorithms in machine learning—learn what the thresholds should be. With the baseline established, another technique called anomaly detection can identify when application performance is trending toward these thresholds, and trigger alerts with suggested corrective actions or automatically take corrective action.
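A minimal sketch of the baseline-plus-anomaly-detection idea: learn the normal range from recent samples and flag a new one that sits far outside it. The response-time numbers and the three-sigma threshold are illustrative assumptions; a real platform would use far richer models over logs and metrics.

```python
from statistics import mean, stdev

def is_anomalous(history, latest, z_threshold=3.0):
    """Flag a sample that sits more than z_threshold standard deviations
    above the baseline learned from recent observations."""
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return latest != baseline
    return (latest - baseline) / spread > z_threshold

# Hypothetical response times (ms) for one endpoint over the recent past.
response_times = [120, 118, 125, 130, 122, 119, 127, 124, 121, 126]

for new_sample in (128, 131, 240):
    print(new_sample, "anomalous:", is_anomalous(response_times, new_sample))
    response_times.append(new_sample)   # the baseline keeps adapting as code and load change
```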


Raspberry Pi and machine learning: How to get started

Although the relatively low-specced Pi isn't an obvious choice for machine learning, the board's compact size and low power consumption mean it's well suited to building mobile homemade gadgets and robots. Machine learning can help these devices handle new tasks, using image recognition to "see" and speech recognition to "hear". However, there are definite limits to the Pi's ML capabilities. There are two main stages to machine learning: training, during which the model learns how to perform a given task, and inference, when the trained model is used to perform that task. The Pi's limited processing power means it's not suitable for training anything but the simplest machine-learning models. Instead this stage is typically carried out on a machine with at least a mid- to high-end GPU. However, the Pi is capable of performing inference, of actually running the trained machine learning model, albeit rather slowly.
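A minimal sketch of the inference stage on a Pi, assuming a model has already been trained elsewhere, converted to TensorFlow Lite, and copied to the board. The model file and input handling are placeholders, and the interpreter import path varies by TensorFlow version (with a full TensorFlow install it is tf.lite.Interpreter).

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter   # pip install tflite-runtime

# Hypothetical model, trained and converted on a GPU machine, then copied to the Pi.
interpreter = Interpreter(model_path="image_classifier.tflite")
interpreter.allocate_tensors()

input_info = interpreter.get_input_details()[0]
output_info = interpreter.get_output_details()[0]

# Placeholder input; a real application would feed a camera frame resized to
# the shape the model expects.
frame = np.zeros(input_info["shape"], dtype=input_info["dtype"])

interpreter.set_tensor(input_info["index"], frame)
interpreter.invoke()                                  # inference only, no training on the Pi
scores = interpreter.get_tensor(output_info["index"])
print("predicted class:", int(np.argmax(scores)))
```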


How Connected Cars And Insurance Are Influenced By Big Data

Many insurance carriers use the customer driving patterns they acquire to set insurance premium rates accordingly. Premium discounts based on driving behavior, mileage, and other metrics are slowly becoming realities in this highly innovative arena. There are a host of other evaluation models, like PAYD, PHYD and MHYD, which are postulated as different versions of the Usage Based Insurance plan. Each one of these models targets a specific driving metric, thereby offering insurance premium rates by analyzing the quality of the driver concerned. With premiums directly related to driving performance, connected cars can blur the lines between vehicle usage and customer privacy, as anything and everything inside the vehicle can be tracked, rather seamlessly. ... With the applications growing in large numbers, a relatively stronger ecosystem is being created around connected cars. The participants include sensor manufacturers, telecommunication firms, insurance companies, and even the automakers, with each one connected to the other by the threads of Big Data.


World's first four-bit 4TB SSDs for consumer devices coming this year

The downside of moving up to four bits per memory cell, according to Samsung, is that it makes it harder to maintain a device's performance and speed, because the extra density would cause the electrical charge to fall by as much as half. However, Samsung says its new SSDs are on par with the performance of its three-bit SSDs, achieved by using a three-bit SSD controller, its TurboWrite technology, and boosting capacity by using 32 chips based on its 64-layer fourth-gen 1TB V-NAND chip. Samsung boasts that its QLC SSDs will improve efficiency for consumer computing, including in smartphone storage, where the 1TB four-bit V-NAND chip will allow it to efficiently churn out 128GB memory cards for smartphones. ... Samsung is planning on releasing four-bit consumer SSDs later this year with 1TB, 2TB, and 4TB capacities in the widely used 2.5-inch form factor. As Samsung notes, this is a massive step up from the 32GB one-bit SSD it launched in 2006, followed by its two-bit 512GB SSDs in 2010, and three-bit or triple-level cell SSDs in 2012.


Consumer Sentiments About Cybersecurity and What It Means for Your Organization


While suffering a data breach is never ideal, the survey also shows that honesty, transparency and a timely emergency response plan is critical. Companies must clearly communicate that a breach has occurred, those likely impacted and planned remediation actions to address the issue. Organizations that don’t admit to compromised consumer records until long after the breach took place suffer the greatest wrath from consumers. Successful organizations must create a secure climate for customers by embracing technology and cultural change. Security threats and data breaches can seriously impact a customer’s loyalty, thereby damaging the corporate brand, increasing customer churn, and incurring lawsuits. Corporate leaders must recognize the multiple pressures on their organizations to integrate new network technologies, transform their businesses and to defend against cyberattacks. Executives that are willing to embrace technology, cultural change and prioritize cybersecurity will be the ones to win the trust and loyalty of the 21st century consumer.


Adapting Blockchain for GDPR Compliance

Perhaps the most interesting — and most controversial — article related to Blockchain’s applicability to GDPR is Article 25, “Data protection by design and by default,” which addresses pseudonymization techniques for consumers’ stored data. Hashing is Blockchain’s pseudonymization technique, and there are two critical interpretations of the pseudonym linkage using Blockchain relative to Article 25. The first states that because Blockchain hashing accomplishes pseudonymization of the data, but not anonymization, the data linkage is no longer considered personal once it is established, and if this linkage is deleted, it also complies with Article 17. However, the second interpretation is that pseudonymization, even with all cryptographic hashes, can still be linked back to the original PII data. There may still, however, need to be some mathematical proof of whether a brute-force cyberattack on off-chain data linkage using hashing can compromise this assumption.
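One way to reason about the two interpretations is a keyed hash held off-chain, sketched below: while the key exists, the pseudonym is stable and linkable; destroy the key and the stored value can no longer be re-linked by re-hashing, whereas a plain unkeyed hash of a low-entropy value remains exposed to brute-force re-linking. This is a generic illustration, not how any particular blockchain or the article's authors implement it.

```python
import hashlib
import hmac
import secrets

# Hypothetical linkage key kept off-chain; the chain stores only pseudonyms.
linkage_key = secrets.token_bytes(32)

def pseudonymize(personal_value: str) -> str:
    """Keyed hash: a stable pseudonym while the key exists, unlinkable without it."""
    return hmac.new(linkage_key, personal_value.encode(), hashlib.sha256).hexdigest()

record_on_chain = {"subject": pseudonymize("alice@example.com"), "consent": True}
print(record_on_chain)

# Deleting the linkage, per the first interpretation: destroy the off-chain key,
# and the pseudonym already written to the chain can no longer be re-linked by
# re-hashing known identifiers. A plain, unkeyed hash of something like an email
# address would remain exposed to the brute-force re-linking the second
# interpretation worries about.
linkage_key = None
```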


BGP hijacking attacks target payment systems


Justin Jett, director of audit and compliance for Plixer, said BGP hijacking attacks are "extremely dangerous because they don't require the attacker to break into the machines of those they want to steal from." "Instead, they poison the DNS cache at the resolver level, which can then be used to deceive the users. When a DNS resolver's cache is poisoned with invalid information, it can take a long time post-attack to clear the problem. This is because of how DNS TTL works," Jett wrote via email. "As Oracle Dyn mentioned, the TTL of the forged response was set to about five days. This means that once the response has been cached, it will take about five days before it will even check for the updated record, and that is how long the problem will remain, even once the BGP hijack has been resolved." Madory was not optimistic about what these BGP hijacking attacks might portend because of how fundamental BGP is to the structure of the internet.


The 14 soft skills every IT pro needs

“Great knowledge in a vacuum doesn’t benefit an organization,” says Wilgus. “Every IT project — and position — is going to conclude with a deliverable, for example a design document, presentation, attestation report or updated code base. Without the necessary soft skills, the intended message being expressed in the deliverable could be lost. Candidates that have presented at conferences, or have been published, will have a leg up on other candidates. ... If there are errors in a two-page resume, what’s the likelihood this candidate can produce a formal report of more substantial length? Candidates should expect hiring organizations will ask for a writing sample." ... “Active listening is the process of reflecting back not only what you hear the other person saying but also to validate and verbalize the nontechnical aspects of the conversation,” Adato says. “This is one way to demonstrate emotional intelligence. Leveraging this technique gives the individual speaking the opportunity to clarify, while simultaneously demonstrating that this information matters to you personally.”



Quote for the day:


"It is easy to lead from the front when there are no obstacles before you, the true colors of a leader are exposed when placed under fire." -- Mark W. Boyer


Daily Tech Digest - August 06, 2018

How quantum computers will destroy & save cryptography

quantum
To crack most current public key encryption, it would take a quantum computer with at least 4,000 perfect qubits, or many times that number if the qubits were imperfect. How close are we to 4,000 perfect qubits? It depends on who you ask. Dr. Jackson is confident that we’ll have perfect 4,000-qubit quantum computers in the next five years. He has some evidence to support his claim, although we are nowhere near 4,000 perfect qubits. In March 2018, Google announced an imperfect 72-qubit computer. Google’s current (publicly known) implementation makes a mistake about once every 200 calculations. When you’re doing billions of calculations a second, that error rate is an unusable disaster. Tens if not hundreds of billions of dollars are being spent around the world trying to make more stable quantum computers. Some say that the jump needed to get to 4,000 qubits is not as daunting as it once was. Dr. Jackson, who works directly with quantum computers, says, “We have gone from nine to 72 qubits in just one year, so it’s not crazy at all that we could get 4,000 in another five [years]. Given that the US government finally got on board a few months ago, I think that’s now a conservative estimate.”
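
To see why that error rate is called an unusable disaster, a quick back-of-the-envelope calculation using the figures quoted above (and assuming, purely for illustration, one billion operations per second) shows how quickly errors pile up:

    error_rate = 1 / 200              # roughly one mistake every 200 calculations
    ops_per_second = 1_000_000_000    # "billions of calculations a second" (illustrative)

    errors_per_second = error_rate * ops_per_second
    print(f"{errors_per_second:,.0f} errors per second")  # -> 5,000,000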



Evaluating Hyperledger Composer


Hyperledger Composer allows you to write smart contracts in server-side JavaScript. It makes available a native client library by which Node.js applications can access the ledger and submit transactions to these smart contracts. For the purposes of this experiment, I used an already developed Node.js microservice as the control. I copied the source code for that microservice to a new folder, then replaced all references to MySQL, Redis, and Cassandra with calls to the Hyperledger Composer client API. The resulting feed7 project serves as the test in this experiment. Both projects use Elasticsearch because one of the requirements of each news-feed service is keyword-based search, and a blockchain is not appropriate for that. Like most of the other microservices in this repo, the feed7 microservice uses Swagger to define its REST API. The specification can be found in the server/swagger/news.yaml file. With Hyperledger Composer, you create a business network that consists of a data model, a set of transactions that manipulate the data model, and a set of queries by which those transactions can access data within the model.


Mastering MITRE's ATT&CK Matrix

Originally developed to support Mitre's cyberdefense work, ATT&CK is both an enormous knowledge base of cyberattack technology and tactics and a model for understanding how those elements are used together to penetrate a target's defenses. ATT&CK, which stands for Adversarial Tactics, Techniques, and Common Knowledge, continues to evolve as more data is added to the knowledge base and model. The model is presented as a matrix, with the stage of an attack along one axis and the mechanisms used at that stage along the other. By following the matrix, red team members can design an integrated campaign to probe any aspect of an organization's defense, and blue team members can analyze malicious behavior and technology to understand where it fits within an overall attack campaign. Mitre has defined five matrices under the ATT&CK model. The enterprise matrix, which this article will explore, includes techniques that span a variety of platforms. Four specific platforms — Windows, Mac, Linux, and mobile — each have their own matrix.
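
As a rough illustration of the matrix idea, the sketch below models a tiny slice of the enterprise matrix as a mapping from tactic (the stage of an attack) to techniques (the mechanisms used at that stage), plus the kind of reverse lookup a blue team might do when placing an observed behavior within a campaign. The tactic and technique names are real ATT&CK entries, but the data structure and function are illustrative assumptions, not Mitre's own tooling.

    # A tiny, hand-picked slice of the ATT&CK enterprise matrix:
    # tactic (stage of the attack) -> techniques (mechanisms for that stage).
    attack_matrix = {
        "Initial Access": ["Spearphishing Attachment", "Valid Accounts"],
        "Execution": ["PowerShell", "Command-Line Interface"],
        "Persistence": ["Registry Run Keys / Startup Folder", "Scheduled Task"],
        "Exfiltration": ["Exfiltration Over Command and Control Channel"],
    }

    def tactics_for(technique: str) -> list:
        """Blue-team style lookup: which stage(s) does an observed technique belong to?"""
        return [tactic for tactic, techniques in attack_matrix.items()
                if technique in techniques]

    print(tactics_for("PowerShell"))  # ['Execution']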


Intelligent transportation: The most important pillar of a smart city?

Intelligent transportation: The pillar of smart cities? image
Intelligent transportation must be a first step in the smart city movement. This could include monitoring traffic patterns, highly trafficked pedestrian areas and metro stations, coordinating train times and much more. When cities host large events that increase traffic and security concerns, it becomes increasingly clear that any smart city initiative must begin with intelligent transportation. Intelligent transportation can improve overall situational awareness while enhancing interoperability and the ability to share information quickly. It provides a holistic approach to risk management by fortifying cities’ emergency preparedness and response capabilities for disruptive events such as fare evasion, vandalism or violence, medical emergencies and track obstructions. ... Rather than focusing on short-term fixes, cities and states can find the holes where smart transportation solutions can resolve major issues. This leads to improved traffic flow and better roads, and can even help support law enforcement by identifying safety hazards and where cameras should be installed.


The biggest data breaches in the ASEAN region

Hacker stealing data
When it comes to data breach control, the prospects are even gloomier. Whereas the Philippines and Indonesia require that data controllers promptly notify affected users in the case of a data breach, Thailand, Brunei and Malaysia have no specific notification requirements for this particular scenario. This makes it more difficult to know the real extent of actual data breaches in those countries, as most of them go unreported. Since the lack of sector-specific governance and policies is a problem across the whole region, ASEAN could benefit from a coordinated approach similar to the one implemented in the European Union (EU). In 2013 the EU developed a Cybersecurity Package, a region-wide cybersecurity strategy to “enhance the EU’s overall performance” and to “safeguard an online environment providing the highest possible freedom and security for the benefit of everyone.” The package was reviewed last year and marks a milestone in the fight against cybercrime in the union. Below we have compiled a list of the most serious data breach incidents in the ASEAN region during the past few years.


Lessons From The Amazon Ecosystem

The ARM Holdings design ecosystem is a set of relationships between major mobile device vendors, silicon manufacturers and chip designers, usually organised by Arm. It’s a highly specialised and effective ecosystem that can push chip design in novel directions and move designs to manufacture quickly because of the inclusion of silicon factories and smartphone makers. Very few ecosystems work this way. In fact, the Amazon ecosystem in books is trying to optimise the Amazon platform rather than optimise or maximise market opportunity and customer success. Take the role of book arbitrage (mid-right in blue in the diagram above). In essence this means finding books deep in the Amazon catalogue, buying them cheaply, and then using more effective descriptions to sell them, also on Amazon, at a higher price. It makes up for Amazon’s indiscriminate search engine and the poor product descriptions of most booksellers. It pays the Amazon ecosystem to get any good-enough product to market to tap into the long tail, at a very low price (99-cent novels). That is output rather than outcome; it is a product that does not necessarily please customers as much as might be possible at a lower volume of publishing.


By 2020, 1-in-5 healthcare orgs will adopt blockchain; here’s why

intro healthcare technologies for ma
While there is some degree of network interoperability between healthcare providers, pharmacies and insurance companies through various frameworks like HIEs, they've had "varying degrees of success and penetration," IDC said. It cited innate shortcomings that include "limitations in the interoperability standard or protocol itself, workflow and policy differences between entities, information blocking, and technology requirements." Two leading HIEs – CommonWell Health Alliance, a trade association working toward healthcare record interoperability, and Carequality, a public-private collaborative created to establish a common interoperability framework – have had success in establishing a solid industry foundation for data exchange with the backing of EHR vendors. "And that's facilitating a somewhat limited form of query-based [data] exchange," said Mutaz Shegewi, IDC's research director for provider IT transformation strategies. Shegewi was referring to the ability to search for secured patient information online.


IT Managers: Are You Keeping Up with Social-Engineering Attacks?

Using both high-tech tools and low-tech strategies, today's social-engineering attacks are more convincing, more targeted, and more effective than before. They're also highly prevalent. Almost seven in 10 companies say they've experienced phishing and social engineering. For this reason, it's important to understand the changing nature of these threats and what you can do to help minimize them. Today's phishing emails often look like exact replicas of communications coming from the companies they're imitating. They can even contain personal details of targeted victims, making them even more convincing. In one incident, bad actors defrauded a U.S. company of nearly $100 million by using an email address that resembled one of the company's vendors. And in the most recent presidential election, hackers used a phishing email that appeared to come from Google to access and release a top campaign manager's emails. Bad actors can get sensitive data in many other ways. In one case, they manipulated call-center workers to get a customer's banking password.


NVME SSDs, The Insanely Fast Storage You Want In Your PC

nvme ssd primary intel optane ssd 905p
It’s possible to add an NVMe drive to any PC with a PCIe slot via a $25 adapter card. All recent versions of the major operating systems provide drivers, and regardless of the age of the system, you will have a very fast drive on your hands. But there’s a catch. To benefit fully from an NVMe SSD, you must be able to boot the operating system from it. That requires BIOS support. Sigh. Most older mainstream BIOSes do not support booting from NVMe and most likely never will. ... While just about any NVMe drive should make your system feel quicker, they are not all alike. Not even close. Where Samsung’s 970 Pro will read at over 3GBps and write at over 2.5GBps, Toshiba’s RC100 reads at 1.2GBps and writes at just under 900MBps. The difference can be even greater when the amount of data written exceeds the amount of cache on board. A number of factors affect performance, including the controller, the amount of NAND on board, the number of PCIe lanes (see above), and the type of NAND.
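
If you want a rough feel for how your own drive compares to the sequential figures quoted above, the sketch below times a large sequential write. It is only a crude illustration under stated assumptions (a 1 GiB test file written to the current directory; results are skewed by OS and drive caches); dedicated tools such as fio or CrystalDiskMark are the right way to benchmark properly.

    import os
    import time

    def sequential_write_mib_per_s(path: str, size_mib: int = 1024, block_mib: int = 4) -> float:
        """Write size_mib of data sequentially and return throughput in MiB/s."""
        block = os.urandom(block_mib * 1024 * 1024)
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(size_mib // block_mib):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())      # make sure the data actually reaches the drive
        elapsed = time.perf_counter() - start
        os.remove(path)               # clean up the temporary test file
        return size_mib / elapsed

    print(f"{sequential_write_mib_per_s('testfile.bin'):.0f} MiB/s")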


Strong governance programs separate data lakes from swamps

A good data governance framework combined with a data catalog can keep a data lake pristine by cleaning up the disorderly swamp of data. A data catalog offers a single source of intelligence for data experts and other data users who need quick access to their data. Users can tag, document, and annotate data sets in the data catalog, continuously enriching the data and increasing the value of existing data assets while also eliminating data silos. A data catalog enables users to collaborate to understand the data’s meaning and use, to determine which data is fit for what purpose, and which is unusable, incomplete, or irrelevant. It provides a way for every user to find data, understand what it means, and trust that it’s correct. Businesses today are either building a brand-new data lake or cleaning up an existing one. Whether you’ve inherited a swamp or are just starting out and want to keep your data lake pristine, establishing a set of policy-driven processes can help you avoid four common data lake problems.
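
As a concrete, entirely hypothetical illustration of what tagging, documenting and annotating data sets can look like, the sketch below models a minimal catalog entry and a tag-based lookup. The field names and structure are assumptions for illustration, not any particular catalog product's schema.

    from dataclasses import dataclass, field

    @dataclass
    class CatalogEntry:
        """A minimal, hypothetical data-catalog record for one data set."""
        name: str
        owner: str
        description: str = ""
        tags: set = field(default_factory=set)
        annotations: list = field(default_factory=list)
        fit_for: list = field(default_factory=list)   # approved purposes

    catalog = [
        CatalogEntry(
            name="sales_orders_2018",
            owner="data-platform-team",
            description="Raw order events landed daily from the e-commerce system.",
            tags={"sales", "raw", "pii"},
            fit_for=["revenue reporting"],
        ),
    ]

    def find_by_tag(tag: str) -> list:
        """Let users locate data sets by tag instead of digging through the lake."""
        return [entry for entry in catalog if tag in entry.tags]

    print([e.name for e in find_by_tag("sales")])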



Quote for the day:


"Leadership in the past was a model of direction and control. Now it should help people set directions for the future and facilitate their delivery." -- John Bailey