Daily Tech Digest - July 03, 2020

Designing data governance that delivers value

Without quality-assuring governance, companies not only miss out on data-driven opportunities; they waste resources. Data processing and cleanup can consume more than half of an analytics team’s time, including that of highly paid data scientists, which limits scalability and frustrates employees. Indeed, the productivity of employees across the organization can suffer: respondents to our 2019 Global Data Transformation Survey reported that an average of 30 percent of their total enterprise time was spent on non-value-added tasks because of poor data quality and availability ... The first step is for the DMO to engage with the C-suite to understand their needs, highlight the current data challenges and limitations, and explain the role of data governance. The next step is to form a data-governance council within senior management (including, in some organizations, leaders from the C-suite itself), which will steer the governance strategy toward business needs and oversee and approve initiatives to drive improvement—for example, the appropriate design and deployment of an enterprise data lake—in concert with the DMO. The DMO and the governance council should then work to define a set of data domains and select the business executives to lead them.


How to Kill Your Developer Productivity

The problems start when teams get carried away with microservices and take the "micro" a little too seriously. From a tooling perspective, you will now have to deal with many more YAML files and Dockerfiles, dependencies between the variables of these services, routing issues, and so on. They need to be maintained, updated, cared for. Your CI/CD setup, as well as your organizational structure and probably your headcount, needs a revamp. If you go into microservices for whatever reason, make sure you plan sufficient time to restructure your tooling setup and workflow. Just count the number of scripts in various places you need to maintain. ... Kubernetes worst case: Colleague XY really wanted to get their hands dirty and found a starter guide online. They set up a cluster on bare metal and it worked great with the test app. They then started migrating the first application and asked their colleagues to start interacting with the cluster using kubectl. Half of the team is now preoccupied with learning this new technology. The poor person now maintaining the cluster will be on it full time the second the first production workload hits the fan.
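To make that "just count the scripts" advice concrete, here is a minimal Python sketch that tallies the build and deployment artifacts accumulating in a repository. The file patterns and repository path are assumptions for illustration, not from the article; adjust them to your own layout.

```python
# Minimal sketch: count build/deploy artifacts scattered across a repo.
# The glob patterns and the "." repo path are assumptions; adjust as needed.
from collections import Counter
from pathlib import Path

PATTERNS = {
    "YAML configs": ["*.yml", "*.yaml"],
    "Dockerfiles": ["Dockerfile*"],
    "Shell scripts": ["*.sh"],
}

def count_artifacts(repo_root: str) -> Counter:
    root = Path(repo_root)
    counts = Counter()
    for label, globs in PATTERNS.items():
        for pattern in globs:
            counts[label] += sum(1 for _ in root.rglob(pattern))
    return counts

if __name__ == "__main__":
    for label, n in count_artifacts(".").items():
        print(f"{label}: {n}")
```

Running something like this against a freshly "microserviced" monorepo is a sobering way to see the maintenance surface the author is describing.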


A Brief History of Data Lakes

Data Lakes are consolidated, centralized storage areas for raw, unstructured, semi-structured, and structured data, taken from multiple sources and lacking a predefined schema. Data Lakes have been created to save data that “may have value.” The value of data and the insights that can be gained from it are unknowns and can vary with the questions being asked and the research being done. It should be noted that without a screening process, Data Lakes can support “data hoarding.” A poorly organized Data Lake is referred to as a Data Swamp. Data Lakes allow Data Scientists to mine and analyze large amounts of Big Data. Big Data, which was used for years without an official name, was labeled by Roger Magoulas in 2005. He was describing a large amount of data that seemed impossible to manage or research using the traditional SQL tools available at the time. Hadoop (2008) provided the framework needed for storing and processing unstructured data on a massive scale, opening the door for Big Data research. In October of 2010, James Dixon, founder and former CTO of Pentaho, came up with the term “Data Lake.” Dixon argued that Data Marts come with several problems, ranging from size restrictions to narrow research parameters.


What is agile enterprise architecture?

An important group of agility dimensions relates to the process of strategic planning, where business leaders and architects collectively develop the global future course of action for business and IT. One of these dimensions is the overall amount of time and effort devoted to strategic planning. Some companies invest considerable resources in the discussions of their future evolution, while other companies pay much less attention to these questions. Another dimension is the organisational scope covered by strategic planning. Some companies embrace all their business units and areas in their long-range planning efforts, while others intentionally limit the scope of these efforts to a small number of core business areas. A related dimension is the horizon of strategic planning. Some organisations plan for no more than 2-3 years ahead, but others need a five-year, or even longer, planning horizon. Yet another relevant dimension is how the desired future is defined. Some companies create rather concrete descriptions of their target states, while others define their future only in terms of planned initiatives in investment roadmaps.


How to Guard Against Governance Risks Due to Shadow IT and Remote Work

Shadow IT evolves in organizations when workers, teams, or entire departments begin to improvise their work processes through unauthorized services or practices that operate outside the oversight and control of IT. It may involve something as seemingly harmless as storing work documents on a personal laptop, or it could pose a catastrophic risk by transferring confidential intellectual property or regulated private data via an unsecured personal file sharing service. ... Although productivity is critical, the use of personal cloud file services, ad hoc team network file shares, and personal email for file transfer undermines governance and represents material risk from a discovery, privacy, and noncompliance perspective. If you do not equip your employees with productivity tools that address governance requirements, they will pursue novel techniques without understanding the risks. Transferring documents via email, Dropbox, or Google Drive may seem ingenious; in reality, users may not understand the dangers posed by insufficient authentication or auditing, or the direct violation of data privacy requirements. What's more, unmanaged deletion of work product may violate legal hold requirements.


How to Convince Stakeholders That Data Governance is Necessary

Oftentimes, the data consumers don’t have an inventory of the data available to them. The consumers don’t have business glossaries, data dictionaries, and data catalogs that house information about the data and would improve their understanding of it (and access to the metadata might be a problem even if it is available). They don’t immediately know whom to contact to request access to the data (which they may not know exists in the first place). And the rules associated with the data are not documented in resources that are available to data consumers, thus putting all of this effort, post hoop-jumping, at risk anyway. If you ask data consumers, casual data users, and data scientists what causes delays and problems in completing their normal job, you can expect answers like those listed above, and they will boggle your mind. At that point, you will begin to understand the often-mentioned 80/20 rule. This rule states that eighty percent of their time is spent wrangling data and the other twenty percent is spent actually doing the analysis, meaningful reporting, and question-answering that are truly part of their job.


Studying an 'Invisible God' Hacker: Could You Stop 'Fxmsp'?

Experts say the group was extremely well organized: it used teams of specialists, built a sophisticated botnet, and sold remote access and exfiltrated data while perfecting the botnet to help monetize those efforts. Or at least that was the group's MO until AdvIntel dropped a report in May 2019 documenting Fxmsp's activities. Shining a light on the gang - which relied in large part on advertising via publicly accessible cybercrime forums - caused the group to disappear. "The Fxmsp hacking collective was explicitly reliant on the publicity of their offers in the dark market auctions and underground communities," Yelisey Boguslavskiy, CEO of AdvIntel, tells me. After the report's release, he says, Fxmsp disappeared from public view, although it's not clear if the hacker with that handle might still be operating privately. Study Fxmsp's historical operations, and a less-is-more ethos emerges. "In most cases, Fxmsp uses a very simple, yet effective approach: He scans a range of IP addresses for certain open ports to identify open RDP ports, particularly 3389. Then, he carries out brute-force attacks on the victim's server to guess the RDP password," Group-IB says in a recap.
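The Group-IB description translates directly into a defensive check. Here is a minimal Python sketch for auditing whether hosts you control expose TCP 3389 at all, which is the precondition Fxmsp reportedly hunted for. The host list is a hypothetical placeholder; run it only against assets you own.

```python
# Defensive sketch: check your own hosts for reachable RDP (TCP 3389),
# the exposure Fxmsp reportedly scanned for. Host list is a placeholder.
import socket

RDP_PORT = 3389

def rdp_exposed(host: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to port 3389 succeeds."""
    try:
        with socket.create_connection((host, RDP_PORT), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host in ["203.0.113.10", "203.0.113.11"]:  # replace with your assets
        status = "EXPOSED" if rdp_exposed(host) else "closed/filtered"
        print(f"{host}:{RDP_PORT} -> {status}")
```

If the port is reachable from the internet, the usual mitigations apply: put RDP behind a VPN or gateway, enforce strong passwords and account lockout, and monitor for brute-force attempts.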


4 common software maintenance models and when to use them

Quick-fix: In this model, you simply make a change without considering efficiency, cost or possible future work. The quick-fix model fits emergency maintenance only. Development policies should forbid the use of this model for any other maintenance purpose. Consider forming a special team dedicated to emergency software maintenance. ... Iterative: Use this model for scheduled maintenance or small-scale application modernization. The business justification for changes should either already exist or be unnecessary. The iterative model involves only the development team. The biggest risk here is that it doesn't include business justifications -- the software team won't know if larger changes are needed in the future. The iterative model treats the application target as a known quantity. ... Reuse: Similar to the iterative model, the reuse model includes the mandate to build, and then reuse, software components. These components can work in multiple places or applications. Some organizations equate this model to componentized iteration, but that's an oversimplification; the goal here is to create reusable components, which are then made available to all projects under all maintenance models.


Newly discovered principle reveals how adversarial training can perform robust deep learning

Why do we have adversarial examples? Deep learning models consist of large-scale neural networks with millions of parameters. Due to the inherent complexity of these networks, one school of researchers believes in a “cursed” result: deep learning models tend to fit the data in an overly complicated way so that, for every training or testing example, there exist small perturbations that change the network output drastically. This is illustrated in Figure 2. In contrast, another school of researchers holds that the high complexity of the network is a “blessing”: robustness against small perturbations can only be achieved when high-complexity, non-convex neural networks are used instead of traditional linear models. This is illustrated in Figure 3. It remains unclear whether the high complexity of neural networks is a “curse” or a “blessing” for the purpose of robust machine learning. Nevertheless, both schools agree that adversarial examples are ubiquitous, even for well-trained, well-generalizing neural networks.
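The article does not include code, but one standard way such small perturbations are constructed is the fast gradient sign method (FGSM). Below is a minimal PyTorch sketch; the linear model and random data are toy stand-ins, not the paper's actual setup.

```python
# Minimal FGSM sketch: perturb an input by eps in the direction that
# increases the loss. Model and data below are toy stand-ins.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 0.03) -> torch.Tensor:
    """Return x plus an eps-bounded perturbation that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Toy usage with a stand-in linear classifier on random data.
model = nn.Linear(10, 3)
x = torch.randn(4, 10)
y = torch.randint(0, 3, (4,))
x_adv = fgsm_perturb(model, x, y)
print((x_adv - x).abs().max())  # perturbation magnitude is bounded by eps
```

Adversarial training, the subject of the article, then amounts to fitting the model on such perturbed examples in addition to (or instead of) the clean ones.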


AI Adoption – Data governance must take precedence

Obstacles are to be expected on the path to digital transformation, particularly with unfamiliar entities in the mix. For AI adoption, the most prevalent obstructions are: a company culture that doesn’t recognise a need for AI; difficulties in identifying business use cases; a skills gap, or difficulty hiring and retaining staff; and a lack of data or data quality issues. With this broad spectrum of challenges, it is worth delving into a couple of them. Firstly, it is interesting to note that an incompatible company culture mostly affects those companies that are in the evaluation stage with AI. Rephrased, it is perhaps obvious: a company with “mature” AI practices is 50 percent less likely to see no use for AI. By contrast, in a company where AI is not yet an integrated business function, resistance is more likely. Secondly, AI adopters are more likely to encounter data quality issues; by virtue of working closely with data and requiring good data practice, they are more likely to notice when errors and inconsistencies arise. Conversely, companies in the evaluating stages of AI adoption may not be aware of the extent of any data issues.



Quote for the day:

"Most people live with pleasant illusions, but leaders must deal with hard realities." -- Orrin Woodward
