
Daily Tech Digest - July 23, 2022

How CIOs can unite sustainability and technology

CIOs must be proactive in driving these organizational shifts, as business leaders will continue to lean on them to ensure company technologies provide solutions without contributing to an environmental problem. While in years past this was not an active concern, the information and communications technology (ICT) sector has recently become a larger source of climate-related impact: it produced only 1.5% of CO2 emissions in 2007, accounts for roughly 4% today, and could reach 14% by 2040. Fortunately, CIOs can course-correct by focusing on three key areas: Net zero - Utilize green software practices that reduce energy consumption; Trust - Build systems that protect privacy and are fair, transparent, robust, and accessible; and Governance - Make ESG the focus of technology, not an afterthought. As a first step in this transition, CIOs can begin assessing their organization’s technology through the lens of sustainability to ensure that those goals are considered in every facet of the business. In addition, they can connect with other leaders in the company to encourage greater emphasis and dialogue in cross-organization planning for technology solutions as they relate to sustainability targets.


Design patterns for asynchronous API communication

Request and response topics are more or less what they sound like: a client sends a request message through a topic to a consumer; the consumer performs some action, then returns a response message through a topic back to the client. This pattern is a little less generally useful than the previous two. In general, it creates an orchestration architecture, where a service explicitly tells other services what to do. There are a couple of reasons why you might want to use topics to power this instead of synchronous APIs. You want to keep the low coupling between services that a message broker gives you: if the service that’s doing the work ever changes, the producing service doesn’t need to know about it, since it’s just firing a request into a topic rather than directly asking a service. The task takes a long time to finish, to the point where a synchronous request would often time out; in this case, you may decide to make use of the response topic but still make your request synchronously. Or you’re already using a message broker for most of your communication and want to make use of the existing schema enforcement and backwards compatibility that are automatically supported by the tools used with Kafka.
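A minimal sketch of that pattern in Python, assuming the kafka-python client; the topic names, payload shape, and correlation-id scheme are illustrative, not from the article:

```python
# Sketch of request/response over Kafka topics (kafka-python client).
# Topic names and the correlation-id matching scheme are illustrative.
import json
import uuid

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Client: fire a request into the request topic, tagged with a correlation id
# so the eventual response can be matched back to this request.
correlation_id = str(uuid.uuid4())
producer.send("task.requests", {"correlation_id": correlation_id, "payload": "resize-image-42"})
producer.flush()

# Client (or a separate poller): watch the response topic for the matching reply.
consumer = KafkaConsumer(
    "task.responses",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    if message.value.get("correlation_id") == correlation_id:
        print("response:", message.value["result"])
        break
```

Note that the requesting service never names the worker service; it only knows the topics, which is the low coupling the article describes.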


What is Data Gravity? AWS, Azure Pull Data to the Cloud

As enterprises create ever more data, they aggregate, store, and exchange it, attracting progressively more applications and services that analyze and process that data. This “attraction” occurs because these applications and services require higher bandwidth and/or lower latency access to the data. Therefore, as data accumulates in size, instead of pushing data over networks towards applications and services, “gravity” begins pulling applications and services to the data. This process repeats, which produces a compounding effect: as the scale of data grows, it becomes “heavier” and increasingly difficult to replicate and relocate. Ultimately, the “weight” of this data being created and stored generates a “force” that results in an inability to move the data, hence the term data gravity. Data gravity presents a fundamental problem for enterprises, which is the inability to move data at scale. Consequently, data gravity impedes enterprise workflow performance, heightens security and regulatory concerns, and increases costs.


Windows 11 is getting a new security setting to block ransomware attacks

The new feature is rolling out to Windows 11 in a recent Insider test build, but it is also being backported to Windows 10 desktop and server, according to Dave Weston, vice president of OS Security and Enterprise at Microsoft. "Win11 builds now have a DEFAULT account lockout policy to mitigate RDP and other brute force password vectors. This technique is very commonly used in Human Operated Ransomware and other attacks – this control will make brute forcing much harder which is awesome!," Weston tweeted. Weston emphasized "default" because the policy is already an option in Windows 10 but isn't enabled by default. That's big news, and it parallels Microsoft's default block on internet macros in Office on Windows devices; macros are another major avenue for malware attacks on Windows systems through email attachments and links. Microsoft paused the default internet macro block this month but will re-release it soon. The default block on untrusted macros is a powerful control against a technique that relied on end users being tricked into clicking an option to enable macros, despite warnings in Office against doing so.
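For readers who want to check the policy by hand on an existing machine, it can be inspected and adjusted with the built-in net accounts command; a sketch driving it from Python, where the threshold and duration values are illustrative rather than Microsoft's defaults:

```python
# Sketch: inspect and set the account lockout policy discussed above, using
# the built-in Windows `net accounts` command. The threshold/duration values
# are illustrative, not Microsoft's shipped defaults. Requires an admin shell.
import subprocess

# Show the current policy (output includes "Lockout threshold" and "Lockout duration").
print(subprocess.run(["net", "accounts"], capture_output=True, text=True).stdout)

# Lock accounts for 10 minutes after 10 failed sign-in attempts.
subprocess.run(["net", "accounts", "/lockoutthreshold:10"], check=True)
subprocess.run(["net", "accounts", "/lockoutduration:10"], check=True)
subprocess.run(["net", "accounts", "/lockoutwindow:10"], check=True)
```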


Untangling Enterprise API Architecture with GraphQL

GraphQL is a query language that allows you to describe your data requirements in a more powerful and developer-friendly way than REST or SOAP. Its composability can help untangle enterprise API architecture. GraphQL becomes the communication layer for your services. Using the GraphQL specification, you get a unified experience when interacting with your services. Every service in your API architecture becomes a graph that exposes a GraphQL API. In this graph, everyone who wants to integrate or consume the GraphQL API can find all the data it contains. Data in GraphQL is represented by a schema that describes the available data structures, the shape of the data and how to retrieve it. Schemas must comply with the GraphQL specification, and the part of the organization responsible for the service can keep this schema coherent. GraphQL composability allows you to combine these different graphs — or subgraphs — into one unified graph. Many tools are available to create such a “graph of graphs."
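To make the schema idea concrete, here is a minimal sketch in Python using the graphene library; the Book type, its fields, and the hard-coded resolver are invented for illustration:

```python
# Minimal GraphQL schema sketch using the graphene library.
# The Book type and its fields are invented for illustration.
import graphene


class Book(graphene.ObjectType):
    title = graphene.String(description="Title of the book")
    author = graphene.String(description="Author of the book")


class Query(graphene.ObjectType):
    book = graphene.Field(Book, id=graphene.ID(required=True))

    def resolve_book(root, info, id):
        # A real service would hit a datastore here; hard-coded for the sketch.
        return Book(title="Designing Data-Intensive Applications", author="Martin Kleppmann")


schema = graphene.Schema(query=Query)

# Consumers ask for exactly the fields they need, nothing more:
result = schema.execute('{ book(id: "1") { title } }')
print(result.data)  # {'book': {'title': 'Designing Data-Intensive Applications'}}
```

A subgraph like this one can then be composed with other services' schemas into the unified "graph of graphs" the article mentions.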


How The Great Resignation Will Become The Great Reconfiguration

We are witnessing a great reconfiguration of how employees expect to be treated by employers. Henry Ford gave his workers a full two-day weekend as early as 1926, and now a weekend is expected in most office-based jobs—unless the job involves serving customers over the weekend! We have certain expectations of the employer and employee relationship, and what was normal before the pandemic is now being challenged. Even Wall Street cannot hold back the tide. People expect more flexibility over their hours and work location. Within a few years, this will be normalized by top talent expecting it and that expectation filtering throughout company culture. This is how work will function post-pandemic. The Great Resignation is the first step, but eventually, I believe we will call the 2020s the Great Reconfiguration. ... WFH will live on - You might want your team back in the office, but they know they can be more productive remotely, and research backs up the employees. A new Harvard study suggests that all that in-person time can be compressed into just one or two days a week.


Will Your Cyber-Insurance Premiums Protect You in Times of War?

Due to the changing market and geopolitical situation, you need to be keenly aware of the exact kind of cyber-insurance coverage your organization requires. Your decisions should be dictated by the industry you're working in, the security risk, and how much you stand to lose in the event of an attack. It's important to note that insurance providers are also being more stringent in their requirements for companies to even obtain cyber coverage in the first place. Carriers are increasingly requiring companies to practice good cyber hygiene and have rigid cybersecurity protocols in place before even offering a quote. Once you have proper cybersecurity protocols in place, you should be better positioned to qualify for adequate plans. However, remember that no two plans are alike or equally inclusive. When choosing a plan, be sure to look for any fine print regarding act-of-war and terrorism exclusions or those for other "hostile acts." Even when you've done everything right, your carrier can still attempt to deny you coverage under these loopholes.


The new CIO playbook: 7 tips for success from day one

It’s possible that, up to now, your focus has been solely on technology. One of the big differentiators between working on an IT team, even in a leadership role, and being CIO is that you will need to understand how technology fits into the larger business goals of the company. You will need to be a technology translator and advocate for the CEO, business leadership, and board. For that, you have to understand the business first. “We can come up with creative technical solutions,” says Roberge. “We know you need an email system, a CRM system, and an ERP. But how does the business want to use those tools? How is the sales guy going to sell product and be able to get a quote out, get the tax requirements, things like that?” Business leaders are unlikely to understand technology the way you do. So, you must understand the business in order to help the other business units, the CEO, and the board understand how technology can fit into their goals. “As technology experts, we know our technology extremely well,” says Roberge.


Explained: How to tell if artificial intelligence is working the way we want it to

Far from a silver bullet, explanation methods have their share of problems. For one, Ghassemi’s recent research has shown that explanation methods can perpetuate biases and lead to worse outcomes for people from disadvantaged groups. Another pitfall of explanation methods is that it is often impossible to tell if the explanation method is correct in the first place. One would need to compare the explanations to the actual model, but since the user doesn’t know how the model works, this is circular logic, Zhou says. He and other researchers are working on improving explanation methods so they are more faithful to the actual model’s predictions, but Zhou cautions that even the best explanation should be taken with a grain of salt. “In addition, people generally perceive these models to be human-like decision makers, and we are prone to overgeneralization. We need to calm people down and hold them back to really make sure that the generalized model understanding they build from these local explanations are balanced,” he adds.
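For readers unfamiliar with the local explanations Zhou describes, feature-attribution methods are a common example; a toy sketch with the shap library and a scikit-learn model, entirely illustrative:

```python
# Toy sketch of a "local explanation": per-feature attributions for a single
# prediction, using the shap library on a small scikit-learn model.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:1])  # attributions for one prediction

# Each value is one feature's contribution to this single prediction -- the
# kind of local explanation that, as Zhou warns, users tend to overgeneralize from.
print(dict(zip(X.columns, shap_values[0])))
```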


Future-Proofing Organisations Through Transparency

Partners that trust each other perform better. Both parties should clearly understand the decisions and actions they own. Consequently, organisations cooperate with less friction and enhance accessibility to relevant information. A study in the Harvard Business Review notes that managers frequently adopt a “trust but verify” approach, evaluating potential partner behaviours during negotiations to determine whether they are open and honest. As one manager in the study advised, “To see if [the] person is forthcoming, ask a question you know the answer to”. Transparent companies are viewed as ethical because their customers believe they have nothing to hide. The new era of the business-to-business model demands transparency. Companies want to know that what they do matters and to trace a project back to their organisation’s vision. In a modern world where sustainability is not just a buzzword, clients want to know that partnerships are built with brands that support their morals. Unsatisfied customers disengage from a company to find one that works with them to achieve a greater outcome and takes accountability for its actions.



Quote for the day:

"People will not change their minds but they will make new decisions based upon new information." -- Orrin Woodward

Daily Tech Digest - June 21, 2022

Effective Software Testing – A Developer’s Guide

When there are decisions depending on multiple conditions (i.e. complex if-statements), it is possible to get decent bug detection without having to test all possible combinations of conditions. Modified condition/decision coverage (MC/DC) exercises each condition so that it, independently of all the other conditions, affects the outcome of the entire decision. In other words, every possible condition of each parameter must influence the outcome at least once. The author does a good job of showing how this is done with an example. So given that you can check the code coverage, you must decide how rigorous you want to be when covering decision points, and create test cases for that. The concept of boundary points is useful here. For a loop, it is reasonable to at least test when it executes zero, one and many times. It can seem like it should be enough to just do structural testing, and not bother with specification-based testing, since structural testing makes sure all the code is covered. However, this is not true. Analyzing the requirements can lead to more test cases than simply checking coverage. For example, if results are added to a list, a test case adding one element will cover all the code.
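A short sketch of what MC/DC-style cases plus a boundary point look like for a two-condition decision; the shipping rule and its threshold are invented for illustration:

```python
# Sketch: MC/DC-style test cases for a two-condition decision.
# The shipping rule and its threshold are invented for illustration.
def free_shipping(total: float, is_member: bool) -> bool:
    return total >= 50 or is_member

# Each condition must independently flip the outcome at least once:
assert free_shipping(60, False) is True    # total alone makes it true (vs. next case)
assert free_shipping(40, False) is False   # baseline: both conditions false
assert free_shipping(40, True) is True     # is_member alone flips the outcome
# Boundary point of the threshold: exactly 50 qualifies.
assert free_shipping(50, False) is True
```

Three cases suffice here, rather than all four combinations, because each one shows a single condition independently changing the decision's outcome.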


Inconsistent thoughts on database consistency

While linearizability is about a single piece of data, serializability is about multiple pieces of data. More specifically, serializability is about how to treat concurrent transactions on the same underlying pieces of data. The “safest” way to handle this is to line up transactions in the order they arrived and execute them serially, making sure that one finishes before the next one starts. In reality, this is quite slow, so we often relax this by executing multiple transactions concurrently. However, there are different levels of safety around this concurrent execution, as we’ll discuss below. Consistency models are super interesting, and the Jepsen breakdown is enlightening. If I had to quibble, it’s that I still don’t quite understand the interplay between the two poles of consistency models. Can I choose a lower level of linearizability along with the highest level of serializability? Or does the existence of any level lower than linearizable mean that I’m out of the serializability game altogether? If you understand this, hit me up! Or better yet, write up a better explanation than I ever could :). If you do, let me know so I can link it here.
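To make the serializability dial concrete: most relational databases let you request an isolation level per session or transaction. A sketch with psycopg2 against PostgreSQL, where the connection string and table are placeholders:

```python
# Sketch: requesting serializable isolation for a transaction in PostgreSQL
# via psycopg2. The connection string and accounts table are placeholders.
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("dbname=shop user=app")
conn.set_session(isolation_level=psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)

try:
    with conn, conn.cursor() as cur:
        cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,))
        cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = %s", (2,))
except errors.SerializationFailure:
    # Under serializable isolation, a conflicting concurrent transaction is
    # aborted rather than interleaved unsafely; the client is expected to retry.
    conn.rollback()
```

The retry-on-abort behavior is the price of the stronger guarantee; weaker levels like read committed avoid the aborts but allow more anomalies.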


AI and How It’s Helping Banks to Lower Costs

Using AI helps banks lower the costs of predicting future trends. Instead of hiring financial analysts to analyze data, AI is used to organize and present data that the banks can use. They can get real-time data to analyze behaviors, predict future trends, and understand outcomes. With this, banks can get more data that, in turn, helps them make better predictions. ... Another advantage of using AI in the banking industry is that it reduces human errors. By reducing errors, banks prevent loss of revenue caused by these errors. Moreover, human errors can lead to financial data breaches. When this happens, critical data may get exposed to criminals, who can use the stolen data to impersonate clients in fraudulent activities. Especially with a high volume of work, employees cannot entirely avoid errors. With the help of AI, banks can reduce a variety of errors. ... AI helps banks save money by detecting fraudulent payments. Without AI, banks may lose millions because of criminal activities. But thanks to AI, banks can prevent such losses, as the technology can analyze more than one channel of data to detect fraud.
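As a toy illustration of the kind of anomaly flagging involved, here is a sketch using scikit-learn's IsolationForest on synthetic transactions; the features, data, and contamination rate are invented, and real systems fuse many channels of data:

```python
# Toy sketch of ML-based fraud flagging with scikit-learn's IsolationForest.
# Features and data are synthetic; real systems combine many data channels.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Columns: transaction amount, hour of day. Mostly routine, plus a few outliers.
normal = np.column_stack([rng.normal(50, 15, 500), rng.normal(14, 3, 500)])
odd = np.array([[5000.0, 3.0], [9000.0, 4.0]])  # large amounts at odd hours
transactions = np.vstack([normal, odd])

model = IsolationForest(contamination=0.01, random_state=0).fit(transactions)
flags = model.predict(transactions)  # -1 = flagged as anomalous, 1 = normal
print("flagged:", transactions[flags == -1])
```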


Is NoOps the End of DevOps?

NoOps is not a one-size-fits-all solution. You know that it’s limited to apps that fit into existing serverless and PaaS solutions. Since some enterprises still run on monolithic legacy apps (requiring total rewrites or massive updates to work in a PaaS environment), you’d still need someone to take care of operations even if there’s a single legacy system left behind. In this sense, NoOps is still a long way from handling long-running apps that run specialized processes or production environments with demanding applications. Conversely, with DevOps, operations work happens before code goes to production: releases include monitoring, testing, bug fixes, security and policy checks on every commit, and so on. You must have everyone on the team (including key stakeholders) involved from the beginning to enable fast feedback and ensure automated controls and tasks are effective and correct. Continuous learning and improvement (a pillar of DevOps teams) shouldn’t only happen when things go wrong; instead, members must work collaboratively to problem-solve and improve systems and processes.


How IT Can Deliver on the Promise of Cloud

While many newcomers to the cloud assume that hyperscalers will handle most of the security, the truth is they don’t. Public cloud providers such as AWS, Google, and Microsoft Azure publish shared responsibility models that push security of the data, platform, applications, operating system, network and firewall configuration, and server-side encryption to the customer. That’s a lot you need to oversee, with high levels of risk and exposure should things go wrong. Have you set up ransomware protection? Monitored your network environment for ongoing threats? Arranged for security between your workloads and your client environment? Secured sets of connections for remote client access or remote desktop environments? Maintained audit control of open source applications running in your cloud-native or containerized workloads? These are just some of the security challenges IT faces. Security of the cloud itself – the infrastructure and storage – falls to the service providers. But your IT staff must handle just about everything else.
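As one concrete example of the customer's share of that responsibility, a sketch that enables default server-side encryption on an S3 bucket with boto3; the bucket name is a placeholder:

```python
# Sketch: a customer-side task under the shared responsibility model --
# enabling default server-side encryption on an S3 bucket with boto3.
# The bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="my-company-data",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Verify the configuration took effect.
resp = s3.get_bucket_encryption(Bucket="my-company-data")
print(resp["ServerSideEncryptionConfiguration"]["Rules"])
```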


Distributed Caching on Cloud

Caching is a technique to store the state of data outside of the main storage and keep it in high-speed memory to improve performance. In a microservices environment, all apps are deployed with their multiple instances across various servers/containers on the hybrid cloud. A single caching source is needed in a multicluster Kubernetes environment in the cloud to persist data centrally and replicate it on its own caching cluster. It serves as a single point of storage to cache data in a distributed environment. ... Distributed caching is now a de facto requirement for distributed microservices apps in a distributed deployment environment on hybrid cloud. It addresses concerns in important use cases like maintaining user sessions when cookies are disabled on the web browser, improving API query read performance, avoiding operational cost and database hits for the same type of requests, managing secret tokens for authentication and authorization, etc. A distributed cache syncs data on hybrid clouds automatically without any manual operation and always gives the latest data.
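A minimal cache-aside sketch against a shared Redis instance using the redis-py client; the host, key scheme, TTL, and fetch function are illustrative:

```python
# Minimal cache-aside sketch against a shared Redis instance (redis-py).
# Host, key scheme, TTL, and the fetch function are illustrative.
import json

import redis

cache = redis.Redis(host="cache.internal", port=6379, decode_responses=True)

def fetch_user_from_db(user_id: int) -> dict:
    # Placeholder for the real (slow) database query.
    return {"id": user_id, "name": "Ada"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)          # served from the distributed cache
    user = fetch_user_from_db(user_id)  # cache miss: hit the database once
    cache.setex(key, 300, json.dumps(user))  # expire after 5 minutes
    return user
```

Because every app instance across the clusters points at the same cache, repeated reads for the same key skip the database regardless of which instance serves the request.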


Bridging The Gap Between Open Source Database & Database Business

It is relatively easy to get a group of people together to create a new database management system or new data store. We know this because, over the past five decades of computing, the rate of proliferation of tools to provide structure to data has increased, and seemingly at an increasing rate at that. Thanks in no small part to the innovation by the hyperscalers and cloud builders, as well as academics who just plain like mucking around in the guts of a database to prove a point. But it is another thing entirely to take an open source database or data store project and turn it into a business that can provide enterprise-grade fit and finish and support a much wider variety of use cases and customer types and sizes. This is hard work, and it takes a lot of people, focus, money, and luck. This is the task that Dipti Borkar, Steven Mih, and David Simmen took on when they launched Ahana two years ago to commercialize the PrestoDB variant of the Presto distributed SQL engine created by Facebook, and not coincidentally, it is a similar task that the original creators of Presto have taken on with the PrestoSQL variant, now called Trino, which is commercialized by their company, Starburst.


Data gravity: What is it and how to manage it

Examples of data gravity include applications and datasets moving to be closer to a central data store, which could be on-premises or co-located. This makes best use of existing bandwidth and reduces latency. But it also begins to limit flexibility, and can make it harder to scale to deal with new datasets or adopt new applications. Data gravity occurs in the cloud, too. As cloud data stores increase in size, analytics and other applications move towards them. This takes advantage of the cloud’s ability to scale quickly, and minimises performance problems. But it perpetuates the data gravity issue. Cloud storage egress fees are often high, and the more data an organisation stores, the more expensive it is to move, to the point where it can be uneconomical to move between platforms. McCrory refers to this as “artificial” data gravity, caused by cloud services’ financial models rather than by technology. Forrester points out that new sources and applications, including machine learning/artificial intelligence (AI), edge devices and the internet of things (IoT), risk creating their own data gravity, especially if organisations fail to plan for data growth.


CIOs Must Streamline IT to Focus on Agility

“Streamlining IT for agility is critical to business, and there’s not only external pressure to do so, but also internal pressure,” says Stanley Huang, co-founder and CTO at Moxo. “This is because streamlining IT plays a strategic role in the overall business operations, from C-level executives to every employee's daily efforts.” He says that streamlining business processes is the most efficient way to reflect business status and provide driving power for each department's planning. From an external standpoint, there is pressure to streamline IT because it also impacts the customer experience. “A connected and fully aligned cross-team interface is essential to serve the customer and make a consistent end user experience,” he adds. For business opportunities pertaining to task allocation and tracking, streamlining IT can help align internal departments into one overall business picture and enable employees to perform their jobs at a higher level. “When the IT system owns the source of data for business opportunities and every team's involvement, cross-team alignment can be streamlined and made without back-and-forth communications,” Huang says.


Open Source Software Security Begins to Mature

Despite the importance of identifying vulnerabilities in dependencies, most security-mature companies — those with OSS security policies — rely on industry vulnerability advisories (60%), automated monitoring of packages for bugs (60%), and notifications from package maintainers (49%), according to the survey. Automated monitoring represents the most significant gap between security-mature firms and those firms without a policy, with only 38% of companies that do not have a policy using some sort of automated monitoring, compared with the 60% of mature firms. Companies should add an OSS security policy if they don't have one, as a way to harden their development security, says Snyk's Jarvis. Even a lightweight policy is a good start, he says. "There is a correlation between having a policy and the sentiment of stating that development is somewhat secure," he says. "We think having a policy in place is a reasonable starting point for security maturity, as it indicates the organization is aware of the potential issues and has started that journey."
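As a sketch of the automated monitoring the survey describes, the public OSV vulnerability database (osv.dev) can be queried for known flaws in a pinned dependency; the package name and version here are arbitrary examples:

```python
# Sketch of automated dependency monitoring: query the public OSV database
# (https://osv.dev) for known vulnerabilities in one pinned package.
# The package name and version are arbitrary examples.
import requests

resp = requests.post(
    "https://api.osv.dev/v1/query",
    json={"package": {"name": "jinja2", "ecosystem": "PyPI"}, "version": "2.4.1"},
    timeout=10,
)
resp.raise_for_status()
for vuln in resp.json().get("vulns", []):
    print(vuln["id"], "-", vuln.get("summary", "no summary"))
```

Running a check like this against every entry in a lockfile on each build is the kind of automated monitoring that separates the security-mature firms in the survey from the rest.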



Quote for the day:

"No great manager or leader ever fell from heaven, its learned not inherited." -- Tom Northup