Defining the Possible Approaches to Optimum Metadata Management
Metadata has been the focus of a lot of recent work, both in academia and
industry. As more and more electronic data is generated, stored, and managed,
metadata generation, storage, and management promise to improve the utilization
of that data. Data and metadata are intrinsically linked, so metadata can be
found in virtually any application area and can take numerous forms depending
on its application context. However, metadata is often employed in scientific
computations only for the initial data selection; at most, metadata about query
results is recovered after the query has been successfully executed and
correlated. As a result, a vast amount of information that could be useful for
analyzing query results goes unused throughout the query processing procedure.
Thus, the metadata needs "refinements". There are two distinct
definitions of "refinements". The first is the addition of qualifiers that
clarify or narrow an element's meaning. While such qualifiers may be useful or
even necessary for a particular metadata application, for the sake
of interoperability, the values of such elements can be regarded as subtypes of
a broader element.
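To make that interoperability point concrete, here is a minimal sketch, assuming hypothetical element names, of how a qualified element can be "dumbed down" to its broader parent so that a consumer which only understands the base elements can still read the record:

```python
# A minimal sketch of the rule above: a qualified (refined) element can
# always be read as its broader parent element. Element names are
# illustrative, not taken from any specific standard.
QUALIFIER_PARENT = {
    "date.created": "date",        # "created" refines the broader "date"
    "date.modified": "date",
    "coverage.spatial": "coverage",
}

def dumb_down(record: dict) -> dict:
    """Strip qualifiers so a consumer that only knows the broad
    elements can still interpret the record."""
    broad = {}
    for element, value in record.items():
        broad.setdefault(QUALIFIER_PARENT.get(element, element), value)
    return broad

qualified = {"date.created": "2022-03-20", "title": "Sensor run 42"}
print(dumb_down(qualified))  # {'date': '2022-03-20', 'title': 'Sensor run 42'}
```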
Data Contracts — ensure robustness in your data mesh architecture
In cases where many applications are coupled to each other, a cascading effect
sometimes can be seen. Even a small change to a single application can lead to
the adjustment of many applications at the same time. Therefore, many architects
and software engineers avoid building coupled architectures. Data contracts are
positioned to be the solution to this technical problem. A data contract
guarantees interface compatibility and includes the terms of service and service
level agreement (SLA). The terms of service describe how the data can be used,
for example, only for development, testing, or production. The SLA typically
also describes the quality of data delivery and the interface, and might
include uptime, error rates, and availability, as well as deprecation, a
roadmap, and version numbers. Data contracts are in many cases part of a metadata-driven
ingestion framework. They’re stored as metadata records, for example, in a
centrally managed metastore, and play an important role for data pipeline
execution, validation of data types, schemas, interoperability standards,
protocol versions, defaulting rules on missing data, and so on. Therefore, data
contracts include a lot of technical metadata.
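To illustrate that last point, here is a sketch of what a data contract stored as a metadata record might look like, together with a validation step that applies the contract's defaulting rules for missing data. All field names and values below are hypothetical, not drawn from any particular framework:

```python
# Illustrative data contract stored as a metadata record; every field
# name and value here is hypothetical.
data_contract = {
    "dataset": "customer_orders",
    "version": "2.1.0",
    "schema": {
        "order_id": {"type": "string", "required": True},
        "amount":   {"type": "decimal", "required": True},
        "coupon":   {"type": "string", "required": False, "default": None},
    },
    "terms_of_service": {"allowed_usage": ["development", "testing"]},
    "sla": {"uptime": "99.9%", "max_error_rate": 0.001,
            "deprecation_notice_days": 90},
}

def validate(record: dict, contract: dict) -> dict:
    """Enforce the contract's schema: reject missing required fields
    and fill optional ones with their declared defaults."""
    validated = dict(record)
    for field, spec in contract["schema"].items():
        if field not in validated:
            if spec["required"]:
                raise ValueError(f"missing required field: {field}")
            validated[field] = spec.get("default")
    return validated

order = validate({"order_id": "A-17", "amount": "9.99"}, data_contract)
print(order)  # {'order_id': 'A-17', 'amount': '9.99', 'coupon': None}
```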
What’s behind the cloud talent crisis — and how to fix it
The problem is that there aren’t enough experienced, trained engineers
to meet that need. And even folks who have been in the thick of cloud technology
from the start are finding themselves rushing to stay abreast of the evolution
of cloud technology, ensuring that they’re up on the newest skills and the
latest changes. Compounding the issue, it’s an employee’s market, where job
seekers are spoiled for choice by an endless number of opportunities. Companies
are finding themselves in fierce competition, fishing during a drought in a pool
that keeps shrinking. “It’s going to require so many more experienced, trained
engineers than we currently have,” said Cloudbusting host Jez Ward during the
Cloud Trends 2022 thought leadership podcast series at re:Invent. “We’re taking
it exceptionally seriously, and we probably have it as our number one risk that
we’re managing. As we talk to some of our partner organizations, they see this
in the same way.” Cloudbusting podcast hosts Jez Ward and Dave Chapman were
joined by Tara Tapper, chief people officer at Cloudreach, and Holly Norman,
Cloudreach’s head of AWS marketing, to talk about what’s behind the tech crisis,
and how companies can meet this challenge.
Meta AI’s Sparse All-MLP Model Doubles Training Efficiency Compared to Transformers
Transformer architectures have established the state-of-the-art on natural
language processing (NLP) and many computer vision tasks, and recent research
has shown that All-MLP (multi-layer perceptron) architectures also have strong
potential in these areas. However, although newly proposed MLP models such as
gMLP (Liu et al., 2021a) can match transformers in language modeling
perplexity, they still lag in downstream performance. In the new paper
Efficient Language Modeling with Sparse all-MLP, a research team from Meta AI
and the State University of New York at Buffalo extends the gMLP model with
sparsely activated conditional computation using mixture-of-experts (MoE)
techniques. Their resulting sMLP sparsely-activated all-MLP architecture
boosts the performance of all-MLPs in large-scale NLP pretraining, achieving
training efficiency improvements of up to 2x compared to transformer-based
mixture-of-experts (MoE) architectures, transformers, and gMLP.
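To give a feel for the mixture-of-experts idea, here is a toy PyTorch sketch of a sparsely activated feed-forward block with top-1 routing. It illustrates the general MoE mechanism only and is not the paper's actual sMLP design:

```python
import torch
import torch.nn as nn

class Top1MoEFFN(nn.Module):
    """Toy sparsely activated feed-forward block: a learned gate routes
    each token to exactly one expert MLP (top-1 routing)."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                         # x: [tokens, d_model]
        expert_idx = self.gate(x).argmax(dim=-1)  # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():                        # only selected experts run
                out[mask] = expert(x[mask])
        return out

tokens = torch.randn(16, 64)
moe = Top1MoEFFN(d_model=64, d_hidden=256, num_experts=4)
print(moe(tokens).shape)  # torch.Size([16, 64])
```

Because each token activates only one expert, compute per token stays roughly constant as experts (and thus parameters) are added, which is where sparse MoE architectures get their training-efficiency gains.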
How to build a better CIO-CMO relationship
The CIO should be regularly and actively engaging the CMO for assistance in
"telling the story" of new technology investments. For example, they should
share how the new HR system not only delivered a strong ROI and a low TCO, but made
employees' lives easier and better. Technology vendors are well aware of the
value of having their technology leaders "tell the story." The deputy CIO of
Zoom spends a considerable amount of time evangelizing about the company and
its products -- and is highly effective at it. Spotify has a well-regarded
series of videos about how its DevOps culture helps it succeed. CIOs at
non-technology companies -- or more accurately, at companies that produce
products other than hardware, software and cloud services -- would do well to
take a page from the technology CIO's playbook. CMOs and their teams can
assist CIOs and their teams with developing a campaign to market a new
technology implementation. They can ensure the campaign captures the
attention of the desired constituencies, up to and including
developing success metrics, so CIOs are able to assess how effective they're
being.
Matter smart home standard delayed until fall 2022
The CSA is also allowing more time for the build and verification of a
larger-than-expected number of platforms (OSes and chipsets), which it hopes will see
Matter launch with a healthy slate of compatible Matter devices, apps, and
ecosystems. This need arose over the last year based on activity seen on the
project’s GitHub repository. More than 16 platforms, including OS platforms
like Linux, Darwin, Android, Tizen, and Zephyr, and chipset platforms from
Infineon, Silicon Labs, TI, NXP, Nordic, Espressif Systems and Synaptics will
now support Matter. “We had thought there would be four or five platforms, but
it’s now more than 16,” says Mindala-Freeman. “The volume at which component
and platform providers have gravitated to Matter has been tremendous.” The
knock-on effect of these SDK changes is that the CSA needs to give its 50
member companies that are currently developing Matter-capable products another
chance to test those devices before they go through the Matter certification
process. The CSA also shared details of that initial certification process
with The Verge. Following a specification validation event (SVE) this
summer…
PyTorch Geometric vs Deep Graph Library
Arguably the most exciting accomplishment of deep learning with graphs so far
has been the development of AlphaFold and AlphaFold2 by DeepMind, a project
that has made major strides in solving the protein structure prediction
problem, a longstanding grand challenge of structural biology. With myriad
important applications in drug discovery, social networks, basic biology, and
many other areas, a number of open-source libraries have been developed for
working with graph neural networks. Many of these are mature enough to use in
production or research, so how can you go about choosing which library to use
when embarking on a new project? Various factors can contribute to the choice
of GNN library for a given project. Not least is compatibility with
your team’s existing expertise: if you are primarily a PyTorch shop, it
would make sense to give special consideration to PyTorch Geometric, although
you might also be interested in using the Deep Graph Library with PyTorch as
the backend (DGL can also use TensorFlow as a backend).
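For a flavor of what that choice looks like in code, here is a minimal two-layer GCN on a toy graph using PyTorch Geometric; an equivalent DGL model would use dgl.nn.GraphConv in place of GCNConv. This is a generic sketch, not taken from either library's documentation:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    """Two-layer graph convolutional network for node classification."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

# Toy graph: 3 nodes, 2 undirected edges (each listed in both directions)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.randn(3, 8)                      # 8 features per node
model = GCN(in_dim=8, hidden_dim=16, num_classes=2)
out = model(x, edge_index)                 # [3, 2] per-node class logits
```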
The No-Code Approach to Deploying Deep Learning Models on Intel® Hardware
Deep learning has two broad phases: training and inference. During training,
computers build artificial neural network models by analyzing thousands of
inputs—images, sentences, sounds—and guessing at their meaning. A feedback
loop tells the machine if the guesses are right or wrong. This process repeats
thousands of times, creating a multilayered network of algorithms. Once the
network reaches its target accuracy, it can be frozen and exported as a
trained model. During deep learning inference, a device compares incoming data
with a trained model and infers what the data means. For example, a smart
camera compares video frames against a deep learning model for object
detection. It then infers that one shape is a cat, another is a dog, a third
is a car, and so on. During inference, the device isn’t learning; it’s
recognizing and interpreting the data it receives. There are many popular
frameworks—like TensorFlow, PyTorch, MXNet, and PaddlePaddle—and a multitude of
deep learning topologies and trained models. Each framework and model has its
own syntax, layers, and algorithms.
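The train-then-freeze-then-infer flow described above can be sketched in a few lines of PyTorch (one of the frameworks the article lists); the model, the synthetic data, and the file name are placeholders:

```python
import torch
import torch.nn as nn

# Training phase: fit a tiny classifier with a feedback loop, then export it.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):                       # repeated guess-and-correct loop
    inputs = torch.randn(32, 4)            # synthetic stand-in data
    labels = torch.randint(0, 3, (32,))
    optimizer.zero_grad()
    loss_fn(model(inputs), labels).backward()
    optimizer.step()
torch.save(model.state_dict(), "trained_model.pt")   # the "frozen" export

# Inference phase: the deployed device only loads and applies the model.
deployed = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
deployed.load_state_dict(torch.load("trained_model.pt"))
deployed.eval()
with torch.no_grad():                      # no learning happens here
    prediction = deployed(torch.randn(1, 4)).argmax(dim=-1)
```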
Operational resilience is much more than cyber security
To a Chief Information Officer, for example, an IT department can’t be
considered operationally resilient without the accurate, actionable data
necessary to keep essential business services running. To a Chief Financial
Officer, meanwhile, resilience involves maintaining strong financial reporting
systems in order to maintain vigilance over spend and savings. This list could
run on and on, but while resilience manifests itself differently to different
departments, no aspect of an enterprise organisation exists in a vacuum. True
resilience involves understanding connections between different aspects of a
business – and the dependencies between the various facets of its
infrastructure. To understand the connections and dependencies between
business services, customer journeys, business applications, and cloud /
legacy infrastructure, and so on, large organisations need to invest in tools
like configuration management databases (CMDBs). With the visibility and
knowledge that a CMDB provides, organisations can strengthen their resilience
by understanding and anticipating how disruptions to one part of their
infrastructure will impact the rest.
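The kind of impact analysis a CMDB enables can be illustrated with a toy dependency graph and a breadth-first walk; the configuration items below are invented for the example:

```python
from collections import deque

# Toy CMDB-style dependency map: each configuration item lists the
# items that depend on it. All names are made up.
DEPENDENTS = {
    "legacy-db":   ["billing-app"],
    "cloud-vm-1":  ["billing-app", "orders-api"],
    "billing-app": ["invoice-service"],
    "orders-api":  ["checkout-journey"],
}

def impacted_by(failed_item: str) -> set:
    """Walk the graph to find every item a disruption would reach."""
    impacted, queue = set(), deque([failed_item])
    while queue:
        item = queue.popleft()
        for dependent in DEPENDENTS.get(item, []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted

print(impacted_by("cloud-vm-1"))
# {'billing-app', 'orders-api', 'invoice-service', 'checkout-journey'}
```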
Deploying AI With an Event-Driven Platform
There are a number of crucial features required in an event-driven AI platform
to provide real-time access to models for all users. The platform needs to
offer self-service analytics to non-developers and citizen data scientists.
These users must be able to access all models and any data required for
training, context, or lookup. The platform also needs to support as many
different tools, technologies, notebooks, and systems as possible because
users need to access everything through as many channels and options as possible.
Further, almost all end users will require access to other types of data
(e.g., customer addresses, sales data, email addresses) to augment the results
of these AI model executions. Therefore, the platform should be able to join
our model classification data with live streams from different event-oriented
data sources, such as Twitter and weather feeds. This is why my first choice
for building an AI platform is to utilize a streaming platform such as Apache
Pulsar. Pulsar is an open-source distributed streaming and
publish/subscribe messaging platform that allows your machine learning
applications to interact with a multitude of application types.
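The consume-classify-publish loop described here can be sketched with the pulsar-client Python library; the broker URL, topic names, and the classify() stand-in are placeholders rather than a real deployment:

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")     # placeholder broker
consumer = client.subscribe("raw-events", subscription_name="ai-enricher")
producer = client.create_producer("classified-events")

def classify(payload: bytes) -> str:
    return "positive"   # stand-in for a real model execution

# Consume each event, run the model on it, publish the enriched result.
while True:
    msg = consumer.receive()
    try:
        label = classify(msg.data())
        producer.send(msg.data() + b"|" + label.encode())
        consumer.acknowledge(msg)
    except Exception:
        consumer.negative_acknowledge(msg)   # redeliver on failure
```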
Quote for the day:
"As a leader, you set the tone for your entire team. If you have a positive attitude, your team will achieve much more." -- Colin Powell