Daily Tech Digest - August 06, 2021

The Role of Business Architecture in Defining Data Architecture

Data architects can systematically examine the information concepts in the information map and define corresponding data entities for each of those concepts. There is no assumption that the data model and the information map will be identical. Data architects will apply data modeling techniques to formalize data entities as appropriate. The information map’s role is rather to provide business ecosystem transparency, delivering a business-driven perspective to ensure that data models and related deployments enable and do not hinder the organization they are meant to benefit. As data entities are defined, data architects can leverage information concept relationships to establish corresponding relationships among data entities in the data models. All information maps have a set of relationships that data architects may interrogate to derive their entity relationships. The next step is to attribute the data entities. Figure 5 depicts data attribute derivation using child capabilities defined under Agreement Management.


HTTP/2 Implementation Errors Exposing Websites to Serious Risks

To show how such an attack would work, Kettle pointed to an exploit he executed against Netflix where front-end servers performed HTTP downgrading without verifying request lengths. The vulnerability allowed Kettle to develop an exploit that triggered Netflix's back-end to redirect requests from Netflix's front-end to his own server. That allowed Kettle to potentially execute malicious code to compromise Netflix accounts and steal user passwords, credit card information, and other data. Netflix patched the vulnerability and awarded Kettle its maximum bounty of $20,000 for reporting it to the company. In another instance, Kettle discovered that Amazon's Application Load Balancer had failed to implement part of the HTTP/2 specification regarding certain message-header information that HTTP/1.1 uses to derive request lengths. With this vulnerability, Kettle was able to show how an attacker could exploit it to redirect requests from front-end servers to an attacker-controlled server.
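The class of bug Kettle describes can be sketched with a toy parser. The payload below is illustrative (hostnames and paths are invented, and real attacks exploit the disagreement between front-end and back-end length handling, not this simplified parser): an attacker declares a Content-Length shorter than the data actually sent, and a front end that downgrades to HTTP/1.1 without verifying lengths forwards a hidden second request along with the first.

```python
# Toy illustration of an HTTP/2 -> HTTP/1.1 downgrade desync.
# A front end that forwards a request without verifying its length
# lets an attacker smuggle a second request inside the first body.

def backend_parse(raw: bytes):
    """Minimal HTTP/1.1 parser: returns the request lines the back end sees."""
    requests = []
    while raw:
        head, _, rest = raw.partition(b"\r\n\r\n")
        headers = dict(
            line.split(b": ", 1)
            for line in head.split(b"\r\n")[1:]
        )
        length = int(headers.get(b"Content-Length", b"0"))
        requests.append(head.split(b"\r\n")[0])  # keep the request line
        raw = rest[length:]                      # skip the declared body
    return requests

# The attacker declares Content-Length: 4 but sends extra bytes; an
# unchecked downgrade forwards everything verbatim to the back end.
smuggled = (
    b"POST /search HTTP/1.1\r\n"
    b"Host: victim.example\r\n"
    b"Content-Length: 4\r\n"
    b"\r\n"
    b"ab\r\n"                      # the 4 declared body bytes
    b"GET /admin HTTP/1.1\r\n"     # hidden second request
    b"Host: attacker.example\r\n"
    b"\r\n"
)

print(backend_parse(smuggled))
# The back end sees TWO requests, the second one attacker-controlled.
```

The back end dutifully parses the trailing bytes as a fresh request, which is how responses or requests end up redirected to attacker infrastructure.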


How to prepare your Windows network for a ransomware attack

Too many of us are still reliant on older server platforms that make it harder to roll out security solutions through Active Directory. We may have Server 2016 and Server 2019 servers in our network, but we’re not taking advantage of the security features of that domain functional level. Too many of us are still on older forest and domain functional levels because we have older servers or applications, and a lack of testing keeps us from rolling out these newer features. Or we have vendors that won’t certify newer platforms and Active Directory features. Raising your forest level to 2016 provides many features that better protect the network, such as privileged access management and automatic rolling of NTLM secrets on a user account. If your functional level is still 2008 R2, you don’t have a UI for the Active Directory recycle bin, a feature that makes recovery easier. Staying at the 2008 R2 functional level also keeps you from closing an old security hole: service account passwords that never change.


Can the public cloud become confidential?

The Confidential Cloud is a secure confidential computing environment formed over one or more public cloud providers. Applications, data, and workloads within a Confidential Cloud are protected by a combination of hardware-grade encryption, memory isolation, and other services in the underlying host. As with micro-segmentation and host virtualization, resources within a Confidential Cloud are isolated from all processes and users in a default zero-trust posture. But the Confidential Cloud does more than isolate network communications: it isolates the entire IT environment used by a workload—including compute, storage, and networking. That enables support for virtually any application. Because Confidential Cloud protection is inextricably part of the data, the protection extends wherever the data goes. Legacy enterprise perimeters are defined by physical appliances, but a Confidential Cloud’s perimeter is established by an inextricable combination of hardware isolation, encryption, and explicit least-privileged access policy.


Why the future of service is hybrid

For many businesses though, this has led to employment issues, especially as the workforce ages. Knowledge loss is an increasingly common problem. According to the Service Council, 70% of service organisations say they would be burdened by the knowledge loss of a retiring workforce in the next five to 10 years, while 50% claim they are currently facing a shortage of resources to adequately meet service demand. Automation is great, but it will only go so far to help. Interestingly, the TSIA recently found that half of all field services organisations don’t have a formal career path in place for their field service engineers. This, in my view, is a huge point of unnecessary commercial risk. These organisations are not doing enough to prepare younger service techs for a mixed reality future – one where they will have to work more closely with digital technology and machines than any previous generation. It won’t happen by accident. There is certainly a need for an integral ‘system of record’ that captures accurate data about equipment ‘as maintained’. 


How to Recognise and Reduce HumanDebt

HumanDebt™ is the equivalent of Technical Debt, but for people. All of the initiatives, the projects, the intentions we (the organisation) had to do better by our employees, but abandoned halfway. All of the missed opportunities to make their lives and their work easier and more joyful. All of the empty talk on equality, respect, lack of blame, courage and trust. All of the missing focus on empowered teams and servant leadership. All of the missing attention and resources for building better team dynamics. All of the toxic culture created by these. That’s Human Debt. ... It is tempting to believe that this type of debt is the organisation’s problem only. Even more tempting is to believe that it only happens at that macro, cultural level and that that is the only level where it can be fixed. Both are fallacies though. It’s important that the organisation has a degree of recognition, which enables it to offer "organisational permission" and help, as there really is only one solid thing to start with: empower teams to work on their own dynamics and improve their happiness by giving them the resources they need to do so.


How to deal with a toxic teammate

Toxic behavior may have occurred less frequently or been less noticeable during the pandemic. “There has been more stress but also a lot of grace-giving and cutting-of-slack to account for whatever people have going on in their personal and professional lives,” Cuthbert says. “The water cooler is gone and hasn’t been replaced and there is less of a forum for those who are negative or unhappy.” But toxic behavior can take numerous forms. “Motivating through fear and unattainable goals and timelines, obfuscating expectations and scope of job descriptions or projects, not clearly identifying the North Star and who is doing what, being inconsistent in holding people accountable, dominating, yelling, talking over others, and interrupting are all signs of toxic behavior,” Mattheis says. “Working remotely has not changed that reality. What it has done is adjust how it looks and feels as well as made it more difficult to speak to it and hold people accountable.” Like dealing with a toxic boss, responding to a peer’s unhealthy dynamics can be tricky, but there are constructive approaches for using emotional intelligence to address the issues and mitigate their impact on your own productivity and well-being.


Chip shortage has networking vendors scrambling

The semiconductor industry is predicting a possible recovery in 2023. But who knows what demand will be at that time, Sadana said. Part of the problem is that current semiconductor foundry capacity is not adequate to meet the recent surge in global demand, wrote Baron Fung, industry analyst at Dell'Oro Group, in a recent blog. “The cost of servers and other data center equipment is projected to rise sharply in the near term partly due to the global semiconductor shortages,” Fung stated. “An increase of server average selling prices could approach the double-digit level that was observed in 2018, which was another period of tight supply and high demand. However, in the longer term, we anticipate that supply and demand dynamics could reach equilibrium and that technology transitions could drive market growth.” ... “We continue to proactively manage the supply chain, and our strategic relationship with Broadcom is helping us in this regard. Importantly, we have secured vendor commitments that will allow us to accelerate product delivery and bring down backlog as of Q2 and beyond,” Thomas stated.


Why businesses should embrace cloud-native development

Containers provide the infrastructure to realise a microservices architecture in practice. They provide individual standalone components for an app that can be independently replaced, changed, or removed without jeopardising the rest of your infrastructure. This is essential to realising the cloud-native vision because the completeness of a container package, and its agnosticism to its environment, ensures the portability needed for cloud-native apps – containerised apps can be deployed in whatever cloud environment you operate in, whether it be public, private, or hybrid. The use of containers in the cloud-native model thereby brings speed and scalability that cannot be achieved through traditional systems architecture, and addresses a fundamental business need: for changes in software to be applied quickly and seamlessly so that tasks can be completed efficiently and inexpensively. For all these reasons, containers are one of the biggest trends in enterprise software development.


CISA's Easterly Unveils Joint Cyber Defense Collaborative

"To some extent, some of these activities are already going on across the federal government, but they're running largely in stovepipes. So the idea is that we bring together our partners in the government and our private sector partners to really mature this planning capability," Easterly said. Besides CISA and its parent organization, the Department of Homeland Security, other federal government participants will include the U.S. National Security Agency, U.S. Cyber Command and the FBI. Easterly announced nine companies have signed up to participate,: CrowdStrike, Palo Alto Networks, FireEye, Amazon Web Services, Google, Microsoft, AT&T, Verizon and Lumen. The JCDC will build on the relationships CISA has with Information Sharing and Analysis Centers, or ISACs, which represent various industries. The concept for the new initiative came from the Cyberspace Solarium Commission, which published its report in 2020 (see: Senate Approves Chris Inglis as National Cyber Director).



Quote for the day:

"Added pressure and responsibility should not change one's leadership style, it should merely expose that which already exists." -- Mark W. Boyer

Daily Tech Digest - August 05, 2021

Cybersecurity professionals: Positive reinforcement works wonders with users

Sai Venkataraman, CEO of SecurityAdvisor, in his Help Net Security article, The power of positive reinforcement in combating cybercriminals, said he wants management to rethink its approach and use positive reinforcement instead. "It's important to recognize that cognitive bias is part of the human brain's makeup and functionality," Venkataraman said in his introduction. "While these subconscious mental shortcuts make it difficult to change behaviors, it's not impossible." Cognitive bias is hands down the culprit. Charlotte Ruhl, in her Simple Psychology article What Is Cognitive Bias?, defined cognitive bias as "a subconscious error in thinking that leads you to misinterpret information from the world around you and affects the rationality and accuracy of decisions and judgments. Biases are unconscious and automatic processes designed to make decision-making quicker and more efficient. Cognitive biases can be caused by a number of different things, such as heuristics (mental shortcuts), social pressures and emotions."


Hackers are using CAPTCHA techniques to scam email users

Researchers found that quantity continues to beat quality in email attacks. Proofpoint found that the highest number of clicks came from a threat actor linked to the Emotet botnet. “This total reflects their effectiveness and the sheer volume of emails they sent in each campaign,” the report notes. The group, whose infrastructure was knocked out by international law enforcement earlier this year, has gone virtually dormant since. Cybersecurity researchers also say that companies shouldn’t underestimate basic cyber hygiene in combatting ransomware. Hackers are increasingly turning to email to distribute initial malware that’s used later to download ransomware rather than using email as the initial attack vector. In 2020, Proofpoint detected 48 million emails that contained malware that was used to launch ransomware. Top threats detected by Proofpoint included names like The Trick, Dridex and Qbot. Concerns over ransomware have only skyrocketed in 2021 after a series of high-profile attacks against critical industries in the United States. 


To Protect Consumer Data, Don’t Do Everything on the Cloud

Restricting private data collection and processing to the edge is not without its downsides. Companies will not have all their consumer data available to go back and re-run new types of analyses when business objectives change. However, this is the exact situation we advocate against to protect consumer privacy. Information and privacy operate in a tradeoff — that is, a unit increase in privacy requires some loss of information. By prioritizing data utility with purposeful insights, edge computing reduces the quantity of information from a “data lake” to the sufficient data necessary to make the same business decision. This emphasis on finding the most useful data over keeping heaps of raw information increases consumer privacy. The design choices that support this approach — sufficiency, aggregation, and alteration — apply to structured data, such as names, emails or number of units sold, and unstructured data, such as images, videos, audio, and text. To illustrate, let us assume the retailer in our wine-tasting example receives consumer input via video, audio, and text.
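The "sufficiency" and "aggregation" choices can be sketched concretely. In this illustrative example (the wine-tasting data and field names are invented), the edge device reduces raw per-customer events to just the counts the business decision needs, so customer identifiers never leave the device:

```python
# Sketch of edge-side aggregation: raw events captured locally are
# reduced to a sufficient statistic, and only that aggregate is
# transmitted to the cloud. Customer identity is dropped at the edge.

from collections import Counter

def edge_aggregate(raw_events):
    """raw_events: (customer_id, product, sentiment) tuples captured
    on the device. Returns per-(product, sentiment) counts only."""
    summary = Counter()
    for _customer_id, product, sentiment in raw_events:
        summary[(product, sentiment)] += 1   # identity discarded here
    return dict(summary)

raw = [
    ("cust-1", "merlot", "positive"),
    ("cust-2", "merlot", "positive"),
    ("cust-3", "riesling", "negative"),
]
print(edge_aggregate(raw))
# Counts only -- no customer identifiers are ever transmitted.
```

The same decision ("stock more merlot") can be made from the aggregate, which is the privacy-for-information tradeoff the authors describe.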


Do You Have the Empathy to Make the Move to Architect?

Solution and API architects may focus on different levels of the stack, but also perform very similar roles. Usually, an architect is a more senior, but non-executive role. An architect typically makes high-level design decisions, enforces technical standards and looks to guide teams with a mix of technical and people skills. “Being an architect takes social skills built on the foundation of the technical,” said Keith Casey, an independent contractor, API consultant and author of “The API Design Book.” “No matter how good at the socials you are, you need to have the technical. Have you built a system like this? Have you shipped a system like this? You can read cookbooks all day, until you’ve put that in the oven, you haven’t cooked. You actually have to succeed and fail a few times before you can really offer advice to everyone. Social has to come after the technical foundation.” While a developer likes to dig deep into the weeds of a particular product or language, an architect is ready to broaden their understanding of enterprise architecture and how it fits into the business as a whole.


California's privacy law raises risks of legal action and fines over data collection

The upcoming California Privacy Rights Act (CPRA) is considered a pioneer in data privacy, and it strengthens the current California Consumer Privacy Act with stricter rules. Enforcement is also beefed up with the creation of the California Privacy Protection Agency (CPPA) plus the ability of individual Californians to file suits against companies for non-compliance. The law was passed in November 2020 and applies to any company of sufficient size that does business in California, including online sales, without requiring a physical location. California residents can request from a company how their personal data has been used, and for what purpose, and they can request that their personal data not be sold or demand it be deleted, including any data that has been sold to third parties. Each company must also state if artificial intelligence was applied to any of their personal data, and if it was, what the logic was behind the AI. This is essentially asking companies to reveal how their algorithms rank the data.


How to Explain Complex Technology Issues to Business Leaders

Business leaders generally trust their tech counterparts to successfully address and resolve all the necessary technical details. What colleagues most want is assurance that whatever technology IT is proposing delivers benefits that outweigh capital and operating expenses. "We need to rise above the technology itself to explain the impact it will have," Kelker said. Jerry Kurtz, executive vice president of insights and data at IT advisory firm Capgemini North America, also stressed the importance of focusing on the project's potential business outcome and value. "Rather than getting into the details of the technology, challenge, or solution in technical terms, showcase the outcomes the solution can bring and how they will impact the business as a whole," he explained. "Once this has been accomplished, it's time to develop a roadmap to reach the agreed upon target state." Using analogies rooted in shared experiences is a good way to find a common ground with business leaders, advised Mike Bechtel, chief futurist at business and IT advisory firm Deloitte Consulting.


How universities can facilitate blended learning through smart campus infrastructure

Smart campus infrastructure doesn’t only provide a reliable solution to short term connectivity issues, but it also offers long term scalability that can continuously be tweaked, upgraded and expanded to fit the institution’s needs as they shift. The ideal scenario would be to have low levels of latency on a high capacity network, creating breathing room so that any significant uptake in usage levels wouldn’t cause any issues. Alternative network providers (AltNets) can overprovision to ensure that this scenario plays out ideally for the university. By providing much more bandwidth than is needed, bottlenecks can be removed and users can enjoy a seamless connectivity experience. As broadband demand inevitably grows over time, optic kit can be upgraded in line with what is required. ... With Wi-Fi 6 deployed across the entire campus, the technology can take universities to new heights. Reliable, high speed connections implemented across the university would enable the student experience to take on a new form through third party deployments. Suddenly, smart homes can be utilised effectively across the entire campus. 

Recover from ransomware: Why the cloud is the way to go

Recovery in the cloud can happen before you ever need it. It starts with automatically and periodically performing an incremental restore of your computing environment to an IaaS vendor. This means your entire environment—including backups of both structured and unstructured data—is already restored before it’s needed. Yes, you will lose some amount of data depending on the window between the last restore and the ransomware attack, so you will need to decide up front how often you execute the pre-restore process to minimize the loss. You also need to agree on what amount of data loss is acceptable, which is officially referred to as your recovery point objective (RPO). Technically, this type of recovery doesn’t require the cloud, but using the cloud makes it financially feasible for most environments. Doing it with a physical data center requires the costly route of paying for the data center before you need it. With the cloud you pay only for the storage associated with your pre-restored images. Cloud-friendly backup and DR products and services can proactively restore your entire environment to the cloud of your choice—once a day, once an hour, or continuously.
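The arithmetic behind that decision is simple. This sketch (the schedule values are illustrative) shows how the pre-restore interval bounds the data you can lose, which is exactly the RPO tradeoff described above:

```python
# The gap between scheduled pre-restores bounds your recovery point
# objective (RPO): anything written after the most recent completed
# restore is lost if ransomware strikes before the next one runs.

def data_loss_minutes(attack_minute: int, restore_interval: int) -> int:
    """Minutes of data lost: time elapsed since the most recent
    completed restore when the attack lands."""
    return attack_minute % restore_interval

attack = 1000                      # attack lands 1000 min into the schedule
for interval in (1440, 60, 5):     # daily, hourly, near-continuous restores
    print(f"restore every {interval} min -> {data_loss_minutes(attack, interval)} min of data lost")
```

Tighter intervals mean less loss but more restore traffic and storage cost, which is why the acceptable RPO has to be agreed up front.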


Hybrid work model: 5 advantages

Organizations with the biggest productivity increases during the pandemic have supported and encouraged “small moments of engagement” among their employees, according to McKinsey. These small moments are where coaching, idea sharing, mentoring, and collaborative work happen. This productivity boost stems from training managers to reimagine processes and rethink how employees can thrive at work. Autonomy is the key to employee satisfaction: If you provide full autonomy and decision-making on how, where, and when your team members work, employee satisfaction will skyrocket. Autonomy is important for on-site workers, too. Employees who return to the office after over a year of setting their own schedule will need to feel that they are trusted to get work done without a manager standing by. At our company, mutual appreciation and positive assumptions are guiding principles. When we don’t see each other every day, it’s easy to make assumptions about other employees – we keep these assumptions positive, trusting that everyone is doing their best and making responsible decisions.

A New Approach to Securing Authentication Systems' Core Secrets

With SAML, user management is shifted from the service provider (SP) to an identity provider (IdP), and authentication and directory are decoupled from the service. Instead of worrying about dozens of different apps and their authentication measures, admins configure the IdP to verify all employees' identities. The SP and IdP only communicate with each other with a key pair: The IdP signs with the private key, and the SP verifies with the public key. A Golden SAML attack occurs when the attackers steal a private key from the identity provider and become a "rogue IdP," Be'ery said. This allows them to generate arbitrary SAML access tokens offline, within the attackers' environment. Doing this would let attackers access a system as any user, in any role, while bypassing security policies and MFA. They could also slip past access monitoring, if access is only monitored by the identity provider, Be'ery said. The security community saw this technique in the SolarWinds attack, which also marked the first publicly known use of Golden SAML in the wild, he noted.
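The trust problem can be shown with a deliberately simplified toy model. Real SAML assertions are XML documents signed with the IdP's asymmetric private key; in the sketch below an HMAC shared secret merely stands in for the signing material, purely to illustrate why holding it lets an attacker mint tokens for any user and role that the SP cannot distinguish from genuine ones:

```python
# Toy model of the Golden SAML trust relationship. NOT real SAML:
# an HMAC secret stands in for the IdP's private signing key, to
# show that whoever holds the signing material can forge tokens the
# service provider will accept for any user, in any role.

import hashlib, hmac, json

IDP_SIGNING_SECRET = b"idp-private-key-material"   # what the attacker steals

def issue_token(secret: bytes, user: str, role: str) -> dict:
    """Mint a signed token; normally only the IdP can do this."""
    payload = json.dumps({"user": user, "role": role}).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def sp_verify(token: dict) -> bool:
    """The SP only checks the signature against the IdP's material."""
    expected = hmac.new(IDP_SIGNING_SECRET, token["payload"],
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["sig"])

# An attacker who exfiltrated the secret forges an admin token OFFLINE,
# in their own environment, never touching the IdP again:
forged = issue_token(IDP_SIGNING_SECRET, "alice", "admin")
print(sp_verify(forged))   # True -- indistinguishable from a real token
```

Because forging happens entirely offline, MFA and IdP-side access monitoring never see the attack, which matches Be'ery's description.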



Quote for the day:

"Great things are not something accidental, but must certainly be willed." -- Vincent van Gogh

Daily Tech Digest - August 04, 2021

Thoughtfully training SRE apprentices: Establishing Padawan and Jedi matches

Learning via osmosis is very powerful. There is a lot of jargon, and many technical terms are best learned just by hearing others use them in context. For example, if you ask someone who doesn’t work in technology to pronounce nginx, they will likely say it incorrectly. This is very common for new engineers too. It’s not a problem; it just means there is a lot to learn, which experienced engineers may take for granted. What if you asked a group of people who don’t work in technology to spell nginx? I’m sure you’d get many different answers. How does this change in a remote world? Really, it’s the same. You’ll still be attending meetings and hearing new terms, you can still attend standup, and you can still continue to google the terms you don’t know to build your vocabulary. For example, imagine you are in a meeting on the topic of incident management and you are reviewing metrics as a team. As a new SRE apprentice you might wonder: what does MTTD mean? If you hear or see this term in a meeting you can quickly google it and learn on the job.
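For the curious apprentice: MTTD (mean time to detect) is simply the average gap between when an incident began and when it was detected. A minimal sketch, with invented timestamps:

```python
# MTTD -- mean time to detect -- one of the incident metrics an SRE
# apprentice will hear in review meetings: the average gap between
# incident start and detection, across a set of incidents.

from datetime import datetime

incidents = [  # (started, detected) pairs; timestamps are illustrative
    (datetime(2021, 8, 1, 10, 0), datetime(2021, 8, 1, 10, 12)),  # 12 min
    (datetime(2021, 8, 2, 14, 0), datetime(2021, 8, 2, 14, 4)),   #  4 min
]

def mttd_minutes(incidents):
    """Mean detection delay in minutes over (started, detected) pairs."""
    gaps = [(detected - started).total_seconds() / 60
            for started, detected in incidents]
    return sum(gaps) / len(gaps)

print(mttd_minutes(incidents))   # 8.0
```

Related metrics the apprentice will meet (MTTR, MTTA) follow the same mean-of-gaps shape with different endpoints.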


Blockchain applications that will change the world in the next 5 years

Decentralized finance (DeFi) is another increasingly blossoming application of blockchain technology that is set to gain significant momentum in the next 5 years. In Q1 2021, the dollar value of assets under management by DeFi applications grew from roughly $20 billion to $50 billion. DeFi is a form of finance that removes central financial intermediaries, like banks, to offer traditional financial instruments that utilize smart contracts on blockchains. An example of DeFi in action is the plethora of new decentralized applications now offering easier access to digital loans — users can bypass strict requirements of banks and engage in peer-to-peer lending with other people around the world. The next five years are vital for DeFi and will see dramatic growth in its applications, regulatory compliance associated with the technology and its overall use. Celebrity investor Mark Cuban, who gained notoriety in the blockchain industry through his advocation of NFTs, has suggested that “banks should be scared” of DeFi’s rising popularity. 


The remote-working challenge: ‘There are huge issues’

From an employees’ perspective “WFH (Working From Home) has the potential to reduce commute time, provide more flexible working hours, increase job satisfaction, and improve work-life balance,” a recent study by the University of Chicago entitled Work from home & productivity: evidence from personnel & analytics data on IT professionals noted. That’s the theory, but it doesn’t always work out like that. The researchers tracked the activity of more than 10,000 employees at an Asian services company between April 2019 and August 2020 and found that they were working 30 per cent more hours than they were before the pandemic, and 18 per cent more unpaid overtime hours. But there was no corresponding increase in their workload, and their overall productivity per hour went down by 20 per cent. Employees with children, predictably perhaps, were most affected – they worked 20 minutes per day more than those without. More surprisingly, the employees had less focus time than before the pandemic, and a lot more meetings. “Time spent on co-ordination activities and meetings increased, but uninterrupted work hours shrank considerably.”


Quantum Computing — What’s it All About

Quantum computers are a new type of computer that do calculations in a fundamentally different way. They will do certain calculations dramatically faster than current computers can. This will allow some business questions we currently answer infrequently to be answered faster and more often. It will also allow us to ask some questions we previously considered impossible to answer. And, as with any new technology, as we become clear on the capabilities, it will let us ask (and answer) questions that we had previously not even considered; solving the unknown unknowns, if you will. The clearest I’ve seen this concept laid out is in a simple diagram such as the one below. The first time I saw this was in an excellent presentation given by IBM’s Andy Stanford-Clark to Quantum London called “Quantum Computing: a guide for the perplexed”. A fitting title and fascinating talk. The most visible scientific and engineering feats in this field at the moment are the designing and building of the quantum computing hardware.


Cluster API Offers a Way to Manage Multiple Kubernetes Deployments

The focus of Cluster API initially is on projects creating tooling and on managed Kubernetes platforms, but in the long run, it will be increasingly useful for organizations that want to build out their own Kubernetes platform, Burns suggested. “It facilitates the infrastructure admin being able to provision a cluster for a user, in an automated fashion or even build the tooling to allow that user to self-service and say ‘hey, I want a cluster’ and press a button and the cluster pops out. By combining Cluster API with something like Logic Apps on Arc, they can come up to a portal, press a button, provision a Kubernetes cluster, get a no-code environment and start building their applications, all through a web browser.” ... “We’re maturing to a place where you don’t have to be an expert; where the person who just wants to put together a data source and a little bit of a function transformation and an output can actually achieve all of that in the environment where it needs to run, whether that’s an airstrip, or an oil rig or a ship or factory,” Burns said.
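The "press a button, get a cluster" flow works because Cluster API makes a cluster just another declarative Kubernetes resource. The sketch below is a hedged example only: the names are invented, and the exact apiVersion and provider-specific kinds vary by Cluster API release and infrastructure provider (an Azure-backed cluster is assumed here).

```yaml
# Illustrative Cluster API manifest: a cluster declared as a resource,
# which automation or a self-service portal can apply on demand.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: team-a-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: team-a-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster        # provider-specific kind; Azure assumed
    name: team-a-cluster
```

Because provisioning is a declarative apply rather than a manual runbook, wiring it behind a portal button is straightforward.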


Behind the scenes: A day in the life of a cybersecurity expert

"My biggest challenge is how to determine what we need to work on next," Engel said. "There's only so much time in the world and you only have so much manpower. We have so many ideas that we want to execute and deliver to ensure security and privacy, and we don't like to rest on our laurels. We're not just going to say, 'oh, this is an eight out of 10, so we don't need to touch it anymore.' We want to be 10 out of 10 everywhere." As for the fun part, "analysis on security events is a lot of fun because it's my background," he said. "I love that kind of thing." While he can't go into details on this, because of security reasons, it's an around-the-clock job. The automated systems can contact Engel at any time if something highly critical occurs—"like having a burglar alarm at your house or something like that," he said. "Someone's attempting to break-in. And in this example, we are the police." "A common misconception about cybersecurity is that it's literally just two people sitting in a dark room waiting for a screen to turn red, and then they maybe flick a couple buttons," Engel said. "It couldn't be further from the truth."


What is DataSecOps and why it matters

Security needs to be built into DataOps, not bolted on as an afterthought. This means building a cross-team, ongoing collaboration between security engineering, data engineering and other relevant stakeholders, and not just at the end of a big project. This also means that the security of data stores needs to be understood and transparent to security teams. Number three, in the ever-changing data world, and with limited resources, prioritization is key. You should plan and focus on the biggest risks first. In data, that often means knowing where your sensitive data is, which is not so trivial, and prioritizing it much higher in terms of projects and resources. Number four, data access needs to have a clear and simple policy. If things start getting too complicated or non-deterministic around data access permissions, and by non-deterministic, I mean that sometimes you may request access and get it, and sometimes you may not get it, you’re either being a disabler for the business’s data usage, or you’re exposing security risks.
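One sketch of such a "clear and simple" policy (the roles and sensitivity labels are invented) is a pure lookup with a default-deny fallback: the same request always gets the same answer, so access is deterministic and auditable:

```python
# Deterministic access policy sketch: the decision is a pure function
# of (requester role, data sensitivity), with default deny. Identical
# requests always yield identical answers, so the policy is auditable.

POLICY = {
    ("analyst", "public"): True,
    ("analyst", "sensitive"): False,
    ("security", "sensitive"): True,
}

def may_access(role: str, sensitivity: str) -> bool:
    """Look up the decision; anything not explicitly granted is denied."""
    return POLICY.get((role, sensitivity), False)

print(may_access("analyst", "public"))     # True
print(may_access("analyst", "sensitive"))  # False
print(may_access("intern", "sensitive"))   # False (default deny)
```

Contrast this with ad-hoc, per-request approvals, which is exactly the non-determinism the speaker warns against.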


Improving microservice architecture with GraphQL API gateways

API gateways are nothing new to microservices. I’ve seen many developers use them to provide a single interface (and protocol) for client apps to get data from multiple sources. They can solve the problems previously described by providing a single API protocol, a single auth mechanism, and ensuring that clients only need to speak to one team when developing new features. Using GraphQL API gateways, on the other hand, is a relatively new concept that has become popular lately. This is because GraphQL has a few properties that lend themselves beautifully to API gateways. GraphQL Mesh will not only act as our GraphQL API gateway but also as our data mapper. It supports different data sources, such as OpenAPI/Swagger REST APIs, gRPC APIs, databases, GraphQL (obviously), and more. It will take these data sources, transform them into GraphQL APIs, and then stitch them together. To demonstrate the power of a library like this, we will create a simple SpaceX Flight Journal API. Our app will record all the SpaceX launches we attended over the years.
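A GraphQL Mesh configuration along these lines might look like the sketch below. It is an assumption-laden example: the source names, file path, and endpoint are invented, and exact handler options vary by Mesh version; the `openapi` and `graphql` handlers shown are the ones Mesh documents for wrapping REST and GraphQL sources.

```yaml
# Illustrative .meshrc.yaml: Mesh wraps each source as a GraphQL API
# and stitches them into one gateway schema.
sources:
  - name: SpaceX                     # public launch data via a REST API
    handler:
      openapi:
        source: ./spacex-openapi.json   # hypothetical OpenAPI spec path
  - name: Journal                    # our own attended-launches service
    handler:
      graphql:
        endpoint: http://localhost:4001/graphql
```

With this in place, clients query one stitched schema instead of talking to each backend separately, which is the gateway property the article highlights.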


IT modernization: 5 truths now

Digital transformation got a lot of CIO attention this past year at events like the MIT Sloan CIO Symposium – and it still isn’t a product or a solution that anyone can buy. Rather, it’s best described as a continuous process involving new technologies, new ways of working, and adopting a culture of experimentation. Fostering that culture leads to faster and broader experimentation and the ability to arrive at better outcomes through continuous improvement. But just because the technology component is often not front and center (and shouldn’t be) in digital transformation projects doesn’t mean that a technology toolbox, including a foundational platform, is unimportant. Anything but. If you look back at some of the words in that definition of digital transformation, it’s easy to see why traditional rigid platforms, often intended to support monolithic, long-lived applications, might not fit the bill. Digital transformation is responsible in no small part for the acceleration of both containerized environments and the consumption of cloud services.


Building future-proof tech products and long-lasting customer relationships

When designing a new product, it is essential to not only look at current needs, but to anticipate changes in business and in computing models. A timeless design must include potential for expansion and adaptation as the industry evolves. Products designed in a non-portable manner are limited to specific deployment scenarios only, for example on-prem vs. cloud. Of course, predicting movements in the industry is not easy. Many people bet on storage tape going away, yet it is still found in many data centres today. Some jumped on a new trend too soon and failed. And others did not recognise the value in what they had created. Take Xerox, for example, which created then ignored the first personal computer. So, how do you go about creating enduring technology that makes a meaningful difference in people’s lives and businesses? First, stay close to analysts whose job it is to analyse the market and identify major trends. And, second, engage in deep conversations with end-users to truly understand their objectives and challenges. Here, it is essential to discuss customers’ evolving needs and future projects, then work to create a product that solves for both the short and long term.



Quote for the day:

"I think leadership's always been about two main things: imagination and courage." -- Paul Keating

Daily Tech Digest - August 03, 2021

Is remote working better for the environment? Not necessarily

When workers’ homes become their offices, commutes may fall out of the carbon equation, but what’s happening inside those homes must be added in. How much energy is being used to run the air conditioner or heater? Is that energy coming from clean sources? In some parts of the country during lockdown, average home electricity consumption rose more than 20% on weekdays, according to the International Energy Agency. IEA’s analysis suggests workers who use public transport or drive less than four miles each way could actually increase their total emissions by working from home. Looking further ahead, the questions multiply. Many Shopify employees live near the office and walk, bike or take public transit. Will remote work mean they move from city apartments to sprawling suburban homes, which use, on average, three times more energy? Will they buy cars? Will they be electric or gas-powered SUVs? “You have company control over what takes place in the office,” Kauk noted. “When you have everyone working remotely from home, corporate discretion is now employee discretion.”


Modernizing your applications with containers and microservices

There are many reasons to learn and design with serverless microservices, but that doesn’t mean they are perfect for every situation – just like microservices in general. If your workloads are stable or predictable in size, you generally won’t see the financial benefits of running in a serverless environment over the long term, in contrast to unpredictable workloads, where serverless platforms scale in response. Additionally, one downside of serverless and functions-as-a-service is magnified with stateful microservices, which either suffer a longer “cold start” when launching from scratch or require long-term in-memory state management. One final caveat for serverless: the usual caution against vendor lock-in is magnified with cloud-provider-specific serverless offerings, which can lead to deeply integrated architectural decisions that are severely impacted should the offering change capabilities, requirements, or pricing.


The cybersecurity jobs crisis is getting worse

"Cybersecurity is seen as a cost centre to the business -- something you have to do, but only to a minimal degree, like paying the light bill. We need to shift the conversation to aligning our security programs with the business," says Alexander. "Businesses have a tendency to invest in things they see value in. We need to ensure they see the value in our cybersecurity programs -- including people, training and technology," she added. People and training are a key issue here: technology changes fast and the methods cyber criminals use to break into networks are constantly evolving, so it's important for organisations not only to hire the right people, but also to invest in training them so they can keep pace with the latest threats and new forms of technology. But the responsibility doesn't rest with employers alone: to ensure there are enough people to fill cybersecurity jobs going forward, education and training pathways are needed. "At a societal level, we have to do more to educate school-age children about cybersecurity and career opportunities," says Jon Oltsik, Senior Principal Analyst and ESG Fellow.


Turning Microservices Inside-Out

Outbound events are already the preferred integration method for most modern platforms. Most cloud services emit events. Many data sources (such as CockroachDB changefeeds and MongoDB change streams) and even file systems (for example, Ceph notifications) can emit state change events. Custom-built microservices are no exception here. Emitting state change or domain events is the most natural way for modern microservices to fit uniformly among the event-driven systems they are connected to, and to benefit from the same tooling and practices. Outbound events are bound to become a top-level microservices design construct for many reasons. Designing services with outbound events can help replicate data during an application modernization process. Outbound events are also the enabler for implementing elegant inter-service interactions through the Outbox Pattern, and complex business transactions that span multiple services using a non-blocking Saga implementation.
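The Outbox Pattern mentioned above can be sketched in a few lines: the state change and the event that announces it are written in the same local transaction, so either both become visible or neither does, and a separate relay later publishes the outbox rows to the log. A minimal in-memory SQLite sketch (table and event names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

def place_order(order_id):
    # State change and outbound event committed atomically in one
    # local transaction -- the core of the Outbox Pattern.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderPlaced", "id": order_id}),),
        )

place_order("o-42")

# A separate relay process would poll or stream the outbox rows into
# Kafka/Pulsar etc.; here we just read them back in order.
events = [json.loads(p) for (p,) in conn.execute("SELECT payload FROM outbox ORDER BY seq")]
print(events)  # [{'type': 'OrderPlaced', 'id': 'o-42'}]
```

In production the relay is often change data capture (e.g. Debezium tailing the outbox table), which avoids dual-write inconsistencies without distributed transactions.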


Kubernetes Expands From Containers To Infrastructure Management

Google engineers and others at vendors like Portworx understood that extensions were needed to enable Kubernetes to do such jobs as manage compute allocations, data security and networking, so the CNI (container network interface) and CSI (container storage interface) were created, leading to “a new avatar for the second coming of Kubernetes,” he says. “Kubernetes was originally – and still is, obviously – being used to manage containers,” Thirumale says. “But with these extensions of CNI, CSI and security extensions, Kubernetes can actually be used to manage data and storage and manage networking and all of that. If I were to put a Kubernetes layer in the middleware layer, looking upwards, it’s managing where the containers land. But looking down, it’s actually now managing infrastructure. There’s a whole new way of managing infrastructure. The traditional way was you had to go to the storage admin and say, ‘Give me five more nodes and give it to me in these terabytes and with this capability and all of that,’ then they’d provision your EMC box or a Pure box or NetApp box or what have you.”


How tech pros perceive the evolving state of risk in the business environment

This year’s study reveals the immense opportunity ahead for tech pros and IT leadership to align and collaborate on priorities and policies to best position not only individual organizations but the industry at large to succeed with a future built for risk preparedness. “Technology professionals today are under even greater pressure to ensure optimized, secure performance for remote workforces while facing limited time and resources for personnel training. When it comes to risk management and mitigation, prioritizing intentional investments in technology solutions that meet business needs is critical,” said Sudhakar Ramakrishna, President and CEO, SolarWinds. “More than ever before, tech pros must partner closely with business leaders to ensure they have the resources and headcount necessary to proactively address security risks. And more importantly, tech pros should constantly assess their risk management, mitigation, and protocols to avoid falling into complacency and being ‘blind’ to risk.”


Is DeepMind’s new reinforcement learning system a step toward general AI?

The combination of reinforcement learning and deep neural networks, known as deep reinforcement learning, has been at the heart of many advances in AI, including DeepMind’s famous AlphaGo and AlphaStar models. In both cases, the AI systems were able to outmatch human world champions at their respective games. But reinforcement learning systems are also notorious for their lack of flexibility. For example, a reinforcement learning model that can play StarCraft 2 at an expert level won’t be able to play a game with similar mechanics (e.g., Warcraft 3) at any level of competency. Even slight changes to the original game will considerably degrade the AI model’s performance. “These agents are often constrained to play only the games they were trained for – whilst the exact instantiation of the game may vary (e.g. the layout, initial conditions, opponents) the goals the agents must satisfy remain the same between training and testing. Deviation from this can lead to catastrophic failure of the agent,” DeepMind’s researchers write in a paper that provides the full details on their open-ended learning.


Zoom Agrees to Settle Security Lawsuit for $85 Million

The lawsuit stems from users' complaints about the company's data privacy and security practices, including instances in which customers had their video conferences interrupted by "Zoom bombing," in which attackers gained access to meeting passwords or bypassed security features and disrupted the proceedings with profanity and offensive images. During the COVID-19 global pandemic, many organizations turned to Zoom and other tech firms for video conferencing and collaboration services, which led to an increase in hacking attempts. At one point, the U.S. Justice Department warned that prosecutors could bring federal charges against those who disrupted meetings through Zoom bombing. In April 2020, an analysis by Citizen Lab, a group based at the University of Toronto that studies surveillance and its impact on human rights, found that although Zoom advertised that it used full end-to-end encryption, the company instead used AES-128 keys in the weak ECB mode within its cloud-based videoconferencing platform.


The surprising link between creativity and risk

Though the connection between creativity and risk-taking seems intuitive, social scientists have struggled to show a direct link between the two. That’s because measuring creativity itself has proven devilishly difficult. “Past studies which aimed to explore the relationship between creativity and risk-taking have equated creativity to measures such as associational fluency, divergent thinking, tolerance of ambiguity, creative lifestyle, or intellectual achievements,” psychologists Vaibhav Tyagi, Yaniv Hanoch, Stephen D. Hall, and Susan L. Denham of Plymouth University in the UK and Mark Runco of the University of Georgia wrote in 2017 in Frontiers in Psychology. But, they added, “each of these measures only provides a narrow insight into some aspects of creativity.” Adopting a different approach, the researchers looked at creativity as a multidimensional trait involving self-described personality and creative achievements, ideation (the process of forming new ideas), association formation, and problem-solving, among other qualities.


The Ethical Challenges Of AI In Defence

The chief concern about using AI in defence and weaponry is that it might not perform as desired, leading to catastrophic results. For example, it might miss its target or launch attacks that are not approved, leading to conflicts. Most countries test their weapons systems’ reliability before deploying them in the field. But AI weapon systems can be non-deterministic, non-linear, high-dimensional, probabilistic, and continuously learning, and for a weapon system with such capabilities, traditional testing and validation techniques are insufficient. Furthermore, the race between the world’s superpowers to outpace each other has also made people uneasy, as countries might not play by the norms or consider ethics while designing weapons systems, leading to disastrous implications on the battlefield. As defence starts leaning towards technology, it becomes imperative that we evaluate the loopholes of AI-based defence technologies that bad actors might exploit. For example, adversaries might seek to misuse AI systems by tampering with training data, or figure out ways to gain illegal access to training data by analysing specifically tailored test inputs.



Quote for the day:

"True leaders bring out your personal best. They ignite your human potential" -- John Paul Warren

Daily Tech Digest - August 02, 2021

Power With Purpose: The Four Pillars Of Leadership

A leader is defined by a purpose that is bigger than themselves. When that purpose serves a greater good, it becomes the platform for great leadership. Gandhi summed up his philosophy of life with these words: “My life is my message.” That one statement speaks volumes about how he chose to live his life and share his message of non-violence, compassion, and truth with the world. When you have a purpose that goes beyond you, people will see it and identify with it. Being purpose-driven defines the nobility of one’s character. It inspires others. At its core, your leadership purpose springs from your identity, the essence of who you are. Purpose is the difference between a salesman and a leader, and in the end, the leader is the one who makes an impact on the world. ... The hallmark of a great leader is their care and concern for their people. Displaying compassion towards others is not about a photo-op, but an inherent characteristic that others can feel and hear when they are with you. It lives in the warmth and timbre of your voice. It shows in every action you take. Caring leaders take a genuine interest in others.


Using GPUs for Data Science and Data Analytics

It is now well established that modern AI/ML systems’ success has been critically dependent on their ability to process massive amounts of raw data in parallel using task-optimized hardware. Therefore, the use of specialized hardware like Graphics Processing Units (GPUs) played a significant role in this early success. Since then, a lot of emphasis has been placed on building highly optimized software tools and customized mathematical processing engines (both hardware and software) to leverage the power and architecture of GPUs and parallel computing. While the use of GPUs and distributed computing is widely discussed in academic and business circles for core AI/ML tasks (e.g. running a 100-layer deep neural network for image classification or a billion-parameter BERT language model), they receive less coverage when it comes to their utility for regular data science and data engineering tasks. These data-related tasks are the essential precursor to any ML workload in an AI pipeline, and they often consume the majority of the time and intellectual effort spent by a data scientist or even an ML engineer.
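Part of why GPUs translate so directly to these data tasks is that GPU array libraries expose the same vectorized API as their CPU counterparts: CuPy, for instance, mirrors a large part of NumPy, so a bulk column computation can often be moved to a GPU by little more than swapping the import (not every NumPy function is covered, so treat this as a sketch). The example below runs on CPU with NumPy:

```python
import numpy as np  # on a CUDA machine, "import cupy as np" is often a drop-in swap

# A typical pre-ML data task: standardize a numeric column in bulk.
# One vectorized expression replaces a Python loop over rows, and the
# same expression parallelizes across thousands of GPU cores with CuPy.
col = np.array([10.0, 20.0, 30.0, 40.0])
standardized = (col - col.mean()) / col.std()
print(standardized.mean())  # ~0.0 after standardization
```

The payoff is largest on wide tables and long columns, where the per-call overhead of dispatching to the GPU is amortized over millions of elements.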


The Use of Deep Learning across the Marketing Funnel

Simply put, Deep Learning is an ML technique in which very large neural networks learn from large quantities of data to deliver highly accurate outcomes. The more the data, the better the Deep Learning model learns and the more accurate the outcome. Deep Learning is at the centre of exciting innovation possibilities like self-driving cars, image recognition, virtual assistants, instant audio translations etc. The ability to manage both structured and unstructured data makes this a truly powerful technology advancement. ... Differentiation comes not only from product proposition and comms but also from how consumers experience the brand/service online. And here too, strides in Deep Learning are enabling marketers with more sophisticated ways to create differentiation. Website Experience: based on the consumer profile and cohort, even the website experience can be customized to ensure that a customer gets a truly relevant experience, creating more affinity for the brand/service. A great example of this is Netflix, where no two users have the same website experience, based on their past viewing of content.


Navigating the 2021 threat landscape: Security operations, cybersecurity maturity

When it comes to cybersecurity teams and leadership, the report findings revealed no strong differences between the security function having a CISO or CIO at the helm and organizational views on increased or decreased cyberattacks, confidence levels related to detecting and responding to cyberthreats, or perceptions on cybercrime reporting. However, it did find that security function ownership is related to differences regarding executive valuation of cyberrisk assessments (84 percent under CISOs versus 78 percent under CIOs), board of director prioritization of cybersecurity (61 percent under CISOs versus 47 percent under CIOs) and alignment of cybersecurity strategy with organizational objectives (77 percent under CISOs versus 68 percent under CIOs). The report also found that artificial intelligence (AI) is fully operational in a third of the security operations of respondents, representing a four percent increase from the year before. Seventy-seven percent of respondents also revealed they are confident in the ability of their cybersecurity teams to detect and respond to cyberthreats, a three-percentage-point increase from last year.


Don’t become an Enterprise/IT Architect…

The world is somewhere on the rapid-growth part of the S-curve of the information revolution. It is when the S-curve moves into maturity that the speed of change slows down. And it is at that point that the gap between upper management’s expectation of change capacity and the actual capacity for change will increase. Enterprise/IT Architects–Strategists are, among other things, tasked with bridging that worsening gap. Which means that — for Enterprise/IT Architects–Strategists and the many other people actively shaping that digital landscape — as long as there is no true engagement from top management, the gap between them and their upper management looks like this red curve, which incidentally also represents how enjoyable/frustrating EA-like jobs are ... We’re in the period of rapid growth, the middle of the blue curve. That is also the period where (IT-related, which is much of it) change inside organisations (and society) gets more difficult every day, and thus noticeably slows down. Well-run and successful projects that take more than five years are no exception.


A Journey in Test Engineering Leadership: Applying Session-Based Test Management

Testing is a complex activity, just like software engineering or any craft that takes study, critical thinking and commitment. It is not possible to encode everything that happens during testing into a document or artifact such as a test case. The best we can do is report our testing in a way that tells a compelling and informative story about risk to people who matter, i.e. those making the decisions about the product under test. ... SBTM is a kind of activity-based test management method, which we organize around test sessions. The method focuses on the activities testers perform during testing. There are many activities that testers perform outside of testing, such as attend meetings, help developers troubleshoot problems, attend training, and so on. Those activities don’t belong in a test session. To have an accurate picture of only the testing performed and the duration of the testing effort, we package test activity into sessions.


Beware of blind spots in data centre monitoring

The answer is to combine the tools that tell you about past and present states with a tool that shines a light on how the environment will behave in the future. Doing this requires the use of Computational Fluid Dynamics (CFD). A CFD-based simulation is a virtual model of an entire data centre that enables operators to accurately calculate the environmental conditions of the facility. Virtual sensors, for instance, ensure the simulated data reflects the real sensor data. Consequently, the results can be used to investigate conditions anywhere you want, in fine detail. CFD also extends beyond temperature maps to include humidity, pressure and air speed. Airflow, for instance, can be traced to show how it gets from one place to another, offering unparalleled insight into the cause of thermal challenges. Critically, CFD enables operators to simulate future resilience. A validated CFD model will offer information about any configuration of your data centre, simulating variations in current configurations, or in new ones you haven’t yet deployed.


SolarWinds CEO Talks Securing IT in the Wake of Sunburst

Specific to the pandemic, a lot of technologies -- endpoint security, cloud security, and zero trust -- have proliferated, and organizations have changed how they talk about deploying them. Previously there may have been a cloud security team and an infrastructure security team; very soon the lines started to blur. There was very little need for network security because not many people were coming to work. Organization, prioritization, and collaboration within the enterprise had to change to leverage technology to support this kind of workforce. ... Every team has to be constantly vigilant about what might be happening in their environment and who could be attacking them. The other side of it is constant learning. You constantly demonstrate awareness and vigilance and constantly learn from it. A red team can be a very effective way to train an entire organization and sensitize them to, say, a phishing attack. As common as phishing attacks are, a large majority of people, including in the technology sectors, do not know how to fully prevent them, despite the fact that there are a lot of phishing [detection] technology tools available.


Is your network AI as smart as you think?

The challenge comes when we stop looking at collections as independent elements and start looking at networks as collections of collections. A network isn’t an anthill, it’s the whole ecosystem the anthill is inside of, including trees and cows and many other things. Trees know how to be trees, cows understand the essence of cow-ness, but what understands the ecosystem? A farm is a farm, not some arbitrary combination of trees, cows, and anthills. The person who knows what a farm is supposed to be is the farmer, not the elements of the farm or the supplier of those elements, and in your network, dear network-operations type, that farmer is you. In the early days of AI, its developers explicitly acknowledged the separation between the knowledge engineer who built the AI framework and the subject-matter expert whose knowledge shaped the framework. In software, especially DevOps, the management tools aim to achieve a goal state, which in our farm analogy describes where cows, trees, and ants fit in. If the current state isn’t the goal state, they do stuff or move stuff around to converge on the goal. It’s a great concept, but for it to work we have to know what the goal is.


Milvus 2.0: Redefining Vector Database

Milvus 2.0 has a microservice design that features read/write separation, incremental/historical data separation, and the separation of CPU-intensive, memory-intensive, and IO-intensive tasks. Microservices help optimize the allocation of resources for an ever-changing, heterogeneous workload. In Milvus 2.0, the log broker serves as the system's backbone: all data insert and update operations must go through the log broker, and worker nodes execute CRUD operations by subscribing to and consuming logs. This design reduces system complexity by moving core functions such as data persistence and flashback down to the storage layer, and log pub-sub makes the system even more flexible and better positioned for future scaling. Milvus 2.0 implements a unified Lambda architecture, which integrates the processing of incremental and historical data. Compared with the Kappa architecture, Milvus 2.0 introduces log backfill, which stores log snapshots and indexes in object storage to improve failure-recovery efficiency and query performance.
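The log-as-backbone design described here is easy to sketch: writers only append operations to the log, and worker nodes derive their state by consuming it in order, which is also what makes recovery from a snapshot plus the log tail possible. A toy sketch, with an in-memory list standing in for the log broker (a real system would use Pulsar or Kafka, and the keys/values here are invented):

```python
# Toy log-backed store: all writes go through the log; workers rebuild
# state purely by replaying it, so any node can catch up from scratch.
log = []  # stand-in for the log broker

def append(op, key, value=None):
    """Writers never mutate state directly; they only append to the log."""
    log.append({"op": op, "key": key, "value": value})

def replay(entries):
    """A worker node derives its state by consuming log entries in order."""
    state = {}
    for e in entries:
        if e["op"] == "insert":
            state[e["key"]] = e["value"]
        elif e["op"] == "delete":
            state.pop(e["key"], None)
    return state

append("insert", "vec1", [0.1, 0.2])
append("insert", "vec2", [0.3, 0.4])
append("delete", "vec1")
print(replay(log))  # {'vec2': [0.3, 0.4]}
```

Backfill then amounts to loading a stored snapshot and replaying only the log entries that came after it, rather than the whole history.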



Quote for the day:

"Expression is saying what you wish to say, Impression is saying what others wish to listen." -- Krishna Sagar

Daily Tech Digest - August 01, 2021

For tech firms, the risk of not preparing for leadership changes is huge

Tech execs should be more rigorous about succession planning for one important reason: institutional memory. Tech firms generally are younger than other companies of a similar size, which partly explains why the median age of S&P 500 companies plunged to 33 years in 2018 from 85 years in 2000, according to McKinsey & Co. These enterprises clearly have accomplished a lot in their short lives, but in their haste, most have not captured their history, unlike their longer-lived peers in other sectors. Less than half of these tech firms, in fact, have formally recorded their leader’s story for posterity. That puts them at a disadvantage when, inevitably, they will be required to onboard newcomers to their C-suites. It’s best to record this history well before the intense swirl of a leadership transition begins. Crucially, it will help the incoming and future generations of leadership understand critical aspects of its track record, the lessons learned, culture and identity. It also explains why the organization has evolved as it has, what binds people together and what may trigger resistance based on previous experience. It’s as much about moving forward as looking back.


The importance of having accountability in AI ethics

In recent years, the EU has made conscious steps towards addressing some of these issues, laying the groundwork for proper regulation of the technology. Its most recent proposals revealed plans to classify different AI applications according to their risks. Restrictions are set to be introduced on uses of the technology that are identified as high-risk, with potential fines for violations. Fines could be up to 6pc of global turnover or €30m, whichever is higher. But policing AI systems can be a complicated arena. Joanna J Bryson is professor of ethics and technology at the Hertie School of Governance in Berlin, whose research focuses on the impact of technology on human cooperation as well as AI and ICT governance. She is also a speaker at EmTech Europe 2021, which is currently taking place in Belfast as well as online. Bryson holds degrees in psychology and artificial intelligence from the University of Chicago, the University of Edinburgh and MIT. It was during her time at MIT in the 90s that she really started to pick up on the ethics around AI.


Data Platform: Data Ingestion Engine for Data Lake

When we design and build a Data Platform, we always need to evaluate whether automation provides enough value to compensate for the team's effort and time. Time is the only resource that we cannot scale. We can grow the team, but the relationship between headcount and productivity is not linear. Sometimes, when a team is very focused on the automation paradigm, people want to automate everything, even actions that happen only once or do not provide real value. ... Usually, this is not an easy decision, and it has to be evaluated by the whole team. In the end, it is an ROI decision. I don't like this concept very much because it often focuses on economic costs and forgets about people and teams. Before starting any design and development, we have to analyze whether there are tools available to cover our needs. As software engineers, we often want to develop our own software. But, from a team or product view, we should focus our efforts on the most valuable components and features. The goal of the Data Ingestion Engine is to make it easier to ingest data from data sources into our Data Platform by providing a standard, resilient and automated ingestion layer.
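A "standard, automated ingestion layer" usually comes down to a small declarative contract per source, with one generic runner behind it, so onboarding a new source means adding configuration rather than code. A hypothetical sketch of that idea (the source names, formats and landing paths below are invented for illustration):

```python
# Config-driven ingestion: each source is declared once, and a single
# generic engine handles landing for all of them.
SOURCES = {
    "crm_accounts": {"format": "csv",  "landing_path": "raw/crm/accounts/"},
    "web_events":   {"format": "json", "landing_path": "raw/web/events/"},
}

def ingest(source_name, records):
    """Generic runner: look up the source's declared contract, then land
    the batch (here we just report what would be written)."""
    cfg = SOURCES[source_name]
    return {
        "source": source_name,
        "format": cfg["format"],
        "path": cfg["landing_path"],
        "rows": len(records),
    }

result = ingest("crm_accounts", [{"id": 1}, {"id": 2}])
print(result["rows"])  # 2
```

The same structure extends naturally to schema validation, retries and scheduling, which is where the "resilient" part of the layer lives.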


Beyond OAuth? GNAP for Next Generation Authentication

With GNAP, a client can ask for multiple access tokens in one grant request (vs. multiple requests). For instance, you could request read privileges on one resource and read and write privileges on another. ... In GNAP, the requesting client declares what kinds of interactions it supports. The authorization server responds to the request with an interaction to be used to communicate with the resource owner or the resource client. These interactions are defined in the GNAP spec as first-class objects, which provides extension points for future communication. Interactions may include redirecting the browser, opening a deeplink URL in a mobile application or providing a user code to be used elsewhere. ... GNAP provides a grant identifier if the authorization server determines a grant can be continued, unlike OAuth2. In the sample below, the grant identifier, access_token.value, can be presented to the authorization server if the grant needs to be modified or continued after the initial request.
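A grant request asking for two differently scoped tokens in one round trip looks roughly like the JSON built below. The field names follow the GNAP draft's grant request structure (an `access_token` array whose entries carry a `label`); the resource types, labels and client details are hypothetical:

```python
import json

# One GNAP grant request carrying two token requests with different
# access rights; the labels distinguish the tokens in the AS's response.
grant_request = {
    "access_token": [
        {"label": "reports-read",
         "access": [{"type": "reports-api", "actions": ["read"]}]},
        {"label": "files-rw",
         "access": [{"type": "files-api", "actions": ["read", "write"]}]},
    ],
    "client": {"key": {"proof": "httpsig"}},  # abbreviated client identification
    "interact": {"start": ["redirect"]},       # interactions this client supports
}

print(json.dumps(grant_request, indent=2))
# The request body would be POSTed as JSON to the AS's grant endpoint.
```

Compare this with OAuth2, where each differently scoped token generally requires its own authorization request.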


The Future Of Work Will Demand These 8 New Skills

Closely related to entrepreneurship is resilience. Humans are nothing if not adaptable, but embracing shifts and bouncing forward (rather than back) will require new competencies. The skill of resilience requires you to 1) stay aware of new information, 2) make sense of it, and 3) reinvent, innovate and solve problems. Finding fresh approaches and flexing based on your insights will be fundamental to success. ... Inherent in moving forward is the ability to believe in a positive future and focus on possibilities. When experts find fault with a lack of responsiveness, it’s often the result of a lack of imagination. The skills of being able to envision and foresee what might happen are critical to staying motivated, inspired and driven to create new beginnings. ... Success has always been about your network, but achievement in the future will depend even more on the strength of relationships. Your social capital and your primary, secondary and tertiary relationships will be critical netting, offering you new learning, access to new opportunities and social support. The new skill will be the ability to build rapport -- and to build it quickly and from a distance.


Will Artificial Intelligence Be the End of Web Design & Development

Whilst there has been plenty of hype in recent years around the impact AI will have on the website design and development community, the reality is that Artificial (Design) Intelligence technology is still very much in its infancy, and there’s a long way to go before we see web designers and developers replaced by robots. AI-powered platforms and tools are actually making digital creatives and engineers more productive and more effective, allowing them to produce higher-quality digital experiences at a lower cost. The concept behind using Artificial Intelligence to create websites is quite simple: AI-powered code-completion tools “make” a website on their own, and machine learning is then leveraged to optimize the user interface – entirely through adaptive intelligence, with minimal human intervention. ... The power of human creativity brings with it an innate curiosity; we are always looking to challenge the status quo and experiment with new forms and aesthetics. Creativity will always be a human endeavor.


Intelligent ERP: What It Takes To Thrive In A World Of Big Data

While challenging, this requirement led to an innovation that helped the payment services provider optimize its financial operations and better understand and expand its business. ZPS collaborated with the University of Seville in Spain to build a customized cash-flow model to uncover valuable liquidity and financial planning insights. Within this guarantee-monitoring model, ZPS uses Intelligent ERP to replicate data on contract accounts receivable in near-real time to a business warehousing solution and other reporting applications. An in-memory database then processes the data, calculates key figures such as customer cash-in and factoring cash-outs, and uses these figures to determine the amounts to be guaranteed each day. Furthermore, with a live connection to its business warehousing solution, ZPS uses a cloud-based analytics solution to let employees access calculated data and consume reports through intuitive dashboards and predictive stories. By amplifying the value of its Big Data with Intelligent ERP and augmented analytics, ZPS allows a larger circle of business users to gain insights into financial KPIs, such as gross customer cash-ins or days from order. 


Is McKinsey wrong about the financial benefits of diversity?

The authors emphasize that this isn’t definitive proof that there is no connection between racial and ethnic diversity and profits—more research is needed on that front. They also note several other important caveats, including that S&P 500 companies are not a random sample of public US firms, and that their method of identifying race and ethnicity among executives (using faces and names) is likely to overestimate the number of white executives. But they criticize McKinsey’s methodology, including its metric for measuring diversity among executives. They conclude that “caution is warranted in relying on McKinsey’s findings to support the view that US publicly traded firms can deliver improved financial performance if they increase the racial/ethnic diversity of their executives.” Among the additional research that Green and Hand call for is a way to better examine whether there is any causal relationship between a firm’s diversity and its financial performance. McKinsey, by its own admission, is only looking at correlation. 


Data scientists continue to be the sexiest hires around

With the value of data science clear in the potential of these industries, there is no reason to believe data science will be anything but a growing profession for years to come. AI adoption alone has skyrocketed in recent years: half of all surveyed organizations say they have applied AI to fulfill at least one function, with many more intending to invest in data-driven solutions. As the accessibility and power of data become more common, so too does the need for data scientists. Today, data scientists must help businesses navigate a world of global data collection and applications. From securing business processes to meeting international data security standards to connecting new and vital patterns in business trends, data scientists are vital to the success of innumerable businesses across industries. One such measure they can be part of is setting global data security standards for various industries. Data science is still one of the sexiest jobs you can have because it increasingly means helping people and saving money. 


Stanford Researchers Put Deep Learning On A Data Diet

With the cost of deep learning model training on the rise, individual researchers and small organisations are settling for pre-trained models. Today, the likes of Google or Microsoft have budgets (read: millions of dollars) for training state-of-the-art language models. Meanwhile, efforts are underway to make the whole paradigm of training less daunting for everyone. Researchers are actively exploring ways to maximise training efficiency so that models run faster and use less memory. A common practice is to train small models until they converge and then apply a light compression technique. Techniques like parameter pruning have already become popular for reducing redundancies without sacrificing accuracy. In pruning, redundancies in the model parameters are identified, and the redundant, non-critical ones are removed. Identifying important training data plays a role in online and active learning. But how much of the data is superfluous? ... For instance, the capabilities of computer vision systems have improved greatly due to (a) deeper models with higher complexity, (b) increased computational power and (c) the availability of large-scale labeled data. 
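The parameter pruning mentioned above is often done by weight magnitude: weights whose absolute value falls below a threshold are assumed to be non-critical and are zeroed out. A minimal NumPy sketch of that idea (the function name and threshold choice are illustrative, not any framework's API):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    assert 0.0 <= sparsity < 1.0
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; weights at or below it are pruned.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 50% of a tiny weight matrix.
w = np.array([[0.9, -0.05], [0.02, -1.2]])
pruned = magnitude_prune(w, 0.5)  # keeps only 0.9 and -1.2
```

In practice, frameworks prune iteratively during or after training and then fine-tune, but the core selection step is this simple magnitude ranking.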



Quote for the day:

"Successful leadership requires positive self-regard fused with optimism about a desired outcome." -- Warren Bennis