Navigating the Crossroads of Data Confidentiality and AI
Striking a balance between ensuring data privacy and maximizing the
effectiveness of AI models can be quite complex. The more data we utilize for
training AI systems, the more accurate and powerful they become. However, this
practice often clashes with the need to safeguard privacy rights. Techniques
like federated learning offer a solution by allowing AI models to be trained
across distributed data sources without sharing raw information. For the
uninitiated, federated learning leverages edge computing to train local models
on data that never leaves the private environment (your phone, IoT devices,
corporate terminals, and so on). Once the local models are trained, their
updates are aggregated into a centralized model that can be used for related
use cases. ... Due to the recent acceleration in the adoption of AI, government
regulations play a pivotal role in shaping the future of AI and data
confidentiality. Legislators are increasingly recognizing the significance of
data privacy and are implementing laws such as the General Data Protection
Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA).
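To make the federated learning flow described above concrete, here is a minimal sketch of federated averaging in Python; the linear models, synthetic client data, and hyperparameters are illustrative assumptions rather than any particular framework's implementation.

```python
# Minimal federated-averaging (FedAvg) sketch with synthetic client data.
import numpy as np

rng = np.random.default_rng(0)

def local_train(weights, X, y, lr=0.1, epochs=20):
    """Train a linear model locally with gradient descent; raw data stays put."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

# Each "client" (phone, IoT device, corporate terminal) holds its own data.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(5):
    # Only model weights leave each device, never the raw records.
    local_weights = [local_train(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_weights, axis=0)   # central server averages updates

print("aggregated model weights:", global_w)
```

Each simulated client trains on its own partition and only the resulting weights are averaged centrally, which mirrors the privacy property described above: raw records never leave the local environment.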
CISOs vs. developers: A battle over security priorities
“Developers and CISOs juggle numerous security priorities, often conflicting
across organizations,” noted Luke Shoberg, Global CISO at Sequoia Capital. “The
report emphasizes the need for internal assessments, fostering deeper
collaboration, and building trust among teams managing this critical domain.
Recognizing technical and cultural obstacles, organizations have made
significant strides in understanding the importance of securing the software
supply chain for sustained business success.” “The world of software consumption
and security has radically changed. From containers to the explosion of open
source components, every motion has been toward empowering developers to build
faster and better,” said Avon Puri, Global Chief Digital Officer at Sequoia
Capital. “But with that progress, the security paradigm has been challenged to
refocus on better controls and guarantees for the provenance of where software
artifacts come from and that their integrity is being maintained. The survey
shows developers and security teams are wrestling with this new reality in the
wake of major exploits like Log4j and SolarWinds.”
Deception technology use to grow in 2024 and proliferate in 2025
It's worth mentioning that all scanning, data collection, processing, and
analysis will be continuous to keep up with changes to the hybrid IT
environment, security defenses, and the threat landscape. When organizations
implement a new SaaS service, deploy a production application, or make changes
to their infrastructure, the deception engine notes these changes and adjusts
its deception techniques accordingly. Unlike traditional honeypots, burgeoning
deception technologies won't require cutting-edge knowledge or complex setup.
While some advanced organizations may customize their deception networks, many
firms will opt for default settings. In most cases, basic configurations will
sufficiently confound adversaries. Remember, too, that deception elements like
decoys and lures remain invisible to legitimate users. Therefore, when someone
goes poking at a breadcrumb or canary token, it is a high-fidelity signal that
they are up to no good. In this way, deception technology can also help organizations
improve security operations around threat detection and response.
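As a rough illustration of how a tripped canary can feed detection and response, the sketch below stands up a decoy HTTP endpoint that legitimate users would never touch; the path, port, and logging destination are hypothetical placeholders, not a reference to any particular deception product.

```python
# Minimal canary-token sketch: any request to the decoy path raises an alert.
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.WARNING, format="%(asctime)s %(message)s")

CANARY_PATH = "/internal/payroll-backup.zip"   # decoy "breadcrumb", not a real asset

class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == CANARY_PATH:
            # No legitimate workflow references this path, so a request here
            # almost certainly indicates reconnaissance or lateral movement.
            logging.warning("CANARY TRIPPED by %s requesting %s",
                            self.client_address[0], self.path)
        self.send_response(404)          # look unremarkable to the visitor
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CanaryHandler).serve_forever()
```

In a real deployment the alert would be routed to the SOC rather than a local log, and the deception engine would plant many such decoys automatically as the environment changes.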
What Role Will Open-Source Hardware Play in Future Designs?
The extent of open-source hardware’s impact on electronics design is still
uncertain. While it could deliver all these benefits, it also faces
several challenges to mainstream adoption. The most significant of these is the
volatility and high costs of the necessary raw materials. Roughly 70% of all
silicon materials come from China. This centralization makes prices prone to
fluctuations from local disruptions in China or throughout the supply chain.
Similarly, long shipping distances raise related prices for U.S. developers.
Even if integrated circuit design becomes more accessible, these costs put
production out of reach for many, slowing open-source devices’ growth. Moreover,
industry giants may be unwilling to accept the open-source movement. While
open-source designs open new revenue streams, these market leaders profit
greatly from their proprietary resources. The semiconductor fabs supporting
these large companies are even more centralized. It may be difficult for
open-source hardware to compete if these organizations don’t embrace the
movement.
How Should Developers Respond to AI?
“Unionizing against AI” wasn’t a specific goal, Quick clarified in an email
interview with The New Stack. He’d meant it as an example of just how much
influence can come from a united community. “My main thought is around
the power that comes with a group of people that are working together.” Quick
noted what happened when the United Auto Workers went on strike. “We are seeing
big changes happening because the people decided collectively they needed more
money, benefits, etc. I can only begin to guess at what an AI-related scenario
would be, but maybe in the future, it takes people coming together to push for
change on regulation, laws, limitations, etc.” Even this remains a concept more
than any tangible movement, Quick stressed in his email. “Honestly, I don’t have
much more specific actions or goals right now. We’re just so early on that all
we can do is guess.” But there is another scenario where Quick thinks community
action would be necessary to push for change: the hot-button issue of “who owns
the code.”
Security, privacy, and generative AI
For many of the proposed applications in which LLMs should excel, delivering
false responses can have serious consequences. Luckily, many of the mainstream
LLMs have been trained on numerous sources of data. This allows these models to
speak on a diverse set of topics with some fidelity. However, there is typically
insufficient knowledge around specialized domains in which data is relatively
sparse, such as deep technical topics in medicine, academia, or cybersecurity.
As such, these large base models are typically further refined via a process
called fine-tuning. Fine-tuning allows these models to achieve better alignment
with the desired domain. Fine-tuning has become such a pivotal advantage that
even OpenAI recently released support for this capability to compete with
open-source models. With these considerations in mind, consumers of LLM products
who want the best possible outputs, with minimal errors, must understand the
data on which the LLM is trained (or fine-tuned) to ensure optimal usage and
applicability.
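As a rough sketch of what fine-tuning on a specialized domain looks like in practice, the snippet below uses the Hugging Face Trainer API with a small open model; the model choice ("distilgpt2"), the two toy cybersecurity sentences, and the training settings are illustrative assumptions, and a real effort would need a much larger, carefully vetted dataset.

```python
# Minimal causal-LM fine-tuning sketch on a tiny, hypothetical domain corpus.
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

domain_texts = [  # stand-in for sparse, specialized domain data
    "CVE-2021-44228 allows remote code execution via JNDI lookups in Log4j.",
    "Indicators of compromise include anomalous LDAP callbacks from app servers.",
]

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 family has no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

class DomainDataset(Dataset):
    def __init__(self, texts):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=64, return_tensors="pt")
    def __len__(self):
        return self.enc["input_ids"].shape[0]
    def __getitem__(self, i):
        ids = self.enc["input_ids"][i]
        return {"input_ids": ids,
                "attention_mask": self.enc["attention_mask"][i],
                "labels": ids.clone()}              # causal LM: labels = inputs

args = TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                         per_device_train_batch_size=2, report_to="none")
Trainer(model=model, args=args, train_dataset=DomainDataset(domain_texts)).train()
```

The point of the exercise is the one made above: whoever consumes the resulting model needs to know exactly what went into this dataset, because that is what now shapes its answers in the specialized domain.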
How to keep remote workers connected to company culture
As important as workplace collaboration and communication tools are, technology
alone can’t keep remote workers engaged with business objectives. Before the
pandemic, auto finance firm Credit Acceptance centered its operations around
in-person interactions in its offices, for which it got accolades; after
COVID-19 arrived, the company’s 2,200 employees had to work remotely. “You
didn't work from home at all – [only in] rare circumstances,” said Wendy
Rummler, chief people officer at Credit Acceptance. “We considered our culture
too important, [we believed that] we couldn't maintain it if we had a fully
remote workforce, or even partially for that matter.” Fast forward a couple of
years and the picture is markedly different, with almost all staffers now
fully remote. Internal pulse surveys have found that employee engagement has
remained as high as before the pandemic, said Rummler. This is no accident, she
said; Credit Acceptance deliberately set out to maintain its work culture
without regular person-to-person interactions.
Should AI Require Societal Informed Consent?
The concept of societal informed consent has been discussed in engineering
ethics literature for more than a decade, and yet the idea has not found its way
into society, where the average person goes about their day assuming that
technology is generally helpful and not too risky. In most cases that
assumption holds, but not in all. As artificial
intelligence grows more powerful and is applied to more new fields (many of
which may be inappropriate), these cases will multiply. How will technology
producers know when their technologies are not wanted if they never ask the
public? ... One of the characteristics of a representative democracy is that --
at least in theory -- our elected officials are looking out for the well-being
of the public. ... It is time for the government and the public to have a new
conversation, one about technology -- specifically artificial intelligence. In
the past we’ve always given technology the benefit of the doubt; tech was
“innocent until proven guilty,” and a long-familiar phrase in and around
Silicon Valley has been “it’s better to ask forgiveness, not permission.” We no
longer live in that world.
Harnessing the potential of generative AI in marketing
Augmenting human creativity with the power of generative AI holds so much
promise that the use cases we know now are only the tip of the proverbial
iceberg. Companies that are looking to get a head start should, therefore,
ensure that they have laid down the foundations for doing so. An important
consideration in deploying generative AI is the availability of data.
Contextualisation is a key benefit of generative AI and large language models
(LLMs). But for enterprises with legacy, on-premise systems, their data is
usually isolated within silos. Organisations looking to deploy generative AI
solutions for their marketing efforts should leverage cloud data platforms to
unify all their internal data. Aside from breaking down silos, businesses should
also ensure seamless access to all their data. A lot of the data generated by
marketing teams is either unstructured or semi-structured, such as social media
posts, emails, and text documents. Marketing teams should ensure
that their cloud data platforms can load, integrate, and analyse all types of
data.
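As a small illustration of working with such semi-structured marketing data, the sketch below flattens hypothetical social-media post records with pandas so they can be joined with structured data once loaded into a cloud data platform; the field names and values are made up for the example.

```python
# Flatten semi-structured (nested JSON) marketing records into tabular form.
import pandas as pd

posts = [   # semi-structured records: nested fields, optional keys
    {"id": "p1", "channel": "twitter",
     "metrics": {"likes": 120, "shares": 45},
     "author": {"handle": "@brand", "followers": 53000}},
    {"id": "p2", "channel": "instagram",
     "metrics": {"likes": 980},                 # "shares" missing here
     "author": {"handle": "@brand"}},
]

# Nested fields become columns, so the posts can be analysed alongside
# structured sources such as CRM records or campaign spend.
df = pd.json_normalize(posts, sep="_")
print(df[["id", "channel", "metrics_likes", "metrics_shares"]])
```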
Managing Missing Data in Analytics
Missing at Random (MAR) is a very common missing data situation encountered by
data scientists and machine learning engineers. This is mainly because MCAR and
MNAR-related problems are handled by the IT department, and data issues are
addressed by the data team. MAR data imputation is a method of substituting
missing data with a suitable value. Some commonly used data imputation methods
for MAR are hot-deck, cold-deck, and regression imputation. In hot-deck
imputation, a missing value is filled in from a randomly selected record drawn
from a pool of similar records; because the donor is chosen at random, each
candidate record has an equal probability of being selected. In cold-deck
imputation, no random selection is used; instead, a deterministic function,
such as the arithmetic mean, median, or mode, supplies the value. With
regression imputation, for example multiple linear regression (MLR), the values
of the independent variables are used to predict the missing values of the
dependent variable: the regression model is first fitted, then validated, and
finally the missing values are predicted and imputed.
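The sketch below shows simple Python versions of these three approaches on a tiny synthetic dataset; the column names are hypothetical, and the hot-deck donor pool here is simply all observed values rather than a similarity-matched subset.

```python
# Hot-deck, cold-deck (mean), and regression imputation on synthetic MAR data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.integers(25, 60, 20).astype(float)})
df["salary"] = 1000 * df["age"] + rng.normal(0, 2000, 20)
df.loc[rng.choice(df.index, 5, replace=False), "salary"] = np.nan  # MAR gaps

# Hot-deck: replace a missing value with a randomly chosen observed donor value.
observed = df["salary"].dropna()
hot_deck = df["salary"].fillna(
    pd.Series(rng.choice(observed, len(df)), index=df.index))

# Cold-deck style: replace with a deterministic statistic such as the mean.
cold_deck = df["salary"].fillna(df["salary"].mean())

# Regression imputation: fit salary ~ age on complete rows, predict the gaps.
complete = df.dropna()
model = LinearRegression().fit(complete[["age"]], complete["salary"])
missing = df["salary"].isna()
regressed = df["salary"].copy()
regressed[missing] = model.predict(df.loc[missing, ["age"]])

print(pd.DataFrame({"hot_deck": hot_deck, "cold_deck": cold_deck,
                    "regression": regressed}))
```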
Quote for the day:
"Failure isn't fatal, but failure to
change might be" -- John Wooden