Daily Tech Digest - January 01, 2025

The Architect’s Guide to Open Table Formats and Object Storage

Data lakehouse architectures are purposefully designed to leverage the scalability and cost-effectiveness of object storage systems, such as Amazon Web Services (AWS) S3, Google Cloud Storage and Azure Blob Storage. This integration enables the seamless management of diverse data types (structured, semi-structured and unstructured) within a unified platform. ... The open table formats also incorporate features designed to boost performance. These features must be configured properly and put to use for a fully optimized stack. One such feature is efficient metadata handling, where metadata is managed separately from the data, which enables faster query planning and execution. Data partitioning organizes data into subsets, improving query performance by reducing the amount of data scanned during operations. Support for schema evolution allows table formats to adapt to changes in data structure without extensive data rewrites, ensuring flexibility while minimizing processing overhead.
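
The partitioning idea in the excerpt can be illustrated with a minimal sketch in plain Python. It is not the API of any particular table format: the helper names `partition_rows` and `query`, the `event_date` partition key and the sample rows are all hypothetical, chosen only to show how a query that filters on the partition key scans one subset instead of the whole table.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows into partitions keyed by the value of `key` (hypothetical helper)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def query(partitions, key_value, predicate=lambda row: True):
    """Scan only the partition for `key_value`; return (matches, rows_scanned)."""
    candidate = partitions.get(key_value, [])
    matches = [row for row in candidate if predicate(row)]
    return matches, len(candidate)


# Illustrative data only; a real table would hold files in object storage.
rows = [
    {"event_date": "2025-01-01", "user": "a", "amount": 10},
    {"event_date": "2025-01-01", "user": "b", "amount": 25},
    {"event_date": "2025-01-02", "user": "a", "amount": 7},
    {"event_date": "2025-01-03", "user": "c", "amount": 40},
]

partitions = partition_rows(rows, "event_date")
matches, scanned = query(partitions, "2025-01-01", lambda r: r["amount"] > 20)
print(f"scanned {scanned} of {len(rows)} rows, matched {len(matches)}")
```

In this toy case the query touches only the rows in the `2025-01-01` partition. Real table formats apply the same principle at the level of data files, using the separately managed metadata to decide which files to skip.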


The future of open source will be messy

First, it’s important to point out that open source software is both pervasive and foundational. Where would we be without Linux and the vast treasure trove of other open source projects on which the internet is built? However, the vast majority of software, written for use or sale, is not open source. This has always been true. Developers do care about open source, and for good reason, but it is not their top concern. As Redis CEO Rowan Trollope told me in a recent interview, “If you’re the average developer, what you really care about is capability: Does this [software] offer something unique and differentiated that’s awesome that I need in my application.” ... Meanwhile, Meta and the rest of the industry keep releasing new code, calling it open source or open weights (Sam Johnston offers a great analysis), without much concern for what the OSI or anyone else thinks. Johnston may be exaggerating when he says, “The more [the word] open appears in an artificial intelligence product’s branding, the less open it actually tends to be,” but it’s clear that the term open gets used a lot, starting with category leader OpenAI, which is not open in any discernible sense, without much concern for any traditional definitions. 


What’s next for generative AI in 2025?

“Data is the lifeblood of any AI initiative, and the success of these projects hinges on the quality of the data that feeds the models,” said Andrew Joiner, CEO of Hyperscience, which develops AI-based office work automation tools. “Alarmingly, three out of five decision makers report their lack of understanding of their own data inhibits their ability to utilize genAI to its maximum potential. The true potential…lies in adopting tailored SLMs, which can transform document processing and enhance operational efficiency.” Gartner recommends that organizations customize SLMs to specific needs for better accuracy, robustness, and efficiency. “Task specialization improves alignment, while embedding static organizational knowledge reduces costs. Dynamic information can still be provided as needed, making this hybrid approach both effective and efficient,” the research firm said. ... While Agentic AI architectures are a top emerging technology, they’re still two years away from reaching the lofty automation expected of them, according to Forrester. While companies are eager to push genAI into complex tasks through AI agents, the technology remains challenging to develop because it mostly relies on synergies between multiple models, customization through retrieval augmented generation (RAG), and specialized expertise. 


The Perils of Security Debt: Serious Pitfalls to Avoid

Security debt is caused by a failure to “build security in” to software from design to deployment as part of the SDLC. Security debt accumulates when a development organization releases software with known issues, deferring the remediation of its weaknesses and vulnerabilities. Sometimes the organization skips certain test cases or scenarios in pursuit of faster deployment and, in the process, fails to test the software thoroughly. Sometimes the business decides that the pressure to finish a project is so great that it makes more sense to release now and fix issues later. Later is better than never, but when “later” never arrives, existing security debt becomes worse. ... Great leadership is the beacon that not only charts the course but also ensures your crew – your IT team, support staff, and engineers – are well-prepared to face the challenges ahead. It instills discipline, vigilance, and a culture of security that can withstand the fiercest digital storms. The Board and leadership must understand and champion the importance of security for the organization. By setting the tone at the top, they can drive the cultural and procedural changes needed to prevent the accumulation of security debt. Periodic review and monitoring of security metrics, and identifying and tracking security debt as a risk, can help keep the organization accountable and on track.


The long-term impacts of AI on networking

Every enterprise that self-hosted AI told me the mission demanded more bandwidth to support “horizontal” traffic than their normal applications, more than their current data center needed to support. Ten of the group said that this meant they’d need the “cluster” of AI servers to have faster Ethernet connections and higher-capacity switches. Everyone agreed that a real production deployment of on-premises AI would need new network devices, and fifteen said they bought new switches even for their large-scale trials. The biggest problem with the data center network I heard from those with experience is that they believed they built up more of an AI cluster than they needed. Running a popular LLM, they said, requires hundreds of GPUs and servers, but small language models can run on a single system, and a third of current self-hosting enterprises said they believed it is best to start small, with small models, and build up only when you have experience and can demonstrate a need. This same group also pointed out that control was needed to ensure only truly useful AI applications were run. “Applications otherwise build up, exceed, and then increase, the size of the AI cluster,” said users. 


Bridging Skill Gaps in the Automotive Industry with AI-Led Immersive Simulations

This crisis of personnel shortfall is particularly acute in sectors like autonomous driving and AI-driven manufacturing, where the required skillset surpasses the capabilities of the current workforce. This alarming shortage of specialised expertise poses a serious threat to the industry’s progress. It could potentially lead to production halts at various facilities, delay the launch of next-generation vehicles, and hinder the transition to self-driving cars powered by sustainable energy. In order to address this issue, orthodox educational methods must be modernised to incorporate cutting-edge technologies like AI and robotics. ... Unlike traditional training, which often involves static lessons or expensive hands-on practice, immersive simulations allow workers to practice in environments that would be too risky or costly in real life. For example, with autonomous vehicles, workers can practice fixing and calibrating vehicle systems in a virtual world without the risk of damaging anything. These simulations can also create different road conditions for workers to experience, helping them build critical decision-making skills without real-world consequences. 


AI agents might be the new workforce, but they still need a manager

AI agents need to be thoughtfully managed, just as is the case with human work, and there's work to be done before an agentic AI-driven workforce can truly assume a broad range of tasks. "While the promise of agentic AI is evident, we are several years away from widespread agentic AI adoption at the enterprise level," said Scott Beechuk, partner with Norwest Venture Partners. "Agents must be trustworthy given their potential role in automating mission-critical business processes." The traceability of AI agents' actions is one issue. "Many tools have a hard time explaining how they arrived at their responses from users' sensitive data and models struggle to generalize beyond what they have learned," said Ananthakrishnan. ... Unpredictability is a related challenge, as LLMs "operate like black boxes," said Beechuk. "It's hard for users and engineers to know if the AI has successfully completed its task and if it did so correctly." ... Human workers also are capable of collaborating easily and on a regular basis. For AI workers, it's a different story. "Because agents will interact with multiple systems and data stores, achieving comprehensive visibility is no easy task," said Ananthakrishnan. It's important to have visibility to capture each action an agent takes.


Change management: Achieve your goals with the right change model

You need a good leadership team of influential people who are all pulling in the same direction. This is the only way to implement upcoming changes and anchor them in the company. It is important to include people in the leadership team who have a great deal of influence and/or are well respected by the workforce. At the same time, these people must be fully committed to the planned change. ... Communication comes before implementation. Those affected must understand it to become participants or supporters. Initiating measures without first explaining the context to those involved would unnecessarily create unrest in the company. When communicating, it makes sense to proceed in several steps: the change team first informs the clients and gets a “go” from them. After that, the change team informs the managers so that they can answer questions from employees during company-wide communication. ... Quick wins must be realized and made visible to increase motivation. Quick wins should therefore also be identified when defining objectives, because success is important to ensure that the initial motivation does not fizzle out. Initial successes should be related to the overarching goal, because then they strengthen intrinsic motivation. Small successes can thus have a big impact.


Forrester on cybersecurity budgeting: 2025 will be the year of CISO fiscal accountability

Forrester sees the increasing adoption of AI and generative AI (gen AI) as driving the needed updates to infrastructure. “Any Gen AI project that we discussed with customers ultimately becomes a data integration project,” says Pascal Matska, vice president and research director at Forrester. “You have to invest into specific capabilities and platforms that run specific AI workloads in the most suitable infrastructure at the right price point, and also drive investments into cloud-native technologies such as Kubernetes and containers and modern data platforms that really are there to help you drive out some of the frictions that exist within the different business silos,” Matska continued. ... CISOs who drive gains in revenue advance their careers. “When something touches as much revenue as cybersecurity does, it is a core competency. And you can’t argue that it isn’t,” Jeff Pollard, VP and principal analyst at Forrester, said during his keynote titled “Cybersecurity Drives Revenue: How to Win Every Budget Battle” at the company’s Security and Risk Forum in 2022. Budgeting to protect revenue needs to start with the weakest, most at-risk areas. These include software supply chain security, API security, human risk management, and IoT/OT threat detection. 


Passkey technology is elegant, but it’s most definitely not usable security

"The problem with passkeys is that they're essentially a halfway house to a password manager, but tied to a specific platform in ways that aren't obvious to a user at all, and liable to easily leave them unable to access ... their accounts," wrote David Heinemeier Hansson, the Danish software engineer and programmer who created Ruby on Rails and is the CTO of web-based software development firm 37signals. "Much the same way that two-factor authentication can do, but worse, since you're not even aware of it." ... The security benefits of passkeys at the moment are also undermined by an undeniable truth. Of the hundreds of sites supporting passkeys, there isn't one I know of that allows users to ditch their password completely. The password is still mandatory. And with the exception of Google's Advanced Protection Program, I know of no sites that won't allow logins to fall back on passwords, often without any additional factor. ... Under the FIDO2 spec, the passkey can never leave the security key, except as an encrypted blob of bits when the passkey is being synced from one device to another. The secret key can be unlocked only when the user authenticates to the physical key using a PIN, password, or most commonly a fingerprint or face scan. If the user authenticates with a biometric, the biometric data never leaves the security key, just as it never leaves Android and iOS phones and computers running macOS or Windows.



Quote for the day:

"You are a true success when you help others be successful." -- Jon Gordon
