Daily Tech Digest - July 28, 2022

The Beautiful Lies of Machine Learning in Security

The biggest challenge in ML is the availability of relevant, usable data to solve your problem. For supervised ML, you need a large, correctly labeled dataset. To build a model that identifies cat photos, for example, you train the model on many photos of cats labeled "cat" and many photos of things that aren't cats labeled "not cat." If you don't have enough photos or they're poorly labeled, your model won't work well. In security, a well-known supervised ML use case is signatureless malware detection. Many endpoint protection platform (EPP) vendors use ML to label huge quantities of malicious samples and benign samples, training a model on "what malware looks like." These models can correctly identify evasive, mutating malware and other trickery where a file is altered enough to dodge a signature but remains malicious. ML doesn't match the signature; it predicts malice using another feature set and can often catch malware that signature-based methods miss. However, because ML models are probabilistic, there's a trade-off: ML can catch malware that signatures miss, but it may also miss malware that signatures catch.
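As a rough illustration of the supervised approach described above, here is a minimal sketch that trains a classifier on labeled feature vectors and thresholds a probability of maliciousness. The features, data, and model choice are illustrative assumptions only; real EPP vendors train far richer models on far larger labeled corpora.

```python
# Minimal sketch of supervised "malware vs. benign" classification.
# Features, data, and model choice are illustrative assumptions, not any vendor's method.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-file features: [entropy, imported_api_count, section_count, file_size_kb]
X = rng.random((1000, 4))
y = (X[:, 0] > 0.7).astype(int)  # stand-in labels: 1 = "malicious", 0 = "benign"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# The model outputs a probability of maliciousness rather than a signature match,
# so the detection threshold trades false negatives against false positives.
probs = model.predict_proba(X_test)[:, 1]
print("flagged as malware:", (probs > 0.5).sum(), "of", len(probs))
```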


6 Machine Learning Algorithms to Know About When Learning Data Science

Decision trees are models that resemble a tree-like structure containing decisions and possible outcomes. They consist of a root node, which forms the start of the tree; decision nodes, which are used to split the data based on a condition; and leaf nodes, which form the terminal points of the tree and the final outcome. Once a decision tree has been formed, we can use it to predict values when new data is presented to it. ... Random Forest is a supervised ensemble machine learning algorithm that aggregates the results from multiple decision trees, and can be applied to classification- and regression-based problems. Using the results from multiple decision trees is a simple concept and allows us to reduce the problems of overfitting and underfitting experienced with a single decision tree. To create a Random Forest we first randomly select a subset of samples and features from the main dataset, a process known as "bootstrapping". This data is then used to build a decision tree. Carrying out bootstrapping avoids the decision trees being highly correlated and improves model performance.
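A small scikit-learn sketch of the single-tree versus forest comparison described above; the dataset and hyperparameters are illustrative choices, not prescriptions.

```python
# Compare a single decision tree with a bagged ensemble of trees (Random Forest).
# Dataset and parameters are illustrative; the point is the aggregation step.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Each tree in the forest is trained on a bootstrap sample of rows plus a random
# subset of features, which decorrelates the trees before their votes are aggregated.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                bootstrap=True, random_state=42).fit(X_train, y_train)

print("single tree accuracy:", tree.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```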


Data science isn’t particularly sexy, but it’s more important than ever

Not only is data cleansing an essential part of data science, it's actually where data scientists spend as much as 80% of their time. It has ever been thus. As Mike Driscoll described in 2009, such "data munging" is a "painful process of cleaning, parsing and proofing one's data." Super sexy! Now add to that drudgery the very real likelihood that many enterprises, as excited as they are to jump into data science, lack "a suitable infrastructure in place to start getting value out of AI," as Jonny Brooks has articulated: "The data scientist likely came in to write smart machine learning algorithms to drive insight but can't do this because their first job is to sort out the data infrastructure and/or create analytic reports. In contrast, the company only wanted a chart that they could present in their board meeting each day. The company then gets frustrated because they don't see value being driven quickly enough and all of this leads to the data scientist being unhappy in their role." As I have written before: "Data scientists join a company to change the world through data, but quit when they realize they're merely taking out the data garbage."
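To make the "cleaning, parsing and proofing" step concrete, here is a small pandas sketch of typical data munging; the column names, values, and rules are invented for illustration only.

```python
# Toy example of "data munging": cleaning, parsing and proofing a raw extract.
# Column names and rules are invented for illustration only.
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [" 001", "002", "002", None],
    "signup_date": ["2022-07-01", "2022-07-03", "2022-07-03", "2022-07-09"],
    "revenue": ["1,200", "950", "950", "n/a"],
})

clean = (
    raw.dropna(subset=["customer_id"])            # proof: drop rows missing a key
       .assign(
           customer_id=lambda d: d["customer_id"].str.strip(),
           signup_date=lambda d: pd.to_datetime(d["signup_date"]),
           revenue=lambda d: pd.to_numeric(
               d["revenue"].str.replace(",", "", regex=False), errors="coerce"),
       )
       .drop_duplicates()                         # remove duplicate records
)
print(clean)
```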


Top 7 Skills Required to Become a Data Scientist

Having a deep understanding of machine learning and artificial intelligence is a must in order to implement tools and techniques such as decision trees and other algorithmic logic. These skill sets enable a data scientist to work on and solve complex problems, particularly those built around prediction or deciding future goals. Those who possess these skills will stand out as proficient professionals. With the help of machine learning and AI concepts, an individual can work on different algorithms and data-driven models, and can simultaneously handle large data sets, for example by cleaning data and removing redundancies. ... Establishing your career as a data science professional will also require the ability to handle complexity. One must be able to identify and develop both creative and effective solutions as and when required. Finding your way to such a solution often demands clarity in data science concepts: breaking a problem down into multiple parts and aligning them in a structured way.


The Psychology Of Courage: 7 Traits Of Courageous Leaders

Like so many complex psychological human characteristics, courage can be difficult to nail down. On the surface, courage seems like one of those “I know it when I see it” concepts. In my twenty years spent facilitating and coaching innovation, creativity, strategy and leadership programs, and in partnership with Dr. Glenn Geher of the Psychology Department of the State University of New York at New Paltz, I’ve identified behavioral attributes that often correlate with a person’s access to their courage. Each attribute has influential effects on organizational culture at all levels. Fostering these attributes in your own life (at work and beyond) and within your team can help you lead toward the courageous future you’re striving to achieve. ... Courage requires taking intentional risks. And the bigger the risk, the more courage it takes (and the bigger the outcome can be). Those who understand the importance of facing fear and being vulnerable, who accept that falling and getting up again is part of the journey, tend to have quicker access to their courage.


There is a path to replace TCP in the datacenter

"The problem with TCP is that it doesn't let us take advantage of the power of datacenter networks, the kind that make it possible to send really short messages back and forth between machines at these fine time scales," John Ousterhout, Professor of Computer Science at Stanford, told The Register. "With TCP you can't do that, the protocol was designed in so many ways that make it hard to do that." It's not like the realization of TCP's limitations is anything new. There has been progress to bust through some of the biggest problems, including in congestion control to solve the problem of machines sending to the same target at the same time, causing a backup through the network. But these are incremental tweaks to something that is inherently not suitable, especially for the largest datacenter applications (think Google and others). "Every design decision in TCP is wrong for the datacenter and the problem is, there's no one thing you can do to make it better, it has to change in almost every way, including the API, the very interface people use to send and receive data. It all has to change," he opined.


Typemock Simplifies .NET, C++ Unit Testing

When testing legacy code, you need to test small parts of the logic one by one, such as the behavior of a single function, method or class. To do that, the logic must be isolated from the legacy code, he explained. As Jennifer Riggins explained in a previous post, unit testing differs from integration testing, which focuses on the interaction between these units or components; unit testing catches errors at the unit level earlier, so the cost of fixing them is dramatically reduced. ... Typemock uses special code that can intercept the flow of the software: instead of calling the real code -- whether it's a real method or a virtual method -- it can intercept the call, and you can fake different things in the code, he said. Typemock has been around since 2004, when Lopian launched the company with Roy Osherove, a well-known figure in test-driven development. They first released Typemock Isolator in 2006, a tool for unit testing SharePoint, WCF and other .NET projects. Isolator provides an API that helps users write simple, human-readable tests that are completely isolated from the production code.
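The general idea of intercepting a real call and substituting a fake can be sketched with Python's unittest.mock, shown below. This is not Typemock's API (Typemock targets .NET and C++ and can fake constructs that ordinary mocking frameworks cannot); it is just a stand-in to show the intercept-and-fake pattern the article describes.

```python
# Illustration of intercepting a call and faking its result, using Python's
# unittest.mock as a stand-in. This is NOT Typemock's API; Typemock Isolator
# does the equivalent for .NET/C++ code, including legacy dependencies.
from unittest import mock

class BillingService:
    def charge(self, amount):
        raise RuntimeError("talks to a real payment gateway")  # legacy dependency

def process_order(amount):
    return "paid" if BillingService().charge(amount) else "failed"

def test_process_order_without_real_gateway():
    # Intercept the real method and fake its behavior for this test only.
    with mock.patch.object(BillingService, "charge", return_value=True):
        assert process_order(42) == "paid"

test_process_order_without_real_gateway()
print("test passed")
```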


Why Web 3.0 Will Change the Current State of the Attention Economy Drastically

The attention economy requires improvements, and Web 3.0 is capable of making them happen. In the foreseeable future, it will drastically change the interplay between consumers, advertisers and social media platforms. Web 3.0 will give power to the people. It may sound pompous, but it's true. How is that possible? Firstly, Web 3.0 will grant users ownership of their data, so you'll be able to treat your data like it's your property. Secondly, it will enable you to be paid for the work you do when making posts and giving likes on social media. Both options provide you with the opportunity to monetize the attention that you give and receive. The appealing thing about Web 3.0 is that it's all about honest ownership. If a piece of art can be an NFT with easily traceable ownership, your data can be too. If you own your data, you can monetize it or offer it on your terms, knowing who is going to use it and how. For instance, there is Permission, a tokenized Web 3.0 advertising platform that connects brands with consumers, with the latter getting crypto rewards for their data and engagement.


Serverless-first: implementing serverless architecture from the transformation outset

While a serverless-first mindset provides a range of benefits, some businesses may be hesitant to make the transition due to concerns around cloud provider security, vendor lock-in, sunk costs from other strategies and ongoing issues with debugging and development environments. However, even among the most serverless-averse, this mindset can provide benefits to a select part of an organisation. Take, for example, a bank's operations. While the maintenance of a traditional network infrastructure is crucial for uptime of the underlying database, a serverless approach gives the bank the freedom to adopt an agile mindset with consumer-facing apps and technologies as demand grows. Agile and serverless strategies typically go hand-in-hand, and both encourage quick development, modification and adaptation. In relation to concerns around vendor lock-in, some organisations may look towards a cloud-agnostic strategy. However, writing software for multiple clouds removes the ability to use features offered by one specific cloud, meaning any competitive advantage of using a specific vendor is lost.
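As a minimal illustration of what "serverless" means at the code level, here is the shape of an AWS Lambda-style Python handler for a consumer-facing endpoint. The event fields and function body are illustrative assumptions; other clouds use similar but vendor-specific signatures, which is one root of the lock-in concern mentioned above.

```python
# Minimal AWS Lambda-style handler sketch for a consumer-facing API endpoint.
# The event shape loosely follows an API Gateway proxy integration; field names
# are illustrative and differ between cloud vendors -- a source of lock-in concerns.
import json

def handler(event, context):
    # No servers to manage: the platform runs this function on demand and
    # scales it with traffic, which suits fast-moving consumer-facing features.
    body = json.loads(event.get("body") or "{}")
    customer_id = body.get("customer_id", "unknown")

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"request received for {customer_id}"}),
    }
```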


CISO in the Age of Convergence: Protecting OT and IT Networks

Pan Kamal, head of products at BluBracket, a provider of code security solutions, says one of the first steps an organization can take is to create an IT-OT convergence task force that maps out the asset inventory and then determines where IT security policy needs to be applied within the OT domain. “Review industry-specific cybersecurity regulations and prioritize implementation of mandatory security controls where called for,” Kamal adds. “I also recommend investing in a converged dashboard -- either off the shelf or create a custom dashboard that can identify vulnerabilities and threats and prioritize risk by criticality.” Then, organizations must examine the network architecture to see if secure connections with one-way communications -- via data diodes, for example -- can eliminate the possibility of an intruder coming in from the corporate network and pivoting to the OT network. Another key element is conducting a review of security policies related to both the equipment and the software supply chain, which can help identify secrets in code present in git repositories and help remediate them prior to the software ever being deployed.
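As one concrete piece of the supply-chain review mentioned above, the sketch below does a naive scan of a checked-out repository's files for hard-coded credentials. The patterns and paths are illustrative only; purpose-built secret-scanning tools such as BluBracket's also walk git history and use much richer detection.

```python
# Naive sketch of scanning a checked-out repository for hard-coded secrets.
# The regexes are illustrative only; real scanners also walk git history,
# use entropy checks, and cover many more credential formats.
import pathlib
import re

SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Generic API key": re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{12,}['\"]"),
}

def scan_repo(root="."):
    findings = []
    for path in pathlib.Path(root).rglob("*"):
        if path.is_dir() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(text):
                findings.append((str(path), name, match.group(0)[:12] + "..."))
    return findings

for path, kind, snippet in scan_repo():
    print(f"{path}: possible {kind}: {snippet}")
```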



Quote for the day:

"Inspired leaders move a business beyond problems into opportunities." -- Dr. Abraham Zaleznik
