
Daily Tech Digest - October 08, 2022

How to manage IT infrastructure in a fast-growing company: the DataRobot experience

With Jamf, we offered employees a new way to communicate with IT through the IT Self-Service application. It is, in effect, a portal through which company employees can change the status quo in established business processes. Our position: IT Self-Service is an employee’s first IT companion and the first line of IT help. The main idea of this service is to reduce the load on the IT team and the number of open HelpDesk tickets, which means more efficient use of the company’s IT resources. ... Since classical DevOps engineers were behind the automation of the company’s IT onboarding process, the scenario for preparing computers for onboarding was implemented with the world’s most popular DevOps configuration management system, Ansible, which is written in Python and uses the declarative markup language YAML for its playbooks. The approach worked well because it handled computer preparation for both the macOS and Ubuntu platforms with platform-dependent branching of the deployment script.
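
To make the idea concrete, here is a minimal Python sketch of the same platform-dependent branching, standing in for the actual Ansible playbook (which the article does not show); the package-manager commands are illustrative assumptions.

```python
import platform
import subprocess

def prepare_machine() -> None:
    """Branch the onboarding steps on the detected platform."""
    system = platform.system()
    if system == "Darwin":   # macOS branch
        subprocess.run(["brew", "install", "git"], check=True)
    elif system == "Linux":  # Ubuntu branch
        subprocess.run(["sudo", "apt-get", "install", "-y", "git"], check=True)
    else:
        raise RuntimeError(f"Unsupported platform: {system}")

if __name__ == "__main__":
    prepare_machine()
```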


How to make your APIs more discoverable

API discoverability is a key aspect of any API management initiative. The discoverability of an API directly impacts its adoption and usage. A typical large enterprise with multiple development teams might build hundreds of APIs that it wants to reuse internally or share with partners building complementary applications. If teams cannot discover existing APIs, they may build new APIs with the same functionality, leading to duplicated effort and underutilization of what already exists. It is also unscalable to contact an API’s developer every time someone wants to use it. There needs to be a better, more hands-off way for internal teams and partners to discover and understand these APIs without contacting the developers who built them. API discoverability does not just mean making an API easy to find by providing an inventory. It should also address the key aspects that matter to an API consumer: understanding the API through documentation, request and response formats, sign-up options, and (in the case of a partner) the business terms and conditions of using the API.


The long-term answer to fixing bias in AI systems

Some of these [long-term fix] recommendations are hard. For instance, one way these systems get biased is that they’re run by for-profit organizations. The usual players are Google, Facebook and Amazon. They are banking on their algorithms to optimize user engagement, which on the surface seems like a good idea. The problem is, people don’t engage with things just because they are good or relevant. More often, they engage with content that carries certain kinds of emotions, like fear or hatred, or certain kinds of conspiracy. Unfortunately, this focus on engagement is problematic, primarily because the average user engages with things that are often not verified but are entertaining. The algorithms essentially end up learning that, OK, that’s a good thing to do. This creates a vicious cycle. A longer-term solution is to start breaking the cycle, and that needs to happen from both sides. It needs to happen from these services, the tech companies that are chasing higher engagement. They need to start changing how they define engagement, or optimize their algorithms for something other than engagement.


Great leaders ask great questions: Here are 3 steps to up your questioning game.

Having a good arsenal of questions at one’s disposal is a must for any leader, but the staple of any leader’s arsenal is the open-ended question. Asking open-ended questions is like adjusting the lens of a camera, opening the aperture to create a wider field of view. This wider field sets a tone of receptivity, signaling that you are open to new information, in learning mode, and ready for a dialogue, not a monologue. ... You may have heard the term active listening. It involves paying close attention to words and nonverbal actions and providing feedback to improve mutual understanding. But have you ever stopped to consider passive listening? Passive listening also involves listening closely to the speaker, but without reacting. Instead, passive listening leaves space for silence. By combining both of these modes, we achieve what we call effective listening. ... One of the most powerful response techniques is the ability to ask questions. Questions frame the issue, remove ambiguity, expose gaps, reduce risk, give permission to engage, enable dialogue, uncover opportunities, and help to pressure-test logic.


The 10 Immutable Laws of Testing

The bug count measures what annoys our users the most - Bugs aren’t a measure of quality (quality is measured by things like fitness for purpose, reliable delivery, and cost), but bugs are what annoy our users most. If you don’t believe me, consider this: over 60% of users delete an app if it freezes, crashes or displays an error message. Cue P!nk. Bugs exist because we write them into our code: Complexity defeats good intentions - We all know where bugs come from: developers writing code (enabled by users who want new functionality). Bugs are the visible evidence that our code is sufficiently complicated that we don’t fully understand it. We don’t like creating bugs, wish we didn’t, and have developed some coping skills to address the problem ... but we still write bugs into our code. Bugs (like tchotchkes) accumulate over time, every time we add or change functionality, to be precise - Everyone has an Aunt Edna who never goes out without bringing home some new thing to put on a shelf. The inevitable result of creating software is more bugs (and, yes, more/better functionality).


Reliable Continuous Testing Requires Automation

Automation makes it possible to build a reliable continuous testing process that covers the functional and non-functional requirements of the software. Preferably, this automation should start at the beginning of product development to enable quick release and delivery of software and early feedback from users. ... We see more and more organizations trying to adopt the DevOps mindset and way of working. Velinov stated that software engineers, including QA engineers, have to care not only about how they develop, test, and deliver their software, but also about how they maintain and improve their live products. They have to think more and more about the end user. Velinov mentioned that a significant requirement is, and has always been, to deliver software solutions to production quickly, safely, and securely. That impacts continuous testing, as QA engineers have to adapt their processes to rely mainly on automation for quick and early feedback, he said.
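
As a minimal sketch of what such automation can look like, the pytest suite below covers one functional and one non-functional (latency) requirement; discount() is a hypothetical function standing in for real application code.

```python
import time

def discount(price: float, percent: float) -> float:
    """Hypothetical application code under test."""
    return round(price * (1 - percent / 100), 2)

def test_discount_functional():
    # Functional requirement: the arithmetic is correct.
    assert discount(100.0, 20) == 80.0

def test_discount_performance():
    # Non-functional requirement: a simple latency budget.
    start = time.perf_counter()
    for _ in range(10_000):
        discount(99.99, 15)
    assert time.perf_counter() - start < 0.5
```

Run with pytest; wiring a suite like this into a CI pipeline is what provides the quick, early feedback described above.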


Seven Principles I Follow To Be a Better Data Scientist

Data science is an ever-changing field, so keeping up with the latest trends and techniques is essential to performing consistently at work. For data scientists with a full-time job, it is unrealistic to spend weeks learning something new before applying it to working projects. We need to learn fast, and one way to achieve this is learning by doing. Rather than getting lost in the details and background of a new concept, the fastest way to grasp it is to follow a trustworthy practical tutorial and replicate it, then make customized changes to achieve better results in your own projects. Take learning the Random Forest algorithm as an example. We certainly need to know some basics about the algorithm, such as what it is and where it can be used. Then we simply use it in a current project, following some tutorials, and see what the results are, as in the sketch below. Blog posts with examples are a great way to educate yourself fast, compared to textbooks or online courses. Lastly, we troubleshoot the results and look for ways to improve the application of the algorithm.
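
A minimal scikit-learn sketch of that learn-by-doing loop, using a bundled dataset rather than a real project:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 1: replicate a basic tutorial-style Random Forest run.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Step 2: look at the results, then troubleshoot and customize.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("Largest feature importance:", model.feature_importances_.max())
```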


What Good Security Looks Like in a Cloudy World

When it comes to security issues and fixes, it is extremely important to be able to differentiate between new and old findings, because this eventually affects the next two pillars: prioritization and remediation. One of the things DevSecOps tools have made possible is a real-time understanding of what’s happening in our code, with processes aligned to developer workflows, such as fixes at commonly accepted gates like pull requests, and even earlier with pre-commit hooks or in-IDE alerts. The same approach we use to prevent issues from being merged into our code base through CI gating can be applied to runtime-related tools during the CD phase, preventing runtime-related issues from reaching production as well. So if we discover security flaws while we’re still coding, or in predeployment to production systems, they can be handled immediately, within the developer or operational context, and need never go into the backlog. This is a very important distinction between our categories of security issues.
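
As one illustration of such an early gate, here is a hedged sketch of a pre-commit hook that scans staged changes for secrets; the patterns are illustrative, not exhaustive, and real teams typically rely on dedicated scanners.

```python
import re
import subprocess
import sys

# Illustrative secret patterns (e.g., an AWS access key id).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)(api[_-]?key|password)\s*=\s*['\"][^'\"]+['\"]"),
]

def staged_diff() -> str:
    """Return the diff of changes staged for commit."""
    result = subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    )
    return result.stdout

def main() -> int:
    diff = staged_diff()
    for pattern in SECRET_PATTERNS:
        if pattern.search(diff):
            print(f"Possible secret matching {pattern.pattern!r}; commit blocked.")
            return 1  # a non-zero exit aborts the commit
    return 0

if __name__ == "__main__":
    sys.exit(main())
```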


Avoiding the Top Mistakes Made by Tech Startups

Scaling too quickly increases a startup's burn rate, reducing the time it has to demonstrate key metrics for its next funding round and other milestone events, Yépez explains. Such a startup can also trash trusted customer relationships by failing to deliver goods or services as promised. “That burned cash won’t come back, and neither will that customer,” he cautions. Conversely, limited funding forces some struggling businesses to assign staff members tasks that fall outside their skillsets. “These responsibilities often suffer from poor execution and may have severe consequences for the startup,” says Thomas Dolan, co-founder of 28Stone Consulting, an IT and fintech consulting firm. Many startups also neglect to protect their intellectual property. In their rush to market, some founders unwittingly disclose or offer their core technology to potential investors and other external parties. Such activity triggers deadlines for filing patent applications, says Kyle Graves, an attorney at law firm Snell & Wilmer.


Becoming “cloud smart” — the path to accelerated digital innovation

“Cloud chaos” comes from a landscape of unknowns. What is our enterprise cloud architecture? How do public and private clouds co-exist? What about edge computing? How do we align legal and compliance requirements in the multi-cloud world for heavily regulated industries such as fintech? Those daunting tasks and risks reflect the multi-cloud complexity and chaos we constantly live in. Having worked with many organisations transitioning away from “cloud chaos”, I see similar challenges regardless of the size of the business. It takes a vast amount of effort to architect and manage multi-cloud platforms. Think about scalability, interoperability, consistency, and a unified user experience. Think about the skill sets and knowledge required to build and operate cloud-native apps. Also, think about automating and optimising cloud management and architecting cloud and edge infrastructure. Think about connecting and securing apps and clouds. And finally, think about app security, legal, and compliance, among other areas. These challenges keep CIOs up at night.



Quote for the day:

"Be willing to make decisions. That's the most important quality in a good leader." -- General George S. Patton, Jr.

Daily Tech Digest - August 13, 2022

CEOs need to start caring about the cybersecurity talent gap crisis, new report shows

The focus on cybersecurity needs to start in the boardroom, Morgan argues. CEOs at every Fortune 500 company and midsize to large organization should advocate to have those with cybersecurity experience on their board, he says. “That could be the [chief information security officer (CISO)] or an outside executive with real-world cybersecurity experience,” he says. “Do it now to protect your organization, not after a breach or hack to protect your reputation.” By 2025, 35% of Fortune 500 companies will have board members with cybersecurity experience, according to the Cybersecurity Ventures report, and by 2031 that will climb to more than 50%. By comparison, last year just 17% of Fortune 500 companies had board members with this type of background. The thought is that if cybersecurity is a regular boardroom discussion, then the importance of it will trickle down to the rest of the organization, Morgan says, becoming a part of the company’s DNA. He encourages executives to take cybersecurity as seriously as profit and loss discussions.


5 elements of a successful digital platform

“Data is everything for us,” Rotenberg said. Making sure you have high quality data and that you can constantly iterate on it and improve it should be a priority when building a platform. “That’s something that we spend a lot of time on because it’s such an important foundation,” she said. One way the company uses it is to personalize the experience for clients. For example, this might mean using digital credentials. It may sound simple, but having the right mobile phone number means that Fidelity can interact with clients in the way they want. “Sometimes it’s the most basic things that actually make the biggest difference,” she said. ... There are a lot of different ways that fintechs and Fidelity could work with or against each other. “A fintech could be our competitor, our vendor, [or] we could be a client as well, and vice versa,” she said. Successful fintechs, in particular, usually have gotten something right in understanding a “customer friction” that other firms haven’t figured out. “They go deep in understanding the friction, they create success, and then they scale outward,” Rotenberg said. 


Top cybersecurity products unveiled at Black Hat 2022

Software composition analysis (SCA), static application security testing (SAST), and container scanning are the latest capabilities in the new update to the Cycode supply chain security management platform. All new components will add to Cycode’s knowledge graph, which structures and correlates data from the tools and phases of the software development life cycle to allow programmers and security professionals to understand risks and coordinate responses to threats. A key function of the knowledge graph includes the ability to coordinate security tools on the platform to do tasks such as identifying when leaked code contains secrets like API keys or passwords, in order to reduce risk. Support for vulnerability detection and protection across runtime environments, including the Java Virtual Machine (JVM), Node.js, and .NET CLR, has been added to the Application Security Module in the Dynatrace software and infrastructure monitoring platform. Additionally, Dynatrace has extended its support to applications running in Go, a fast-growing, open-source programming language developed at Google.


Google Cloud and Apollo24|7: Building Clinical Decision Support System (CDSS) together

For any health organization that wants to build a CDSS, one key building block is locating and extracting the medical entities present in clinical notes, medical journals, discharge summaries, and so on. Along with entity extraction, the other key components of a CDSS are capturing temporal relationships, subjects, and certainty assessments. ... The advantage of AutoML Entity Extraction is that it can be trained on a new dataset. One prerequisite to keep in mind, however, is that it needs a little pre-processing to capture the input data in the required JSONL format (sketched below). Since this AutoML model handles entity extraction only, it does not extract relationships, certainty assessments, etc. ... The major advantage of BERT-based models is that they can be fine-tuned on any entity recognition task with minimal effort. However, since this is a custom approach, it requires some technical expertise. Additionally, it does not extract relationships, certainty assessments, etc., which is one of the main limitations of using BERT-based models.
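
A sketch of that pre-processing step follows. The field names follow Google’s AutoML entity-extraction JSONL schema as best recalled and should be verified against the current documentation; the clinical note and label are made up.

```python
import json

note = "Patient reports fever for 3 days."
record = {
    "text_snippet": {"content": note},
    "annotations": [
        {
            "display_name": "SYMPTOM",
            "text_extraction": {
                # "fever" spans characters 16-21 of the note.
                "text_segment": {"start_offset": 16, "end_offset": 21}
            },
        }
    ],
}
# Each training example becomes one JSON object per line in a .jsonl file.
print(json.dumps(record))
```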


In a hybrid workforce world, what happens to all that office space?

Amy Loomis, a research director for IDC's worldwide Future of Work market research service, said her research isn't showing an overall reduction in square footage, but more companies may be subleasing unused space or reconfiguring it to better suit hybrid work. The key phrase is "space optimization," which is being done to attract new employees and for environmental sustainability. In North America, 34% of companies surveyed by IDC said that was a key driver in real estate investments. “What we’re seeing is repurposing of office space,” Loomis said. “Organizations are investing in office spaces and making them as dynamic, reconfigurable, and sustainable as possible. So, yes, they left that building during the pandemic and predominantly went remote and hybrid, but as people are going forward into the new office space, it’s more likely to be multi-purpose, multifunction, multi-tenant,” Loomis added. Many real estate developers now see the value in repurposing spaces to include not only room for commercial use, but also space for retail and even residential housing.


6 Myths About the Cloud That You Should Stop Believing

Cloud migration is an enticing prospect, but you’ve probably heard what happens when you have too much of a good thing. Going the cloud route and adopting cloud data integration doesn’t have to mean moving your entire business at once. Despite the recognized short- and long-term benefits, the expense alone would be too daunting for many. Cloud migration can take many forms. A hybrid approach to cloud technology is considerably more common, with many organizations starting with a particular area or application (such as email) and working their way up. ... True, virtualization is a vital technology for cloud computing, but virtualization doesn’t equal cloud computing. While virtualization is mainly concerned with workload and server consolidation to reduce infrastructure costs, cloud computing encompasses much more. Consider that, according to an IOUG (Independent Oracle User Group) study of its members, cloud clients are embracing Platform as a Service faster than Infrastructure as a Service.


Department of Health investigates bias in medical devices and algorithms

As part of an independent review on equity in medical devices, led by Margaret Whitehead, W.H. Duncan Chair of Public Health in the Department of Public Health and Policy, the government is seeking to tackle disparities in healthcare by gathering evidence on how medical devices and technologies may be biased against patients of different ethnicities, genders and other socio-demographic groups. For instance, some devices employing infrared light or imaging may not perform as well on patients with darker skin pigmentation, which has not been accounted for in the development and testing of the devices. Experts are being asked to provide as much information as possible about biases in medical devices. Along with information about the device type, name, brand or manufacturer, the independent review is also looking to gather as much detail as possible about the intended use of medical devices that may be discriminatory, the patient population on which they are used, and how and why these devices may not be equally effective or safe for all the intended patient groups.


Event-Driven Architectures & the Security Implications

It’s never easy to crush a rock, but it is far from impossible. Taking an existing application from traditional architecture to EDA requires extensive resources and development time. Also, while building something new can be exciting, reworking the old may be unstimulating, especially when it still seems functional. This can sometimes result in postponing such a drastic transition. However, this transformation can be quite enlightening—both from a technical and an operational viewpoint. Developers perceive EDA to be inherently complex, especially for businesses with intricate processes. There is the concern that EDA does not effectively capture critical aspects of a company and that monitoring and debugging the system is more challenging because of the lack of a centralized structure. However, this complexity does not simply disappear by opting for a different architecture. Monitoring and debugging are easier with suitable tracing tools that are tailor-made for distributed systems, proper encapsulation of individual services, and an in-depth understanding of the functions of individual services and the events that should trigger them. 
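
A toy in-process sketch of the pattern helps make the encapsulation point concrete: each “service” below registers only for the events that should trigger it (a real system would use a message broker and asynchronous delivery).

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]
_subscribers: dict[str, list[Handler]] = defaultdict(list)

def subscribe(event_type: str, handler: Handler) -> None:
    _subscribers[event_type].append(handler)

def publish(event_type: str, payload: dict[str, Any]) -> None:
    for handler in _subscribers[event_type]:
        handler(payload)  # a real broker would deliver asynchronously

# Two decoupled services reacting to the same business event.
subscribe("order.created", lambda e: print("billing: invoice order", e["order_id"]))
subscribe("order.created", lambda e: print("audit: record order", e["order_id"]))
publish("order.created", {"order_id": 42})
```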


Composing the future of banks

The biggest challenge for any bank is how to reach such a vision of composable banking when, over decades of investment in technology automation, it has accumulated hundreds or thousands of systems: some sharing data through extraction, some integrated through technical bridges, and perhaps a few more modern solutions connected through APIs. Integration is one of the biggest headaches a bank has, so composable banking would be simpler if every system had APIs, but that just isn’t the real world. In addition, not every process is based on system-to-system interaction. There are processes that require human intervention, often managed by business process automation software. Sometimes these processes are necessary because systems integration may not be possible without them: the swivel-chair problem of keying data from one system into another. In the last few years, artificial intelligence (AI) has been added to the mix to make the routing of flows smarter. As always, technologists are great at solving individual processes, but business tends to be more complex, and it is only much later that we start to see the bigger picture.


How to Hire the Best AI & Machine Learning Consultants

AI and machine learning consultants are qualified and experienced AI designers, developers, and other experts who help design, implement, and integrate AI solutions into a company’s business environment. They can provide, develop, and advise on a wide range of AI capabilities like predictive analytics, data science, natural language processing (NLP), computer vision, process automation, voice-enabled technology, and much more. These consultants can evaluate the potential of data, software infrastructure, and technology to effectively deploy AI systems and workflows. When bringing on the best AI and machine learning consultants, look for specialists who go beyond data science. Most AI and machine learning projects involve far more than data science; for example, they involve engineering, aggregating, and formatting data to teach an AI system. These projects also often involve hardware, wireless, and networking, meaning the consultant should be an expert in the cloud and the Internet of Things (IoT).



Quote for the day:

"The great leaders are like best conductors. They reach beyond the notes to reach the magic in the players." -- Blaine Lee

Daily Tech Digest - December 25, 2021

10 data-driven strategies to spark conversions in 2022

Conversion begins with a click. And clicks come after you have successfully grabbed your user’s attention. A headline is often the first thing your users come across, and hence an excellent tool for grabbing their attention. Using attention-grabbing headlines (paired with other factors) can therefore lead to better conversions. This is not a pass to create controversial, low-value titles. Grab attention while delivering value and maintaining class. Again, tap into website analytics to find out which headlines have worked best for you. If you are entirely new to the website world, know that headlines with numbers have been shown to convert 30% better than those without. Additionally, short and concise headlines with a negative superlative (like "x things you have never seen before" or "x killer Instagram profiles you need to follow") tend to earn more clicks. A/B testing, or split testing, reveals incredibly insightful data that can work wonders on your bottom line.
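
For instance, a simple two-proportion z-test is one way to read split-test results; the conversion counts below are made-up illustration numbers.

```python
from math import sqrt
from statistics import NormalDist

conv_a, n_a = 120, 2400  # control headline: conversions, visitors
conv_b, n_b = 156, 2400  # variant headline

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Here: a lift of 1.5 points, z around 2.2, p around 0.03.
print(f"lift: {p_b - p_a:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```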


TechScape: can AI really predict crime?

The LAPD is working with a company called Voyager Analytics on a trial basis. Documents the Guardian reviewed and wrote about in November show that Voyager Analytics claimed it could use AI to analyse social media profiles to detect emerging threats based on a person’s friends, groups, posts and more. It was essentially Operation Laser for the digital world. Instead of focusing on physical places or people, Voyager looked at the digital worlds of people of interest to determine whether they were involved in crime rings or planned to commit future crimes, based on who they interacted with, things they’ve posted, and even their friends of friends. “It’s a ‘guilt by association’ system,” said Meredith Broussard, a New York University data journalism professor. Voyager claims all of this information on individuals, groups and pages allows its software to conduct real-time “sentiment analysis” and find new leads when investigating “ideological solidarity”. “We don’t just connect existing dots,” a Voyager promotional document read. “We create new dots. What seem like random and inconsequential interactions, behaviours or interests, suddenly become clear and comprehensible.”


Privacy and Confidentiality in Security Testing

Now that we understand the difference between privacy and confidentiality and how each can affect a person, we can talk about keeping both safe during testing. The increasing number of malware bots makes business owners concerned about keeping data confidential. It also makes security testing vital for any software development effort, and especially for web applications. Knowing how to test software so that no personal data can be compromised through your site is essential. For this, let’s go through the steps QA testers can take to implement security testing. To illustrate our suggestions, we'll use the interface of aqua ALM, which is popular among QA teams for test management in security testing. ... The main goal of security testing is to protect applications from malware penetration and unauthorized access, and to protect the confidentiality and privacy of individuals.


An introduction to the magic of machine learning

We hear about machine learning a lot these days, and in fact it’s all around us. It can sound kind of mysterious, or even scary, but it turns out that machine learning is just math. And to prove that it’s just math, I will write this article the old-school way, with hand-written equations instead of code. If you prefer to learn by… To explain what machine learning is and how math makes it work, we will do a full walk-through of logistic regression, a fairly simple but fundamental model that is in some sense the building block of more complex models like neural networks. If I had to pick one machine learning model to understand really well, this would be it. Most often, we use logistic regression for a task called binary classification. In binary classification, we want to learn how to predict whether a data point belongs to one of two groups or classes, labeled 0 and 1. ... These training data allow us to learn the optimal theta parameters. What does optimal mean? Well, one reasonable and quite common definition is to say that the optimal theta is the set of parameters that maximizes the probability of obtaining our training data.
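
In the standard formulation the article is describing (a textbook form, not copied from the article itself), the model and the quantity the optimal parameters maximize look like this:

```latex
% Sigmoid hypothesis for binary classification:
h_\theta(x) = \sigma(\theta^{\top} x) = \frac{1}{1 + e^{-\theta^{\top} x}}

% Log-likelihood of the training data, which the optimal \theta maximizes:
\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
  + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]
```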


Alternative Feature Selection Methods in Machine Learning

The "Wrapper Methods" category includes greedy algorithms that will try every possible feature combination based on a step forward, step backward, or exhaustive search. For each feature combination, these methods will train a machine learning model, usually with cross-validation, and determine its performance. Thus, wrapper methods are very computationally expensive, and often, impossible to carry out. The "Embedded Methods," on the other hand, train a single machine learning model and select features based on the feature importance returned by that model. They tend to work very well in practice and are faster to compute. On the downside, we can’t derive feature importance values from all machine learning models. For example, we can’t derive importance values from nearest neighbours. In addition, co-linearity will affect the coefficient values returned by linear models, or the importance values returned by decision tree based algorithms, which may mask their real importance. Finally, decision tree based algorithms may not perform well in very big feature spaces, and thus, the importance values might be unreliable.


Diversity in cybersecurity: Barriers and opportunities for women and minorities

Our world is getting increasingly digitized, and cybercrime continues to break records. As cyber risks intensify, organizations are beefing up defenses and adding more outside consultants and resources to their teams. But they are hitting a major roadblock: a long-standing shortage of qualified cybersecurity talent. A closer look at the numbers reveals an even more startling statistic: women comprise only 25% of the cybersecurity workforce, according to research from ISC2, despite outpacing men in overall college enrollment. There are a number of reasons why women and minorities pursuing cybersecurity careers can significantly benefit the overall industry. Here are two: people from different genders, ethnicities and backgrounds can provide a fresh perspective on solving highly complex security problems. And then there’s the simple fact that leaving cybersecurity jobs unfilled puts businesses at risk. As the cybersecurity skills gap continues to grow, that risk only increases.


Half-Billion Compromised Credentials Lurking on Open Cloud Server

“Through analysis, it became clear that these credentials were an accumulation of breached datasets known and unknown,” the NCA said in a statement provided to Hunt. “The fact that they had been placed on a U.K. business’s cloud storage facility by unknown criminal actors meant the credentials now existed in the public domain, and could be accessed by other third parties to commit further fraud or cyber-offenses.” The passwords have been added to HIBP, which means they’re searchable by individuals and companies worldwide seeking to verify the security risk of a password before usage. Previously unseen passwords include flamingo228, Alexei2005, 91177700, 123Tests and aganesq, Hunt said in a blog posting Monday. “It is both unfortunate and mind-boggling that over 200 million of the passwords that were shared by the U.K. NCA were brand new to the HIBP service,” Baber Amin, COO at Veridium, said via email. “It points to the sheer size of the problem, the problem being passwords, an archaic method of proving one’s bona fides. If there was ever a call to action to work towards eliminating passwords and finding alternates, then this has to be it.”
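
HIBP’s Pwned Passwords range API makes such checks possible without ever sending the password, or even its full hash, over the wire. A minimal sketch:

```python
import hashlib
import urllib.request

def pwned_count(password: str) -> int:
    """Return how often a password appears in HIBP's breach corpus."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # k-anonymity: only the 5-character hash prefix leaves the machine.
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode()
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0

print(pwned_count("flamingo228"))  # one of the passwords named above
```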


A cybersecurity expert explains Log4Shell – the new vulnerability that affects computers worldwide

Log4Shell works by abusing a feature in Log4j that allows users to specify custom code for formatting a log message. This feature allows Log4j to, for example, log not only the username associated with each attempt to log in to the server but also the person’s real name, if a separate server holds a directory linking user names and real names. To do so, the Log4j server has to communicate with the server holding the real names. Unfortunately, this kind of code can be used for more than just formatting log messages. Log4j allows third-party servers to submit software code that can perform all kinds of actions on the targeted computer. This opens the door for nefarious activities such as stealing sensitive information, taking control of the targeted system and slipping malicious content to other users communicating with the affected server. It is relatively simple to exploit Log4Shell. I was able to reproduce the problem in my copy of Ghidra, a reverse-engineering framework for security researchers, in just a couple of minutes. 


The Metaverse is Overhyped; But by 2050, AI Will Make It Real

The metaverse today is not a place to go so much as a collection of technologies surrounding tools like NVIDIA’s Omniverse that can create simulations used to train robots and autonomous cars. It is an easier-to-use and more comprehensive tool set, like what architects have used to create virtual buildings, but with far more realistic results, including lighting effects, reflections, and a limited application of physics. For point simulation, the metaverse concept is workable, but today it really is just a better simulation platform for point projects, nowhere near the full virtual world we expect. By the end of the decade, NVIDIA’s Earth-2 project should be viable. This is currently the most aggressive public project in process, and Earth-2 could well become the foundation of a far broader use of the concept. Initially, Earth-2 will be limited by the technology available at the time, but once it is workable, it will be able to predict weather events more accurately and model potential climate change remedies better than the simulations we currently have.


Eliminating artificial intelligence bias is everyone's job

As new tools are provided around the auditability of AI, we'll see a lot more companies regularly reviewing their AI results. Today, many companies either buy a product that has an AI feature or capability embedded, or the AI is a proprietary feature of that product, which doesn't expose auditability. Companies may also stand up basic AI capabilities for a specific use case, usually at the "discover" level of AI usage. In each of these cases, however, the auditing is usually limited. Where auditing really becomes important is at the "recommend" and "action" levels of AI. In these two phases, it's important to use an auditing tool so as not to introduce bias and skew the results. One of the best ways to help with auditing AI is to use the AI and ML services of one of the bigger cloud service providers; many of those vendors have tools and tech stacks that allow you to track this information. It is also key that identifying bias, or bias-like behavior, be part of the training for data scientists and AI and ML developers. The more people are educated on what to look out for, the more prepared companies will be to identify and mitigate AI bias.
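
One simple check such an audit might include is the disparate impact ratio between groups' positive-outcome rates; the data below is made up, and a common rule of thumb flags ratios under 0.8.

```python
# (group, model_said_yes) pairs; illustrative data only.
outcomes = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]

def positive_rate(group: str) -> float:
    decisions = [y for g, y in outcomes if g == group]
    return sum(decisions) / len(decisions)

ratio = positive_rate("B") / positive_rate("A")
print(f"disparate impact ratio: {ratio:.2f}")  # 0.33 here: worth investigating
```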



Quote for the day:

“Hard times are sometimes blessings in disguise. We do have to suffer but in the end it makes us strong, better and wise.” -- Anurag Prakash Ray

Daily Tech Digest - June 20, 2020

Linux Foundation and Harvard announce Linux and open-source contributor security survey

Here's how it works: The Core Infrastructure Initiative (CII) Best Practices badge shows a project follows security best practices. The badges let others quickly assess which projects are following best practices and are more likely to produce higher-quality secure software. Over 3,000 projects are taking part in the badging project. There are three badge levels: Passing, silver, and gold. Each level requires that the OSS project meet a set of criteria; for silver and gold that includes meeting the previous level.  The "passing" level captures what well-run OSS projects typically already do. A passing score requires the programmers to meet 66 criteria in six categories. For example, the passing level requires that the project publicly state how to report vulnerabilities to the project, that tests are added as functionality is added, and that static analysis is used to analyze software for potential problems. As of June 14, 2020, there were 3,195 participating projects, and 443 had earned a passing badge. The silver and gold level badges are intentionally more demanding. The silver badge is designed to be harder but possible for one-person projects.


The startup making deep learning possible without specialized hardware

It didn’t take long for the AI research community to realize that this massive parallelization also makes GPUs great for deep learning. Like graphics-rendering, deep learning involves simple mathematical calculations performed hundreds of thousands of times. In 2011, in a collaboration with chipmaker Nvidia, Google found that a computer vision model it had trained on 2,000 CPUs to distinguish cats from people could achieve the same performance when trained on only 12 GPUs. GPUs became the de facto chip for model training and inferencing—the computational process that happens when a trained model is used for the tasks it was trained for. But GPUs also aren’t perfect for deep learning. For one thing, they cannot function as a standalone chip. Because they are limited in the types of operations they can perform, they must be attached to CPUs for handling everything else. GPUs also have a limited amount of cache memory, the data storage area nearest a chip’s processors. This means the bulk of the data is stored off-chip and must be retrieved when it is time for processing. The back-and-forth data flow ends up being a bottleneck for computation, capping the speed at which GPUs can run deep-learning algorithms.


Company boards aren't ready for the AI revolution

Beyond governance of Big Data and AI, there’s a second bottleneck and that’s talent. The well-worn phrase is true: every business is a technology company now; soon, though, most will also be AI companies. So when it comes to hiring good data scientists and AI experts, these businesses will have to compete not only with their peers but also tech giants like Facebook, Amazon and Google. Instead of attempting to raid the physics and mathematics departments of their local universities for talent, I therefore recommend that companies look elsewhere for AI experts - on their own payroll. Most businesses have incredible talent in-house. All they have to do is provide their staff with the necessary training and support, which can be done with the help of technology partners, provided these are platform-agnostic so that they can support a wide range of technologies and use cases. Training will have to be delivered on two levels. The first is AI enablement, by training staff to program and handle the technical aspects of AI and machine learning; they need to understand how to use bots, deploy robotic process automation and use machine learning to harness big data.


The digital divide: Not everyone has the same access to technology

As we exit the immediate crisis here, the health crisis, and move into a period of economic recovery, we're certainly going to see tremendous amounts of job loss, transitions in needed skills, and our labor force is going to be dramatically affected around the world by what's happening now. We do have an opportunity to think about re-skilling in a new way. Can we provide certain swaths of the economy with educational resources that will help them participate in the technology economy in ways that were not permissible or possible before? Can we think through an infrastructure build that will enable schools, for example, in rural areas or in parts of the world that haven't traditionally had access to technology, to train their students in these kinds of skills? I think there is an opportunity to think systemically about changes that are needed, that have been needed for a long time, quite frankly, and to use this recovery period as an opportunity to bridge that divide and to ensure that we're providing opportunities for everyone. 


How Decentralization Could Alleviate Data Biases In Artificial Intelligence

A few projects are also exploring the potential for blockchain-based federated learning, so to speak, in improving AI outcomes. Federated learning makes it possible for AI algorithms to amass experience from a wide range of siloed data. Instead of having the data moved to the computation venue, the computation happens at the data location. Federated learning allows data providers to retain control over their data. However, privacy risks lurk whenever federated learning is employed. Blockchain is able to alleviate this risk thanks to its superior traceability and transparency. Also, a smart contract could be used to discourage malicious players by requiring a security deposit, which is only refundable if the algorithm doesn’t violate the network’s privacy standards. Ocean Protocol and GNY are two projects exploring blockchain-based federated learning. Ocean recently launched a product, called Compute-to-Data, which allows data providers and data consumers to securely buy and sell data on the blockchain. The Singapore-based startup already has some enterprise names including Roche Diagnostics, the diagnostic division of multinational healthcare company F. Hoffmann-La Roche AG using its services.
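
The core of federated averaging can be sketched in a few lines: each silo trains locally and only model updates travel, weighted by data size when aggregated. The "training" below is simulated noise, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
global_model = np.zeros(3)

def local_update(model: np.ndarray) -> np.ndarray:
    """Stand-in for local training on a silo's private data."""
    return model + rng.normal(scale=0.1, size=model.shape)

silo_sizes = [100, 400, 250]  # examples held by each data provider
updates = [local_update(global_model) for _ in silo_sizes]

# FedAvg: weighted average of local models; the raw data never moved.
weights = np.array(silo_sizes) / sum(silo_sizes)
global_model = sum(w * u for w, u in zip(weights, updates))
print(global_model)
```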


Democratizing artificial intelligence is a double-edged sword

At one end of the spectrum is data, and the ingestion of data into data warehouses and data lakes. AI systems, and in particular ML, run on large volumes of structured and unstructured data — it is the material from which organizations can generate insights, decisions, and outcomes. In its raw form, it is easy to democratize, enabling people to perform basic analyses. Already, a number of technology providers have created data explorers to help users search and visualize openly available data sets. Next along the spectrum come the algorithms into which the data is fed. Here the value and complexity increase, as the data is put to work. At this point, democratization is still relatively easy to achieve, and algorithms are widely accessible; open source code repositories such as GitHub (purchased by Microsoft in 2018) have been growing significantly over the past decade. But understanding algorithms requires a basic grasp of computer science and a mathematics or statistics background. As we continue to move along the spectrum to storage and computing platforms, the complexity increases. During the past five years, the technology platform for AI has moved to the cloud with three major AI/ML providers: Amazon Web Services (AWS), Microsoft Azure, and Google Compute Engine.


What Will Happen When Robots Store All Our Memories

Mostly, though, Memory Bots became routine and part of the social fabric of the future as controversies faded, laws and regulations were refined to curb abuses and maximize safe usage, and people became intrigued and distracted by the latest new gadget that was going to wow them, then scare them, and then become routine. In the old Shlain Goldberg house in Marin County, you could still find Ken, or the essence and memories of Ken, captured inside an eight-inch-tall black cylindrical tube on the kitchen counter that looked remarkably like an ancient Alexa. (Sadly, Ken, as well as Tiffany, had just missed the advent of longevity tech that allowed their daughter to live thousands of years and counting.) Except that Ken-Alexa had a swivel head that was constantly recording everything, with the positive-negative filter still set right where Ken had left it, in the middle of the dial. Even when Odessa was centuries old but still looked the same as she did when she was 25, she could talk to her dad, and ask him questions, and hear him laugh.


Applying Observability to Ship Faster

We needed to learn to think in monitoring terms, learn more about monitoring tooling, and learn how best to monitor. Most monitoring systems are set up for platform and operations monitoring; using them for application monitoring takes them, and the engineering around them, somewhere new. Early on, we got some weirdness out of our monitoring. The system was telling us we had issues when we didn’t. It sounds silly now, but reading and re-reading the monitoring system documentation until we really got it helped. Digging deeper into how different types of metrics and monitors were designed to be used allowed us to build a more stable monitoring system. We also found there were things we wanted to do that we couldn’t do with out-of-the-box monitoring. Our early application monitoring was noisy and misfired; too frequently it told us we had problems we didn’t have. We kept iterating. We ended up building more of the monitoring in code than we expected, but it was well worth the time. We got the bare bones of a monitoring system early, and by using it in the real world, we worked out what we really needed.
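
Building monitoring in code can be as simple as instrumenting the application with a metrics library; the sketch below uses the real prometheus_client package, with illustrative metric names and a simulated workload.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Requests handled", ["status"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

@LATENCY.time()
def handle_request() -> None:
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work
    REQUESTS.labels(status="ok").inc()

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for scraping
    while True:
        handle_request()
```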


What’s Next for Self-Driving Cars?

The machine vision systems in cars today are excellent at recognizing obstacles like other vehicles and pedestrians. Anticipating how they’ll act is another issue entirely. People behave irrationally by running red lights or jaywalking, and that kind of behavior is hard for an AI to react to or expect. These AI systems will get better with more training data, but collecting that data can be complicated. Right now, putting an autonomous car on the road can be dangerous, but they need to be out there to gather data. As a result, the process of getting all the necessary training may be a long one. Autonomous cars may not be ready to disrupt the industry, but implementation is still possible. Public transportation is an ideal application for today’s self-driving vehicles because it’s a more predictable form of driving. By driving pre-defined routes at slower speeds, autonomous public transports can start to gather that all-important training data. Some companies have already started taking advantage of this area. A business called May Mobility has been running self-driving shuttles to train stops since May 2019. 


4 roles responsible for data management and security

Including a section in apps that provides transparency on how it uses data can help ease security concerns. Zoom, which has been in the news due to its increased use amid COVID-19 and security concerns, recently brought in leaders in the security space and a new acquisition to help. Having a strong opt-in strategy is also important. Apple and Google have a good approach with their work on contact tracing. But opting in is not going to give you all – or even enough – of the data. ... The CDO should set strategy for managing all of an organization's data – both from a defensive standpoint (addressing compliance regulations, data privacy, good data hygiene, etc.) and from an offensive one (making data more easily consumable for those who want and need it). Some key agencies do plan to have specialist CDOs. The Department of Defense has been working to recruit candidates for its CDO position. And at the end of March, the Centers for Disease Control and Prevention (CDC) published the official job post for its CDO opening. ... Consumers are grappling with data collection, something they've struggled with for a while. People are trying to become more educated about application data collection and personal data privacy and security.




Quote for the day:

"Experience is a hard teacher because she gives the test first, the lesson afterwards." -- Vernon Law