Tech Bytes - Daily Digest: Daily Tech Digest

Compositional AI: The Future of Enterprise AI

Compositionality refers to the ability to form new (composite) services by combining the capabilities of existing (component) services. The existing services may themselves be composite, leading to a hierarchical composition. The concept is not new, and has been studied previously in different contexts; most notably, Web Services Composition and Secure Composition of Security Protocols. Web Services follow the Service Oriented Computing (SOC) approach of wrapping a business functionality in a self-contained Service. There are mainly two approaches to composing a service: dynamic and static. In the dynamic approach, given a complex user request, the system comes up with a plan to fulfill the request depending on the capabilities of available Web services at run-time. In the static approach, given a set of Web services, composite services are defined manually at design-time combining their capabilities. ... In the very primitive world of supervised learning, an AI Service consists of data used to train a model, which is then exposed as an API. There is of course an alternate deployment pipeline, where a trained model can be deployed on an edge device to be executed in an offline fashion.
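
The static approach described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a "service" is modeled as a plain function over a request dictionary, and the names compose, extract_text, detect_language, and route_ticket are hypothetical, not taken from the article or from any Web services framework.

```python
from typing import Callable, Dict

# Assumption: a "service" is just a function from a request dict to a response dict.
Service = Callable[[Dict], Dict]


def compose(*services: Service) -> Service:
    """Statically chain services so that each one's output feeds the next."""
    def composite(request: Dict) -> Dict:
        result = request
        for service in services:
            result = service(result)
        return result
    return composite


# Hypothetical component services (illustrative names only).
def extract_text(request: Dict) -> Dict:
    return {**request, "text": request["document"].strip()}


def detect_language(request: Dict) -> Dict:
    # Toy rule standing in for a trained model exposed as an API.
    language = "es" if "hola" in request["text"].lower() else "en"
    return {**request, "language": language}


def route_ticket(request: Dict) -> Dict:
    queue = "support-es" if request["language"] == "es" else "support-en"
    return {**request, "queue": queue}


# A composite service can itself be a component of a larger composite.
understand = compose(extract_text, detect_language)
triage = compose(understand, route_ticket)

result = triage({"document": "  Hola, necesito ayuda  "})
```

Because triage is built from understand, which is itself a composite, the sketch also shows the hierarchical composition mentioned in the excerpt.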

Four common pitfalls of HyperLedger implementation

One of the main goals of distributed ledger technology (DLT), used by HyperLedger, is decentralization. The nodes (servers) of the network should be spread among all organizations in the consortium, and they should not depend on third-party providers. However, we have seen implementations where the whole infrastructure is maintained by one organization, or where it is spread among the organizations but all of them host their nodes with the same cloud vendor (e.g. AWS). With centralized infrastructure comes the threat that one organization or external provider could easily turn off the system and thus defeat the principal goal of DLT. ... One of the extremes in defining permissions in the DLT network, contrary to limiting the access of an organization, is privileging one of the organizations in such a way that it can make any changes to the distributed ledger. Although such a configuration does not necessarily introduce a vulnerability, it goes against blockchain principles. We have encountered several implementations with this issue, all of which allowed one organization to freely modify the ledger's contents: the channel endorsement policy required a signature from only one organization.
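
The pitfalls in this excerpt can be expressed as a simple audit. The sketch below is hypothetical throughout: the nodes inventory and the endorsement_policy dictionary are simplified stand-ins invented for illustration and do not reflect Hyperledger's actual configuration format or API.

```python
from collections import Counter

# Hypothetical inventory: one entry per node, with its owner and hosting provider.
nodes = [
    {"org": "OrgA", "host": "aws"},
    {"org": "OrgB", "host": "aws"},
    {"org": "OrgC", "host": "aws"},
]

# Hypothetical, simplified policy: which orgs may sign and how many signatures are needed.
endorsement_policy = {"required_signatures": 1, "orgs": ["OrgA"]}


def audit(nodes, policy):
    """Return a list of centralization findings for the given setup."""
    findings = []
    orgs = Counter(node["org"] for node in nodes)
    hosts = Counter(node["host"] for node in nodes)
    if len(orgs) == 1:
        findings.append("all nodes run by a single organization")
    if len(hosts) == 1:
        findings.append("all nodes hosted by a single provider")
    # One signature, or only one eligible org, lets a single org change the ledger.
    if policy["required_signatures"] < 2 or len(set(policy["orgs"])) < 2:
        findings.append("a single organization can endorse changes alone")
    return findings


findings = audit(nodes, endorsement_policy)
```

With the sample values, the audit flags the shared cloud vendor and the single-signature policy, the two situations the excerpt warns about.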

ETL vs. Data Preparation

ETL relies on a predetermined set of rules and workflows, she said. Potential issues, such as misspellings or extra characters, must be anticipated beforehand so rules for how to deal with those issues can be built into the end-to-end workflow. Conversely, a data prep tool using built-in algorithms is capable of discovery and investigation of the data as it proceeds through the workflow. “For example, algorithms based on machine learning or natural language processing can recognize things that are spelled differently but are really the same.” She gave the example of a city called “St. Louis”, and how it could be entered in multiple ways, or there may be several cities with the same name spelled differently. In an ETL workflow, rules for handling each particular variation must be programmed ahead of time, and variations that were not programmed are skipped. A data prep tool can find spelling differences without help, so that the user does not have to anticipate every possible variation. The tool can prompt for a decision on each different variation on the name of this city, providing an opportunity to improve the data before it’s used, she said.
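
The contrast can be sketched with Python's standard difflib module. This is an illustration, not how any particular data prep product works: the rule table, the normalize helper, and the 0.85 similarity threshold are all assumptions chosen for the example, and simple string similarity stands in for the machine learning or NLP algorithms the excerpt mentions.

```python
import difflib

canonical = "St. Louis"

# ETL-style: variations must be anticipated and written down ahead of time.
etl_rules = {"St Louis": "St. Louis", "Saint Louis": "St. Louis"}


def etl_clean(value):
    """Apply the fixed rules; unanticipated variations return None (skipped)."""
    if value == canonical:
        return value
    return etl_rules.get(value)


def normalize(value):
    # Assumed light normalization before comparing strings.
    return value.lower().replace(".", "").replace("saint", "st").strip()


def suggest(value, threshold=0.85):
    """Data-prep-style: propose the canonical name when the spelling is close enough."""
    matcher = difflib.SequenceMatcher(None, normalize(value), normalize(canonical))
    return canonical if matcher.ratio() >= threshold else None


entries = ["St Louis", "Saint Louis", "st. louis", "St. Luis", "Springfield"]
skipped_by_etl = [entry for entry in entries if etl_clean(entry) is None]
suggestions = {entry: suggest(entry) for entry in skipped_by_etl}
```

In a real tool each suggestion would be shown to the user for confirmation rather than applied automatically, matching the prompt-for-a-decision step described above.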

The coming opportunity in consumer lending

The second major step is to build the decision engine. In this area, new entrants will have a large advantage over existing lenders with legacy software that they do not want to alter. The new decision engine can largely be built using advanced analytics, machine learning, and other tools that capitalize on speed and agility. By using machine learning, the new-entrant lenders will be able to automate as much as 95 percent of underwriting processes while also making more accurate credit decisions. Similarly, real-time machine-learning solutions can improve pricing and limit setting and help firms monitor existing customers and credit lines through smarter early-warning systems. Lenders can also use straight-through processing to generate faster transactions and a better customer experience. The design of the decision engine can be modular for maximum flexibility. That will allow lenders to retain control of strategic processes while potentially outsourcing other parts. The modular format can also facilitate risk assessment. This approach involves a series of steps, completely integrated from the front end to the back end, and is designed for objective and quick decision making.
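
A modular decision engine of the kind described can be sketched as a list of interchangeable checks. Everything specific here is an assumption made for illustration: the Application fields, the module names, and the cutoffs (a score of 600, a 40 percent debt-to-income ratio) do not come from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Application:
    income: float        # annual income
    monthly_debt: float  # existing monthly debt payments
    credit_score: int


# Each module returns a decline reason, or None to pass the application along.
Module = Callable[[Application], Optional[str]]


def score_check(app: Application) -> Optional[str]:
    # Hypothetical cutoff; a real engine might call a machine-learning model here.
    return "credit score below cutoff" if app.credit_score < 600 else None


def affordability_check(app: Application) -> Optional[str]:
    if app.income <= 0:
        return "no verifiable income"
    ratio = app.monthly_debt * 12 / app.income
    return "debt-to-income too high" if ratio > 0.4 else None


def decide(app: Application, modules: List[Module]) -> Tuple[str, Optional[str]]:
    """Run the modules in order; the first decline reason ends the process."""
    for module in modules:
        reason = module(app)
        if reason is not None:
            return ("decline", reason)
    return ("approve", None)


engine = [score_check, affordability_check]
decision = decide(Application(income=60000, monthly_debt=1000, credit_score=700), engine)
```

Because each module shares the same signature, one of them could be replaced or outsourced without touching the others, which is the flexibility the excerpt attributes to a modular design.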

WhatsApp Privacy Controversy and India’s Data Protection Bill

Clause 40 of the PDP bill is particularly dangerous and could be detrimental to the data rights of the users of WhatsApp. This provision empowers the Data Protection Authority to include certain data fiduciaries in a regulatory sandbox who would be exempt from the obligation of obtaining the consent of the data principal in processing their data for up to 36 months. The GDPR does not have any provision related to a regulatory sandbox. Such a sandbox might be required to provide relaxations to certain corporations, such as those that deal with Artificial Intelligence, so that they can test their technology in a sandbox environment. However, it is a commonly accepted practice that in a good regulatory sandbox the users whose data is taken participate in the exercise voluntarily. Such a condition is altogether done away with by this provision. The authority that has to assess the applications for inclusion in a regulatory sandbox is the Data Protection Authority (DPA). The members of the DPA are to be selected by bureaucrats serving under the Union government. So, the DPA cannot be expected to work independently of government control (Clause 42(2)).

A Data Science Wish List for 2021 and Beyond

Sometimes, we simply cannot overcome the problem of needing more data. It could be that data collection is too expensive or the data is not possible to collect in a reasonable time frame. This is where synthetic data can provide real value. Synthetic data can be created by training a model to understand available data to such an extent that it can generate new data points that look, act, and feel real, i.e. mimic the existing data. An example could be a model that predicts how likely small and medium-sized businesses (SMBs) in the retail sector might be to default on loans. Factors such as location, number of employees, and annual turnover might be key features in this scenario. A synthetic data model could learn the typical values of these features and create new data points that fit seamlessly into the real dataset, which can then be expanded and used to train an advanced loan default prediction model. ... Another benefit of synthetic data is data privacy. In the financial services industry, much of the data is sensitive and there are many legal barriers to sharing datasets. Leveraging synthetic data is one way we can reduce these barriers, as synthetic data points feel real but do not relate to real accounts and individuals.
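
As a rough sketch of the idea, the snippet below learns the typical values of two features and samples new points from them. The records are made-up toy values, not real data, and the method is a deliberate simplification: it fits an independent normal distribution per feature, whereas a realistic synthetic data model would also capture correlations between features.

```python
import random
import statistics

# Toy, made-up SMB records: (number of employees, annual turnover in thousands).
real = [(5, 220.0), (12, 640.0), (8, 410.0), (20, 1150.0), (3, 150.0), (15, 820.0)]


def fit(rows):
    """Learn the mean and standard deviation of each feature column."""
    columns = list(zip(*rows))
    return [(statistics.mean(column), statistics.stdev(column)) for column in columns]


def sample(params, n, seed=0):
    """Draw n synthetic rows; values are clipped at zero to stay plausible."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        tuple(max(0.0, rng.gauss(mean, stdev)) for mean, stdev in params)
        for _ in range(n)
    ]


params = fit(real)
synthetic = sample(params, n=100)

# The synthetic rows can be appended to the real ones to enlarge the training set.
expanded = real + synthetic
```

None of the generated rows corresponds to an actual business, which is the privacy property the excerpt points to.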

Top 4 Blockchain Risks A CIO Should Know

Blockchain risks include malicious activities such as double-spending and record hacking. In record hacking, a hacker tries to steal a blockchain participant’s or cryptocurrency owner’s credentials and then transfers money to his/her own account or holds the credentials for ransom. As per MIT’s 2019 report, since 2017, hackers have stolen around $2 billion worth of cryptocurrency. Another malicious activity is double-spending, where hackers gain control of the majority of the network’s computing power and rewrite the transaction history. This allows them to spend the cryptocurrency and erase the transaction from history once they receive their orders. With digital money, the hacker can send the merchant a copy of the digital token while retaining the original token and using it again. Implementing and maintaining blockchain applications and platforms is expensive. If the system malfunctions or fails because of these risks, fixing it can cost a massive amount of money. A blockchain expert is required to overcome such risks, and the expert may charge a hefty amount to provide solutions.

Top Challenges Involved In Healthcare Data Management

Medical data is sensitive and must adhere to government regulations, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the US. Data discovery challenges and poor data quality make it much more difficult to perform the required audits and meet regulatory requirements, and they limit the diversity of data healthcare providers can use for the benefit of patients. Adhering to the HIPAA rules may help in effective data governance. Effective data governance within a healthcare organization can help it better manage and use data, create processes for resolving data issues, and eventually enable users to make decisions based on high-quality information assets. However, all this begins with better data collection and making sure that the data collected is accurate, up-to-date, complete, and in compliance with the HIPAA regulatory standards. A well-designed HIPAA-compliant web form solution can be instrumental in enabling healthcare organizations to manage and streamline data collection processes, including new patient forms, HIPAA release forms, contact update forms, patient medical history forms, and consent forms.

CDO's Next Major Task: Enabling Data Access for Non-Analysts

Unlike product managers from two decades ago, today's product manager wants to look at the user flow data on the website and design changes to UX flow to improve revenue. He doesn't have the luxury of a dedicated analyst supporting him for every question he has about his product. The marketing manager has direct hands-on access to the CRM system. He is pulling targeted customers for the next campaign and needs to have a lifetime value score for each of the customers to target the highest value customers effectively. To resolve customer concerns quickly, customer support agents need access to what happened when the customer accessed the website two days ago. They don't have the luxury of yesteryear's one-week resolution SLA; the customer expects resolution during the call. The CDO needs a proper plan to give the right person appropriate access to the right kind of data, at the right security level. Barring that, the business's numerous stakeholders will start standing up their individual mini data marts to serve their needs. If that happens, the CDO's past five years of centralizing data sources will amount to nothing. What is needed is a proper data access strategy and governance for the entire enterprise.

Why ML should be written as pipelines from the get-go

Data scientists are not trained or equipped to care diligently about production concepts such as reproducibility; they are trained to iterate and experiment. They don’t really care about code quality, and it is probably not in the best interest of the company at an early point to be super diligent in enforcing these standards, given the trade-off between speed and overhead. Therefore, what is required is a framework that is flexible but enforces production standards from the get-go. A very natural way of implementing this is via some form of pipeline framework that exposes an automated, standardized way to run ML experiments in a controlled environment. ML is inherently a process that can be broken down into individual, concrete steps (e.g. preprocessing, training, evaluating), so a pipeline is a good solution here. Critically, by standardizing the development of these pipelines at the early stages, organizations can break the cycle of destruction/recreation of ML models through multiple toolings and steps, and shorten the path from research to deployment.
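
The step-by-step structure described above can be sketched as a tiny pipeline class. This is an assumption-laden illustration, not a reference to any particular pipeline framework: the Pipeline class, the context dictionary, and the toy threshold "model" are all invented for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Callable[[Dict[str, Any]], Dict[str, Any]]


class Pipeline:
    """Runs named steps in a fixed order and records which ones ran."""

    def __init__(self, steps: List[Tuple[str, Step]]):
        self.steps = steps

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        context = dict(context)  # never mutate the caller's input
        steps_run = []
        for name, step in self.steps:
            context = step(context)
            steps_run.append(name)
        context["steps_run"] = steps_run
        return context


def preprocess(ctx):
    # Scale the single feature into [0, 1].
    top = max(x for x, _ in ctx["raw"])
    return {**ctx, "data": [(x / top, y) for x, y in ctx["raw"]]}


def train(ctx):
    # Toy "model": a threshold halfway between the two class means.
    zeros = [x for x, y in ctx["data"] if y == 0]
    ones = [x for x, y in ctx["data"] if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return {**ctx, "threshold": threshold}


def evaluate(ctx):
    correct = sum((x > ctx["threshold"]) == bool(y) for x, y in ctx["data"])
    return {**ctx, "accuracy": correct / len(ctx["data"])}


pipeline = Pipeline([("preprocess", preprocess), ("train", train), ("evaluate", evaluate)])
result = pipeline.run({"raw": [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (10, 1)]})
```

Since every experiment goes through the same run method, the order of steps and the record of what ran are standardized, while each step stays free to change during research.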

Quote for the day:

“Just because you’re a beginner doesn’t mean you can’t have strength.” -- Claudio Toyama

Daily Tech Digest - April 04, 2021