Intel investigating breach after 20GB of internal documents leak online
US chipmaker Intel is investigating a security breach after 20 GB of internal
documents, some marked "confidential" or "restricted secret," were uploaded
earlier today to the file-sharing site MEGA. The data was
published by Till Kottmann, a Swiss software engineer, who said he received
the files from an anonymous hacker who claimed to have breached Intel earlier
this year. Kottmann received the Intel leaks because he manages a very popular
Telegram channel where he regularly publishes data that accidentally leaked
online from major tech companies through misconfigured Git repositories, cloud
servers, and online web portals. The Swiss engineer said today's leak
represents the first part of a multi-part series of Intel-related leaks. ZDNet
reviewed the content of today's files with security researchers who have
analyzed Intel CPUs in past work; they deemed the leak authentic but did not
want to be named in this article due to the ethical concerns of reviewing
confidential data and because of their ongoing relationship with Intel. Per
our analysis, the leaked files contained Intel intellectual property relating
to the internal design of various chipsets.
Data Prep for Machine Learning: Normalization
Preparing data for use in a machine learning (ML) system is time consuming,
tedious, and error prone. A reasonable rule of thumb is that data preparation
requires at least 80 percent of the total time needed to create an ML system.
There are three main phases of data preparation: cleaning; normalizing and
encoding; and splitting. Each of the three phases has several steps. A good
way to understand data normalization and see where this article is headed is
to take a look at the screenshot of a demo program. The demo uses a small text
file named people_clean.txt where each line represents one person. There are
five fields/columns: sex, age, region, income, and political leaning. The
"clean" in the file name indicates that the data has been standardized by
removing missing values, and editing bad data so that all lines have the same
format, but numeric values have not yet been normalized. The ultimate goal of
a hypothetical ML system is to use the demo data to create a neural network
model that predicts political leaning from sex, age, region, and income. The
demo analyzes the age and income predictor fields, then normalizes those two
fields using a technique called min-max normalization. The results are saved
as a new file named people_normalized.
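Min-max normalization rescales each numeric value x in a column into the range [0.0, 1.0] using x' = (x - min) / (max - min), where min and max are taken over that column. The short Python sketch below illustrates the idea on an age and an income column; the sample values and function name are illustrative and are not taken from the demo program itself.

```python
# Minimal sketch of min-max normalization for two numeric columns.
# The sample values below are made up for illustration.

def min_max_normalize(values):
    """Rescale a list of numbers into the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [25, 36, 40, 23, 58]
incomes = [27000.0, 53000.0, 61000.0, 24000.0, 89000.0]

print(min_max_normalize(ages))     # 23 maps to 0.0, 58 maps to 1.0
print(min_max_normalize(incomes))  # each income lands between 0.0 and 1.0
```

After normalization, both predictor columns live on the same scale, which prevents the larger-magnitude income values from dominating the age values during neural network training.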
Microsoft Teams Patch Bypass Allows RCE
While Microsoft tried to cut off this vector as a conduit for remote code
execution by restricting the ability to update Teams via a URL, it was not a
complete fix, the researcher explained. “The updater allows local connections
via a share or local folder for product updates,” Jayapaul said. “Initially,
when I observed this finding, I figured it could still be used as a technique
for lateral movement, however, I found the limitations added could be easily
bypassed by pointing to an…SMB share.” The Server Message Block (SMB) protocol
is a network file-sharing protocol. To exploit this, an attacker would need to
drop a malicious file into an open shared folder – something that typically
involves already having network access. However, to reduce this gating factor,
an attacker can create a remote rather than local share. “This would allow
them to download the remote payload and execute rather than trying to get the
payload to a local share as an intermediary step,” Jayapaul said. Trustwave
has published a proof-of-concept attack that uses the Microsoft Teams Updater
to download a payload, using the well-known open-source software Samba to
carry out the remote download.
Federated learning improves how AI data is managed, thwarts data leakage
Researchers believe a shift in the way data is managed could allow more
information to reach learning algorithms outside of a single institution,
which could benefit the entire system. Penn Medicine researchers propose using
a technique called federated learning that would allow users to train an
algorithm across multiple decentralized data sources without having to
actually exchange the data sets. Federated learning works by training an
algorithm across many decentralized edge devices, as opposed to running an
analysis on data uploaded to one server. "The more data the computational
model sees, the better it learns the problem, and the better it can address
the question that it was designed to answer," said Spyridon Bakas, an
instructor in the Perelman School of Medicine at the University of
Pennsylvania, in a press release. Bakas is lead author of a study on the use
of federated learning in medicine that was published in the journal Scientific
Reports. "Traditionally, machine learning has used data from a single
institution, and then it became apparent that those models do not perform or
generalize well on data from other institutions," Bakas said.
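The core idea is easiest to see as federated averaging: each participating site trains on its own data, and only model parameters travel to a central aggregator, never the records themselves. The NumPy sketch below is a toy illustration of that loop (a linear model with one local gradient step per site per round); it is not the model or code used in the Penn Medicine study.

```python
import numpy as np

# Toy federated averaging: each "site" holds private (X, y) data and only
# shares its locally updated weights, never the raw records.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for _ in range(3):  # three institutions, each with its own local data
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

w = np.zeros(2)  # global model held by the aggregator
for _round in range(100):
    local_ws = []
    for X, y in sites:
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient on local data only
        local_ws.append(w - 0.1 * grad)          # one local update step
    w = np.mean(local_ws, axis=0)                # aggregator averages the models

print(w)  # approaches true_w without any site ever sharing its data
```

The global model ends up fitted to the combined data distribution even though no site's records leave its own environment, which is the property the researchers are relying on.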
10 Tools You Should Know As A Cybersecurity Engineer
Wireshark is the world’s best network analyzer tool. It is open-source
software that enables you to inspect real-time data on a live network.
Wireshark can dissect packets of data into frames and segments giving you
detailed information about the bits and bytes in a packet. Wireshark supports
all major network protocols and media types. On a shared or public network,
Wireshark can also be used as a packet sniffer, capturing whatever traffic is
visible to your network interface. ... Netcat is a simple but powerful tool
that can view and record data on TCP or UDP network connections. Netcat
functions as a back-end listener that allows for port
scanning and port listening. You can also transfer files through Netcat or use
it as a backdoor to your victim machine. This makes it a popular
post-exploitation tool for establishing connections after successful attacks.
Netcat is also extensible, since it supports scripting for larger or
repetitive tasks. Despite Netcat's popularity, it has not been maintained
actively by its community. The Nmap team built an updated version of Netcat
called Ncat with features including support for SSL, IPv6, SOCKS, and HTTP
proxies.
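To make the "back-end listener" role concrete, the Python sketch below does roughly what a plain Netcat listener does: bind to a port, accept a single connection, and print whatever arrives. The port number is arbitrary, and this is only a simplified stand-in for Netcat or Ncat.

```python
import socket

# Rough equivalent of a simple Netcat-style listener: bind to a port,
# accept one client, and write whatever it sends to stdout.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 4444))   # arbitrary example port
    srv.listen(1)
    conn, addr = srv.accept()
    with conn:
        print(f"connection from {addr}")
        while data := conn.recv(4096):
            print(data.decode(errors="replace"), end="")
```

Netcat wraps this kind of listener (plus the client side, file transfer, and port scanning) into a single command-line tool, which is why it shows up so often in post-exploitation workflows.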
Hey software developers, you’re approaching machine learning the wrong way
Unfortunately, lots of folks who set out to learn Machine Learning today have
the same experience I had when I was first introduced to Java. They’re given
all the low-level details up front — layer architecture, back-propagation,
dropout, etc — and come to think ML is really complicated and that maybe they
should take a linear algebra class first, and give up. That’s a shame, because
in the very near future, most software developers effectively using Machine
Learning aren’t going to have to think or know about any of that low-level
stuff. Just as we (usually) don’t write assembly or implement our own TCP
stacks or encryption libraries, we’ll come to use ML as a tool and leave the
implementation details to a small set of experts. At that point — after
Machine Learning is “democratized” — developers will need to understand not
implementation details but instead best practices in deploying these smart
algorithms in the world. ... What makes Machine Learning algorithms distinct
from standard software is that they’re probabilistic. Even a highly accurate
model will be wrong some of the time, which means it’s not the right solution
for lots of problems, especially on its own. Take ML-powered speech-to-text
algorithms: it might be okay if occasionally, when you ask Alexa to “Turn off
the music,” she instead sets your alarm for 4 AM.
Garmin Reportedly Paid a Ransom
WastedLocker, a ransomware strain that reportedly shut down Garmin's
operations for several days in July, is designed to avoid security tools
within infected devices, according to a technical analysis from Sophos. In
June and July, several research firms published reports on WastedLocker,
noting that the ransomware appears connected to the Evil Corp cybercrime
group, originally known for its use of the Dridex banking Trojan. "Because
WastedLocker has no known security vulnerabilities in how it performs its
encryption, it's unlikely that Garmin obtained a working decryption key that
fast in any other way but by paying the ransom," Chris Clements, vice
president of solutions architecture for Cerberus Sentinel, tells ISMG. Fausto
Oliveira, principal security architect at the security firm Acceptto, adds:
"What I believe happened is that Garmin was unable to recover their services
in a timely manner. Four days of disruption is too long if they are using any
reliable type of backup and restore mechanisms. That might have been because
their disaster recovery backup strategy failed or the invasion was to the
extent that backup sources were compromised as well."
Splicing a Pause Button into Cloud Machines
Splice Machine was born in the days of Hadoop, and uses some of the same
underlying data processing engines that were distributed in that platform. But
Splice Machine has surpassed the capabilities of that earlier platform by
ensuring tight integration with those engines in support of its customers'
enterprise AI initiatives, not to mention elastic scaling via Kubernetes. The
way that Splice Machine engineered HBase (for storage) and Spark (for
analytics), and its enablement of ACID capabilities for SQL transactions, are
core differentiating factors that weigh in Splice Machine’s favor for being a
platform on which to build real-time AI applications, according to Zweben.
“Doing table scans as the basis of an analytical workload is abysmally slow in
HBase, and so, in Splice Machine, we engineered at a very low level the access
to the HBase storage with a wrapper of transactionality around it, so you’re
only seeing what’s been committed in the database based on ACID semantics,”
Zweben explained. “That goes under the cover at a very well-engineered level,
looking at the HBase storage and grabbing that into Spark dataframes,” he
continued. “We’ve engineered tightly integrated connectivity for performance.
...”
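As a rough sketch of the "HBase storage into Spark dataframes" idea, the PySpark snippet below loads a table into a DataFrame over JDBC and runs an analytical query on it. The JDBC URL, table, column, and credentials are placeholders rather than Splice Machine's documented connection details, and Splice Machine's own native Spark integration is tighter than a generic JDBC read.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("splice-analytics-sketch").getOrCreate()

# Hypothetical connection details -- replace with your actual database
# endpoint, table name, and credentials.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:splice://db-host:1527/splicedb")  # placeholder URL
    .option("dbtable", "CLAIMS")                           # placeholder table
    .option("user", "app_user")
    .option("password", "app_password")
    .load()
)

# Analytical work runs on the Spark DataFrame rather than as raw HBase
# table scans, which is the performance point Zweben is making.
df.groupBy("REGION").count().show()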
How Synthetic Data Accelerates Coronavirus Research
To access data at the speed required while also respecting the privacy and
governance needs of patient data, Washington University in St. Louis,
Jefferson Health in Philadelphia, and other healthcare organizations have
opted for an alternative, using something called synthetic data. Gartner
defines synthetic data as data that is "generated by applying a sampling
technique to real-world data or by creating simulation scenarios where models
and processes interact to create completely new data not directly taken from
the real world." Here's how Payne describes it: "We can take a set of data
from real world patients but then produce a synthetic derivative that
statistically is identical to those patients' data. You can drill down to the
individual row level and it will look like the data extracted from the EHR
(electronic health record), but there's no mutual information that connects
that data to the source data from which it is derived." Why is that so
important? "From the legal and regulatory and technical standpoint, this is no
longer potentially identifiable human subjects' data, so now our investigators
can literally watch a training video and get access to the system," Payne
said. "They can sign a data use agreement and immediately start iterating
through their analysis."
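One simple way to picture "statistically identical but not traceable" data is to fit per-column distributions on the real records and sample entirely new rows from those fits. The Python sketch below does that for a made-up patient table; production synthetic-data engines, including the system Payne describes, also preserve cross-column relationships, which this per-column toy deliberately ignores.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy "real" patient table -- values are made up for illustration only.
real = pd.DataFrame({
    "age": rng.normal(55, 12, size=1000).round(),
    "systolic_bp": rng.normal(128, 15, size=1000).round(),
    "smoker": rng.choice(["yes", "no"], p=[0.2, 0.8], size=1000),
})

# Per-column synthesis: numeric columns resampled from fitted normals,
# the categorical column resampled from its observed frequencies.
smoker_freq = real["smoker"].value_counts(normalize=True)
synthetic = pd.DataFrame({
    "age": rng.normal(real["age"].mean(), real["age"].std(), size=1000).round(),
    "systolic_bp": rng.normal(real["systolic_bp"].mean(),
                              real["systolic_bp"].std(), size=1000).round(),
    "smoker": rng.choice(smoker_freq.index.to_numpy(),
                         p=smoker_freq.to_numpy(), size=1000),
})

print(real.describe())       # the summary statistics of the two tables match,
print(synthetic.describe())  # but no synthetic row maps back to a real person
```

The synthetic table supports the same statistical analyses as the source data, while breaking the link back to any individual patient, which is what lets investigators start work after only a data use agreement.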
Realtime APIs: Mike Amundsen on Designing for Speed and Observability
For systems to perform as required, data read and write patterns will frequently
have to be reengineered. Amundsen suggested judicious use of caching results,
which can remove the need to constantly query upstream services. Data may also
need to be “staged” appropriately throughout the entire end-to-end request
handling process. Examples include caching results and data in localized points
of presence (PoPs) via content delivery networks (CDNs), caching in an API
gateway, and replicating data stores across availability zones (local data
centers) and globally. For some high transaction-throughput use cases, writes
may have to be streamed to meet demand, for example by writing data locally or
by pushing it through a high-throughput distributed logging system such as
Apache Kafka and persisting it to an external data store at a later point in
time. Engineers may have to "rethink the network" (respecting the eight
fallacies of distributed computing) and design
their cloud infrastructure to follow best practices relevant to their cloud
vendor and application architecture. Decreasing request and response size may
also be required to meet demands. This may be engineered in tandem with the
ability to increase the message volume.
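As a small illustration of the "judicious use of caching" Amundsen recommends, the Python sketch below memoizes an upstream lookup with a time-to-live so repeated requests for a hot key are served locally instead of re-querying the upstream service every time. The function name, key, and TTL value are illustrative only.

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=30):
    """Cache a function's results for ttl_seconds to avoid repeated upstream calls."""
    def decorator(fn):
        store = {}  # key -> (expiry_timestamp, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]                      # fresh cached value, skip the upstream call
            value = fn(*args)
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=30)
def get_product(product_id):
    # Placeholder for a call to an upstream service or data store.
    return {"id": product_id, "fetched_at": time.time()}

print(get_product(42))  # first call goes "upstream"
print(get_product(42))  # served from the local cache for the next 30 seconds
```

The same pattern generalizes outward: the TTL logic can live in an API gateway or a CDN PoP instead of in the service process, trading a bounded staleness window for far fewer round trips to the origin.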
Quote for the day:
"The secret of leadership is simple: Do what you believe in. Paint a picture of the future. Go there. People will follow." -- Seth Godin