Monday, May 3, 2021

Review of "Andrew Ng: Forget about building an AI-first business. Start with a mission."

"An AI pioneer reflects on how companies can use machine learning to transform their operations and solve critical problems."

His advice can be distilled down to:

Use an Agile approach: start small and fast, and learn and adapt along the way. Do not wait until the data is clean before you start.

"It is often starting to do an AI project with the data you already have that enables an AI team to give you the feedback to help prioritize what additional data to collect."

It’s more important to start quickly, and it’s okay to start small.

Good data vs Big Data. Industries that lack the scale of operations to generate massive datasets should adopt a good data mindset instead of a big data one.

In the good data paradigm, consistent labelling of the data is critical. The data should be leanly labelled and carefully curated.

This contrasts with the big data paradigm, where large datasets allow noise and labelling inconsistencies to average out.

GitHub is good enough.

For a lot of AI projects, the open-source model you download off GitHub—the neural network that you can get from literature—is good enough. Not for all problems, but the main problems.

“Let’s not mess with the code anymore. The only thing you’re going to do now is build processes to improve the quality of the data.”

Wednesday, December 2, 2020

Why No One Can Manage Projects - Especially Technology Projects

Fix the Process, Not the Problem

Forbes

By Steve Andriole, 1 Dec 2020

“Project Management” is an enormous industry. Software, training, certifications and even Masters degrees – everything you can imagine — to make us better at managing simple and complex projects. But we keep failing. Over and over again. Is there a solution? Yes.

Software Isn’t the Answer
Management Artifacts, Like SCRUM or DevOps Aren't the Answer Either

It’s Always Been About the People

Find – and Reward – Real Leaders
Build the Right Teams
Fix Training

Fix the Process, Not the Problem

Harvard Business Review

By Harold L. Sirkin and George Stalk, Jr., July–August 1990 Issue

...everyone at the paper mill became a problem solver. Together, managers and mill workers learned to take the initiative not just for identifying problems but also for developing better processes for fixing problems and improving products. Their approach did not depend on key senior executives taking charge and telling people what to do. Instead, the entire organization learned how to learn.

In the pressure to get things done, many managers fear being patient. They focus on short-term fixes to existing problems rather than on instituting processes to solve and eventually prevent problems and to identify unsuspected opportunities.


Thursday, May 28, 2020

Coronavirus (Covid-19): More Scary than Dangerous?

Assessing the impact of Coronavirus (Covid-19) on Total Deaths

Problem :: With reported US Coronavirus deaths on the cusp of hitting the 100,000 mark (27 May 2020), it would be wise for us to step back a bit and review how dangerous this virus really is. We have a media environment with hidden agendas, conflicting accounts of Covid-19 death numbers, and varied cause-of-death (COD) attributions. What is the actual impact of Covid-19 on total death numbers? Is the Coronavirus more scary than dangerous?
Solution :: One method is to analyse and compare deaths from All Causes over the same period of months across a number of years, to put the current season in context.

Data from the US Centers for Disease Control and Prevention (CDC)

The US is the country where total deaths data over multiple years is most easily accessible. The US CDC publishes this data on the “Pneumonia and Influenza Mortality Surveillance from the National Center for Health Statistics Mortality Surveillance System” site (https://gis.cdc.gov/grasp/fluview/mortality.html).

At the time of writing, Week 18 (starting 27 Apr 2020) is the latest week of 2020 for which complete data is available. From this database, the All Causes Deaths numbers can be viewed graphically:

Weekly All Causes Deaths (US CDC)

Analysis of Data

The interesting data points are Week 15 and Week 18, which have 73.5k and 55.5k deaths respectively. This indicates that weekly deaths peaked at Week 15 and have been trending down since.

While the Week 15 figure is a historical peak for any single week, it is more useful to look at the overall numbers for the season and how they compare with other years.

Table 1

The 2019-2020 season shows the highest number of deaths. This might suggest that Coronavirus is primarily responsible for this peak. For a better picture, it is useful to check whether this is out of the ordinary compared with other years.

Seasonal Year-on-Year Comparisons

Table 2 compares the year-on-year increase (or decrease) in deaths:

Table 2

Viewed graphically:

Changes in deaths (number and %) from previous year

This is enlightening. We now see that:

The largest year-on-year increase is from the 2014 to the 2015 season: an increment of 140,231 deaths, representing an 8.86% increase!
While the 2020 season shows an increase of 71,249 deaths over the 2019 season (a 4.11% increase), this is in fact lower, in both absolute numbers and percentage terms, than some previous year-on-year increases (2014→2015 and 2016→2017).
The year-on-year variation in deaths for 2019→2020 does not seem out of line with what has occurred in the past, and appears to follow a familiar trend.
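The year-on-year figures above are differences between consecutive season totals (week 40 to week 18), expressed both in deaths and as a percentage of the earlier season. A minimal sketch of that calculation, assuming the season totals are supplied as a dict in chronological order; the totals in the usage block are hypothetical placeholders, not CDC figures:

```python
def year_on_year_changes(season_totals: dict[str, int]) -> list[tuple[str, str, int, float]]:
    """Return (previous season, season, change in deaths, % change) for each consecutive pair.

    Assumes season_totals is in chronological order (dicts keep insertion order).
    """
    seasons = list(season_totals)
    changes = []
    for prev, cur in zip(seasons, seasons[1:]):
        delta = season_totals[cur] - season_totals[prev]
        # Percentage change relative to the earlier season, rounded like the table.
        pct = round(100 * delta / season_totals[prev], 2)
        changes.append((prev, cur, delta, pct))
    return changes


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only (not CDC data).
    totals = {"2017-18": 1000, "2018-19": 1100, "2019-20": 1045}
    for prev, cur, delta, pct in year_on_year_changes(totals):
        print(f"{prev} -> {cur}: {delta:+,} deaths ({pct:+.2f}%)")
```

Expressing each change as a percentage of the previous season is what allows seasons of different sizes to be compared on an equal footing.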

Trend in Overall All Causes Deaths

Depicting Table 1 graphically, we can see a clear (almost linear) upward trend in overall All Causes Deaths in the US over recent years. This trend was in place even before the Coronavirus.

Trend in US All Cause Deaths from week 40 – week 18
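One way to test whether a season simply "follows the trend" is to fit a straight line to the earlier seasons' totals and compare the latest season with the value that line extrapolates. A minimal sketch using an ordinary least-squares fit; the function names and the sample totals are hypothetical, and on Python 3.10+ `statistics.linear_regression` could replace `fit_line`:

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def excess_over_trend(history: list[float], latest: float) -> float:
    """Fit a line to earlier season totals (indexed 0, 1, 2, ...) and
    return how far the latest season sits above the extrapolated value."""
    xs = list(range(len(history)))
    slope, intercept = fit_line(xs, history)
    expected = slope * len(history) + intercept
    return latest - expected


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only (not CDC data).
    earlier_seasons = [100.0, 104.0, 108.0, 112.0]
    print(excess_over_trend(earlier_seasons, latest=118.0))
```

A result near zero means the latest season sits close to the trend line, while a large positive value would indicate deaths well above what the earlier trend predicts.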

Conclusions

The historical weekly peak at Week 15 indicates that Coronavirus did have an impact. Peaks traditionally occur earlier in the season, so this is indeed a strong spike.

However, when overall death numbers for the season are seen in a historical context, the excess deaths for the 2019-2020 season are not exceptionally high and in fact follow the historically rising trend of deaths (with or without Coronavirus).

Perhaps, as this linked article suggests, Coronavirus has been more scary than dangerous.

Caveat: as this analysis stops at Week 18, new data from subsequent weeks (when available) may point to a different assessment of the fatality counts and of the impact of Coronavirus on overall deaths.

Wednesday, September 4, 2019

How Does Bitcoin Work

Problem :: How to achieve Trustless public transactions, with no need for a “trusted” third party (like banks, governments, etc.) to validate transactions.
Solution :: Use a Distributed Ledger - The Blockchain

The Distributed Ledger

Problem to Solve: How to eliminate the need for a trusted third party

To eliminate the need for a trusted third party to host the Ledger, all participants maintain their own copy of the Ledger.
Every transaction is broadcast to all participants, who record it in their own private copy of the Ledger.

Source: Financial Markets Group, Federal Reserve Bank of Chicago
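The idea of every participant keeping their own copy can be sketched in a few lines. This is an idealised model under loud assumptions: the `Participant` class and `broadcast` function are hypothetical, and messages arrive instantly, in order, and without loss. Those are precisely the guarantees a real network lacks, which is why consensus is needed later:

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical node that keeps its own private copy of the Ledger."""
    name: str
    ledger: list = field(default_factory=list)

    def record(self, transaction: str) -> None:
        self.ledger.append(transaction)


def broadcast(transaction: str, participants: list[Participant]) -> None:
    """Send a transaction to every participant, who records it in their own copy.

    Assumes instant, ordered, lossless delivery (not true of a real network).
    """
    for p in participants:
        p.record(transaction)


if __name__ == "__main__":
    nodes = [Participant("Alice"), Participant("Bob"), Participant("Carol")]
    broadcast("Alice pays Bob 5", nodes)
    broadcast("Bob pays Carol 2", nodes)
    # Under these idealised assumptions, every copy of the Ledger is identical.
    print(all(n.ledger == nodes[0].ledger for n in nodes))
```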

Problem to Solve: How to digitally sign and verify messages/transactions

Use Cryptographic Signatures to digitally sign messages/files/transactions.
Cryptographic Signatures use Public key/Private (or Secret) key pairs, e.g. pk = 01000011…, sk = 11000100…
Formally:
1. Sign(Message, sk) = Signature
2. Verify(Message, Signature, pk) = True or False
3. If the Signature is, say, 256 bits long, there are 2²⁵⁶ possible signatures, which makes finding a valid one by brute force virtually impossible.
Signing a transaction involves
1. applying the participant’s secret key to the transaction contents: Sign(Message, sk) = Signature
Verifying a transaction involves
1. verifying the Signature: Verify(Message, Signature, pk) = True or False
2. knowing the full history of transactions up to that point, to prevent participants from overspending (e.g. in Bitcoin)
Note: We can use the same public/private key infrastructure to encrypt the Message itself.
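The Sign/Verify pair above can be made concrete with a toy example. The sketch below swaps in textbook RSA with tiny primes (p = 61, q = 53). It is trivially breakable and is not the scheme Bitcoin uses: Bitcoin relies on elliptic-curve signatures (ECDSA over secp256k1). It does, however, show the shape of Sign(Message, sk) and Verify(Message, Signature, pk). The key values and helper names are illustrative assumptions:

```python
import hashlib

# Toy key pair from two tiny primes: trivially breakable, for illustration only.
P, Q = 61, 53
N = P * Q                 # public modulus (3233)
PHI = (P - 1) * (Q - 1)   # 3120
E = 17                    # public exponent: pk = (E, N)
D = pow(E, -1, PHI)       # secret exponent: sk = (D, N); modular inverse needs Python 3.8+


def digest(message: bytes, n: int = N) -> int:
    """Hash the message with SHA256 and reduce it into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, d: int = D, n: int = N) -> int:
    """Sign(Message, sk) = Signature: raise the digest to the secret exponent."""
    return pow(digest(message, n), d, n)


def verify(message: bytes, signature: int, e: int = E, n: int = N) -> bool:
    """Verify(Message, Signature, pk): undo with the public exponent and compare digests."""
    return pow(signature, e, n) == digest(message, n)


if __name__ == "__main__":
    tx = b"Alice pays Bob 5"
    sig = sign(tx)
    print(verify(tx, sig))
```

Only the holder of D can produce a signature that verifies, while anyone holding the public pair (E, N) can check it. With real key sizes, guessing a valid signature is hopeless for the brute-force reason given in point 3 above.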

Problem to Solve: How to prevent participants copy-pasting transactions multiple times

To prevent users from copy-pasting the same signed transaction multiple times, each transaction also needs a unique ID, so every transaction requires a completely new Signature.
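A minimal sketch of the unique-ID idea, with hypothetical names: the ID is part of the signed content, so two otherwise identical transfers produce different digests and therefore need different signatures, and the ledger rejects any ID it has already seen. Bitcoin itself achieves replay protection differently: each transaction spends specific previous outputs, and an output can only be spent once.

```python
import hashlib
import uuid


def signed_payload(tx_id: str, body: str) -> bytes:
    """The bytes that would be signed: the unique ID is part of the signed content."""
    return f"{tx_id}|{body}".encode()


class Ledger:
    """Hypothetical ledger that refuses to record the same transaction ID twice."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.entries: list[tuple[str, str]] = []

    def add(self, tx_id: str, body: str) -> bool:
        """Record a transaction unless its ID was already used (a copy-pasted replay)."""
        if tx_id in self.seen_ids:
            return False
        self.seen_ids.add(tx_id)
        self.entries.append((tx_id, body))
        return True


if __name__ == "__main__":
    ledger = Ledger()
    tx_id = str(uuid.uuid4())
    print(ledger.add(tx_id, "Alice pays Bob 5"))  # first time: recorded
    print(ledger.add(tx_id, "Alice pays Bob 5"))  # replayed copy: rejected
    # A genuinely new payment gets a new ID, hence a different digest to sign.
    new_id = str(uuid.uuid4())
    a = hashlib.sha256(signed_payload(tx_id, "Alice pays Bob 5")).hexdigest()
    b = hashlib.sha256(signed_payload(new_id, "Alice pays Bob 5")).hexdigest()
    print(a != b)
```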

Problem to Solve: Arriving at consensus – which Ledger copy is to be trusted?

How would participants be able to trust that everyone’s copy of the Ledger will record the transactions in exactly the same way and sequence?
The Proof-of-Work protocol
- Bitcoin’s solution is to require a Proof-of-Work, i.e. participants trust the copy of the Ledger that has the most computational work put into it (decentralized consensus)
- We need a function that generates a number from the content and that
  1. requires effort to compute
  2. is infeasible to compute in the reverse direction
  3. is easy to verify once found
- Luckily, Cryptographic Hash Functions such as SHA256 (a 256-bit hash function) provide the building blocks for such a function. A hash function takes a Message/File and produces a fixed-length “Hash” or “Digest”, e.g.:
  
  SHA256(“TheQuickBrownFox”) => 011010101100……
  
  - Hash output looks random but is not – output is always the same for a given input
  - For SHA256, changing the input even slightly will result in an output hash that is completely different and is entirely unpredictable
  - A Cryptographic Hash Function like SHA256 is infeasible to compute in the reverse direction, i.e. one cannot compute the Message/File from the Hash output. (Interestingly, there is no mathematical proof that this assertion is correct, but so far no one has succeeded in reversing SHA256.)
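The hash properties listed above can be observed directly with Python's standard `hashlib` module. This sketch checks that the output is deterministic and always 256 bits long, and counts how many output bits flip when a single input character changes. The helper names are illustrative:

```python
import hashlib


def sha256_hex(data: str) -> str:
    """SHA256 digest of a UTF-8 string as 64 hex characters (256 bits)."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


if __name__ == "__main__":
    a = sha256_hex("TheQuickBrownFox")
    b = sha256_hex("TheQuickBrownFox")
    c = sha256_hex("TheQuickBrownFoy")  # last character changed
    print(a == b)                 # same input always gives the same output
    print(differing_bits(a, c))   # a one-character change flips a large share of the bits
```

A well-behaved hash flips roughly half of the 256 output bits for any small input change. This is why the output looks random even though it is fully deterministic.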

Problem to Solve: How to use Cryptographic hash functions to create Proof-of-Work

Find a number n where
- SHA256(List of transactions, n) produces a hash output that meets a constraint, e.g. its first 30 bits are zeros
- The only known way to find n is trial and error, which means computational effort is needed to solve for n
- E.g. for a hash output whose first 30 bits must be zeros, the probability of success for each guess is 1/2³⁰
- The idea is that
  1. Finding n should be hard: it requires about 2³⁰ = 1,073,741,824 guesses on average
  2. Verifying n should be easy: apply SHA256 once and check that the first 30 bits are zeros
- Bitcoin organizes transactions in Blocks: each Block consists of a list of transactions plus a Proof-of-Work, and a Block is only valid if it has a valid Proof-of-Work
- To ensure proper sequencing, each Block also contains the hash of the previous Block in its header, which organizes the Ledger as a chain of Blocks: a Blockchain.
  - Block = Previous block hash + List of Transactions + Proof-of-Work
- Creating a new Block is also referred to as “Mining”
- Bitcoin nodes/miners compete to create new Blocks (i.e. find n) – the first one to compute and share it wins the Proof-of-Work contest.
- Successful Mining is rewarded with newly “minted” Bitcoins (as an incentive) – hence injecting new currency into the “economy”.
- For Bitcoin, the difficulty of creating new blocks is set so that a new block is found every 10 minutes on average (the network readjusts the difficulty every 2,016 blocks to hold this average). Some newer blockchains have shorter block creation times.
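The Proof-of-Work and chaining steps above can be sketched end to end. The following is a simplified model under stated assumptions: the block format (a JSON object holding the previous hash, the transactions and n) and the all-zeros genesis placeholder are hypothetical. Real Bitcoin hashes an 80-byte header with double SHA256 and compares the result against a numeric target. Here the difficulty is counted in leading zero bits, and it is set to 12 rather than 30 so that the example runs in well under a second:

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # hypothetical "previous hash" for the first Block


def block_hash(prev_hash: str, transactions: list[str], nonce: int) -> bytes:
    """SHA256 over (previous Block hash, list of transactions, n)."""
    payload = json.dumps(
        {"prev": prev_hash, "txs": transactions, "nonce": nonce}, sort_keys=True
    ).encode()
    return hashlib.sha256(payload).digest()


def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in the digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mine(prev_hash: str, transactions: list[str], difficulty_bits: int) -> tuple[int, str]:
    """Trial and error: try n = 0, 1, 2, ... until the hash has enough leading zeros."""
    nonce = 0
    while True:
        digest = block_hash(prev_hash, transactions, nonce)
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest.hex()
        nonce += 1


def build_chain(batches: list[list[str]], difficulty_bits: int) -> list[dict]:
    """Mine one Block per batch, each pointing at the hash of the Block before it."""
    chain, prev = [], GENESIS_PREV
    for txs in batches:
        nonce, h = mine(prev, txs, difficulty_bits)
        chain.append({"prev": prev, "txs": txs, "nonce": nonce, "hash": h})
        prev = h
    return chain


def chain_is_valid(chain: list[dict], difficulty_bits: int) -> bool:
    """Verification is cheap: one SHA256 per Block plus a link check."""
    prev = GENESIS_PREV
    for block in chain:
        digest = block_hash(block["prev"], block["txs"], block["nonce"])
        if block["prev"] != prev or digest.hex() != block["hash"]:
            return False
        if leading_zero_bits(digest) < difficulty_bits:
            return False
        prev = block["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain([["Alice pays Bob 5"], ["Bob pays Carol 2"]], difficulty_bits=12)
    print(chain_is_valid(chain, 12))
    chain[0]["txs"] = ["Alice pays Bob 500"]  # tamper with history
    print(chain_is_valid(chain, 12))          # the altered Block no longer matches its hash
```

Finding n takes about 2¹² tries per Block here, while checking a Block takes a single hash. Because each Block embeds the previous hash, altering an old Block breaks every link after it.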

Question: Why is the Bitcoin Blockchain creation difficulty set to an average of 10 minutes per block?

In the original Bitcoin paper by Satoshi Nakamoto (no one knows who this actually is, or whether the name represents a group of people), the 10-minute interval appears in an estimate showing that the storage requirements for the blockchain's block headers stay manageable:
A block header with no transactions would be about 80 bytes. If we suppose blocks are generated every 10 minutes, 80 bytes * 6 * 24 * 365 = 4.2MB per year. With computer systems typically selling with 2GB of RAM as of 2008, and Moore's Law predicting current growth of 1.2GB per year, storage should not be a problem even if the block headers must be kept in memory.
The Bitcoin Wiki gives a different reason for the 10-minute average: a tradeoff between the time to the first confirmation of a new block and the computational work wasted on blockchain splits, i.e. shorter block times would result in more chain splits and more wasted computation:
10 minutes is the average time taken to find a block. It can be significantly more or less time than that depending on luck; 10 minutes is simply the average case.

Blocks are how the Bitcoin achieves consensus on who owns what. Once a block is found everyone agrees that you now own those coins, so you can spend them again. Until then it's possible that some network nodes believe otherwise, if somebody is attempting to defraud the system by reversing a transaction. The more confirmations a transaction has, the less risk there is of a reversal.

Ten minutes was specifically chosen by Satoshi as a tradeoff between first confirmation time and the amount of work wasted due to chain splits. After a block is mined, it takes time for other miners to find out about it, and until then they are actually competing against the new block instead of adding to it. If someone mines another new block based on the old block chain, the network can only accept one of the two, and all the work that went into the other block gets wasted. For example, if it takes miners 1 minute on average to learn about new blocks, and new blocks come every 10 minutes, then the overall network is wasting about 10% of its work. Lengthening the time between blocks reduces this waste.
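Both explanations reduce to simple arithmetic, so the tradeoff can be tabulated for several block intervals. This is a rough, first-order sketch. It uses the paper's 80-byte header figure and the Wiki's approximation that wasted work is about (propagation delay / block interval). The fixed 1-minute propagation delay is a hypothetical assumption taken from the Wiki's example:

```python
HEADER_BYTES = 80  # block header size quoted in the Bitcoin paper


def header_bytes_per_year(minutes_per_block: float) -> float:
    """Storage for block headers per year: 80 bytes * blocks per year."""
    blocks_per_year = (60 / minutes_per_block) * 24 * 365
    return HEADER_BYTES * blocks_per_year


def wasted_work_fraction(propagation_minutes: float, minutes_per_block: float) -> float:
    """First-order estimate of work lost to chain splits: delay / block interval."""
    return propagation_minutes / minutes_per_block


if __name__ == "__main__":
    # Assumes (hypothetically) that new blocks take 1 minute to reach other miners.
    for minutes in (1, 2, 5, 10, 20):
        mb = header_bytes_per_year(minutes) / 1e6
        waste = wasted_work_fraction(1, minutes)
        print(f"{minutes:>2} min/block: {mb:5.1f} MB/year of headers, ~{waste:.0%} of work wasted")
```

For 10 minutes per block, this reproduces the paper's 4.2MB per year and the Wiki's roughly 10% waste. Shorter intervals speed up first confirmation but raise both the storage figure and the share of wasted work.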

Problem to Solve: How to prevent Fraudulent Blocks in the Blockchain

A potential path for abuse is that someone may be able to “win” the Proof-of-Work contest and end up manufacturing a fraudulent sequence of blocks.
When this happens, participants will see two different versions of the same Block, creating a branch in the Blockchain. Which branch should participants trust?
The Bitcoin solution is for participants to trust the longer branch, i.e. the one that has more work put into it.
The logic is that it is almost impossible to out-work the entire Blockchain network: the only way to do so (by consistently winning the Proof-of-Work contest) over a sufficiently long sequence of blocks would be to control more than 50% of the computing resources of the entire network. Even then, there is no guarantee this could be sustained.
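Two small calculations make the "most work wins" rule concrete. The first chooses between branches by accumulated work rather than raw block count, assuming each Block at difficulty b costs about 2ᵇ guesses. This is a simplification: real nodes sum the work implied by each Block's target. The second is the catch-up probability from the Bitcoin paper's gambler's-ruin analysis. An attacker with a share q of the hashing power, starting z blocks behind, ever overtakes the honest chain with probability 1 if q ≥ p, and (q/p)^z otherwise, where p = 1 − q. The function names are hypothetical:

```python
def branch_work(difficulty_bits: list[int]) -> int:
    """Approximate work in a branch: about 2**bits guesses per Block."""
    return sum(2 ** bits for bits in difficulty_bits)


def choose_branch(branches: list[list[int]]) -> int:
    """Index of the branch with the most accumulated work (the "longest" chain)."""
    return max(range(len(branches)), key=lambda i: branch_work(branches[i]))


def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Probability that an attacker with share q of the hash power ever erases a z-block lead.

    Gambler's-ruin result from the Bitcoin paper: 1 if q >= p, else (q / p) ** z, with p = 1 - q.
    """
    q = attacker_share
    p = 1 - q
    if q >= p:
        return 1.0
    return (q / p) ** blocks_behind


if __name__ == "__main__":
    # Two competing branches with equal difficulty: the one with more Blocks wins.
    print(choose_branch([[20, 20, 20], [20, 20]]))
    # An attacker with 10% of the hashing power trying to undo 6 confirmations.
    print(catch_up_probability(0.1, 6))
```

Below 50% of the hashing power, the attacker's chance shrinks exponentially with every additional Block of lead. This is why more confirmations mean less risk of a reversal.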

References & Acknowledgements

“But how does bitcoin actually work?”, by 3Blue1Brown, YouTube, 7 Jul 2017
“Blockchain and Financial Market Innovation”, by Rebecca Lewis, John W. McPartland and Rajeev Ranjan, Economic Perspectives, Vol. 41, No. 7, 2017
“Bitcoin: A Peer-to-Peer Electronic Cash System”, by Satoshi Nakamoto, 2008
Bitcoin Wiki

Sunday, March 19, 2017

How the 404 Error Created the World Wide Web - Engineering vs Science

How the 404 Error Created the World Wide Web

Popular Mechanics

By Jesse Dunietz, Dec 5, 2016

Good Enough vs Perfection! How fault-tolerant is your solution?

I remember having similar thoughts about how the World Wide Web grew so fast. Before the World Wide Web, there had already been a great deal of academic literature and research on hypertext/hypermedia.

Tremendous effort and thinking had gone into schemes to maintain the integrity of links, and this ultimately slowed down the development of practical systems. Most of the solutions were unwieldy and impractical.

Similarly, the popularity of HTML over more formal markup languages like GML and SGML reflects the fact that HTML was a far more fault-tolerant language, putting the onus on the creator (or individual) to ensure system integrity.

I also see similarities in human nature (and by extension in social and economic systems): the more fault-tolerant a system, the more popular, pervasive and successful it tends to be.

Key takeaway:

This is a classic philosophical difference between Engineering (good enough) solutions and Science (complete/perfect) answers. The real world tends to favor the Engineering paradigm.

The 404 did for Hypertext what the zero did for Math: It was obvious ... but formalizing and creating a notation for it revolutionized the rest of the system.


Friday, March 17, 2017

Alcohol and Caffeine Built Civilization?

How Alcohol and Caffeine Built Civilization

PJ Media

By Tyler O'Neil, Mar 14, 2017

Choose your Poison (Antiseptic)!

Of the two, alcohol is definitely more important. Imagine if Man hadn't discovered how to domesticate alcohol. We'd still all be hunting and gathering - and a lot more inhibited.

All advanced cultures have developed alcohol in one form or another. In the early days, alcoholic beverages were much safer to drink than plain water.

The impact of caffeine arrived much, much later (the 15th century, compared with 16,000 BC for alcohol).

The article, though, presents a small mystery: did Man begin cultivating crops for food, with alcohol as a by-product, or was it the other way round? One of the comments on the article offers a very plausible answer 😉:

Man started farming specifically to make beer. Why would mankind give up a meat diet, hunting and fishing..... for a bowl of barley?

Key takeaway:

Poison doesn't always kill you - sometimes it can create Civilization.

Alcohol is famous for being toxic — after all, it kills 3.3 million people each year, causing 5.9 percent of all deaths and 25 percent of those among people aged 20 to 39, according to the WHO. But research also suggests that alcohol may have helped create civilization


Roland Hor's Blog
