Posts

What is Hypothesis Testing and how do we use it?

Actually, you already use it.
Suppose you meet someone for the first time, and they tell you that they can run at 10 km/hr (about 6.2 miles per hour, for those three nations that still use this system).
Hint: average top running speed is ~10 km/hr for women and ~12.8 km/hr for men.
You might think, okay, that's close to average; that sounds reasonable.
Now suppose they said they can run as fast as 30 km/hr. That's a bit of a stretch. The average professional sprinter can sustain speeds of 24 km/hr, so 30 km/hr is improbable, but not impossible, since Usain Bolt topped out at a whopping 44 km/hr during his world record run.
Some people might say, nahhh, that's too far a stretch, but you might choose to believe them for now.
But what if they said they could reach more than 50 km/hr? Is it possible? Sure, they might be a hidden Olympic record-shattering machine, an athletic monster like nothing the world has seen before, but at this...

How does Entropy work to split Decision Trees?

How to determine which split is best for a decision tree: entropy! The split with the lowest entropy wins.
Think of entropy as the "impurity" of a group. Less entropy = more pure. More entropy = more chaos = less pure.
Note: -log(p) = information, which on its own we would want to maximize, whereas Entropy = -sum(p * log(p)).
It turns out that once you weight by the probability and sum over ALL possible outcomes in a group, lower is better. It's just how the numbers work out.
Btw, if a group is homogeneous, i.e. only one possible outcome, then entropy = 0: no randomness, by definition.
(Note: could use the Gini index instead, an easier calculation: Gini = 1 - sum(p^2) over all p in the group, then take the weighted average over all groups from a split, same as with entropy.)
(Computationally faster, by quite a bit, which is why R uses it as the default; it gives almost the same answer as entropy, but empirically slightly worse splits, i.e. lower accuracy on testing datasets) (also rec...
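To make the entropy and Gini formulas above concrete, here is a minimal Python sketch. The function names (`entropy`, `gini`, `split_impurity`) and the toy labels are made up for illustration, and log base 2 is an assumption (the excerpt does not say which base it uses).

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p * log2(p)) over the class proportions p in one group."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini = 1 - sum(p^2) over the class proportions p in one group."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(groups, impurity):
    """Weighted average of the impurity of each group produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g) for g in groups)

# Two hypothetical candidate splits of the same six labels.
pure_split = [["yes", "yes", "yes"], ["no", "no", "no"]]
mixed_split = [["yes", "no", "yes"], ["no", "yes", "no"]]

for name, split in [("pure", pure_split), ("mixed", mixed_split)]:
    print(name, split_impurity(split, entropy), split_impurity(split, gini))
```

The pure split scores 0 under both measures, so it would be preferred over the mixed one.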

Lumen Candela Lux Nits

1. Understand Steradian
You know how exactly 2*PI radii make up the circumference of a circle?
Fact: this also means the angle subtended at the center of the circle by an arc of length 1 radius is exactly 1 radian ~ 57.3 degrees.
Now, in the 3D case, the surface area of a sphere is 4 * PI * r^2. Using the same logic, with r^2 as our base unit in place of r from the 2D case, we can say that 1 steradian is the solid angle subtended by a surface patch of area exactly 1 r^2, meaning the entire sphere has a solid angle of 4*PI steradians.
Quick Recap:
1 radius (r) of arc subtends an angle of 1 radian
1 surface patch (r^2) subtends a solid angle of 1 steradian
(solid angle = angle but 3D, steradian = radian but 3D, surface patch = radius but 2D) (yes, 2D)
2. Candela (cd)
Just note that 1 common candle emits roughly 1 candela (cd) of luminous intensity (that's the actual term).
If you think of light as tangible, and flying in all directions then if you c...
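The radian and steradian recap can be checked numerically. This is a small sketch; `solid_angle_sr` is a hypothetical helper name, and the patch is assumed to lie on the surface of the sphere.

```python
import math

def solid_angle_sr(patch_area, radius):
    """Solid angle (steradians) subtended at the center by a patch on the sphere's surface."""
    return patch_area / radius ** 2

r = 2.0  # any radius works; the result does not depend on it

one_patch = solid_angle_sr(r ** 2, r)                    # a patch of area r^2
whole_sphere = solid_angle_sr(4 * math.pi * r ** 2, r)   # the full surface area

print(one_patch)           # 1 steradian
print(whole_sphere)        # 4*PI steradians, about 12.566
print(math.degrees(1.0))   # 1 radian in degrees, about 57.3
```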

Idiosyncrasies of Modulo Arithmetic

Modulo Arithmetic (two main ways; technically there's a weird 3rd but we're gonna ignore that)
Mathematical Definition:
Originally, by Gauss' definition in his 1801 magnum opus, mod is defined as: given integers a, b,
a ≡ b (mod n) (read as "a is congruent to b modulo n") means a - b is an integer multiple of n.
So for something like a = 13, b = 63, n = 10: a - b = -50, which is a multiple of 10, thus it works.
So a logical extension would be 13 mod 10 = 3 + 10k and 63 mod 10 = 3 + 10k, so any integer of the form 3 + 10k would be a valid solution to a mod n.
But since it's equivalent to a and b having the same "remainder" (quotes because remainder has two definitions, which will become clear very soon), we have the following more useful modern definition:
for some dividend N, divisor D, quotient Q, remainder R:
N = D * Q + R where |R| < |D|
Intuitively, suppose a situation of N people, and D groups with Q people per group with R remaining; then, number of groups (D) * people per group (Q) + rem...
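As a rough illustration of why "remainder" can be ambiguous, the sketch below checks the congruence definition and then computes two common quotient/remainder conventions that both satisfy N = D * Q + R with |R| < |D|. The excerpt is cut off before it names its two definitions, so treating them as "floored" and "truncated" is an assumption, and the function names are made up for this example.

```python
def congruent(a, b, n):
    """a ≡ b (mod n): a - b is an integer multiple of n."""
    return (a - b) % n == 0

def floored_divmod(n, d):
    """Quotient rounded toward negative infinity (what Python's // and % do)."""
    q = n // d
    return q, n - d * q

def truncated_divmod(n, d):
    """Quotient rounded toward zero (what C's / and % do for integers)."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q, n - d * q

print(congruent(13, 63, 10))      # 13 - 63 = -50, a multiple of 10
print(floored_divmod(-7, 3))      # (-3, 2):  -7 = 3 * -3 + 2
print(truncated_divmod(-7, 3))    # (-2, -1): -7 = 3 * -2 + -1
```

Both results are valid under the definition above; they only differ when the dividend and divisor have opposite signs.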

Sufficient and Necessary

A is necessary for B means: A must be true whenever B is true
e.g. being a mammal (clause A) is necessary to be a human (clause B)
in other words, if human, then mammal, or if not mammal, then not human
so: if B then A
---
A is sufficient for B means: whenever A is satisfied, B must be satisfied
e.g. a number being rational (clause A) is sufficient for that number to be a real number (clause B)
or in other words, if A, then B
---
so if you want "if A then B" AND "if B then A" to be true simultaneously, then A must be sufficient for B, and A must be necessary for B
e.g. A: today is the fourth of July, B: today is Independence Day in the USA
"today is the fourth of July" is sufficient to infer that today is Independence Day ( A => B )
"today is the fourth of July" is necessary for today to be Independence Day, which is another way of saying ( B => A )
---
so wait, you might ask why are both sufficient and necessary conditions qualities o...
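The two definitions can be tested mechanically over a finite set of cases. In this sketch the helper names and the number example (divisible by 4 vs. even) are hypothetical stand-ins, not taken from the post.

```python
def implies(p, q):
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q

def is_sufficient(a, b, cases):
    """A is sufficient for B: whenever A holds, B holds (A => B)."""
    return all(implies(a(x), b(x)) for x in cases)

def is_necessary(a, b, cases):
    """A is necessary for B: whenever B holds, A holds (B => A)."""
    return all(implies(b(x), a(x)) for x in cases)

cases = range(1, 21)
div_by_4 = lambda x: x % 4 == 0
even = lambda x: x % 2 == 0

print(is_sufficient(div_by_4, even, cases))  # True: every multiple of 4 is even
print(is_necessary(div_by_4, even, cases))   # False: 2 is even but not a multiple of 4
print(is_necessary(even, div_by_4, cases))   # True: a multiple of 4 must be even
```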

Voting Systems

Popular voting systems
1. Plurality voting (also called "First Past the Post")
everyone casts one vote for one option
whichever option has the most votes wins
problem 1: suppose 5 options, where three get 20%, one gets 19%, and one gets 21%. That 21% wins.
problem 2 (called spoiler effect): if there are two big parties, one with a slight majority, and a third, small party, then the small party could 'steal' votes from the bigger party, leading to the smaller of the two big parties winning, despite perhaps many small-party voters preferring the bigger of the two big parties to begin with.
2. Instant Runoff Voting (also called Alternative voting)
everyone ranks all options
then we eliminate the least popular option and move its votes to those voters' second choices
rinse and repeat until only one is left (or until one has a majority vote, i.e. >50%, since that just ends it)
could have the same spoiler effect problem (for the three-party example) if one party is roughly taking a ...
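The two counting procedures described above can be sketched in a few lines of Python. The ballot format (a ranked list of option names), the vote counts, and the alphabetical tie-break are all assumptions made for this example. It shows one case where the two systems disagree; it does not show that instant runoff always avoids the spoiler effect.

```python
from collections import Counter

def plurality(ballots):
    """Only each ballot's first choice counts; the most votes wins."""
    return Counter(b[0] for b in ballots).most_common(1)[0][0]

def instant_runoff(ballots):
    """Eliminate the least popular option and transfer its ballots until one has >50%."""
    remaining = {c for b in ballots for c in b}
    while True:
        counts = Counter({c: 0 for c in remaining})
        for b in ballots:
            for c in b:  # each ballot counts for its highest-ranked remaining option
                if c in remaining:
                    counts[c] += 1
                    break
        leader, top = counts.most_common(1)[0]
        if top * 2 > sum(counts.values()) or len(remaining) == 1:
            return leader
        # Ties for last place are broken alphabetically, an arbitrary choice.
        loser = min(remaining, key=lambda c: (counts[c], c))
        remaining.discard(loser)

# Hypothetical electorate: small party S draws first-choice votes away from big party A.
ballots = (
    [["A", "S", "B"]] * 35
    + [["S", "A", "B"]] * 20
    + [["B", "A", "S"]] * 45
)

print(plurality(ballots))       # B wins on first choices alone (45 vs 35 vs 20)
print(instant_runoff(ballots))  # A wins once S is eliminated and its 20 ballots transfer
```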

OSI Model

Data Networks week 1 slides notes:
The OSI model is a stack of protocols.
7 - Application Layer
protocols: HTTP, FTP, SMTP, TELNET, POP3, ...
6 - Presentation Layer
Performs Translation (e.g. ASCII -> EBCDIC) (aka formatting)
Data Compression (lossless/lossy)
Encryption (SSL, a cryptographic protocol)
5 - Session Layer
Authentication, authorization, session management (establish, maintain, terminate sessions)
4 - Transport Layer
Segmentation (breaks data into small segments, each with seq. number, source and destination port number and address) (and reorganizing at destination)
Flow Control (controls amount of data being transmitted/received)
Error Control (automatic repeat request for missing/corrupt data)
Uses protocols, either
TCP (connection-oriented) (used in emails, ftp, ...)
UDP (connectionless) (used in videos, games, ...)
(IP on its own is connectionless, so TCP over IP (TCP/IP) becomes connection-oriented)
3 - Network Layer (IP)
* Basic unit, packets (which build on top of ...