Wednesday, July 21, 2010

Architectures

SISD Uniprocesssor (single instruction stream, single data stream)
SIMD (Vector)
single instruction, multipe data
MIMD (multiprocessors) - our space
multiple instruction, multiple data

Understand the nature of the machines you're running on.

Shared Bus, distributed
Memory contention, communication contention, communication latency...networks of shared memory clusters and processors

Mutual exclusion --

Basic spin lock (spin lock) (critical section) (resets lock upon exit), introduce sequential bottleneck, Amdahl's law, bad thing... queuing up to get to critical section & perform piece of code. Lock suffers from contention. These are distinct phenomena.

Seq bottleneck --> no parallelism

Boolean value, test and set, swap true with current value, return value tells if prior value was true or false, can reset just by writing false.
TAS aka getAndSet

Test and Test and Set locks
Lurking stage / wait until lock "looks" free
spin while read returns true (lock taken)
Pouncing state (left thumb top dot pain intensity threshold)

Mystery #2, Mystery - both, TAS & TTAS do the same thing (in our model) except TTAS perform much better

Shared Bus - broadcast medium - one broadcaster at a time - processors and memory all "snoop"

Snoopy protocol.**

Processor prosser have small caches, 1 or 2 cycles, address & state information

new multicore chips have intermediate shared caches (L1 close to prosser very fast) L2 cache shared and a it slower, memory-much slower

Cache Coherence -

Write back cache -

prosser signal others "I've just changed data" using bus protocol.

Invalid, Valid, Dirty - raw seething

acquire write permission - notify everyone else you're changing it

only get data from memory if haven't found it in someone else's cache

minimize traffic on bus
request lock get as fast as can not slow everyone else down

measuring quiescence time
tom andersen
x = time of ops that don't use the bus. y = time of ops that cause intensive bus traffic

tom andersen

backoff lock - just put some backoff lock
Smart locks - machine learning

false sharing
padding
jonathan - makes smart locks (MIT phd student)

CLH lock

NUMA machine, memory distributed among processors, accessing data locally is really quick, if I have to go to a memory location that's far away, that's expensive.

MCS - Mellor Crummy and Scott, Explicit Pointers
A record, a bit, also a pointer.
allocate a node.
perform same kind of swap in CLH lock.
I know who the guy before me is.
I'm not gonna spin, I wanna spin on my record, that's the one that's local.

Purple spin on purple green spin on verde.

yo cambio la pointer, igoto his knowd and pawnt it to moi

red guy has inst'd his nowd, red gay spins, eventually purple will change dat to valz.

herez how mcs werks

qnode phalse tayle atauhmik

queue

phyx phointer

compare and swap to check if the tail pointer (node) is still pointing to me

if tail is still pointing to me, then nobody in queue, comparison operator, reaching consensus to agree that there is no tail

couldn't do this with reads and writes

compare and swap (MCS Queue lock)

other guy couldn't have been there, can't get in, has gotta notify me

then i spin

why do I spin now? Because I validated that if I failed the CAS, then if...

when he does I'm gonna let him go, I'm gonna release him.