SIMD (Vector)
single instruction, multipe data
MIMD (multiprocessors) - our space
multiple instruction, multiple data
Understand the nature of the machines you're running on.
Shared Bus, distributed
Memory contention, communication contention, communication latency...networks of shared memory clusters and processors
Mutual exclusion --
Basic spin lock (spin lock) (critical section) (resets lock upon exit), introduce sequential bottleneck, Amdahl's law, bad thing... queuing up to get to critical section & perform piece of code. Lock suffers from contention. These are distinct phenomena.
Seq bottleneck --> no parallelism
Boolean value, test and set, swap true with current value, return value tells if prior value was true or false, can reset just by writing false.
TAS aka getAndSet
Test and Test and Set locks
Lurking stage / wait until lock "looks" free
spin while read returns true (lock taken)
Pouncing state (left thumb top dot pain intensity threshold)
Mystery #2, Mystery - both, TAS & TTAS do the same thing (in our model) except TTAS perform much better
Shared Bus - broadcast medium - one broadcaster at a time - processors and memory all "snoop"
Snoopy protocol.**
Processor prosser have small caches, 1 or 2 cycles, address & state information
new multicore chips have intermediate shared caches (L1 close to prosser very fast) L2 cache shared and a it slower, memory-much slower
Cache Coherence -
Write back cache -
prosser signal others "I've just changed data" using bus protocol.
Invalid, Valid, Dirty - raw seething
acquire write permission - notify everyone else you're changing it
only get data from memory if haven't found it in someone else's cache
minimize traffic on bus
request lock get as fast as can not slow everyone else down
measuring quiescence time
tom andersen
x = time of ops that don't use the bus. y = time of ops that cause intensive bus traffic
tom andersen
backoff lock - just put some backoff lock
Smart locks - machine learning
false sharing
padding
jonathan - makes smart locks (MIT phd student)
CLH lock
NUMA machine, memory distributed among processors, accessing data locally is really quick, if I have to go to a memory location that's far away, that's expensive.
MCS - Mellor Crummy and Scott, Explicit Pointers
A record, a bit, also a pointer.
allocate a node.
perform same kind of swap in CLH lock.
I know who the guy before me is.
I'm not gonna spin, I wanna spin on my record, that's the one that's local.
Purple spin on purple green spin on verde.
yo cambio la pointer, igoto his knowd and pawnt it to moi
red guy has inst'd his nowd, red gay spins, eventually purple will change dat to valz.
herez how mcs werks
qnode phalse tayle atauhmik
queue
phyx phointer
compare and swap to check if the tail pointer (node) is still pointing to me
if tail is still pointing to me, then nobody in queue, comparison operator, reaching consensus to agree that there is no tail
couldn't do this with reads and writes
compare and swap (MCS Queue lock)
other guy couldn't have been there, can't get in, has gotta notify me
then i spin
why do I spin now? Because I validated that if I failed the CAS, then if...
when he does I'm gonna let him go, I'm gonna release him.
No comments:
Post a Comment