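Why Parallelism? Why Efficiency?
Chapter 1 is purely an introduction to parallel computing, but it contains quite a few "dialectical" trade-off discussions, so let's do a quick recap:
System Design Trade-offs
(1) General steps of parallelizing work: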
- Decomposing work into pieces that can safely be performed in parallel
- Assigning work to processors
- Managing communication/synchronization between the processors so that it does not limit speedup
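The three steps above can be sketched with a toy parallel sum. This is only a minimal illustration, not code from the lecture: `parallel_sum` and `num_workers` are names I made up, and in standard CPython the GIL means threads will not actually speed up this CPU-bound work, so read it for the structure (decompose, assign, synchronize) rather than for performance.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, num_workers=4):
    """Sum `values` by splitting the work across worker threads (hypothetical helper)."""
    # Step 1: decompose the work into pieces that are safe to run in parallel
    # (each piece only reads its own slice, so the pieces are independent).
    chunk = max(1, (len(values) + num_workers - 1) // num_workers)
    pieces = [values[i:i + chunk] for i in range(0, len(values), chunk)]

    # Step 2: assign the pieces to workers.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, pieces))

    # Step 3: communication/synchronization. The only shared step is
    # combining one partial result per piece, so it stays cheap.
    return sum(partial_sums)


print(parallel_sum(list(range(1, 101))))  # 5050
```

The design point is that the workers never touch shared data while they run; all communication is pushed into one small combine step at the end, which is what keeps synchronization from limiting speedup.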
(2) What metrics do we use to evaluate a design/system? Factors to consider:
- performance
- convenience
- cost
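(3) Fast != Efficient:
Take the question "Is 2x speedup on computer with 10 processors a good result?"
- In general, at first glance it is not good. The reason is simple: I paid 10x the cost and only got a 2x performance gain. That's terrible!
- In some cases, though, we consider it good enough:
  - For example, when the 10x processors cost very little money and this 2x speedup is very valuable (e.g. returning Google web search results, where a 2x improvement is already a big deal).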
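To put a number on "fast != efficient", one common definition (my assumption here, the notes do not spell it out) is efficiency = speedup / number of processors. A tiny sketch, with `efficiency` as a made-up helper name:

```python
def efficiency(speedup, num_processors):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup / num_processors


# 2x speedup on 10 processors: fast-er, but only 20% efficient.
print(efficiency(2, 10))   # 0.2
# Perfect linear scaling would give an efficiency of 1.0.
print(efficiency(10, 10))  # 1.0
```

Whether 0.2 is acceptable is exactly the trade-off above: it depends on how cheap the extra processors are and how much the 2x is worth.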
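Program - Instruction - Processor - State
(1) Soul-searching question: What is a computer program?
Honestly, when I heard this question in the lecture I hit pause and thought for several minutes, and still had no clue.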
Answer: A program is just a list of processor instructions!
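After a high-level language goes through the compiler and related tools, it becomes machine code that the machine can recognize, which then gets executed...
So a program is essentially just a set of "instructions the processor can recognize".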
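(2) Soul-searching question: What does a processor do?
Answer 1: A processor executes instructions
This answer is a bit of a joke: it doesn't dig any deeper.
Answer 2: A processor modifies the computer’s state! (by executing instructions)
Exactly! In essence, instructions direct the processor to change the state of registers / memory (e.g. the values stored there).
(3) Soul-searching question: What do I mean when I talk about a computer’s “state”?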
Answer: values of program data, which are stored in a processor’s registers or in memory!
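This is the same thing mentioned in (2), and it echoes what I was told in classes back in China: "a computer is essentially a state machine".
It's just that back in China nobody ever explained "why" 😅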
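The "program = list of instructions, processor = thing that updates state" view fits in a few lines. This is a toy model under my own assumptions: the two-opcode instruction set, the `run` function, and the register names `r0`..`r4` are all hypothetical, and real machine code is of course far richer.

```python
def run(program, initial_state):
    """Execute instructions one by one; each one modifies the state."""
    state = dict(initial_state)  # the "state": values held in registers/memory
    for op, dest, src_a, src_b in program:
        if op == "add":
            state[dest] = state[src_a] + state[src_b]
        elif op == "mul":
            state[dest] = state[src_a] * state[src_b]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return state


# A "program" computing x*x + y*y + z*z, written as a plain list of instructions.
SUM_OF_SQUARES = [
    ("mul", "r0", "x", "x"),
    ("mul", "r1", "y", "y"),
    ("mul", "r2", "z", "z"),
    ("add", "r3", "r0", "r1"),
    ("add", "r4", "r3", "r2"),
]

final_state = run(SUM_OF_SQUARES, {"x": 1, "y": 2, "z": 3})
print(final_state["r4"])  # 14
```

Each loop iteration is one state transition, which is the "computer as a state machine" idea in miniature.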
Superscalar Processor
(1) An example: Superscalar Processor execution
- ILP: Instruction-Level Parallelism
- In this example, the whole "parallel" process is invisible to the high-level language and to the programmer
- Superscalar execution: the processor automatically finds independent instructions in an instruction sequence and executes them in parallel on multiple execution units!
- Superscalar Processor: take a 2-way one as the example
- The dependencies between instructions essentially form an Instruction Dependency Graph.
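Here is a rough sketch of how a dependency graph bounds what a superscalar processor can do. It is an idealized model under my own assumptions: every instruction takes one cycle, only read-after-write dependencies are tracked, and `build_dependencies` / `schedule` are hypothetical helpers, not how real hardware is implemented.

```python
def build_dependencies(instructions):
    """For each instruction, collect the earlier instructions whose results it reads."""
    last_writer = {}  # register name -> index of the instruction that last wrote it
    deps = []
    for index, (dest, sources) in enumerate(instructions):
        deps.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = index
    return deps


def schedule(instructions, width):
    """Greedily issue up to `width` ready instructions per cycle."""
    deps = build_dependencies(instructions)
    done = set()
    remaining = list(range(len(instructions)))
    cycles = []
    while remaining:
        # An instruction is ready once everything it depends on finished earlier.
        ready = [i for i in remaining if deps[i] <= done][:width]
        cycles.append(ready)
        done.update(ready)
        remaining = [i for i in remaining if i not in ready]
    return cycles


# a = x*x + y*y + z*z as (destination, sources) pairs
ILP_EXAMPLE = [
    ("r0", ("x", "x")),
    ("r1", ("y", "y")),
    ("r2", ("z", "z")),
    ("r3", ("r0", "r1")),
    ("r4", ("r3", "r2")),
]

print(len(schedule(ILP_EXAMPLE, width=1)))  # 5 cycles on a scalar processor
print(len(schedule(ILP_EXAMPLE, width=2)))  # 3 cycles on a 2-way superscalar
print(len(schedule(ILP_EXAMPLE, width=3)))  # still 3: the dependency chain is the limit
```

Going from width 2 to width 3 buys nothing here, because the longest chain in the graph (mul -> add -> add) already needs three cycles. That is the intuition behind ILP having limited headroom.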
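(2) Single core is dead: single-instruction stream performance is dead
Reasons:
- Power is limited -> transistor count is limited -> single-core performance is limited (e.g., a chip can only fit a "limited number" of transistors)
- ILP scaling has tapered off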
What We Prefer Currently:
- faster processors <-- more execution units running in parallel
- units that are specialized for a specific task (graphics ...)