Trying to reproduce the original Unison paper
Before I learnt the real running process
Small Test
For the original test, following README.md:
Bash
./ns3 build dctcp-example dctcp-example-mtp
time ./ns3 run dctcp-example
time ./ns3 run dctcp-example-mtp
It takes 4-5 minutes for dctcp-example and 1-2 minutes for dctcp-example-mtp.
Local Machine + Single Thread
Bash
❯ lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: Apple
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 12
Socket(s): -
Cluster(s): 1
Stepping: 0x0
CPU max MHz: 2000.0000
CPU min MHz: 2000.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp flagm2 frint
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
Since Thread(s) per core is 1, we can tell that hyper-threading is off on the local machine.
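To make this check repeatable on other machines, here is a minimal sketch that reads lscpu-style text and reports whether more than one hardware thread per core is exposed. The function names threads_per_core and smt_status are my own, and the sketch assumes the English field label "Thread(s) per core".

```shell
#!/bin/bash
# Read lscpu-style text on stdin and print the "Thread(s) per core" value.
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Map that value to a status: 1 thread per core means hyper-threading is off.
smt_status() {
    local tpc
    tpc=$(threads_per_core)
    case "$tpc" in
        "") echo "unknown" ;;   # field not found in the input
        1)  echo "off" ;;
        *)  echo "on" ;;
    esac
}

# Demo on a fragment of the lscpu output from the local machine.
printf 'CPU(s): 12\nThread(s) per core: 1\nCore(s) per cluster: 12\n' | smt_status
```

On a real machine the same check would be `lscpu | smt_status`.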
Bash
%% dctcp-example %%
real 5m15.132s
user 5m14.952s
sys 0m0.162s
%% dctcp-example-mtp %%
real 1m34.903s
user 6m16.114s
sys 0m1.078s
The multithreaded run is about 3.3x faster in wall-clock time (5m15s vs 1m35s), which aligns with the data provided in the original paper.
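To compare runs consistently, the real times reported by `time` can be converted to seconds and divided. This is only a small sketch: the helper names to_seconds and speedup are hypothetical, and it assumes durations in the `<minutes>m<seconds>s` form shown in these notes.

```shell
#!/bin/bash
# Convert a `time`-style duration such as 5m15.132s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock speedup = single-threaded real time / multithreaded real time.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 5m15.132s 1m34.903s     # local machine
speedup 14m57.122s 5m14.145s    # Sugon server, hyper-threading on
speedup 15m54.774s 4m49.496s    # Sugon server, single thread per core
```

Only the real (wall-clock) time is used here; the user time of the multithreaded run is summed over all threads, so it is expected to be larger than the real time.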
Sugon Server + Hyper Thread
Bash
%% dctcp-example %%
real 14m57.122s
user 14m56.322s
sys 0m0.853s
%% dctcp-example-mtp %%
real 5m14.145s
user 20m49.211s
sys 0m3.165s
Sugon Server + Single Thread
Bash
%% dctcp-example %%
real 15m54.774s
user 15m54.431s
sys 0m0.386s
%% dctcp-example-mtp %%
real 4m49.496s
user 19m12.724s
sys 0m1.938s
Medium Tests
Sugon, hyper-threading on
fat-tree-mtp-k2-c2: 46.0808s
fat-tree-mtp-k2-c4: 50.2891s
fat-tree-mtp-k2-c8: 53.13s
fat-tree-mtp-k2-c16: 77.3042s
fat-tree-ori-k2-c2: 91.5243s
fat-tree-ori-k2-c4: 156.968s
fat-tree-ori-k2-c8: 156.317s
fat-tree-ori-k2-c16: 202.152s
| k | c | result (ori time / mtp time) |
|---|---|---|
| 2 | 2 | 1.986 |
| 2 | 4 | 3.12 |
| 2 | 8 | 2.942 |
| 2 | 16 | 2.61 |
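The result column is the ori time divided by the mtp time for the same k and c. As a rough sketch (the function name speedup_table is my own, and it assumes the `fat-tree-<mode>-k<K>-c<C>: <seconds>s` line format used in the listing above), the ratios can be recomputed directly from the timing lines:

```shell
#!/bin/bash
# Read "fat-tree-<mode>-k<K>-c<C>: <seconds>s" lines on stdin and print
# one "k c speedup" row per configuration, with speedup = ori / mtp.
speedup_table() {
    awk -F'[: ]+' '
        {
            split($1, p, "-")              # fat, tree, <mode>, k<K>, c<C>
            mode = p[3]; key = p[4] " " p[5]
            sub(/s$/, "", $2)              # drop the trailing unit
            t[mode, key] = $2 + 0
            if (!(key in seen)) { seen[key] = 1; order[++n] = key }
        }
        END {
            for (i = 1; i <= n; i++) {
                key = order[i]
                split(key, kc, " ")
                if (t["mtp", key] > 0)     # skip configs without an mtp run
                    printf "%s %s %.3f\n", substr(kc[1], 2), substr(kc[2], 2), t["ori", key] / t["mtp", key]
            }
        }'
}

# Demo on the Sugon timings with hyper-threading on.
speedup_table <<'EOF'
fat-tree-mtp-k2-c2: 46.0808s
fat-tree-mtp-k2-c4: 50.2891s
fat-tree-mtp-k2-c8: 53.13s
fat-tree-mtp-k2-c16: 77.3042s
fat-tree-ori-k2-c2: 91.5243s
fat-tree-ori-k2-c4: 156.968s
fat-tree-ori-k2-c8: 156.317s
fat-tree-ori-k2-c16: 202.152s
EOF
```

The same function can be fed the local-machine timings or the Sugon timings without hyper-threading.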
Local machine, hyper-threading off
fat-tree-mtp-k2-c2: 12.2736s
fat-tree-mtp-k2-c4: 13.3277s
fat-tree-mtp-k2-c8: 14.8488s
fat-tree-mtp-k2-c16: 19.2544s
fat-tree-ori-k2-c2: 24.6369s
fat-tree-ori-k2-c4: 39.1419s
fat-tree-ori-k2-c8: 43.6409s
fat-tree-ori-k2-c16: 54.4671s
| k | c | result (ori time / mtp time) |
|---|---|---|
| 2 | 2 | 2.00 |
| 2 | 4 | 2.937 |
| 2 | 8 | 2.940 |
| 2 | 16 | 2.83 |
Sugon, hyper-threading off
fat-tree-mtp-k2-c2: 47.7066s
fat-tree-mtp-k2-c4: 56.5749s
fat-tree-mtp-k2-c8: 58.9219s
fat-tree-mtp-k2-c16: 74.1024s
fat-tree-ori-k2-c2: 105.104s
fat-tree-ori-k2-c4: 162.615s
fat-tree-ori-k2-c8: 181.431s
fat-tree-ori-k2-c16: 229.165s
| k | c | result (ori time / mtp time) |
|---|---|---|
| 2 | 2 | 2.203 |
| 2 | 4 | 2.874 |
| 2 | 8 | 3.079 |
| 2 | 16 | 3.09 |
Large Tests
Config cmd not good
Go to the ns-3-msccl repo.
In Script Set: use fat-tree-mtp.sh and fat-tree.sh
| k | flow_t | result |
|---|---|---|
| 4 | 1 | 4.72 |
| 8 | 1 | |
| 12 | 1 | |
| 16 | 1 | |
This test matches our real goal.
However, I now believe its ./ns3 configure command might be wrong. We can collect the results now and rerun with a different configure command later.
Config cmd correct
| k | flow_t | result |
|---|---|---|
| 4 | 1 | |
| 8 | 1 | |
| 12 | 1 | |
| 16 | 1 | |
After I got it
By this point I have learnt the principle behind Unison.
Hence I will try to reproduce the experiments of the original Unison paper on the Sugon machine with hyper-threading off.
I follow exp.py in Unison-evaluations-for-mtp and write the corresponding shell scripts (ori + mtp) in the unison branch.
test-mtp.sh
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time mtp and ori scripts are totally different.
# Differ: 1) configuration cmd 2) --thread
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (mtp mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point --enable-mtp --enable-examples
echo "Building fat-tree (mtp mode)..."
./ns3 build fat-tree-mtp
for k in 8 16; do
    for c in 8 16; do
        cmd="./ns3 run \"fat-tree-mtp \
            --k=$k \
            --cluster=$c \
            --delay=3000 \
            --bandwidth=100Gbps \
            --flow=false \
            --incast=1 \
            --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
            --time=0.1 \
            --interval=0.01 \
            --flowmon=false \
            --thread=$c\" \
            2>&1 | tee \"./fat-tree-data/mtp-data/fat-tree-mtp-k${k}-c${c}.txt\""
        echo "Running test with k=$k, cluster=$c"
        eval $cmd
        echo "Completed test with k=$k, cluster=$c"
        sleep 2
    done
done
echo "All fat-tree-mtp tests completed!"
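The --victim argument in the script is built from k, which is easy to misread. A k-ary fat-tree has k*k/4 hosts per pod, so the expression lists indices 0 to k*k/4 - 1 joined by dashes. That these indices are the hosts of one pod is my assumption about how fat-tree numbers its hosts. The helper name victim_list below is hypothetical, and it uses paste so the output does not depend on the seq implementation (with GNU seq, `seq -s- 0 N` prints the same string).

```shell
#!/bin/bash
# Build the dash-separated index list 0 .. k*k/4 - 1 passed to --victim.
victim_list() {
    local k=$1
    seq 0 $(( k * k / 4 - 1 )) | paste -sd- -
}

for k in 4 8; do
    echo "k=$k -> --victim=$(victim_list "$k")"
done
```

For k=16 the list has 64 entries, which is worth keeping in mind when reading the logged command line.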
test-ori.sh
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time mtp and ori scripts are totally different.
# Differ: 1) configuration cmd 2) --thread
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (ori mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point --enable-examples
echo "Building fat-tree (ori mode)..."
./ns3 build fat-tree-ori
for k in 8 16; do
    for c in 8 16; do
        cmd="./ns3 run \"fat-tree-ori \
            --k=$k \
            --cluster=$c \
            --delay=3000 \
            --bandwidth=100Gbps \
            --flow=false \
            --incast=1 \
            --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
            --time=0.1 \
            --interval=0.01 \
            --flowmon=false\" \
            2>&1 | tee \"./fat-tree-data/ori-data/fat-tree-ori-k${k}-c${c}.txt\""
        echo "Running test with k=$k, cluster=$c"
        eval $cmd
        echo "Completed test with k=$k, cluster=$c"
        sleep 2
    done
done
echo "All fat-tree-ori tests completed!"
The mtp and ori variants need to differ in two places.
In fat-tree-mtp.cc and fat-tree-ori.cc: mtp::enable(numTh)
In the scripts: 1) the configure command 2) --thread
The corresponding code is in UNISON-for-ns-3/tree/unison/src/mtp/examples, hence we need to add --enable-examples to the configure command.
However, this is not 100% the same as exp.py, since exp.py has no --enable-examples argument.
We need a 100% faithful reproduction, so switch to the Unison-evaluations-for-mtp branch.
We can still collect the data generated here :)
| k_fat | cluster | result |
|---|---|---|
| 8 | 8 | |
| 8 | 16 | |
| 16 | 8 | |
| 16 | 16 | |
This commit is about it.
test-mtp.sh:
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time mtp and ori scripts are totally different.
# Differ: 1) configuration cmd 2) --thread
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (mtp mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point --enable-mtp
echo "Building fat-tree (mtp mode)..."
./ns3 build fat-tree
for k in 8 16; do
    for c in 8 16; do
        cmd="./ns3 run \"fat-tree \
            --k=$k \
            --cluster=$c \
            --delay=3000 \
            --bandwidth=100Gbps \
            --flow=false \
            --incast=1 \
            --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
            --time=0.1 \
            --interval=0.01 \
            --flowmon=false \
            --thread=$c\" \
            2>&1 | tee \"./fat-tree-data/mtp-data/fat-tree-mtp-k${k}-c${c}.txt\""
        echo "Running test with k=$k, cluster=$c"
        eval $cmd
        echo "Completed test with k=$k, cluster=$c"
        sleep 2
    done
done
echo "All fat-tree (mtp mode) tests completed!"
test-ori.sh
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time mtp and ori scripts are totally different.
# Differ: 1) configuration cmd 2) --thread
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (ori mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point
echo "Building fat-tree (ori mode)..."
./ns3 build fat-tree
for k in 8 16; do
    for c in 8 16; do
        cmd="./ns3 run \"fat-tree \
            --k=$k \
            --cluster=$c \
            --delay=3000 \
            --bandwidth=100Gbps \
            --flow=false \
            --incast=1 \
            --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
            --time=0.1 \
            --interval=0.01 \
            --flowmon=false\" \
            2>&1 | tee \"./fat-tree-data/ori-data/fat-tree-ori-k${k}-c${c}.txt\""
        echo "Running test with k=$k, cluster=$c"
        eval $cmd
        echo "Completed test with k=$k, cluster=$c"
        sleep 2
    done
done
echo "All fat-tree (ori mode) tests completed!"
The only change is that we delete the --enable-examples argument from the configure command.
One thing to note: the simulation experiment code is in UNISON-FOR-NS-3/scratch, e.g. scratch/fat-tree.cc (what we run here).
We can easily see how it is linked into the build in ./CMakeLists.txt:
Bash
# Build scratch/simulation scripts
add_subdirectory(scratch)
The fat-tree topology here is totally different from that in the unison branch!!!
Micro-benchmark
| k_fat | cluster | result |
|---|---|---|
| 8 | 8 | 1.23 |
| 8 | 16 | 1.59 |
| 16 | 8 | 0.71 |
| 16 | 16 | 1.46 |
Medium-benchmark
| k_fat | cluster | result |
|---|---|---|
| 8 | 24 | 1.54 |
| 8 | 48 | X |
| 8 | 72 | X |
Tips
1) Don't run scripts in different branches at the same time!
Experiments in unison and unison-evaluations-for-mtp cannot run at the same time. They are git branches, so the result files are the same, and running both leads to strange errors.
2) Consider thread interaction!
We have to add a barrier in the scripts so that no run starts before its configuration has finished. A simple way is to add sleep 20.
3) Tmux is account-level rather than folder-level.
4) Multiple experiments required by the same figure should be run under the same hardware configuration.
If different figures require the same experiment, you only need to run it once.
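Tip 1 can also be enforced mechanically rather than by memory. The sketch below is one possible way, not something the original scripts do: it uses a lock directory so that a second experiment script refuses to start while another is running. The path in LOCK_DIR and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical lock location; mkdir is atomic, so only one caller can create it.
LOCK_DIR="${LOCK_DIR:-/tmp/unison-experiment.lock}"

acquire_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        return 0
    fi
    echo "Another experiment holds $LOCK_DIR; refusing to start." >&2
    return 1
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# run_exclusive CMD...: run a command only while holding the lock,
# and release the lock afterwards whether or not the command succeeded.
run_exclusive() {
    acquire_lock || return 1
    "$@"
    local status=$?
    release_lock
    return $status
}

# Usage (not executed here): run_exclusive ./test-mtp.sh
```

If a script is killed while holding the lock, the directory has to be removed by hand before the next run.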
MORE important issue!!!!!!
unison-mpi
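The tests in the paper:
k=8, cluster=48
k=8, cluster=72
k=8, cluster=96
They all correspond to fat-tree-distributed.
prerequisites
Problem 1: this requires coordinated operation across a distributed system.
Problem 2: this experiment needs a server with at least 144 cores, and sugon currently has only 72.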
Bash
bxhu@sugon:~/UNISON-for-ns-3$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71