Attempting to Reproduce the Original UNISON Paper
Before I Learned the Actual Workflow
Small Tests
For the original test, following README.md:
Bash
./ns3 build dctcp-example dctcp-example-mtp
time ./ns3 run dctcp-example
time ./ns3 run dctcp-example-mtp
 
Expected: roughly 4-5 minutes for dctcp-example and 1-2 minutes for dctcp-example-mtp.
Local Machine + Single Thread per Core
Bash
❯ lscpu
Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   12
  On-line CPU(s) list:    0-11
Vendor ID:                Apple
  Model:                  0
  Thread(s) per core:     1
  Core(s) per cluster:    12
  Socket(s):              -
  Cluster(s):             1
  Stepping:               0x0
  CPU max MHz:            2000.0000
  CPU min MHz:            2000.0000
  BogoMIPS:               48.00
  Flags:                  fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb
                          dcpodp flagm2 frint
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
 
Since Thread(s) per core is 1, hyper-threading (SMT) is not in use on the local machine.
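The same check can be scripted. A minimal sketch, assuming lscpu's English field names (on a localized system run it as `LC_ALL=C lscpu | smt_status`); threads_per_core and smt_status are hypothetical helper names:

```bash
# Print the "Thread(s) per core" value from `lscpu` output read on stdin.
threads_per_core() {
  awk -F: '/^[ \t]*Thread\(s\) per core:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Report whether SMT (hyper-threading) is active: more than one thread per core means on.
smt_status() {
  local tpc
  tpc=$(threads_per_core)
  if [ "${tpc:-1}" -gt 1 ]; then
    echo "SMT on (${tpc} threads per core)"
  else
    echo "SMT off"
  fi
}

# Usage on the machine under test:
#   lscpu | smt_status
```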
Bash
%% dctcp-example %%
real    5m15.132s
user    5m14.952s
sys     0m0.162s
%% dctcp-example-mtp %%
real    1m34.903s
user    6m16.114s
sys     0m1.078s
 
The resulting speedup (5m15s / 1m35s, about 3.3x) is in line with the data reported in the original paper.
Sugon Server + Hyper-Threading
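The speedup here is just the ratio of the two real (wall-clock) times. A small sketch that converts bash `time` fields to seconds and divides them; to_seconds and speedup are hypothetical helper names, and only the `XmY.Zs` and `Y.Zs` formats are handled:

```bash
# Convert a bash `time` field such as "5m15.132s" (or a plain "46.0808s") to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    m = 0; s = t
    if (index(t, "m") > 0) { split(t, p, "m"); m = p[1]; s = p[2] }
    sub(/s$/, "", s)
    print m * 60 + s
  }'
}

# Speedup = real time of the original (sequential) run / real time of the MTP run.
speedup() {
  local ori mtp
  ori=$(to_seconds "$1")
  mtp=$(to_seconds "$2")
  awk -v a="$ori" -v b="$mtp" 'BEGIN { printf "%.3f\n", a / b }'
}

# Local machine, "real" times above:
speedup 5m15.132s 1m34.903s   # prints 3.321
```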
Bash
%% dctcp-example %%
real    14m57.122s
user    14m56.322s
sys     0m0.853s
%% dctcp-example-mtp %%
real    5m14.145s
user    20m49.211s
sys     0m3.165s
 
Sugon Server + Single Thread per Core
Bash
%% dctcp-example %%
real    15m54.774s
user    15m54.431s
sys     0m0.386s
%% dctcp-example-mtp %%
real    4m49.496s
user    19m12.724s
sys     0m1.938s
 
Medium Tests 
Sugon, hyper-threading on
fat-tree-mtp-k2-c2: 46.0808s
fat-tree-mtp-k2-c4: 50.2891s
fat-tree-mtp-k2-c8: 53.13s
fat-tree-mtp-k2-c16: 77.3042s
fat-tree-ori-k2-c2: 91.5243s
fat-tree-ori-k2-c4: 156.968s
fat-tree-ori-k2-c8: 156.317s
fat-tree-ori-k2-c16: 202.152s
| k | c | result (speedup = ori / mtp) |
|---|---|---|
| 2 | 2 | 1.986 |
| 2 | 4 | 3.12 |
| 2 | 8 | 2.942 |
| 2 | 16 | 2.61 |
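The result column is the ratio ori time / mtp time for each (k, c) pair, e.g. 91.5243 / 46.0808 ≈ 1.986. A sketch that rebuilds such a table from the per-run lines above, assuming exactly the `fat-tree-<mode>-k<k>-c<c>: <seconds>s` format; speedup_table is a hypothetical name:

```bash
# Read lines such as "fat-tree-mtp-k2-c2: 46.0808s" on stdin and print
# "k c speedup" rows (speedup = ori time / mtp time) for every (k, c) seen in both modes.
speedup_table() {
  awk -F': ' '
    $1 ~ /^fat-tree-(mtp|ori)-k[0-9]+-c[0-9]+$/ {
      split($1, p, "-")                      # p[3] = mode, p[4] = "k2", p[5] = "c2"
      key = substr(p[4], 2) " " substr(p[5], 2)
      t = $2; sub(/s[ \t\r]*$/, "", t)       # drop the trailing "s" unit
      secs[p[3], key] = t
      keys[key] = 1
    }
    END {
      for (key in keys)
        if ((("ori" SUBSEP key) in secs) && (("mtp" SUBSEP key) in secs))
          printf "%s %.3f\n", key, secs["ori", key] / secs["mtp", key]
    }' | sort -k1,1n -k2,2n
}
```

Feeding it the eight Sugon (hyper-threading on) lines gives 1.986, 3.121, 2.942 and 2.615 for c = 2, 4, 8, 16, matching the column above up to rounding.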
 
 
Local machine, hyper-threading off
fat-tree-mtp-k2-c2: 12.2736s
fat-tree-mtp-k2-c4: 13.3277s
fat-tree-mtp-k2-c8: 14.8488s
fat-tree-mtp-k2-c16: 19.2544s
fat-tree-ori-k2-c2: 24.6369s
fat-tree-ori-k2-c4: 39.1419s
fat-tree-ori-k2-c8: 43.6409s
fat-tree-ori-k2-c16: 54.4671s
| k | c | result (speedup = ori / mtp) |
|---|---|---|
| 2 | 2 | 2.01 |
| 2 | 4 | 2.937 |
| 2 | 8 | 2.939 |
| 2 | 16 | 2.83 |
 
 
Sugon, hyper-threading off
fat-tree-mtp-k2-c2: 47.7066s
fat-tree-mtp-k2-c4: 56.5749s
fat-tree-mtp-k2-c8: 58.9219s
fat-tree-mtp-k2-c16: 74.1024s
fat-tree-ori-k2-c2: 105.104s
fat-tree-ori-k2-c4: 162.615s
fat-tree-ori-k2-c8: 181.431s
fat-tree-ori-k2-c16: 229.165s
| k | c | result (speedup = ori / mtp) |
|---|---|---|
| 2 | 2 | 2.203 |
| 2 | 4 | 2.874 |
| 2 | 8 | 3.079 |
| 2 | 16 | 3.09 |
 
 
Large Tests 
Configure command: not correct
Go to the ns-3-msccl repo.
In the script set, use fat-tree-mtp.sh and fat-tree.sh.
| k | flow_t | result |
|---|---|---|
| 4 | 1 | 4.72 |
| 8 | 1 |  |
| 12 | 1 |  |
| 16 | 1 |  |
 
 
This test matches our real goal.
However, I now suspect that its ./ns3 configure command is wrong, so for now we just collect the results and rerun with a different configure command later.
Configure command: correct
| k | flow_t | result |
|---|---|---|
| 4 | 1 |  |
| 8 | 1 |  |
| 12 | 1 |  |
| 16 | 1 |  |
 
 
After I Learned the Actual Workflow
By this point I had learned the principles behind UNISON.
So I will try to reproduce the experiments of the original UNISON paper on the Sugon machine with hyper-threading disabled.
I follow exp.py in Unison-evaluations-for-mtp and write the corresponding shell scripts (ori + mtp) in the unison branch.
test-mtp.sh 
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time the mtp and ori scripts differ in
# 1) the configure command and 2) the --thread argument.
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (mtp mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point --enable-mtp --enable-examples
echo "Building fat-tree (mtp mode)..."
./ns3 build fat-tree-mtp
for k in 8 16; do
  for c in 8 16; do
    cmd="./ns3 run \"fat-tree-mtp \
    --k=$k \
    --cluster=$c \
    --delay=3000 \
    --bandwidth=100Gbps \
    --flow=false \
    --incast=1 \
    --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
    --time=0.1 \
    --interval=0.01 \
    --flowmon=false \
    --thread=$c\" \
    2>&1 | tee \"./fat-tree-data/mtp-data/fat-tree-mtp-k${k}-c${c}.txt\""
    echo "Running test with k=$k, cluster=$c"
    eval "$cmd"
    echo "Completed test with k=$k, cluster=$c"
    sleep 2
  done
done
echo "All fat-tree-mtp tests completed!"
 
test-ori.sh 
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time the mtp and ori scripts differ in
# 1) the configure command and 2) the --thread argument.
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (ori mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point --enable-examples
echo "Building fat-tree (ori mode)..."
./ns3 build fat-tree-ori
for k in 8 16; do
  for c in 8 16; do
    cmd="./ns3 run \"fat-tree-ori \
    --k=$k \
    --cluster=$c \
    --delay=3000 \
    --bandwidth=100Gbps \
    --flow=false \
    --incast=1 \
    --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
    --time=0.1 \
    --interval=0.01 \
    --flowmon=false\" \
    2>&1 | tee \"./fat-tree-data/ori-data/fat-tree-ori-k${k}-c${c}.txt\""
    echo "Running test with k=$k, cluster=$c"
    eval "$cmd"
    echo "Completed test with k=$k, cluster=$c"
    sleep 2
  done
done
echo "All fat-tree-ori tests completed!"
 
The mtp and ori runs have to differ in:
In fat-tree-mtp.cc vs. fat-tree-ori.cc: the mtp::enable(numTh) call
In the scripts: 1) the configure command, 2) --thread
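Because only these pieces depend on the mode, the two scripts could share one body parameterised by MODE. A minimal sketch, not the scripts actually used: mode_configure_flag and fat_tree_args are hypothetical helpers, and the argument values are copied from test-mtp.sh and test-ori.sh above (including --thread set to the cluster count):

```bash
# Mode-dependent pieces of the ori/mtp experiments (MODE is "ori" or "mtp").

# Extra ./ns3 configure flag: only the mtp build enables MTP.
mode_configure_flag() {
  if [ "$1" = mtp ]; then printf '%s\n' "--enable-mtp"; fi
}

# Program arguments for one run; only mtp passes --thread.
fat_tree_args() {
  local mode=$1 k=$2 c=$3 victims args
  victims=$(seq -s- 0 $((k * k / 4 - 1)))    # victim IDs 0..k*k/4-1, e.g. k=4 -> 0-1-2-3
  args="--k=$k --cluster=$c --delay=3000 --bandwidth=100Gbps --flow=false"
  args+=" --incast=1 --victim=$victims --time=0.1 --interval=0.01 --flowmon=false"
  if [ "$mode" = mtp ]; then args+=" --thread=$c"; fi
  printf '%s\n' "$args"
}

# Example (not executed here):
#   mode=mtp
#   ./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point $(mode_configure_flag "$mode")
#   ./ns3 run "fat-tree-$mode $(fat_tree_args "$mode" 8 16)"
```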
 
 
The corresponding code is in UNISON-for-ns-3/tree/unison/src/mtp/examples, so --enable-examples has to be added to the configure command.
 
However, this is not exactly what exp.py does: exp.py does not pass --enable-examples.
We need an exact reproduction, so we switch to the Unison-evaluations-for-mtp branch.
The data generated here can still be collected, though :)
 
| k_fat | cluster | result |
|---|---|---|
| 8 | 8 |  |
| 8 | 16 |  |
| 16 | 8 |  |
| 16 | 16 |  |
 
 
This commit covers this step.
test-mtp.sh:
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time the mtp and ori scripts differ in
# 1) the configure command and 2) the --thread argument.
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (mtp mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point --enable-mtp
echo "Building fat-tree (mtp mode)..."
./ns3 build fat-tree
for k in 8 16; do
  for c in 8 16; do
    cmd="./ns3 run \"fat-tree \
    --k=$k \
    --cluster=$c \
    --delay=3000 \
    --bandwidth=100Gbps \
    --flow=false \
    --incast=1 \
    --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
    --time=0.1 \
    --interval=0.01 \
    --flowmon=false \
    --thread=$c\" \
    2>&1 | tee \"./fat-tree-data/mtp-data/fat-tree-mtp-k${k}-c${c}.txt\""
    echo "Running test with k=$k, cluster=$c"
    eval "$cmd"
    echo "Completed test with k=$k, cluster=$c"
    sleep 2
  done
done
echo "All fat-tree (mtp mode) tests completed!"
 
test-ori.sh
Bash
#!/bin/bash
mkdir -p ./fat-tree-data/ori-data
mkdir -p ./fat-tree-data/mtp-data
# Note: this time the mtp and ori scripts differ in
# 1) the configure command and 2) the --thread argument.
echo "Cleaning ns3 for safety..."
./ns3 clean
echo "Configuring ns3 (ori mode)..."
./ns3 configure -d optimized --enable-modules applications,flow-monitor,mpi,mtp,nix-vector-routing,point-to-point
echo "Building fat-tree (ori mode)..."
./ns3 build fat-tree
for k in 8 16; do
  for c in 8 16; do
    cmd="./ns3 run \"fat-tree \
    --k=$k \
    --cluster=$c \
    --delay=3000 \
    --bandwidth=100Gbps \
    --flow=false \
    --incast=1 \
    --victim=$(seq -s- 0 $((k * k / 4 - 1))) \
    --time=0.1 \
    --interval=0.01 \
    --flowmon=false\" \
    2>&1 | tee \"./fat-tree-data/ori-data/fat-tree-ori-k${k}-c${c}.txt\""
    echo "Running test with k=$k, cluster=$c"
    eval "$cmd"
    echo "Completed test with k=$k, cluster=$c"
    sleep 2
  done
done
echo "All fat-tree (ori mode) tests completed!"
 
Compared with the unison-branch scripts, the configure command no longer passes --enable-examples, and both modes now build and run the same program, fat-tree, instead of fat-tree-mtp / fat-tree-ori.
Note that the simulation code run here lives in UNISON-for-ns-3/scratch, e.g. scratch/fat-tree.cc.
The link is visible in the top-level ./CMakeLists.txt:
CMake
# Build scratch/simulation scripts
add_subdirectory(scratch)
 
Note: the fat-tree topology here is completely different from the one in the unison branch!
Micro-benchmark 
| k_fat | cluster | result |
|---|---|---|
| 8 | 8 | 1.23 |
| 8 | 16 | 1.59 |
| 16 | 8 | 0.71 |
| 16 | 16 | 1.46 |
 
 
 
 
 
 
Medium-benchmark 
| k_fat | cluster | result |
|---|---|---|
| 8 | 24 | 1.54 |
| 8 | 48 | X |
| 8 | 72 | X |
 
 
Tips 
1) Don't run scripts from different branches at the same time!
Experiments in the unison and unison-evaluations-for-mtp branches cannot run simultaneously: they are branches of the same Git repository and write to the same result files, which leads to strange errors.

2) Consider thread interaction!
Add a barrier to the scripts so that nothing runs before it has been configured; a simple way is sleep 20.

3) tmux sessions are per user account, not per folder.
4) All experiments needed for the same figure should be run under the same hardware configuration.
If several figures need the same experiment, it only has to be run once.
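For tip 1, a lock taken at the top of every experiment script makes an accidental second run fail fast instead of clobbering shared result files. A sketch using mkdir as an atomic test-and-set; acquire_lock, release_lock and the lock path are hypothetical, and the path should be a fixed absolute path so every script checks the same lock:

```bash
# Take the lock by creating DIR; mkdir is atomic, so only one process can succeed.
acquire_lock() {
  local dir=$1
  if mkdir "$dir" 2>/dev/null; then
    echo "$$" > "$dir/pid"          # record the owner for diagnostics
    return 0
  fi
  echo "lock $dir is held by pid $(cat "$dir/pid" 2>/dev/null || echo '?')" >&2
  return 1
}

# Release the lock by removing DIR.
release_lock() {
  rm -rf -- "$1"
}

# Top of test-mtp.sh / test-ori.sh:
#   LOCK=/tmp/unison-experiments.lock
#   acquire_lock "$LOCK" || exit 1
#   trap 'release_lock "$LOCK"' EXIT
```

A lock left behind by a run killed with SIGKILL (which skips the EXIT trap) has to be removed by hand.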
 
An even more important issue!
unison-mpi 
The tests in the paper:
k=8, cluster=48 
k=8, cluster=72 
k=8, cluster=96 
 
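All of them correspond to fat-tree-distributed.
Prerequisites
Issue 1: requires several machines to run jointly as a distributed system.
Issue 2: this experiment needs a server with at least 144 cores, but Sugon currently has only 72.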
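Issue 2 can be checked before launching anything. A sketch; enough_cores is a hypothetical helper, the 144-core figure is the one noted above, and nproc counts the logical CPUs available to the process (so hyper-threads count as cores):

```bash
# Succeed if AVAILABLE (default: `nproc`) logical CPUs >= REQUIRED; fail otherwise.
enough_cores() {
  local need=$1 have=${2:-$(nproc)}
  if [ "$have" -ge "$need" ]; then
    echo "ok: $have cores available, $need required"
  else
    echo "insufficient: $have cores available, $need required" >&2
    return 1
  fi
}

# On a 72-CPU machine such as sugon, `enough_cores 144` would report insufficient and return 1.
```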
Bash
bxhu@sugon:~/UNISON-for-ns-3$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   72
  On-line CPU(s) list:    0-71