Design and Implementation

Goals and Approach

We design dSDN to meet the following goals: (1) on-box operation with no dependence on external infrastructure, (2) support for operator-defined control code, (3) network efficiency, matching that achieved by state-of-the-art TE solutions, and (4) simple consensus-free path computations. The first goal is a benefit of traditional protocols while the last three are realized by cSDN.


Achieving the first two goals is made possible by the ability to run custom code on routers in a vendor-agnostic manner (§2.4); we envision each router running an on-box container with our custom code that we call the dSDN controller. Figure 4 shows this architecture in dSDN as the counterpart to the cSDN control architecture shown in Figure 2. Achieving the latter two goals – efficiency and consensus-free computations – requires more thought.


Figure 4: The dSDN control architecture, the on-box counterpart to the cSDN architecture of Figure 2.

Efficiency. cSDN achieves efficiency, i.e., high network utilization, by running a TE computation that acts on a global view of topology and traffic demands. To achieve the same in dSDN, we too run TE over a global network view but do so at every router. Hence, each dSDN controller discovers its local state – link status, traffic matrix, etc. – via the APIs described in §2.4, then floods this information to all other controllers via a simple dissemination protocol (akin to IS-IS). As a result, every dSDN controller has a global network view over which it is able to run TE.

Assuming all routers have the same view, they will compute identical paths and in this sense dSDN is equivalent to having a single controller computing all routes. In practice, of course, different routers may have slightly divergent views and we evaluate the impact of this in §5.


How the global network view is formed

The idea in this part: dSDN, like cSDN, runs TE over a global view.

The difference: every router now measures its own local state and floods it, so that the global network view is assembled at every node.

Consensus-free route computation. cSDN enjoys the simplicity of “consensus-free” route computations. A cSDN controller is authoritative in that it alone computes paths and programs these as forwarding rules at every router. Each router then follows these rules no matter what its own view of network state is. This authoritative design does not require consensus across routers for path selection. The challenges of distributed consensus are well documented in BGP [21, 64] and even in relatively simple distributed protocols such as IS-IS which can suffer loops and dead-ends until all routers converge.

The key question is thus, how do we achieve consensus-free forwarding in a decentralized control plane? The answer is via source routing. When a packet enters the network, the head-end router adds a source route to the packet header and all other routers blindly follow the source route. Thus the path any given packet takes is decided by a single authoritative entity — the head-end router. The head-end needs no programming of paths at transit routers, and hence no agreement to establish state. Instead, all state about the path is encoded in the packet header.

Source routing has traditionally been implemented as “loose” source routing [11, 31] in which, rather than record every node along the path, the source route records only a subset of routers or special “waypoints” along the path. This introduces some complexity, requiring either inter-node signaling (to establish waypoints [2]), or an underlay routing protocol (to establish connectivity between routers). Our approach avoids this complexity by utilizing “strict” source routing, in which the complete router-level path is enumerated in the packet header, as described below.


How to achieve consensus-free forwarding in a decentralized control plane

Answer: source routing

The path any given packet takes is decided by a single authoritative entity: the head-end router

All state about the path is encoded in the packet header
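To make the consensus-free property concrete, here is a minimal Go sketch of strict source-route forwarding, under stated assumptions: the Packet and Router types, the numeric labels, and the label-to-next-hop map are all illustrative, not the paper's implementation. The head-end pushes the complete label stack once; every transit router just pops and follows, so no two routers ever need to agree on a path.

```go
package main

import "fmt"

// Packet carries a strict source route as a stack of link labels, one
// per link on the path, mirroring the MPLS label stack described above.
type Packet struct {
	Labels  []uint32 // outer label first
	Payload string
}

// Router holds only local, statically known state: which next hop each
// of its own advertised link IDs maps to. No cross-router agreement is
// ever needed to forward.
type Router struct {
	ID    string
	Links map[uint32]string // link label -> next-hop router ID
}

// Forward pops the outer label and forwards over the matching link.
// Transit routers follow the source route blindly; the path was fixed
// once, by the head-end.
func (r *Router) Forward(p Packet) (nextHop string, rest Packet, egress bool) {
	if len(p.Labels) == 0 {
		return "", p, true // empty stack: the packet has reached its egress router
	}
	outer := p.Labels[0]
	return r.Links[outer], Packet{Labels: p.Labels[1:], Payload: p.Payload}, false
}

func main() {
	// The head-end pushes the complete path as labels (cf. the A-D-G
	// example in Figure 5); each transit router only pops and forwards.
	p := Packet{Labels: []uint32{10, 40, 70}, Payload: "data"}
	r0 := Router{ID: "R0", Links: map[uint32]string{10: "R2"}}
	next, p, _ := r0.Forward(p)
	fmt.Println(next, p.Labels) // R2 [40 70]
}
```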

Design Details

A dSDN controller implements three main tasks:

(1) Learning local and global network state. dSDN requires the following local state from each router: (i) link status and utilization; (ii) attached prefixes; and (iii) aggregate traffic demands to each egress router. A dSDN controller obtains this local state from its underlying router stack by subscribing to the relevant telemetry and configuration paths via the gNMI API [8] and OpenConfig [44] data models.

A controller disseminates the above local information in the form of a “Node State Update” (NSU) message that also includes the node’s ID, link IDs, and a unique sequence number. NSUs are disseminated to other routers using standard flooding. By listening to NSUs from other routers, every dSDN controller reconstructs a global view of the network topology including not just standard link status but also available link capacity, which prefixes are associated with each router, and traffic demands.


Learning local and global network state

What we need from each router:

  1. Link status and utilization
  2. The prefixes attached to that router
  3. Aggregate traffic demands to each egress router

What each router actually reports:

It floods an NSU carrying the local state above, plus:

  1. Node ID
  2. Link IDs
  3. A unique sequence number

Since all of this information is broadcast, every router ends up with a view of the global topology; as a result, each router knows:

  1. Standard link status
  2. Available link capacity
  3. The prefixes associated with each router
  4. Traffic demands

(A sketch of the NSU and its flooding follows below.)
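A hedged Go sketch of the NSU and standard flooding. The field names, types, and the Controller/Neighbor plumbing are assumptions for illustration; only the mechanism, per-node sequence numbers and re-flooding to every neighbor except the sender (as in IS-IS), follows the text.

```go
package dsdn

// NodeStateUpdate (NSU) carries the local state each controller floods.
// Field names and types are illustrative assumptions, not the wire
// format of the paper's prototype.
type NodeStateUpdate struct {
	NodeID   string
	Seq      uint64               // unique, monotonically increasing per node
	Links    map[uint32]LinkState // link ID -> status and utilization
	Prefixes []string             // prefixes attached to this router
	Demands  map[string]float64   // egress router ID -> aggregate demand (bps)
}

type LinkState struct {
	Up          bool
	CapacityBps float64
	UtilBps     float64
}

// Neighbor abstracts the transport to an adjacent controller (gRPC in
// the prototype).
type Neighbor struct {
	ID   string
	Send func(NodeStateUpdate)
}

// Controller tracks the highest sequence number seen per node: the
// IS-IS-style dedup that makes flooding terminate.
type Controller struct {
	lastSeq   map[string]uint64
	neighbors []Neighbor
	apply     func(NodeStateUpdate) // folds an NSU into the global view
}

func NewController(apply func(NodeStateUpdate), neighbors []Neighbor) *Controller {
	return &Controller{lastSeq: make(map[string]uint64), neighbors: neighbors, apply: apply}
}

// HandleNSU implements standard flooding: accept an NSU only if it is
// newer than anything seen from that node, update the global view, and
// re-flood to every neighbor except the one it arrived from.
func (c *Controller) HandleNSU(nsu NodeStateUpdate, from string) {
	if last, ok := c.lastSeq[nsu.NodeID]; ok && nsu.Seq <= last {
		return // stale or duplicate: drop, do not re-flood
	}
	c.lastSeq[nsu.NodeID] = nsu.Seq
	c.apply(nsu)
	for _, nbr := range c.neighbors {
		if nbr.ID != from {
			nbr.Send(nsu)
		}
	}
}
```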

(2) Computing paths. dSDN controllers compute paths using a TE algorithm based on prior work [27] which approximates max-min fair allocations and balances short paths with maintaining high network utilization, but modified with some optimizations. Most important of these is the removal of per-service utility curves, as demand is measured in-band and is thus aggregated by (destination router, priority class) tuple. In dSDN, every router 𝑅 runs TE to compute the placement of all flows in the network, from which it then selects the subset of paths that start at 𝑅 for programming.


Computing paths

dSDN controllers compute paths with a TE algorithm:

  1. Every router R runs TE to compute the placement of all flows in the network
  2. It then selects the subset of paths that start at R for path programming (see the sketch below)
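A minimal sketch of that selection step, assuming illustrative Path and NetworkView types and a stubbed-out solveTE in place of the real algorithm from [27]:

```go
package dsdn

// Path is one TE-computed source route; HeadEnd is the router that will
// program and use it. Illustrative types only.
type Path struct {
	HeadEnd string
	Labels  []uint32 // the strict source route, as link IDs
	Weight  float64  // traffic share assigned by TE
}

// NetworkView is the global topology and demand state built from NSUs.
type NetworkView struct{}

// solveTE stands in for the global TE algorithm (the modified max-min
// fair solver of [27]); the real implementation is not shown.
func solveTE(view NetworkView) []Path {
	return nil // placeholder: returns the placement of every flow in the network
}

// OwnPaths runs the global computation like every other router, then
// keeps only the paths that start here: the subset actually programmed
// into this router's forwarding tables.
func OwnPaths(self string, view NetworkView) []Path {
	var mine []Path
	for _, p := range solveTE(view) {
		if p.HeadEnd == self {
			mine = append(mine, p)
		}
	}
	return mine
}
```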

(3) Programming strict source routes. We encode source routes as stacks of labels enumerating each link to be traversed using its unique link ID learned from NSUs. We use MPLS to encode labels in the packet header, similarly to the adjacency-SID-based MPLS-SR data plane design [3] which is commonly supported today by WAN vendor hardware.

When a packet first enters the network, the head-end router performs a two-stage lookup that maps from the packet’s destination IP address to a source route. The first lookup table maps from the destination IP address and priority class to a unique egress router; this (prefix→egress) table is constructed using the prefix information carried in NSUs. The second lookup table maps from the egress router to a set of weighted source routes computed by TE, picking one by hashing a portion of the packet header. This two-stage lookup is standard and is supported without additional latency by the forwarding ASIC; see [11, 27] for a detailed explanation.

At intermediate routers, encapsulated packets are forwarded based on their outer label using a third MPLS forwarding table that contains static forwarding entries for the link IDs that the router advertises. This table is programmed when the dSDN controller comes up. Each router pops the outer label before forwarding the packet on. In case a failure renders a static label invalid, we use local repair paths that take the packet around the failure similar to the FRR mechanism [1] used in B4 and B2 today; the invalid label is popped and a bypass source route is prepended, taking the packet to its original next hop to continue onwards as intended by the head-end. This only lasts until the head-end router learns of the failure and recomputes paths to avoid the failure, after which the packet will take a path that avoids that failure entirely.

Figure 5 shows a simple example of the life of a packet. When a packet destined for a host attached to 𝑅1 enters the network at 𝑅0, the forwarding hardware first maps 1.1.1.7 to 𝑅1 using the top table, and then picks the source route 𝐴-𝐷-𝐺 by looking up 𝑅1 in the second table and hashing to select between the source routes to 𝑅1. The selected route is placed as a stack of labels 𝐴, 𝐷, and 𝐺 in the packet header, then the packet is forwarded along this path with the outer label popped at each transit.
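The two-stage lookup is easy to mirror in software. Below is a hedged Go sketch using the Figure 5 values; in reality both tables live in the forwarding ASIC, the first table is also keyed by priority class (omitted here), and the route weights are elided.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net/netip"
)

// Stage 1: destination prefix -> egress router, built from the prefix
// information carried in NSUs.
var prefixToEgress = map[netip.Prefix]string{
	netip.MustParsePrefix("1.1.1.0/24"): "R1",
}

// Stage 2: egress router -> weighted source routes computed by TE
// (weights elided; both routes treated as equal here).
var egressToRoutes = map[string][][]string{
	"R1": {{"A", "D", "G"}, {"B", "E", "G"}},
}

// lookup maps a destination address to one source route, hashing packet
// header entropy to pick among the routes for that egress.
func lookup(dst netip.Addr, entropy string) []string {
	for pfx, egress := range prefixToEgress {
		if pfx.Contains(dst) {
			routes := egressToRoutes[egress]
			h := fnv.New32a()
			h.Write([]byte(entropy))
			return routes[h.Sum32()%uint32(len(routes))]
		}
	}
	return nil // no attached prefix known: not routable in this sketch
}

func main() {
	// The Figure 5 example: 1.1.1.7 maps to R1; hashing then selects one
	// of the source routes to R1, e.g. A-D-G.
	fmt.Println(lookup(netip.MustParseAddr("1.1.1.7"), "flow-5-tuple"))
}
```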

A core potential challenge of this design is the number of labels that must go in the header of the packet. This is a challenge on two operational fronts: for a path of length 𝑛, (1) the head-end router must push 𝑛 labels onto the packet, and (2) transit routers must be able to read past up to 𝑛 labels to reach inner headers that provide entropy for effective load-sharing across multiple paths [68]. Modern routers support up to 12 labels [47] for both of these operations, which is sufficient to encode the path lengths in our current WANs. For networks with longer paths, or older hardware, we propose a sub-label encoding, described in Appendix §A, that compresses the source route by encoding multiple hops in a single label in a consensus-free manner.


Figure 5: Example of the life of a packet in dSDN.


Warning

In this part, how the packet's routing information is carried and handed off is the crucial point; the lookup and forwarding steps above are the part to remember.

Fault Tolerance. dSDN uses standard techniques developed in the context of IS-IS implementations to address controller crashes, bugs, failure detection, and so forth [7, 55]. This includes techniques such as: loading network state from an immediate neighbor (after a controller crash/restart), rollback (in the event of a bug), invariant checks (for malformed NSUs), pre-configured backup paths (for immediate link failure), and so forth. As with existing centralized systems, dSDN relies on the measurements that routers report to accurately reflect the true state of the system, as these observations are used as input to the TE algorithm. Increasing tolerance to byzantine failures remains an open research problem.
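As one concrete example of the invariant checks mentioned above, here is a hedged sketch of validating an incoming NSU, reusing the illustrative NodeStateUpdate type from the earlier sketch; the specific rules are assumptions.

```go
package dsdn

import (
	"errors"
	"fmt"
)

// validateNSU shows the flavor of invariant checks run on incoming NSUs
// before they reach the state database. The concrete rules here are
// assumptions, not the production checks.
func validateNSU(nsu NodeStateUpdate, lastSeq map[string]uint64) error {
	if nsu.NodeID == "" {
		return errors.New("malformed NSU: missing node ID")
	}
	if last, ok := lastSeq[nsu.NodeID]; ok && nsu.Seq <= last {
		return fmt.Errorf("stale NSU from %s: seq %d <= %d", nsu.NodeID, nsu.Seq, last)
	}
	for id, l := range nsu.Links {
		if l.UtilBps > l.CapacityBps {
			return fmt.Errorf("link %d: reported utilization exceeds capacity", id)
		}
	}
	return nil
}
```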


byzantine failures

Byzantine failures are a special class of faults in distributed systems in which some nodes may behave arbitrarily, unpredictably, or even maliciously. The fault model takes its name from the Byzantine Generals Problem.

Concretely, byzantine faults are characterized by:

  1. Nodes may send wrong information: a faulty node may send inconsistent or incorrect information to other nodes.
  2. Nodes may send information selectively: a faulty node may send information to only some nodes instead of broadcasting to all.
  3. Nodes may behave maliciously: a faulty node may try to disrupt normal operation, e.g., by deliberately delaying messages or sending conflicting information.
  4. Node behavior may be unpredictable: a faulty node's behavior can change over time, making it hard to anticipate and counter.

Increasing tolerance to byzantine faults in a distributed system is a hard problem, because the system must still reach agreement and operate correctly in the presence of these untrusted nodes. Algorithms such as PBFT (Practical Byzantine Fault Tolerance) solve the problem to a degree, but balancing scale, efficiency, and security remains an open research area.

Incremental Deployment. While we have described dSDN as a standalone or clean-slate design, we see a natural path for incremental deployment which is that we initially deploy dSDN as an alternative to existing “underlay” protocols such as IS-IS. In this model, we retain cSDN as our primary controller while dSDN acts as the backup. This requires only a software upgrade to existing routers and can be initially deployed in a single, e.g., edge, region of the network.

The benefit of even this first step is a better performing underlay (since TE implements capacity-aware path selection while IS-IS does not) and a coherent architecture (since both cSDN and dSDN use similar operator-defined logic).

In the next stage of deployment, one might reverse the role of cSDN and dSDN, with cSDN-programmed routes only used as backup and ultimately leveraging a streamlined form of cSDN infrastructure primarily for monitoring and management purposes, rather than on the critical path for control decisions.


Two Steps
  1. Retain cSDN as the primary controller while dSDN acts as the backup
  2. Reverse the roles of cSDN and dSDN, with cSDN-programmed routes used only as backup

In the end, dSDN carries the network, and the cSDN infrastructure serves primarily monitoring and management purposes

Upgrades. dSDN assumes each controller solves the global TE problem consistently, but since the controller code is operator-defined, we also expect that deployed dSDN code will be updated more frequently than vendor code. This raises the question of how different versions of dSDN’s TE algorithm can coexist in the network, for example during rollouts of software updates. In our existing cSDN infrastructure we find that updates qualitatively changing the TE algorithm are far less common than those that change other internals of the controller (e.g., improving efficiency, adding new functionality, etc.). To support arbitrarily extending dSDN functionality, exchanging additional information can be done by opaquely extending the NSU with additional custom fields, similarly to how IS-IS is extended to carry arbitrary additional information via TLVs [39].
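A hedged sketch of what such an opaque, TLV-like extension of the NSU could look like, building on the illustrative NodeStateUpdate type from earlier; the field names are assumptions.

```go
package dsdn

// OpaqueField mirrors how IS-IS carries arbitrary data in TLVs [39]:
// typed, opaque blobs that a controller which does not understand a
// given type simply preserves and re-floods without parsing.
type OpaqueField struct {
	Type  uint16 // allocated per extension; unknown types are ignored
	Value []byte
}

// ExtendedNSU wraps the base NSU (the illustrative type from earlier)
// with custom fields, e.g. which TE algorithm version this node runs.
type ExtendedNSU struct {
	NodeStateUpdate
	Extensions []OpaqueField
}
```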


When updates to the algorithm do need to occur, we note that source routing ensures forwarding correctness is maintained regardless – packets will always take the path decided by the head-end, and thus be loop-free. The principal concern is rather congestion during the upgrade process due to the upgraded and old controllers “mispredicting” each other’s traffic placement. We anticipate that algorithm designers will evaluate this via simulation or emulation before deployment of changes. If such congestion is of concern, network operators can allow controllers to account for what algorithm each other controller is using in their solver. For example, in a network with three routers, if router A places traffic using capacity-oblivious shortest-path while B and C use a TE algorithm, routers B and C can first compute the paths A will place its traffic on, then run their TE algorithm for the remaining traffic placement. The information of which algorithm each router is using can be included in the flooded NSUs. Alternatively, existing techniques such as carefully ordering upgrades or leaving scratch space to absorb such congestion [25, 31] can be used. We leave an in-depth evaluation of this to future work.


How different versions of dSDN's TE algorithm coexist in the network
  1. Fact: updates that qualitatively change the TE algorithm are far less common than updates that change other internals of the controller (e.g., improving efficiency, adding new functionality)
    • Additional information can be exchanged by opaquely extending the NSU with custom fields
  2. When the algorithm itself does need to be updated
    • principal concern: congestion during the upgrade process due to the upgraded and old controllers "mispredicting" each other's traffic placement
    • Algorithm designers should evaluate this up front via simulation/emulation; network operators can also let controllers account in their solvers for which algorithm every other controller runs (see the sketch below)
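A hedged Go sketch of that algorithm-aware solving, following the three-router A/B/C example from the text; shortestPathsFrom and subtractPlacement are made-up stand-ins, and solveTE is the placeholder from the earlier sketch.

```go
package dsdn

// Stand-ins for the two placement algorithms in the A/B/C example; not
// real dSDN functions.
func shortestPathsFrom(view NetworkView, node string) []Path    { return nil }
func subtractPlacement(view NetworkView, placed []Path) NetworkView { return view }

// SolveAware sketches the mixed-version strategy from the text: first
// pin down the placement of every router whose NSU says it runs plain
// shortest-path routing, subtract that traffic from the view, then run
// TE for whatever remains.
func SolveAware(view NetworkView, algoOf map[string]string) []Path {
	var placed []Path
	for node, algo := range algoOf { // algoOf is learned from flooded NSUs
		if algo == "shortest-path" {
			p := shortestPathsFrom(view, node)
			placed = append(placed, p...)
			view = subtractPlacement(view, p)
		}
	}
	return append(placed, solveTE(view)...)
}
```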

Implementation

We prototyped the above design in approximately 22,000 lines of Go for the controller itself and 6,800 lines of C++ for the TE algorithm implementation, and have been running this prototype on production-grade Arista routers in our test lab. Figure 6 shows the prototype’s system architecture.

Figure 6: The dSDN prototype's system architecture.

System Modularity. We split the TE solver and controller into independent containers, with the former exposing a Solve API that takes the network state and demands and returns paths. This separation allows the TE algorithm to be easily replaced, implemented in a different language, or even migrated off-box onto an adjacent server if further computational resources are required.


The controller itself is modular, with standalone components that communicate via pub-sub bus as shown in Figure 6. Communication with other dSDN nodes, including neighbor discovery and flooding NSUs, is handled by the NodeStateExchange module. The StateDB module combines this stream of external updates with local system readings taken by the LocalState module to produce a global network view in what we call the NodeStateDB. The Pathing module uses this view to compute a solution by calling the TE Solver container, and the Programmer uses this solution to program paths into the hardware’s forwarding tables. Additional supporting modules provide interfaces for monitoring internal state, debugging, and configuration purposes.
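A minimal Go rendering of that container boundary, reusing the illustrative NetworkView and Path types from earlier. The name Solve comes from the text's Solve API; the Go signature and the Demand fields are assumptions.

```go
package dsdn

// Demand is traffic aggregated by (destination router, priority class),
// as measured in-band. The fields are assumptions.
type Demand struct {
	EgressRouter  string
	PriorityClass int
	Bps           float64
}

// Solver is the boundary between the controller and the TE solver
// container. This Go signature is a sketch of the text's Solve API.
type Solver interface {
	Solve(view NetworkView, demands []Demand) ([]Path, error)
}
```

Keeping the solver behind a narrow interface like this is what makes it swappable, re-implementable in another language, or movable off-box.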

gRPC Communication. The dSDN controller uses gRPC for all external communication; gRPC entirely abstracts the data layer (e.g., packet size management, data chunking) and ensures reliable transfer [61]. To avoid requiring an address-discovery system, dSDN further relies on IPv6 link-local addressing and establishes a well-known dSDN port. In contrast to protocols like IS-IS, this design allows dSDN to cleanly isolate routing from communication details.
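What "link-local addressing plus a well-known port" amounts to in practice, as a minimal Go sketch; the address, interface name, and port number below are made up, and in the prototype the connection carries gRPC rather than raw TCP.

```go
package main

import (
	"fmt"
	"net"
)

// A neighbor is reachable at its IPv6 link-local address (scoped to the
// local interface via the %zone suffix) on a fixed dSDN port, so no
// address-discovery system is needed.
func main() {
	const dsdnPort = "9999" // hypothetical well-known dSDN port
	addr := net.JoinHostPort("fe80::1%eth0", dsdnPort) // "[fe80::1%eth0]:9999"
	conn, err := net.Dial("tcp6", addr)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```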


gRPC

How the dSDN (distributed software-defined networking) system simplifies network address discovery and communication:

  1. IPv6 link-local addresses:

    • An IPv6 link-local address is a special IPv6 address with the prefix fe80::/10.
    • These addresses are ==only valid on the same network link== and are never forwarded by routers to other networks.
    • Every IPv6-enabled network interface automatically configures a link-local address.
  2. Avoiding an address-discovery system:

    • In traditional networks, devices typically obtain IP addresses via protocols such as DHCP.
    • ==With IPv6 link-local addresses, a device obtains a usable address without any configuration.==
  3. A well-known dSDN port:

    • dSDN designates a fixed port number for its communication.
    • Every dSDN device knows and uses this port to communicate.
  4. Simplified communication:

    • By combining IPv6 link-local addresses with the fixed dSDN port, the system can establish communication without any complex address-discovery mechanism.
    • A device only needs to know the peer's link-local address and the dSDN port to start communicating.
  5. Advantages:

    • Simplifies network configuration
    • Reduces dependence on a central address-management system
    • Improves the network's ability to self-organize
    • Suits dynamically changing network environments

In short, by exploiting these IPv6 features and a predefined port, dSDN establishes communication in a simple, automated way, without a complex address discovery and configuration process.

IPv6 Link-Local Addresses

The "non-complex address-discovery mechanism" mainly refers to IPv6's link-local addresses and the Neighbor Discovery Protocol (NDP).

Compared with traditional IPv4 address discovery, this mechanism is much simpler:

  1. Automatic configuration: an IPv6 device can generate a link-local address by itself, with no manual configuration or DHCP server.
  2. No extra infrastructure: no dedicated server (like DHCP) is needed to hand out addresses.
  3. Plug and play: a device can communicate with its link-local address as soon as it attaches to the network, without waiting for address assignment.
  4. Neighbor discovery: via NDP, devices automatically discover other devices on the same link without manual configuration.
  5. Address conflict detection: NDP's Duplicate Address Detection (DAD) automatically detects and resolves address conflicts.
  6. Router discovery: devices automatically discover routers on the local segment, with no manually configured default gateway.
  7. Parameter discovery: network prefixes and other parameters are obtained automatically, simplifying configuration.
  8. Limited scope: link-local addresses are valid only on the local link and are never routed to other segments, simplifying address management.

With this mechanism, a device can establish local communication over nothing more than a physical connection, with no prior configuration, greatly reducing the complexity of network setup and management.

If dSDN is based on link-local IPv6 addresses, how does it communicate across network segments?

Although link-local addresses let devices on the same segment communicate immediately, dSDN does need a mechanism for cross-segment communication. Approaches dSDN might take include:

  1. Global unicast address assignment: besides the link-local address, the dSDN controller could assign each device a global unicast address, obtained via DHCP or another auto-configuration mechanism.
  2. Routing information exchange: dSDN controllers exchange routing information, including the segments each manages and reachability information, so every controller can build a complete topology view.
  3. Source routing: as the text describes, dSDN uses source routing. For cross-segment communication, the source device's dSDN controller computes the complete path and encodes it in the packet header.
  4. Tunneling: where direct routing is not possible, dSDN might use tunneling (e.g., IP-in-IP or GRE) to encapsulate and forward cross-segment traffic.
  5. Border gateways: dSDN border gateways might be deployed between segments to handle cross-segment routing and forwarding.
  6. Hierarchical routing: dSDN might adopt a hierarchical routing strategy, dividing the network into regions, with detailed routing inside a region and aggregated routes between regions.
  7. Dynamic address learning: dSDN controllers can dynamically learn and update address information in the network to keep cross-segment communication correct.

Through these mechanisms, dSDN can establish local communication quickly with link-local addresses while still achieving efficient cross-segment communication.
