
ROAMING SETUP AND PERFORMANCE

TL;DR
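  1. All measured MNOs use the Home-Routed (HR) roaming configuration
    • Implication 1: the roaming user's exit point to the Internet is always in the home network, which adds delay
  2. Roaming users generally use the DNS servers of their home network (consistent with HR)
    • Implication 1: DNS query times for roaming users are significantly higher than for home users
    • Implication 2: CDN (Content Delivery Network) replica selection is affected; users may be directed to CDN nodes in the home network rather than to a closer local cache
  3. International roaming degrades HTTP/HTTPS performance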

Measurements

We run a series of measurements that enable us to identify the roaming setup, infer the network configuration for the 16 MNOs that we measure, and quantify the end-user performance for the roaming configurations that we detect. We run traceroute for path discovery, dig for Domain Name System (DNS) lookups and curl for testing data transfers with popular URLs. We complement this analysis with metadata (e.g., radio access technology, signal strength parameters) collected from each node.

For each MNO, we measure in parallel the roaming user, the home user and the visited user (see § 2 for terminology) through the MONROE-Roaming scheduler. In this way, we are able to capture potential performance penalties that might result, for example, from roaming internationally under a home-routed configuration. We performed measurements using both 3G and 4G networks to evaluate the impact of potentially different configurations for the two radio access technologies.

Next, we describe each measurement test and its resulting dataset in more detail.

traceroute: We run periodic traceroute measurements against all the servers we deploy in each country as measurement responders. We repeat the measurements ten times towards each target. The resulting dataset lists the set of IP hops along the data paths from each vantage point towards each measurement responder. Additionally, we collect the public mapped IP address for each vantage point (i.e., the IP endpoint associated with the mobile client as seen from the public Internet).

dig: We run the dig utility for DNS lookups against a list of 180 target Fully Qualified Domain Names (FQDNs) mapped to advertisement services. We use the independent filter lists from https://filterlists.com to build the list of targets. We focus on ad services because this type of third-party service significantly inflates the performance metrics of web services (e.g., page load time) and impacts the web experience of mobile users [14]. Thus, it is important to capture (and potentially eliminate) any additional delay penalty that might impact how fast a roaming user receives this type of content. Each experiment uses the default DNS server for the tested MNO and queries for the A record associated with each of the target FQDNs. We store the entire output of each dig query, including the query time, the DNS server used and the A record retrieved. We repeat the dig queries twice for each FQDN from each vantage point, for a total of more than 2,000 queries per round.
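As an illustration of how the stored dig output could be post-processed, the following minimal sketch extracts the three fields named above (query time, DNS server, A records) from dig's default text output. The function name `parse_dig_output` and the sample text are hypothetical and are not the tooling used in the measurement campaign.

```python
import re

def parse_dig_output(text):
    """Extract query time (ms), resolver address and A records from dig's default output."""
    query_time = re.search(r";; Query time: (\d+) msec", text)
    server = re.search(r";; SERVER: ([^#\s]+)#", text)
    # Answer lines look like: "name. TTL IN A address"
    a_records = re.findall(r"^\S+\s+\d+\s+IN\s+A\s+(\S+)$", text, flags=re.MULTILINE)
    return {
        "query_time_ms": int(query_time.group(1)) if query_time else None,
        "server": server.group(1) if server else None,
        "a_records": a_records,
    }

# Made-up sample using documentation address space, for illustration only.
SAMPLE = """\
;; ANSWER SECTION:
ads.example.com. 300 IN A 192.0.2.10

;; Query time: 48 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
"""

result = parse_dig_output(SAMPLE)
```

Comparing the `server` field between the home and roaming SIMs of the same MNO is one way to check whether both are offered the same resolver.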

curl: We run curl towards a set of 10 popular target webpages over HTTP/1.1 with TLS. We repeat the measurements towards each URL at least 10 times (increasing the sample size if the SIM data quota allows it). We store various metrics, including the download speed, the size of the download, the total time of the test, the time to first byte, the name lookup time (query time) and the handshake time.
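The metrics listed above correspond closely to curl's `--write-out` variables. The sketch below builds such a command line and parses its output; the mapping of "handshake time" to `time_appconnect` and of "time to first byte" to `time_starttransfer` is an assumption, as are the helper names and the key=value template. The command is only constructed here, not executed.

```python
# curl write-out variables assumed to match the metrics described in the text.
METRICS = (
    "time_namelookup",     # name lookup (query) time
    "time_connect",        # TCP connect time
    "time_appconnect",     # time until the TLS handshake completes
    "time_starttransfer",  # time to first byte
    "time_total",          # total time of the test
    "size_download",       # size of the download, in bytes
    "speed_download",      # average download speed, in bytes per second
)

# Produces e.g. "time_namelookup=%{time_namelookup} time_connect=%{time_connect} ..."
WRITE_OUT = " ".join(f"{name}=%{{{name}}}" for name in METRICS)

def build_curl_command(url):
    """Return the argument list for one HTTP/1.1 measurement (body is discarded)."""
    return ["curl", "--silent", "--output", "/dev/null",
            "--http1.1", "--write-out", WRITE_OUT, url]

def parse_write_out(line):
    """Turn 'key=value key=value ...' into a dict of floats."""
    return {key: float(value)
            for key, value in (pair.split("=", 1) for pair in line.split())}
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)` on a Unix-like host, and the captured standard output fed to `parse_write_out`.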

metadata: We collect contextual information from the nodes, including the visited network Mobile Country Code (MCC) / Mobile Network Code (MNC) for each roaming SIM and the radio technology. This allows us to verify which visited network each roaming SIM uses as well as to identify and separate the collected data by radio technology.

Roaming configuration

Our initial goal is to determine the roaming setup for each MNO (i.e., whether it uses LBO, HR or IHBO). For this, we determine the MNO that allocates the public IP address of the roaming SIM. Our results show that HR was used by all 16 MNOs from all the different roaming locations we capture. We further corroborate this result by retrieving the first hop replying with a public IP address along the data path from a roaming SIM to each server and identifying the MNO that owns it. We find that the first hop with a public IP address along the path lies in the original home network of each roaming SIM, which is consistent with HR.
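The inference step above can be sketched as a lookup of the operator that owns the roaming SIM's public IP address. In this minimal sketch the prefix table, the operator labels and the function names are all hypothetical (a real analysis would rely on a routing or WHOIS data source), and the fallback label for addresses owned by neither operator is an assumption.

```python
import ipaddress

# Hypothetical prefix-to-operator table built from documentation address space.
PREFIX_OWNERS = {
    "192.0.2.0/24": "home-mno",
    "198.51.100.0/24": "visited-mno",
}

def owner_of(ip, prefix_owners=PREFIX_OWNERS):
    """Return the operator whose prefix contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for prefix, owner in prefix_owners.items():
        if addr in ipaddress.ip_network(prefix):
            return owner
    return None

def classify_roaming(public_ip, home_mno, visited_mno):
    """Guess the roaming setup from who allocates the SIM's public IP address."""
    owner = owner_of(public_ip)
    if owner == home_mno:
        return "HR"    # breakout in the home network
    if owner == visited_mno:
        return "LBO"   # breakout in the visited network
    return "IHBO or unknown"  # neither operator owns the address
```

For example, `classify_roaming("192.0.2.17", "home-mno", "visited-mno")` falls into the HR branch because the address belongs to the prefix labelled as the home operator's.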

Next, we evaluate the following performance metrics for each roaming SIM, home SIM and visited SIM: (i) the number of visited networks we observe for the roaming SIM, (ii) the number of hops from vantage point to target measurement server, (iii) the number of home network PGWs that the roaming SIMs reach in comparison with the home network SIMs.

Visited network selection: The metadata we collect during the measurement campaign for each MNO enables us to verify the visited network that each roaming user camps on in the visited country. In general, we note stability both in 4G roaming and 3G roaming in the selection of the visited network (Table 2) in the five roaming locations. We also observe some differences between MNOs. For example, for Telekom DE, the 4G visited network chosen by each roaming SIM never changed during the measurement campaign, even when we forced the radio technology handover. This is consistent for all the five roaming locations. For O2 DE, on the other hand, the default 4G visited network did change over time for the SIMs roaming in Italy (3 visited networks), Norway (3 visited networks), and Sweden (2 visited networks). However, it should be noted that the length of the measurement period varies for each MNO, as it is impacted by multiple external factors (e.g., at times some of our measurement responders were affected by power outages or some SIM cards were not connecting to the 4G network due to poor coverage). This may explain or influence part of the differences observed between the MNOs.

Traceroutes, number of hops: We analyze our collected traceroute results from the roaming SIMs and compare with the traceroute results we collect from the corresponding home SIM towards the same target server. For all MNOs we find that the number of hops is the same. This is consistent with the HR configuration (Fig. 1), where the GTP tunnel is defined between the SGW of the visited network and the PGW of the home network.

Traceroutes, infrastructure: By learning the IP addresses of the infrastructure elements along the data path, we are able to infer aspects of the infrastructure deployment strategy of each MNO. In particular, by checking the IP address of the first hop in the path (Table 2), we find that MNOs have different strategies in terms of their deployments. We note that the first hops are assigned to mobile users in a roughly even distribution, showing that the MNOs have a similar approach for load balancing in their network. For example, for O2 DE we find 20 different first hops, suggesting that there might be a large number of PGWs deployed in the LTE infrastructure, while for Vodafone UK we see that the same first hop appears on the data path, suggesting that the GTP tunnels of all our roaming users are terminated at a single PGW. We also note that although for the majority of MNOs these hops are configured with private address space, three operators (Telekom DE, Telenor NO and Telenor SE) use public address space for their infrastructure. The last column in Table 2 details the breakdown of measurements among the number of different first hop IP addresses found. In some cases, a clear bias exists.

Finally, we verify that the set of first hops for roaming SIMs is the same as the set we observe from the home SIMs. This suggests that the roaming SIMs do not receive any differential treatment in terms of allocation to the PGWs. This is consistent for all MNOs we measure. Furthermore, when checking the 3G data paths, we find that the set of IP addresses we see in 3G is a subset of the set of IP addresses we see in 4G, suggesting that the two functions are co-located in the same PGW [20]. We also check the time when each first IP address was used. We discover that all the PGWs are active at the same time, although multiple first IP addresses can be used at different times. We further contacted 3 MNOs and the information they provided about their network confirms our findings.
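The first-hop checks described in the two paragraphs above reduce to simple set and frequency operations. The following sketch shows one possible formulation; the function names are hypothetical and the inputs are assumed to be lists of first-hop IP addresses, one entry per traceroute run.

```python
from collections import Counter

def first_hop_breakdown(first_hops):
    """Share of measurements that saw each first-hop IP address."""
    counts = Counter(first_hops)
    total = sum(counts.values())
    return {ip: n / total for ip, n in counts.items()}

def same_first_hops(roaming_hops, home_hops):
    """True if roaming and home SIMs observe the same set of first hops."""
    return set(roaming_hops) == set(home_hops)

def is_3g_subset_of_4g(hops_3g, hops_4g):
    """True if every first hop seen on 3G is also seen on 4G."""
    return set(hops_3g) <= set(hops_4g)
```

A strongly skewed result from `first_hop_breakdown` would correspond to the biased cases mentioned for Table 2, while a flat one would match the even assignment observed for most MNOs.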

Home-Routed Roaming: Implications

Delay implications: The HR configuration implies that the roaming user’s exit point to the Internet is always in the original home network (Fig. 1). Thus, the data that the roaming user consumes always flows through the home network. Depending on the location of the server, this translates to a potential delay penalty. Fig. 4 shows the ECDF of the RTT we measured between the roaming SIMs and the target servers located in the visited or home networks (red and green curves, respectively). To compare the HR with the LBO configuration, we also include the RTT measurements between the visited SIMs against the same targets in the visited or home networks (blue and purple curves, respectively). The RTTs experienced by the visited SIMs serve as estimates of the best RTTs that one could expect with an LBO configuration, since LBO relies on access to local infrastructure with no need for tunnelling back to the home network. We note that the largest delay penalty occurs when the roaming user tries to access a server located in the visited country. This is because the packets must travel to the home network and back. Surprisingly, we note that the HR configuration also impacts the case when the roaming user accesses a target server located in the home network. That is, the GTP tunnel is slower than the native Internet path. In this case, the median value of the delay penalty considering all the MNOs is approximately 17ms. This varies across MNOs and in some cases we observe very low penalties (e.g., just 0.2ms for O2 Germany).

We investigate this performance impact further and calculate the estimated delay penalty between LBO and HR when the target is in the visited network. In more detail, we compute the delay penalty as the difference between the median delay to reach a given server when roaming, and the median delay to reach the same server from home. Fig. 5a exemplifies these median values for Vodafone Germany. We note that, in general, the delay penalty varies widely with the geographical location of the roaming users and the target servers. For example, when a German SIM roams in Spain, the difference in terms of RTT is higher if the server is in the visited country (i.e., Spain) (red curve in Fig. 4). If the German SIM roams in Spain or Italy and the target server is in Norway or Sweden, the roaming delay penalty is smaller, since the data path to Norway or Sweden would likely pass through Germany anyway (which is similar to the delay introduced by the HR configuration).

We then evaluate the RTT difference between the roaming SIM and the visited SIM towards the same target and we group them per MNO. Fig. 5b shows the median value of the delay penalty of an MNO (on the x axis of the tile plot) while roaming against each of the six different servers (on the y axis of the tile plot, marked by country). We note that the delay penalty varies as a function of the location of the home country. For example, German SIMs experience a lower delay penalty, which is potentially due to them being in an advantageous position in the center of Europe.
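The per-MNO, per-server delay penalty described above is a difference of medians. A minimal sketch follows, assuming each RTT sample is a tuple `(mno, server_country, role, rtt_ms)` in which `role` is either "roaming" or "visited", and assuming each visited-SIM sample is already tagged with the roaming MNO it serves as a baseline for; the function name and this record layout are hypothetical.

```python
from collections import defaultdict
from statistics import median

def delay_penalties(samples):
    """Median roaming RTT minus median visited RTT, per (mno, server_country)."""
    grouped = defaultdict(lambda: {"roaming": [], "visited": []})
    for mno, server, role, rtt in samples:
        grouped[(mno, server)][role].append(rtt)
    penalties = {}
    for key, rtts in grouped.items():
        # Skip groups for which one side of the comparison is missing.
        if rtts["roaming"] and rtts["visited"]:
            penalties[key] = median(rtts["roaming"]) - median(rtts["visited"])
    return penalties

# Made-up RTT values in milliseconds, for illustration only.
EXAMPLE = [
    ("mno-a", "ES", "roaming", 80.0),
    ("mno-a", "ES", "roaming", 90.0),
    ("mno-a", "ES", "roaming", 100.0),
    ("mno-a", "ES", "visited", 30.0),
    ("mno-a", "ES", "visited", 40.0),
    ("mno-a", "ES", "visited", 50.0),
    ("mno-a", "NO", "roaming", 70.0),  # no visited baseline, so no penalty
]

penalties = delay_penalties(EXAMPLE)
```

The resulting dictionary maps directly onto a tile plot with MNOs on one axis and server countries on the other.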

DNS implications: The results of the dig measurements show that the DNS server offered to a roaming user is the same as the one offered when at home. This is again consistent with the use of HR. We verify whether this translates into an inflated query time for the roaming user. Fig. 5c presents the distribution of DNS query times for all the SIMs of TIM IT. We note that for the home user the query time is significantly lower on average than for the other five roaming users. This is consistent for all the 16 MNOs we measured. This further translates into implications in terms of CDN replica selection: the roaming user would likely be redirected to CDN content at its home network, and would not be able to access the same content from a local cache (which would in any case incur a higher delay due to the home routing policy).

HTTP performance implications: Similar to the delay and DNS implications, international roaming affects HTTP and HTTPS performance. We quantify this penalty by considering the handshake time between each SIM and the target web servers. The median value of the handshake time from the visited SIMs towards all the targets we measure is 170ms, while the median value for the roaming SIMs is 230ms. This leads to a delay penalty of approximately 60ms. As in the cases before, some MNOs are affected more by this roaming effect than others.

TL;DR

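  • Measurement methodology:
    • Measurements use traceroute (path discovery), dig (DNS lookups against 180 ad-service domain names) and curl (HTTP/TLS data transfers against 10 popular webpages)
    • Metadata is collected, such as radio access technology (3G/4G), signal strength and the visited network identifier (MCC/MNC)
    • The roaming user, home user and visited user are measured in parallel over both 3G and 4G networks
  • Roaming configuration findings:
    • Home-routed (HR) roaming is used universally:
      • All 16 measured MNOs use the Home-Routed (HR) configuration in all of their roaming locations
      • This means that a roaming user's traffic is first routed back to the home network and only then reaches the Internet
    • Visited network selection:
      • The visited network chosen while roaming is generally stable, although for some MNOs (e.g., O2 DE) the roaming SIMs attach to different visited networks over time
    • Paths and infrastructure (traceroute analysis):
      • Roaming SIMs and home SIMs see the same number of hops towards the same target, consistent with HR (i.e., both share the same exit point to the wider Internet)
      • PGW (Packet Data Network Gateway) deployment strategies differ across MNOs (some deploy many PGWs, others concentrate traffic on a few)
      • Roaming SIMs and home SIMs use the same set of PGWs; no differential PGW allocation for roaming users is observed
      • The IP addresses on 3G paths are a subset of those on 4G paths, suggesting that the PGW functions are co-located
  • Implications of home routing (HR):
    • Delay:
      • With HR, the roaming user's exit point to the Internet is always in the home network, which adds delay
      • The delay penalty is largest when accessing servers in the visited country (data must travel to the home country and back)
      • A penalty also exists when accessing servers in the home country (the GTP tunnel is generally slower than the native path; the median penalty is approximately 17ms)
      • The size of the penalty depends strongly on the geographical location of the user and the server (e.g., German SIMs, located in the center of Europe, see a relatively low penalty when roaming)
    • DNS:
      • Roaming users generally use the DNS servers of their home network (consistent with HR)
      • As a result, DNS query times for roaming users are significantly higher than for home users
      • CDN (Content Delivery Network) replica selection is affected; users may be directed to CDN nodes in the home network rather than to a closer local cache, which increases access latency
    • HTTP performance:
      • International roaming degrades HTTP/HTTPS performance
      • The median TLS handshake time of roaming SIMs (230ms) is significantly higher than that of visited SIMs (170ms), an extra delay of approximately 60ms
      • The extent of this effect differs across MNOs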