MobileInsight: Extracting and Analyzing Cellular Network Information on Smartphones

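(1) Background and motivation

Black-box problem: The cellular network is a large, closed infrastructure. Ordinary users and developers cannot access its low-level runtime protocol operations and can only transfer data through the limited socket API.
Limitations of existing tools: Existing debugging tools (e.g., QXDM, XCAL) typically require expensive professional hardware and run only on PCs, while in-device interfaces such as the Android API provide only coarse-grained information.
Goal: Build a software tool that runs on commercial off-the-shelf (COTS) smartphones, needs no extra hardware or operator support, and monitors and analyzes mobile network protocols.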

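(2) System architecture

MobileInsight runs as a user-space service and has two core components:

[1] Monitor: data collection and parsing

Side channel: It exploits the external diagnostic mode exposed by the chipset and reads raw low-level binary logs directly through a virtual device (e.g., /dev/diag).
Parser: It decodes the raw hex logs into standardized cellular messages.
Coverage: It supports the 3G/4G control-plane protocols (RRC, EMM, ESM, GMM, CM/SM) and part of the L1/L2 protocols (PHY, MAC, RLC, PDCP).
Optimization: It uses on-demand collection and parallel processing to reduce overhead.

[2] Analyzer: protocol behavior inference

Protocol state dynamics: It builds reference state machines from the 3GPP standards and uses runtime messages to track the device's protocol states (e.g., RRC connection state, mobility management state).
Operation logic inference: It infers network-side decision logic (e.g., the base station's handoff policy).
Algorithm: The handoff logic is modeled as a finite state machine, and an online algorithm (an adaptation of the QSM algorithm) aggregates measurement reports and handoff commands to infer the operator's policy.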

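(3) Performance and evaluation

Compatibility: Validated on 30 phones across 11 phone models (mainly Qualcomm chipsets) and 8 carriers (including AT&T, Verizon, T-Mobile, and China Mobile).
Overhead: 1–7% CPU usage, up to 30 MB of memory, and at most 3–4% extra energy.
Accuracy: 99% of signaling messages are processed in less than 0.8 ms. Handoff prediction accuracy ranges from 87.5% to 95.3%.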

Introduction

The cellular network is a “closed” yet critical infrastructure. On one hand, mobile users are increasingly accessing online services through their 3G/4G networks on their smart devices (e.g., smartphones and tablets). The resulting data volume has contributed to 88% of global mobile traffic now, and is projected to reach 97% by 2019, with a tenfold traffic growth [1]. On the other hand, users and devices have very limited access to their runtime operations on all cellular protocols. Mobile applications transfer data through the cellular interface via the socket API. Beyond that, the network itself largely remains a blackbox to users.

The lack of open access into fine-grained runtime network operations creates barriers for researchers and developers to accurately understand and refine how cellular protocols operate at the device and inside the network. For example, the device experiences a handoff on the go but has no clue as to why it is triggered and whether it is a good decision. In reality, the device has been observed to hand over and get stuck in 2G even when 4G is available. Another real-life instance is that it is not uncommon to take a long time to upload a photo or experience a failed call via 4G. It is not clear whether this is caused by poor radio quality or network protocol issues. The list goes on.

For a rather closed, large-scale cellular network system, we need both data and analytics to identify problems and renovate the design. This calls for a community tool suite that can be built and shared together. Such a tool could stimulate research on 3G/4G mobile networks. Ideally, the tool should possess three features simultaneously: (1) It can collect runtime operation traces using commercial off-the-shelf (COTS) devices without extra hardware support; (2) Given the data traces, it provides analytics to extract dynamic protocol behaviors for both common usage settings and abnormal failure cases; (3) The tool offers simple APIs to build applications and the framework can be readily extended. Unfortunately, no such community software tools are available to date. The existing ones cannot meet all three requirements [2–7] (see Table 14 for a comparison summary). Furthermore, operators are reluctant to release the traces collected from the infrastructure side. They also have limited access to the device-side operations.

In this paper, we take the first step to develop MOBILE INSIGHT, a software tool that enables runtime cellular network monitoring and analytics on COTS smartphones. It aims to satisfy the three features above. We seek to overcome the barrier by providing open access (in software) to fine-grained cellular information on 3G/4G protocols. We empower in-device analytics, which not only disclose what happens but also shed light on why and how. The tool is intended as an open, extensible platform. The goal is to let researchers and developers readily and quickly obtain low-level network information through easy-to-use APIs.

In a nutshell, MOBILE INSIGHT runs as a user-space service on COTS smartphones (root access required for some phone models). It does not require any extra support from operators, or additional hardware (USRP, PC or testing equipment). MOBILE INSIGHT leverages a side channel inside the COTS smartphones, and extracts cellular operations from signaling messages between the device and the network. These control-plane messages regulate essential utility functions such as radio access, mobility management, security, and data/voice service quality. Given these messages, it further enables in-device analytics for cellular protocols. We not only infer runtime protocol state machines and dynamics on the device side, but also infer protocol operation logic (e.g., handoff policy from the carrier) from the network. MOBILE INSIGHT offers a simple API for use and extension. With this API, we further describe three example cases to show how MOBILE INSIGHT can be used. These showcases demonstrate how MOBILE INSIGHT benefits end devices in a variety of scenarios, including failure diagnosis, performance improvement, and security loophole detection.

We have implemented MOBILE INSIGHT on Android phones with Qualcomm chipsets (feasibility on iPhones and non-Qualcomm chipsets has also been validated). It currently supports all 3G/4G control-plane protocols for radio resource control, mobility management and session management, as well as certain 4G below-IP protocols that convey control information. We have tested MOBILE INSIGHT with 8 carriers: four US carriers (Verizon, AT&T, T-Mobile, Sprint) plus Project Fi [8], and three Chinese operators, using 30 phones from 11 phone models. Our evaluation shows that MOBILE INSIGHT works well on both high-end and low-end phones using different cellular chipsets and OSes. It logs cellular events and executes analysis with acceptable overhead. It can accurately infer protocol state machines and operation logic, and it processes 99% of cellular signaling messages in less than 0.8 ms in most cases. At maximum, the tool itself consumes 1–7% of CPU, 30 MB of memory, and 3–4% extra energy. The code and application, as well as the traces we collect through MOBILE INSIGHT, are available to the research community.

This work has three main contributions.

We present MOBILE INSIGHT, an in-device software tool to monitor, analyze and exploit runtime cellular protocol information on COTS smartphones;
We devise side-channel techniques to collect signaling messages for 3G/4G protocols, and design inference techniques to analyze protocol state dynamics and operation logic;
We conduct extensive tests to assess its effectiveness, and build three showcase examples to demonstrate its potential of wide applicability.

Cellular Network Primer

Figure 1 (left) illustrates a simplified cellular network architecture. The mobile device (i.e., smartphone) connects to the Internet or telephony network through the base stations and the cellular core network. Both control-plane and data-plane (i.e., user plane) operations are needed to receive data/voice services. The data plane delivers user content (data/voice), whereas the control plane exchanges signaling messages to facilitate content delivery.

Cellular network protocol stack. Figure 1 (middle) shows the cellular protocol stack at the device, which has three parts. The first is to enable radio access between the device and the base station. Physical (L1) and link (L2) functionalities, including PHY, MAC, RLC (Radio Link Control) and PDCP (Packet Data Convergence Protocol), are implemented. The second part is the control-plane protocols, which are split into access stratum (AS) and non-access stratum (NAS). The AS protocols regulate radio access through Radio Resource Control (RRC). RRC is mainly for radio resource allocation and radio connection management; it also helps to transfer signaling messages over the air. NAS is responsible for conveying non-radio signaling messages between the device and the core network. Two protocols of mobility management (MM) and session management (SM) also belong to the control plane. MM offers location updates and mobility support for call/data sessions, while SM is to create and mandate voice calls and data sessions. The last piece is the data-plane protocols above IP, which are not cellular specific but use the standard TCP/IP suite.

Table 1 lists the protocols studied in this work. Multiple variants exist for a common function (like RRC and NAS) in 3G and 4G. 3G also has variants for its circuit-switched (CS) and packet-switched (PS) domains for voice and data, respectively. L1/L2 protocols and control-plane protocols are generally cellular specific.

Limited in-device access through APIs. In commodity phones, the OS and mobile apps have limited access to low-level, cellular-specific information at runtime. As shown in Figure 1 (right), cellular-specific protocols (say, control-plane and L1/L2 protocols) are implemented within the chipset (e.g., Qualcomm Snapdragon and Samsung Exynos). As a result, cellular-specific information is mostly inaccessible to the software (in both kernel space and user space) in usual scenarios. The OS gets access to basic cellular functions and states (e.g., registration, dialing a voice call and enabling/disabling data) through the de facto radio interface layer (RIL) library, which interacts with the cellular interface exposed by the chipset. The RIL implementation is vendor-specific, and relies on the standardized AT commands [19]. For ease of app development and permission control, the OS further encapsulates a subset of the RIL library into APIs, e.g., the TelephonyManager class for Android [6,7]. Some system services on specific phone models (e.g., FieldTestMode in Nexus 5 [20] and iPhone [21]) may directly access some, but not all, cellular information from the RIL interface. Table 2 offers a sample of RIL commands [22], which expose coarse-grained information (call/data/cell level) only.

Debugging tools, such as QXDM [2], XCAL [3], MTK Catcher [4], xgoldmon [23], can collect cellular network messages and offer fine-grained information. However, they all work with PCs, and do not offer in-device collection or protocol analytics (see §9 for more discussions).

MOBILE INSIGHT Overview

MOBILE INSIGHT offers a pure software-based solution for in-device collection and analytics of cellular protocol information. It runs as a user-space service. It infers protocol operations and key configurations by exploiting messages exchanged between the device and the network at the hardware chipset. It supports fine-grained, per-message information retrieval and analysis from a set of cellular-specific protocols on the control plane and at lower layers. It not only unveils what is going on with cellular-specific operations, but also sheds light on why and how. Specifically, MOBILE INSIGHT seeks to achieve three concrete goals.

• In-device deployability. It should be readily deployable in COTS phones without extra hardware or changes on the existing infrastructure or the device OS.

• Protocol analytics. In addition to archiving protocol messages, MOBILE INSIGHT should supplement analytics for standardized cellular protocols, including their state dynamics and operation logics. Ideally, the analysis is done at runtime, so that it can be used for various usages such as performance improvement and failure diagnosis.

• Fine granularity and wide coverage. It should provide fine-grained information to runtime protocol operations. Moreover, it should support protocols across layers and on both control and data planes.

Figure 2 illustrates the architecture of MOBILE INSIGHT, which has two main components.

(i) Monitor (§4). It first exposes raw cellular logs from the cellular interface to the device user-space at runtime, and then parses them into protocol messages and extracts their carried information elements. It builds an extensible modular framework, where each parser works on a per-protocol basis. The parsed messages are then fed to the analyzer.

(ii) Analyzer (§5). Given the extracted messages, the analyzer aims to unveil protocol dynamics and operation logics. Based on the observed messages and the anticipated behavior model (from cellular domain knowledge), the analyzer infers protocol states, triggering conditions for state transitions, and protocol's taken actions. Moreover, it infers certain protocol operation logics (say, handoff) that uses the operator-defined policies and configurations. It offers built-in abstraction per protocol and allows for mobile OS/app developers to customize their analyzers.

In-Device Runtime Monitor

To enable in-device runtime monitoring, we need to address three issues: (1) How to expose raw cellular information from the hardware to the software (§ 4.1)? (2) How to decode the information into valid messages, given rich types and inter-dependency of protocol messages (§ 4.2)? (3) How to meet the requirement for low latency and reduce the system overhead (§ 4.3)? We next elaborate on each.

4.1 Exposing Raw Logs from Side Channel

The first issue is that ordinary in-device schemes cannot expose message-level cellular information to the user space (see § 2). We thus leverage an alternative side channel between the chipset and the software. We find that the chipset supports an external diagnostic mode, which exposes the cellular interface to the USB port. In fact, this diagnostic mode exists for major cellular chipsets (including Qualcomm, MediaTek and Intel series) and mobile OSes (including Android and iOS). However, no public documents are available for this mode. We have to learn its details from the open-source code of diagnostic drivers (summarized in Table 3).

Figure 3a illustrates the USB-based diagnostic mode on Android for Qualcomm chipsets. The cellular interface maps itself to a virtual device (e.g., /dev/diag) in the OS. Different from RIL, this virtual device exposes all raw cellular messages as binary streams. When the USB is connected to the external collector (e.g., a PC), the OS uses USB tethering [24] to bind the virtual device with a USB port (e.g., /dev/ttyUSB). The external collector thus fetches the cellular messages from the hardware interface. This is how the debuggers (Qualcomm QXDM [2], MediaTek Catcher [4], Intel xgoldmon [23], XCAL [3]) collect logs from USB. Similar virtual devices also exist on other chipsets and mobile OSes (summarized in Table 3), with slightly different implementations.

MOBILE INSIGHT emulates an external logger at the mobile device to collect raw cellular logs. We issue commands directly to the virtual device (e.g., via ioctl or AT command AT+TRACE, depending on the chipset and mobile OS types). These commands include activation/deactivation of cellular message types, and callback registrations to receive hex logs. We then pull the hex log streams from the virtual device, and pass them to the in-device message parser. This ensures that, for each cellular message accessible to external debuggers, it is also available from MOBILE INSIGHT.

4.2 Parsing Cellular Network Messages

Given the raw cellular logs, we next parse each message. The issue is to decode a variety of message types (see Table 4) in their rich formats. Figure 3 exemplifies the structure of the 4G RRC message from the side channel. It carries a metadata header and the payload, plus configurable and message-specific information elements. The formats of the metadata headers are specific to cellular chipsets. We infer them based on the raw binary logs from the diagnostic virtual device, and the publicly available open-source driver code [23, 25–27]. The message-specific information elements are standardized in the 3GPP standards [9–12,16–18,28].

MOBILE INSIGHT parses such rich messages in two steps, as shown in Figure 3a. During the first step, a metadata parser is applied to the raw hex logs to extract the message type ID and release version. It then selects the corresponding message parser with a switch branch over the (type-ID, release) tuple. To develop a message parser for each signaling message, we extract the message formats from the standards of each protocol. Some formats can be automatically extracted. For instance, the 3G/4G RRC standards provide abstract message notations under ASN.1 [29], which can be readily compiled into message decoders. For other messages, we manually convert them to machine-readable formats.
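
The two-step parse can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header layout (2-byte type ID, 1-byte release, 2-byte payload length), the type ID 0x1001, and all function names are hypothetical, since the real chipset-specific header formats are not documented here. The sketch shows the dispatch over the (type-ID, release) tuple, not a real decoder.

```python
import struct
from typing import Callable, Dict, Tuple

# Hypothetical metadata header (NOT the real chipset format):
# 2-byte type ID, 1-byte 3GPP release, 2-byte payload length, little-endian.
HEADER = struct.Struct("<HBH")

Parser = Callable[[bytes], dict]
PARSERS: Dict[Tuple[int, int], Parser] = {}


def register(type_id: int, release: int):
    """Register a message parser for one (type-ID, release) tuple."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[(type_id, release)] = fn
        return fn
    return wrap


@register(0x1001, 9)  # made-up type ID for illustration
def parse_lte_rrc_r9(payload: bytes) -> dict:
    # A real parser would run an ASN.1-generated decoder here.
    return {"protocol": "LTE-RRC", "release": 9, "raw": payload.hex()}


def parse_metadata(log: bytes) -> Tuple[int, int, bytes]:
    """Step 1: read the header and slice out the payload."""
    type_id, release, length = HEADER.unpack_from(log)
    payload = log[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated log record")
    return type_id, release, payload


def parse_message(log: bytes) -> dict:
    """Step 2: dispatch to the parser selected by (type-ID, release)."""
    type_id, release, payload = parse_metadata(log)
    parser = PARSERS.get((type_id, release))
    if parser is None:
        return {"protocol": "unknown", "type_id": type_id, "release": release}
    return parser(payload)
```

Keeping the parsers in a table keyed by the tuple means a new protocol or release can be added by registering one more function, which matches the per-protocol modular design described for the monitor.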

Handling protocol dependency. Certain messages are interdependent even at the parsing level. L1/L2 protocols (PDCP/RLC/MAC/PHY) may need control parameters from RRC for correct parsing. For example, the PDCP packet headers might be compressed with RoHC [30], whose parameters are carried in the RRC Reconfiguration message. Without these RoHC parameters, these packets cannot be decompressed or decoded. We thus implement a protocol configuration repository. Upon receiving RRC reconfiguration, MOBILE INSIGHT extracts related information elements for PDCP/RLC/MAC/PHY, such as the compression parameters (for PDCP), acknowledgement mode configuration (for RLC), DRX timers (for MAC), modulation support (for PHY), etc. They are used to parse upcoming messages accordingly.
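
A minimal sketch of such a configuration repository follows. The class name, the dictionary-shaped messages, and the field names (`rohc_enabled`, `rohc_profile`) are assumptions made for illustration; the actual RoHC decompression is deliberately left out. The point is the dependency: a PDCP packet can only be interpreted once the matching parameters from an earlier RRC reconfiguration have been stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConfigRepository:
    """Keeps the latest lower-layer parameters seen in RRC reconfiguration."""
    params: Dict[str, dict] = field(default_factory=dict)

    def on_rrc_reconfiguration(self, msg: dict) -> None:
        # Copy the information elements relevant to each L1/L2 protocol.
        for layer in ("pdcp", "rlc", "mac", "phy"):
            if layer in msg:
                self.params.setdefault(layer, {}).update(msg[layer])

    def get(self, layer: str) -> Optional[dict]:
        return self.params.get(layer)


def parse_pdcp(packet: bytes, repo: ConfigRepository) -> dict:
    """Parse a PDCP packet using whatever configuration is known so far."""
    cfg = repo.get("pdcp")
    if cfg is None:
        # Without the RRC-provided parameters the header cannot be decoded.
        return {"decoded": False, "reason": "no PDCP config seen yet"}
    if cfg.get("rohc_enabled"):
        # Placeholder: a real parser would decompress the header here.
        return {"decoded": True, "header": "rohc", "profile": cfg.get("rohc_profile")}
    return {"decoded": True, "header": "uncompressed"}
```

For example, calling `parse_pdcp` before any reconfiguration has been observed reports the packet as undecodable, while the same call after `on_rrc_reconfiguration({"pdcp": {"rohc_enabled": True, "rohc_profile": 1}})` takes the RoHC branch.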

4.3 Optimization

We apply several optimizations to reduce message collection/processing latency and system overhead. First, MOBILE INSIGHT uses on-demand collection to only archive those logs required by the device-specified analyzers. It asks each protocol analyzer to declare its needed cellular messages (§ 5), and dynamically configures the cellular interface to record only those messages of interest. In § 7.3, we will show that it can help reduce the storage overhead by up to two orders of magnitude. Second, it invokes on-demand parsing to only decode those necessary fields. For example, the analyzer may only want to learn the connectivity state in RRC. It thus parses the metadata only, and then passes the message to the analyzer with an annotation of the message parsers needed. The analyzer can then parse it on demand by calling decode(), which reads the annotation and calls the correct message parser. Last, we parallelize log collection and parsing. The trace collection proxy and the parser are two separate daemons, and the proxy passes the raw logs via an in-memory queue. This prevents log collection from blocking the analysis.
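
The three optimizations can be illustrated together with a small Python sketch. It is a simplification under stated assumptions: two threads stand in for the two daemons, message types are plain strings, and `LazyMessage` and `run_pipeline` are hypothetical names. Only `decode()` echoes a name mentioned in the text, and its behavior here (decode once, on first call) is an assumed reading of "on-demand parsing".

```python
import queue
import threading


class LazyMessage:
    """Carries raw bytes plus the parser to use; decodes only when asked."""

    def __init__(self, type_id, raw, parser):
        self.type_id = type_id
        self._raw = raw
        self._parser = parser  # the "annotation" of which parser is needed
        self._decoded = None

    def decode(self):
        if self._decoded is None:  # parse at most once, on demand
            self._decoded = self._parser(self._raw)
        return self._decoded


def run_pipeline(raw_logs, wanted_types, parsers):
    """Collect and wrap logs in parallel; returns the delivered messages."""
    q = queue.Queue()  # in-memory queue between collector and parser
    delivered = []

    def collector():
        for type_id, raw in raw_logs:
            if type_id in wanted_types:  # on-demand collection
                q.put((type_id, raw))
        q.put(None)  # sentinel: no more logs

    def parser_worker():
        while True:
            item = q.get()
            if item is None:
                break
            type_id, raw = item
            delivered.append(LazyMessage(type_id, raw, parsers[type_id]))

    threads = [threading.Thread(target=collector),
               threading.Thread(target=parser_worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return delivered
```

In this sketch an analyzer that only needs the message type never pays the decoding cost, and messages no analyzer declared are dropped before they reach the queue.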

Table 4 summarizes the supported messages by the time of submission. For raw log collection, it supports the same types of messages as the state-of-the-art external debuggers. For message parsing, it currently supports 240 message types, encapsulated in 68 type-specific metadata headers and 3GPP releases 7-12. It decodes all signaling messages on radio resource control, mobility management and session management for 3G and 4G. It partially supports 4G PHY, MAC, RLC, and PDCP messages, mainly those conveying control information. It also supports CDMA/EvDO messages partially, including the paging and radio link protocols. We have realized full support for Qualcomm Snapdragon processors, and validated the feasibility on MediaTek/Intel chipsets and iOS.

Cellular Protocol Analytics

With the cellular messages, MOBILE INSIGHT further builds runtime analytics for protocol behaviors. Table 5 summarizes the protocol analytics we have developed. For each protocol, we uncover two dimensions of its behaviors:

Protocol state dynamics: They include the protocol states, and the state transition events. They are controlled by the standardized protocol state machines and runtime observation of protocol messages and configurations.

Protocol operation logic: It decides what parameters to use and which messages to send/receive. For network-centric 3G/4G design, it is the algorithm or policy used by the network operator to determine the parameters/messages used by protocols.

5.1 Extraction of Protocol State Dynamics

The cellular protocol states at the device are regulated by the state machine in 3G/4G networks. The runtime protocol state dynamics provide direct hints about performance (e.g. high/low-rate connectivity state in RRC) and functional correctness (e.g., failure states in mobility management or session management). For each signaling protocol (3G/4G RRC, MM and SM), MOBILE INSIGHT seeks to capture its runtime state dynamics, including the current state, the state transitions and the conditions for transitions.

We take a two-phase approach (see Figure 4 for an example). We first derive a reference state-machine model for each protocol based on the 3GPP standards [9–12]. This model abstracts the device-side states and transition conditions as a function of cellular messages. We then feed runtime cellular messages from our in-device monitor (§ 4) to this model. This provides the exact protocol states and state-related configurations. From those cellular messages, we derive the transition parameters and track the state transitions by following the reference state machine. Since both the standardized state machines and runtime messages are known in MOBILE INSIGHT, the ground truth on the runtime protocol states can be obtained.

Reference state machine. We focus on the protocol state machine at the device. For each protocol (RRC/MM/SM), the standard specifies the protocol states and substates, and the transition conditions. In RRC, the state represents the radio connectivity between the device and the base station. For MM, the state denotes the device’s registration status to the core network. For SM, the state represents the data session activity and QoS configurations. Figure 4 exemplifies the 4G-RRC state machine defined in [12]. We extract both the main states (say, RRC_IDLE and RRC_CONN) and the substates at each state (e.g., Continuous-RX and Short/Long-DRX in RRC_CONN). Each state transition is modeled as a boolean function of cellular messages: true if the transition condition is met upon receiving this cellular message. The transition can be directly triggered upon receiving certain messages, and/or controlled by parameters inside the message (say, timers). It can also be activated upon receiving multiple messages (e.g., five rejection messages from MM lead to the “out-of-service” state [10]). Note that such a reference model itself does not provide runtime state dynamics or concrete parameter settings for the state transition. Instead, it serves as a template to track cellular messages and states at runtime. Table 6 summarizes the sizes of the device-side state machines for each protocol, which are independent of network carriers and phone models. It can be seen that all protocols’ state machines are of modest sizes, thus able to be tracked efficiently at end device.

Runtime message-driven state tracking. Given the reference model, MOBILE INSIGHT next tracks state transitions based on incoming cellular messages from the in-device monitor (§ 4). It first reads the reference state machine, and determines the cellular messages to be monitored. Each observed message is passed to all those state-transition functions originating from the current protocol state. If any transition function is satisfied, MOBILE INSIGHT updates the current state. Figure 4 shows how this works for the 4G-RRC connectivity dynamics. Between idle and connected states, receiving the RRC Connection Setup/Release message immediately triggers the transition. The transitions between substates within RRC_CONN rely on timer configurations. In this case, the transition functions between substates extract timers from the runtime RRC Connection Reconfiguration message, and use internal timers to track the potential transitions. In the example, CRX switches to Short-DRX upon T1 timeout, and moves back to CRX rather than on to Long-DRX because timer T2 is stopped before it expires. T2 is configured as one or multiple short DRX cycles (here, 2 × 20 ms = 40 ms).
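
The following Python sketch walks through this kind of message-driven tracking for a simplified 4G-RRC state machine. It rests on several assumptions that are not in the text: the class and field names are hypothetical, `DataActivity` is an invented event standing in for whatever restarts continuous reception, and T1 = 100 ms is an arbitrary value (only T2 = 2 × 20 ms = 40 ms comes from the example above). Time is passed in explicitly so the behavior is deterministic.

```python
class RrcTracker:
    """Tracks RRC_IDLE / CRX / SHORT_DRX / LONG_DRX from observed messages."""

    def __init__(self):
        self.state = "RRC_IDLE"
        self.t1_ms = None          # inactivity timer: CRX -> SHORT_DRX
        self.t2_ms = None          # short-DRX timer: SHORT_DRX -> LONG_DRX
        self.last_activity_ms = 0

    def on_message(self, name, now_ms, fields=None):
        self._advance(now_ms)      # apply timer expirations first
        fields = fields or {}
        if name == "RRCConnectionSetup" and self.state == "RRC_IDLE":
            self._enter_crx(now_ms)
        elif name == "RRCConnectionRelease":
            self.state = "RRC_IDLE"
        elif name == "RRCConnectionReconfiguration":
            # Extract timer settings carried by the message (assumed fields).
            self.t1_ms = fields["inactivity_ms"]
            self.t2_ms = fields["short_cycle_ms"] * fields["short_cycle_count"]
        elif name == "DataActivity" and self.state != "RRC_IDLE":
            self._enter_crx(now_ms)  # activity stops T2 and restarts T1
        return self.state

    def state_at(self, now_ms):
        self._advance(now_ms)
        return self.state

    def _enter_crx(self, now_ms):
        self.state = "CRX"
        self.last_activity_ms = now_ms

    def _advance(self, now_ms):
        if self.state == "RRC_IDLE" or self.t1_ms is None:
            return
        idle = now_ms - self.last_activity_ms
        if idle >= self.t1_ms + self.t2_ms:
            self.state = "LONG_DRX"
        elif idle >= self.t1_ms:
            self.state = "SHORT_DRX"
```

With T1 = 100 ms and T2 = 40 ms, a device set up at t = 0 is in SHORT_DRX at t = 100 ms; activity at t = 120 ms returns it to CRX before T2 expires, and a later idle period of 140 ms moves it to LONG_DRX.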

Other protocols. We also apply similar techniques to track the states in 3G/4G MM and SM protocols (Table 6).

• Mobility management: These protocols control the device’s registration status to the core network, and manage the tracking/location/routing area for the device. We track the device’s registration status based on messages of attach/detach and location/routing/tracking area update. In each (de)registration status, the device’s configurations (e.g., security mode, voice usage preference, network features) are also recorded. Such information can be used for failure diagnosis and security loophole detection (see § 8).

• Session management: These protocols, including 4G ESM, 3G CM/SM, control the device’s data/voice session activities. Each data session has its own QoS profile (e.g., traffic/delay class, maximum bitrate) and data billing policy. In MOBILE INSIGHT, we track the data session activity based on the session setup/modify/release messages. We extract the QoS profile and the billing policy (in the form of traffic flow template) from these messages, and use them as hints for data performance and network failure diagnosis (see § 8).

Correctness. MOBILE INSIGHT provides protocol state dynamics identical to those constructed from the messages of external debuggers (e.g., QXDM and MTK Catcher). There are two reasons: (1) the protocol state machines are standardized, and the standards dictate how the protocols follow the runtime parameters; (2) MOBILE INSIGHT has access to the same cellular information as those tools. In the RRC protocol context, MOBILE INSIGHT directly extracts and predicts RRC states from explicit information. It is thus better than implicit learning schemes (e.g., through power measurement [31–33]). Similarly, MOBILE INSIGHT directly analyzes the 4G EMM and ESM protocols, as well as their 3G variants. Note, however, that the ultimate correctness of our protocol state tracking at the device depends on two premises: (1) the device chipset implementation follows the 3GPP standards; (2) the cellular information from the diagnostic mode is accurate. We currently have no evidence that either premise is invalid.

Limitations. While MOBILE INSIGHT can accurately reconstruct the device-side protocol state dynamics, it does not have direct access to the network-side protocol state counterparts (which can differ from the client-side protocol state dynamics). In principle, the network-side protocol state dynamics could be inferred from the device-side protocol state, since both are regulated by the 3GPP standards. We leave this inference to future work.

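Extraction of Protocol State Dynamics

This subsection focuses on how to precisely track protocol state changes on the device side.

Purpose and significance:

Capture the runtime state dynamics of the 3G/4G protocols (RRC, MM, SM), providing direct clues for analyzing network performance (e.g., RRC connectivity state) and functional correctness (e.g., failure diagnosis)

Two-phase approach:

Build a reference model: derive a reference state machine from the 3GPP standards, to serve as the template for tracking runtime messages and states
Runtime tracking: feed the cellular messages observed by the in-device monitor into the model, and use the state-transition functions (together with parameters such as timers) to update and determine the current protocol state

Coverage:

Supports RRC (radio connectivity), mobility management (MM, registration status), and session management (SM, data sessions and QoS configuration)

Accuracy:

Because it relies on standardized state machines and uses the same data source as professional debugging tools, the state dynamics it provides are highly accurate (on par with ground truth)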

5.2 Inference of Protocol Operation Logic¶

MOBILE INSIGHT can also infer certain protocol operation logic from the network. The logic is the algorithm or policy that the operator uses to determine which configurations the protocol should use and which messages to send/receive. By analyzing operation logic, the device can forecast possible performance degradation (e.g., handoff to a low-speed cell) and functional incorrectness (e.g., network failures). We next present our initial effort on inferring the network-side protocol logic, using handoff as a case study.

Handoff switches the device’s serving cell from one to another. It is critical to end devices, since the chosen target cell may offer different performance. In 3G/4G, when the device is in the RRC_CONN state, the handoff decision is made by the base station and assisted by the device. Figure 5 (top) depicts a typical handoff procedure and its signaling between the device and the network. The phone is initially served by BS1. The serving cell (BS1) asks the phone to measure and report the radio quality of neighboring cells. Upon receiving the measurement report from the phone, BS1 runs its decision logic to determine whether a handoff should be triggered. It may reconfigure the device for further measurements (right), or issue a handoff command to the device (left).

We need to address two challenges when inferring network-side handoff logic. First, the connected-state handoff decision logic can be operator specific. The 3GPP standards leave the freedom for operators to customize their decision logic. Second, the device does not have full access to all network-side operations. It has to rely on its observations and interactions with the network to learn the logic.

Fortunately, the operation logic is not arbitrary in reality. It typically follows well-justified common practices [34–40]. Operators tend to apply stable logic to each cell, which makes it predictable in operational 3G/4G networks. Moreover, we observe that many network-side protocol operations are interactive and stateful. The logic is customizable, but regulated by standardized mechanisms at the protocol level. The network often relies on device feedback to operate its protocols (e.g., the measurement report for handoff). Consider the example of Figure 5. To make a proper handoff decision, the serving cell needs to know the device-perceived signal strength of nearby candidate cells. It thus configures the device to perform measurements and report the signal strength (via the Meas Control command). This interaction may take multiple rounds, because the base station may request the device to measure more candidates based on prior measurements. According to the 3GPP standards [11,12], both the handoff commands and the measurement report criteria have a limited set of options. Although the device has no direct access to network-side operations, it may infer them by pairing control commands with feedback.

Consequently, we model network operations as a finite-state machine and devise an online inference algorithm. Our approach adapts QSM, a state-merging algorithm from AI [41, 42], and leverages domain-specific knowledge of cellular networks to improve inference accuracy.

Modeling handoff decision logic. We model the handoff logic as a domain-specific finite-state machine. Our model takes into account the standardized mechanisms, including measurement control, measurement report, and the handoff procedure. Each state denotes the device’s control state configured by the network (e.g., Meas Control and Handoff Command). Two states are equivalent if their control parameters are identical. For Meas Control, this means identical measurement report criteria (e.g., A3 defined in [11, 12]). For Handoff Command, we assume that two states are equivalent if their target cells are identical. State equivalence is essential for the state merging process. A state transition happens when the device receives a new control command (Meas Control or Handoff Command). The network issues this command after receiving the device’s Meas Report messages in response to the current control state. Following QSM, the state transition is modeled as a prefix of the Meas Report sequence. Any sequence matching this prefix would trigger the state transition. This model is valid for handoff, because the base station may make handoff decisions before receiving all reports from the device.
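A minimal sketch of the two ingredients of this model, state equivalence and prefix-triggered transitions, might look as follows. The `ControlState` class, its fields, and the string encoding of report criteria are hypothetical choices made for illustration; the paper does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ControlState:
    """A control state configured by the network (hypothetical layout)."""
    kind: str                 # "MeasControl" or "HandoffCommand"
    params: Tuple[str, ...]   # report criteria, or the target cell


def prefix_matches(prefix: Sequence[str], reports: Sequence[str]) -> bool:
    """A transition fires once the observed reports start with its prefix."""
    return tuple(reports[:len(prefix)]) == tuple(prefix)


# Frozen dataclasses compare field by field, so two states with identical
# control parameters are equal and collapse to one entry in a set.
a = ControlState("MeasControl", ("A3",))
b = ControlState("MeasControl", ("A3",))
c = ControlState("MeasControl", ("A3", "A5"))
print(a == b, a == c, len({a, b, c}))
print(prefix_matches(("A3",), ["A3", "A3"]))
```

Making the state hashable lets the later merging step use states directly as graph nodes.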

Online inference. We use an online algorithm to infer the handoff decision logic. For each serving cell, it collects runtime handoff events and associated measurement controls/reports as samples, and iteratively updates its inferred state machine by aggregating a new state with the existing one. Each iteration has three steps: sample collection, partial recovery, and aggregation.

(a) Sample collection. We collect the training samples at runtime without actively probing the cellular network. We define a sample sequence as the tuple of an old control command (e.g., Meas Control), a new control command (e.g., Meas Control or Handoff Command), and the Meas Report sequence in between. To collect a sample sequence, we track all corresponding messages in the background (via the in-device monitor) until the next control command arrives. In the example, three samples are collected in two cases where the device hands over from 4G BS1 to 4G BS2 (left) and to 3G BS3 (right).
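The sample definition above can be illustrated with a short sketch that cuts a message trace into (old command, report sequence, new command) tuples. The `(kind, payload)` message format and the payload strings are made up for this example, and resetting after a Handoff Command is an assumption made here so that samples never span two serving cells.

```python
def collect_samples(messages):
    """Split a trace of (kind, payload) messages into sample sequences."""
    samples = []
    old_cmd, reports = None, []
    for kind, payload in messages:
        if kind == "MeasReport":
            # Reports only count once a control command is in effect.
            if old_cmd is not None:
                reports.append(payload)
            continue
        new_cmd = (kind, payload)
        if old_cmd is not None:
            samples.append((old_cmd, tuple(reports), new_cmd))
        # Assumption: a handoff ends the context of the current serving cell.
        old_cmd = None if kind == "HandoffCommand" else new_cmd
        reports = []
    return samples


trace = [
    # Case 1: 4G BS1 -> 4G BS2
    ("MeasControl", "4G neighbors"),
    ("MeasReport", "report 1"),
    ("HandoffCommand", "BS2"),
    # Case 2: 4G BS1 -> 3G BS3, after the measurement is extended
    ("MeasControl", "4G neighbors"),
    ("MeasReport", "report 2"),
    ("MeasControl", "4G and 3G neighbors"),
    ("MeasReport", "report 3"),
    ("HandoffCommand", "BS3"),
]
samples = collect_samples(trace)
print(len(samples))
```

With this trace the sketch yields three samples from two handoff cases, matching the count in the example above.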

(b) Partial recovery. From each sample sequence, we generate a state transition by converting the old/new control commands into the from/to control states, with the feedback sequence being the transition condition (the sequence itself is also a prefix). Use sample sequence 1 as an example: $\text{Meas Control} \xrightarrow{\text{MeasReport}_1, \ldots} \text{Handoff Command}$. We thus derive a transition from the state “monitor other 4G neighbor cells” to the state “handoff to another 4G cell” (here, BS2) when the measurement report indicates $RSS_2 > RSS_1 + 3$ (event A3 in [12]). Similarly, we derive the transition of extending the measurement from 4G to 4G and 3G, and the transition to a 4G→3G handoff.
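As a rough sketch of partial recovery, each sample maps one-to-one onto a transition whose condition is the report sequence. The `Transition` tuple and `a3_holds` helper are hypothetical names, and the A3 check is deliberately reduced to the single inequality quoted above; it ignores the hysteresis and per-cell offsets that the full event definition includes.

```python
from typing import NamedTuple, Tuple


class Transition(NamedTuple):
    src: Tuple[str, str]         # old control command -> "from" state
    condition: Tuple[str, ...]   # Meas Report sequence (also a prefix)
    dst: Tuple[str, str]         # new control command -> "to" state


def recover_transition(sample) -> Transition:
    """Turn one (old_cmd, reports, new_cmd) sample into a transition."""
    old_cmd, reports, new_cmd = sample
    return Transition(src=old_cmd, condition=tuple(reports), dst=new_cmd)


def a3_holds(rss_serving: float, rss_neighbor: float, offset: float = 3.0) -> bool:
    """Simplified A3-style check: the neighbor beats the serving cell by `offset`."""
    return rss_neighbor > rss_serving + offset


sample = (("MeasControl", "4G neighbors"),
          ["RSS_2 > RSS_1 + 3"],
          ("HandoffCommand", "BS2"))
t = recover_transition(sample)
print(t.src, "->", t.dst, "when", t.condition)
print(a3_holds(rss_serving=-100.0, rss_neighbor=-95.0))
```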

(c) Aggregation. When a new partial transition is created, the aggregation step merges it into the existing state machine. It works in three steps. First, it performs symbolic mapping to generalize the rule. This is feasible because the 3GPP standards define the measurement control/report parameters in an abstract form [12]. For example, we translate $RSS_2 > RSS_1 + 3$ into a general rule $RSS_{n(4G)} > RSS_{s(4G)} + 3$ by mapping cells 1 and 2 into their roles in 4G: the serving cell and the neighboring candidate. Second, it locates where to merge. For each edge from the partial transition sample, we search whether it exists in the current state machine. We use a directed acyclic graph G to represent the state machine. Finally, we merge the new rule into the existing graph by running the union operation over the graph.

Algorithm 1 shows the pseudo-code for aggregation. There are three cases: both the source and destination states (nodes) exist in the graph, only one node exists, or neither node exists. If neither node is found, the transition is treated as a new edge and added to the existing state machine as an isolated graph. When only one node exists, we create a new edge from the existing graph (by adding the non-existing node) and initialize its transition condition as the measurement sequence from the sample. When both nodes exist, the transition conditions (prefixes) should be merged. We search for the longest common prefix in the existing transition and merge it with the new prefix. In theory, the old and new conditions for the same transition might differ or even conflict with each other (e.g., RSS > −110 and RSS < −110). However, this would not occur in practice because the rules used by the same serving cell are consistent.
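The aggregation step can be approximated by the following sketch. It is not a transcription of Algorithm 1: the node set plus edge dictionary, the `(lhs_cell, rhs_cell, offset)` rule encoding, the returned labels, and the choice to report an empty common prefix as a conflict are all assumptions made here for illustration.

```python
def common_prefix(a, b):
    """Longest common prefix of two sequences, as a tuple."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return tuple(out)


def symbolize(rule, serving_cell):
    """Map concrete cell IDs in (lhs_cell, rhs_cell, offset) to roles s/n."""
    lhs, rhs, offset = rule
    role = lambda cell: "s" if cell == serving_cell else "n"
    return (role(lhs), role(rhs), offset)


def aggregate(nodes, edges, src, condition, dst):
    """Merge one transition into the graph and report which case applied."""
    condition = tuple(condition)
    known = (src in nodes) + (dst in nodes)   # 0, 1, or 2 existing nodes
    nodes.update((src, dst))
    if (src, dst) in edges:
        merged = common_prefix(edges[(src, dst)], condition)
        if not merged:
            return "conflict"                 # left for further checking
        edges[(src, dst)] = merged
        return "merged"
    edges[(src, dst)] = condition
    return {0: "new component", 1: "extended graph", 2: "new edge"}[known]


nodes, edges = set(), {}
a3 = symbolize((2, 1, 3), serving_cell=1)     # RSS_2 > RSS_1 + 3
r1 = aggregate(nodes, edges, "Meas(4G)", [a3], "HO(4G)")
r2 = aggregate(nodes, edges, "Meas(4G)", [("n", "s", 5)], "Meas(4G+3G)")
r3 = aggregate(nodes, edges, "Meas(4G)", [a3, a3], "HO(4G)")
print(a3, r1, r2, r3)
```

The three calls exercise the three cases in turn: no node known, one node known, and both nodes known with a prefix merge.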

Ideally, the above algorithm should be performed over every serving cell. In practice, however, per-cell inference suffers from insufficient training samples; otherwise, the user has to wander around to collect unique samples within the coverage of one cell. Moreover, it requires more storage for per-cell logic. To tackle this issue, we observe that operators tend to apply the same logic to each cell type (e.g., cells under the same frequency), with minor tuning of parameters (e.g., thresholds). It is thus feasible to aggregate samples from cells of the same type, and infer the decision logic for each frequency. The above aggregation algorithm still applies. We merge the samples from the serving cells over the same frequency. Though this may lead to conflicting decision logic between cells of the same type in theory, it is unlikely to happen in operational 3G/4G networks (§7.2). To handle it, we duplicate the state transitions, create two branches, and mark them as an exception for further checking.
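One possible shape for the per-frequency aggregation, including the exception branch, is sketched below. The sample layout, the rule strings, and the cell and frequency numbers are invented for illustration, and treating any two different rules on the same edge as a conflict is a simplification of the prefix merging described earlier.

```python
from collections import defaultdict


def aggregate_by_frequency(samples):
    """Group (cell_id, freq, src, condition, dst) samples per frequency."""
    machines = defaultdict(lambda: defaultdict(list))  # freq -> edge -> branches
    exceptions = set()
    for cell_id, freq, src, condition, dst in samples:
        branches = machines[freq][(src, dst)]
        for branch in branches:
            if branch["condition"] == condition:
                branch["cells"].add(cell_id)   # same rule: share the branch
                break
        else:
            # A different rule on an existing edge: keep it as its own branch
            # and flag the edge for further checking.
            branches.append({"condition": condition, "cells": {cell_id}})
            if len(branches) > 1:
                exceptions.add((freq, src, dst))
    return machines, exceptions


samples = [
    (101, 1850, "Meas", "n > s + 3", "HO"),
    (102, 1850, "Meas", "n > s + 3", "HO"),
    (103, 1850, "Meas", "n > s + 5", "HO"),   # disagrees with cells 101/102
    (201, 5780, "Meas", "n > s + 3", "HO"),
]
machines, exceptions = aggregate_by_frequency(samples)
print(len(machines[1850][("Meas", "HO")]), exceptions)
```

Cells 101 and 102 end up sharing one branch, which is the point of the optimization: their samples train the same state machine.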

Correctness and limitations. The correctness of the inferred handoff logic is generally ensured by the good properties of QSM. Prior work [41–44] proves that the state machine can be fully recovered if the samples reach all states and differentiate every pair of nonequivalent states in the logic. This requires the device to receive all possible types of Meas Control and Handoff Command across its samples. If the device has not collected enough samples to meet the above conditions, the state machine may be incomplete.

A limitation of our approach is that the inference may not capture internal states of the network-side handoff logic that do not interact with the device. For instance, even when the radio-related handoff criterion is met, the network may not invoke the handoff to the target cell because of load balancing; such operations remain invisible to the device. The inferred logic would then be the device-perceived handoff logic only. Direct access to these network-side internal states would be possible only if the cellular infrastructure were open, which is unlikely to occur in reality. Our study shows that our inference typically captures the network-side logic in practice (§7.2).

Operation logic for other protocols. Besides handoff, other cellular protocols may also use their own logic. For example, operators may customize their QoS allocation policies in SM. PHY may customize its radio block allocation and rate adaptation algorithms. We are exploring how to adapt our online inference to these contexts.

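Inference of Protocol Operation Logic

(1) Core concepts and goals:

What is protocol operation logic?
- It is the algorithm or policy that the network operator uses to decide protocol behavior
- For example, the rules by which a base station decides when to order the phone to take measurements and when to order it to hand over to another base station
Why infer it?
- The device usually does not know why the network made a particular decision
- By inferring the logic, the device can predict potential performance problems (e.g., being handed over to a low-speed cell) or functional failures

The paper uses handoff as the case study to explain this technique

Interaction mechanism: in 3G/4G, the handoff decision is made by the base station (network side) but relies on assistance from the device (phone side)
Typical procedure (see Figure 5 in the paper):
1. Measurement control (Meas Control): the base station sends a command telling the phone which neighboring cells to measure
2. Measurement report (Meas Report): the phone reports the measured signal strength
3. Decision: based on the report, the base station decides whether to continue measuring or to send a handoff command (Handoff Command)
Steps 2 and 3 may repeat for several rounds before a handoff command is issued

(2) Challenges and feasibility

Challenges:

Black box: operator policies are proprietary, the standards allow customization, and the phone cannot read the base station's code
Invisibility: the phone sees only the commands sent to it and the reports it sends; it cannot see the base station's internal processing

Feasibility (why inference is possible):

Stability: operators usually apply stable logic to cells of the same type and do not change it arbitrarily
Constrained interaction: although the logic is customizable, it must follow the signaling formats of the 3GPP standards (e.g., event A3), so the options for commands and reports are limited
Visible inputs and outputs: we can view the base station as a function f(x), where the measurement report sent by the phone is the input x and the control command sent by the base station is the output y. By collecting many (x, y) pairs, f(x) can be inferred

(3) Online inference algorithm

The paper adopts an online learning algorithm based on QSM (a state-merging algorithm) that iteratively updates the logic model in three steps:

Sample Collection
- Passive collection: no active probing; messages are only recorded in the background
- Sample definition: a complete sample is a triple <old control command, Meas Report sequence, new control command>
- Example: the phone is measuring 4G neighbor cells (old command) -> the phone reports that a neighbor cell is 3 dB stronger than the current one (report) -> the base station orders a handoff to that neighbor cell (new command)
Partial Recovery
- Transition generation: convert each collected sample directly into a concrete rule
- Logic: in the "old command" state, if the "report condition" holds, move to the "new command" state
- Instance: state "monitor 4G neighbor cells" -> "report: RSS2 > RSS1 + 3" -> state "handoff to BS2"
Aggregation
- Symbolic mapping:
  - To generalize the rules, the algorithm abstracts concrete cell IDs (e.g., cell 1, cell 2) into roles (e.g., "serving cell", "intra-frequency neighbor cell")
  - Effect: "hand over when cell 2 is 3 dB stronger than cell 1" is generalized into "hand over when any intra-frequency neighbor cell is 3 dB stronger than the serving cell"
- Locate & merge:
  - Search the existing state machine graph for the same source and destination states
  - If they exist, merge the transition conditions (taking the longest common prefix)
  - The algorithm applies a union operation over the graph to add the new edge

(4) Key optimization: frequency-based aggregation

Problem: inferring a separate logic for every individual cell (cell ID) leaves too few samples per cell, incurs high storage overhead, and makes it hard for the user to cover all states
Solution: exploit the operators' configuration habit that all cells on the same frequency usually use the same logic
Implementation: the algorithm does not distinguish individual cells; instead, it aggregates the samples of all cells on the same frequency to train a single state machine
- This greatly increases the sample size and improves the accuracy and speed of inference

Limitations

The base station also has handoff logic that it does not need to reveal to the UE, for example:

If the network triggers a handoff based on internal state (e.g., an overloaded base station) and this state is not signaled to the phone, MobileInsight cannot infer the causal relationship

The approach above can only infer the "device-perceived" logic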

Implementation¶

MOBILE INSIGHT seeks to provide an open platform that helps researchers and developers learn the protocol operations in cellular networks. It thus defines simple APIs for its monitor and analyzer components. We implement MOBILE INSIGHT on off-the-shelf smartphones, as a user-space service. We choose a user-space rather than an in-kernel solution for ease of deployment. It consists of 29,698 lines of code (12,254 lines of C/C++ and 17,444 lines of Python), excluding third-party libraries. Our current implementation is mainly on Android phones with Qualcomm chipsets, but porting to other platforms is ongoing.

MOBILE INSIGHT API. We first illustrate how to use the API via an example that analyzes the protocol state dynamics of 3G and 4G RRC (more detailed usage can be found in [45]).

Both the monitor and analyzer functions in MOBILE INSIGHT are encapsulated into classes, for instance, the Monitor class and the LteRrcAnalyzer class. MOBILE INSIGHT abstracts the inference for each protocol into a module called an analyzer. First, to call a chosen function, an app/service built atop MOBILE INSIGHT instantiates the corresponding class. For example, src = Monitor() creates a Monitor instance. Second, the app/service declares the needed analyzers, and binds them to the monitor via the set_source(src) method. This lets an analyzer register a callback function for certain cellular events from the monitor. Finally, we start MOBILE INSIGHT by running the monitor via src.run(), which logs the events and drives the analysis.

Python
# Initialize an in-device monitor
src = Monitor()
# Declare 3G/4G RRC analyzers
lte_rrc_analyzer = LteRrcAnalyzer()      # 4G RRC
wcdma_rrc_analyzer = WcdmaRrcAnalyzer()  # 3G RRC
# Bind the analyzers to the monitor
lte_rrc_analyzer.set_source(src)
wcdma_rrc_analyzer.set_source(src)
# Start processing
src.run()
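To make the callback wiring behind set_source() and run() concrete, here is a self-contained mock of the same pattern. `SketchMonitor` and `SketchAnalyzer` are hypothetical stand-ins written for this note, not MOBILE INSIGHT classes; the monitor simply replays a fixed list of events where a real one would read from the chipset.

```python
class SketchMonitor:
    """Stand-in for the monitor: replays (message_type, payload) pairs."""

    def __init__(self, events):
        self._events = list(events)
        self._callbacks = {}

    def register(self, msg_type, callback):
        self._callbacks.setdefault(msg_type, []).append(callback)

    def run(self):
        # Deliver each event to every callback registered for its type.
        for msg_type, payload in self._events:
            for callback in self._callbacks.get(msg_type, []):
                callback(payload)


class SketchAnalyzer:
    """Stand-in for an analyzer that records the messages it is sent."""

    def __init__(self, msg_type):
        self.msg_type = msg_type
        self.seen = []

    def set_source(self, src):
        # Binding to the monitor amounts to registering a callback.
        src.register(self.msg_type, self.seen.append)


src = SketchMonitor([
    ("LTE_RRC", "setup"),
    ("WCDMA_RRC", "cell update"),
    ("LTE_RRC", "release"),
])
lte = SketchAnalyzer("LTE_RRC")
wcdma = SketchAnalyzer("WCDMA_RRC")
lte.set_source(src)
wcdma.set_source(src)
src.run()
print(lte.seen, wcdma.seen)
```

The call order matches the listing above: create the monitor, declare the analyzers, bind them, then start the monitor, which drives all analysis.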

In-device monitor. We implement the monitor using two daemons: a proxy daemon to extract raw hex logs from the cellular interface (chipset), and a parser daemon to decode messages. This allows for pipeline parallelism, thus reducing processing latency. The proxy daemon retrieves raw logs by leveraging Android’s open-source driver for the cellular virtual interface [25]. Specifically, we first open the virtual device /dev/diag, and enable the logging mode by sending a command (defined in [25]) via the ioctl function. We then register a callback function linked to the virtual device, so that we are notified whenever raw binaries are generated. The parser daemon implements the decoding of the cellular messages in Table 4. We also implement the optimization techniques in §4.3, including on-demand collection, on-demand decoding, and in-memory processing. To further speed up processing, both daemons are implemented in C/C++ and compiled with the Android NDK.
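The two-daemon pipeline can be pictured with the following sketch, which uses Python threads and a bounded queue in place of the actual C/C++ daemons. The hex strings are made-up placeholders, not real diagnostic frames, and `bytes.fromhex` stands in for the far richer message decoding of the real parser.

```python
import queue
import threading


def proxy(raw_source, out_q):
    """Producer: forward raw hex logs as they arrive, then signal the end."""
    for raw in raw_source:
        out_q.put(raw)
    out_q.put(None)  # sentinel: no more logs


def parser(in_q, decoded):
    """Consumer: decode each log while the proxy keeps collecting."""
    while True:
        raw = in_q.get()
        if raw is None:
            break
        decoded.append(bytes.fromhex(raw))


raw_logs = ["7e01", "7e02ff", "7e"]   # placeholder hex strings
q = queue.Queue(maxsize=2)            # bounded buffer between the two stages
decoded = []
threads = [
    threading.Thread(target=proxy, args=(raw_logs, q)),
    threading.Thread(target=parser, args=(q, decoded)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(decoded)
```

Because a single consumer drains a FIFO queue, the decoded messages keep the order in which the proxy emitted them.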

Built-in Analyzers. We implement each built-in analyzer as a Python module. We port the Python-based analyzer framework via python-for-android [46], which allows Python code to be compiled into an Android APK. The analyzer framework integrates all analyzers into a directed acyclic graph. Each node is an analyzer, and a directed edge v→w denotes the dependency of w on v. Each analyzer is instantiated at most once, and is shared by multiple callers when needed.
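A minimal sketch of such a dependency graph with at-most-once instantiation is shown below. `AnalyzerGraph`, `HandoffSketch`, and `StateSketch` are hypothetical names, and each analyzer is represented by a plain dictionary, so this shows only the sharing and ordering behavior and not the framework's real code.

```python
class AnalyzerGraph:
    """Instantiate analyzers on demand, dependencies first, each at most once."""

    def __init__(self, depends_on):
        self.depends_on = depends_on   # analyzer name -> names it depends on
        self.instances = {}
        self.init_order = []

    def get(self, name, _path=()):
        if name in self.instances:
            return self.instances[name]        # shared by all callers
        if name in _path:
            raise ValueError(f"dependency cycle through {name}")
        for dep in self.depends_on.get(name, []):
            self.get(dep, _path + (name,))
        self.instances[name] = {"name": name}  # placeholder analyzer object
        self.init_order.append(name)
        return self.instances[name]


graph = AnalyzerGraph({
    "HandoffSketch": ["LteRrcAnalyzer", "WcdmaRrcAnalyzer"],
    "StateSketch": ["LteRrcAnalyzer"],
})
graph.get("HandoffSketch")
graph.get("StateSketch")
shared = graph.get("LteRrcAnalyzer")
print(graph.init_order)
```

Both hypothetical analyzers depend on LteRrcAnalyzer, yet it appears only once in the initialization order.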

Miscellaneous issues. We discuss two related issues.

◦ Message coverage. The current version does not yet cover all cellular messages. We focus only on the most useful ones (control-plane messages and L1/L2 messages carrying control information). In principle, the same method can support all messages. We are extending MOBILE INSIGHT to data-plane protocols (below IP) and their analysis.

◦ Rooted phones. MOBILE INSIGHT currently works with rooted phones. Studies claim that about 27.4% of users have rooted their phones [47]. Requiring root should not be a major obstacle at the current stage, since MOBILE INSIGHT’s current target is the research community working on cellular networks. In fact, MOBILE INSIGHT only requires access permission to a specific system folder and the cellular interface. Once this is granted, it requires no other root privileges. To support more mobile devices, we are also exploring rootless techniques, such as building MOBILE INSIGHT as a system service, and customizing the boot image with minimal modification to grant MOBILE INSIGHT cellular access privilege.

Related Work¶

We first compare MOBILE INSIGHT with other approaches to learning cellular information from the device. Table 14 summarizes the features and limitations of each scheme. It shows that MOBILE INSIGHT is the only software-only in-device cellular network analyzer, which covers more 3G/4G control-plane and low-layer protocols, supports both fine-grained information collection and protocol analytics at runtime, operates on COTS phones, and offers APIs for mobile applications. There are also in-device analyzers [58–60], but they focus on application and transport layers, not cellular-specific lower layers.

Meanwhile, extensive research has been conducted to improve device-side performance over cellular networks, including video adaptation [56], energy saving [61–63], and cellular congestion control [64, 65]. These efforts could also benefit from MOBILE INSIGHT through further access to fine-grained runtime information. Finally, there are ongoing efforts on optimizations for handoff [37–40], software-defined LTE [66–70], and backend cellular infrastructure [71–73]. The insights from MOBILE INSIGHT over operational networks (e.g., signaling protocols and handoff policies) can help to better design the future network infrastructure.

Conclusion¶

The cellular network provides more control utilities than the wired Internet, including radio resource control, security, mobility support, and carrier-grade services, to name a few. Understanding these functions and their protocol operations will be important for refining the design and optimizing application performance. However, such fine-grained protocol operations have remained inaccessible to the research community.

MOBILE INSIGHT represents our first effort to build a software tool that opens up the blackbox operations. It enables open access to the low-level protocol operations in 3G/4G from the device side. It runs on COTS phones and leverages their increasing capability. It directly extracts the signaling and/or low-layer messages from the side channel toward the 3G/4G hardware interface, decodes the protocol messages, and infers the protocol state dynamics and decision logic at runtime through analyzers. Through MOBILE INSIGHT’s APIs, applications can benefit from accessing such low-level domain knowledge. In the presence of network failures, security loopholes, or performance degradation, MOBILE INSIGHT helps to detect the problematic instances, infer the root causes, and suggest fixes.

In the broader context, MOBILE INSIGHT is designed to be an open, extensible tool for the community and by the community. It may help us to examine cellular networks at large scale via crowdsourcing. More community efforts are clearly needed to enhance and extend every aspect, particularly the analyzers and the applications atop them. The collected datasets can further be shared within the community. Our own experience so far has confirmed that such tool-building efforts are worthwhile and can be rewarding.
