How LLMs Saved Me from Struggling with Experiment Reproduction: LEO Networking as a Case Study
Reproducing network experiments is critical to advancing research in computer networking. In reality, however, many researchers struggle with experiment reproduction: not only because reading, understanding, and debugging prior work is time-consuming and labor-intensive, but also because not all papers publicly release their code, forcing subsequent researchers to re-implement experiments from scratch.
In this paper, we explore an intriguing question: can recent large language models (LLMs) assist in understanding research papers and generating code, thereby accelerating the reproduction of network experiments? Focusing on the rapidly evolving area of low-Earth-orbit (LEO) satellite networks (LSN), we present LaseR¹, a semi-automated, LLM-assisted tool designed to facilitate the reproduction of LSN experiments. LaseR judiciously integrates the capabilities of LLMs with LSN simulation to ease the burden of LSN experimentation. Our case studies provide preliminary evidence that LaseR can efficiently reproduce experimental results consistent with those reported in the original papers, while substantially reducing the manual effort required of LSN researchers.
Introduction
Late one night, a PhD student working on LEO networking had just received a notification from SOSIGCOMMOBIXDI [42], one of the leading conferences in computer networks and systems, regarding the status of a recent paper submission. The reviewers acknowledged the significance of the research problem and commended the novelty of the design. However, they also pointed out a critical limitation: the evaluation lacked comparative experiments against several closely related prior works. The paper was therefore conditionally accepted with a one-month revision period, requiring additional experiments. The initial excitement was undeniable, but it was quickly tempered by a growing concern: these related works lacked publicly available source code, so their approaches would have to be re-implemented from scratch. With the deadline approaching, a pressing question emerged: how could the PhD student efficiently and faithfully reproduce prior works to conduct meaningful comparative experiments under tight time constraints?
Background and motivation. The above anecdote is likely a familiar experience for many students and researchers. There is no doubt that reproducibility is critical in computer networking and systems research, particularly in fast-evolving domains like LEO networking, where new technologies emerge rapidly. However, due to intellectual property concerns or other reasons, not all published papers release their source code. We fully respect and understand the decisions of all authors regarding their code release. Nevertheless, the lack of open-source implementations and data inevitably poses challenges for reproducibility and for conducting comparative evaluations in subsequent research.
Taking the area of LEO networking as an example, Table 1 presents a summary of papers published in top conferences and journals in computer networking over the past five years, along with the status of their source code availability. These findings highlight a significant reproducibility gap: only 16 papers (13.8%) make available the full source code necessary for reproducing their results, 11 papers (9.5%) share partial artifacts, and the remaining 89 papers (76.7%) offer no implementation or evaluation materials.
The status quo raises an intriguing question: in a rapidly evolving field like LEO networking, is there a more efficient way for students and researchers to reproduce existing methods and experiments (even when the code of the original paper is unavailable), and to conduct faithful comparative evaluations? If so, such solutions would free researchers from the time-consuming, labor-intensive, and error-prone process of reproduction, allowing them to focus their efforts on innovation and original contributions.
Our initial attempts: LLM-assisted network experiment reproduction. Inspired by recent innovations in large language models (LLMs), we explore the possibility of leveraging the advanced natural language understanding abilities of LLMs to assist researchers in interpreting existing papers and generating executable code that facilitates experimental reproduction. In this paper, we present LaseR, a semi-automated platform for LSN experimentation. LaseR judiciously integrates the capabilities of LLMs with an LEO satellite network (LSN) simulator to automate LSN experimentation.
Specifically, LaseR incorporates two core components for LEO experiment reproduction: (i) an LSN Knowledge Retrieval module, which embeds a retrieval-augmented generation (RAG) pipeline that injects up-to-date, domain-specific knowledge into the LLM, substantially mitigating hallucination and improving factual correctness in generated LSN experiment code; and (ii) a Few-Shot Generator for LSN Simulation, which standardizes the format of generated code, enhancing reproducibility and maintainability. Further, we conduct three case studies to evaluate LaseR by reproducing experiments across three representative directions of recent LEO network research. Our case studies demonstrate that LaseR can reproduce experimental results consistent with the original papers, while reducing the reproduction effort from hundreds or even thousands of lines of code to just dozens of prompts and around a hundred lines of code. We have released the code of LaseR². Although still in its early stages, we hope LaseR will pave the way for future research in network experiment reproducibility.
Related Work
Before presenting the detailed design of LaseR, we review related work on satellite network simulation and large language models.
LEO network simulation/emulation. Typically, conducting experiments directly on real-world LEO satellite networks is challenging. For example, conducting topology-change or routing experiments in a real LSN (e.g., Starlink) is generally infeasible due to limited access and system constraints. Therefore, LSN simulation and emulation have become the predominant approaches for validating research in the area of LEO networking. Various tools and frameworks, such as Hypatia [25], StarPerf [28], StarryNet [27], GaliLEO [15], 𝑥eoverse [24], and other simulators [24, 34, 36, 45, 49], have been developed by the network community for LSN experimentation. However, most of these existing tools require considerable domain-specific knowledge to use effectively. Researchers must be familiar with both satellite systems (e.g., constellation design, orbital parameters) and networking protocols, making the learning curve steep. Consequently, even with these simulation and emulation tools, faithfully reproducing results from existing studies remains a time-consuming and labor-intensive task.
ML/AI for network automation. With the rapid development of machine learning (ML) and artificial intelligence (AI) technologies, the network community has begun exploring how ML/AI techniques can enhance network automation [21, 22, 32, 35]. Most existing efforts, however, have primarily focused on automating network configuration and adaptive parameter tuning. For example, Yen et al. [48] trained a graph neural network to uncover protocol bugs, while Chen et al. [9] built a reinforcement-learning agent that tunes SDN policies in near real time. More recently, NetLLM [46] fine-tunes a GPT-style backbone on multi-modal traces to predict viewport quality and schedule video chunks, demonstrating the feasibility of domain-adapted LLMs for networking tasks.
LLM-assisted experiment reproduction. The emergence of large language models (LLMs) has significantly transformed the way natural language is understood, processed, and generated. Recently, some efforts have begun to explore how general LLMs can be leveraged to improve the efficiency of experiment reproduction [18, 26, 44, 47, 50]. These approaches typically rely on prompt engineering to interact with LLMs iteratively, guiding them to generate executable experimental code. However, prompt-based workflows present two key limitations when applied to large-scale LSN experiments. First, traditional approaches typically rely on fragmented prompt engineering, where researchers manually compose code piece by piece through a series of minimal and narrowly scoped prompts. This piecemeal process is highly inefficient and requires substantial time and manual effort. Second, even when executable code is successfully generated, the simulation results often diverge significantly from those shown in the original paper. Considerable effort is still required to optimize and debug the generated code in order to reproduce the expected outcomes accurately.
The Design of LASER
In this section, we introduce the design details of LaseR, a novel experimentation tool that integrates: (i) an LLM augmented with satellite network knowledge, and (ii) LSN simulation tools, to facilitate more efficient, semi-automated LSN experimentation.
3.1 System Overview
Basic idea and technical challenges. The basic idea behind LaseR is straightforward: given a paper to reproduce, a researcher first leverages an LLM to read and interpret the content, then uses the LLM to generate corresponding experimental code. This code is subsequently refined and debugged manually to reproduce the expected results. However, LaseR faces two key technical challenges. First, the functions and code snippets generated by general LLMs are primarily based on patterns learned during pretraining. As a result, they are prone to hallucinations, producing incorrect or misleading code that appears syntactically valid but is semantically flawed or non-executable. For instance, general LLMs like ChatGPT [5], DeepSeek [3], DeepMind [4], Qwen [6], and Claude 3 [2] often generate function calls that do not exist (and are thus unavailable). Second, even after extensive prompt engineering and multiple interaction rounds, the generated code may still yield results that differ significantly from those reported in the original paper. This limited reproducibility remains a major obstacle in automating LSN experiment replication.
LaseR architecture. Figure 1 plots the high-level architecture of LaseR. In particular, LaseR incorporates the following core components to mitigate the above limitations in LSN experimentation.
• LSN knowledge retrieval module (§3.2). To address the hallucination issues commonly observed in general-purpose LLMs, e.g., generating results that are irrelevant or inconsistent with user input due to the lack of sufficient domain-specific knowledge about LSNs during pre-training, LaseR adopts a retrieval-augmented generation (RAG) [10, 16, 19, 23, 31, 37, 38] approach. By dynamically integrating external knowledge resources specifically related to LSN simulation experiments, LaseR improves the LLM’s ability to produce more accurate and relevant responses.
• LSN simulation few-shot generator (§3.3). To bridge the gap between the LLM and existing LSN simulators, LaseR incorporates a few-shot generator that enables the LLM to effectively learn the LSN simulator’s API specifications. This approach ensures that the LSN simulation code generated by the LLM strictly follows the simulator’s API format, so the generated experimental code can be executed directly on existing LSN simulators.
Runtime workflow. As shown in Figure 1, LaseR eases the burden of code reproduction, and a user can run LSN experiments with LaseR as follows. First, the user writes the requirements of the LSN experiment in natural language. The knowledge retrieval module queries an LSN knowledge base to supplement the LLM with domain-specific information, thereby enhancing the relevance of the generated output to the experiment requirement. Then, the few-shot generator produces a code template tailored to the specific LSN simulator, strictly following its programming interface. Leveraging the LLM and the code template, LaseR generates the simulation code and performs checks to ensure syntactic correctness and compliance with the simulator’s API specifications. Finally, the user imports the generated code into the LSN simulation tool to run the experiment and obtain the results. If the results show large discrepancies, prompt refinement is required. In our current implementation, we integrate LaseR with the open-source LSN simulator StarPerf [28].
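The runtime workflow above can be sketched as a simple pipeline. All function names below (`retrieve_lsn_knowledge`, `select_template`, and so on) are illustrative placeholders we introduce here, not LaseR's actual API; the real system backs each stage with the RAG module, the few-shot generator, and an LLM call.

```python
from dataclasses import dataclass

@dataclass
class GenerationResult:
    code: str
    passed_checks: bool

def retrieve_lsn_knowledge(requirement: str) -> str:
    # Placeholder: query the LSN knowledge base (RAG) for domain context.
    return f"[domain knowledge relevant to: {requirement}]"

def select_template(requirement: str) -> str:
    # Placeholder: pick a simulator-compatible code template.
    return "def run_experiment():\n    # {BODY}\n    pass\n"

def generate_code(requirement: str, knowledge: str, template: str) -> str:
    # Placeholder for the LLM call that fills the template using the
    # retrieved knowledge; here we just stamp the requirement in.
    return template.replace("# {BODY}", f"# experiment: {requirement}")

def check_code(code: str) -> bool:
    # Minimal syntactic check; LaseR additionally checks API compliance.
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def reproduce(requirement: str) -> GenerationResult:
    knowledge = retrieve_lsn_knowledge(requirement)
    template = select_template(requirement)
    code = generate_code(requirement, knowledge, template)
    return GenerationResult(code=code, passed_checks=check_code(code))

result = reproduce("measure end-to-end latency on a Starlink-like shell")
print(result.passed_checks)  # True
```

If the executed results diverge from the original paper, the user re-enters the loop with a refined prompt, which is what makes the tool semi-automated rather than fully automated.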
3.2 LSN Knowledge Retrieval Module
Typically, LLMs are trained with general knowledge to support a broad range of use cases. However, they often lack the domain-specific knowledge required for specialized tasks such as generating code for LSN experimentation. Retrieval-augmented generation (RAG) offers an effective solution by enriching the LLM’s input with relevant external knowledge sources. This approach enables the model to access up-to-date and task-specific information without the need for retraining, thereby significantly enhancing the quality and relevance of its generated outputs. In particular, Figure 2 plots the design details of LaseR’s knowledge retrieval module, which performs the following steps to improve the code quality of LLM outputs for LSN experiments.
LSN knowledge pre-processing. Given that the majority of current research on LSN is published in the form of academic papers available on the internet, this module needs to extract and store the content of the collected documents while removing irrelevant or redundant information. Here, LaseR employs OCR technology [20] to extract the text and then assess the relevance of the paper to LSN to determine whether further processing is needed.
Embeddings map. This step converts text into embeddings, which can then be stored and used for similarity comparison based on cosine similarity [40] or Euclidean distance [8]. LaseR uses a unified model to ensure mapping consistency, whether for storing relevant textual information or for handling user interactions in natural language. The choice of this component is flexible, as it essentially serves to convert diverse text into embedding vectors.

Coarse-grained LSN knowledge retrieval. LaseR adopts a coarse-grained vector database to store the title of each paper along with its corresponding vector (mapped using the embeddings map). In particular, the top-𝑚 most similar results are quickly retrieved. This initial step mainly focuses on narrowing the search scope, reducing the computational complexity of the next stage of retrieval.
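The two similarity measures used by the embeddings map can be sketched in a few lines of pure Python. The 3-dimensional vectors below are toy stand-ins; a real deployment would use a learned embeddings model producing high-dimensional vectors.

```python
import math

def cosine_similarity(a, b):
    # Higher is more similar; 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Lower is more similar; 0.0 means identical vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy "embeddings" of a query and two candidate documents.
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 1.1]   # semantically similar document
doc_far = [0.0, 1.0, 0.0]     # unrelated document

print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))  # True
```

Because both stores are built with the same unified model, a distance computed between a stored vector and a query vector is meaningful regardless of which text produced it.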
Fine-grained LSN knowledge retrieval. In addition, LaseR leverages a fine-grained vector database to store each paragraph or sentence within a paper along with its corresponding semantic vector. Further retrieval is then performed on the 𝑚 papers obtained in the previous stage, where the 𝑛 most relevant paragraphs or sentences are automatically selected for each paper. At this point, for the original query, 𝑚 × 𝑛 paragraphs and sentences have been identified as background knowledge in the LSN domain.
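The coarse-then-fine retrieval can be sketched as follows. The keyword-overlap `score` is a toy stand-in for the embedding similarity described above, and the paper snippets are invented for illustration; only the two-stage top-𝑚/top-𝑛 structure mirrors the design.

```python
def score(query: str, text: str) -> int:
    # Toy relevance score: count of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query, papers, m=2, n=2):
    # Stage 1 (coarse): rank papers by title similarity, keep top-m.
    top_papers = sorted(papers, key=lambda p: score(query, p["title"]),
                        reverse=True)[:m]
    # Stage 2 (fine): within each kept paper, keep its top-n paragraphs.
    snippets = []
    for paper in top_papers:
        ranked = sorted(paper["paragraphs"],
                        key=lambda para: score(query, para), reverse=True)
        snippets.extend(ranked[:n])
    return snippets  # up to m * n background snippets

papers = [
    {"title": "handover strategies in LEO constellations",
     "paragraphs": ["handover latency grows with shell density",
                    "we model inter-satellite links as a graph"]},
    {"title": "terrestrial congestion control",
     "paragraphs": ["TCP fairness on wired backbones"]},
]
out = retrieve("LEO handover latency", papers, m=1, n=2)
print(len(out))  # 2, i.e. at most m * n snippets
```

The coarse stage only has to compare against one vector per paper, which is what keeps the fine-grained stage tractable at scale.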
Synthesizer. After the preceding retrieval process, the original query and the 𝑚 × 𝑛 paragraphs or sentences are used as input. Although sufficient background knowledge is now available, the sheer volume of content, lack of logical structure, and potentially conflicting viewpoints can lead to numerous errors when generating code. Therefore, the input for the next module should be precise and of moderate length. The main function of the synthesizer is to take the original query and background knowledge as input and produce a refined query that is supplemented with knowledge, ensuring that it is conflict-free and of appropriate length. The synthesizer finally utilizes a pre-trained model to generate summaries and outputs the experiment demand enriched with LSN knowledge.
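A minimal sketch of the synthesizer's contract: fold the original query and the 𝑚 × 𝑛 retrieved snippets into one refined, bounded-length query. The real synthesizer summarizes with a pre-trained model and resolves conflicting viewpoints; this sketch only deduplicates and enforces a length budget, and all example snippets are invented.

```python
def synthesize(query, snippets, max_chars=400):
    seen, kept = set(), []
    for snippet in snippets:
        key = snippet.strip().lower()
        if key in seen:            # drop verbatim-duplicate viewpoints
            continue
        seen.add(key)
        kept.append(snippet.strip())
    background = " ".join(kept)[:max_chars]   # keep the refined query moderate
    return f"Task: {query}\nBackground: {background}"

refined = synthesize(
    "reproduce the end-to-end latency experiment",
    ["LEO shells orbit at roughly 550 km",
     "LEO shells orbit at roughly 550 km",   # duplicate, dropped
     "inter-satellite links commonly assume a +Grid layout"],
)
print(refined.count("550 km"))  # 1
```

The output of this stage is the single prompt handed to the next module, so bounding its length also bounds the LLM context consumed per generation.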
3.3 LSN Simulator Few-Shot Generator
We combine LaseR with StarPerf [28], a recent LSN simulator, and thus leverage LaseR to generate executable code that can run on StarPerf to complete LSN experiments. Specifically, to effectively integrate the LLM with the LSN simulator, LaseR adopts a unified abstraction layer to translate high-level intents into simulator-specific code, along with prompt-engineered LLM support using the simulator’s examples and documentation. Further, LaseR uses structured prompts and context tracking to manage user intents. Finally, LaseR includes a human-in-the-loop process to facilitate oversight and explainability. This is why we call the current LaseR a “semi-automated” experimental tool.
Template and few-shot learning. LSN simulators have their own APIs, configuration formats, and scripting paradigms. Automatically generated code must be syntactically correct, semantically valid, and compatible with the simulation environment. To achieve this goal, LaseR adopts two key techniques. First, it builds on the StarPerf API to generate a set of reusable “templates” for LSN experimentation. Similar to code templates in traditional C++ or Android development, these templates encapsulate the core logic required to conduct simulations (e.g., configuring satellite constellation parameters), while still providing flexible sections for user customization. Second, LaseR leverages few-shot learning techniques [14, 41, 43] from the machine learning community, enabling the LLM to learn from these templates and correctly fill them following the API usage of the simulator. This approach can ensure that the generated code runs correctly within the simulation environment.
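The template-plus-few-shot idea can be sketched as below. The template slots and the `build_constellation`/`Simulator` calls are hypothetical illustrations, not StarPerf's real API; the point is that the LLM sees worked (request, filled-template) pairs before the new request, so its output stays within the template's structure.

```python
from string import Template

# A reusable experiment "template": fixed simulation logic with
# named slots left open for user customization.
TEMPLATE = Template("""\
constellation = build_constellation(shells=$shells, sats_per_orbit=$sats)
sim = Simulator(constellation)
sim.run(duration_s=$duration)
""")

# One worked demonstration (few-shot example): a request paired with
# the correctly filled template.
FEW_SHOT_EXAMPLES = [
    ("simulate a 72-orbit shell for 60 s",
     TEMPLATE.substitute(shells=72, sats=22, duration=60)),
]

def build_prompt(requirement: str) -> str:
    # Few-shot prompt layout: instruction, demonstrations, new request.
    parts = ["Fill the simulator template for each request."]
    for req, code in FEW_SHOT_EXAMPLES:
        parts.append(f"Request: {req}\nCode:\n{code}")
    parts.append(f"Request: {requirement}\nCode:")
    return "\n\n".join(parts)

prompt = build_prompt("simulate a Starlink-like shell for 300 s")
print(prompt.rstrip().endswith("Code:"))  # True
```

Because every demonstration is a valid instance of the template, the model's completion is strongly biased toward code that already matches the simulator's API surface, which is exactly the compatibility property the generator needs.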
Example. Figure 3 plots a concrete example illustrating how a user could leverage LaseR’s template to generate experiment code. A user first provides a description of the experimental requirements. LaseR then selects an appropriate code template and inputs it into the LLM. Finally, the LLM automatically fills in the simulation template and generates executable simulation code.