AutoIOT: LLM-Driven Automated Natural Language Programming for AIoT Applications

1The Hong Kong Polytechnic University, 2University of Cambridge, 3Hong Kong University of Science and Technology
The architecture of AutoIOT

Abstract

The advent of Large Language Models (LLMs) has profoundly transformed our lives, revolutionizing interactions with AI and lowering the barrier to AI usage. While LLMs are primarily designed for natural language interaction, their extensive embedded knowledge empowers them to comprehend digital sensor data. This capability enables LLMs to engage with the physical world through IoT sensors and actuators, performing a myriad of AIoT tasks. Consequently, this evolution triggers a paradigm shift in conventional AIoT application development, democratizing its accessibility to all by facilitating the design and development of AIoT applications via natural language. However, some limitations need to be addressed to unlock the full potential of LLMs in AIoT application development. First, existing solutions often require transferring raw sensor data to LLM servers, which raises privacy concerns, incurs high query fees, and is limited by token size. Moreover, the reasoning processes of LLMs are opaque to users, making it difficult to verify the robustness and correctness of inference results. This paper introduces AutoIOT, an LLM-based automated program generator for AIoT applications. AutoIOT enables users to specify their requirements using natural language (input) and automatically synthesizes interpretable programs with documentation (output). AutoIOT automates the iterative optimization to enhance the quality of generated code with minimal user involvement. AutoIOT not only makes the execution of AIoT tasks more explainable but also mitigates privacy concerns and reduces token costs by executing the synthesized programs locally. Extensive experiments and user studies demonstrate AutoIOT's remarkable capability in program synthesis for various AIoT tasks. The synthesized programs can match and even outperform some representative baselines.

Introduction

Recent advances in large language models (LLMs), e.g., GPT-4, have fundamentally changed the way we interact with AI. Although LLMs were initially designed to understand natural language, recent pioneering works have demonstrated their considerable proficiency in exploiting embedded world knowledge to interpret IoT sensor data and perform various AIoT tasks, an endeavor recent works term Penetrative AI. Fig. 1 illustrates how LLMs can be tasked to comprehend and even interact with the physical world through integration with IoT sensors and actuators.

Figure 1: Illustration of how LLMs can sense and interact with the physical world in AIoT applications.

However, current LLMs fall short in supporting AIoT applications in several respects:

  • The trustworthiness of the inference results is compromised because the inference process happens inside a "black box" that is opaque to users, making the robustness of the applications and the correctness of the inference results hard to verify.
  • Transmitting the raw or intermediate sensor data from user devices to LLM servers raises privacy concerns, incurs prohibitive query fees, and increases response latency.
  • Sensor data typically exhibits extensive length and high dimensionality, making remote processing at LLM servers infeasible due to token limits.
Ideally, the integration of LLMs with AIoT applications should be trustworthy, privacy-preserving, and communication-efficient.

Can we leverage LLMs to synthesize programs to automatically fulfill AIoT application requirements?

This approach can

  • Enhance the explainability and trustworthiness of the AIoT applications as the synthesized programs can be examined and interpreted by developers.
  • Mitigate privacy concerns and reduce communication costs, since the programs can be executed locally on user devices without offloading raw sensor data.
  • Efficiently process high-dimensional continuous sensor data without being limited by the token size or bounded by the round trip time over the network.
To this end, we propose AutoIOT, a user-friendly natural language programming system based on LLMs. AutoIOT automatically identifies and retrieves the necessary domain knowledge over the Internet, intelligently synthesizes programs, and evolves the programs iteratively given sample inputs and ground truth. Surprisingly, we found that the synthesized programs can sometimes outperform some representative baselines and sample programs of recent academic papers.

Motivation

The latest LLMs have demonstrated extraordinary proficiency in generating code snippets. For example, Mercury leverages LLMs to generate code for well-defined programming tasks. Is it feasible to instruct LLMs to synthesize programs that can tackle AIoT tasks? Our preliminary results show that it is possible yet extremely challenging for LLMs to synthesize functionally correct programs for AIoT tasks. Take heartbeat detection as an example, as depicted in Fig. 2: when we instruct the LLM to generate a program that processes raw ECG waveforms and detects heartbeats, the LLM only generates a few empty functions without concrete implementations, or imports nonexistent packages.

Figure 2: An example of direct code generation.
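For illustration, the failure mode in Fig. 2 looks roughly like the snippet below: the generated "program" consists of function stubs with no concrete logic. This is a hypothetical reconstruction for exposition, not verbatim LLM output; in other attempts the model also imported packages that do not exist.

# Hypothetical reconstruction of a directly generated "program":
# the functions are declared but left empty, so the script runs yet detects nothing.

def preprocess_ecg(raw_signal, fs):
    """Placeholder: filtering is promised here but no implementation is generated."""
    pass

def detect_r_peaks(filtered_signal, fs):
    """Placeholder: no actual peak-detection logic is generated."""
    pass

def count_heartbeats(ecg_waveform, fs=360):
    filtered = preprocess_ecg(ecg_waveform, fs)
    peaks = detect_r_peaks(filtered, fs)
    return len(peaks) if peaks is not None else 0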

To fill this gap, we develop an LLM-driven automated natural language programming system named AutoIOT, which comprises three key modules (a sketch of how they chain together follows this list):

  • Background knowledge retrieval module that automatically collects domain knowledge from the Internet for in-context learning.
  • Automated program synthesis module that emulates the program development lifecycle via CoT prompting. This module decomposes an AIoT task into several subtasks and generates corresponding functional code snippets.
  • Code improvement module that executes the synthesized program and feeds the compiler and interpreter feedback to the LLM, facilitating iterative code correction and improvement.
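For concreteness, the sketch below shows how these three modules could be chained together in Python. All class and helper names (AutoIoTPipeline, llm, web_search, sandbox) are illustrative assumptions rather than the authors' actual implementation.

# A minimal sketch of how AutoIOT's three modules could be chained together.
# The helpers `llm`, `web_search`, and `sandbox` are assumed interfaces, not real APIs.

class AutoIoTPipeline:
    def __init__(self, llm, web_search, sandbox):
        self.llm = llm                # callable: prompt string -> response string (assumed)
        self.web_search = web_search  # callable: query string -> text summary (assumed)
        self.sandbox = sandbox        # callable: code string -> result with .ok / .stderr (assumed)

    def retrieve_background(self, requirement: str) -> str:
        """Module 1: identify terminologies and collect domain knowledge for in-context learning."""
        terms = self.llm(f"List the domain terminologies needed for this task:\n{requirement}")
        return "\n".join(self.web_search(t) for t in terms.splitlines() if t.strip())

    def synthesize(self, requirement: str, knowledge: str) -> str:
        """Module 2: CoT-style decomposition into subtasks, then per-subtask code generation."""
        outline = self.llm(f"Draft an algorithm outline.\nTask: {requirement}\nKnowledge: {knowledge}")
        design = self.llm(f"Elaborate each step into a detailed design:\n{outline}")
        snippets = self.llm(f"Generate a Python code segment for each subtask:\n{design}")
        return self.llm(f"Integrate the code segments into one runnable program:\n{snippets}")

    def improve(self, program: str, rounds: int = 3) -> str:
        """Module 3: execute the program and feed compiler/interpreter output back to the LLM."""
        for _ in range(rounds):
            feedback = self.sandbox(program)
            if feedback.ok:
                break
            program = self.llm(f"Fix this program.\nCode:\n{program}\nError:\n{feedback.stderr}")
        return program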

System Overview

Fig. 3 illustrates the architecture of AutoIOT, consisting of three main modules: background knowledge retrieval, automated program synthesis, and code improvement.

Users can specify their requirements on AIoT applications in natural language (①). Then, the background knowledge retrieval module identifies a set of relevant terminologies (②) and searches over the Internet (③). With the retrieved domain-specific knowledge, the automated program synthesis module instructs the agent to draft an algorithm outline (④). The agent is then requested to elaborate on each step of the algorithm and produce a detailed design (⑤). Such a process decomposes a complex AIoT task into several manageable subtasks. Then, the agent is instructed to generate a code segment for each subtask (⑥). Afterward, the agent is requested to integrate the code segments for the subtasks and synthesize the final program (⑦). Next, the code improvement module executes the synthesized program and feeds the compiler and interpreter output back to the agent. The agent iteratively corrects syntax and semantic errors (⑧). With the output obtained from the synthesized program, AutoIOT explicitly instructs the agent to explore more advanced algorithms using the web search tool, aiming to optimize the program and improve the performance of inference tasks (⑨). After several iterations (④-⑨), AutoIOT presents the final program with detailed documentation (⑩). In addition, AutoIOT provides an interface for users to offer specific algorithms or instructions for code improvement. A sketch of the improvement loop (⑧-⑨) is given after Fig. 3.

Figure 3: System overview.
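As a concrete illustration, the sketch below captures the improvement loop (⑧-⑨) under the same assumptions as the pipeline sketch above: a hypothetical llm client, a web_search helper, a sandbox that executes code, and a task-specific metric. None of these names come from the actual implementation.

# A minimal sketch of the code-improvement loop (steps 8-9): run the program on
# sample inputs, fix errors from compiler/interpreter feedback, and ask the LLM to
# explore better algorithms found via web search. All helpers are assumed interfaces.

def improve_with_feedback(llm, web_search, sandbox, metric, program, samples, ground_truth, rounds=3):
    best_program, best_score = program, float("-inf")
    for _ in range(rounds):
        result = sandbox(program, inputs=samples)       # step 8: execute the synthesized program
        if not result.ok:                               # interpreter errors go back to the agent
            program = llm(f"Fix the following error.\nCode:\n{program}\nError:\n{result.stderr}")
            continue
        score = metric(result.output, ground_truth)     # evaluate against sample ground truth
        if score > best_score:
            best_program, best_score = program, score
        hints = web_search("more advanced algorithms for this task")  # step 9: seek better algorithms
        program = llm(
            f"The program scored {score:.3f} on the sample data. "
            f"Using these references, improve the algorithm:\n{hints}\nCode:\n{program}"
        )
    return best_program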

Evaluation

We evaluate the overall performance of AutoIOT on four representative AIoT applications.

Figure 4: The overall performance of the four IoT applications. In (a), A. stands for AutoIOT, H. for Hamilton, C. for Christov, E. for Engzee, P. for Pan-Tompkins, and S. for SWT. In (b), N. for NN, 1D for 1D-CNN, Bi. for BiLSTM, C. for Conv-LSTM, L. for LSTM-RNN, and 𝒏 for ResNet-𝒏. In (c) & (d), A.1, A.2, and A.3 denote three different AutoIOT-generated programs; B. denotes the baseline in the multimodal HAR application.

1. Heartbeat Detection Demo

Fig. 4(a) shows the heartbeat detection accuracy and MAE of AutoIOT (denoted as A.) and the baselines. First of all, AutoIOT can automatically synthesize a program that achieves performance comparable with the baselines in the heartbeat detection task. More surprisingly, the automatically synthesized program can even beat some of the baselines! For example, the synthesized program achieves higher detection accuracy than Pan-Tompkins (P.) and Engzee (E.). Moreover, it yields a lower error rate than Christov (C.) and Pan-Tompkins (P.). To investigate the reasons behind this, we examine and analyze the synthesized program.
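For readers unfamiliar with classical heartbeat detection, the sketch below illustrates the kind of filter-and-peak-pick pipeline such a synthesized program typically contains. It is an illustrative reconstruction in Python, not the program AutoIOT actually generated; the filter band, window length, and threshold are assumed values.

# A minimal sketch of a classical R-peak detection pipeline: band-pass filtering,
# energy envelope computation, and peak picking. Illustrative only.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_heartbeats(ecg, fs=360):
    # Band-pass filter around the QRS energy band (roughly 5-15 Hz).
    b, a = butter(3, [5, 15], btype="band", fs=fs)
    filtered = filtfilt(b, a, ecg)
    # Emphasize the QRS complex by differentiating and squaring, then smooth.
    squared = np.square(np.diff(filtered, prepend=filtered[0]))
    window = int(0.15 * fs)
    envelope = np.convolve(squared, np.ones(window) / window, mode="same")
    # Pick peaks at least 200 ms apart and above a simple threshold.
    peaks, _ = find_peaks(envelope, distance=int(0.2 * fs), height=0.3 * envelope.max())
    return peaks  # sample indices of detected heartbeats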

2. IMU-based & mmWave-based Human Activity Recognition (HAR) Demo 1 Demo 2

Fig. 4(b) shows the classification accuracy in the two HAR tasks. We observe that AutoIOT outperforms NN and 1D-CNN while underperforming BiLSTM, Conv-LSTM, and LSTM-RNN. The main reasons are twofold: (1) HAR tasks require both signal processing and machine learning algorithms, which increases the programming complexity; (2) training neural networks requires fine-tuning a vast array of hyper-parameters (e.g., network architecture configurations, epoch number, learning rate, optimizer, and loss function), which significantly amplifies the instability of the generated code and calls for careful fine-tuning to achieve the best performance in practice. As a result, AutoIOT surpasses the baselines adopting simple model architectures (NN and one-dimensional CNN) but falls short against the baselines using sophisticated architectures (BiLSTM and Conv-LSTM) with highly optimized hyper-parameters. Although some synthesized programs define a set of configurations during code improvement and adopt a search strategy to obtain optimal hyper-parameters, their performance still remains slightly lower than some baselines, because determining the optimal configuration of a machine learning model is typically a trial-and-error process that requires substantial human effort. Fortunately, we observe that if the user provides a potential search space in advance, the LLM can design a search algorithm to try different hyper-parameter configurations and select the one with the best performance, as sketched below.
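For concreteness, a minimal sketch of such a user-guided search is shown below. The search space and the train_and_evaluate callback are illustrative assumptions, not part of the actual synthesized programs.

# A minimal grid search over a user-provided hyper-parameter search space:
# try each configuration and keep the best-performing one.
from itertools import product

def grid_search(train_and_evaluate, search_space):
    """train_and_evaluate(config) -> validation accuracy for one configuration (assumed callback)."""
    keys = list(search_space)
    best_config, best_acc = None, float("-inf")
    for values in product(*(search_space[k] for k in keys)):
        config = dict(zip(keys, values))
        acc = train_and_evaluate(config)
        if acc > best_acc:
            best_config, best_acc = config, acc
    return best_config, best_acc

# Example search space a user might provide in advance (illustrative values):
space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "epochs": [20, 50],
    "hidden_units": [64, 128, 256],
}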

3. Multimodal HAR Demo

For multimodal HAR, the input instruction (A1) includes the basic information of the task, i.e., the task target, the dataset specifications, and the output format. Based on that, we create two additional variations: one with a GPU-memory-constrained requirement (A2) and another with a high-accuracy requirement (A3). We then feed the instructions into AutoIOT and measure the accuracy and inference time of the synthesized programs, with results shown in Fig. 4(c). By analyzing the three different synthesized programs, we observe that: (1) all the synthesized programs adopt a similar workflow to the baseline system, i.e., they first construct three encoders to extract effective features from the three modalities, then concatenate these features and feed them into a classifier for activity recognition. This implies that, benefiting from our CoT-based problem-solving paradigm, AutoIOT recognizes this workflow and architecture as effective and standard for handling multimodal data-related tasks, consistent with most existing methods. (2) AutoIOT can adjust the generated code to fulfill different requirements. The second program consumes less memory than the others due to the resource-constraint requirement, resulting in lower accuracy but reduced inference time (Fig. 4(d)). In contrast, the third program adopts a more complex and larger model architecture, requiring more GPU memory and incurring a longer inference time. Such differences validate the capability of AutoIOT to accurately understand and process natural-language user requirements. These observations further demonstrate the effectiveness of AutoIOT in ensuring the correctness of user requirement understanding and of the generated code, benefiting from our automatic self-improvement component.
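For reference, the snippet below is a minimal PyTorch sketch of the shared workflow the three programs converge on: one encoder per modality, feature concatenation, and a classifier head. The input dimensions and layer sizes are illustrative assumptions, not the exact synthesized architectures.

# A minimal multimodal HAR model: per-modality encoders, concatenation, classifier.
import torch
import torch.nn as nn

class MultimodalHAR(nn.Module):
    def __init__(self, dims=(128, 256, 64), feat_dim=64, num_classes=10):
        super().__init__()
        # One lightweight MLP encoder per modality (dimensions are illustrative).
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, feat_dim), nn.ReLU()) for d in dims]
        )
        self.classifier = nn.Linear(feat_dim * len(dims), num_classes)

    def forward(self, modalities):
        # `modalities` is a list of per-modality feature tensors, one per encoder.
        feats = [enc(x) for enc, x in zip(self.encoders, modalities)]
        return self.classifier(torch.cat(feats, dim=-1))

# Usage with dummy batches of size 8:
model = MultimodalHAR()
logits = model([torch.randn(8, 128), torch.randn(8, 256), torch.randn(8, 64)])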

4. User Study

To investigate the utility of AutoIOT, we conduct a user study (N=20) with 5 expert and 15 non-expert users. The expert users are PhD students and professors with work or research experience in the IoT field who have developed many IoT applications. We select human activity recognition using RFID data (the XRF55 dataset [73]) as the IoT application, with a 1D Conv-based ResNet18 as the baseline.

Figure 5: User study (Objective Evaluation).
  • Objective Evaluation We first repeatedly execute the synthesized programs and measure the average task accuracy (classification accuracy), with the results shown in Fig. 5(a).
  • Subjective Evaluation We ask the users to execute the synthesized programs and rate AutoIOT based on four subjective metrics:
    • System Utility (SU) measures the user's overall satisfaction with AutoIOT's performance.
    • Requirement Coverage (RC) evaluates how well the user requirements are fulfilled by AutoIOT.
    • Code & Documentation Readability (CDR) measures the clarity and structure of the code and documentation.
    • Generation Efficiency (GE) assesses how acceptable the waiting time for synthesizing the final program is.
    All the above metrics are rated by the users on a scale from 1 (not at all) to 6 (more than expected). The results are shown in Fig. 6.
Figure 6: User study (Subjective Evaluation).

Conclusion

We propose AutoIOT, an LLM-driven automated natural language programming system for AIoT applications. Our system features three novel technical modules: background knowledge retrieval, automated program synthesis, and code improvement, transforming natural language descriptions into executable programs. Our experiments demonstrate the competitive performance of AutoIOT in synthesizing programs for a variety of AIoT applications, achieving comparable performance on challenging AIoT tasks and sometimes outperforming representative baselines. This showcases the strong potential of exploiting the embedded common knowledge of LLMs to evolve AIoT application development.

For more details, please refer to our Paper.

The code for implementing AutoIOT is available here.

BibTeX

@inproceedings{shen2025autoiot,
  title={AutoIOT: LLM-Driven Automated Natural Language Programming for AIoT Applications},
  author={Shen, Leming and Yang, Qiang and Zheng, Yuanqing and Li, Mo},
  booktitle={Proceedings of the 31st Annual International Conference on Mobile Computing and Networking},
  pages={1--15},
  year={2025}
}