In this work, we introduce TRUSTLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges.
In this section, we discuss the limitations of our current work and envision several future directions to be explored in this field.
Limitations and future plans for LLMs.
In the forthcoming research, we see seven distinct directions for us and other researchers to further explore the trustworthiness of LLMs.
- Expansion of prompt templates. We aim to increase the diversity of prompt templates, introducing a more comprehensive range for any given task. This expansion seeks to mitigate errors and randomness arising from prompt sensitivity.
- Inclusion of diverse datasets. We will integrate a broader selection of existing datasets or construct new ones, ensuring a comprehensive representation of data from various sources and types.
- Enrichment of tasks and subtasks. We will expand the various tasks and subtasks within our current framework. Acknowledging that different tasks embody varied perspectives, which are crucial when evaluating LLM performance, we will assess their capabilities across multiple dimensions—mainly focusing on their proficiency in processing and interpreting information in various contexts.
- Integration of more LLMs. Given the rapid advancements in the field of LLMs, we plan to continually integrate the latest models into our work, keeping the benchmark up-to-date and relevant.
- Domain-specific trustworthiness evaluation. Moving beyond the general domain, we will also emphasize the importance of domain-specific contexts such as education [700, 701], healthcare [702, 647], finance [703, 704], cybersecurity [705, 706, 707], and other scientific areas. Our goal is to rigorously assess the trustworthiness of LLMs in specialized fields, exploring their reliability in sector-specific applications.
- Expansion of sections. TRUSTLLM is designed to evolve dynamically, adjusting to shifts in the field of LLMs. Ongoing explorations will lead to additional sections, refining the taxonomy to encompass areas like consciousness [604, 709], and beyond.
- Ecosystem & platform. We are actively working on establishing a trustworthy LLM ecosystem and platform based on TRUSTLLM. This includes expansion efforts, relevant software, and development tools. For instance, a real-time updated leaderboard is in progress to facilitate the ongoing evaluation of LLM trustworthiness, supported by toolkits and documentation.
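To make the prompt-template expansion above concrete, the sketch below phrases one task under several paraphrased templates and aggregates the answers by majority vote, reducing the influence of any single prompt phrasing. The templates and the `query_model` stub are hypothetical stand-ins for a real LLM call.

```python
from collections import Counter

# Hypothetical paraphrased templates for one task; a real benchmark
# would use many more, covering varied phrasings and formats.
TEMPLATES = [
    "Is the following statement true or false? {claim}",
    "Evaluate this claim and answer 'true' or 'false': {claim}",
    "{claim}\nAnswer with exactly one word, 'true' or 'false'.",
]

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM API call (hypothetical stub)."""
    return "true" if "sky is blue" in prompt.lower() else "false"

def majority_answer(claim: str) -> str:
    """Ask the same question under every template and majority-vote."""
    answers = [query_model(t.format(claim=claim)) for t in TEMPLATES]
    return Counter(answers).most_common(1)[0][0]
```

Aggregating across templates this way separates a model's actual capability from its sensitivity to one particular phrasing.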
Beyond LLMs: trustworthy large multimodal models and agents.
The remarkable achievements of LLMs in the natural language field have spurred a surge of research exploring similar models for other modalities, such as vision-and-language.
This has given rise to multimodal foundation models that can serve as general-purpose assistants, transferring zero-shot to perform well on a wide range of real-world tasks.
Though this paper focuses on the trustworthiness of LLMs, the ideas and learnings can be generalized to multimodal foundation models.
Furthermore, the potential for developing similar models extends into various Internet of Things (IoT) applications (e.g., smart homes, smart grids, and smart agriculture), time series, mobile computing [713, 714], and mobile edge networks.
The generalizability of TRUSTLLM to multimodal foundation models is promising, yet it necessitates dedicated efforts to tackle unique challenges inherent to each specific application scenario.
In this context, we discuss several future research directions for building trustworthy multimodal models, particularly those tailored to diverse and specialized environments.
- Modality gap and alignment. In addition to inheriting the trustworthiness issues of the language modality, large multimodal models (LMMs) introduce unique challenges because multiple modalities are involved. For example, one key component of existing LMMs typically requires cross-modality data/feature alignment – consider various scenarios in which machines can be instructed to represent basic concepts, such as dogs and cats, through visual and linguistic channels. Misalignment between modalities may lead to failure modes in which the LMM incorrectly identifies concepts.
- Data creation to follow human intents. Instruction tuning is a potent method for shaping how an AI assistant interacts with humans. For instance, when faced with identical offensive inquiries, the assistant may employ diverse strategies to build trust while completing the tasks. Within the multimodal domain, visual instruction tuning can be crucial in aligning models with various considerations, encompassing safety, ethics, and moderation. At the core of visual instruction tuning, a data-centric paradigm may create a pipeline to produce multimodal instruction-following data that facilitates effective alignment between user intents and model responses, fostering enhanced AI performance.
- Model capabilities, architectures and knowledge. Similar to LLMs, one notorious issue of LMMs is model hallucination, which results in less trustworthy systems. However, the causes of hallucination can be broader for LMMs. First, as users anticipate more advanced features from LMMs, they may request tasks the model is not fully equipped to handle. For instance, when users ask the proprietary GPT-4V or the open-source LLaVA to ground/associate image regions with descriptions in their responses, these models may attempt to provide answers but end up generating inaccurate or imaginary information. Second, since efficient model architectures for handling high-resolution images are yet to be fully explored, existing open-source LMMs down-sample user input images to 224 or 336 pixels per dimension. Such low-resolution images may result in hallucination, as the finer details of images are not adequately presented to the models. Third, a knowledge gap exists between general and specialized vertical domains in pre-trained models. For example, consider the multimodal healthcare assistant LLaVA-Med, whose pre-trained image encoder and language model originate from general domains. Consequently, LLaVA-Med's performance in the biomedical field may fall short of expectations compared with LLaVA's performance in the general domain.
- Evaluation of trustworthiness. While LMMs have shown excellent visual recognition and reasoning capabilities in an open-set manner with free-form text across many scenarios, there are also trustworthiness-related issues with LMMs [719, 720, 721, 722, 723, 724, 725, 726, 727, 728]. Several benchmarks have been developed to evaluate various aspects of LMMs, including hallucination [729, 730] and adversarial robustness. Extending the LLM benchmarking idea presented in this paper to the multimodal space is a natural next step.
- Tool usage in multimodal agents. To enhance model capabilities, a viable strategy involves utilizing existing functional APIs as external tools, invoking them as required. A standard method for employing these tools capitalizes on the in-context-learning capabilities of LLMs to create toolchains [732, 733]. Although this approach offers low development costs due to its training-free nature, it may prove inefficient in resolving tool conflicts and inactivation issues, especially with a large set of tools, ultimately leading to suboptimal agent performance. To address this, learning to use tools via instruction tuning is considered in LLaVA-Plus. Employing external tools also raises new trustworthiness concerns, such as identifying and rectifying errors in tool usage to prevent error propagation in multi-turn interactions, and implementing safeguards to avoid undesirable behaviors when third-party users onboard new tools.
- Trustworthiness trade-offs for IoT edge intelligence. While leveraging LMMs in various IoT domains offers significant potential for analyzing multifaceted IoT data, understanding context, and making informed decisions, IoT application scenarios pose additional challenges due to heterogeneous, resource-constrained devices and decentralized operating environments. Thus, machine learning systems must be redesigned or specifically optimized to address these IoT-centric demands (e.g., limited computational resources, real-time responses, and communication bottlenecks). These model optimizations are typically outsourced to or handled by third-party services, which unfortunately introduces new attack surfaces such as backdoor attacks. Furthermore, the issue of trustworthiness in IoT settings varies with the specific task at hand, necessitating tailored designs for LMMs. For example, irregular and unreliable data transmission via wireless networks often leads to incomplete datasets, adversely impacting the inferential accuracy and overall predictive capabilities of the system. Also, various wireless devices used for IoT applications such as human activity recognition (HAR) usually generate imbalanced wireless datasets across domains (e.g., different indoor environments) [735, 736]. Imbalanced data can greatly degrade HAR classification performance.
In applications like smart grids, it is crucial for models to withstand data noise and adapt to dynamic grid conditions, such as variable energy demands or the integration of renewable energy sources.
In public safety applications, the model must perform reliably and provide real-time responses to natural disasters.
Therefore, it is essential to extend the research on model trustworthiness to tackle the diverse and specific trustworthiness concerns present in IoT edge intelligence applications.
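The cross-modality alignment discussed above can be illustrated with a minimal retrieval sketch: image and text embeddings live in a shared space, and a caption is matched to an image by cosine similarity; misaligned embeddings would pick the wrong caption. The toy 3-d vectors below are hypothetical stand-ins for real encoder outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy joint-embedding space (hypothetical vectors): in a real LMM these
# come from trained image and text encoders projected to a shared space.
image_emb = {"dog": [0.9, 0.1, 0.0], "cat": [0.1, 0.9, 0.0]}
text_emb = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.2, 0.8, 0.1],
}

def best_caption(image: str) -> str:
    """Retrieve the caption whose embedding best aligns with the image."""
    return max(text_emb, key=lambda c: cosine(image_emb[image], text_emb[c]))
```

When alignment holds, each image retrieves its matching concept; a misalignment failure mode corresponds to the wrong caption scoring highest.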
Cryptographic techniques for enhancing LLM trustworthiness. Modern cryptographic techniques can provide a trusted computing platform for various tasks and are thus capable of enhancing security-critical applications. In particular, secure computation and zero-knowledge proof protocols allow one or more parties to jointly evaluate functions while revealing only controlled information.
These tools can potentially provide highly resilient solutions to address many of the principles mentioned in this paper (see [286, 285] as some recent examples).
However, significant challenges remain before cryptography-based solutions become practical.
- Achieving end-to-end trustworthiness of LLMs. Even the most advanced cryptographic tools, efficiency aside, cannot address all security issues that arise in LLMs, due to the inherent connection between LLM models and reality. For example, zero-knowledge proofs can ensure that an LLM is trained properly but cannot ensure the truthfulness of the training data or attest whether it is (un)biased. Therefore, achieving end-to-end trustworthiness of LLMs requires not only cryptographic tools but also rigorous definitions and solutions to model the human factors in the data and LLM pipeline.
- Close-to-practical efficiency. State-of-the-art cryptographic solutions that are powerful enough to support complex computations needed in LLMs are orders of magnitude slower than cleartext computation. Although the efficiency is still being improved, the strong security/privacy level of these protocols poses a limit on their ultimate efficiency. On the other hand, cryptographic tools may provide unnecessarily high guarantees in many applications when it comes to certain trustworthy dimensions, e.g., fairness. We believe that to achieve practically usable cryptography-based LLM systems, deep integration and co-design between the two areas are required, e.g., to identify the critical parts in the LLM architecture that require cryptographic protection or to align the security guarantees of cryptographic protocols to the requirements of LLM applications.
- Model and data federation in LLMs. The collaborative nature of cryptographic protocols provides a tool for the secure federation of LLMs and the data they need. This includes data-to-data collaborative training of LLM models, model-to-model collaborative text/object generation from multiple confidential models, and private model adaptation/fine-tuning where model owners and the holders of adaptation data do not trust each other.
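To make the secure-computation idea concrete, the sketch below shows additive secret sharing, a basic building block of many secure computation protocols: a value is split into random shares that individually reveal nothing, parties compute on shares locally, and only the final result is reconstructed. The modulus choice and two-party setting are illustrative simplifications.

```python
import random

Q = 2**61 - 1  # illustrative prime modulus for the share arithmetic

def share(x: int):
    """Split x into two additive shares; each share alone looks random."""
    r = random.randrange(Q)
    return r, (x - r) % Q

def reconstruct(s0: int, s1: int) -> int:
    """Combine both shares to recover the secret."""
    return (s0 + s1) % Q

def add_shares(a_shares, b_shares):
    """Each party adds its shares locally, yielding shares of the sum
    without either party ever seeing the other's input."""
    return tuple((a + b) % Q for a, b in zip(a_shares, b_shares))
```

Real protocols for LLM workloads must also handle multiplication, non-linear activations, and malicious adversaries, which is where the efficiency challenges discussed above arise.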