ai-alignment · GitHub Topics

emcie-co / parlant

#Large Language Models#LLM agents built for control. Designed for real-world use. Deployed in minutes.

ai-agents genai llm customer-service customer-success gemini llama3 openai Python ai-alignment

Python 3.41k

2 days ago

agencyenterprise / PromptInject

#Computer Science#PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...

ai-safety language-models ml-safety agi ai-alignment adversarial-attacks gpt-3 large-language-models machine-learning chain-of-thought prompt-engineering

Python 399

1 year ago

MinghuiChen43 / awesome-trustworthy-deep-learning

#Computer Science#A curated list of trustworthy deep learning papers. Updated daily...

adversarial-machine-learning security-privacy-deep-learning poisoning fairness backdoor ownership robustness interpretable-deep-learning causality hallucinations uncertainty watermarking ai-alignment

371

8 days ago

Giskard-AI / awesome-ai-safety

#Natural Language Processing#📚 A curated list of papers & technical articles on AI Quality & Safety

artificial-intelligence ai-alignment ai-safety llm llmops machine-learning mlops natural-language-processing ml-testing model-validation computer-vision Awesome Lists ml-safety robustness

188

4 months ago

tomekkorbak / pretraining-with-human-feedback

Code accompanying the paper "Pretraining Language Models with Human Preferences"

ai-alignment ai-safety gpt language-models pretraining reinforcement-learning rlhf

Python 182

1 year ago

lets-make-safe-ai / make-safe-ai

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

agi artificial-intelligence ai-safety artificial-general-intelligence ai-alignment

169

2 years ago

tsinghua-fib-lab / AAAI2025_MIA-Tuner

[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".

ai-alignment large-language-models

Python 145

4 months ago

EzgiKorkmaz / adversarial-reinforcement-learning

Reading list for adversarial perspective and robustness in deep reinforcement learning.

robust-machine-learning deep-reinforcement-learning ai-safety ai-alignment responsible-ai ai-security llm-security

119

6 days ago

AthenaCore / AwesomeResponsibleAI

#Awesome#A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podcasts, reports, tools, regulations and standards related to Resp...

responsible-ai xai fairness-ai Awesome Lists explainable-ai interpretable-ai artificial-intelligence ai-alignment ai-safety

14 days ago

dit7ya / awesome-ai-alignment

#Awesome#A curated list of awesome resources for Artificial Intelligence Alignment research

Awesome Lists ai-safety ai-alignment

2 years ago

RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

rlhf ai-alignment large-language-models

10 months ago

wesg52 / sparse-probing-paper

Sparse probing paper full code.

ai-alignment ai-safety interpretability

Jupyter Notebook 58

2 years ago

lzzcd001 / nabla-gfn

Official Implementation of Nabla-GFlowNet (ICLR 2025)

ai-alignment diffusion-models generative-model finetuning

Python 25

3 months ago

riceissa / aiwatch

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

ai-safety PHP database dataset ai-alignment MySQL

HTML 23

7 days ago

UCSC-VLAA / Sight-Beyond-Text

#Large Language Models#[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

llama2 llava llm mllm vicuna vision-language ai-alignment alignment vlm

Python 20

2 years ago

liondw / Signal-Alignment

An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal ...

artificial-intelligence ai-alignment design education

2 years ago

phelps-sg / llm-cooperation

#Large Language Models#Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 202...

economics gpt-3 llm ai-safety ai-alignment gpt-4

Python 12

8 months ago