GitHub Chinese Community

©2025 GitHub Chinese Community

#ai-alignment

emcie-co / parlant

#large language models# Parlant is the open-source conversation modeling engine for building better, deliberate Agentic UX. It gives you the power of LLMs without the unpredictability.

ai-agents, genai, large language models, customer-service, customer-success, gemini, llama3, openai, Python, ai-alignment
Python · 3.13k stars
3 days ago
agencyenterprise / PromptInject

#computer science# PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...

ai-safety, language-models, ml-safety, agi, ai-alignment, adversarial-attacks, gpt-3, large-language-models, machine learning, chain-of-thought, prompt-engineering
Python · 379 stars
1 year ago
MinghuiChen43 / awesome-trustworthy-deep-learning

#computer science# A curated list of trustworthy deep learning papers. Daily updating...

adversarial-machine-learning, security, privacy, deep learning, poisoning, fairness, backdoor, ownership, robustness, interpretable-deep-learning, causality, hallucinations, uncertainty, watermarking, ai-alignment
369 stars
5 days ago
Giskard-AI / awesome-ai-safety

#natural language processing# 📚 A curated list of papers & technical articles on AI Quality & Safety

artificial intelligence, ai-alignment, ai-safety, large language models, llmops, machine learning, mlops, natural language processing, ml-testing, model-validation, computer vision, Awesome Lists, ml-safety, robustness
183 stars
2 months ago
tomekkorbak / pretraining-with-human-feedback

Code accompanying the paper "Pretraining Language Models with Human Preferences"

ai-alignment, ai-safety, gpt, language-models, pretraining, reinforcement-learning, rlhf
Python · 182 stars
1 year ago
lets-make-safe-ai / make-safe-ai

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

agi, artificial intelligence, ai-safety, artificial-general-intelligence, ai-alignment
169 stars
2 years ago
tsinghua-fib-lab / AAAI2025_MIA-Tuner

[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".

ai-alignment, large-language-models
Python · 142 stars
3 months ago
EzgiKorkmaz / adversarial-reinforcement-learning

Reading list for adversarial perspective and robustness in deep reinforcement learning.

robust-machine-learning, deep-reinforcement-learning, ai-safety, ai-alignment, responsible-ai, ai-security, llm-security
117 stars
2 months ago
AthenaCore / AwesomeResponsibleAI

#Awesome# A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podcasts, reports, tools, regulations and standards related to Resp...

responsible-ai, xai, fairness-ai, Awesome Lists, explainable-ai, interpretable-ai, artificial intelligence, ai-alignment, ai-safety
73 stars
5 days ago
dit7ya / awesome-ai-alignment

#Awesome# A curated list of awesome resources for Artificial Intelligence Alignment research

Awesome Lists, ai-safety, ai-alignment
71 stars
2 years ago
wesg52 / sparse-probing-paper

Sparse probing paper full code.

ai-alignment, ai-safety, interpretability
Jupyter Notebook · 58 stars
1 year ago
RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

rlhf, ai-alignment, large-language-models
57 stars
9 months ago
riceissa / aiwatch

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

ai-safety, PHP, database, dataset, ai-alignment, MySQL
HTML · 22 stars
2 days ago
lzzcd001 / nabla-gfn

Official Implementation of Nabla-GFlowNet (ICLR 2025)

ai-alignment, diffusion-models, generative-model, finetuning
Python · 20 stars
1 month ago
UCSC-VLAA / Sight-Beyond-Text

#large language models# [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

llama2, llava, large language models, mllm, vicuna, vision-language, ai-alignment, alignment, vlm
Python · 19 stars
2 years ago
liondw / Signal-Alignment

An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal ...

artificial intelligence, ai-alignment, design, education
18 stars
2 years ago
phelps-sg / llm-cooperation

#large language models# Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 202...

economics, gpt-3, large language models, ai-safety, ai-alignment, gpt-4
Python · 12 stars
6 months ago
IQTLabs / daisybell

Scan your AI/ML models for problems before you put them into production.

bias-correction, bias-detection, Cybersecurity, ai-alignment, ai-safety
Python · 11 stars
3 months ago
patcon / awesome-polis

#Awesome# Community list of awesome projects, apps, tools and more related to Polis.

ai-alignment, Awesome Lists, collective-intelligence, democracy, participatory-democracy, civic-tech, civictech, participation, pca
JavaScript · 10 stars
13 days ago
rmoehn / farlamp

IDA with RL and overseer failures

ida, research-project, ai-alignment
TeX · 8 stars
4 years ago