GitHub 中文社区
回车: Github搜索    Shift+回车: Google搜索
论坛
排行榜
趋势
登录

©2025 GitHub中文社区论坛GitHub官网网站地图GitHub官方翻译

  • X iconGitHub on X
  • Facebook iconGitHub on Facebook
  • Linkedin iconGitHub on LinkedIn
  • YouTube iconGitHub on YouTube
  • Twitch iconGitHub on Twitch
  • TikTok iconGitHub on TikTok
  • GitHub markGitHub’s organization on GitHub
集合主题趋势排行榜
#

ai-safety

Website
Wikipedia
https://static.github-zh.com/github_avatars/jphall663?size=40
jphall663 / awesome-machine-learning-interpretability

#Awesome#A curated list of awesome responsible machine learning resources.

fairnessxaiinterpretabilitytransparency机器学习数据科学PythonRAwesome Listsmachine-learning-interpretabilityinterpretable-machine-learninginterpretable-mlinterpretable-aiexplainable-mlai-safetyprivacy-enhancing-technologiesprivacy-preserving-machine-learning
3.8 k
5 天前
https://static.github-zh.com/github_avatars/PKU-Alignment?size=40
PKU-Alignment / safe-rlhf

#数据仓库#Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

ai-safetyalpaca数据集deepspeedlarge-language-modelsllama大语言模型reinforcement-learningreinforcement-learning-from-human-feedbackrlhftransformersvicunasafetygpttransformerbeaver
Python 1.49 k
1 年前
https://static.github-zh.com/github_avatars/OpenLMLab?size=40
OpenLMLab / MOSS-RLHF

Secrets of RLHF in Large Language Models Part I: PPO

rlhfalignmentai-safety
Python 1.37 k
1 年前
https://static.github-zh.com/github_avatars/cvs-health?size=40
cvs-health / uqlm

#大语言模型#UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection

ai-safetyhallucination大语言模型llm-evaluationuncertainty-estimationuncertainty-quantification
Python 692
2 天前
https://static.github-zh.com/github_avatars/JohnSnowLabs?size=40
JohnSnowLabs / langtest

#自然语言处理#Deliver safe & effective language models

benchmarkslarge-language-modelsml-safetyml-testingmlops自然语言处理responsible-aiai-safety人工智能benchmark-framework大语言模型
Python 523
1 个月前
https://static.github-zh.com/github_avatars/tigerlab-ai?size=40
tigerlab-ai / tiger

#大语言模型#Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)

classificationfine-tuning大语言模型llm-trainingragai-safetydata-augmentationlarge-language-models
Jupyter Notebook 396
2 年前
https://static.github-zh.com/github_avatars/agencyenterprise?size=40
agencyenterprise / PromptInject

#计算机科学#PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...

ai-safetylanguage-modelsml-safetyagiai-alignmentadversarial-attacksgpt-3large-language-models机器学习chain-of-thoughtprompt-engineering
Python 379
1 年前
https://static.github-zh.com/github_avatars/hendrycks?size=40
hendrycks / ethics

Aligning AI With Shared Human Values (ICLR 2021)

ai-safetygpt-3ml-safety
Python 288
2 年前
https://static.github-zh.com/github_avatars/ShengranHu?size=40
ShengranHu / Thought-Cloning

#计算机科学#[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

ai-safety人工智能深度学习imitation-learningreinforcement-learningPyTorch
Python 268
1 年前
https://static.github-zh.com/github_avatars/normster?size=40
normster / llm_rules

RuLES: a benchmark for evaluating rule-following in language models

ai-securitygpt-4ai-safety
Python 225
4 个月前
https://static.github-zh.com/github_avatars/cvs-health?size=40
cvs-health / langfair

#大语言模型#LangFair is a Python library for conducting use-case level LLM bias and fairness assessments

人工智能biasbias-detectionfairnessfairness-aifairness-mlfairness-testinglarge-language-models大语言模型responsible-aiPythonai-safetyllm-evaluationllm-evaluation-frameworkllm-evaluation-metrics
Python 215
4 天前
https://static.github-zh.com/github_avatars/Jiaqi-Chen-00?size=40
Jiaqi-Chen-00 / ImBD

[AAAI 2025 oral] Official repository of Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection

ai-safety
Python 214
2 个月前
https://static.github-zh.com/github_avatars/WindVChen?size=40
WindVChen / DiffAttack

An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.

ai-safetydiffusion-models
Python 214
8 个月前
https://static.github-zh.com/github_avatars/Giskard-AI?size=40
Giskard-AI / awesome-ai-safety

#自然语言处理#📚 A curated list of papers & technical articles on AI Quality & Safety

人工智能ai-alignmentai-safety大语言模型llmops机器学习mlops自然语言处理ml-testingmodel-validation机器视觉Awesome Listsml-safetyrobustness
183
2 个月前
https://static.github-zh.com/github_avatars/tomekkorbak?size=40
tomekkorbak / pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

ai-alignmentai-safetygptlanguage-modelspretrainingreinforcement-learningrlhf
Python 182
1 年前
https://static.github-zh.com/github_avatars/phantasmlabs?size=40
phantasmlabs / phantasm

#大语言模型#Toolkits to create a human-in-the-loop approval layer to monitor and guide AI agents workflow in real-time.

ai-agentsai-safetyai-securityautomation-toolscontrol-flowdashboardhuman-computer-interactionhuman-in-the-loop大语言模型llm-securityllmops监控Open SourceRust
Svelte 176
7 个月前
https://static.github-zh.com/github_avatars/lets-make-safe-ai?size=40
lets-make-safe-ai / make-safe-ai

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

agi人工智能ai-safetyartificial-general-intelligenceai-alignment
169
2 年前
https://static.github-zh.com/github_avatars/PKU-YuanGroup?size=40
PKU-YuanGroup / Hallucination-Attack

#自然语言处理#Attack to induce LLMs within hallucinations

adversarial-attacks大语言模型hallucinations机器学习自然语言处理ai-safety深度学习
Python 155
1 年前
https://static.github-zh.com/github_avatars/PKU-Alignment?size=40
PKU-Alignment / beavertails

#数据仓库#BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

ai-safetyhuman-feedbacklanguage-model大语言模型rlhfsafetybeaver数据集gptllama
Makefile 143
2 年前
https://static.github-zh.com/github_avatars/ryoungj?size=40
ryoungj / ToolEmu

[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use

agentai-safetylanguage-modellarge-language-modelsprompt-engineering
Python 142
1 年前
loading...