This repository has been indexed but not yet edited. For the project introduction and usage tutorial, please read the README on GitHub.


About

Benchmark for evaluating open-ended generation

Created
Made in China

  Last modified

2024-11-06T03:22:01Z


Languages

  • Python 99.6%
  • Shell 0.4%

Other open-source projects from thu-coai

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models

Python 1.9k
2 years ago

#Large Language Models# Chinese safety prompts for evaluating and improving the safety of LLMs.

1.08k
2 years ago

A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

Python 701
1 year ago

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

Python 491
2 years ago

You may also be interested in

Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Python 40
3 years ago

Windows compile of bitsandbytes for use in text-generation-webui.

HTML 360
2 years ago

A framework for few-shot evaluation of language models.

Python 10.27k
1 hour ago

Tools for downloading and analyzing summaries and evaluating summarization systems. https://summari.es/

Perl 147
2 years ago

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Jupyter Notebook 14k
1 year ago

A text generation benchmarking platform

Python 857
4 years ago

Python 1.18k
1 year ago

Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

Python 380
2 years ago

#Natural Language Processing# Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

Python 298
1 year ago

#Large Language Models# Generate high-definition short videos with one click using large AI models.

Python 45.61k
4 months ago