GitHub 中文社区

©2025 GitHub 中文社区

# humaneval
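The repositories under this topic benchmark code generation on HumanEval, which is usually reported as pass@k. As a point of reference, a minimal sketch of the standard unbiased pass@k estimator (the formula from the Codex paper; the sample counts below are illustrative, not from any listed repo):

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, passes the tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include a correct one.
        return 1.0
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail

# Example: 10 generations per task, 3 passing -> estimated pass@1
print(round(pass_at_k(10, 3, 1), 2))  # 0.3
```

The product form is the numerically stable way to evaluate 1 − C(n−c, k)/C(n, k) without large binomial coefficients.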
bin123apple / AutoCoder

#NLP# We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

Tags: code-generation, code-interpreter, humaneval, LLM, text-generation, NLP
Python · 844 stars · updated 1 year ago
the-crypt-keeper / can-ai-code

#LLM# Self-evaluating interview for AI coders

Tags: AI, ggml, langchain, llama-cpp, LLM, humaneval, transformers
Python · 582 stars · updated 1 month ago
abacaj / code-eval

Run evaluation on LLMs using the human-eval benchmark

Tags: humaneval, wizardcoder
Python · 414 stars · updated 2 years ago
SkyWorkAIGC / SkyCode-AI-CodeX-GPT3

SkyCode is a multilingual open-source programming LLM built on the GPT-3 architecture. It supports mainstream languages including Java, JavaScript, C, C++, Python, Go, and shell, and can understand Chinese comments. The model performs code completion and has strong problem-solving ability, freeing you from routine programming to focus on more important problems. | SkyCode is an open source programming model, which adopts...

Tags: codex, deepmind, Go, gpt-neo, humaneval, Java, JavaScript, openai, Python, gpt3, gpt-3, Shell
393 stars · updated 2 years ago
zorse-project / COBOLEval

#LLM# Evaluate LLM-generated COBOL

Tags: cobol, evaluation, humaneval, LLM
Python · 36 stars · updated 1 year ago
declare-lab / LLM-ReasoningTest

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

Tags: humaneval, reasoning
Python · 10 stars · updated 2 months ago
abhigupta2909 / LLMPerformanceLab

Analysis of LLM performance across CPU, GPU, execution time, and energy usage

Tags: flask-restful, humaneval, Java, JavaScript, LLM, mmlu, ollama-api, React, Spring Boot, MySQL
Java · 0 stars · updated 1 year ago
mousamax / Evaluation-Code-Generator-LLMs

JetBrains Task: Leveraging software evolution data with LLMs

Tags: huggingface, humaneval
0 stars · updated 1 year ago
mennahasan31 / llm_benchmark

#NLP# llm_benchmark is a comprehensive benchmarking tool for evaluating the performance of various Large Language Models (LLMs) on a range of natural language processing tasks. It provides a standardized fr...

Tags: ai-tools, alibaba, anthropic, benchmark, evals, evaluation, evaluation-metrics, humaneval, information-seeking, mistral, NLP, openai, reasoning, streetfighterai
0 stars · updated 4 months ago