DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI (research@deepseek.com). Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training … Appendix B describes DeepSeek-V2-Lite, a 16B model equipped with MLA and DeepSeekMoE … 1. Introduction: In the past few years, Large Language Models (LLMs) (Anthropic, 2023; Google, 2023; OpenAI, 2022, 2023) have undergone rapid development. 0 码力 | 52 pages | 1.23 MB | 1 year ago
breakthrough large language models (LLMs) that – in effect – found freedom with the November 2022 launch of OpenAI's ChatGPT, with its extremely easy-to-use / speedy user interface. In addition, relatively … [chart: ≈260% annual growth over fifteen years of data used to train AI models] Note: only "notable" language models shown (per Epoch AI: includes a state-of-the-art improvement on a recognized benchmark, >1K …). FLOPs are often used to estimate the computational cost of training or running a model. 0 码力 | 340 pages | 12.14 MB | 4 months ago
… they could offer more and better insights to clients. They started with three model evals: 01 Language translation — measuring the accuracy and quality of translations produced by a model. 02 Summarization — … candidate why this specific job was recommended to them. Indeed uses the data-analysis and natural-language capabilities of GPT-4o mini to shape these "why" statements in its emails and messages to jobseekers … style, and context. Consistent tone and style: for a retailer, that could mean every product description stays true to brand voice; for a law firm, it means properly formatted citations, every time. 0 码力 | 25 pages | 9.48 MB | 5 months ago
Summary; Endnotes. Prompt Engineering, February 2025. Introduction: When thinking about a large language model's input and output, a text prompt (sometimes accompanied by other modalities such as images) … evaluating a prompt's writing style and structure in relation to the task. In the context of natural language processing and LLMs, a prompt is an input provided to the model to generate a response or prediction … such as text summarization, information extraction, question answering, text classification, language or code translation, code generation, and code documentation or reasoning. Please feel free to … 0 码力 | 68 pages | 6.50 MB | 6 months ago
Foundations; Guardrails; Conclusion. Practical guide to building agents — Introduction: Large language models are becoming increasingly capable of handling complex, multi-step tasks. Advances in reasoning … security reviews. 03 Heavy reliance on unstructured data: scenarios that involve interpreting natural language, extracting meaning from documents, or interacting with users conversationally, for example … and prevent redundant definitions. Broadly speaking, agents need three types of tools (Type / Description / Examples): Data — enable agents to retrieve the context and information necessary for executing the workflow … 0 码力 | 34 pages | 7.00 MB | 5 months ago
… groundwork for improved multi-language support for exposing runtime and IRs. Unified Object Protocol: vm::Object — NDArray | Rd | tuple/closure | AST nodes. Cross-language support; easy to introduce … 0 码力 | 16 pages | 1.77 MB | 5 months ago
tvm::runtime::Module: GetFunction(string) -> tvm::runtime::PackedFunc; SaveToBinary/LoadFromBinary — the runtime Module interface and its subclasses. Unified runtime benefit: mod.export_library("mylib.so") gives unified library packaging. Free … k = tvm.reduce_axis((0, 8)); C = tvm.compute((8, 8), lambda y, x: tvm.sum(A[k, y] * B[k], axis=k)) — hardware interface specification by tensor expression; tensorization. VTA: an open and flexible deep learning accelerator … Flexible Deep Learning Acceleration, Moreau et al., IEEE Micro 2019. VTA hardware/software interface (ISA); VTA microarchitecture; VTA simulator — compiler, driver, and hardware design: a full-stack open … 0 码力 | 31 pages | 22.64 MB | 5 months ago
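The tensor-expression line quoted in this result defines C[y, x] = Σₖ A[k, y]·B[k]. A minimal plain-Python sketch of the same reduction semantics (no TVM dependency; `A`, `B`, and `compute_C` are hypothetical stand-ins for the placeholders on the slide — real TVM code would use `tvm.te.placeholder` / `tvm.te.compute`):

```python
# Plain-Python sketch of the reduction in the quoted TVM snippet:
#   C[y, x] = sum over k of A[k, y] * B[k]
# Every column x of row y holds the same dot product, mirroring the
# fact that the lambda's body does not mention x.

def compute_C(A, B, n=8):
    """A: n x n matrix (first index is the reduction axis k), B: length-n vector."""
    return [[sum(A[k][y] * B[k] for k in range(n)) for x in range(n)]
            for y in range(n)]

# With A[k][y] = 1 iff k == y, every entry of row y collapses to B[y].
A = [[1 if col == row else 0 for col in range(8)] for row in range(8)]
B = list(range(8))
C = compute_C(A, B)
```

The point of writing the computation this way in TVM is that the reduction axis `k` is an explicit object, which is what makes tensorization (mapping the inner loop nest onto a hardware intrinsic) possible.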
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. VM bytecode (Instruction — Description): Move — moves data from one register to another; Ret — returns the object in the result register to the caller's … 0 码力 | 24 pages | 417.46 KB | 5 months ago
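The two bytecode entries quoted above can be illustrated with a toy register VM. Opcode names follow the snippet; the register model and the `run` loop are hypothetical stand-ins, not the Relay VM's actual internals:

```python
# Toy register VM for the two instructions in the quoted table:
#   Move: copy data from one register to another
#   Ret:  return the object in the named result register to the caller

def run(instructions, registers):
    for op, *args in instructions:
        if op == "Move":
            src, dst = args
            registers[dst] = registers[src]   # Move src -> dst
        elif op == "Ret":
            (result,) = args
            return registers[result]          # hand the object back
    raise RuntimeError("program ended without Ret")

regs = {"r0": 42, "r1": None}
result = run([("Move", "r0", "r1"), ("Ret", "r1")], regs)
```

A bytecode interpreter like this is what lets dynamic models (shape-dependent control flow, variable-length loops) run without compiling a fixed graph ahead of time.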
… kernel, strides, padding, dilation, layout, out_dtype):  # describe the algorithm with the tensor expression language; return the output operation — how to compute. @autotvm.register_topi_schedule(schedule_conv2d_nchw, … 0 码力 | 12 pages | 1.94 MB | 5 months ago
Essential basics: to understand DeepSeek-R1 in depth, you first need the fundamentals of LLMs, including how they work, their architecture, and how they are trained. In recent years, the rapid development of artificial intelligence (AI) has driven the rise of Large Language Models (LLMs). LLMs play an increasingly important role in natural language processing (NLP) and are widely used for intelligent question answering, text generation, code writing, and machine translation. An LLM is a deep-learning-based AI model whose core goal is … 0 码力 | 11 pages | 2.64 MB | 7 months ago
10 results in total · Page 1