IT文库
Search took 0.031 seconds and found about 147 results.
  • [PDF] TVM Meetup: Quantization

    … Models in TVM AWS AI © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Quantization Overview • Represent FP32 numbers with lower-precision INT8 numbers • Integer number stands … Quantization in TVM • Quantization within TVM - Automatic Quantization • TVM stack ingests an FP32 graph and a small dataset • Finds suitable quantization scale • Produces a quantized … Quantization Approaches in TVM Framework FP32 Graph MXNet Parser TF parser … Relay FP32 Graph Relay Automatic Quantization Relay Int8 Graph Framework Pre-quantized …
    0 credits | 19 pages | 489.50 KB | 5 months ago
  • [PDF] 《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques

    … chapter, we introduce Quantization, a model compression technique that addresses both these issues. We’ll start with a gentle introduction to the idea of compression. Details of quantization and its applications after. The quantization section delves into the implementation details using code samples. We finish with a hands-on project that will walk you through the process of applying quantization in practical … the next section we introduce Quantization, a popular compression technique which is also used in various fields of computer science in addition to deep learning. Quantization Before we jump to working …
    0 credits | 33 pages | 1.96 MB | 1 year ago
  • [PDF] 《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques

    … compression techniques. By ‘advanced’ we mean that these techniques are slightly more involved than quantization (as discussed in the second chapter). But that doesn’t mean they are harder to learn or implement … particular clustering is a generalization of quantization. If you noticed, quantization ensures that any two weights that lie within the same quantization bin, are mapped to the same quantized weight value. That is an implicit form for weight sharing. However, quantization falls behind in case the data that we are quantizing is not uniformly distributed, i.e. the data is more likely to take values …
    0 credits | 34 pages | 3.18 MB | 1 year ago
  • [PDF] AI Large Model Qianwen (Qwen) Chinese Documentation

    … to learn about them. 1.4.3 Generate your GGUF file. We introduce the method of creating and quantizing GGUF files in quantization/llama.cpp. You can refer to that document for more information. 1.4.4 PPL evaluation. llama.cpp provides us with evaluation … AutoAWQForCausalLM from transformers import AutoTokenizer # Specify paths and hyperparameters for quantization model_path = "your_model_path" quant_path = "your_quantized_model_path" quant_config = { "zero_point": … BaseQuantizeConfig from transformers import AutoTokenizer # Specify paths and hyperparameters for quantization model_path = "your_model_path" quant_path = "your_quantized_model_path"
    0 credits | 56 pages | 835.78 KB | 1 year ago
  • [PDF] 《Efficient Deep Learning Book》[EDL] Chapter 7 - Automation

    … results. For example, between quantization and clustering, which one is preferable? What is the performance impact when both are used together? We have four options: none, quantization, clustering, and both. … earlier example for choosing quantization and/or clustering techniques for model optimization. We have a search space which has two boolean valued parameters: quantization and clustering. A True value …
    0 credits | 33 pages | 2.48 MB | 1 year ago
  • [PDF] 《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction

    … these approaches are generic enough to be used across architectures. A classical example is Quantization (see Figure 1-8), which tries to compress the weight matrix of a layer, by reducing its precision (e.g., from 32-bit floating point values to 8-bit unsigned / signed integers). Quantization can generally be applied to any network which has a weight matrix. It can often help reduce the model size 2 - 8x, while also speeding up the inference latency. Figure 1-8: An illustration of the quantization process: mapping of continuous high-precision values to discrete fixed-point integer values. Another …
    0 credits | 21 pages | 3.17 MB | 1 year ago
  • [PDF] 《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures

    … quality is within the acceptable parameters. For on-device models, TFLite offers post-training quantization as described in chapter 2. We could also incorporate compression techniques such as sparsity, … a range of mobile and edge devices. Do you recall a technique that can reduce it further? Yes, Quantization! We will leave it for you as an exercise. Tell us how well it works! Summary This chapter was … architectures for your deep learning projects. They can often be combined with other approaches like quantization, distillation, data augmentation, that we already learned. In the next chapter we will explore …
    0 credits | 53 pages | 3.92 MB | 1 year ago
  • [PDF] PAI & TVM Meetup - Shanghai 20191116

    [Chart residue: Weight Adjustment (WA). Legend: Baseline, INT8 quantization w/o WA, INT8 quantization w/ WA. Axis ticks: 50% to 90%. Labels: MobileNet v1, MobileNet v1 0.5.]
    0 credits | 26 pages | 5.82 MB | 5 months ago
  • [PDF] PyTorch Release Notes

    … JupyterLab 2.3.2 including Jupyter-TensorBoard ‣ TransformerEngine 0.10.0+96ed6fc ‣ PyTorch quantization wheel 2.1.2 PyTorch Release 23.07 PyTorch RN-08516-001_v23.07 | 6 Driver Requirements … 2.6.2 ‣ JupyterLab 2.3.2 including Jupyter-TensorBoard ‣ TransformerEngine 0.9.0 ‣ PyTorch quantization wheel 2.1.2 PyTorch Release 23.06 Driver Requirements … MAGMA 2.6.2 ‣ JupyterLab 2.3.2 including Jupyter-TensorBoard ‣ TransformerEngine 0.8 ‣ PyTorch quantization wheel 2.1.2 PyTorch Release 23.05 Driver Requirements …
    0 credits | 365 pages | 2.94 MB | 1 year ago
  • [PDF] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    … Keutzer, and A. Gholami. Kvquant: Towards 10 million context length LLM inference with KV cache quantization. CoRR, abs/2401.18079, 2024. URL https://doi.org/10.48550/arXiv.2401.18079. S. Hu, Y. Tu, X. Zhu, Z. Ye, L. Chen, S. Zheng, L. Ceze, A. Krishnamurthy, T. Chen, and B. Kasikci. Atom: Low-bit quantization for efficient and accurate LLM serving. CoRR, abs/2310.19102, 2023. URL https://doi.org/10.48550/arXiv
    0 credits | 52 pages | 1.23 MB | 1 year ago