Greenplum Machine Learning Toolkit and Case Studies
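[Architecture diagram: data sources (local storage, other RDBMSes, Spark, GemFire, cloud object storage, HDFS; JSON, Apache AVRO, Apache Parquet and XML); interfaces (Teradata SQL, other DB SQL, Apache MADlib for ML/statistics/graph, Python, R, Java, Perl, C, Apache SOLR for text, PostGIS for geospatial, Hyper-Q); workloads (custom apps, BI/reporting, machine learning, AI); users (IT, developers, business analysts, data scientists); deployment (on-premises, public)]
Source: 2017.thegiac.com
Greenplum Big Data Platform
• Package once, run anywhere: bare metal, private cloud, public cloud
• Many data sources: Hadoop, S3, databases, files, Spark, Kafka
• Many data formats: structured, semi-structured (JSON/XML/Hstore), unstructured
• Powerful core: MPP, query optimizer, polymorphic storage, flexible partitioning, high-speed loading, PostgreSQL kernel
• Strong flexibility, …
(58 pages, 1.97 MB)

Greenplum Essentials: Selected Articles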
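… way, you will gain surprising performance and convenience. For example, for the data transcoding and data masking we implemented for one customer, we only had to lightly rewrite the existing code and deploy it to GP, and parallel computation delivered a speedup of several tens of times. In addition, GPText (Lucene full-text search), Apache MADlib (open-source data-mining algorithms), SAS algorithms and R are all deployed in a distributed fashion across the Greenplum cluster as UDFs, which gives them the parallelism of in-database computation. One result worth sharing: SAS once ran a test on … emerged, and has almost become a hot technology trend in current Hadoop development. These technologies include Hive, Pivotal HAWQ, Spark SQL, Impala, Presto, Drill, Tajo and many more; some of them are optimizations built on MapReduce. For example, Spark uses in-memory MapReduce and claims 10 times the performance of file-based MapReduce; others use C/C++ … test set, 99 SQL queries) as an example, engines including Spark, Impala and Hive can support only about 1/3 of them.

Because Hadoop is inherently append-only, most SQL-on-Hadoop engines do not support partial updates and deletes of data (UPDATE/DELETE); for example, before Spark can compute, the data must first be loaded into DataFrames …
(64 pages, 2.73 MB)

Greenplum for Kubernetes PGConf India 2019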
[Diagram: next-generation data platform. Sources and ingestion: local storage, other RDBMSes, Spark, GemFire, cloud object storage, HDFS, Kafka, ETL, Spring Cloud Data Flow; formats: JSON, Apache AVRO, Apache Parquet and XML; interfaces: Teradata SQL, other DB SQL, Apache MADlib (ML/statistics/graph), Python, R, Java, Perl, C (programmatic), Apache SOLR (text), PostGIS (geospatial); workloads: custom apps, BI/reporting, machine learning, AI; deployment: on-premises]
(26 pages, 1.75 MB)

Pivotal Greenplum 5: The Next-Generation Data Platform
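[Architecture diagram: Pivotal Greenplum platform. Sources: ETL, local storage, HDFS, cloud object storage, GemFire, Spark, other RDBMSes, multi-structured data (JSON, Apache AVRO, Apache Parquet and XML), structured data; native interfaces: JDBC, ODBC, ANSI SQL, Teradata SQL, other database SQL, Apache MADlib (ML/statistics/graph), Python, R, Java, Perl, C, Apache SOLR, PostGIS; core: query optimizer (GPORCA), Workload Manager, polymorphic storage, Command Center, SQL compatibility (Hyper-Q), PostgreSQL kernel; consumers: analytic applications, users]
pivotal.io/cn, white paper. © Copyright 2017 Pivotal Software, Inc. All rights reserved.
… rpart, sandwich, scales, stringi, stringr, survival, tibble, tseries and zoo. In addition, Greenplum 5 supports the latest release of Apache MADlib (machine learning and graph analytics in SQL) and supports running GPText in-database on Apache Solr for indexing and search, including custom tokenizers for international text and social-media text and a universal query processor (which can accept … from the supported Solr …
(9 pages, 690.33 KB)

Greenplum on Kubernetes: A Containerized MPP Database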
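Cloud database implementation approaches
● Brand-new database
  ○ Snowflake
● Upgrading an existing database architecture
  ○ Vertica Eon Mode
● Containerized database + Kubernetes
  ○ Apache Spark
  ○ CockroachDB
  ○ Apache HAWQ
Cloud database storage options
● Block storage
  ○ File-system interface
● Object storage
  ○ Low cost
  ○ Highly scalable
  ○ High access latency
(33 pages, 1.93 MB)

VMware Greenplum v6.18 Documentation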
… version of Greenplum Database due to licensing restrictions. Support for data connectors: Greenplum-Spark Connector, Greenplum-Informatica Connector, Greenplum-Kafka Integration, Greenplum Streaming Server … Greenplum Text, Tanzu Greenplum Streaming Server, Tanzu Greenplum Connector for Apache Spark, Tanzu Greenplum Connector for Apache NiFi, Tanzu Greenplum Connector for Informatica … Server is an ETL tool that provides high-speed, parallel data transfer from Informatica, Kafka, Apache NiFi and custom client data sources to a Tanzu Greenplum cluster. Refer to the Tanzu Greenplum …
(1959 pages, 19.73 MB)

VMware Greenplum v6.19 Documentation
… version of Greenplum Database due to licensing restrictions. Support for data connectors: Greenplum-Spark Connector, Greenplum-Informatica Connector, Greenplum-Kafka Integration, Greenplum Streaming Server … Greenplum Text, Tanzu Greenplum Streaming Server, Tanzu Greenplum Connector for Apache Spark, Tanzu Greenplum Connector for Apache NiFi, Tanzu Greenplum Connector for Informatica … Server is an ETL tool that provides high-speed, parallel data transfer from Informatica, Kafka, Apache NiFi and custom client data sources to a Tanzu Greenplum cluster. Refer to the Tanzu Greenplum …
(1972 pages, 20.05 MB)

VMware Tanzu Greenplum v6.20 Documentation
… due to licensing restrictions. Support for data connectors: Greenplum-NiFi Connector, Greenplum-Spark Connector, Greenplum-Informatica Connector, Greenplum-Kafka Integration, Greenplum Streaming Server … Greenplum Text, Tanzu Greenplum Streaming Server, Tanzu Greenplum Connector for Apache Spark, Tanzu Greenplum Connector for Apache NiFi, Tanzu Greenplum Connector for Informatica … Server is an ETL tool that provides high-speed, parallel data transfer from Informatica, Kafka, Apache NiFi and custom client data sources to a Tanzu Greenplum cluster. Refer to the Tanzu Greenplum …
(1988 pages, 20.25 MB)

VMware Greenplum v6.17 Documentation
Greenplum Text, Tanzu Greenplum Streaming Server, Tanzu Greenplum Connector for Apache Spark, Tanzu Greenplum Connector for Apache NiFi, Tanzu Greenplum Connector for Informatica … Server is an ETL tool that provides high-speed, parallel data transfer from Informatica, Kafka, Apache NiFi and custom client data sources to a Tanzu Greenplum cluster. Refer to the Tanzu Greenplum … Connector for Apache Spark v1.6.2: The Tanzu Greenplum Connector for Apache Spark supports high-speed, parallel data transfer between Greenplum and an Apache Spark cluster using Spark's Scala API.
(1893 pages, 17.62 MB)

VMware Greenplum 6 Documentation
… for VMware Greenplum and VMware GemFire, VMware Greenplum Connector for Apache NiFi, VMware Greenplum Connector for Apache Spark, VMware Greenplum Connector for Informatica, VMware Greenplum Streaming … Restore, VMware Greenplum Command Center, VMware Greenplum Connector for Apache NiFi, VMware Greenplum Connector for Apache Spark, VMware Greenplum Connector for Informatica, VMware Greenplum Data Copy … single query. The VMware Greenplum Connector for Apache NiFi version 1.1.0 is available, which includes a change. Refer to the Greenplum Connector for Apache NiFi Documentation for more information about …
(2445 pages, 18.05 MB)