### 2.3 Introduction to Spark Core Concepts ###

Launch the PySpark shell:

```shell
bin/pyspark
```

Python version of the filtering example: filter the lines of the `pyspark` launch script for those that contain "Python".

```python
>>> lines = sc.textFile("pyspark")
>>> pythonLines = lines.filter(lambda line: "Python" in line)
>>> pythonLines.count()
6
>>> pythonLines.first()
u'# to use IPython and set PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver'
```
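
In the shell, the variable `sc` (the SparkContext) is created automatically; a standalone program has to build one itself before it can create RDDs. Below is a minimal sketch of the same filter as a self-contained script; the `local` master, the app name, and running it from Spark's `bin/` directory (so the `pyspark` file is found) are assumptions for illustration:

```python
from pyspark import SparkConf, SparkContext

# Hypothetical standalone version of the shell session above.
# The "local" master and the app name are illustrative choices.
conf = SparkConf().setMaster("local").setAppName("PythonLinesCount")
sc = SparkContext(conf=conf)

lines = sc.textFile("pyspark")                              # RDD of the file's lines
pythonLines = lines.filter(lambda line: "Python" in line)   # keep matching lines

print(pythonLines.count())   # how many lines mention "Python"
print(pythonLines.first())   # the first such line

sc.stop()
```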

The same example in Scala. Launch the Scala shell:

```shell
bin/spark-shell
```

Scala version of the filtering: filter the lines of the `spark-shell` launch script for those that contain "scala".

```scala
scala> val lines = sc.textFile("spark-shell")
lines: org.apache.spark.rdd.RDD[String] = spark-shell MapPartitionsRDD[1] at textFile at <console>:24

scala> val scalaLines = lines.filter(line => line.contains("scala"))
scalaLines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:25

scala> scalaLines.count()
res2: Long = 4

scala> scalaLines.first()
res3: String = # SPARK-4161: scala does not assume use of the java classpath,
```
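
Both transcripts follow the same pattern: `filter()` is a transformation, which Spark evaluates lazily, while `count()` and `first()` are actions that actually trigger a job. When an RDD is reused by several actions, it can be persisted so it is not recomputed each time. A minimal sketch, written as shell input where `sc` already exists (the `persist()` call is an addition for illustration, not part of the transcripts above):

```python
lines = sc.textFile("pyspark")                              # lazy: nothing is read yet
pythonLines = lines.filter(lambda line: "Python" in line)   # lazy: only lineage is recorded

pythonLines.persist()        # keep the filtered RDD in memory once computed

pythonLines.count()          # first action: reads the file, filters, and caches
pythonLines.first()          # second action: answered from the cached RDD
```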
#### Passing Functions to Spark ####
[Java version](/SparkFastDataAnalysis/src/main/java/com/jueee/learnspark/dataanalysis/chapter02/J3CoreConcepts.java)  
[Scala version](/SparkFastDataAnalysis/src/main/java/com/jueee/learnspark/dataanalysis/chapter02/S3CoreConcepts.scala)  
[Python version](/SparkFastDataAnalysis/src/main/java/com/jueee/learnspark/dataanalysis/chapter02/P3CoreConcepts.py)
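
Most transformations, and some actions, take a function supplied by the driver program, like the `lambda` expressions in the filter examples above. In Python you can just as well pass a top-level function or a locally defined one. One caveat: if you pass an instance method, or a lambda that references a field through `self`, Spark serializes the whole containing object and ships it to the executors. A minimal sketch of these patterns (the class and method names are illustrative, not taken from the linked files):

```python
# Passing a named function instead of a lambda:
def containsPython(line):
    return "Python" in line

# e.g. pythonLines = lines.filter(containsPython)

class SearchFunctions(object):
    """Illustrative helper class holding a search query."""

    def __init__(self, query):
        self.query = query

    def getMatchesBad(self, rdd):
        # Pitfall: the lambda references self, so the entire
        # SearchFunctions object gets serialized and shipped.
        return rdd.filter(lambda line: self.query in line)

    def getMatchesGood(self, rdd):
        # Safer: copy just the needed field into a local variable.
        query = self.query
        return rdd.filter(lambda line: query in line)
```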