SQL and PySpark functions

Aggregation function
Window functions
Create / Drop tables
Create / Drop database
Insert and modify
Other Useful PySpark functions
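1. lit(value).alias(column_name): adds a new column to a table. The column holds either the constant value or, when value is an existing Column object of the table (not just its name as a string), a copy of that column. alias then gives the new column its name.
2. pivot(column_name).count(): used after groupBy, it spreads the distinct category values of one column into separate columns holding the per-group counts. With fillna(0), and when each (group, category) pair occurs at most once, the result is a multi-hot (0/1) encoding. For example: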

tmp.groupBy("movieId").pivot("splitted_genres").count().fillna(0).show()

Last updated 4 years ago
