LeetCode-Notes
  • Introduction
  • Records of Practice
  • 关于Github 不支持密码问题
  • 面试题
    • 搜索广告
    • 多模态大模型
    • 刷题记录
  • 算法代码实现
  • Python
    • Python 笔记
  • Spark
    • PySpark
    • Spark Issues
    • Spark调优笔记
  • FeatureEngineering
    • Feature Cleaning
    • Feature Selection
    • Feature Transformation
    • Feature Crossing
  • Recommendation Algorithm
    • Recall-and-PreRank
      • Non-Negative Matrix Factorization (NMF)
      • Factorization Machine (FM)
      • User-base/Item-base实现
      • 多路召回实现
    • Ranking
      • NeuralFM
      • DeepFM
      • Deep&Cross network (DCN)
    • DeepLearning-Basic
      • Attention
      • Dropout
      • Batch Norm
  • Machine Learning
    • XGBoost
    • Cross Entropy Loss
    • Other models
  • Graph Neural Network
    • GNN-1-Basic
  • Big Data
    • Reservoir Sampling
  • SQL
    • SQL and PySpark functions
    • Query Film Information
    • Create, Insert and Alter Actor Table
    • Manage Employment Data
    • Manage Employment Data -2
  • DataStructure
    • Searching
      • Find power
      • 2 Sum All Pair II
      • Two Sum
      • Search in Rotate Array
      • Search In Shifted Sorted Array II
      • Search in 2D array
      • Three Sum with duplicated values
      • Median of Two Sorted Arrays
    • Array
      • Longest Consecutive Subarray
      • Merge Two Array in-place
      • Trapping water
      • Rotate matrix
    • Sorting
      • Merge intervals
      • 排序
      • 最小的k个数
      • Find TopK largest- QuickSelect快速选择 method
      • MergeSort Linkedlist
      • 第K大元素
    • LinkedList
      • Reverse LinkedList I
      • Reverse K-group linked list
      • Detect Start of Cycle
      • HasCycle
      • DetectCycle II
      • 链表的共同节点
      • 链表中倒数第k个节点
      • 删除链表倒数第k个节点
      • 合并两个链表
      • 在排序数组中查找元素的第一个和最后一个位置
      • 删除链表里面重复的元素-1
    • Tree
      • Find Tree height (general iteration method)
      • Check BST and Check CompleteTree
      • ZigZag Order traversal
      • Binary Tree diameter I
      • Maximum Path Sum Binary Tree
      • Maximum Path Sum Binary Tree II
      • Binary Tree Path Sum To Target III
      • Tree diameter 树的直径II
      • Tree ReConstruction
      • Check if B is Subtree of A
      • The Kth smallest in Binary Search Tree
      • 打印Tree的右视图
      • 二叉搜索树的后序遍历序列
      • 重建二叉树
      • 判断二叉树是否对称
      • Path Sum to Target in Binary Tree
      • Tree-PreOrder-InOrder-PostOrder
    • Heap&Queue
      • Top-K smallest
      • 滑动窗口最大值
      • Find the K-Largest
    • 合并k个已排序的链表
    • String
      • Reverse String
      • 最长不含重复字符的子字符串
      • 最长回文串
      • 最长回文子序列-DP
    • DFS/BFS
      • Number of Islands
      • Number of Provinces
      • All Permutations of Subsets without duplication
      • All Permutations of Subsets with duplication
      • Combinations Of Coins
      • All Subset I (without fixing size of subset, without order, without duplication)
      • All Subset of K size without duplication II
      • All Subset of K size III (with duplication without considering order)
      • All Permutation II (with duplication and consider order)
      • Factor Combination-质数分解
    • DynamicProgramming
      • DP-解题过程
      • Find Continuous Sequence Sum to Target
      • 1800. Maximum Ascending Subarray Sum
      • NC91 最长上升子序列
      • 查找string的编码方式个数
      • Maximum Product
      • Longest Common Substring
      • Longest Common Substring-II
      • minEditCost
      • Backpack I
      • Array Hopper I
      • Minimum distance between strings
      • 最大正方形
  • Big Data Algorithms
    • Big Data Processing Algorithms
      • Reservoir Sampling
      • Shuffle
      • MapReduce
      • Bloom Filter
      • BitMap
      • Heap For Big Data

算法代码实现

Metric
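The Metrics class below collects plain-Python/NumPy implementations of common ranking and recommendation metrics: pairwise AUC, MRR, NDCG@K, recall@K, precision@K and hitrate@K, followed by a few example calls.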

import numpy as np

class Metrics:
    def cal_auc(self, y_pred, y_true):
        """
        y_true: 1D list of binary labels
        y_pred: 1D list of predicted scores
        AUC = (number of positive/negative pairs where the positive is scored
        higher) / (number of all positive/negative pairs); ties count as 0.5.
        """
        pos_ls = []
        neg_ls = []
        for i in range(len(y_pred)):
            if y_true[i] != 0:
                pos_ls.append(i)
            else:
                neg_ls.append(i)

        cnt = 0
        for p in pos_ls:
            for n in neg_ls:
                if y_pred[p] > y_pred[n]:
                    cnt += 1
                elif y_pred[p] == y_pred[n]:
                    # ties get half credit (standard rank-based AUC)
                    cnt += 0.5
        auc = cnt / (len(neg_ls) * len(pos_ls))
        return auc
    def mrr(self, y_pred, y_true, topN=3):
        """
        Mean Reciprocal Rank.
        Rewards ground-truth items that appear near the top of the ranked list.
        MRR = 1/S * sum_i (1 / p_i), where p_i is the 1-based rank position of
        the i-th ground-truth item in the list sorted by predicted score
        (descending) and S is the number of ground-truth items. If a
        ground-truth item falls outside the topN positions, 1/p_i = 0.
        y_pred: 1D list of predicted scores
        y_true: 1D list of ground-truth item indices
        """
        y_pred = np.asarray(y_pred)
        # order[r] = index of the item ranked r-th (descending by score)
        order = np.argsort(-y_pred)
        # rank[i] = 0-based rank position of item i
        rank = np.empty_like(order)
        rank[order] = np.arange(len(order))
        mrr = 0
        for i in y_true:
            pi = 1 / (1 + rank[i]) if rank[i] < topN else 0
            mrr += pi
        return mrr / len(y_true)
    def ndcg(self, y_preds, y_trues, K=5):
        """
        Normalized Discounted Cumulative Gain.
        inputs:
            y_preds: N*L predicted scores
            y_trues: N*L ground-truth relevance scores
        y_pred is only used to order the items for DCG; its absolute score
        values are never used.
        ndcg = dcg / idcg
        dcg = sum_i^K ( rel_i / log_2(i+1) ), i is the 1-based rank position
        1. Sort each row by y_pred (descending), take the topK, and look up
           their ground-truth scores.
        2. Use the ground-truth score as rel and the pred-sorted position as
           the discount to compute dcg.
        3. Compute idcg from the ideally sorted ground-truth scores.
        4. ndcg = mean over the N samples of dcg / idcg.
        """
        def dcg(rel, K):
            rel = rel[:K]  # topK ground-truth relevance scores
            return sum(s / np.log2(2 + i) for i, s in enumerate(rel))

        res = 0
        for y_pred, y_true in zip(y_preds, y_trues):
            # ground-truth relevance re-ordered by predicted score, highest first
            sorted_true_rel = [x for _, x in sorted(zip(y_pred, y_true), reverse=True)]
            dcg_val = dcg(sorted_true_rel, K)
            ideal_true_rel = sorted(y_true, reverse=True)
            idcg_val = dcg(ideal_true_rel, K)
            ndcg = dcg_val / idcg_val if idcg_val > 0 else 0
            res += ndcg
        res /= len(y_preds)
        return res
    def recall(self, y_pred, y_true, k):
        """
        recall@k = (# ground-truth positives in the top-k predictions) / (# ground-truth positives)
        i.e. TP / (TP + FN), evaluated on the top-k items of each row.
        """
        def rc(pred, lb, k):
            # labels re-ordered by predicted score, highest first
            sorted_lb = [l for p, l in sorted(zip(pred, lb), reverse=True)]
            topk = sorted_lb[:k]
            return sum(topk) / sum(lb)

        res = []
        for pred, lb in zip(y_pred, y_true):
            res.append(rc(pred, lb, k))
        # average recall over rows
        return sum(res) / len(res)
    
    def precision(self, y_pred, y_true, K):
        """
        precision@K = (# ground-truth positives in the top-K predictions) / K
        Returns both the per-row average and the global (micro) precision.
        """
        def pk(pred, lb, k):
            # labels re-ordered by predicted score, highest first
            sorted_lb = [l for p, l in sorted(zip(pred, lb), reverse=True)]
            topk = sorted_lb[:k]
            return sum(topk), sum(topk) / k

        res = []
        x = 0
        for pred, lb in zip(y_pred, y_true):
            num, pk_v = pk(pred, lb, K)
            res.append(pk_v)
            x += num
        # average precision over rows, and global hit count / total slots
        return sum(res) / len(res), x / (len(y_pred) * K)
    
    def hitrate(self, y_pred, y_true, K):
        """
        hitrate@K: for each user, whether the top-K recommended items contain
        at least one ground-truth item (computed at user granularity).
        input:
            y_pred: N*L  N = number of users
            y_true: N*L
        https://cran.r-project.org/web/packages/recometrics/vignettes/Evaluating_recommender_systems.html
        """
        def hit(pred, lb, k):
            res = []
            # iterate items from highest to lowest predicted score
            for i, (p, l) in enumerate(sorted(zip(pred, lb), reverse=True)):
                if i < k:
                    res.append(l)
            return sum(res) > 0

        hr = []
        for p, l in zip(y_pred, y_true):
            hr.append(hit(p, l, K))
        return sum(hr) / len(hr)
    
mt = Metrics()
print("metric auc:",mt.cal_auc([0.8,0.1,0.4,0.5,0.6,0.1],[1,0,1,0,0,0]))
print("metric mrr:",mt.mrr([0.8,0.1,0.4,0.5,0.6,0.1],[3, 2],4))
print("metric ndcg:",mt.ndcg( [[0.8,0.1,0.4,0.5,0.6,0.1], [0.8,0.1,0.4,0.5,0.6,0.1]],[[1,0,1,0,0,0], [1,0,1,1,0,0]] ,4))
print("metric hitrate:",mt.hitrate( [[0.8,0.1,0.4,0.5,0.6,0.1], [0.8,0.1,0.4,0.5,0.6,0.1]],[[0,0,0,0,0,1], [1,0,1,1,0,0]] ,4))
print("metric recall:",mt.recall( [[0.8,0.1,0.4,0.5,0.6,0.1], [0.8,0.1,0.4,0.5,0.6,0.1]],[[0,0,0,0,0,1], [1,0,1,1,0,0]] ,10))
print("metric precision:",mt.precision( [[0.8,0.1,0.4,0.5,0.6,0.1], [0.8,0.1,0.4,0.5,0.6,0.1]],[[0,0,0,0,0,1], [1,0,1,1,0,0]] ,10))