{"id":21578,"date":"2018-12-21T09:06:50","date_gmt":"2018-12-21T00:06:50","guid":{"rendered":"http:\/\/www.techscore.com\/blog\/?p=21578"},"modified":"2018-12-21T09:49:04","modified_gmt":"2018-12-21T00:49:04","slug":"%e3%81%a1%e3%82%87%e3%81%a3%e3%81%a8%e3%81%be%e3%81%98%e3%82%81%e3%81%ab%e3%83%87%e3%83%bc%e3%82%bf%e5%88%86%e6%9e%90-group-by%e3%81%ae%e3%81%82%e3%82%8c%e3%81%93%e3%82%8c","status":"publish","type":"post","link":"https:\/\/www.techscore.com\/blog\/2018\/12\/21\/%e3%81%a1%e3%82%87%e3%81%a3%e3%81%a8%e3%81%be%e3%81%98%e3%82%81%e3%81%ab%e3%83%87%e3%83%bc%e3%82%bf%e5%88%86%e6%9e%90-group-by%e3%81%ae%e3%81%82%e3%82%8c%e3%81%93%e3%82%8c\/","title":{"rendered":"A Somewhat Serious Look at Data Analysis: All About \"Group-By\""},"content":{"rendered":"
This is the day-21 article of the \ud83d\ude3aTECHSCORE Advent Calendar 2018\ud83d\ude3a<\/a>.<\/p>\n I recently joined a project that does data analysis in earnest, and this article covers a few things I checked while getting started.<\/p>\n Python is now the de facto standard for data analysis. My day-to-day work leans toward infrastructure, but I have always preferred programming, so I reach for Python for all sorts of things (mostly for fun). Even a quick log aggregation, with something like the Pandas module,<\/p>\n can be written as a one-liner like this, much like the Linux sort & uniq filters.<\/p>\n Now to the main topic: data analysis is an iteration of the operation \"split the data, apply some processing to each part, and combine the results\". 
To analyze data efficiently, I therefore compared several ways of grouping (Group-By) and aggregating in Python. <\/p>\n First, let's naively loop over the data and add the values into a dictionary.<\/p>\n For this test we prepare 10<sup>5<\/sup> pseudo-data records and aggregate them. Each record belongs to one of 100 categories and holds a value of 0 or 1. pseudo_data is the function that generates the pseudo data, and group_by is the function that groups and aggregates. Timing uses the timeit module and reports the average run time over 100 executions.<\/p>\n <\/p>\n In Python, itertools is the usual module for building iterators for loops, and it also provides a groupby iterator. The result looks something like this. The aggregation speed is<\/p>\n 
That is quite slow. For itertools.groupby to work as intended the input must already be sorted, so this extra pass is probably where the additional time goes.<\/p>\n <\/p>\n I already showed it at the beginning, but with Pandas, the data analysis library, the code is simple too.<\/p>\n The result is<\/p>\n which is reasonably fast. I had never actually found Pandas fast before, so this was a little surprising.<\/p>\n <\/p>\n Looking for a more efficient implementation, I found that there is an approach using sparse matrices. According to the Pandas documentation, it also looks easy to build a SciPy sparse matrix from a multi-index DataFrame.<\/p>\n The group_by function rewritten with this approach is shown below.<\/p>\n Here we use a sparse matrix in COO (coordinate) format, which is represented as a list of (row, column, value) tuples. 
To make random access to the matrix elements efficient, the DataFrame is created with a sorted index, and the totals are computed with the sum function provided by scipy.sparse.coo_matrix.<\/p>\n However, the result is not that fast.<\/p>\n <\/p>\n To look at the trend a little more, I plotted the computation time against the number of groups (G) and the data size (N).<\/p>\n <\/p>\n It is self-evident that computation time grows in proportion to data size. I have not traced the implementation closely, so I do not understand why Pandas performance degrades on small data sets. Perhaps I used to find it slow because I was working in this range\u2026 
This trend changes little when the number of groups is increased or decreased. As the number of groups grows, the sparse-matrix computation becomes stable in speed and shows good results.<\/p>\n (For a fair comparison, the runs were made with the data already in cache, and each point compares the methods on the same data sample.)<\/p>\n <\/p>\n In data analysis, where iterations are countless, the speed of the grouping step is very important. Implement it without thinking and processing can end up tens of times slower. It pays to consider the size of the data and the granularity of the grouping from the moment you start an analysis.<\/p>\n <\/p>\n","protected":false},"excerpt":{"rendered":" This is the day-21 article of the \ud83d\ude3aTECHSCORE Advent Calendar 2018\ud83d\ude3a.<\/p>\n 
I recently joined a project that does data analysis in earnest, and this article covers a few things I checked while getting started.\nKEY,VALUE\na,10\nb,12\nc,3\nb,3\nc,5<\/pre>\n
python -c \"import pandas; csv = pandas.read_csv('hoge.csv'); print(csv.groupby(csv.KEY).sum())\"\n VALUE\nKEY\na 10\nb 15\nc 8<\/pre>\n
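For comparison, the same aggregation can be sketched with only the standard library. This is an illustrative sketch rather than part of the original post: the helper name `sum_by_key` is hypothetical, and the CSV text is inlined here instead of being read from `hoge.csv`.

```python
import csv
import io
from collections import defaultdict

# Same rows as the hoge.csv sample, inlined so the sketch is self-contained.
CSV_TEXT = "KEY,VALUE\na,10\nb,12\nc,3\nb,3\nc,5\n"


def sum_by_key(text):
    """Sum the VALUE column per KEY (hypothetical helper for illustration)."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["KEY"]] += int(row["VALUE"])
    return dict(totals)


totals = sum_by_key(CSV_TEXT)
print(totals)
```

The totals match the Pandas output: 10 for a, 15 for b, and 8 for c.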
\nFor simple aggregation or one-pass processing, a lightweight approach like the one above is fine, but with somewhat larger data the speed of this operation starts to matter (think of a case where the number of iterations reaches many powers of ten and the point becomes clear).
\nA proper distributed data processing platform would be ideal, but setting one up just for a quick investigation or experiment is not practical.<\/p>\n
\nThe test environment used here is as follows.<\/p>\n\n
1. Naive summation<\/h4>\n
from collections import defaultdict\nimport random\nimport timeit\n\n\ndef pseudo_data(n, size, seed=12345678):\n    # Seed the generator so every run produces the same sample.\n    random.seed(seed)\n    keys = [\"KEY-{}\".format(i) for i in random.choices(range(0, n), k=size)]\n    values = random.choices([0, 1], k=size)\n    return keys, values\n\n\ndef group_by(keys, vals):\n    # Accumulate each value into a per-key running total.\n    d = defaultdict(int)\n    for key, val in zip(keys, vals):\n        d[key] += val\n    return d\n\n\nkeys, values = pseudo_data(n=10**2, size=10**5)\nresult = timeit.timeit(\"group_by(keys, values)\", globals=globals(), number=100)\nprint(\"Average time: {:.2e} sec.\".format(result\/100))\n<\/pre>\n
Average time: 1.11e-02 sec.<\/pre>\n
2. Using itertools<\/h4>\n
\nRoughly rewriting the group_by function above with this iterator gives<\/p>\nfrom itertools import groupby\n\ndef group_by(keys, vals):\n    # groupby only merges adjacent equal keys, so sort by key first.\n    data = zip(keys, vals)\n    sorted_pairs = sorted(data, key=lambda x: x[0])\n    return {key: sum(map(lambda x: x[1], val)) for key, val in groupby(sorted_pairs, lambda x: x[0])}<\/pre>\n
Average time: 9.54e-02 sec.<\/pre>\n
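To illustrate why the input has to be sorted, here is a small sketch on made-up data (the `pairs` list and the `first` helper are assumptions for illustration only). itertools.groupby starts a new group every time the key changes, so unsorted input splits one key across several groups.

```python
from itertools import groupby

# Tiny made-up sample: keys "a" and "b" are interleaved.
pairs = [("a", 1), ("b", 1), ("a", 1), ("b", 0)]


def first(pair):
    """Key function: group by the first element of each pair."""
    return pair[0]


# Without sorting, each run of adjacent equal keys becomes its own group,
# so "a" and "b" each show up twice.
unsorted_groups = [
    (key, sum(v for _, v in grp)) for key, grp in groupby(pairs, key=first)
]

# Sorting by key first makes equal keys adjacent: one group per key.
sorted_groups = [
    (key, sum(v for _, v in grp))
    for key, grp in groupby(sorted(pairs, key=first), key=first)
]

print(unsorted_groups)
print(sorted_groups)
```

The sort is what pushes this approach to O(N log N), whereas the dictionary loop is a single O(N) pass.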
3. Using Pandas<\/h4>\n
import pandas\n\ndef group_by(keys, vals):\n    return pandas.Series(vals).groupby(keys).sum().to_dict()<\/pre>\n
Average time: 1.16e-02 sec.<\/pre>\n
4. Reaching even higher<\/h4>\n
import numpy\nimport pandas\n\ndef group_by(keys, vals):\n    # Row index = group id of each record, column index = record position.\n    unique_keys, rows = numpy.unique(keys, return_inverse=True)\n    columns = numpy.arange(len(keys))\n    df = pandas.DataFrame(vals, index=[rows, columns])\n    # Note: DataFrame.to_sparse() was removed in pandas 1.0, so this needs an older pandas.\n    matrix = df.to_sparse().to_coo()\n    return dict(zip(unique_keys, matrix.sum(axis=1).flat))\n<\/pre>\n
Average time: 3.46e-02 sec.<\/pre>\n
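The idea behind the sparse-matrix approach can be sketched in plain Python, without SciPy. This is only an illustration of the COO layout, under the assumption that each record becomes one (row, column, value) triple with the row being its group; the helper names `to_coo` and `row_sums` are hypothetical and are not SciPy or Pandas APIs.

```python
from collections import defaultdict


def to_coo(keys, vals):
    """Encode the data as COO triples: (group row, record column, value)."""
    unique_keys = sorted(set(keys))
    row_of = {key: i for i, key in enumerate(unique_keys)}
    triples = [
        (row_of[key], col, val)
        for col, (key, val) in enumerate(zip(keys, vals))
    ]
    return unique_keys, triples


def row_sums(triples):
    """Sum the stored values along each row, i.e. over one group's records."""
    sums = defaultdict(int)
    for row, _col, val in triples:
        sums[row] += val
    return sums


keys = ["b", "a", "b", "c", "a"]
vals = [1, 0, 1, 1, 1]
unique_keys, triples = to_coo(keys, vals)
sums = row_sums(triples)
result = {key: sums[i] for i, key in enumerate(unique_keys)}
print(result)
```

Summing along each row of a G x N matrix is then exactly the group-by sum, which is what the `sum(axis=1)` call above computes on the SciPy matrix.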
\n[Figure: computation time of each method plotted against the number of groups (G) and the data size (N)]<\/p>\n
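A sweep over G and N of this kind could be sketched as follows. This is a minimal sketch under stated assumptions: it times only the naive dictionary loop, the grid of G and N values is arbitrary (the post does not state the ranges it measured), and `average_time` is a hypothetical helper.

```python
import random
import timeit
from collections import defaultdict


def group_by(keys, vals):
    """Naive dictionary accumulation, as in section 1."""
    d = defaultdict(int)
    for key, val in zip(keys, vals):
        d[key] += val
    return d


def average_time(n_groups, size, repeats=5, seed=0):
    """Average seconds per call for one (G, N) point (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed: every method would see the same sample
    keys = ["KEY-{}".format(rng.randrange(n_groups)) for _ in range(size)]
    vals = [rng.randint(0, 1) for _ in range(size)]
    total = timeit.timeit(lambda: group_by(keys, vals), number=repeats)
    return total / repeats


# Small, arbitrary grid so the sketch finishes quickly.
timings = {(g, n): average_time(g, n) for g in (10, 100) for n in (10**3, 10**4)}
for (g, n), sec in sorted(timings.items()):
    print("G={:>3} N={:>5}: {:.2e} sec.".format(g, n, sec))
```

Swapping in the other group_by implementations at the same (G, N) points and seed would give a like-for-like comparison.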
In closing<\/h4>\n
Read more...<\/a><\/p>\n","protected":false},"author":81,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[308,317,18],"tags":[141,120],"_links":{"self":[{"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/posts\/21578"}],"collection":[{"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/users\/81"}],"replies":[{"embeddable":true,"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/comments?post=21578"}],"version-history":[{"count":58,"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/posts\/21578\/revisions"}],"predecessor-version":[{"id":21813,"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/posts\/21578\/revisions\/21813"}],"wp:attachment":[{"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/media?parent=21578"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/categories?post=21578"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.techscore.com\/blog\/wp-json\/wp\/v2\/tags?post=21578"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}