Python vs C++ on sorting dict

I did a simple test which puts an uuid and a random as key to a dict, and the value to the dict is another random. If I put 10 thousand entries in the dict and sort it, it takes about 2 minutes with CPython or PyPy. numba can not run this script.

But it takes just less than 1 second in an equivalent in C++. So is Python normally hundreds times slower than C++?

fff8534b-ce71-430d-a58e-c38026756af2 0.87239 0.257589
fffad9fa-eb19-45e4-ba5e-2d303efbe5b3 0.928528 0.761864
fffeae1d-7b51-4fb0-a02e-48c51d00422d 0.816001 0.522104

EDIT:
My code was wrong. If I move sorted() out of loop, it is as fast as C++ version for sorting 1 million entries in dict.

import random
import math
import string
import uuid
from numba import jit

class K:
    def __init__(self, a='', b=0.0):
        self.a = a
        self.b = b

    def __hash__(self):
        return hash((self.a, self.b))

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __lt__(self, other):
        return (self.a, self.b) < (other.a, other.b)

@jit(nopython=True)
def f():
    d = {}
    for i in range(10_000): 
        d[K(uuid.uuid4(), random.random())] = random.random() 
    d = dict(sorted(d.items(), key=lambda item: item[0]))  # out of loop

f()

The C++ version as following:

#include <stdlib.h>
#include <time.h>
#include <iostream>
#include <map>
#include <tuple>
using namespace std;
#include <boost/uuid/uuid.hpp>
#include <boost/uuid/uuid_io.hpp>
#include <boost/uuid/uuid_generators.hpp>

struct K {
    string a;
    double b;
    K(string a, double b){
        this->a = a;
        this->b = b;
    }
    friend bool operator == (const K &a, const K &b){
        return make_tuple(a.a, a.b) == make_tuple(b.a, b.b);
    }
    friend bool operator < (const K &a, const K &b){
        return make_tuple(a.a, a.b) < make_tuple(b.a, b.b);
    }
};

size_t f(){
    srand((unsigned)time(NULL));
    map<K, double> m;
    for (int i = 0; i != 10000; i++){
        string a = to_string(boost::uuids::random_generator()());
        double b = rand() / (double)RAND_MAX;
        m[K(a, b)] = rand() / (double)RAND_MAX;
    }

    return m.size();
}

int main(){
    cout << f() << endl;
    return 0;
}

You’re asking Python to sort the dict 10_000 times – after each insertion.
If you move the d = dict(sorted(d… line out of the loop you’ll find it’s much faster.

1 Like

Thanks Petr,

You’re right. If I indent the sorted out of loop and let it sort 1 million entries. It takes about 20 seconds which is same as C++ version with -O3 optimization. The PyPy takes 10 more seconds.

It seems Python is just as fast as C++ :slight_smile: Why some people say Python is slow.

$ date && python3 main.py && date
Wed Mar 23 20:38:08 CST 2022
1000000
Wed Mar 23 20:38:28 CST 2022
$
$ date && ~/Downloads/pypy3.9-v7.3.8-osx64/bin/pypy main.py && date
Wed Mar 23 20:38:31 CST 2022
1000000
Wed Mar 23 20:39:04 CST 2022
$

$ make && date && ./main && date
make: `main’ is up to date.
Wed Mar 23 21:10:03 CST 2022
1000000
Wed Mar 23 21:10:24 CST 2022
$

The truth is that you can’t say a language is faster or slower than another – it all depends on the specific code, what parts of the languages in question you use and how you use them. Here, your processing time is likely dominated by the sorting. CPython’s algorithm is Timsort, which is a smart and heavily optimized algorithm designed by Tim Peters specifically for Python. I would not be surprised if the magic of Timsort is what compensates for the overhead of interpreting Python code over running compiled and optimized C++ code.

Usually, if you rewrite a Python program in C/C++/Rust, you will find the new version faster, yes, but there are many "however"s. One is if you are writing numeric code. You can use NumPy, which is itself written in C and bypasses the Python interpreter for vector operations, which means that if your code is spending most of its time in the numeric calculations, it will likely be as fast as C code, or faster since the NumPy folks spend a lot of time optimizing NumPy. Another is that for many tasks, you can gain much more by finding an optimal algorithm than through microbenchmarks. Since Python is more convenient than these languages (especially C/C++), it will be easier to experiment with alternative algorithms. A last one is that if you do expensive operations for which Python has built-in functions, such as sorting, it will likely be as fast because CPython itself is written in C and its algorithmic parts (string matching, sorting, etc.) have received the attention of excellent algorithmicians.

3 Likes

Thanks Jean!

Hi Petr,

I did a new test on C++. The map in C++ will sort its elements everytime when new one is added. I thought this could be slower because there are so many sorting.

So I used unordered_map in my new test. It will not sort its elements everytime when new one is added. Because it is hash map in C++ and it can not be sorted.

I will then sort it only once after all the elements are added into unordered_map by converting unordered_map to map:

unordered_map<K, double> um;
...
map m(um.begin(), um.end());  // construct map from unordered_map

But, at the end, the new test with unordered_map is not faster but slower than the old test with only map.

#include <unordered_map>

namespace std {
    template <>
    struct hash<K> {
        std::size_t operator()(const K& k) const {
            return hash<string>()(k.a) ^ hash<double>()(k.b);
        }
    };
}

size_t f(){
    srand((unsigned)time(NULL));
    unordered_map<K, double> um;
    for (int i = 0; i != 1000000; i++){
        string a = to_string(boost::uuids::random_generator()());
        double b = rand() / (double)RAND_MAX;
        um[K(a, b)] = rand() / (double)RAND_MAX;
    }
    map m(um.begin(), um.end());  // construct map from unordered_map

    return m.size();
}

BTW, Timsort used in Python is written in C, right?

Thanks

map keeps entries in sorted order, but does not sort after each entry is added. Instead it uses a form of binary search tree under the covers to store the entries. If there are N elements in a map, adding a new entry takes O(log N) time. If it sorted instead, adding an entry would take O(N * log N) time. Huge difference.

In CPython (the C implementation of Python distributed by python.org), yes.

3 Likes

Thank you, Tim