Project2vec - a searchable database of projects on PyPI

Hi everyone,

As part of project Thoth, we have experimented with building a searchable database of the packages hosted on PyPI. The solution uses keyword extraction to create a state space of PyPI projects that can be queried and explored further. Please do not hesitate to let us know if you find it useful or would like to combine efforts in some way.

The solution is described in this article, which also links to the sources available in the thoth-station/isis-api and thoth-station/selinon-worker repositories.

Thanks and have a great day!


I didn’t see it in the linked article, so the likely obvious question—is the 2vec part indicating this uses some kind of word2vec algorithm?


No, it does not use word2vec. The "2vec" part was chosen to indicate a vectorized representation of a project.
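
To give a rough idea of what a "vectorized representation" built from keywords can look like, here is a minimal sketch. It is not the actual Project2vec code: the keyword sets, the function names, and the choice of binary vectors with cosine similarity are all assumptions made purely for illustration.

```python
import math

# Hypothetical keyword sets; in practice these would come from keyword
# extraction over PyPI project metadata.
PROJECT_KEYWORDS = {
    "flask": {"web", "framework", "wsgi", "http"},
    "django": {"web", "framework", "orm", "http"},
    "numpy": {"array", "numeric", "math"},
}


def build_vocabulary(project_keywords):
    """Return a sorted list of all keywords; each one is a vector dimension."""
    return sorted(set().union(*project_keywords.values()))


def vectorize(keywords, vocabulary):
    """Encode a keyword set as a binary vector over the vocabulary."""
    return [1 if word in keywords else 0 for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(name, project_keywords):
    """Rank all other projects by similarity to the named project."""
    vocabulary = build_vocabulary(project_keywords)
    vectors = {p: vectorize(k, vocabulary) for p, k in project_keywords.items()}
    target = vectors[name]
    scores = [
        (cosine_similarity(target, vec), project)
        for project, vec in vectors.items()
        if project != name
    ]
    return [project for _, project in sorted(scores, reverse=True)]


print(most_similar("flask", PROJECT_KEYWORDS))
```

With the made-up data above, "flask" and "django" share three of their four keywords, so "django" is ranked ahead of "numpy", which shares none.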