macOS 15.2, Python 3.12, latest PyCharm Pro.
A Python project I am working on uses a package from another developer that requires docx2txt.
The command “pip install docx2txt” finds the latest docx2txt (0.8) and reports no obvious errors. However, the only addition to site-packages is the folder “docx2txt-0.8-dist-info”. In Python, “import docx2txt” fails.
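One way to confirm the symptom described above (metadata installed, but no importable module) is to ask Python what pip recorded for the distribution. This is only a diagnostic sketch using the standard library; the helper name describe_install is made up for illustration.

```python
from importlib import metadata, util


def describe_install(dist_name: str, module_name: str) -> dict:
    """Report what pip recorded for a distribution and whether its module is importable."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return {"installed": False, "version": None, "files": [], "importable": False}

    # dist.files is None when the metadata has no RECORD of installed files.
    files = [str(f) for f in (dist.files or [])]
    # find_spec returns None when no importable top-level module or package exists.
    importable = util.find_spec(module_name) is not None
    return {"installed": True, "version": dist.version, "files": files, "importable": importable}


if __name__ == "__main__":
    report = describe_install("docx2txt", "docx2txt")
    print("installed:", report["installed"], "version:", report["version"])
    print("importable:", report["importable"])
    for name in report["files"]:
        print("  ", name)
```

If the file list contains only entries under the dist-info folder and importable is False, that matches the behaviour reported here: pip wrote the metadata but not the package code.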
This was working until about five days ago, when I created a new project in PyCharm with venv.
pip is up to date and functional. Other packages install with pip as expected. I don’t think this is specific to PyCharm, as I have been through every check the web and AIs come up with.
The docx2txt package is no longer maintained. I cloned the project and attempted to update it to use setuptools, and added a .toml file. However, I don’t know how to build Python packages and it isn’t the focus of my tasks right now, so my attempt hasn’t been successful.
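For anyone attempting the same repackaging, here is a rough sketch of a minimal pyproject.toml for setuptools, written out from Python. It assumes the clone has a top-level docx2txt package directory and keeps version 0.8; neither has been checked against the actual repository, and the helper name write_pyproject is made up for illustration.

```python
from pathlib import Path

# Hypothetical minimal configuration. The package directory name and the
# version are assumptions about the cloned project, not verified facts.
PYPROJECT_TEXT = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "docx2txt"
version = "0.8"

[tool.setuptools]
packages = ["docx2txt"]
"""


def write_pyproject(project_dir: str) -> Path:
    """Write the sketch above to <project_dir>/pyproject.toml and return its path."""
    target = Path(project_dir) / "pyproject.toml"
    target.write_text(PYPROJECT_TEXT, encoding="utf-8")
    return target
```

With a file like this in the root of the clone, running "python -m pip install ." from that directory (inside the venv) should build and install the package, provided the layout assumption holds.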
Why are you installing the package manually? With PyCharm, you can download it via a few simple steps. By default, PyCharm uses pip to manage project packages.
Here are the few simple steps:
1. Click the gear icon in the upper right-hand corner of the IDE.
2. Go to Settings.
3. The Settings window will appear. Scroll down to Project: python project.
4. Click Python Interpreter.
5. Click the + button.
6. Begin typing the name of the package that you would like to install. Options should start appearing automatically. Select the desired package.
7. In the lower right-hand corner, click Install Package.
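Outside the IDE, a comparable check is to run pip through the same interpreter the project uses, so the package cannot end up in a different environment. The following is a small sketch of that idea; the function names are made up for illustration, and it simply shells out to "python -m pip install".

```python
import subprocess
import sys


def pip_install_command(package: str) -> list[str]:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package: str) -> None:
    """Run the install; raises CalledProcessError if pip exits with a non-zero status."""
    subprocess.run(pip_install_command(package), check=True)


# Example (not executed here): install("docx2txt")
```

Running this with the venv's interpreter and comparing sys.executable with the interpreter shown in the PyCharm settings helps rule out an environment mismatch.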
You are correct, of course. FTR, I have and do use that feature for all the package installs. I attempted manual installs, inside and outside of PyCharm, to see if the problem was specific to the IDE or not.
The result is the same regardless of the method. I believe the problem is that docx2txt, being unmaintained and not updated in years, is no longer compatible with the Python 3.12 packaging tools. Or something along those lines.
Yes, I am pleased to say I have. The dev is working on it. As I understand it, the dependency came along with another project, from one of the big outfits, that is used in the developer’s project.
What is odd is that my project was working fine until seven or ten days ago. docx2txt installed and my project loaded and ran without an error related to it. I do keep the IDE up to date and so sent a note to support at jetbrains in case that is related.
In case it sparks an idea, I am using Homebrew for Python and such things.
Given that, as you stated above, it is no longer maintained, is your colleague open to using an alternative? I know this will require some legwork, as it means porting from an unmaintained library to one that is maintained. It will help you going forward, for sure, however.
However, for this one, there is an auto-generated warning that is printed on the very first line of your text document. To remove it, you have to pay a license fee. There is a workaround, however. First, you will have to install the following package (via the PyCharm installer).
Spire.Doc
In the following script example, taken from the link above, I have added workaround code to remove the auto-generated line.
# Change directory to the Desktop so that you can easily find the new
# .txt file that is created; otherwise, use a path of your preference.
# (This is a Windows path; on macOS, point it at your own folder.)
import os
os.chdir(r'C:\Desktop')

from spire.doc import Document, FileFormat

document = Document()
# Load an existing Word file from the current directory - for testing purposes
document.LoadFromFile("Input.docx")
# Save the Word file in txt format
document.SaveToFile("WordToTxt.txt", FileFormat.Txt)
document.Close()

# Workaround to remove the first line from the text file (the auto-generated message)
with open('WordToTxt.txt', 'r') as fin:
    data = fin.read().splitlines(True)
with open('WordToTxt.txt', 'w') as fout:
    fout.writelines(data[1:])
The workaround entails reading the file and writing it back minus the very first line. This is accomplished via slicing.
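A slightly more cautious variant of the same slicing idea is sketched below: it removes the first line only when that line contains an expected marker, so real content is not lost if the warning is ever absent. The function name and the marker text "Evaluation Warning" are assumptions for illustration (check the actual first line of your output), as is the UTF-8 encoding.

```python
import tempfile
from pathlib import Path


def drop_first_line_if(path: str, marker: str) -> bool:
    """Remove the first line of a text file only when it contains `marker`.

    Returns True if a line was removed, False if the file was left untouched.
    """
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines(keepends=True)
    if not lines or marker not in lines[0]:
        return False
    # Slicing from index 1 keeps every line except the first.
    file.write_text("".join(lines[1:]), encoding="utf-8")
    return True


# Small self-contained demo on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "WordToTxt.txt"
    sample.write_text("Evaluation Warning: demo\nhello\nworld\n", encoding="utf-8")
    removed = drop_first_line_if(str(sample), "Evaluation Warning")
    remaining = sample.read_text(encoding="utf-8")
    removed_again = drop_first_line_if(str(sample), "Evaluation Warning")
```

In the demo, the first call removes the marker line and the second call leaves the file alone, because the new first line no longer matches.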