Python pyppeteer/puppeteer library - issues with dependencies and/or headless chromium

Hello! I’m a new user who has recently switched from macOS to Linux. I really enjoy using Fedora 35 on my oldish MacBook Air. It’s really performant! However, I have encountered an issue with a Python library that prevents me from developing my project.

Background information
I’m developing a Python (3.10.1) project that uses the pyppdf library. The pyppdf library depends on a puppeteer port – pyppeteer.

The issue
The pyppdf library uses pyppeteer to run headless chromium and generate PDFs from HTML. During the first run, pyppdf downloads a specific chromium version (86.0.4240.0).

Unfortunately, the HTML->PDF conversion never completes because chromium never launches (AFAIK) and my program is stuck until I stop it.

Troubleshooting attempts

  • I tried installing the suggested CentOS dependencies (some of them are not available for Fedora(?)): puppeteer/troubleshooting.md at main · puppeteer/puppeteer · GitHub
  • I tried installing various chromium and webdriver dnf packages
  • I tried running the chromium instance that was downloaded by pyppdf. It throws errors and the GUI app is practically unusable.
/home/rk/.local/share/pyppeteer/local-chromium/800217/chrome-linux/nacl_helper
[1:1:0112/222303.861702:ERROR:nacl_fork_delegate_linux.cc(329)] Bad NaCl helper startup ack (0 bytes)
[7670:7704:0112/222303.998922:ERROR:login_database.cc(663)] Password store database is too new, kCurrentVersionNumber=27, GetCompatibleVersionNumber=31
[7670:7704:0112/222303.999306:ERROR:password_store_default.cc(41)] Could not create/open login database.
[7670:7704:0112/222303.999343:ERROR:password_store_x.cc(225)] Could not start the migration into the encrypted LoginDatabase because the database failed to initialise.
[7699:7699:0112/222304.134663:ERROR:sandbox_linux.cc(374)] InitializeSandbox() called with multiple threads in process gpu-process.
  • I tried running the chromium instance downloaded by pyppdf with the following flags: ./chrome -v --no-sandbox --disable-setuid-sandbox. It throws the same errors but the GUI app is operable (websites are rendered). There’s also no way for me to pass these flags in my Python projects (I think).
  • I downloaded the newest chromium and ran it from the terminal with no issues
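
About passing flags from Python: pyppeteer's `launch()` accepts extra Chromium command-line flags through its `args` option, so they can be passed when pyppeteer is called directly instead of through pyppdf. The following is a minimal sketch under that assumption (pyppeteer installed; the helper names `build_launch_options` and `html_to_pdf` are hypothetical). It uses the `headless`, `args`, `dumpio` and `executablePath` launch options; `dumpio=True` forwards Chromium's own stdout/stderr, which may reveal why the browser fails to start.

```python
from typing import Optional


def build_launch_options(
    no_sandbox: bool = False,
    executable_path: Optional[str] = None,
    dumpio: bool = True,
) -> dict:
    """Build the options dict passed to pyppeteer.launch()."""
    args = []
    if no_sandbox:
        # Same flags as the manual ./chrome test above. They weaken
        # Chromium's security model, so treat them as a debugging aid only.
        args += ["--no-sandbox", "--disable-setuid-sandbox"]
    # dumpio=True pipes Chromium's stdout/stderr into this process.
    options = {"headless": True, "args": args, "dumpio": dumpio}
    if executable_path:
        # Use a system Chromium binary instead of pyppeteer's download.
        options["executablePath"] = executable_path
    return options


async def html_to_pdf(html: str, output_path: str, **launch_kwargs) -> None:
    """Render an HTML string to a PDF file with pyppeteer (hypothetical helper)."""
    # Imported lazily so build_launch_options works without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(build_launch_options(**launch_kwargs))
    try:
        page = await browser.newPage()
        await page.setContent(html)
        await page.pdf({"path": output_path, "format": "A4"})
    finally:
        # Always close Chromium, even if rendering fails.
        await browser.close()
```

Usage (file name hypothetical): `asyncio.run(html_to_pdf("<h1>Hello</h1>", "hello.pdf", no_sandbox=True))`. Passing `executable_path` pointing at a system Chromium binary (its real location can be checked with `rpm -ql chromium-headless`) would be an alternative to replacing the bundled download.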

Final thoughts
My app and the pyppdf library work just fine on macOS and Ubuntu with the same Python interpreter version.
I guess that there must be something wrong with the dependencies or the way that Fedora handles older chromium versions.
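One way to test the dependency theory is the check from Puppeteer's troubleshooting guide: run `ldd` on the bundled `chrome` binary and look for libraries reported as "not found". A small Python sketch of that check follows; `missing_libs` and `parse_missing_libs` are hypothetical helper names, and it assumes `ldd` (normally installed with glibc) is on the PATH.

```python
import subprocess
from typing import List


def parse_missing_libs(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved entries like: "libXss.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs(binary_path: str) -> List[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=False
    )
    return parse_missing_libs(result.stdout)
```

For example, `missing_libs("/home/rk/.local/share/pyppeteer/local-chromium/800217/chrome-linux/chrome")`; each reported name can then be looked up with `dnf provides '*/<library name>'`.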

I’d greatly appreciate your help! I googled a ton of threads before writing this post and found no solution.


Hello,

That’s a cool python library. Thanks for sharing.

I tried it and it works perfectly in my ‘test’ environment. The problem is that I have a bunch of Python packages installed from testing other things, so I don’t know whether its dependencies are just met by chance, but pyppdf works out of the box for me.

I am on Fedora 34, using PyCharm with a Python 3.9 venv and the latest pyppdf (0.1.2). Maybe try a clean virtual environment and see what error pyppdf throws. Perhaps you were working with Python 3.10 and an older version of pyppdf? By the way, depending on the website, it can take a while to download the page and write the PDF.
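A clean environment can be created with `python -m venv` or, equivalently, with the standard-library `venv` module. A minimal sketch; the function name `make_clean_env` is hypothetical, and `clear=True` deletes any existing environment at that path before recreating it.

```python
import venv
from pathlib import Path


def make_clean_env(path: str, with_pip: bool = True) -> Path:
    """Create (or recreate) an isolated virtual environment at *path*."""
    env_dir = Path(path)
    # clear=True wipes an existing environment, so no leftover packages remain.
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True)
    builder.create(env_dir)
    return env_dir
```

Afterwards, install only pyppdf into it (for example `<env>/bin/pip install pyppdf`) and run the test script with `<env>/bin/python`.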

Also, how do you use it on your laptop? Do you use an IDE at all? I tested it from a Python console in PyCharm with pyppdf.save_pdf('path/to/output_file', 'https://target_website.html') and it’s fine.

Chromium also works fine. When I start it, it throws the same errors as yours and opens an error window about my profile, but it just works once I close that window. I imagine the --no-sandbox flag is highly undesirable, so try to avoid it if possible.


Hi @stiky, thank you for your answer! I’m glad that you’ve found the lib interesting :slight_smile:

I’m using Visual Studio Code as my code editor. I tried to replicate your environment by using Python 3.9 and the latest 0.1.2 pyppdf lib.
I ran the following code both in the integrated terminal and as a .py file.
pyppdf.save_pdf('/home/rk/git/test/', 'https://rafalkaron.github.io')

Unfortunately, the issue persists. I wait and wait and nothing happens; the code is stuck.
I also tried converting a local .html file (I actually do that in my project) and removing the /home/rk/.local/share/pyppeteer/ directory prior to running the code.
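Because the call hangs instead of failing, it can help to run the conversion in a separate Python process with a time limit, so a stuck Chromium launch turns into a visible failure. A minimal standard-library sketch; `run_with_timeout` is a hypothetical helper, and on timeout only the child Python process is killed, so any Chromium process it started may need to be cleaned up by hand.

```python
import subprocess
import sys
from typing import Tuple


def run_with_timeout(code: str, timeout: float) -> Tuple[bool, str]:
    """Run *code* in a fresh Python process and give up after *timeout* seconds.

    Returns (finished, output); finished is False if the time limit expired.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return False, ""
    return True, result.stdout + result.stderr
```

For example (output path hypothetical): `run_with_timeout("import pyppdf; pyppdf.save_pdf('/tmp/test.pdf', 'https://rafalkaron.github.io')", 120)`.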

It looks like the most obvious differences between our environments are:

  • PyCharm vs. VSCode
  • Fedora 34 vs. Fedora 35

I’ll try installing PyCharm; maybe it’ll install some magical dependencies that save the day. However, I’m inclined to think that my Fedora version is the culprit here.

Edit: I tried running the code again in PyCharm and I got the following error:

>>> import pyppdf
>>> pyppdf.save_pdf('/home/rk/git/test/', 'https://rafalkaron.github.io')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rk/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/pyppdf/pyppeteer_pdf.py", line 256, in save_pdf
    return asyncio.get_event_loop().run_until_complete(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/rk/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/pyppdf/pyppeteer_pdf.py", line 115, in main
    browser = await launch(*_launch.args, **_launch.kwargs)
  File "/home/rk/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "/home/rk/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/home/rk/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

I’d greatly welcome any further ideas on how to solve my issue!

Hi, I just want to share my thoughts. Not sure if it helps or not.

When Chromium is installed, each user gets their own config inside ~/.config/chromium. I just wonder whether you had a different Chromium version installed before (say, a newer one) and then installed the older one; in that case the older Chromium version would use this config directory, or other Chromium-related config inside your home directory.

I’m not sure how to solve this, but maybe you need to remove every Chromium version from your system except the one you need. Then test it by creating a new user, so that your desired Chromium version starts with a clean config (or just delete the Chromium config directory and anything else related to it in your current user’s home folder).
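Rather than deleting the Chromium config outright, it can be moved aside so it can be restored later. A minimal sketch, assuming the default location `~/.config/chromium`; `backup_chromium_config` is a hypothetical helper, and Chromium should be closed before running it.

```python
import time
from pathlib import Path
from typing import Optional


def backup_chromium_config(home: Optional[str] = None) -> Optional[Path]:
    """Rename ~/.config/chromium so Chromium starts with a fresh profile.

    Returns the backup path, or None if there was no config to move.
    """
    config = Path(home or Path.home()) / ".config" / "chromium"
    if not config.exists():
        return None
    # Timestamped name, e.g. chromium.bak-20220112-222303, next to the original.
    backup = config.with_name(f"chromium.bak-{time.strftime('%Y%m%d-%H%M%S')}")
    config.rename(backup)
    return backup
```

To undo it, rename the backup directory back to `chromium`.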

The messages above look like they are caused by a Chromium version incompatibility.

I created a Fedora 35 virtual machine to test this and finally got it working, but I have to say I think the issue is with Fedora 35. I tried F35 before and many things stopped working, so I downgraded to F34 again. A bit disappointing, to be honest. Anyway, sorry for the rant.

So basically I think the Chromium that comes with pyppdf is too old for F35. I downloaded and manually installed chromium-headless, but I think you can just run sudo dnf install chromium-headless. Then I went to the install folder /usr/lib64/chromium-browser and copied its contents to /home/USER/.local/share/pyppeteer/local-chromium/800217/chrome-linux. Just copy/paste, replacing and merging everything. Finally, delete the original chrome executable and rename chromium-browser to chrome.
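The copy-and-rename steps above could be scripted roughly as follows. This is a sketch of the described manual procedure, not a tested tool: the function name is hypothetical, it assumes Python 3.8+ (for `dirs_exist_ok`) and that the system folder contains a `chromium-browser` executable as described, and it overwrites files in pyppeteer's folder, so backing that folder up first is advisable.

```python
import shutil
from pathlib import Path


def replace_bundled_chromium(system_dir: str, bundled_dir: str) -> Path:
    """Merge system Chromium files into pyppeteer's folder and make
    chromium-browser the new 'chrome' executable."""
    src, dst = Path(system_dir), Path(bundled_dir)
    # Copy and merge, overwriting files that exist in both trees
    # (copy2 keeps permission bits, so executables stay executable).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    old_chrome = dst / "chrome"
    if old_chrome.exists():
        # Delete the original bundled executable.
        old_chrome.unlink()
    # Rename the copied system binary to the name pyppeteer launches.
    (dst / "chromium-browser").rename(old_chrome)
    return old_chrome
```

Usage, with the paths from the post: `replace_bundled_chromium("/usr/lib64/chromium-browser", str(Path.home() / ".local/share/pyppeteer/local-chromium/800217/chrome-linux"))`.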

Now pyppdf is working as expected, and I can launch the pyppeteer version of Chrome just like the normal browser. One more thing: when I first tried to use pyppdf, I did not get the same errors as you, so I hope this solution works for you.

Thank you both for helping me out! My issue has just gotten more interesting.

I started off by removing the chromium config and it did resolve the chromium-profile mismatch errors! Thanks, @oprizal!

However, I still couldn’t get pyppdf working… I tried to force-update the pyppdf chromium version using @stiky’s method. It didn’t help, so I decided to test it in a cleaner environment and… I caused a kernel panic :smiley:

Steps to reproduce:

  1. Create a new Fedora user.
  2. Install pyppdf, test-run. The issue persists - the code’s stuck right after pyppdf finishes downloading chromium.
  3. Download a newer chromium version and try to run it in the terminal.
  4. The downloaded chromium won’t start at all because of some missing dependencies.
  5. Run sudo dnf update and sudo dnf upgrade (if I recall correctly).
  6. Some regex version mismatch errors show up in the terminal (rel? Regex version mismatch, expected: 10.39 Fedora 35) and the system freezes.
  7. Power-cycle the machine to find the following error: Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
    Note: I tried to boot from different kernel versions but it didn’t help.

Fortunately, I have macOS on another physical drive, so my laptop is still functional :slight_smile: However, I’m not ready to give up on Fedora and I’d appreciate your help! Is there any hope of restoring my system?

I see that there is already a solution in the post you linked. Have you tried it? I cannot help you with the kernel panic, but if it’s relevant, just try the solution in the other post. Last time I did something similar to my OS, I wiped everything. Since then I keep my /home folder on a separate drive.

As for your original issue, if you manage to recover your system, try testing in a clean environment. I believe my solution should work for you because I tried it on Fedora 35. Worst case, try Fedora 34.

Since it looks like you won’t be able to boot into Fedora, maybe you can run the solution from the link (sudo semodule -B) via chroot.
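For reference, the chroot route usually means mounting the installed system from a live USB, bind-mounting the pseudo-filesystems, and running the command inside the chroot as root. The sketch below only builds that command list (a dry run) instead of executing anything; it assumes Fedora's default Btrfs layout with a root subvolume named `root`, and the helper name and the device in the usage example are hypothetical.

```python
from typing import List


def chroot_repair_plan(root_device: str, mountpoint: str = "/mnt/sysroot",
                       subvol: str = "root") -> List[str]:
    """Return (not run) the shell commands for a chroot repair from a live USB."""
    plan = [
        f"mkdir -p {mountpoint}",
        # Mount the broken system's root subvolume.
        f"mount -o subvol={subvol} {root_device} {mountpoint}",
    ]
    # Bind the live system's pseudo-filesystems into the broken install.
    for fs in ("dev", "proc", "sys", "run"):
        plan.append(f"mount --bind /{fs} {mountpoint}/{fs}")
    # Rebuild the SELinux policy inside the chroot, as suggested in the thread.
    plan.append(f"chroot {mountpoint} semodule -B")
    return plan
```

For example, `for cmd in chroot_repair_plan("/dev/sda3"): print(cmd)`, then run the printed commands manually after confirming the real device with `lsblk`.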

Hi again!

@stiky, I tried your chromium workaround and it worked perfectly on a clean F35 installation (VM). Thanks!

@oprizal, thank you for linking the article about the chroot command. I have successfully chrooted into my broken Fedora 35 instance but running the suggested command didn’t help.

I couldn’t really run anything because I constantly got the following error:

error while loading shared libraries: /lib64/libpcre2-8.so.0: cannot read file data

It looks like I removed some critical lib while trying to resolve my original chromium issue.
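The "cannot read file data" message generally means the dynamic loader could not read the expected ELF header bytes, for example because the file is truncated or empty, rather than that it is missing. A quick sanity check, sketched here with a hypothetical helper, is whether the library still starts with the 4-byte ELF magic number; it assumes the broken system is mounted at `/mnt/sysroot` from a live USB.

```python
from pathlib import Path

# Every ELF file (executables and shared libraries) starts with these bytes.
ELF_MAGIC = b"\x7fELF"


def looks_like_valid_elf(path: str) -> bool:
    """Cheap check: the file is readable and starts with the ELF magic."""
    try:
        with Path(path).open("rb") as f:
            # An empty or truncated file returns fewer than 4 bytes here.
            return f.read(4) == ELF_MAGIC
    except OSError:
        # Missing file, directory, permission or I/O error.
        return False
```

For example, `looks_like_valid_elf("/mnt/sysroot/usr/lib64/libpcre2-8.so.0")`; `rpm --root /mnt/sysroot -V pcre2` should also flag a file whose size or digest no longer matches the package.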

Anyway, I think I’ll reinstall the system, because recovering the old one could be more time-consuming. Also, I couldn’t determine what was causing the pyppdf error there.

I’ll try to be more careful in the future :slight_smile:

EDIT: Would you recommend any backup solutions? I can see that Timeshift does not like F35 much…

I use Timeshift. The other alternative I use is Snapper, but it’s confusing.

Yes, Timeshift from the official repos does not work with F35. I created Copr repos here that work with F35. Timeshift has always saved me when I messed something up. :smiley:

If you’re using Timeshift and want to use Btrfs snapshots, you need to rename your root Btrfs subvolume to @ and the home subvolume to @home. Then edit fstab and update GRUB. You also need to update the boot parameters with sudo grubby so they point to the new subvolume name.
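The fstab part of that procedure can be sketched as a text substitution. This assumes Fedora's default subvolume names `root` and `home`; the helper name is hypothetical, the subvolumes themselves still have to be renamed (for example with `mv` on the top-level Btrfs mount), and the `grubby` command is only shown as an argument list, not run.

```python
import re

# Subvolume renames Timeshift expects (Fedora default names on the left).
RENAMES = {"root": "@", "home": "@home"}


def rewrite_fstab(text: str) -> str:
    """Rewrite subvol=root / subvol=home mount options to @ and @home."""
    def repl(match: "re.Match[str]") -> str:
        name = match.group(1)
        return f"subvol={RENAMES.get(name, name)}"

    # Match the whole option value, so names like subvol=home2 stay untouched.
    return re.sub(r"subvol=([^,\s]+)", repl, text)


# Kernel command line equivalent (to be run as root, e.g. via subprocess.run).
GRUBBY_CMD = [
    "grubby", "--update-kernel=ALL",
    "--remove-args=rootflags=subvol=root",
    "--args=rootflags=subvol=@",
]
```

Reading `/etc/fstab`, passing it through `rewrite_fstab`, and reviewing the diff before writing it back keeps the change easy to check.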