Website for PDF Extraction

Introduction

This project provides a user-friendly platform for converting web pages into PDFs. Users enter website URLs, the server converts each one into a PDF and saves it, and users can then download the PDFs individually or merged into a single file.

Features

Convert Websites to PDF: Users can enter URLs of websites they wish to convert into PDFs.
Server-Side Conversion: The server will process the URLs and convert them to PDFs using Puppeteer.
Save PDFs on Server: The generated PDFs will be saved on the server, making them accessible for download.
Download PDFs: Users can download individual PDFs from the server.
Merge PDFs: Users have the option to merge multiple PDFs into a single PDF for easier downloading.
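
The server-side conversion step could look roughly like the sketch below. It is not taken from the repository: the function name urlToPdf, the output path, and the 'networkidle2' and A4 options are assumptions chosen for illustration. The function receives an already launched Puppeteer browser so that one browser instance can be reused for several URLs.

```javascript
// Hypothetical helper, not the project's actual code.
// `browser` is expected to be a Puppeteer Browser, e.g. from `await puppeteer.launch()`.
async function urlToPdf(browser, url, outPath) {
  const page = await browser.newPage();
  try {
    // Wait until network activity settles so late-loading content is rendered.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Write the rendered page to disk as a PDF (A4 is an assumed choice).
    await page.pdf({ path: outPath, format: 'A4', printBackground: true });
    return outPath;
  } finally {
    // Always release the tab, even if navigation or rendering fails.
    await page.close();
  }
}

// Usage (requires `npm install puppeteer`), inside an async function:
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   await urlToPdf(browser, 'https://example.com', 'example.pdf');
//   await browser.close();
```

Passing the browser in, rather than launching it inside the function, avoids starting a new browser process for every URL in a batch.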

Getting Started

To get started with the PDF Extraction website, follow these steps:

Install Node.js on your computer
Clone the GitHub repository and open the project in your preferred IDE
Run 'npm install' in the terminal
Start the local server with 'node server/server.js'
Open a browser and go to 'http://localhost:3000/'
You are ready to use the website

Usage

Add Website URLs:
- The text field for the first URL is already visible.
- To add more URLs, click the "Add New URL" button to add a new input field.
- Enter the desired website URLs in the input fields.
Convert Websites to PDFs:
- After adding the URLs, click on the "Extract PDFs" button to start the conversion process.
- The server will convert each website URL to a separate PDF file.
Save PDFs on Server:
- The converted PDFs will be saved on the server for easy access.
Download PDFs:
- Once the conversion is complete, a list of PDFs will be displayed on the website.
- Each PDF will have a "Download" link that allows users to download individual PDFs to their local device.
Merge PDFs (Optional):
- If desired, users can select multiple PDFs from the list and click on the "Merge PDFs and download" button.
- The server will combine the selected PDFs into a single merged PDF for easy downloading.
Deleting PDFs (Optional):
- If desired, users can delete PDFs from the server manually (the server also deletes old PDF files automatically).
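
As an illustration of the automatic cleanup mentioned above, here is a minimal sketch using only Node.js built-in modules. The function name deleteOldPdfs, its parameters, and the idea of filtering by file age are assumptions; the project's own cleanup logic may work differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical cleanup helper: removes .pdf files in `dir` whose last
// modification is at least `maxAgeMs` milliseconds old.
// Returns the names of the deleted files.
function deleteOldPdfs(dir, maxAgeMs, now = Date.now()) {
  const deleted = [];
  for (const name of fs.readdirSync(dir)) {
    // Leave anything that is not a PDF untouched.
    if (path.extname(name).toLowerCase() !== '.pdf') continue;
    const file = path.join(dir, name);
    const stats = fs.statSync(file);
    if (!stats.isFile()) continue;
    if (now - stats.mtimeMs >= maxAgeMs) {
      fs.unlinkSync(file);
      deleted.push(name);
    }
  }
  return deleted;
}
```

Calling this with a maxAgeMs of 0 when the server starts would remove every cached PDF, while a larger value run on a timer would remove only older files.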

Technologies Used

Frontend: HTML, CSS, JavaScript
Backend: Node.js, Express.js
PDF Conversion: Puppeteer

Goals

Enable downloading PDF files from the server (DONE, works on localhost)
Enable automatic deletion of PDF files from the server (DONE, the PDF cache is cleared each time the server is reloaded)
Delete URL box if user leaves it empty (DONE)
Implement loader (DONE)
Improve layout, fix visual problems (DONE)
Optimize CSS code (DONE)
Resolve issues with invalid user input
Implement merging
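
One possible approach to the open goal about user input is to validate each submitted URL before it reaches the converter. The sketch below relies on the built-in URL class; the function name normalizeUrl and the rule of accepting only http and https addresses are assumptions, not the project's current behavior.

```javascript
// Hypothetical validator: returns a normalized absolute http(s) URL,
// or null when the input is empty, malformed, or uses another scheme.
function normalizeUrl(input) {
  const trimmed = String(input ?? '').trim();
  if (trimmed === '') return null; // empty input box
  let parsed;
  try {
    parsed = new URL(trimmed); // throws on malformed or relative input
  } catch {
    return null;
  }
  // Reject schemes such as ftp:, file:, or javascript:.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;
  return parsed.href;
}
```

With this rule, an input without a scheme (for example 'example.com') is rejected rather than guessed at, so the server could respond with a clear error message instead of attempting a conversion.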

NOTE: Implementation of functionalities is still in progress, so there might be some issues using the website at the moment.

NOTE: The host branch fixes problems with hosting the website. The cache directory is now in the repo.
