---
title: "Smalot\\Pdf Parser Library for Joomla - WebTolk websites development, Joomla Extensions"
description: "Pdf parser library. Can read and extract information from pdf file."
url: "https://web-tolk.ru/en/dev/biblioteki/smalot-pdf-parser-php-biblioteka-dlya-joomla"
date: "2026-06-13T22:30:27+00:00"
language: "en-GB"
---

# Smalot\Pdf Parser Library for Joomla

- **Categories:** [Libraries](https://web-tolk.ru/en/dev/biblioteki), [Joomla 4 - Joomla 6 extensions](https://web-tolk.ru/en/dev/rasshireniya-dlya-joomla-4)
- **Version:** 2.1.0
- **Date:** 17 March 2022


[Download](https://web-tolk.ru/en/get?element=smalotpdfparser) | [Versions](https://web-tolk.ru/en/dev/biblioteki/smalot-pdf-parser-php-biblioteka-dlya-joomla/versions) | [GitHub](https://github.com/smalot/pdfparser)

PDF parser library that can read and extract information from PDF files. The library is wrapped for Joomla 3 and Joomla 4.

![Smalot\Pdf Parser Library for Joomla](https://web-tolk.ru/images/swjprojects/projects/38/en-GB/icon.jpg)

## Description

## Reading PDF files in Joomla

The code example below covers Joomla 3 and Joomla 4.

```
<?php
defined('_JEXEC') or die('Restricted access');

use Smalot\PdfParser\Parser;

// Register the library namespace; the path differs between Joomla versions.
if (version_compare(JVERSION, '4.0', '<'))
{
	// Joomla 3
	JLoader::registerNamespace('Smalot', JPATH_LIBRARIES);
}
else
{
	// Joomla 4
	JLoader::registerNamespace('Smalot', JPATH_LIBRARIES . '/Smalot');
}

// Path to the PDF, relative to the site root
$file_name     = 'images/path_to_file.pdf';
$parser        = new Parser();
$pdf           = $parser->parseFile(JPATH_SITE . '/' . $file_name);
$pdf_meta_data = $pdf->getDetails();
```

## Usage

First create a parser object and point it to a file.

```
$parser = new \Smalot\PdfParser\Parser();

$pdf = $parser->parseFile('document.pdf');
// ... or ...
$pdf = $parser->parseContent(file_get_contents('document.pdf'));
```

### Extract text

A common scenario is to extract text.

```
$text = $pdf->getText();

// or extract the text of a specific page (in this case the first page)
$text = $pdf->getPages()[0]->getText();
```
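
When the text is needed page by page, for example to index pages separately, the single-page call above can be applied in a loop. The following is a minimal sketch: `collectPageTexts()` is a hypothetical helper name, not part of the library, and it assumes only that each element exposes `getText()`, as the page objects returned by `$pdf->getPages()` do.

```php
<?php
/**
 * Collect the text of every page, keyed by 1-based page number.
 * Hypothetical helper: relies only on each page exposing getText().
 */
function collectPageTexts(iterable $pages): array
{
    $texts  = [];
    $number = 1;

    foreach ($pages as $page) {
        // Trim leading and trailing whitespace left over from extraction.
        $texts[$number++] = trim($page->getText());
    }

    return $texts;
}

// Usage with a parsed document:
// $texts = collectPageTexts($pdf->getPages());
// echo $texts[1]; // text of the first page
```

Keying by page number rather than array index keeps the result aligned with the page numbers a reader sees in a PDF viewer.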

### Extract text positions

You can extract the transformation matrix (indexes 0-3) and the x,y position (indexes 4 and 5) of text objects.

```
$data = $pdf->getPages()[0]->getDataTm();

// Example output of print_r($data):
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => 0.999429
                    [1] => 0
                    [2] => 0
                    [3] => 1
                    [4] => 201.96
                    [5] => 720.68
                )

            [1] => Document title
        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => 0.999402
                    [1] => 0
                    [2] => 0
                    [3] => 1
                    [4] => 70.8
                    [5] => 673.64
                )

            [1] => Calibri : Lorem ipsum dolor sit amet, consectetur a
        )
)
```
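
The nested array is easier to work with once it is flattened. As a minimal sketch (the helper name `textPositions()` is an assumption, not a library function), the code below turns each `getDataTm()` entry into an `x`/`y`/`text` record and sorts the records top to bottom, then left to right. It assumes the default PDF coordinate system, where the origin is the bottom-left corner, so larger `y` values are higher on the page.

```php
<?php
/**
 * Flatten getDataTm() entries into x/y/text records in reading order.
 * Hypothetical helper; assumes a bottom-left coordinate origin.
 */
function textPositions(array $dataTm): array
{
    $items = [];

    foreach ($dataTm as $entry) {
        // $entry[0] is the matrix; indexes 4 and 5 hold x and y.
        $items[] = [
            'x'    => (float) $entry[0][4],
            'y'    => (float) $entry[0][5],
            'text' => (string) $entry[1],
        ];
    }

    usort($items, function (array $a, array $b): int {
        if ($a['y'] !== $b['y']) {
            return $b['y'] <=> $a['y']; // higher on the page first
        }

        return $a['x'] <=> $b['x'];     // then left to right
    });

    return $items;
}

// Sample entries taken from the output above, in reverse order.
$sample = [
    [[0.999402, 0, 0, 1, 70.8, 673.64], 'Calibri : Lorem ipsum dolor sit amet, consectetur a'],
    [[0.999429, 0, 0, 1, 201.96, 720.68], 'Document title'],
];

foreach (textPositions($sample) as $item) {
    printf("%.2f, %.2f: %s\n", $item['x'], $item['y'], $item['text']);
}
```

Text on the same visual line can differ slightly in `y`, so real documents may need a tolerance when comparing vertical positions.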

When enabled through the Config setting (`Config::setDataTmFontInfoHasToBeIncluded(true)`), the font identifier (index 2) and font size (index 3) are added to each dataTm entry.

```
// create config
$config = new Smalot\PdfParser\Config();
$config->setDataTmFontInfoHasToBeIncluded(true);

// use config and parse file
$parser = new Smalot\PdfParser\Parser([], $config);
$pdf = $parser->parseFile('document.pdf');

$data = $pdf->getPages()[0]->getDataTm();

// Example output of print_r($data):
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => 0.999429
                    [1] => 0
                    [2] => 0
                    [3] => 1
                    [4] => 201.96
                    [5] => 720.68
                )

            [1] => Document title
            [2] => R7
            [3] => 27.96
        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => 0.999402
                    [1] => 0
                    [2] => 0
                    [3] => 1
                    [4] => 70.8
                    [5] => 673.64
                )

            [1] => Calibri : Lorem ipsum dolor sit amet, consectetur a
            [2] => R9
            [3] => 11.04
        )
)
```
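
With font information included, entries can be filtered by font size, for instance to pick out likely headings. This is a minimal sketch under stated assumptions: `largestFontEntries()` is a hypothetical helper, and treating the largest font size on a page as a heading is a heuristic, not something the library guarantees.

```php
<?php
/**
 * Return the text of all entries that use the largest font size.
 * Hypothetical helper; expects entries with font info at indexes 2 and 3.
 */
function largestFontEntries(array $dataTm): array
{
    // Keep only entries that carry a font size (index 3).
    $withFont = array_filter($dataTm, function (array $entry): bool {
        return isset($entry[3]);
    });

    if ($withFont === []) {
        return [];
    }

    $sizes = array_map(function (array $entry): float {
        return (float) $entry[3];
    }, $withFont);
    $maxSize = max($sizes);

    $texts = [];
    foreach ($withFont as $entry) {
        if ((float) $entry[3] === $maxSize) {
            $texts[] = (string) $entry[1];
        }
    }

    return $texts;
}

// Usage with the array returned by getDataTm():
// $headings = largestFontEntries($pdf->getPages()[0]->getDataTm());
```

Entries without font information are skipped, so the helper returns an empty array when the Config setting is not enabled.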

Text width should be calculated on text taken from dataTm to make sure all character widths are available. The next example uses the `$data` array from above.

```
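$fonts   = $pdf->getFonts();
$font_id = $data[0][2]; // font identifier, "R7" in the output above
$font    = $fonts[$font_id];
$text    = $data[0][1]; // "Document title"
// $missing is passed by reference and receives the characters whose width could not be determined.
$width   = $font->calculateTextWidth($text, $missing);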
```

### Extract metadata

You can also extract metadata. The available data varies from PDF to PDF.

```
$metaData = $pdf->getDetails();

// Example output of print_r($metaData):
Array
(
    [Producer] => Adobe Acrobat
    [CreatedOn] => 2022-01-28T16:36:11+00:00
    [Pages] => 35
)
```
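
Because the available keys vary, code that reads the details array should not assume that any particular key exists. A minimal sketch, using a hypothetical helper `detail()` that returns a fallback value for missing or non-scalar entries:

```php
<?php
/**
 * Read one metadata value as a string, with a fallback.
 * Hypothetical helper, not part of the library.
 */
function detail(array $details, string $key, string $default = 'n/a'): string
{
    if (!isset($details[$key]) || !is_scalar($details[$key])) {
        // Key is absent, null, or not a plain value.
        return $default;
    }

    return (string) $details[$key];
}

// Sample array mirroring the output above.
$sampleDetails = [
    'Producer'  => 'Adobe Acrobat',
    'CreatedOn' => '2022-01-28T16:36:11+00:00',
    'Pages'     => 35,
];

echo detail($sampleDetails, 'Producer'), "\n";
echo detail($sampleDetails, 'Title', 'Untitled'), "\n";
```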

### Read Base64 encoded PDFs

If working with [Base64](https://en.wikipedia.org/wiki/Base64) encoded PDFs, you might want to parse the PDF without saving the file to disk.

This sample parses a Base64-encoded PDF and extracts the text of the whole document.

```
<?php
// Parse the Base64-encoded PDF string in $base64PDF (assumed to be set elsewhere) and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseContent(base64_decode($base64PDF));

$text = $pdf->getText();
echo $text;
```
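
By default `base64_decode()` silently skips invalid characters, so malformed input can reach the parser. The following is a minimal sketch of a guard, assuming a hypothetical helper `decodeBase64Pdf()`: it decodes in strict mode and checks for the `%PDF-` signature before the bytes are handed to `parseContent()`.

```php
<?php
/**
 * Decode a Base64 string and verify that it looks like a PDF.
 * Hypothetical helper, not part of the library.
 */
function decodeBase64Pdf(string $base64): string
{
    // Strict mode returns false on characters outside the Base64 alphabet.
    $binary = base64_decode($base64, true);

    if ($binary === false) {
        throw new InvalidArgumentException('Input is not valid Base64.');
    }

    // Simplification: require the "%PDF-" header at the very start.
    if (strncmp($binary, '%PDF-', 5) !== 0) {
        throw new InvalidArgumentException('Decoded data does not look like a PDF.');
    }

    return $binary;
}

// Usage:
// $pdf = $parser->parseContent(decodeBase64Pdf($base64PDF));
```

The header check is deliberately strict. Files that carry extra bytes before the header would be rejected, so relax it if such input is expected.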

## Joomla

- **Extension type:** Library
- **Joomla version:** 4.0

## What's new

2022-03-17 22:16:13

### Smalot / PDF Parser v.2.1.0

Version from February 2, 2022

