34

I have an SVG file with at least one JPG/PNG image embedded in it. I want to extract those images from the SVG file and save them to disk.

I'm adding the inkscape tag as it is the program I use to edit SVG files, but I also accept solutions using other tools.

Denilson Sá Maia
  • 1
    If nothing else, Python could probably do it with some custom glue using lxml and PIL (or equivalent). – Keith Jun 21 '11 at 07:13
  • @Keith, indeed, I've just written [a Python script](http://superuser.com/questions/299977/how-to-extract-an-embedded-image-from-a-svg-file/683665#683665) to solve this question. It uses the built-in `xml.etree` library. – Denilson Sá Maia Dec 03 '13 at 23:35

8 Answers

32

My own solution (or... workaround):

  1. Select the image in Inkscape
  2. Open the built-in XML Editor (Shift+Ctrl+X)
  3. Select the xlink:href attribute, which contains the image as a data: URI
  4. Copy the entire data: URI
  5. Paste that data: URI into a browser, and save it from there.

Alternatively, I can open the SVG file in any text editor, locate the data: URI and copy it from there.

Although this solution works, it's kinda cumbersome and I'd love to learn a better one.
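The browser step can also be scripted once the data: URI has been copied. A minimal Python sketch, assuming the URI is base64-encoded; the function name `decode_data_uri` is made up for illustration:

```python
import base64

def decode_data_uri(uri):
    """Split a base64 data: URI into (mime_type, raw_bytes)."""
    header, _, payload = uri.strip().partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    mime_type = header[len("data:"):-len(";base64")]
    # Remove line breaks and spaces that may wrap the payload.
    payload = "".join(payload.split())
    return mime_type, base64.b64decode(payload)
```

Writing the returned bytes to a file opened in `"wb"` mode gives the image in its original encoding, with no re-compression.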

Denilson Sá Maia
20

There's a better solution:

Go to Extensions -> Images -> Extract Image..., where you can save the selected raster image as a file. The extension behaves oddly and runs rather slowly, but the extracted file comes out fine.

Another note: this extension is cumbersome and dies silently on very large images. Also, with a large number of raster images it can spike Inkscape's memory usage to horrendous levels (around 3 GB after only a handful of images extracted).

Because I have about 20 SVG files with about 70 raster images each, every image at least 1 MB in size, I needed a different solution. After a short check using Denilson Sá's tip, I devised the following PHP script, which extracts images from SVG files:

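```php
#!/usr/bin/env php
<?php
// Extract base64-embedded images from every *.svg in the current directory.
// Limitation: matches at most one data: URI per line, and only when the
// whole href attribute sits on a single line.

$svgs = glob('*.svg');

// MD5 hashes of the images already saved, used to skip duplicates.
$existing = array();

foreach ($svgs as $svg) {
    $dir = "./{$svg}.images";
    if (!is_dir($dir)) {
        mkdir($dir);
    }
    $lines = file($svg);
    $img = 0;
    foreach ($lines as $line) {
        if (preg_match('%xlink:href="data:([a-z0-9/+.-]+);base64,([^"]+)"%i', $line, $regs)) {
            $type = explode('/', $regs[1]);   // e.g. array('image', 'png')
            $data = $regs[2];
            $md5 = md5($data);
            if (!in_array($md5, $existing)) {
                // Drop any whitespace inside the base64 payload before decoding.
                $data = base64_decode(preg_replace('/\s+/', '', $data));
                file_put_contents("{$dir}/{$img}.{$type[1]}", $data);
                $img++;
                $existing[] = $md5;
            }
        }
    }
}

echo count($existing), "\n";
```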

This way I can get all the images I want, and md5 saves me from getting repeated images.
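The de-duplication trick does not depend on PHP. A small Python sketch of the same idea, hashing the decoded bytes and keeping only the first copy of each; note that it swaps md5 for SHA-256 from `hashlib`, and the name `unique_blobs` is hypothetical:

```python
import hashlib

def unique_blobs(blobs):
    """Yield each distinct bytes object once, in first-seen order."""
    seen = set()
    for blob in blobs:
        # SHA-256 stands in for the md5 used in the PHP script above.
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield blob
```

Storing digests rather than the images themselves keeps memory use small even when each image is several megabytes.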

I bet there must be another way that is a lot simpler, but it's up to inkscape devs to do it better.

Johnny_Bit
  • Note: Your script only supports a single `data:` URL per line, and does not support newlines inside the href attribute (inkscape adds them for data URLs, and the [base64 spec even mandates that lines should not be longer than 76 chars](https://bugzilla.mozilla.org/show_bug.cgi?id=73026#c12)). Nice script for a quick hack, but it does not work with all kinds of SVG. – Denilson Sá Maia Dec 01 '13 at 02:16
  • @Johnny_Bit +1 for using an md5 sum to prevent duplicate files. I improved your script [below](http://superuser.com/a/1182693/521108). – Ivan Z Feb 25 '17 at 14:59
  • good, march 2019 and worked easy grand with a reasonably big image. And pretty old laptop/ubuntu/inkscape 0.48.4. Thanks! – gaoithe Mar 03 '19 at 17:09
12

Finally, years later, I've written a script to correctly extract all images from an SVG file, using a proper XML library to parse the SVG code.

https://github.com/denilsonsa/small_scripts/blob/master/extract_embedded_images_from_svg.py

This script requires Python 3.4 or newer. (Look at git history if you need to run on older Python versions.)
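For readers who only want the gist, here is a minimal sketch of the XML-library approach. It is not the linked script, just an illustration using the standard `xml.etree.ElementTree` module; the function name `extract_images` is made up, and it assumes the images are embedded as base64 data: URIs in `xlink:href` (or plain `href`) attributes:

```python
import base64
import xml.etree.ElementTree as ET

# Fully qualified name of the xlink:href attribute as ElementTree reports it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def extract_images(svg_source):
    """Return a list of (mime_type, raw_bytes) for base64-embedded images."""
    images = []
    root = ET.fromstring(svg_source)
    for element in root.iter():
        # Check xlink:href first, then fall back to a plain href attribute.
        href = element.get(XLINK_HREF) or element.get("href") or ""
        header, _, payload = href.partition(",")
        if header.startswith("data:") and header.endswith(";base64"):
            mime_type = header[len("data:"):-len(";base64")]
            # Strip whitespace left over from line-wrapped base64 data.
            payload = "".join(payload.split())
            images.append((mime_type, base64.b64decode(payload)))
    return images
```

Because the parser sees whole attributes, this handles several images per line and data: URIs wrapped over many lines, the two cases where line-based regular expressions fall short.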

Denilson Sá Maia
  • Thanks, since it works. But it's much slower than the PDF workaround. Have you thought about parallel processing? Right now, the script only uses a single CPU core/thread. – DanMan Aug 30 '18 at 10:11
  • @DanMan Unfortunately, *making it parallel* is not a magic solution to speed up anything. I'd need to profile the code in order to identify the bottleneck. If the bottleneck is XML parsing, I'm sorry, that part can't be done in parallel. Can you please send me by e-mail the exact SVG files that are too slow? Whenever I have some time, I may investigate the performance. – Denilson Sá Maia Sep 01 '18 at 07:57
  • Yeah, I tried doing it myself, and it turned out that the XML parsing is the slow part, not decoding the images. That said, `cElementTree` is supposed to be faster. But maybe something like Sax works better, too. – DanMan Sep 01 '18 at 12:37
  • 1
    @DanMan `cElementTree` is likely faster. However, [on Python 3.3, both are the same](https://github.com/python/cpython/commit/a72a98f24a19928e31dcc4cab2cd2ad0f1846e11). At some point I'll likely update that script to Python 3. – Denilson Sá Maia Sep 18 '18 at 03:38
5

As yet another workaround, you can save as PDF, then open that document with Inkscape.

Uncheck "embed images", and bingo, all the pngs/jpegs will be spewed out into your home directory.

Messy, but quicker than goofing about with the data: URL.

mik01aj
Nicholas Wilson
  • Where did you find that "embed images" option? – mik01aj Jun 30 '16 at 06:42
  • 1
    When you open the PDF document in inkscape, it's on the next dialog. – Nicholas Wilson Jun 30 '16 at 13:32
  • I had a PDF from which I tried to extract an image by importing it in Inkscape. In that case, being able to do this *on* import rather than *after* import comes in even more handy. – user149408 Nov 25 '16 at 12:49
  • I'm not sure but this way any embedded ICC profiles seem to get lost in the process. The images I extracted straight from the SVG via that Python script have ICC profiles embedded. – DanMan Aug 30 '18 at 10:26
4

Open the image in Inkscape, right-click it, choose Extract Image from the context menu, and you're done.

DevonDahon
3

Open the SVG in Firefox, right-click the image, and choose Save image as... to save the embedded image as a file (PNG, JPG, etc.).

Or open the SVG in Chrome, open DevTools (F12), go to the Network tab, reload the page, and you'll see a request for data:image/png;base64,.... Right-click it and select "Open in new tab"; the file will be saved without an extension. Then rename the saved file, adding the proper extension, such as .png.
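If it is unclear which extension a file saved this way should get, the first few bytes usually tell. A small Python sketch that checks the well-known PNG, JPEG and GIF signatures; the function name `guess_image_extension` is hypothetical and only these three formats are covered:

```python
def guess_image_extension(data):
    """Guess a file extension from the leading bytes of image data."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return ".gif"
    return None  # unknown format: leave the name alone
```

Reading only the first 8 bytes of the saved file is enough input for this check.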

Pavel
  • This works only if the picture is about 100 KB or smaller. Chrome does not allow large pictures to be exported this way. – Moshe L Dec 30 '21 at 11:09
1

I improved the PHP script by @Johnny_Bit. The new version handles SVG files with line breaks inside the data. It extracts multiple images from an SVG file and saves them as external PNG files. The SVG and PNG files live in the 'svg' directory, but you can change that with the constant 'SVG_DIR'.

```php
<?php

define('SVG_DIR', 'svg/');      // directory holding the input SVG files
define('SVG_PREFIX', 'new-');   // prefix for the rewritten SVG files

$svgs = glob(SVG_DIR . '*.svg');
$external = array();            // md5 of base64 payload => saved image file name
$img = 1;

foreach ($svgs as $svg) {
    echo '<p>';
    $svg_data = file_get_contents($svg);
    // Join all lines so that data: URIs wrapped over several lines match.
    $svg_data = str_replace(array("\r\n", "\n", "\r"), "", $svg_data);
    $svg_file = substr($svg, strlen(SVG_DIR));
    echo $svg_file . ': ' . strlen($svg_data) . ' bytes';

    if (preg_match_all('|<image[^>]+>|', $svg_data, $images, PREG_SET_ORDER)) {
        foreach ($images as $image_tag) {
            if (preg_match('%xlink:href="data:([a-z0-9/+.-]+);base64,([^"]+)"%i', $image_tag[0], $regs)) {
                echo '<br/>Embedded image has been saved to file: ';

                $type = $regs[1];
                $data = $regs[2];
                $md5 = md5($data);
                if (array_key_exists($md5, $external)) {
                    // Same image seen before: reuse the file already written.
                    $image_file = $external[$md5];
                } else {
                    // Note: the extension is always .png, whatever $type says.
                    $image_file = substr($svg_file, 0, strlen($svg_file) - 4) . '-' . ($img++) . '.png';
                    file_put_contents(SVG_DIR . $image_file, base64_decode($data));
                    $external[$md5] = $image_file;
                }
                echo $image_file;
                // Point the SVG at the external file instead of the embedded data.
                $svg_data = str_replace('xlink:href="data:' . $type . ';base64,' . $data . '"', 'xlink:href="' . $image_file . '"', $svg_data);
            }
        }
        // Write the rewritten SVG next to the original, e.g. svg/new-foo.svg
        // (the original always wrote to svg/new-.svg, overwriting earlier output).
        file_put_contents(SVG_DIR . SVG_PREFIX . $svg_file, $svg_data);
    }

    echo '</p>';
}

?>
```
Ivan Z
0

Open your file in Inkscape and select the bitmap that you wish to export. Click File->Export Bitmap (Ctrl+Shift+E) and it should export only the selected bitmap.

zx485
Chris
  • 1
    I don't like this solution because it will re-encode the image. I would prefer a solution that extracts the image in its original format. – Denilson Sá Maia Dec 01 '13 at 00:21
  • 1
    Yes, it seems like Inkscape re-encodes the image but it saves PNG images by default. So I am assuming that the re-encoding is at least lossless. – Chris Dec 02 '13 at 17:10
  • 3
    Well, not really. The embedded image might have had transformations (scaling, rotation…), might have been clipped, or even something else I'm not aware of. Inkscape will certainly export the selected object after applying all these transformations, which means this solution is not exactly lossless. – Denilson Sá Maia Dec 03 '13 at 23:39