CodePlea
Random thoughts on programming
20 Jul 2018

How to Embed an Arbitrary File in a C Program


A problem I've run into several times over the years is that a small program I've written relies on another outside file. Maybe it needs to include an image, or maybe it relies on another small executable. It's a pain to deploy multiple files in these cases. I'd like everything nicely wrapped up into one executable.

So how do you embed a file into an executable?

Many years ago I wrote a small utility that takes a binary file for input. It then hex-encodes the file and outputs C code. The C code is an array which is initialized with the file's contents.

I just dug this program up to use it yet again, and I've decided that it'd be nice to share it.

The code is short, so here it is:

/* hexembed.c - copyright Lewis Van Winkle */
/* zlib license */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {printf("Usage:\n\thexembed <filename>\n"); return 1;}

    const char *fname = argv[1];
    FILE *fp = fopen(fname, "rb");
    if (!fp) {
        fprintf(stderr, "Error opening file: %s.\n", fname);
        return 1;
    }

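    /* Seek to the end to find the file size, then rewind. */
    fseek(fp, 0, SEEK_END);
    const long fsize = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (fsize <= 0) {
        fprintf(stderr, "Error: file is empty or unreadable: %s.\n", fname);
        fclose(fp);
        return 1;
    }

    /* Read the whole file into memory. */
    unsigned char *b = malloc(fsize);
    if (!b || fread(b, fsize, 1, fp) != 1) {
        fprintf(stderr, "Error reading file: %s.\n", fname);
        free(b);
        fclose(fp);
        return 1;
    }
    fclose(fp);

    /* Emit an array, not a pointer: a pointer can't take a brace-enclosed list of bytes. */
    printf("/* Embedded file: %s */\n", fname);
    printf("const int fsize = %ld;\n", fsize);
    printf("const unsigned char file[] = {\n");

    long i;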
    for (i = 0; i < fsize; ++i) {
        printf("0x%02x%s",
                b[i],
                i == fsize-1 ? "" : ((i+1) % 16 == 0 ? ",\n" : ","));
    }
    printf("\n};\n");

    free(b);
    return 0;
}

I've also posted it to GitHub under the name hexembed.

Code Explanation

Hopefully the code is pretty self-explanatory. It first takes the filename as an argument. It opens the file, uses fseek and ftell to find the file size, then reads the entire file into memory. The last step is to loop through the file's data and write out the C array initializer, 16 bytes per line.
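If you'd rather not hold the whole file in memory, the same loop can be written as a stream. This is only a sketch of that variation, not the version posted above: hexembed_stream is a made-up name, and because the size isn't known until the input runs out, it prints fsize after the array instead of before it.

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out` as a C array initializer,
 * one byte at a time. Returns the byte count, or -1 on a read error.
 * Note: an empty input yields an empty initializer, which C compilers
 * before C23 reject. */
long hexembed_stream(FILE *in, FILE *out) {
    long n = 0;
    int c;

    fprintf(out, "const unsigned char file[] = {\n");
    while ((c = fgetc(in)) != EOF) {
        /* A separator goes before every byte except the first,
         * with a line break after each group of 16. */
        if (n > 0) fputs(n % 16 == 0 ? ",\n" : ",", out);
        fprintf(out, "0x%02x", (unsigned)c);
        ++n;
    }
    fprintf(out, "\n};\n");

    /* The size is only known once the input is exhausted. */
    fprintf(out, "const int fsize = %ld;\n", n);
    return ferror(in) ? -1 : n;
}
```

Call it with a file opened in "rb" mode and stdout, for example hexembed_stream(fp, stdout).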

Usage

It's pretty easy to use. First build it with any C compiler. It's standard C (C99 or later, since it declares variables mid-block), so it should build on any OS.

> gcc hexembed.c -o hexembed

Then run it on the file you would like to embed. Redirect the output into a .c file that you can include with your program.

> hexembed some_file.jpg > some_file.c
> cat some_file.c

/* Embedded file: some_file.jpg */
const int fsize = 1873;
const unsigned char file[] = {
0x2f,0x2a,0x0a,0x20,0x2a,0x20,0x68,0x65,0x78,0x65,0x6d,0x62,0x65,0x64,0x20,0x2d,
0x20,0x61,0x20,0x73,0x69,0x6d,0x70,0x6c,0x65,0x20,0x75,0x74,0x69,0x6c,0x69,0x74,
0x79,0x20,0x74,0x6f,0x20,0x68,0x65,0x6c,0x70,0x20,0x65,0x6d,0x62,0x65,0x64,0x20,
0x66,0x69,0x6c,0x65,0x73,0x20,0x69,0x6e,0x20,0x43,0x20,0x70,0x72,0x6f,0x67,0x72,
    ...
};

Now you can simply #include "some_file.c" in your C or C++ program and your program will have access to some_file.jpg.
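Here's a rough sketch of what using the embedded data might look like, assuming the generated file declares file as an array. The three-byte array below is just a stand-in so the example compiles on its own, and dump_embedded is a made-up helper name.

```c
#include <stdio.h>

/* Stand-in for the generated file. In a real program these two
 * definitions would come from: #include "some_file.c" */
const int fsize = 3;
const unsigned char file[] = {
0x68,0x69,0x0a
};

/* Hypothetical helper: writes the embedded bytes to `out`.
 * Returns 0 on success, -1 if the write came up short. */
int dump_embedded(FILE *out) {
    size_t written = fwrite(file, 1, (size_t)fsize, out);
    return written == (size_t)fsize ? 0 : -1;
}
```

To get the original file back on disk, open the destination with fopen(path, "wb") and pass it to dump_embedded. The "b" matters on Windows, where text mode would rewrite newline bytes.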

Why do it this way?

I think this works really well for some applications. It's great when you have a small file that doesn't change much. In effect, that file becomes part of your source code. Another benefit is that the end result is very portable. It'll work with any compiler on any OS.

I love writing code that I know will still compile 10 years from now without problems.

Alternative - xxd

I didn't know this when I wrote my program (which only took a few minutes anyway), but apparently xxd can already do this. You run it with the -i flag to "output in C include file style."

Here's the Wikipedia page for xxd.
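The output of xxd -i is shaped a little differently from hexembed's. As an illustration, assume a made-up three-byte file named hello.txt that contains "hi" and a newline. xxd derives the variable names from the file name and doesn't mark them const. print_hello is a hypothetical helper added only to show the array in use.

```c
#include <stdio.h>

/* What `xxd -i hello.txt` prints for a hypothetical 3-byte file
 * hello.txt containing "hi\n". The names come from the file name. */
unsigned char hello_txt[] = {
  0x68, 0x69, 0x0a
};
unsigned int hello_txt_len = 3;

/* Hypothetical helper: prints the embedded text to stdout. */
void print_hello(void) {
    fwrite(hello_txt, 1, hello_txt_len, stdout);
}
```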

Alternative - Linking the Blob in Directly

Of course if you have many files you may find this setup tedious. Or if you have really big files this may not be the best solution. You probably don't want to feed a 10MB array initializer into your compiler. In that case there are other options.

One alternative is to use your linker to embed a binary blob directly into your executable. For example:

gcc -c my_program.c -o my_program.o
ld -r -b binary -o some_file.o some_file.jpg
gcc my_program.o some_file.o -o my_program

Because the data is linked directly into your executable, my_program.c might access it like this:

#include <stdio.h>

extern const char binary_some_file_jpg_start[];
extern const char binary_some_file_jpg_end[];

void foobar(void) {
    /* A pointer difference is a ptrdiff_t, so cast it to match %ld. */
    printf("The linked file is %ld bytes and the first character is %d.\n",
        (long)(binary_some_file_jpg_end - binary_some_file_jpg_start),
        binary_some_file_jpg_start[0]);
}

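The symbol names may differ between platforms. On Linux, for instance, GNU ld emits them with a leading underscore (_binary_some_file_jpg_start). You can find them with objdump -t some_file.o.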

The biggest drawback to this method is that you're relying on special features of your toolset. So if you need to change compilers in the future, you're probably going to have a hard time.

