Listing File Volumes

For Arboretum I'm making a file browser for the Save As… and Open menus. Currently you can navigate around and pick files, but it's missing the ability to access another hard drive, USB storage, or SD card or anything like that.

Windows 10's file explorer has a This PC section that lists them under a heading called Devices and drives. Linux Mint's file browser Nemo lists them similarly under Devices. So in this article all I'm really looking at is: where to get this list.

We need at least two pieces of information about each device. One is the name that we're actually showing to the user as part of the list, which could be a name they've given it like "Media" or "Backup". And the second is a file path for that device so we know what folder to show in the browser when they click on it.

Windows

It turned out to be pretty confusing for Linux, so maybe it's easier to start with Windows.

On Windows you can relatively easily loop through the volumes with the following C code.

#include <Windows.h>
#include <stdbool.h>
#include <stdio.h>

bool is_volume_ready(const wchar_t* name)
{
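    // A failed call means the volume can't be used right now, for example
    // when it fails with ERROR_NOT_READY. GetLastError is only meaningful
    // after a failure, so the return value alone is checked.
    BOOL result = GetVolumeInformationW(name, NULL, 0, NULL, NULL, NULL, NULL, 0);
    return result != 0;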
}

#define VOLUME_NAME_CAP 50

void list_volumes(void)
{
    wchar_t volume_name[VOLUME_NAME_CAP];
    HANDLE handle = FindFirstVolumeW(volume_name, VOLUME_NAME_CAP);
    if(handle != INVALID_HANDLE_VALUE)
    {
        BOOL found;
        do
        {
            if(is_volume_ready(volume_name))
            {
                printf("%ls\n", volume_name);
            }
            found = FindNextVolumeW(handle, volume_name, VOLUME_NAME_CAP);
        } while(found);
        FindVolumeClose(handle);
    }
}

int main(int argc, char** argv)
{
    list_volumes();
    return 0;
}
list_volumes.c

This produces the output:

\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}\
\\?\Volume{4c1b02c4-d990-11dc-99ae-806e6f6e6963}\

These aren't particularly useful to us by themselves. They're just GUIDs and not much else. The important part is that this loop can be used to look at all the volumes!

What is wchar_t and why do I use Windows functions ending in 'W'? See Appendix: Windows API and Unicode.

Volume Path

Instead of just printing each volume name, we can pass it to the function GetVolumePathNamesForVolumeNameW. This takes a volume name and gives us a list of paths we could use to refer to it. There isn't a way to know up front how many characters long this list is, so the function expects you to call it with a guess. If there was enough room, great! If not, it fails with ERROR_MORE_DATA, tells you the amount you need, and you can try a second time.

wchar_t* get_path_chain(const wchar_t* volume_name)
{
    // Ask for the volume path names and just guess an arbitrary size for it.
    int count = 50;
    wchar_t* path_chain = malloc(sizeof(wchar_t) * count);
    if(!path_chain)
    {
        return NULL;
    }
    DWORD char_count;
    BOOL got = GetVolumePathNamesForVolumeNameW(volume_name, path_chain, count, &char_count);
    if(got)
    {
        return path_chain;
    }

    // If it wasn't enough room, resize the space to the amount passed back in
    // char_count and try again.
    DWORD error = GetLastError();
    if(error == ERROR_MORE_DATA)
    {
        count = char_count;
        // Keep the old pointer until realloc succeeds so it can't leak.
        wchar_t* resized = realloc(path_chain, sizeof(wchar_t) * count);
        if(!resized)
        {
            free(path_chain);
            return NULL;
        }
        path_chain = resized;
        got = GetVolumePathNamesForVolumeNameW(volume_name, path_chain, count, &char_count);
        if(got)
        {
            return path_chain;
        }
    }

    // If that didn't work out.
    free(path_chain);

    return NULL;
}
main_windows.c lines 5-35

The "path chain" has a somewhat unusual format. In C, a string is represented as an array of characters followed by one null character. In the examples below, the symbol ␀ stands for the null character.

sample text!␀

The list of paths has multiple null-terminated strings stored end to end. The end of each string has a null character and the end of the list is signified by an extra null character. You could also think of the end of the list being an empty string. So, it's more like this:

C:\␀Z:\Somewhere\Special\␀Q:\L5W59XYK87\Mystery Zone\␀␀

This path chain does admittedly have the nice property that, if we don't actually care about any of the alternate paths, we can use it as though it's just a single string containing the first path.
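If you do care about the alternate paths, walking the list is mostly a matter of hopping over each terminator. Here's a minimal sketch of that. The names count_paths and get_path are just ones I made up for illustration, and the code only uses <wchar.h>, so it isn't tied to Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

// Counts the paths in a list of null-terminated strings stored end to end.
size_t count_paths(const wchar_t* chain)
{
    assert(chain != NULL);
    size_t count = 0;
    // Each pass looks at one string. An empty string marks the end of the list.
    for(const wchar_t* path = chain; *path != L'\0'; path += wcslen(path) + 1)
    {
        count += 1;
    }
    return count;
}

// Returns the path at the given index, or NULL if the list is shorter.
const wchar_t* get_path(const wchar_t* chain, size_t index)
{
    assert(chain != NULL);
    const wchar_t* path = chain;
    while(*path != L'\0')
    {
        if(index == 0)
        {
            return path;
        }
        index -= 1;
        // Skip this string and its terminator to reach the next one.
        path += wcslen(path) + 1;
    }
    return NULL;
}
```

The pointers returned by get_path point into the chain itself, so they stay valid only as long as the chain does.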

Volume Name

So, now we have the path to the device. Next, we need the user-facing name for it. This is probably the easiest part, as it's just one function GetVolumeInformationW. The name is guaranteed to be less than MAX_PATH + 1 characters, so you don't have to worry about size this time.

wchar_t* get_label(const wchar_t* path_chain)
{
    const int label_cap = MAX_PATH + 1;
    wchar_t* label = malloc(sizeof(wchar_t) * label_cap);
    BOOL got = GetVolumeInformationW(path_chain, label, label_cap, NULL, NULL, NULL, NULL, 0);
    if(got)
    {
        return label;
    }
    free(label);
    return NULL;
}
main_windows.c lines 37-48

Together

Now you have all the pieces of information needed, so here's the completed example!

#include <Windows.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

wchar_t* get_path_chain(const wchar_t* volume_name)
{
    // Ask for the volume path names and just guess an arbitrary size for it.
    int count = 50;
    wchar_t* path_chain = malloc(sizeof(wchar_t) * count);
    if(!path_chain)
    {
        return NULL;
    }
    DWORD char_count;
    BOOL got = GetVolumePathNamesForVolumeNameW(volume_name, path_chain, count, &char_count);
    if(got)
    {
        return path_chain;
    }

    // If it wasn't enough room, resize the space to the amount passed back in
    // char_count and try again.
    DWORD error = GetLastError();
    if(error == ERROR_MORE_DATA)
    {
        count = char_count;
        // Keep the old pointer until realloc succeeds so it can't leak.
        wchar_t* resized = realloc(path_chain, sizeof(wchar_t) * count);
        if(!resized)
        {
            free(path_chain);
            return NULL;
        }
        path_chain = resized;
        got = GetVolumePathNamesForVolumeNameW(volume_name, path_chain, count, &char_count);
        if(got)
        {
            return path_chain;
        }
    }

    // If that didn't work out.
    free(path_chain);

    return NULL;
}

wchar_t* get_label(const wchar_t* path_chain)
{
    const int label_cap = MAX_PATH + 1;
    wchar_t* label = malloc(sizeof(wchar_t) * label_cap);
    BOOL got = GetVolumeInformationW(path_chain, label, label_cap, NULL, NULL, NULL, NULL, 0);
    if(got)
    {
        return label;
    }
    free(label);
    return NULL;
}

bool is_volume_ready(const wchar_t* name)
{
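    // A failed call means the volume can't be used right now, for example
    // when it fails with ERROR_NOT_READY. GetLastError is only meaningful
    // after a failure, so the return value alone is checked.
    BOOL result = GetVolumeInformationW(name, NULL, 0, NULL, NULL, NULL, NULL, 0);
    return result != 0;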
}

#define VOLUME_NAME_CAP 50

void list_volumes(void)
{
    wchar_t volume_name[VOLUME_NAME_CAP];
    HANDLE handle = FindFirstVolumeW(volume_name, VOLUME_NAME_CAP);
    if(handle != INVALID_HANDLE_VALUE)
    {
        BOOL found;
        do
        {
            if(is_volume_ready(volume_name))
            {
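                wchar_t* path_chain = get_path_chain(volume_name);
                // Only ask for a label if the paths could be fetched, and
                // never hand a null pointer to printf.
                wchar_t* label = path_chain ? get_label(path_chain) : NULL;
                printf("Label: %ls Path: %ls\n", label ? label : L"(none)",
                        path_chain ? path_chain : L"(none)");
                free(path_chain);
                free(label);
            }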
            found = FindNextVolumeW(handle, volume_name, VOLUME_NAME_CAP);
        } while(found);
        FindVolumeClose(handle);
    }
}

int main(int argc, char** argv)
{
    list_volumes();
    return 0;
}
main_windows.c

The most straightforward way to compile this code is with Visual Studio. It includes a C compiler with its Visual C++ stuff and generally bundles the two together. So look for the C++ things rather than C.

The only dependencies here are the Windows SDK and the C Run-Time library, though. So if you have another setup that has those then you're good to go.

Linux

So it turns out there's not just one set of standard Linux API calls to do this. The data is kept by various subsystems and in several informational directories and you're just kind of expected to piece it together.

This article is a decent start: "List all the mounted file systems or drives on Linux in C (C99) using mntent.h". It lists a bunch of mounts that aren't useful to a user, so they'd need to be filtered out, and it doesn't have labels.

So let's give something like that a go.

#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/vfs.h>

bool starts_with(const char* a, const char* b)
{
    return strncmp(a, b, strlen(b)) == 0;
}

static void list_volumes(void)
{
    FILE* file = setmntent(_PATH_MOUNTED, "r");
    if(file)
    {
        for(;;)
        {
            struct mntent* entry = getmntent(file);
            if(!entry)
            {
                break;
            }
            if(starts_with(entry->mnt_fsname, "/dev/"))
            {
                printf("mountpoint: %s source: %s\n", entry->mnt_dir, entry->mnt_fsname);
            }
        }
        endmntent(file);
    }
}

int main(int argc, char** argv)
{
    list_volumes();
    return 0;
}
list_mounts.c

Output:

mountpoint: / source: /dev/sda5
mountpoint: /mnt/C0DA7331DA7322B8 source: /dev/sda2

This accesses a file called /etc/mtab. <mntent.h> is just a standard library to parse this type of file. It still needs the labels, though!

udev is a device manager for the kernel. It maintains the device files on the system in the /dev directory. Among them is the directory /dev/disk/by-label, which contains symlinks that are named after the labels we need and that point to device files. We already have paths to the device files like /dev/sda# in the output above. So, to get a label all you need to do is search /dev/disk/by-label for a matching file.

How Do You Match Files?

You could follow symlinks until you reach the end at a "real" file. Then, take the path name. Then do the same for the other file and at the end compare the two path names. Linux has a function called realpath that does exactly this.

The functions in <dirent.h> can be used to actually walk the directory. Then for each entry compare "real" paths and hopefully find a label.

char* get_label(const char* device_path)
{
    char* label = NULL;

    char* canonical_device = realpath(device_path, NULL);
    if(canonical_device)
    {
        const char* by_label = "/dev/disk/by-label";
        DIR* directory = opendir(by_label);
        if(directory)
        {
            for(;;)
            {
                struct dirent* entry = readdir(directory);
                if(!entry)
                {
                    break;
                }
                const char* name = entry->d_name;
                if(strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
                {
                    // Skip the entries for the directory itself and its parent.
                    continue;
                }
                if(entry->d_type == DT_LNK)
                {
                    // If it's a symlink, check if it matches the given device.
                    char* path = join_path(by_label, name);
                    char* canonical = realpath(path, NULL);
                    // realpath returns NULL for a dangling symlink.
                    bool matches = canonical && strcmp(canonical, canonical_device) == 0;
                    free(path);
                    free(canonical);
                    if(matches)
                    {
                        label = strdup(name);
                        break;
                    }
                }
            }
            closedir(directory);
        }
        free(canonical_device);
    }

    return label;
}
list_mounts_with_encoded_labels.c lines 23-68

Now we've got the paths to each volume and a nice label for each!

mountpoint: / label: (null)
mountpoint: /media/andrew/OS Windows label: OS\x20Windows
mountpoint: /mnt/C0DA7331DA7322B8 label: \x7eMedia\x7e

Wait.

Who's been messing with my labels?

udev Property Encoding

udev disallows certain characters in its strings and encodes them by replacing "potentially unsafe" characters with their hexadecimal value preceded by \x, like \x20. Since backslash is used for this, it also has to be replaced by its own code \x5C.
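To make the encoding concrete, here's a rough sketch of the encoding direction. This is not udev's code. The set of characters treated as safe is an assumption for illustration, and the name encode_label is made up. Unlike this sketch, which escapes every byte outside that set, the real thing may let other text through unescaped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Assumption: an illustrative safe set, not udev's exact rules.
static int is_safe(unsigned char c)
{
    return isalnum(c) || (c != '\0' && strchr("#+-.:=@_", c) != NULL);
}

// Returns a newly allocated copy of label with unsafe bytes replaced by \x
// followed by two hex digits. Returns NULL if out of memory.
char* encode_label(const char* label)
{
    assert(label != NULL);
    // In the worst case, every byte becomes a 4-byte sequence.
    size_t length = strlen(label);
    char* encoded = malloc(4 * length + 1);
    if(!encoded)
    {
        return NULL;
    }
    size_t j = 0;
    for(size_t i = 0; i < length; i += 1)
    {
        unsigned char c = (unsigned char) label[i];
        if(is_safe(c))
        {
            encoded[j] = (char) c;
            j += 1;
        }
        else
        {
            // Writes a backslash, an x, and two lowercase hex digits.
            snprintf(&encoded[j], 5, "\\x%02x", (unsigned) c);
            j += 4;
        }
    }
    encoded[j] = '\0';
    return encoded;
}
```

With this, "OS Windows" becomes OS\x20Windows and "~Media~" becomes \x7eMedia\x7e, matching the output above.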

So to get the proper labels we have to decode and replace these hex codes.

char* decode_label(const char* encoded)
{
    // Since the encoding replaces characters with a fixed-size sequence, the
    // output has to be either the same size or smaller. So allocating space
    // the same size as the encoded version is always going to be enough.
    int end = strlen(encoded);
    char* label = malloc(end + 1);

    // Find any 4-byte sequences starting with \x to replace. Otherwise, copy
    // as you go.
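    int j = 0;
    for(int i = 0; i < end;)
    {
        if(i <= end - 4 && encoded[i] == '\\' && encoded[i + 1] == 'x')
        {
            // Parse exactly two hex digits. Handing strtol the rest of the
            // string would also consume any hex-looking characters after them.
            const char digits[3] = {encoded[i + 2], encoded[i + 3], '\0'};
            char value = (char) strtol(digits, NULL, 16);
            if(value)
            {
                label[j] = value;
                j += 1;
            }
            i += 4;
        }
        else
        {
            label[j] = encoded[i];
            j += 1;
            i += 1;
        }
    }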
    label[j] = '\0';

    return label;
}
main_linux.c lines 27-58

Now things are looking ~real~ nice.

mountpoint: / label: (null)
mountpoint: /media/andrew/OS Windows label: OS Windows
mountpoint: /mnt/C0DA7331DA7322B8 label: ~Media~

This Is Okay

This is the completed Linux example.

#include <dirent.h>
#include <mntent.h>
#include <sys/vfs.h>

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* join_path(const char* path, const char* subpath)
{
    int a = strlen(path);
    int b = strlen(subpath);
    int count = a + 1 + b;
    char* result = malloc(count + 1);
    memcpy(result, path, a);
    memcpy(&result[a], "/", 1);
    memcpy(&result[a + 1], subpath, b);
    result[count] = '\0';
    return result;
}

// udev disallows certain characters in its strings and encodes them by
// replacing "potentially unsafe" characters with their hexadecimal value
// preceded by \x, like \x20. Since backslash is used for this, it also has
// to be replaced by its own code \x5C.
char* decode_label(const char* encoded)
{
    // Since the encoding replaces characters with a fixed-size sequence, the
    // output has to be either the same size or smaller. So allocating space
    // the same size as the encoded version is always going to be enough.
    int end = strlen(encoded);
    char* label = malloc(end + 1);

    // Find any 4-byte sequences starting with \x to replace. Otherwise, copy
    // as you go.
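    int j = 0;
    for(int i = 0; i < end;)
    {
        if(i <= end - 4 && encoded[i] == '\\' && encoded[i + 1] == 'x')
        {
            // Parse exactly two hex digits. Handing strtol the rest of the
            // string would also consume any hex-looking characters after them.
            const char digits[3] = {encoded[i + 2], encoded[i + 3], '\0'};
            char value = (char) strtol(digits, NULL, 16);
            if(value)
            {
                label[j] = value;
                j += 1;
            }
            i += 4;
        }
        else
        {
            label[j] = encoded[i];
            j += 1;
            i += 1;
        }
    }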
    label[j] = '\0';

    return label;
}

char* get_label(const char* device_path)
{
    char* label = NULL;

    char* canonical_device = realpath(device_path, NULL);
    if(canonical_device)
    {
        const char* by_label = "/dev/disk/by-label";
        DIR* directory = opendir(by_label);
        if(directory)
        {
            for(;;)
            {
                struct dirent* entry = readdir(directory);
                if(!entry)
                {
                    break;
                }
                const char* name = entry->d_name;
                if(strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
                {
                    // Skip the entries for the directory itself and its parent.
                    continue;
                }
                if(entry->d_type == DT_LNK)
                {
                    // If it's a symlink, check if it matches the given device.
                    char* path = join_path(by_label, name);
                    char* canonical = realpath(path, NULL);
                    // realpath returns NULL for a dangling symlink.
                    bool matches = canonical && strcmp(canonical, canonical_device) == 0;
                    free(path);
                    free(canonical);
                    if(matches)
                    {
                        label = decode_label(name);
                        break;
                    }
                }
            }
            closedir(directory);
        }
        free(canonical_device);
    }

    return label;
}

bool starts_with(const char* a, const char* b)
{
    return strncmp(a, b, strlen(b)) == 0;
}

static void list_volumes(void)
{
    FILE* file = setmntent(_PATH_MOUNTED, "r");
    if(file)
    {
        for(;;)
        {
            struct mntent* entry = getmntent(file);
            if(!entry)
            {
                break;
            }
            if(starts_with(entry->mnt_fsname, "/dev/"))
            {
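                char* label = get_label(entry->mnt_fsname);
                // Passing a null pointer for %s is undefined, so substitute
                // a placeholder when there's no label.
                printf("mountpoint: %s label: %s\n", entry->mnt_dir, label ? label : "(null)");
                free(label);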
            }
        }
        endmntent(file);
    }
}

int main(int argc, char** argv)
{
    list_volumes();
    return 0;
}
main_linux.c

I actually used Eclipse CDT to build this, similar to how I use Visual Studio on Windows. You can of course also compile on the command line with GCC directly. Pick up the gcc and libc-dev packages. With those you should be able to compile and run it with this command.

gcc -o list_volumes -std=gnu99 main_linux.c && ./list_volumes

One thing I left out is that /etc/mtab isn't managed by the kernel. It's managed by mount and umount. It's still very dependable, but it's worth mentioning that the kernel maintains its own list that you can access at /proc/self/mountinfo. This has its own format and unfortunately doesn't have a corresponding library like <mntent.h> to help read it.
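To give a rough idea of that format, each line starts with a mount ID, a parent ID, a major:minor device number, a root, and the mount point. After the mount options comes a variable number of optional fields, then a lone hyphen, then the filesystem type and the mount source. Here's a minimal sketch that pulls out three of those. The struct and function names are hypothetical, fields longer than 255 bytes are cut off, and escapes like \040 for a space are left undecoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FIELD_CAP 256

typedef struct MountInfo
{
    char mountpoint[FIELD_CAP];
    char filesystem[FIELD_CAP];
    char source[FIELD_CAP];
} MountInfo;

// Parses one line of /proc/self/mountinfo, for example:
// 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
bool parse_mountinfo_line(const char* line, MountInfo* info)
{
    assert(line && info);
    // The fifth field is the mount point. The four before it are skipped.
    if(sscanf(line, "%*d %*d %*s %*s %255s", info->mountpoint) != 1)
    {
        return false;
    }
    // The optional fields end at a lone hyphen. Spaces inside fields are
    // escaped, so " - " can only be that separator.
    const char* separator = strstr(line, " - ");
    if(!separator)
    {
        return false;
    }
    // The filesystem type and the mount source come right after it.
    return sscanf(separator + 3, "%255s %255s", info->filesystem, info->source) == 2;
}
```

For the line in the comment, this gives /mnt2 as the mount point, ext3 as the filesystem, and /dev/root as the source.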

Cross Platform

As a bonus, I'm including the original example I put together. It includes Windows and Linux in the same file using preprocessor conditionals. It also does full parsing of /proc/self/mountinfo on Linux and converts strings to UTF-8 on Windows.

I couldn't figure out a good way to make it digestible for this article but anyone who's interested in code that's a bit closer to what I'm using in Arboretum can take a look at main_both.c!


Appendix: Windows API And Unicode

The Windows API has three versions for many of its functions that involve strings.

  • A version ending in the letter 'A' which uses Windows code pages.

    The 'A' is for ANSI, because an early code page was fashioned after an American National Standards Institute draft. It's considered a bit of a misnomer because ANSI didn't have anything to do with making the specification. Windows code pages are Microsoft's standard.

  • A version ending in the letter 'W' which uses Unicode encoded in UTF-16.

    The 'W' stands for "wide" because Windows uses a 16-bit "wide character" type, wchar_t, to store each UTF-16 code unit.

  • A generic version that can be compiled as either of the other two.

    The generic version uses the code page version unless the preprocessor symbol UNICODE is defined before including Windows.h, in which case it uses the Unicode version. It also introduces a special type TCHAR, which stands in for either a char or wchar_t and is switched between them by the same definition.

Generally, I think Windows expects you to use the generic version and the preprocessor switch. But code pages are obsolete, and Unicode is used in file paths and internally by Windows. So, I usually prefer explicitly calling the 'W' versions of functions so nothing uses code pages accidentally.
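The switch itself is plain preprocessor work. The sketch below imitates the mechanism without needing Windows.h. Everything in it is a hypothetical stand-in: DEMO_UNICODE plays the role of UNICODE, demo_char plays TCHAR, DEMO_TEXT plays the TEXT macro, and the label_length functions stand in for an 'A' and 'W' pair.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

// Hypothetical stand-in for UNICODE. Delete this line to select the 'A' side.
#define DEMO_UNICODE

// Two hypothetical functions standing in for an 'A' and 'W' pair.
size_t label_length_a(const char* label)
{
    assert(label != NULL);
    return strlen(label);
}

size_t label_length_w(const wchar_t* label)
{
    assert(label != NULL);
    return wcslen(label);
}

// The generic names are macros and a typedef that pick one side.
#ifdef DEMO_UNICODE
typedef wchar_t demo_char;
#define DEMO_TEXT(s) L##s
#define label_length label_length_w
#else
typedef char demo_char;
#define DEMO_TEXT(s) s
#define label_length label_length_a
#endif

// Code written against the generic names compiles either way.
size_t media_label_length(void)
{
    const demo_char* label = DEMO_TEXT("Media");
    return label_length(label);
}
```

Since the generic name is only a macro, a stray missing definition silently changes which function gets called. That's the accident that calling the 'W' names directly avoids.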

UTF-8

Unicode is definitely the dominant representation for text in 2018. But on Linux, macOS, and the World Wide Web the preferred encoding is UTF-8 instead of UTF-16. If you want to write a cross-platform program, then, you either have to handle both UTF-8 and UTF-16 or choose one and convert to the other when needed.

I stick with UTF-8 and convert to and from UTF-16 only when I'm talking to the Windows API. This complicates code a bit, so I omit it in examples to keep things focused on the topic at hand.
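For a sense of what that conversion involves, here's a hand-written sketch going from UTF-16 to UTF-8. On Windows itself you'd more likely call WideCharToMultiByte with CP_UTF8. This version is portable and uses uint16_t for code units instead of wchar_t. The name utf16_to_utf8 is made up, and replacing unpaired surrogates with U+FFFD is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Converts a null-terminated UTF-16 string to a newly allocated UTF-8 string.
// Returns NULL if out of memory.
char* utf16_to_utf8(const uint16_t* in)
{
    assert(in != NULL);
    size_t units = 0;
    while(in[units] != 0)
    {
        units += 1;
    }
    // Each UTF-16 code unit needs at most 3 bytes of UTF-8.
    char* out = malloc(3 * units + 1);
    if(!out)
    {
        return NULL;
    }
    size_t j = 0;
    for(size_t i = 0; i < units; i += 1)
    {
        uint32_t c = in[i];
        if(c >= 0xD800 && c <= 0xDBFF && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Combine a surrogate pair into one code point.
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i += 1;
        }
        else if(c >= 0xD800 && c <= 0xDFFF)
        {
            // An unpaired surrogate becomes the replacement character.
            c = 0xFFFD;
        }
        if(c < 0x80)
        {
            out[j] = (char) c;
            j += 1;
        }
        else if(c < 0x800)
        {
            out[j] = (char) (0xC0 | (c >> 6));
            out[j + 1] = (char) (0x80 | (c & 0x3F));
            j += 2;
        }
        else if(c < 0x10000)
        {
            out[j] = (char) (0xE0 | (c >> 12));
            out[j + 1] = (char) (0x80 | ((c >> 6) & 0x3F));
            out[j + 2] = (char) (0x80 | (c & 0x3F));
            j += 3;
        }
        else
        {
            out[j] = (char) (0xF0 | (c >> 18));
            out[j + 1] = (char) (0x80 | ((c >> 12) & 0x3F));
            out[j + 2] = (char) (0x80 | ((c >> 6) & 0x3F));
            out[j + 3] = (char) (0x80 | (c & 0x3F));
            j += 4;
        }
    }
    out[j] = '\0';
    return out;
}
```

The caller frees the result. Going the other direction follows the same idea in reverse.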