-5

I have the following code in order to obtain a parent directory from a given path. Note: size_t is a typedef for unsigned int.

/****************************************************
This function takes a full path to a file, and returns
the directory path by returning the string up to the last backslash.

Author: Aashish Bharadwaj
*****************************************************/
_TCHAR* GetDirectoryFromPath(const _TCHAR* path)
{
   size_t size = _tcslen(path);
   size_t lastBackslash = 0;
   for (size_t i = 0; i < size; i++)
   {
      if (path[i] == '\\')
      {
         lastBackslash = i;
      }
   }

   _TCHAR* dirPath = new _TCHAR();
   size_t i;
   for (i = 0; i <= lastBackslash; i++)
   {
      dirPath[i] = path[i];
   }
   dirPath[i + 1] = '\0';  //THIS IS VERY NECESSARY! Otherwise, a bunch of garbage is appended to the character array sometimes.

   return dirPath;
}

The issue is that sometimes it appends a weird "@" looking symbol to the end of the string that it returns.enter image description here

I was wondering if anyone knew what this was and why it was doing so.

Aashishkebab
  • 295
  • 3
  • 10
  • 1
    How many `_TCHAR`'s does `_TCHAR* dirPath = new _TCHAR();` allocate? – NathanOliver May 22 '18 at 20:45
  • @NathanOliver I'm not sure what you mean. It allocates a pointer to a _TCHAR array of unspecified size. It automatically resizes as you append characters to it. – Aashishkebab May 22 '18 at 20:46
  • 1
    Nope. It allocates a pointer to a single `_TCHAR`. – NathanOliver May 22 '18 at 20:47
  • @NathanOliver Well yes, that's how an array works in general, it is just a pointer to the first item. – Aashishkebab May 22 '18 at 20:47
  • 1
    You really should be using a `std::string` or `std::wstring` as those wil manage that kind of stuff for you. – NathanOliver May 22 '18 at 20:47
  • 3
    " It allocates a pointer to a _TCHAR array of unspecified size. It automatically resizes as you append characters to it." - both of those statements are wrong. –  May 22 '18 at 20:48
  • @NathanOliver I can't use those, because the program is written in unicode, and I don't have a choice to use other strings. – Aashishkebab May 22 '18 at 20:48
  • 1
    No, that is not how an array works. If you want a pointer to an array you need something like `_TCHAR* dirPath = new _TCHAR[some_size];`. – NathanOliver May 22 '18 at 20:49
  • @NeilButterworth I've seen it dynamically increase the size of the array in memory. And if the statements are wrong, you could correct them, rather than just pointing out that they are wrong. Isn't that the point of StackOverflow? – Aashishkebab May 22 '18 at 20:49
  • You haven't seen it "automatically resize", because arrays don't have that facility, and you have been told how to solve the problem - use a data structure like a vector that does "automatically resize". –  May 22 '18 at 20:52
  • @NathanOliver That is how an array works, and why you can do *(myArray + 1). Because that's just the memory address next to the start of the array. – Aashishkebab May 22 '18 at 20:52
  • @AashishBharadwaj Doing `*(myArray + 1)` is `myArray` was allocated with `type myArray = new type();` is undefined behavior. There is a big difference between `new type()` and `new type[some_value]`. – NathanOliver May 22 '18 at 20:54
  • @NeilButterworth 1. It does automatically resize, that is how a heap works. In fact, its size is indefinite. And we can prove this because the function works correctly except for the very last character in some cases. 2. I am unable to use any other data structure, as this program is in unicode and multi platform. It is not viable to create a vector of unicode characters. – Aashishkebab May 22 '18 at 20:54
  • 1
    @AashishBharadwaj then you clearly don't understand how C++ memory management works, because everything you have stated is wrong. And yes, you can use a different data structure, such as `std::wstring` (which is Unicode and cross-platform, with caveats), or `std::vector<_TCHAR>`, or `std::basic_string<_TCHAR>`, or `std::string` with UTF-8 (which is fully Unicode and cross-platform). – Remy Lebeau May 22 '18 at 20:56
  • 2
    It really sounds like you could use a [good C++ book](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). You have a lot of misconceptions on how manual memory management works and that fact that the programs does not work as expected proves that your array allocation is not correct. – NathanOliver May 22 '18 at 20:56
  • @NathanOliver The following prints out '4': int* howdy = new int(); howdy[1] = 4; cout << *(howdy + 1); – Aashishkebab May 22 '18 at 20:56
  • 4
    "It does automatically resize" - no, it doesn't. "that is how a heap works" - no, it isn't. There really is no point in asking questions here if you are simply going to deny the validity of informed advice you are given. –  May 22 '18 at 20:56
  • 2
    @AashishBharadwaj that code has **undefined behavior** and is trashing memory, as it is accessing illegal memory that it has not allocated. `new int()` allocates **1 and only 1** `int`. `howdy[0]` and `*(howdy+0)` are valid in that case, but `howdy[1]` and `*(howdy+1)` are out of bounds and not valid. – Remy Lebeau May 22 '18 at 20:58
  • Apparently there is no point asking questions on Stack Overflow. This place used to be a welcoming community where people didn't resort to "go read a book on C++" and "you're wrong". I specifically asked "why is this occurring", and the responses were "go read a book" and "you're wrong". – Aashishkebab May 22 '18 at 20:58
  • @RemyLebeau So do I need to specify an array size beforehand? – Aashishkebab May 22 '18 at 21:00
  • 1
    @AashishBharadwaj If you want to allocate an array, then yes. Or use `std::vector` (and don't say you can't), which contains a dynamically sized array for you. – Remy Lebeau May 22 '18 at 21:00
  • 1
    @AashishBharadwaj Ever hear of buffer overruns? That is what your example of "working code" is invoking, and is a major cause off many bugs that go undetected in C++ programs. – PaulMcKenzie May 22 '18 at 21:59

1 Answers1

5

The problem is that you are allocating only 1 TCHAR and then you are writing past the end of the memory block that you have allocated. Your code has undefined behavior.

You need to use new _TCHAR[...] instead of new _TCHAR().

You are also not handling the case where no backslashes are found. In that case, lastBackslash is 0 even though the 1st character is not a backslash. You are not checking for that possibility. And because your loop is using <= instead of <, it will end up copying that 1st character when it shouldn't be.

Try something more like this instead:

const size_t c_invalid_index = (size_t) -1;

_TCHAR* GetDirectoryFromPath(const _TCHAR* path)
{
    size_t lastBackslash = c_invalid_index;

    size_t size = _tcslen(path);
    for (size_t i = 0; i < size; ++i)
    {
        if (path[i] == _T('\\'))
        {
            lastBackslash = i;
        }
    }

    if (lastBackslash == c_invalid_index)
        return NULL;

    _TCHAR* dirPath = new _TCHAR[lastBackslash + 2];
    for (size_t i = 0; i <= lastBackslash; ++i)
    {
        dirPath[i] = path[i];
    }
    dirPath[lastBackslash + 1] = _T('\0');

    return dirPath;
}

Alternatively:

_TCHAR* GetDirectoryFromPath(const _TCHAR* path)
{
    const _TCHAR *lastBackslash = NULL;

    size_t size = _tcslen(path);
    for (size_t i = 0; i < size; ++i)
    {
        if (path[i] == _T('\\'))
        {
            lastBackslash = &path[i];
        }
    }

    if (!lastBackslash)
        return NULL;

    size = (lastBackslash - path) + 1;

    _TCHAR* dirPath = new _TCHAR[size + 1];
    for (size_t i = 0; i < size; ++i)
    {
        dirPath[i] = path[i];
    }
    dirPath[size] = _T('\0');

    return dirPath;
}

That being said, you really should not be using raw string pointers like this. It would be much safer and cleaner to use std::basic_string<_TCHAR> instead (if not std::string or std::wstring, or std::u16string or std::u32string in C++11 and later), eg:

#include <string>

typedef std::basic_string<_TCHAR> tstring;

...

tstring GetDirectoryFromPath(const tstring &path)
{
    tstring::size_type pos = path.find_last_of(_T('\\'));
    if (pos == tstring::npos)
        return tstring();
    return path.substr(0, pos+1);
}
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • I definitely appreciate your answer. Using new _TCHAR[number] unfortunately also did not work. – Aashishkebab May 23 '18 at 12:00
  • I figured out that the problem was literally just the dirPath[i + 1] = '\0' should be dirPath[i] = '\0'. – Aashishkebab May 23 '18 at 12:01
  • I think I still should use the new _TCHAR[...] as opposed to _TCHAR() though. – Aashishkebab May 23 '18 at 12:01
  • Please be aware that when TCHAR is char, a simple loop will not suffice. On systems with a multibyte character set, the multibyte character might contain a back-slash character (0x5C) as the trailing byte. It is safer to use library functions such as find_last_of(),_tcsrchr() rather than comparing bytes. – TheSteve May 30 '18 at 02:02