42

Are the name, surname and email in the Git commit history personal information under the upcoming GDPR (General Data Protection Regulation), and is any special treatment required?

Git is a distributed system, so it is probably impossible to have complete control of it, especially for open-source projects.

EDIT: It is also probably impossible to delete any information from the Git history. If the Git history contains any personal information, is it acceptable to be unable to erase that information on request?

feetwet
Oldrich Svec
  • 1
    Read Article 17 ('right to be forgotten'). Unless there are "overriding legitimate grounds", then yes, you must adhere to it. For Git, for example, it is possible to change the author and committer name and e-mail of multiple commits. – Brandin Dec 07 '17 at 09:00
  • 1
    And how about when personal information is or was contained in the tracked files? – Oldrich Svec Dec 07 '17 at 11:27
  • 2
    If you search the Git manual, Stackoverflow, and other resources, there are technical ways to remove information from a Git repo as well. They may not be easy or convenient, but the fact that it is not technically convenient to do such a thing is not an overriding legitimate ground not to do it if required. If you are in charge of the Git repo, it would be better to just ban personal information from coming into the repo in the first place (e.g. put that information somewhere else, like a separate database file that can be edited if needed). – Brandin Dec 07 '17 at 12:36
  • 22
    It's not just inconvenient to change history in git, it breaks everything to a point that the technology becomes useless. If GitHub started rewriting the history of every repo every time someone requested their data be deleted, this would cause a major disruption to every open source project that was unlucky enough to have had that user contribute to it, no matter how small their contribution. Every clone of the repo would become unmergeable, and thousands of hours of work could be lost. So, it's not just inconvenient, it is technically impossible without fundamentally breaking the technology. – James Roper Jan 22 '18 at 04:48
  • 2
    How does it work with clones? It's technically possible for Github to change every fork that is stored on their server, but not the forks that have been cloned to workstations, or forks of those, or ... – Graham Apr 10 '18 at 20:27
  • 1
    If they do go and change every fork on their servers it totally screws up all of the remote repos that try to merge with it. And what happens if someone re-pushes commits from some other user that had previously been deleted? – Laurence Gonsalves Apr 10 '18 at 21:59
  • It may become a licence requirement of a project that such users must identify themselves 'anonymously', and then add that anonymous identity to the mail file, just so that the commit history itself starts off as fully anonymised. One programmatic method is to hash the email! It also may be a requirement that servers must know who cloned the mail file as well, which may further complicate matters. – Philip Oakley Apr 17 '18 at 22:05
  • @JamesRoper 1. Changing the commit history to remove an e-mail address is not "rewriting the history of every repo." Git and GitHub already document how to accomplish this. 2. The fact that you are using Git cannot be valid reason not to adhere to the GDPR. If it could, any organization could nullify the GDPR requirements merely by electing to store personal information in a Git database (for 'technical' reasons, of course). – Brandin Apr 26 '18 at 17:45
  • 5
    You can't change history in git without rewriting it, that's by design. Anyway, since writing the above, my company (whose core business is open source) has got advice from lawyers who specialise in GDPR. The key thing about holding personal data is whether you have a valid reason to hold it. According to the lawyers, maintaining provenance of IP is a valid reason to keep personal information. – James Roper Apr 27 '18 at 20:07
  • @Brandin It can be done, sure, but in the end you are creating a new repository, incompatible with already cloned repositories. Git notices the changes in past commits by design in order to prevent maliciously modifying git history - this can be used to detect adding malicious software in an old commit. Doing changes like this will cause practical issues. – 0.. May 01 '18 at 15:26
  • It is both possible and not too difficult to remove information from git. Check out the rebase command. It does not break anything. – Free Radical May 11 '18 at 06:48
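
The disagreement in the comments above (whether changing an author's name or e-mail amounts to "rewriting history") comes down to how Git names commits. A minimal Python sketch, assuming a toy two-commit history on an empty tree with made-up authors and a fixed timestamp (the helper names `git_object_id`, `make_commit` and `build_history` are hypothetical, not part of Git): a commit's ID is a SHA-1 over its content, which includes the author line and the parent's ID, so editing an old author line changes that commit's ID and the ID of every commit after it.

```python
import hashlib
from typing import List, Optional


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID Git gives an object: sha1("<kind> <size>\\0" + body)."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def make_commit(tree: str, parent: Optional[str], author: str, message: str) -> str:
    """Build a commit body in Git's text layout and return its object ID."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    # A fixed timestamp and timezone keep the example deterministic (assumption).
    lines.append(f"author {author} 1512637200 +0000")
    lines.append(f"committer {author} 1512637200 +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id("commit", body.encode())


def build_history(first_author: str) -> List[str]:
    """Return the IDs of a two-commit history whose first commit has the given author."""
    empty_tree = git_object_id("tree", b"")
    first = make_commit(empty_tree, None, first_author, "first")
    # The second commit is by someone else, but it records the first commit's ID.
    second = make_commit(empty_tree, first, "Bob <bob@example.com>", "second")
    return [first, second]


original = build_history("Alice <alice@example.com>")
rewritten = build_history("Anonymous <anonymous@example.invalid>")

for before, after in zip(original, rewritten):
    print(before, "->", after)
```

Both IDs differ between the two runs, even though only the first commit's author was changed. That is why existing clones see a scrubbed repository as unrelated history, and also why the scrub is technically possible: nothing stops you from producing the new chain.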

2 Answers

10

I believe in this regard there is a legitimate interest to not delete this information.

You would have to perform a legitimate interest assessment, but for us (with private repositories) the reason would be to ensure that, if malicious or illegal code is later found to have been knowingly committed to a repository, there is a trail and a way of understanding who did it.

rrrr-o
  • 2
    This is similar to the advice our lawyers gave us. We're an open source company, and in our case we have a legitimate interest in the personal details of committers in order to demonstrate the provenance of the IP of the code base. We actually keep much more than just email address and name: we keep physical address information, through requiring contributors to sign a CLA, and this is all a legitimate interest. – James Roper Apr 27 '18 at 20:20
2

In the case of Git the user is submitting that data themselves along with the commit, so it is reasonable to assume they consent to it being stored since that is explicitly what they are asking the git server to do.

Users may, however, withdraw that consent later on. For a site like GitHub, where it is made clear that projects not marked as private may be distributed and copied by other users outside of GitHub's control, there is little that could be done. This is similar to DMCA infringements and the like: at most they can ask GitHub to scrub data from its systems, but they can't expect GitHub to scrub other people's computers or the rest of the internet for them.

In general though Git does allow such information to be retroactively removed from commits, which would likely be a reasonable request, especially for private repos.
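
A minimal sketch of what such a removal amounts to at the data level, combining it with the hashing idea raised in the comments on the question. It assumes you already have the raw text of a commit object; `pseudonym`, `scrub_commit`, the salt and the `users.invalid` domain are hypothetical names chosen for illustration. In practice you would run a history-rewriting tool such as `git filter-branch` over the whole repository rather than edit objects by hand, and whether a hashed address is enough is a legal question this sketch does not settle.

```python
import hashlib
import re

# Matches "author"/"committer" header lines: role, name, <email>, timestamp and timezone.
IDENT_RE = re.compile(
    rb"^(author|committer) (.*) <([^>]*)> (\d+ [+-]\d{4})$", re.MULTILINE
)


def pseudonym(email: bytes, salt: bytes) -> bytes:
    """Derive a short, stable alias from an e-mail address (case-insensitive)."""
    return hashlib.sha256(salt + email.lower()).hexdigest()[:12].encode()


def scrub_commit(body: bytes, target_email: bytes, salt: bytes) -> bytes:
    """Replace one person's name and e-mail in the headers of a raw commit body."""
    # Headers and message are separated by the first blank line.
    headers, sep, message = body.partition(b"\n\n")

    def replace(match):
        if match.group(3).lower() != target_email.lower():
            return match.group(0)  # someone else's identity: leave untouched
        alias = pseudonym(match.group(3), salt)
        return b"%s user-%s <%s@users.invalid> %s" % (
            match.group(1), alias, alias, match.group(4)
        )

    return IDENT_RE.sub(replace, headers) + sep + message


raw = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Alice Example <alice@example.com> 1512637200 +0000\n"
    b"committer Bob Example <bob@example.com> 1512637200 +0000\n"
    b"\n"
    b"Fix typo reported by <alice@example.com>\n"
)
scrubbed = scrub_commit(raw, b"alice@example.com", salt=b"hypothetical-salt")
print(scrubbed.decode())
```

Only the header lines are touched here. The address in the commit message survives, which is a reminder that personal data in messages or tracked files needs separate handling.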

user
  • 1
    So do you think that an ex-employee could force a company to remove his/her name, email etc. from the company's repositories? I mean, sure, it would be possible to alter the git history of all repos that you have access to, but every computer that was used to work on that repo is probably storing a copy of it, and also think about all the backups. It would be insanely time consuming to really remove the data from all these places. Also, can you really expect that a user has given consent if the sysadmin has set up their git config or they simply don't fully understand git? – Forivin Apr 26 '18 at 14:22
  • 1
    Potentially. But presumably they would have used their company provided email address so that shouldn't need to be scrubbed. And the company has a legitimate need to retain employee records which include their name, including records of work they did which is now part of the git repo. – user Apr 26 '18 at 14:24
  • But do they have the right to keep records of every single line of code you wrote and when you wrote it etc.? I mean, would you call that a "legitimate need"? – Forivin Apr 26 '18 at 14:29
  • 1
    Yes, because if there are issues with the software later it could be useful or even vital. You did the work for them with the understanding that this would be the case when you committed the code to git. – user Apr 26 '18 at 14:31
  • So? If you don't work for the company anymore and you want them to delete those records of you then I don't see why they could keep the data just to ask you for help in the future. – Forivin Apr 27 '18 at 09:22
  • The GDPR doesn't mean companies can't store any information about you without permission. Imagine if banks were required to delete the data of people who owed them money. – user Apr 27 '18 at 10:00
  • Well, that is different because I don't owe a company any help when I don't work for it anymore. – Forivin Apr 27 '18 at 10:09
  • I can't make this any simpler for you. GDPR does not give you that right. – user Apr 27 '18 at 10:42