As I understand it, a GitHub profile that serves as a portfolio has become essential when applying for entry-level computer programming jobs; a friend of mine insightfully compares this with the rise of unpaid internships in other fields. One thing about GitHub that gets in the way of maintaining a presentable portfolio is that forks of other people’s repositories, made just to submit a pull request, can crowd out the repositories that showcase one’s own work. Upstream maintainers can take months to respond to a pull request, leaving unimpressive repositories sitting on one’s profile for all that time.

The following Perl script, git-gh-fork, forks a repository and then sets various attributes of it to make it as obvious as GitHub allows that it’s just a temporary fork made in order to submit a pull request. Invoke it like this:

$ cd repo
$ git gh-fork

You will need the following Perl libraries: Net::GitHub, Git::Wrapper, Config::GitLike, Term::UI and File::XDG. On a Debian-based system, most of these can be installed with apt-get install libgit-wrapper-perl libconfig-gitlike-perl libterm-ui-perl libnet-github-perl. You’ll need to obtain File::XDG from CPAN. The script also loads File::chdir and Path::Class, so those need to be available too.

#!/usr/bin/perl

# git-gh-fork --- Create tidy GitHub forks for pull requests
#
# Copyright (C) 2017  Sean Whitton
#
# git-gh-fork is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# git-gh-fork is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with git-gh-fork.  If not, see <http://www.gnu.org/licenses/>.

use strict;
use warnings;
no warnings "experimental::smartmatch";

use Data::Dumper;

use Net::GitHub;
use Git::Wrapper;
use Config::GitLike;
use Term::UI;
use Term::ReadLine;
use File::XDG;
use File::Temp qw/tempdir/;
use File::chdir;
use Path::Class;
use Sys::Hostname;
use autodie; # die if problem reading or writing a file

my $xdg = File::XDG->new(name => 'net-github');
my $term = Term::ReadLine->new('brand');

my $cache_dir = $xdg->cache_home();
my $oauth_token_file = $cache_dir->file("oauth_token");
my $oauth_token;
if (-f "$oauth_token_file") {
    $oauth_token = $oauth_token_file->slurp();
    chomp $oauth_token;
} else {
    $oauth_token = get_new_oauth_token();
}

my $github = Net::GitHub->new(access_token => $oauth_token);
my $repos = $github->repos;
my $github_user = $github->user;
my $user = $github_user->show()->{'login'};
my $git = Git::Wrapper->new(".");
my $config = Config::GitLike->new( confname => 'config' );
$config->load_file('.git/config');

# TODO check all remotes, not just origin
# TODO resolve the URI, taking account of insteadOf and pushInsteadOf
# in ~/.gitconfig, and then match against the URIs GitHub accepts
my $origin_url = $config->get(key => "remote.origin.url");
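# Match OWNER/REPO at the end of the URL.  The repo part is non-greedy
# and cannot contain a slash, so an optional ".git" suffix stays out of
# $repo and the host of an https:// URL is not mistaken for the owner.
$origin_url =~ m|([a-zA-Z0-9-]+)/([^/]+?)(?:\.git)?/?$|
  or die "could not parse remote.origin.url: $origin_url\n";
my $org = $1;
my $repo = $2;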
my $prompt = "Do you want to submit a PR from $user against repo $repo belonging to $org?";
my $confirm = $term->ask_yn(prompt => $prompt, default => 'y',);
die "looks like I need a better regexp" unless $confirm;

unless (fork_exists()) {
    $repos->create_fork($org, $repo);
    until (fork_exists()) {
        print "Waiting for fork to be created ...\n";
        sleep 5;
    }
}
$repos->set_default_user_repo($user, $repo);
my $fork = $repos->get();

my @branches = $repos->branches;
unless (grep { $_->{name} eq "github" } @branches) {
    my $worktree = dir(tempdir());
    my $readme = $worktree->file("README.md");
    system "git worktree add --detach $worktree";
    {
        local $CWD = $worktree;
        system "git checkout --orphan github";
        system "git rm -rf .";
        my $fh = $readme->openw();
        $fh->print("This repository is just a fork made in order to submit a pull request");
        close $fh;
        system "git add README.md";
        system "git commit -m 'fork for a pull request'";
    }
    $worktree->rmtree();
    system "git worktree prune";

    system "git remote add -f fork $fork->{html_url}";
    system "git push fork +github";
    system "git branch -D github";
    $repos->update({
                    name => "$repo",
                    has_wiki => 0,
                    homepage => "",
                    description => "Temporary fork for a pull request",
                    has_issues => 0,
                    has_downloads => 0,
                    default_branch => "github",
                   });

    my $branches = "";
    for my $branch (@branches) {
        unless ($branch->{name} eq "github") {
            $branches .= " :$branch->{name}";
        }
    }
    system "git push fork $branches";
}

sub get_new_oauth_token {
    $cache_dir->mkpath();
    my $user = $term->get_reply(prompt => 'GitHub username');
    my $pass = $term->get_reply(prompt => 'GitHub password');
    my $github = Net::GitHub::V3->new( login => "$user", pass => "$pass" );
    my $oauth = $github->oauth;
    # TODO this will fail if a token has already been created on this
    # host -- we should be able to re-use it
    my $o = $oauth->create_authorization({scopes => ['repo'], note => 'git gh-fork@' . hostname});
    my $oauth_token_file_handle = $oauth_token_file->openw();
    $oauth_token_file_handle->print($o->{token} . "\n");
    return $o->{token}
}

sub fork_exists {
    my @user_repos = $repos->list_user($user);
    return grep { $_->{name} eq $repo } @user_repos;
}
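
The match against remote.origin.url in the script has to cope with both scp-style and HTTPS remotes. As an illustration only, here is a minimal Python 3 sketch of one pattern that handles both forms. It is not part of git-gh-fork: the function name parse_github_remote and the example-org/example-repo URLs are made up for the example.

```python
import re

# Hypothetical helper, not part of git-gh-fork.  Matches OWNER/REPO at the
# end of a remote URL, ignoring an optional ".git" suffix and trailing slash.
_REMOTE_RE = re.compile(r"([A-Za-z0-9-]+)/([^/]+?)(?:\.git)?/?$")


def parse_github_remote(url):
    """Return (owner, repo) for a GitHub remote URL, or raise ValueError."""
    match = _REMOTE_RE.search(url)
    if match is None:
        raise ValueError("cannot parse remote URL: " + url)
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Made-up URLs covering the scp-style and HTTPS forms.
    for url in ("git@github.com:example-org/example-repo.git",
                "https://github.com/example-org/example-repo/"):
        print(parse_github_remote(url))
```

Because the repository part may not contain a slash, the pattern cannot swallow the owner segment of an HTTPS URL, and the non-greedy match leaves any ".git" suffix to the optional group.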

If you have any suggestions for git gh-fork, please send me a patch or a pull request against the version in my dotfiles repository.

Update 2017/ii/14: Applied patches from Tom Hoover. Thanks.

Update 2017/ii/18: Rewritten in Perl, with various improvements. Python version still available.

apt-get install python-github

#!/usr/bin/env python

# clean-github-pr --- Create tidy repositories for pull requests
#
# Copyright (C) 2016  Sean Whitton
#
# clean-github-pr is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# clean-github-pr is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with clean-github-pr.  If not, see <http://www.gnu.org/licenses/>.

import github

import sys
import time
import tempfile
import shutil
import subprocess
import os

CREDS_FILE = os.getenv("HOME") + "/.cache/clean-github-pr-creds"

def main():
    # check arguments
    if len(sys.argv) != 2:
        print sys.argv[0] + ": usage: " + sys.argv[0] + " USER/REPO"
        sys.exit(1)

    # check creds file
    try:
        f = open(CREDS_FILE, 'r')
    except IOError:
        print sys.argv[0] + ": please put your github username and password, separated by a colon, in the file ~/.cache/clean-github-pr-creds"
        sys.exit(1)

    # just to be sure
    os.chmod(CREDS_FILE, 0600)

    # make the fork
    creds = f.readline()
    username = creds.split(":")[0]
    pword = creds.split(":")[1].strip()
    token = f.readline().strip()

    if len(token) != 0:
        g = github.Github(token)
    else:
        g = github.Github(username, pword)

    u = g.get_user()

    source = sys.argv[1]
    if '/' in source:
        fork = sys.argv[1].split("/")[1]
        print "forking repo " + source
        u.create_fork(g.get_repo(source))
    else:
        fork = sys.argv[1]

    while True:
        try:
            r = u.get_repo(fork)
        except github.UnknownObjectException:
            print "still waiting"
            time.sleep(5)
        else:
            break

    # set up & push github branch
    user_work_dir = os.getcwd()
    work_area = tempfile.mkdtemp()
    os.chdir(work_area)
    subprocess.call(["git", "clone", "https://github.com/" + username + "/" + fork])
    os.chdir(work_area + "/" + fork)
    subprocess.call(["git", "checkout", "--orphan", "github"])
    subprocess.call(["git", "rm", "-rf", "."])
    with open("README.md", 'w') as f:
        f.write("This repository is just a fork made in order to submit a pull request; please ignore.")
    subprocess.call(["git", "add", "README.md"])
    subprocess.call(["git", "commit", "-m", "fork for a pull request; please ignore"])
    subprocess.call(["git", "push", "origin", "+github"])
    os.chdir(user_work_dir)
    shutil.rmtree(work_area)

    # make sure the branch has been pushed
    time.sleep(5)

    # set clean repository settings
    r.edit(fork,
           has_wiki=False,
           description="Fork for a pull request; please ignore",
           homepage="",
           has_issues=False,
           has_downloads=False,
           default_branch="github")

if __name__ == "__main__":
    main()
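
Both versions of the script poll GitHub every five seconds until the fork shows up, with no upper bound on the wait. A bounded variant could look something like the following Python 3 sketch. The helper name wait_until and the two-minute default timeout are assumptions made for illustration; neither script contains them.

```python
import time


def wait_until(predicate, interval=5.0, timeout=120.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call predicate() every `interval` seconds until it returns a true value.

    Returns True as soon as predicate() succeeds, or False once `timeout`
    seconds have passed without success.  `sleep` and `clock` can be
    swapped out, which makes the helper easy to exercise without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Stand-in for "does the fork exist yet?": succeeds on the third check.
    attempts = []

    def fork_is_visible():
        attempts.append(1)
        return len(attempts) >= 3

    print(wait_until(fork_is_visible, interval=0.1, timeout=5.0))
```

In the script, the predicate would wrap the existing check (u.get_repo(fork) in the Python version, fork_exists() in the Perl one), and a False result would be the cue to give up with an error message.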

I’ve created three proposed changes to clean-github-pr. None affect its current operation, so you may continue to use it normally after applying the changes.

  1. I wanted the ability to clean up an existing GitHub repository (i.e. without forking). In fact, since your dotfiles.git lives on your own site and not on GitHub, I used this ability to clean up the repository I created to hold the following pull requests. To clean up an existing GitHub repository, simply execute clean-github-pr.py name-of-my-repository (note there is no slash character, so the Python script knows it belongs to me). To clean up the temporary repository I created, I executed clean-github-pr.py spwhitton-dotfiles. link to proposed changes

  2. I have 2FA enabled, so I had to create a personal access token in order to use the API. After creating the token, simply add it as the 2nd line of your “~/.cache/clean-github-pr-creds” file. link to proposed changes

  3. I use an ssh key to push to my repositories (rather than a username/password). This proposed change enables the use of the ssh key. Simply use the word ‘ssh’ as the password in your “~/.cache/clean-github-pr-creds” file. For example, my “~/.cache/clean-github-pr-creds” file contains:

tomhoover:ssh

Comment by tom Sun 02 Oct 2016 00:49:51 UTC

I’ve added a 4th recommended change. The Python interpreter location is currently hardcoded as /usr/bin/python. On my system, the Python interpreter is located at /usr/local/bin/python. Rather than changing the hard-coded location to my Python interpreter, I’ve changed the shebang line to:

#!/usr/bin/env python

This will allow the script to look up the path to the Python interpreter automatically via env.

The proposed change may be found here.

Comment by tom Tue 04 Oct 2016 02:47:39 UTC

I’ve applied three of your four patches—thanks!

I did not apply the patch to change the clone URI. I think the issue is better handled with the following addition to ~/.gitconfig:

[url "git@github.com:"]
    pushInsteadOf = https://github.com/

(I have never pushed to GitHub over HTTP.)
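
To illustrate what that setting does: as I understand git’s documented behaviour, a push URL that starts with the pushInsteadOf value is rewritten to start with the base named in the [url "..."] section, and the longest matching prefix wins when several rules apply. The Python 3 sketch below is only a simplified model of that rule. The function name rewrite_push_url and the example repository are made up, and it ignores details such as remotes that set an explicit pushurl.

```python
def rewrite_push_url(url, push_instead_of):
    """Simplified model of git's url.<base>.pushInsteadOf rewriting.

    `push_instead_of` maps each prefix to be replaced onto its replacement
    base, e.g. {"https://github.com/": "git@github.com:"}.
    """
    matches = [prefix for prefix in push_instead_of if url.startswith(prefix)]
    if not matches:
        return url  # no rule applies, push to the URL unchanged
    longest = max(matches, key=len)  # the longest matching prefix wins
    return push_instead_of[longest] + url[len(longest):]


if __name__ == "__main__":
    rules = {"https://github.com/": "git@github.com:"}
    # Made-up repository name, for illustration only.
    print(rewrite_push_url("https://github.com/example-org/example-repo", rules))
    # prints: git@github.com:example-org/example-repo
```

With that rule in place the clone URL can stay an https:// one, while pushes still go over ssh.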

Comment by spwhitton Wed 15 Feb 2017 03:33:45 UTC

I think the issue is better handled with the following addition to ~/.gitconfig:

[url "git@github.com:"]
    pushInsteadOf = https://github.com/

I learned something new. Thanks!

Comment by tom Sat 18 Feb 2017 17:09:45 UTC