Archive for the ‘General’ Category

Exporting code to Git and tidying up its history

Sunday, June 12th, 2011

Notice:

The original guide is available as a GitHub gist at https://gist.github.com/1021890. Nevertheless, its content as of 2011-06-12 is presented below.

Author: Tiago Alves Macambira [tmacam burocarata org]
Licence: Creative Commons By-SA

1   Introduction

So you have that awesome (or perhaps not so awesome, but at least not shameful) project of yours that is just lingering around, collecting dust, and you thought: "What if I released this code to the world? Would I get famous? Would I become rich? Would I become the next Linus Torvalds?" Well, I would hate to disappoint you, but the answer to those questions is probably no.

Nevertheless, be it for self-promotion, out of pure generosity or just for the sake of having a third-party-maintained backup of your code, releasing it to the world is a Good Thing (tm), and something that would earn you some karma points -- and we are all short on those, right?

"B-b-but", you say, "my code is in a <ancient, restraining, démodé or plain untrendy by the latest standards> Version Control System and I would like to do what all the other cool kids are doing and export it as a Git repository in, say, GitHub or... like... whatever..."

Fear no more, dear sheep, this guide is for you.

1.1   Objectives

This short guide's purpose is to show you how to export a project from another Version Control System -- or even from another Git repository -- such that its history is represented as cleanly and linearly as possible.

Perhaps this project is part of an old corporate project that you just got approval to release as open source and, although you would like it to retain as much history as possible, you still have a need (or an obligation) to strip any and all sensitive corporate information from it before releasing it. Maybe that information takes the form of sensitive log files that found their way into the repository, or of personal information (e-mails, usernames) in the commit log messages.

So, it's not a matter of just removing, renaming, copying or moving files around and committing -- as those files would still show up in history, revealing the information you wanted to protect and taking unnecessary repository space -- but of doing some serious cleaning and re-structuring in the source code, its history and associated meta-data -- whatever that is. This short guide is also about that.

2   Starting things up

2.1   Importing from the previous version control system

So, the first thing you must do is import your project from the Version Control System it currently resides in into a Git repository.

If it is a subversion repository, git-svn will do just fine. If you are using something else, say, perforce or CVS, similar tools exist to convert your project and its history to Git. You may need to do an intermediate conversion, say, from CVS to subversion and then from subversion to Git.

For simplicity, let's assume you have a project in a subversion repository. Let's also assume that the URL for this repository root is svn+ssh://svn.example.tld/secure/repositories/meh_project/ and that your project (or the files you want to export) is located in aux/super_dupper_code. The following command would fetch this project and its history from subversion into a new Git repository:

git svn clone --no-metadata \
 svn+ssh://svn.example.tld/secure/repositories/meh_project/aux/super_dupper_code

First, notice that we are not converting the whole repository to Git: we are limiting as much as possible what we are importing from subversion by grabbing just the code inside the super_dupper_code directory. If, for some reason, you had to import the whole repository into Git, do not worry: we will explain how to "prune" it later.

The import may bring some extra files that you may want to remove, say, because they are lame, for some legal reason or because they contain sensitive information that it is not OK to share with the whole world. We will completely remove them and their history from the Git repository later, hopefully leaving no trace of them whatsoever.

Since we have no interest in exporting any changes we make back to the original subversion repository, we are using the --no-metadata option here. It will also get rid of the extra git-svn-id: lines that git-svn adds at the end of every commit message. Had we not used the --no-metadata option, we would need to edit the commit messages to remove them. We will also show how to modify commit meta-data (commit messages, commit authors etc.) later.
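If you are not sure whether an import was done with --no-metadata, a quick check is to count the commits whose messages carry a git-svn-id: line. The sketch below fakes such an import in a throwaway repository; the repository content, the user identity and the revision number in the trailer are made up for illustration.

```shell
set -e

# Throwaway repository standing in for the result of "git svn clone".
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"

# Simulate two imported commits: one carries a git-svn-id line, one does not.
echo a > a.txt
git add a.txt
git commit -q -m "Add a" \
  -m "git-svn-id: svn+ssh://svn.example.tld/secure/repositories/meh_project@1 00000000-0000-0000-0000-000000000000"
echo b > b.txt
git add b.txt
git commit -q -m "Add b"

# Count the commits whose message has a line starting with "git-svn-id:".
tainted=$(git log --branches --grep='^git-svn-id:' --oneline | wc -l | tr -d ' ')
echo "commits with git-svn-id lines: $tainted"
```

A count of zero on your real repository means there is nothing to clean up on that front.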

2.2   Cloning a repository

"We have not even gotten started and we are already cloning my repository? What gives?", you may ask.

Most of the steps we will give you in the following sections will alter your Git repository in semi-destructive ways, making heavy use of git-filter-branch. I say semi-destructive because although git-filter-branch almost always makes a copy of your repository's previous state, getting back to this state may be complicated or, depending on the kind of modification performed by git-filter-branch, impossible.

Additionally, it is in our interest to get rid of any "previous state" we accumulate, and properly cloning a repository does the trick.

So, to avoid regrets and problems, let's first make a proper backup or clone of your git repository:

git clone --no-hardlinks /XYZ /ABC

Using --no-hardlinks makes Git create a clone by really copying the files and not by using hard-links. This way the original repository won't share files and metadata with its clone. See the git-clone man page if you have no idea of what I am talking about.

Another way to get the same effect is by using a file:///path/to/your/git/repo URL (note the three slashes: the path must be absolute), as documented in the section "Checklist for shrinking a repository" of the git filter-branch manpage:

git clone file:///full/path/to/XYZ /ABC

With your backup done, let's move on to tearing down and rebuilding your Git history.
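If you want to convince yourself that a backup is complete before going any further, compare the commit it points at with the original. This is only a sketch: the temporary directories stand in for /XYZ and /ABC, and the repository content is invented.

```shell
set -e

# Throwaway source repository standing in for /XYZ (hypothetical content).
src=$(mktemp -d)
git init -q "$src"
git -C "$src" config user.name "Demo User"
git -C "$src" config user.email "demo@example.tld"
echo "hello" > "$src/README"
git -C "$src" add README
git -C "$src" commit -q -m "Initial commit"

# Two independent backups, one per cloning method.
backups=$(mktemp -d)
git clone -q --no-hardlinks "$src" "$backups/ABC-nohardlinks"
git clone -q "file://$src" "$backups/ABC-fileurl"

# Each backup should point at the same commit as the original.
src_head=$(git -C "$src" rev-parse HEAD)
copy1_head=$(git -C "$backups/ABC-nohardlinks" rev-parse HEAD)
copy2_head=$(git -C "$backups/ABC-fileurl" rev-parse HEAD)
echo "$src_head"
echo "$copy1_head"
echo "$copy2_head"
```

Since mktemp returns an absolute path, "file://$src" expands to a URL with three slashes, which is the form Git expects.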

3   Pruning files from history

The import may have brought some extra files. Now it's time to remove them and prune the history we have in our Git repository.

Removing them from Git with a git rm will just remove them from the last commit, but it will still leave traces and previous versions of those files in our Git history -- not really what we wanted. We want to remove any trace of them from the Git repository.

3.1   Extract a single directory

Suppose you had to bring more files from your previous VCS than you originally wanted. Say, you imported a whole CVS repository into Git and all you wanted was a project that lives inside a particular subdirectory. In this case, instead of removing all the other files and directories, it would be simpler (and saner) to extract the target subdirectory from the whole mess.

Let's suppose your target subdirectory path is projects/parsing/htmlparser. The following commands would detach it from your repository, leaving nothing but it and its history:

git filter-branch --subdirectory-filter projects/parsing/htmlparser -- --all
git reset --hard
git gc --aggressive
git prune

Notice that the first command ends in -- --all. That's right: two dashes space dash-dash-all. That will force Git to rewrite the history for all branches and tags you have.

Now your repository consists only of the contents of projects/parsing/htmlparser and its history. Nothing more, nothing less. Well, you may have mentioned other files in your commit messages but they will not be there.
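To see the effect without risking real code, you can try the subdirectory extraction on a throwaway repository first. In this sketch the file names are invented, and FILTER_BRANCH_SQUELCH_WARNING is set only to silence the warning that newer Git versions print before running git-filter-branch.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with the target subdirectory plus unrelated files.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"
mkdir -p projects/parsing/htmlparser docs
echo "int main(void) { return 0; }" > projects/parsing/htmlparser/parser.c
echo "unrelated" > docs/notes.txt
git add .
git commit -q -m "Initial import"

# Keep only projects/parsing/htmlparser, promoted to the repository root.
git filter-branch --subdirectory-filter projects/parsing/htmlparser -- --all >/dev/null

# List every path that still shows up in the rewritten branches.
# --branches is used instead of --all because --all would also walk the
# backup refs that git-filter-branch keeps under refs/original/.
remaining=$(git log --branches --name-only --pretty=format: | sort -u | sed '/^$/d')
echo "$remaining"
```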

3.2   Remove files and directories from history for real

So, by now we limited our history to the enclosing subdirectory holding all the files we wanted. But there may still be some extra files that you may not want to export because they are lame, for some legal reason or because they contain sensitive information that it is not OK to share with the whole world. Let's erase them from our repository and from its history altogether.

To remove a file or a directory named path/to/SensitiveLogs from your repository, run:

git filter-branch --index-filter \
  "git rm -r -f --cached --ignore-unmatch path/to/SensitiveLogs" \
  --prune-empty -- --all

Repeat the command above for every file and directory you don't want exported.
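A simple way to confirm that a path is really gone is to ask Git for the commits that touch it. The sketch below does that on a throwaway repository; the file contents are made up, and the squelch variable only silences the warning newer Git versions print.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with some code and a directory that must not leak.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"
mkdir -p path/to/SensitiveLogs src
echo "password=hunter2" > path/to/SensitiveLogs/app.log
echo "code" > src/main.c
git add .
git commit -q -m "Add code and logs"
echo "more code" >> src/main.c
git commit -q -a -m "Improve code"

# Drop the directory from every commit of every branch and tag.
git filter-branch --index-filter \
  "git rm -r -f --cached --ignore-unmatch path/to/SensitiveLogs" \
  --prune-empty -- --all >/dev/null

# Commits in the rewritten branches that still touch the removed path.
hits=$(git log --branches --oneline -- path/to/SensitiveLogs | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits touching the path: $hits (out of $commits)"
```

Note that the old commits are still reachable through refs/original/ at this point, which is why the check looks at branches only.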

4   Fixing and tidying meta-data

OK. As far as files and their history go, your repository is clean and neat. But during the process of converting your project and its history to Git, some commit information, such as commit authors and commit messages, may have been lost or altered. Perhaps your commit messages mention sensitive data or reveal a previous, now invalid e-mail address of yours. Time to fix that.

4.1   Fix committer information

Let's start this section with the committer information: their name and e-mail address.

Once again, we will use git-filter-branch to edit our commit history. This time, though, I will show two ways of accomplishing the same task.

The first is somewhat more elaborate, as it shows how one can programmatically alter the committer information. Say, for instance, that except for a given committer, all other committers' meta-data are OK. So you just want to alter the commits related to this guy. Let's say that this guy was you, committing under an old user name with a now invalid e-mail address. All you have to do is alter only the commits made under that old identity. Here is how:

git filter-branch --commit-filter '
        # Rewrite only the commits made under the old identity.
        if [ "$GIT_COMMITTER_NAME" = "tmacam" ];
        then
                GIT_AUTHOR_NAME="$(git config --get user.name)";
                # or ...="Your (full) Name";
                GIT_AUTHOR_EMAIL="$(git config --get user.email)";
                # or ...="your.email@example.tld";
                GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME";
                GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL";
                git commit-tree "$@";
        else
                # Every other commit is recreated unchanged.
                git commit-tree "$@";
        fi' HEAD

Notice that this command assumes that you have already configured your identification information in Git. If that is not the case, just replace those git config --get xxxxxxx commands with "Your name" and "your.email@example.tld".

Anyway, as you can see, with some shell programming kung-fu you can create pretty elaborate logic on how to replace or modify committers' meta-data.

If all you want is to replace all committer information with a single identity, the following command will do the trick:

git filter-branch --env-filter '
    GIT_AUTHOR_EMAIL="your.email@example.tld"
    GIT_AUTHOR_NAME="Your (Full) Name"
    GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
    GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
    export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
    export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME'

And that's it. All commits will be attributed to "Your (Full) Name <your.email@example.tld>".
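One way to check the rewrite is to list the distinct identities recorded in your history before and after it. The sketch below does so on a throwaway repository; the old e-mail address and the list_identities helper are made up for illustration.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"

# One commit recorded under an outdated identity (hypothetical values).
echo "code" > main.c
git add main.c
GIT_AUTHOR_NAME="tmacam" GIT_AUTHOR_EMAIL="old.address@example.tld" \
GIT_COMMITTER_NAME="tmacam" GIT_COMMITTER_EMAIL="old.address@example.tld" \
  git commit -q -m "Initial import"

# Hypothetical helper: print each distinct author and committer identity.
list_identities() {
    git log --branches --format='%an <%ae>%n%cn <%ce>' | sort -u
}

before=$(list_identities)

# Attribute every commit to a single identity.
git filter-branch --env-filter '
    GIT_AUTHOR_EMAIL="your.email@example.tld"
    GIT_AUTHOR_NAME="Your (Full) Name"
    GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
    GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
    export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
    export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME' -- --all >/dev/null

after=$(list_identities)
echo "before: $before"
echo "after:  $after"
```

If the "after" list still shows an identity you did not expect, some commits escaped the filter.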

4.2   Fix log messages

Now it is time to tidy up those commit log messages. Guess what we will use for this: git-filter-branch and its --msg-filter option. You can perform almost any kind of editing with this duo: add lines, remove lines, replace text. Just give it a command that reads the original log message on its standard input and writes the new one to its standard output, and that's it. The sky is the limit. :)

So, here is a short example of a command that will remove all those nasty "git-svn-id:" lines that you got in your log messages just because you did not read what I wrote in the Importing from the previous version control system section:

git filter-branch --msg-filter ' sed -e "/^git-svn-id:/d" '
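Here is a sketch of that message filter at work on a throwaway repository, followed by a check that no "git-svn-id:" line survived. The commit content and the trailer value are invented.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"

# A commit whose message ends with a git-svn-id line, as git-svn writes it.
echo "code" > parser.c
git add parser.c
git commit -q -m "Fix parser" \
  -m "git-svn-id: svn+ssh://svn.example.tld/secure/repositories/meh_project@42 00000000-0000-0000-0000-000000000000"

# Delete every message line that starts with "git-svn-id:".
git filter-branch --msg-filter 'sed -e "/^git-svn-id:/d"' -- --all >/dev/null

# grep -c exits non-zero when it counts zero matches, hence the "|| true".
left=$(git log --branches --format=%B | grep -c '^git-svn-id:' || true)
subject=$(git log -1 --format=%s)
echo "git-svn-id lines left: $left; subject: $subject"
```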

5   Final steps

5.1   Shrink your repository.

Now that your repository, its history, commits and their log messages are all clean, tidy and free from shameful or sensitive information, it is time to do one last thing: shrink your repository.

See, as I said before in the Cloning a repository section, git-filter-branch stores copies of the previous state of the repository as it modifies it. Now that we got here, we don't need or want those copies. Time to get rid of them.

Go back to the Cloning a repository section and create another clone of your repository using the procedures explained there. This should give you a clean and neat clone to export/upload.
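If you would rather shrink the repository in place than clone it again, the git filter-branch manpage describes an alternative: delete the backup refs under refs/original/, expire the reflogs and garbage-collect. The sketch below applies that recipe to a throwaway repository and checks that a removed file's content is really gone; the file names are invented.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"
echo "code" > main.c
echo "pretend this is a huge, sensitive log" > big.log
git add .
git commit -q -m "Initial import"
blob=$(git rev-parse HEAD:big.log)   # remember the object we want gone

# Remove big.log from history.
git filter-branch --index-filter \
  "git rm -q --cached --ignore-unmatch big.log" -- --all >/dev/null

# The content is still stored, kept alive by refs/original/ and the reflogs.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the backup refs, expire the reflogs, then collect the garbage.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before: $before, after: $after"
```

This is destructive by design, so run it only on a copy you can afford to lose.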

5.2   Final check

Use a tool like GitX or gitk to analyse your history and look for any remaining problems. Are there any empty branches you want to remove? Do the commit messages look good? Does your project have any tag or branch that should not be exported or that makes no sense to export? Remove them.

Fix those issues and shrink your repository once again. Yeah, you heard me right: go clean your repo once again!
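If you prefer the command line to a graphical tool, listing all refs and deleting the ones that should not be exported could look like the sketch below. The branch and tag names are invented.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"
echo "code" > main.c
git add main.c
git commit -q -m "Initial commit"

# A leftover branch and tag that make no sense outside the old company.
git branch scratch-experiment
git tag internal-build-42

# Review every branch and tag the repository currently has.
git for-each-ref --format='%(refname)'

# Delete the ones that should not be exported.
git branch -q -D scratch-experiment
git tag -d internal-build-42 >/dev/null

remaining_refs=$(git for-each-ref --format='%(refname)' | wc -l | tr -d ' ')
echo "refs left: $remaining_refs"
```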

Good boy.

5.3   Export it.

Well, time to export :-) Hooray! But export to where?

Well, there are countless options -- you could set up your own Git environment or use something like GitHub. I strongly recommend the latter. Just head to GitHub's page, set up an account and click on the "New repository" button. Fill in the form and follow the steps presented there. And that's it :-) Your code now lives in a public Git repository and is there for the whole world to see. Hope you are proud of it -- I really do. ;)
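Whatever host you pick, the upload boils down to adding a remote and pushing all branches and tags to it. In the sketch below a local bare repository stands in for the hosted one, so the remote path is an assumption; on GitHub you would use the URL shown on your new repository's page instead.

```shell
set -e

# A local bare repository standing in for the hosted one.
remote=$(mktemp -d)
git init -q --bare "$remote/project.git"

# The cleaned-up repository we want to publish (hypothetical content).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.tld"
echo "code" > main.c
git add main.c
git commit -q -m "Initial commit"
git tag v1.0

# Point "origin" at the remote and push every branch, then every tag.
git remote add origin "$remote/project.git"
git push -q origin --all
git push -q origin --tags

pushed=$(git -C "$remote/project.git" for-each-ref --format='%(refname)' | wc -l | tr -d ' ')
echo "refs on the remote: $pushed"
```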

6   Closing remarks

6.1   Some missing things and TODOs

I merely covered the steps I usually perform when I move code to GitHub from old subversion and CVS repositories of mine that used to hold stuff from my master's and PhD -- so, there you have it, lame code ;)

This means that there are tons of stuff I don't cover here. For instance:

  • How to add a copyright notice to all header files, from their first commit and make them persist across all changes?
  • How to do the opposite: remove comments or copyright notices from files and make this removal persist across changes to the files?
  • Edit the contents of some particular commit message.

And so many other issues I don't have to deal with since I own the code I am releasing. Or because I am too lazy to fix everything. :-)

Your mileage may vary ;)

Cobwebs and more cobwebs

Monday, April 11th, 2011

Yes, this blog has been left to the cockroaches.

No, there is nothing interesting here, except that I turn 30 in 4 days.

Thanks.

Private GIT repositories (on DreamHost)

Monday, November 8th, 2010

This is yet another guide describing how to set up private HTTP-accessible Git repositories on Dreamhost using Git’s git-http-backend (a.k.a. Git’s Smart HTTP protocol). While similar guides can easily be found by the thousands on the Web (I’ve listed some of them in the References section), I’ve found that some guides have outdated information or that the setup described in them could be improved. Thus, this guide tries to update, improve and consolidate the information dispersed in such sources.


Blog Day

Tuesday, August 31st, 2010

Well, word has it that today is Blog Day. Taking the cue, here is the link to an article titled “Como ser índo” from EpicShit, today's recommended blog.

Exercise and Music

Thursday, November 24th, 2005

Dusting off the blog to say that exercising while listening to music improves your mental capacity, or so the experts say.

And that's that :-)

The Strokes – I Can’t Win

Bar Philosophy

Wednesday, September 21st, 2005

Growing old is accumulating frustrations

Me… half a bottle of vodka later

***
Propellerheads – Crash

Ajax, prototype.js, MochiKit

Thursday, September 8th, 2005

I think I am getting old and averse to learning new languages just for the mental masturbation. JavaScript, then, manages to go beyond plain laziness: the language is horrible to read, far too strange! It seems there are now some tools that noticeably ease this problem, making AJAX programming trivially easy.

Doublecast

Monday, May 23rd, 2005

It had been a good while since I last showed up here. I decided to come by and write a post more for bookmarking purposes than for anything else. Yes, other things have been occupying my mind and wrapping me in an aura of guilt that is getting harder and harder to shed.

In any case, I have not forgotten this blog yet. Believe me. CSS Style for you WordPress RSS Feed.

***

Every time I come back from Fortaleza a flu latches onto my legs and I have to lug it around for at least a week before I can get rid of it.

This time I caught tonsillitis, complete with antibiotics and everything. Apparently the tonsillitis has called a truce, but it left a fondly remembered (?!) flu in its place. Really, it was a great week in Fortaleza.

***

I should have stayed away from them for longer, but a sluggish Sunday afternoon is always a good excuse to go back to old vices.

The annoying part is that, just now that I finally got Titus and Yuna to make out, I will have to stay away from them once more. For another week, who knows? Maybe I will manage to get the doublecast next time?!

***

Yes, speaking of vices, the python-link of the day is this one: The Python Challenge. Find out how well you really know how to use your favorite language. But no “cheating”.

Scandals in online communities

Friday, April 1st, 2005

Yesterday the news of the hour was that Matt Mullenweg, the man behind WordPress, the blogging system used by everyone at C9 and quite popular around the world, was allegedly taking part in a link-spam scheme. I will not get into judging what he did. There are already too many opinions about it on the Web; why one more?

The only point I would like to make is about how, all of a sudden, people can create large communities on the internet and how that brings responsibility, even if unwillingly. As my friend Rommel might well say, “where there is the human, there is the political”.

Today, at least for me, the bombshell of the day is that the Little-Gamers site was shut down by the MPAA. I honestly do not know what to think…

It's pop and it's bad

Wednesday, March 23rd, 2005

I woke up wanting to tell someone to shove it where the sun doesn't shine. I open my e-mail and see someone uttering yet another gem of the kind “Ah, DJ So-and-so is very pop… and he's bad.”

Why so much hypocrisy! Drop the fuss and just say that you detest him only because he is pop, you stupid snob!

***

Nerds: why do they like .org?

50% less wise

Saturday, March 19th, 2005

Blood, suction tubes, scalpels, drills, several anesthetic injections, 3 hours of surgery and 2 fewer teeth. That's right, folks, now it's serious: since last Thursday I have been 50% less wise.

(Can you tell that I only posted this to break in my new WordPress installation, with spam-karma and all?!)

Condemned to freedom

Thursday, December 9th, 2004

This freedom is far more a torture than a gift. Hence Sartre said that we were “condemned to freedom”. We do not choose to exist, much less in the miserable, anguishing and nauseating existentialist world. But, once thrown into life at birth, we are responsible for everything we do.

More here and here.

Marky – Hatiras, J. Majik – Space Invader

So you decided to host your own e-mail service with Linux…

Monday, October 18th, 2004

I don't even quite know how I stumbled onto this: So you want to host your own Linux mail server… It was probably through one of the RSS feeds I read.

In any case, with the recent outages of moicas (the .burocrata.org e-mail server), this subject has been drawing a lot of my attention, and this post and its sequel explain very well the why and the how of running your own e-mail server, using Debian :-D

Updated: Link to Slashdot…

Morning Reflections

Monday, September 20th, 2004

This morning, while my liver complained about the previous day's excesses, I began pondering a few things, a few attitudes of mine, things like: “I am too old to behave like a brat, I should find myself a nice girl, settle down and start acting like a grown-up” but… bah! To hell with it! [unfunny joke]I am only 1.67m tall anyway![/unfunny joke]. ;-)

[theme from 2001: A Space Odyssey]
Ah! While we are at the silliness: someone has to tell the Americans that they may say “Bum”, but they may not say “Bum Bum Bum Bum Bum Bum Bum”
[/theme]

Agnelli & Nelson – Everyday 2002 (alex gold mix) @ DI

It just won’t do

Wednesday, September 1st, 2004

Two hellish weeks. At the end of them, a backtrace shows up:

After it, two great nights of sleep.

FatBoySlim @ Brigton Beach – Tim Deluxe – It Just Won’t Do

I “will freak out”…

Wednesday, August 25th, 2004

I will freak out in 3 days. In 3 days there will be a sushi festival. In 3 days there will be a Bruno e Marrone concert (yikes!) and I will sing “Amor de Carnaval” completely intoxicated. In 3 days it will be Saturday. I will freak out at a good time.

Until then, more nonsense webcomics, and thousands of hours in front of gdb

The Cure – Between Days (acustic)

Telemar finds it “Fácil” (easy) to take candy from a baby

Sunday, August 22nd, 2004

For all those who love “conspiracy theories”, a great article came out in Observatório da Imprensa about the real role of Fácil Internet in Telemar's dirty game.

Beck – Dead Melodies

Taglines

Monday, July 12th, 2004

This one is an oldie but, all of a sudden, it makes a whole lot of sense again:

“I may be drunk, but in the morning I will be sober, while you will still be stupid and ugly.”

Winston Churchill

Pleasant surprises

Tuesday, June 22nd, 2004

… are so much better when you do not try to rationalize them. And they get even better when you leave to time the work of decanting them, to extract what was most sincere in them

No doubt – Bathwater

You and your museum of lovers
The precious collection you’ve housed in your covers
My simpleness threatened by my own admission

And the bags are much too heavy
In my insecure condition
My pregnant mind is fat full with envy again

But I still love to wash in your old bathwater
Love to think that you couldn’t love another
I can’t help it…you’re my kind of man

Wanted and adored by attractive women
Bountiful selection at your discretion
I know I’m diving into my own destruction

So why do we choose the boys that are naughty?
I don’t fit in so why do you want me?
And I know I can’t tame you…but I just keep trying

‘Cause I love to wash in your old bathwater
Love to think that you couldn’t love another
I’m on your list with all your other women
But I still love to wash in your old bathwater
You make me feel like I couldn’t love another
I can’t help it…you’re my kind of man
Why do the good girls always want the bad boys?

So I pacify problems with kisses and cuddles
Diligently doubtful through all kinds of trouble
Then I find myself choking on all my contradictions

‘Cause I still love to wash in your old bathwater
Love to think that you couldn’t love another
Share a toothbrush…you’re my kind of man
I still love to wash in your old bathwater
Make me feel like I couldn’t love another
I can’t help it…you’re my kind of man

No I can’t help myself
I can’t help myself
I still love to wash in your old bathwater

I am just not going out enough

Saturday, June 19th, 2004

I'll tell you… sometimes I imagine that the soundtrack of my life could not do without “The Offspring – Pretty Fly (for a white guy)”. Yesterday was a classic example of a moment when that song would play loud, over a succession of flashes of slapstick comedy scenes. And all because I did not want to, could not, would not allow myself to spend a single Friday at home.

I should have seen it coming. Who is D’Artagnan without Athos, Porthos and Aramis? What is Aquaman without the other Super Friends? Who am I to think I would do well going out alone to the opening of a new nightclub where only electronic music would play? Who? :shock:

I confess that my expectations were not very high. They could not be. All I wanted was, once again, to listen to a good “tunts-tunts”, alone, minding my own business, as I did so many times in Fortaleza. Inevitably I would find the classic kind of crowd that always frequents these places, in the classic ratio of 4 pairs of briefs to 1 pair of panties, and on that front I expected no surprises. What I did not expect, however, was to find them in such small numbers.

I should have foreseen, I should have sensed that things were going to be ironically disappointing. I should not have even set foot outside the car. I should have headed for the usual places. But noooooo, :cool: I had to come up with yet another dud! Recapping: at 1 in the morning there were practically no cars parked in front of the place; on my way in I walked straight into the glass door; when I tried to go to the men's room I was stopped because a girl had locked herself inside (?!); and of all the 25 people in the establishment, counting support staff, bar staff, security and everything else, I was the only guy who did not know the names of all the other 24 present. Could something be wrong? The high point of the night was when I, fatefully and completely sober and dispirited at 2:30 in the morning, still considered trying my luck somewhere else! I not only considered it but headed there. Let's face it, at the height of the last-chance round, was I really going to have the nerve to ready the rod, bait the hook and wait for some poor girl to still bite my line? Poor and drunk, by the way, because in the last-chance round the sober ones either left several glasses ago, already duly accompanied, or will not subject themselves to the embarrassment of being hooked in the last-chance round.

Yeah… one day I will learn. I hope. ;-)