Converting EndNote to BibTex
Posted on 2011-09-01 12:05:09
by Geert Vandeweyer
Converting EndNote references to BibTex is a pain. However, some steps can be automated to make it at least bearable. The steps below do not guarantee a perfect result, but if all goes well, you will save a few hours on large documents. I have only tested this procedure with .docx files from MS Word and EndNote X4.
1. Preparing the Word document
The first step is preparing the document. What has worked best for me is selecting everything and setting the font to Times New Roman, size 11pt. The less fancy formatting is present, the better it seems to work.
2. Export the EndNote library
While the document is open, go to EndNote. You should now see a separate library for the open document containing all the references. Select them all. If this option is not available, just select all the references in the main library. In 'Edit-Output Styles', make sure the BibTex option is available. Now, with all references selected, go to 'File-Export'. Select Text File (TXT) as the file type and BibTex Export as the output style. Press OK. You now have a .txt file containing BibTex entries.
3. Trim and prepare the BibTex library
Now copy the exported file to a Linux machine. On this machine, open a text editor and paste the following code into a file named 'bibtexparser.pl'. This program scans the BibTex file and assigns a unique identifier to each entry. Notes and keywords are discarded, as they are not needed in a standard natbib/bibtex style. Also, curly brackets are added to titles to preserve capitalisation. The cleaned entries are written to a new file named after the input file, with the prefix 'clean_'.
#!/bin/perl
# detect the line ending of the input file
$head = `head -n 1 $ARGV[0]`;
if ($head =~ m/\r\n$/) {
    $lineend = "\r\n";
}
elsif ($head =~ m/\n\r$/) {
    $lineend = "\n\r";
}
elsif($head =~ m/\r$/) {
    $lineend = "\r";
}
else {
    $lineend = "\n";
}
# prepare key hash
my %keys;
my @alph = ( "a" .. "z" );
my %alph_index;
@alph_index{ @alph } = (0 .. $#alph);
# open file handles
open IN, $ARGV[0];
open OUT, ">clean_$ARGV[0]";
# go !
my $started = 0;
$counter = 0;
$linecount = 0;
$continue = 0;
my $key;
my $entry = '';
my $firstline = '';
my $firstauthor = '';
my $year = '';
$skip = 0;
while(<IN>) {
    my $line = $_;
    $linecount++;
    $line =~ s/$lineend$//;
    # check for running note/keywords, if so, print & continue
    if ($continue == 1) {
        if ($skip != 1) {
            $entry .= "$line\n";
        }
        if ($line =~ m/\}/) {
            # note is finished
            $continue = 0;
            $skip = 0;
        }
        next;
    }
    if ($line eq "") {
        # keep empty lines
        $entry .= "\n";
        next;
    }
    if ($started == 0 && $line =~ m/\@\w+\{/) {
        # reference started
        $started = 1;
        # strip any reference id present
        $line =~ s/(\{.*)$/\{/;
        $firstline = $line;
        next;
    }
    if ($started == 1 && $line =~ m/^\}$/) {
        $started = 0;
        # store key
        $key = $firstauthor.$year;
        while ($keys{$key} == 1) {
            ## exists, attach a,b,c etc
            #my $tmp = $key;
            $key =~ m/(\D*)(\d*)(\D*)/;
            my $idx = 0;
            if ($3 ne "") {
                $idx = $alph_index{ $3 } + 1;
            }
            $key = $1.$2.$alph[$idx];
        }
        $keys{$key} = 1;
        if ($key ne "") {
            $key .= ",";
        }
        $entry = $firstline.$key."\n".$entry."}\n";
        print OUT $entry;
        $entry = '';
        $firstauthor = '';
        $year = '';
        $counter++;
        next;
    }
    if ($started == 1 && $line =~ m/^\s+\w+\s=\s\{/) {
        # valid line : keyword = {value}
        ## get first author
        if ($line =~ m/^\s+author\s=\s\{([\w\s]+)/) {
            $firstauthor = $1;
            $firstauthor =~ s/\s//g;
        }
        ## get year
        elsif ($line =~ m/^\s+year\s=\s\{(\d+)/) {
            $year = $1;
        }
        elsif ($line =~ m/^\s+title\s=\s\{(.*)\},$/) {
            #print "title: $line\n";
            $line = " title = {{$1}},";
        }
        # check for multiline note/keywords/abstracts, by missing closing curly bracket...
        if ($line !~ m/\}/) {
            $continue = 1;
            if ($line =~ m/^\s+note\s=/ || $line =~ m/^\s+keywords\s=/) {
                $skip = 1;
                next;
            }
        }
        elsif ($line =~ m/^\s+note\s=/ || $line =~ m/^\s+keywords\s=/) {
            next;
        }
        $entry .= "$line\n";
        next;
    }
    # what remains : started == 0 for some reason, and invalid lines (no keyword = value combo)
}
print "\n\tDone: Found $counter references\n";
close OUT;
close IN;
Now convert the bibtex file using the following commands:
# first convert to unix file format !
dos2unix export.txt
# or, if dos2unix is not installed: fromdos export.txt
# run perl script (writes clean_export.txt)
perl bibtexparser.pl export.txt
# rename the cleaned bibtex file
mv clean_export.txt BibTexFile.bib
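The title handling mentioned above (extra curly brackets to preserve capitalisation) can be illustrated in isolation. This is only a minimal sketch: the helper name protect_title and the example title are made up for illustration and are not part of bibtexparser.pl, and it assumes the title field sits on a single line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bibtexparser.pl): wrap the value of a
# single-line title field in an extra pair of braces, so that bibliography
# styles leave its capitalisation alone.
sub protect_title {
    my ($line) = @_;
    # Leave titles that are already double-braced untouched.
    return $line if $line =~ m/^\s*title\s*=\s*\{\{.*\}\},?\s*$/;
    # Lines that are not title fields do not match and are returned as-is.
    $line =~ s/^(\s*title\s*=\s*)\{(.*)\}(,?)\s*$/$1\{\{$2\}\}$3/;
    return $line;
}

# Made-up entry line, for illustration only.
print protect_title('  title = {Detection of CNVs in human DNA},'), "\n";
# prints:   title = {{Detection of CNVs in human DNA}},
```

Without the inner braces, many bibliography styles would lowercase everything after the first word, turning "CNVs" and "DNA" into "cnvs" and "dna".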
4. Create a .tex file from the main document
The next step is to create a starting tex file from the docx document. We will use AbiWord for the conversion, so make sure you have it installed. The command below will create a maintext.tex file.
abiword --to=tex maintext.docx
Now check the file with your favourite text editor. If the font and size settings were correct, there shouldn't be any small, big or other font-related commands surrounding the paragraphs. Commands like flushleft, spacing, or newpage are normal and will be taken care of later.
5. Replace the references
We will now create another perl script that scans both the main document file and the BibTex file, and replaces all href entries with cite commands. Name it texparser.pl.
#!/bin/perl
$texfile = $ARGV[0];
$bibfile = $ARGV[1];
## load bibtex
my %bib;
my %doubles;
my $started = 0;
$continue = 0;
my $key;
my $entry = '';
my $firstline = '';
my $firstauthor = '';
my $year = '';
my $title = '';
open IN, $bibfile;
while (<IN>) {
    my $line = $_;
    chomp($line);
    # check for running note/keywords, if so, skip & continue
    if ($continue == 1) {
        if ($line =~ m/\}/) {
            # note is finished
            $continue = 0;
        }
        next;
    }
    if ($line eq "") {
        next;
    }
    if ($started == 0 && $line =~ m/\@\w+\{/) {
        # reference started
        $started = 1;
        # extract the reference id
        $line =~ m/\{(.*),$/;
        $key = $1;
        next;
    }
    if ($started == 1 && $line =~ m/^\}$/) {
        $started = 0;
        # store item
        if (exists($bib{$title}) && ($year != $bib{$title}{'year'} || $firstauthor ne $bib{$title}{'author'})) {
            # true double item: same title, different year or first author
            if ($bib{$title}{'author'} ne 'double entry') {
                # remember the entry that was stored first
                $doubles{$bib{$title}{'key'}}{'title'} = $title;
                $doubles{$bib{$title}{'key'}}{'author'} = $bib{$title}{'author'};
                $doubles{$bib{$title}{'key'}}{'year'} = $bib{$title}{'year'};
                $bib{$title}{'author'} = 'double entry';
            }
            $doubles{$key}{'title'} = $title;
            $doubles{$key}{'year'} = $year;
            $doubles{$key}{'author'} = $firstauthor;
        }
        else {
            $bib{$title}{'key'} = $key;
            $bib{$title}{'year'} = $year;
            $bib{$title}{'author'} = $firstauthor;
        }
        $title = '';
        $key = '';
        $year = '';
        $firstauthor = '';
        next;
    }
    if ($started == 1 && $line =~ m/^\s+\w+\s=\s\{/) {
        # valid line : keyword = {value}
        ## get first author
        if ($line =~ m/^\s+author\s=\s\{([\w\s]+)/) {
            $firstauthor = $1;
            #$firstauthor =~ s/\s//g;
        }
        ## get year
        elsif ($line =~ m/^\s+year\s=\s\{(\d+)/) {
            $year = $1;
        }
        elsif ($line =~ m/^\s+title\s=\s\{\{(.+)\}\}/) {
            $title = $1;
        }
        # check for multiline note/keywords/abstracts, by missing closing curly bracket...
        if ($line !~ m/\}/) {
            $continue = 1;
        }
        next;
    }
    # what remains : started == 0 for some reason, and invalid lines (no keyword = value combo)
}
close IN;
## scan document for hypertarget tags
my %targets;
open IN, $texfile;
while (<IN>) {
    my $line = $_;
    chomp($line);
    if ($line =~ m/hypertarget\{_ENREF_(\d+)\}\{\d+\.\s+/) { #([\w\s]+),.*:\s+\\textbf\{([^\}]+)\}\.\s.+\}(\d+),.*\}/) {
        my $ref = $1;
        ## replace abiword convertor oddities
        $line =~ s/(\{\`\`\})|('')/"/g;
        # set variables
        my $firstauthor = '';
        my $year = '';
        my $title = '';
        #print "$line\n";
        # should match, standard article style
        if ($line =~ m/hypertarget\{_ENREF_(\d+)\}\{\d+\.\s+([\w\s]+),{0,1}.*:\s+\\textbf\{([^\}]+)\}(.*)/) { #\.\s.+\}(\d+),.*\}/) {
            $firstauthor = $2;
            $title = $3;
            $rest = $4;
            $firstauthor =~ s/\s\w+$//;
            if ($rest =~ m/\.\s.+\};{0,1}\s*(\d+)[\(,\.].*\}/) {
                $year = $1;
            }
            elsif ($rest =~ m/\s(\d+):\s\d+\-\d+/){
                $year = $1;
            }
        }
        elsif ($line =~ m/hypertarget\{_ENREF_(\d+)\}\{\d+\.\s+\\textbf\{([^\}]+)\}\.\s.+\}(\d+),.*\}/) {
            $title = $2;
            $year = $3;
        }
        if ($title eq "" || $year eq "") {
            print "PROBLEM SCANNING REFERENCE:\n";
            print "###########################\n";
            print " $line\n";
            print "\nPlease Provide the needed items: \n";
            print "Year: ";
            $year = <STDIN>;
            chomp($year);
            print "First Author Last Name: ";
            $firstauthor = <STDIN>;
            chomp($firstauthor);
            print "Title: ";
            $title = <STDIN>;
            chomp($title);
        }
        if (!exists($bib{$title})) {
            print "=> ENTRY NOT FOUND : $title\n";
        }
        elsif ($bib{$title}{'author'} ne 'double entry') {
            $targets{$ref} = $bib{$title}{'key'};
            #print "$ref maps to $bib{$title}{'key'}\n";
        }
        else {
            print "=> DOUBLE ENTRY : $title\n";
        }
    }
}
close IN;
## START REPLACING
open IN, $texfile;
open OUT, ">replaced_$texfile";
my $linecounter = 0;
while (<IN>) {
    $linecounter++;
    my $line = $_;
    while ($line =~ m/(\\href\{#_ENREF_\d+\}\{[\d\-,]+\})[,\]]/) {
        my $hit = $1;
        # extract the citation numbers, e.g. "3", "1,4" or "5-7"
        $hit =~ m/\\href\{#_ENREF_\d+\}\{([\d\-,]+)\}/;
        my @items = split(/,/,$1);
        my $replace = '';
        foreach (@items) {
            if ($_ =~ m/(\d+)\-(\d+)/) {
                # a range: add the key of every reference in it
                my $first = $1;
                my $last = $2;
                for ($idx = $first; $idx <= $last; $idx++) {
                    $replace .= $targets{$idx}.', ';
                }
            }
            else {
                $replace .= $targets{$_}.', ';
            }
        }
        if ($replace ne '') {
            $replace = '~\cite{'.substr($replace,0,-2).'}';
            $line =~ s/\[{0,1}\\href\{#_ENREF_\d+\}\{[\d\-,]+\}[\],]{0,1}/$replace/;
        }
        else {
            print "replacement failed for : '$hit'\n";
            # stop here, otherwise the same hit would be matched forever
            last;
        }
    }
    # merge citations now separated by space
    if ($line =~ m/~\\cite\{[\w,\s]+\}\s~\\cite\{/) {
        $line =~ s/~\\cite\{([\w,\s]+)\}\s~\\cite\{([\w,\s]+)\}/~\\cite\{$1, $2\}/;
    }
    print OUT $line;
}
close IN;
close OUT;
print "\n\n";
print "##########\n";
print "## DONE ##\n";
print "##########\n";
print " Remember to check the errors for missing entries, and to add the needed bibtex commands to the preamble and the end of the document.\n";
Start the script with the command below. If some non-standard entries are found in the reference list of the main document, you will be asked for the first author, year and title. These will be used to scan the BibTex entries for a match. The assigned identifiers are composed of FirstauthorYear, with a-z attached if needed. The output file is named replaced_maintext.tex.
perl texparser.pl maintext.tex BibTexFile.bib
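The FirstauthorYear naming scheme can be sketched on its own. This is a simplified illustration, not the code used in bibtexparser.pl: the helper name unique_key and the author names are assumptions made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the scripts above): build a citation key
# from first author and year, appending a, b, c ... when the key is taken.
my %seen;

sub unique_key {
    my ($author, $year) = @_;
    $author =~ s/\s//g;              # "Van Dam" becomes "VanDam"
    my $base = $author . $year;
    my $key  = $base;
    foreach my $suffix ('a' .. 'z') {
        last unless $seen{$key};     # free key found
        $key = $base . $suffix;
    }
    # bare key plus 26 suffixes gives room for 27 entries
    die "More than 27 entries share the key $base\n" if $seen{$key};
    $seen{$key} = 1;
    return $key;
}

# Made-up authors, for illustration only.
print unique_key('Smith', 2010), "\n";    # Smith2010
print unique_key('Smith', 2010), "\n";    # Smith2010a
print unique_key('Van Dam', 2009), "\n";  # VanDam2009
print unique_key('Smith', 2010), "\n";    # Smith2010b
```

The first entry keeps the bare key and only later entries get a suffix, so a key in the .bib file tells you nothing about how many papers share that author and year.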
6. Clean up the resulting file
The resulting tex file still contains a lot of obsolete commands. The following script, saved as strip.pl, removes most of them. Use with caution, it might remove some valid content!
#!/bin/perl
open IN, $ARGV[0];
open OUT , ">stripped_$ARGV[0]";
while (<IN>) {
    # skip spacing, newpage and flushleft commands
    if ($_ =~ m/begin\{spacing/ || $_ =~ m/end\{spacing/ || $_ =~ m/\\newpage/ || $_ =~ m/\\begin\{flushleft\}$/ || $_ =~ m/\\end\{flushleft\}$/) {
        next;
    }
    print OUT $_;
}
close IN;
close OUT;
perl strip.pl replaced_maintext.tex
The output file is named stripped_replaced_maintext.tex. Open this file in a text editor. You now need to add the natbib package to the preamble and specify the bibliography. Also, all the original references are still in the file, but can (and should) be deleted now.
% in the preamble
\usepackage[numbers, square, sectionbib,sort&compress]{natbib}
% at the end of the document, or where you need it to be.
\bibliographystyle{unsrtnat}
% tell the bib file to use
\bibliography{BibTexFile}
And now all should be fine. One known error remains in the method:
- some cites (rare, but existing) do not group (separate \cite{} commands instead of \cite{one,two})
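One possible cause, judging from the script, is that the merge step in texparser.pl runs only once per line and only joins cites separated by a single whitespace character. The following is a sketch of a more tolerant merge, not a tested replacement for that step: the helper name merge_cites and the citation keys are made up, and it assumes that cites separated by nothing but spaces or commas belong together.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of texparser.pl): merge runs of \cite
# commands that are separated only by whitespace or commas into one \cite.
sub merge_cites {
    my ($line) = @_;
    # Repeat until no adjacent pair is left, so three or more cites also merge.
    1 while $line =~ s/\\cite\{([^}]+)\}[\s,]*~?\\cite\{([^}]+)\}/\\cite{$1, $2}/;
    return $line;
}

# Made-up keys, for illustration only.
print merge_cites('as shown~\cite{Smith2010} ~\cite{Jones2011}, ~\cite{Lee2009a}.'), "\n";
# prints: as shown~\cite{Smith2010, Jones2011, Lee2009a}.
```

Cites separated by real text are left alone, since only whitespace, commas and an optional ~ are allowed between two commands.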