Scribbles & Snippets

... of low and no utility ...


Converting EndNote to BibTex

Posted on 2011-09-01 12:05:09
by Geert Vandeweyer


Converting EndNote references to bibtex is a pain. However, some steps can be automated to make it at least bearable. The steps below do not guarantee a perfect result, but if all goes well, you will save a few hours on large documents. I've only tested this procedure with .docx files from MS Word and EndNote X4.

1. Preparing the word document

The first step is preparing the document. What has worked best for me is selecting everything and setting the font to Times New Roman, size 11pt. The less fancy formatting is present, the better it seems to work.

2. Export the EndNote library

While the document is open, go to EndNote. You should now see a separate library for the open document containing all the references. Select them all. If this option is not available, just select all the references in the main library. In 'edit-output styles', make sure the BibTex option is available. Now, with all references selected, go to 'File-Export'. Select Text File (TXT) as the file type and BibTex Export as the output style. Press OK. You now have a .txt file containing bibtex entries.

3. Trim and prepare the BibTex library

Now copy the exported file to a Linux machine. On this machine, open a text editor and paste the following code into a file named 'bibtexparser.pl'. This program scans the bibtex file and assigns a unique identifier to each entry. Notes and keywords are discarded, as they are not needed in a standard natbib/bibtex style. Also, curly brackets are added around titles to preserve capitalisation. The result is written to a new file named after the input file, prefixed with 'clean_'.

#!/usr/bin/perl
# convert to unix file format
$head = `head -n 1 $ARGV[0]`;
if ($head =~ m/\r\n$/) {
	$lineend = "\r\n";
}
elsif ($head =~ m/\n\r$/) {
	$lineend = "\n\r";
}
elsif($head =~ m/\r$/) {
	$lineend = "\r";
}
else {
	$lineend = "\n";
}

# prepare key hash
my %keys;
my @alph = ( "a" .. "z" );
my %alph_index;
@alph_index{ @alph } = (0 .. $#alph);

# open file handles
open IN, $ARGV[0];
open OUT, ">clean_$ARGV[0]";

# go !
my $started = 0;
$counter = 0;
$linecount = 0;
$continue = 0;
my $key;
my $entry = '';
my $firstline = '';
my $firstauthor = '';
my $year = '';
$skip = 0;
while(<IN>) {
	my $line = $_;
	$linecount++;
	$line =~ s/$lineend$//;
	# check for running note/keywords, if so, print & continue
	if ($continue == 1) {
		if ($skip != 1) {
			$entry .= "$line\n";
		}
		if ($line =~ m/\}/) {
			# note is finished
			$continue = 0;
			$skip = 0;
		}
		next;
	}
 
	if ($line eq "") {
		# keep empty lines
		$entry .= "\n";
		next;
	}
	if ($started == 0 && $line =~ m/\@\w+\{/) {
		# reference started
		$started = 1;
		# strip any reference id present
		$line =~ s/(\{.*)$/\{/;
		$firstline = $line;
		next;
	}
	if ($started == 1 && $line =~ m/^\}$/) {
		$started = 0;
		# store key
		$key = $firstauthor.$year;
		while ($keys{$key} == 1) {
			## exists, attach a,b,c etc
			#my $tmp = $key;
			$key =~ m/(\D*)(\d*)(\D*)/;
			my $idx = 0;
			if ($3 ne "") {
				$idx = $alph_index{ $3 } + 1;
			}
			$key = $1.$2.$alph[$idx];
		}
		$keys{$key} = 1;
		if ($key ne "") {
			$key .= ",";
		}
		$entry = $firstline.$key."\n".$entry."}\n";
		print OUT $entry;
		$entry = '';
		$firstauthor = '';
		$year = '';
		$counter++;
		next;
	}
	if ($started == 1 && $line =~ m/^\s+\w+\s=\s\{/) {
		# valid line : keyword = {value}
		## get first author
		if ($line =~ m/^\s+author\s=\s\{([\w\s]+)/) {
			$firstauthor = $1;
			$firstauthor =~ s/\s//g;	
		}
		## get year
		elsif ($line =~ m/^\s+year\s=\s\{(\d+)/) {
			$year = $1;
		}
		elsif ($line =~ m/^\s+title\s=\s\{(.*)\},$/) {
			#print "title: $line\n";
			$line = "   title = {{$1}},";
		}
		# check for multiline note/keywords/abstracts, by missing closing curly bracket...
		if ($line !~ m/\}/) {
			
			$continue = 1;
			if ($line =~ m/^\s+note\s=/ || $line =~ m/^\s+keywords\s=/) {
				$skip = 1;
				next;
			}
		}
		elsif ($line =~ m/^\s+note\s=/ || $line =~ m/^\s+keywords\s=/) {
			next;
		}
		$entry .= "$line\n";
		next;
	}
	# what remains : started == 0 for some reason, and invalid lines (no keyword = value combo)
}
print "\n\tDone: Found $counter references\n";
close OUT;
close IN;
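
The key assignment is the part of the script that is easiest to get wrong, so here it is in isolation. This is only a minimal sketch of the same idea (FirstauthorYear, with a, b, c attached on collisions); the names unique_key and %seen are made up for the illustration and do not appear in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keys handed out so far (hypothetical name, for illustration only).
my %seen;

# Build a FirstauthorYear key and attach a, b, c ... if it is already taken.
sub unique_key {
	my ($author, $year) = @_;
	my $base   = $author . $year;
	my $key    = $base;
	my $suffix = 'a';
	# Perl's string increment turns 'a' into 'b', ..., 'z' into 'aa'.
	while ($seen{$key}) {
		$key = $base . $suffix++;
	}
	$seen{$key} = 1;
	return $key;
}

# Three references by the same first author in the same year (made-up data).
print unique_key('Smith', 2010), "\n";
print unique_key('Smith', 2010), "\n";
print unique_key('Smith', 2010), "\n";
```

With the made-up input above, the three keys come out as Smith2010, Smith2010a and Smith2010b, which is the same naming scheme the full script uses.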

Now convert the bibtex file using the following commands:

# first convert to unix file format !
dos2unix export.txt
# or 
fromdos export.txt
 
# run perl script
perl bibtexparser.pl export.txt
 
# rename the cleaned bibtex file
mv clean_export.txt BibTexFile.bib
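
Before moving on, it can be worth checking that every entry in the cleaned file really got its own key. The sketch below is not part of the original procedure: entry_keys and duplicate_keys are hypothetical helpers, and it assumes entry headers look like "@article{Smith2010," on a line of their own, which is what bibtexparser.pl writes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return all entry keys in order of appearance (hypothetical helper).
sub entry_keys {
	my ($text) = @_;
	my @keys;
	for my $line (split /\n/, $text) {
		# an entry header looks like "@article{Smith2010,"
		push @keys, $1 if $line =~ /^\@\w+\{([^,\s]+),\s*$/;
	}
	return @keys;
}

# Return the keys that occur more than once, sorted (hypothetical helper).
sub duplicate_keys {
	my %count;
	$count{$_}++ for @_;
	my @dups = grep { $count{$_} > 1 } sort keys %count;
	return @dups;
}

# Only read a file when one is given on the command line.
if (@ARGV) {
	open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
	local $/;	# slurp the whole file at once
	my @keys = entry_keys(<$in>);
	close $in;
	my @dups = duplicate_keys(@keys);
	printf "%d entries, %d duplicated keys\n", scalar @keys, scalar @dups;
	print "  $_\n" for @dups;
}
```

Saved under a name of your choice (say checkkeys.pl), it would be run as: perl checkkeys.pl BibTexFile.bib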

4. Create a .tex file from the main document

The next step is to create a starting tex file from the docx document. We will use AbiWord for the conversion, so make sure you have it installed. The command below creates a maintext.tex file.

abiword --to=tex maintext.docx

Now check the file with your favourite text editor. If the font and size settings were correct, there shouldn't be any small, big, or other font-related commands surrounding the paragraphs. Commands like flushleft, spacing, or newpage are normal and will be taken care of later.
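
For a long document, checking by eye gets tedious, so a quick count of font-size commands can help. This is just a sketch: font_commands is a hypothetical helper, and the list of commands it looks for is my assumption about what a converter might emit, not something taken from the AbiWord documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count LaTeX font-size commands in a piece of text (hypothetical helper).
# The command list is an assumption; extend it as needed.
sub font_commands {
	my ($text) = @_;
	my %count;
	while ($text =~ /\\(tiny|scriptsize|footnotesize|small|large|Large|LARGE|huge|Huge)\b/g) {
		$count{$1}++;
	}
	return %count;
}

# Only read a file when one is given on the command line.
if (@ARGV) {
	open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
	local $/;	# slurp the whole file at once
	my %count = font_commands(<$in>);
	close $in;
	print "No font-size commands found\n" unless %count;
	print "\\$_ : $count{$_}\n" for sort keys %count;
}
```

If this reports more than a handful of hits, it is probably quicker to go back to step 1 and flatten the formatting in the word document than to clean up the tex file by hand.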

5. Replace the references

We will now create another perl script that scans both the main document file and the bibtex file, and replaces all \href entries with \cite commands. Name it texparser.pl.

#!/usr/bin/perl

$texfile = $ARGV[0];
$bibfile = $ARGV[1];

## load bibtex
my %bib;
my %doubles;
my $started = 0;
$continue = 0;
my $key;
my $entry = '';
my $firstline = '';
my $firstauthor = '';
my $year = '';
my $title = '';
open IN, $bibfile;
while (<IN>) {
	my $line = $_;
	chomp($line);
	# check for running note/keywords, if so, print & continue
	if ($continue == 1) {
		if ($line =~ m/\}/) {
			# note is finished
			$continue = 0;
		}
		next;
	}
 
	if ($line eq "") {
		next;
	}
	if ($started == 0 && $line =~ m/\@\w+\{/) {
		# reference started
		$started = 1;
		# strip any reference id present
		$line =~ m/\{(.*),$/;
		$key = $1;
		next;
	}
	if ($started == 1 && $line =~ m/^\}$/) {
		$started = 0;
		# store item
		if (exists($bib{$title}) && ($year != $bib{$title}{'year'} || $firstauthor ne $bib{$title}{'author'})) {
			# true double item: same title, but different year or first author
			if ($bib{$title}{'author'} ne 'double entry') {
				# first clash for this title: also record the entry stored earlier
				$doubles{$bib{$title}{'key'}}{'title'} = $title;
				$doubles{$bib{$title}{'key'}}{'author'} = $bib{$title}{'author'};
				$doubles{$bib{$title}{'key'}}{'year'} = $bib{$title}{'year'};
				$bib{$title}{'author'} = 'double entry';
			}
			$doubles{$key}{'title'} = $title;
			$doubles{$key}{'year'} = $year;
			$doubles{$key}{'author'} = $firstauthor;
		}
		else {
			$bib{$title}{'key'} = $key;
			$bib{$title}{'year'} = $year;
			$bib{$title}{'author'} = $firstauthor;
		}
		$title = '';
		$key = '';
		$year = '';
		$firstauthor = '';
		next;
	}
	if ($started == 1 && $line =~ m/^\s+\w+\s=\s\{/) {
		# valid line : keyword = {value}
		## get first author
		if ($line =~ m/^\s+author\s=\s\{([\w\s]+)/) {
			$firstauthor = $1;
			#$firstauthor =~ s/\s//g;	
		}
		## get year
		elsif ($line =~ m/^\s+year\s=\s\{(\d+)/) {
			$year = $1;
		}
		elsif ($line =~ m/^\s+title\s=\s\{\{(.+)\}\}/) {
			$title = $1;
		}
		# check for multiline note/keywords/abstracts, by missing closing curly bracket...
		if ($line !~ m/\}/) {
			$continue = 1;
		}
		next;
	}
	# what remains : started == 0 for some reason, and invalid lines (no keyword = value combo)
}
close IN;

## scan document for hypertarget tags
my %targets;
open IN, $texfile;
while (<IN>) {
	my $line = $_;
	chomp($line);
	 if ($line =~ m/hypertarget\{_ENREF_(\d+)\}\{\d+\.\s+/) { #([\w\s]+),.*:\s+\\textbf\{([^\}]+)\}\.\s.+\}(\d+),.*\}/) {
		my $ref = $1;
		## replace abiword convertor oddities
		$line =~ s/(\{\`\`\})|('')/"/g;
		# set variables
		my $firstauthor = '';
		my $year = '';
		my $title = '';
		#print "$line\n";
		# should match, standard article style
		if ($line =~ m/hypertarget\{_ENREF_(\d+)\}\{\d+\.\s+([\w\s]+),{0,1}.*:\s+\\textbf\{([^\}]+)\}(.*)/) { #\.\s.+\}(\d+),.*\}/) {
			$firstauthor = $2;
			$title = $3;
			$rest = $4;
			$firstauthor =~ s/\s\w+$//;
			if ($rest =~ m/\.\s.+\};{0,1}\s*(\d+)[\(,\.].*\}/) {
				$year = $1;
			}
			elsif ($rest =~ m/\s(\d+):\s\d+\-\d+/){
				$year = $1;
			}
		}
		elsif ($line =~ m/hypertarget\{_ENREF_(\d+)\}\{\d+\.\s+\\textbf\{([^\}]+)\}\.\s.+\}(\d+),.*\}/) {
			$title = $2;
			$year = $3;
		}
		if ($title eq "" || $year eq "") {
			print "PROBLEM SCANNING REFERENCE:\n";
			print "###########################\n";
			print "   $line\n";
			print "\nPlease Provide the needed items: \n";
			print "Year: ";
			$year = <STDIN>;
			chomp($year);
			print "First Author Last Name: ";
			$firstauthor = <STDIN>;
			chomp($firstauthor);
			print "Title: ";
			$title = <STDIN>;
			chomp($title);
		}	
		if (!exists($bib{$title})) {
			print "=> ENTRY NOT FOUND : $title\n";
		}
		elsif ($bib{$title}{'author'} ne 'double entry') {
			$targets{$ref} = $bib{$title}{'key'};
			#print "$ref maps to $bib{$title}{'key'}\n";
		}
		else {
			print "=> DOUBLE ENTRY : $title\n";
		}
			
	}
}
close IN;

## START REPLACING
open IN, $texfile;
open OUT, ">replaced_$texfile";
my $linecounter = 0;
while (<IN>) {
	$linecounter++;
	my $line = $_;
	while ($line =~ m/(\\href\{#_ENREF_\d+\}\{[\d\-,]+\})[,\]]/) {
		my $hit = $1;
		# the reference list may hold single numbers, ranges and commas
		$hit =~ m/\\href\{#_ENREF_\d+\}\{([\d\-,]+)\}/;
		my @items = split(/,/,$1);
		my $replace = '';
		foreach (@items) {
			if ($_ =~ m/(\d+)\-(\d+)/) {
				my $first = $1;
				my $last = $2;
				for ($idx = $first; $idx <= $last; $idx++) {
					$replace .= $targets{$idx}.', ';
				}
			}
			else {
				$replace .= $targets{$_}.', ';
			}
		}
		if ($replace ne '') {
			$replace = '~\cite{'.substr($replace,0,-2).'}';
			$line =~ s/\[{0,1}\\href\{#_ENREF_\d+\}\{[\d\-,]+\}[\],]{0,1}/$replace/;
		}
		else {
			print "replacement failed for : '$hit'\n";
			# leave the loop: the unchanged line would match forever
			last;
		}
		
	}
	# merge citations now separated by space
	if ($line =~ m/~\\cite\{[\w,\s]+\}\s~\\cite\{/) {
		$line =~ s/~\\cite\{([\w,\s]+)\}\s~\\cite\{([\w,\s]+)\}/~\\cite\{$1, $2\}/;
	}
	print OUT $line;
}
close IN;
close OUT;

print "\n\n";
print "##########\n";
print "## DONE ##\n";
print "##########\n";
print "  Remember to check errors for missing entries and adding needed bibtex commands to preamble and end of document.\n";

Start the script with the command below. If some non-standard entries are found in the reference list in the main document, you will be asked for first author, year and title. These will be used to scan the bibtex entries for a match. The assigned identifiers are composed of FirstauthorYear with a-z attached if needed. The output file is named replaced_maintext.tex.

perl texparser.pl maintext.tex BibTexFile.bib
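
The heart of the replacement is turning a numbered citation such as 2-4,7 into a list of bibtex keys. The sketch below shows that step on its own. It is not lifted from texparser.pl: expand_refs and cite_for are hypothetical helpers, and the keys in %targets are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand "2-4,7" into (2, 3, 4, 7) (hypothetical helper).
sub expand_refs {
	my ($spec) = @_;
	my @numbers;
	for my $item (split /,/, $spec) {
		if ($item =~ /^(\d+)-(\d+)$/) {
			push @numbers, $1 .. $2;
		}
		elsif ($item =~ /^(\d+)$/) {
			push @numbers, $1;
		}
	}
	return @numbers;
}

# Build a \cite command from a reference spec and a number => key map.
# Numbers without a key are reported instead of silently dropped.
sub cite_for {
	my ($spec, $targets) = @_;
	my (@keys, @missing);
	for my $n (expand_refs($spec)) {
		if (defined $targets->{$n}) { push @keys, $targets->{$n} }
		else                        { push @missing, $n }
	}
	warn "no bibtex key for reference(s): @missing\n" if @missing;
	return @keys ? '~\cite{' . join(', ', @keys) . '}' : '';
}

# Made-up mapping from reference numbers to bibtex keys.
my %targets = (2 => 'Smith2010', 3 => 'Lee2011', 4 => 'Lee2011a', 7 => 'Jones1999');
print cite_for('2-4,7', \%targets), "\n";
```

Reporting the missing numbers is a deliberate choice here: a reference that was flagged as ENTRY NOT FOUND or DOUBLE ENTRY has no key, and it is easier to fix the \cite by hand when you know which number was skipped.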

6. Clean up the resulting file

The resulting tex file still contains a lot of obsolete commands. The following script, saved as strip.pl, removes most of them. Use with caution, as it might remove some valid content!

#!/usr/bin/perl

open IN, $ARGV[0];
open OUT , ">stripped_$ARGV[0]";
while (<IN>) {
	if ($_ =~ m/begin\{spacing/ || $_ =~ m/end\{spacing/ || $_ =~ m/\\newpage/ || $_ =~ m/\\begin\{flushleft\}$/ || $_ =~ m/\\end\{flushleft\}$/) {
		next;
	}
	print OUT $_;
}
close IN;
close OUT;

perl strip.pl replaced_maintext.tex

The output file is now stripped_replaced_maintext.tex. Open this file in a text editor. You now need to add the natbib package to the preamble and specify the bibliography. Also, all the original references are still in the file, but can (and should) be deleted now.

% in the preamble
\usepackage[numbers, square, sectionbib,sort&compress]{natbib}

% at the end of the document, or where you need it to be.
\bibliographystyle{unsrtnat}
% tell the bib file to use
\bibliography{BibTexFile}

And now all should be fine. One known problem remains in the method:

- some cites (rare, but existing) do not group (separate \cite{} commands instead of \cite{one,two})
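
I have not tracked down the exact cause, but one plausible candidate is that texparser.pl merges neighbouring cites only once per line, so a run of three or more is left half merged. The sketch below repeats the merge until nothing changes. Treat it as an untested idea rather than a fix: merge_cites is a hypothetical helper, and it assumes the cites are separated by whitespace only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge runs of neighbouring ~\cite{...} commands into one (hypothetical helper).
sub merge_cites {
	my ($line) = @_;
	# each pass merges one pair; repeat until no neighbouring pair is left
	1 while $line =~ s/~\\cite\{([^}]*)\}\s*~\\cite\{([^}]*)\}/~\\cite{$1, $2}/;
	return $line;
}

# Made-up example line with three cites in a row.
print merge_cites('as shown before~\cite{Smith2010} ~\cite{Lee2011, Lee2011a} ~\cite{Jones1999}.'), "\n";
```

It could be run over stripped_replaced_maintext.tex line by line, in the same way strip.pl walks through its input.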

dos2unix, LaTeX, natbib, Perl, preamble
