Monday, May 5, 2014

Automating table copying

In the last post, I discussed the miserable way I was copying over an entire database.  It sucks.  So I wrote a little python code to print out all the sql statements to sequentially paste into the psql shell.

 At first, I tried to automate the entire project from python, but I failed. I believe its because my PostgreSQL database is remote (and shared on my hosting) so only a true superuser can call the \COPY command from a remote script.  Basically, my point of failure hinged around the cursor class of psycopg, specifically the copy_expert command.  Now, that I am typing this, file size might have been an issue as well.  I'll come back to this at some point.... maybe.

I tried for about an hour or two, but then realized that my attempts and full automation were taking longer than me copying over the entire database with partial-automation.  These are tough decisions to make.  I decided to push forward with some text outputs that would allow me to run psql scripts and quickly achieve some table updates.

Anyway, here is the python code.

from sys import argv
import os

script, table_name = argv
csv_full_path = os.path.dirname(os.path.realpath(__file__)) + '\%s.csv' % (table_name)
sp = '''

###

'''

def printq(q, i):
 i+=1
 print 'QUERY (' + str(i) + '):\n'
 print q + '\n'
 return i
 
def main():
 n=0
 print sp
 
 #1) drop serialization of primary key
 query = "ALTER TABLE rebate_%s ALTER COLUMN id DROP DEFAULT;" % (table_name)
 n=printq(query, n)
 
 #2) copy in records
 query = "\COPY rebate_%s FROM %s WITH CSV HEADER;" % (table_name, csv_full_path)
 n=printq(query, n)
 
 #3) add contstaint back to database
 query = "DROP SEQUENCE rebate_%s_id_seq; CREATE SEQUENCE rebate_%s_id_seq MAXVALUE 2147483647; ALTER TABLE rebate_%s ALTER COLUMN id TYPE INTEGER; ALTER TABLE rebate_%s ALTER COLUMN id SET DEFAULT nextval('public.rebate_%s_id_seq'::text);" % (table_name, table_name, table_name, table_name, table_name)
 n=printq(query, n)
 
 print sp
 
if __name__ == "__main__":
 main()

As you can see, there are the three (3) main operations that were followed in the previous post:
1. De-serialization of the primary key
2. Copy in records
3. Add constraint back to database

Run this from the command line with your "table name" as the first argument
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> python C:\Python27\psql\query.py zipcodes

You should get the following commands back:

###

QUERY (1):
ALTER TABLE rebate_zipcodes ALTER COLUMN id DROP DEFAULT; 
QUERY (2):
\COPY rebate_zipcodes FROM C:\Python27\psql\zipcodes.csv WITH CSV HEADER; 
QUERY (3):
DROP SEQUENCE rebate_zipcodes_id_seq; CREATE SEQUENCE rebate_zipcodes_id_seq MAXVALUE 2147483647; ALTER TABLE rebate_zipcodes ALTER COLUMN id TYPE INTEGER; ALTER TABLE rebate_zipcodes ALTER COLUMN id SET DEFAULT nextval('public.rebate_zipcodes_id_seq'::text);


###
In the directory where this python script resides, be sure to save your csv files there as well with correctly corresponding table names.  If they aren't, just make some tweaks to the code and you will be all set.

Copy those output queries into your psql shell and you are on your way to semi-automating a miserable task!

No comments:

Post a Comment