Monday, April 28, 2014

Coping tables in and out using psql

I have some tables that exist in another database (which is also remote).   Instead of doing cross-database queries, which a lot of RDBMS do not allow with remote queries, let's just copy the tables out into a CSV and then copy them back in.

I know there are a million ways to do this, but I'm going to use psql because its pretty easy and fast.  If you have a better way, please feel free to clue me in.

First, boot up psql (I just type psql into windows run), then we log in using our credentials      

Server [localhost]: example.com
Database [postgres]: example_database_name
Port [5432]:
Username [postgres]: example_username
psql (9.2.3, server 8.4.20)
WARNING: psql version 9.2, server version 8.4.
         Some psql features might not work.
WARNING: Console code page (437) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

example_database_name=>          
Hopefully, you know where your destination is and what tables exist inside that database.  If you need a quick refresher (like I always do), type \d for quick listing of tables

example_database_name=> \d

In this example, I'm going to copy out the zipcodes table and move it to another database. I do this by running the \COPY command and plunking the output into a .csv file in destination of my choice.

example_database_name=> \COPY (SELECT * FROM zipcodes) TO 'C:\local\path\to\destination\zipcodes.csv' WITH CSV HEADER;
That took about ten seconds... not too bad for about 100,000 records.

Now lets exit the example database

example_database_name=> \q

OK--now let's connect to a different database using psql. Fire up psql again, with your other database's credentials (use the same methodology as the first code block above)

example_database_name2=> \COPY rebate_zipcodes FROM C:\Users\Dorothy\Desktop\zipcodes.csv WITH CSV HEADER;

Obviously, in both the \COPY statements you can omit the WITH CSV HEADER statement.  However, I like to have it so I know what the hell the table looks like in case I need to investigate the .csv, which I often do.

Great! That actually took about a minute. But regardless, this is not such a bad way to handle cross database table copying.

More info on copy and other psql commands:
http://www.postgresql.org/docs/9.2/static/sql-copy.html

Sunday, April 27, 2014

Connect to a database

If you are connecting to a remote database, create the database with your remote host first.  Go to the cpanel or whatever you have, and add a new PostgreSQL database (In this example, I will be using PostgreSQL).  Every hosting account is probably different, so I'm not going to bother with the specifics.
Added the database “rebates”.

I like to also use a database management tool locally. So, at this stage, I like to add my IP address so that I can connect locally to my remote database server. (This is fairly standard for authentication with most hosting sites, I believe). Generally, you will need a specific username, password, and database at this point for authentication purposes.

Added:  rebates with user admin from XX.XXX.XXX.0/24
Now, locally I open up my database management tool (I'm using SQL Manager Lite for PostgreSQL) and register my newly created database.




Then, simply connect to the newly registered database.  Success!

Great-- but that has nothing to do with our django app. Let's configure the database in the settings.py, which exists in the project directory file. Our project is called "rebates". Inside the "rebates" directory, there is a project specific directory (automatically created by django when you start the project),which also has the same name of "rebates".   Inside this directory, you will find your settings.py file. I realize this is a little confusing.

The tree looks like this, where:
rebates is the project and
rebate is the app
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> tree /f
Folder PATH listing for volume OS
Volume serial number is BA99-C83E
C:.
¦   manage.py
¦
+---rebate
¦       admin.py
¦       models.py
¦       tests.py
¦       views.py
¦       __init__.py
¦
+---rebates
        settings.py
        settings.pyc
        urls.py
        wsgi.py
        __init__.py
        __init__.pyc

I like to use notepad++ as my default editor.  I'm going to open my settings.py file with notepad++

(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> start notepad++ rebates\settings.py

Inside settings.py, there is a default area for DATABASE which looks like this.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'mydatabase',
    }
}
You can find out more about these configurations for each type of database inside the django documentation:
https://docs.djangoproject.com/en/1.6/ref/settings/#databases
Let's configure our database with the remote PostgreSQL database
DATABASES = {
    'default': {
        'ENGINE':'django.db.backends.postgresql_psycopg2',
        'NAME': 'jaytarle_rebates',
        'USER': 'jaytarle_admin',
        'PASSWORD': 'XXXXXXXXXXXX',
        'HOST': 'jaytarlecki.com',
        'PORT': '5432',
    }
}

That should do the trick for the database, but before we close the settings.py file let's add rebate to our list of INSTALLED_APPS
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rebate',
)

Now that we have linked up the rebate app to our django project, let's go into the rebate directory and launch the models.py file:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> start notepad++ rebate\models.py

Time to create our database model! So, I already have an idea of how I want to structure my first test table, which is called zipcodes.  As you may be able to guess, its a table of zipcodes and other associated info.  I already have the data in another database.  For the sake of my own sanity, I'm just going to copy the CREATE statement here, so I can translate it into the django database model language, as needed.

In PostgreSQL, our zipcodes table:
CREATE TABLE zipcodes (
  id SERIAL,
  zipcode VARCHAR(5) NOT NULL,
  state CHAR(2) NOT NULL,
  city VARCHAR(28) NOT NULL,
  type CHAR(1) NOT NULL,
  countyfips CHAR(5) NOT NULL,
  latitude DOUBLE PRECISION NOT NULL,
  longitude DOUBLE PRECISION NOT NULL,
  areacode CHAR(3) NOT NULL,
  financecode CHAR(6) NOT NULL,
  lastline CHAR(4),
  fac CHAR(1),
  msa CHAR(4),
  pmsa CHAR(4),
  modifieduser VARCHAR(128) NOT NULL,
  modifieddate TIMESTAMP WITHOUT TIME ZONE NOT NULL
) 
So, we can skip the id field, as django's backend will handle that for us.

from django.db import models

# Create your models here.
class Zipcodes(models.Model):
 zipcode = models.CharField(max_length=5)
 state = models.CharField(max_length=2)
 city = models.CharField(max_length=100)
 type = models.CharField(max_length=1)
 countyfips = models.CharField(max_length=5)
 latitude = models.FloatField()
 longitude = models.FloatField()
 areacode = models.CharField(max_length=3)
 financecode = models.CharField(max_length=6)
 lastline = models.CharField(max_length=4, null=True)
 fac = models.CharField(max_length=1, null=True)
 msa = models.CharField(max_length=4, null=True)
 pmsa = models.CharField(max_length=4, null=True)
 modifieduser = models.CharField(max_length=128, default='admin')
 modifieddate = models.DateField(auto_now=True)

 def __unicode__(self):
  return self.zipcode


Inside a virtual environment, you will need to get the windows binary that allows proper syncing with a postgre database (don't ask, just do it)
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> easy_install http://www.stickpeople.com/projects/python/win-psycopg/psycopg2-2.4.win32-pyx.x-pg9.0.3-release.exe

Now, create the database by syncing it in powershell
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> python manage.py syncdb

Create a superuser if prompted, and viola! You have just synced the database. Django will create a whole slew of tables which are very useful, but I won't explain in this post.

What we care about right now is the zipcodes table. Refresh your local rebates database inside your SQL Manager and lets see how we did.  Let's script out a CREATE TABLE statement and view how our model in Django created the table in PostgreSQL

CREATE TABLE public.rebate_zipcodes (
  id SERIAL,
  zipcode VARCHAR(5) NOT NULL,
  state VARCHAR(2) NOT NULL,
  city VARCHAR(100) NOT NULL,
  type VARCHAR(1) NOT NULL,
  countyfips VARCHAR(5) NOT NULL,
  latitude DOUBLE PRECISION NOT NULL,
  longitude DOUBLE PRECISION NOT NULL,
  areacode VARCHAR(3) NOT NULL,
  financecode VARCHAR(6) NOT NULL,
  lastline VARCHAR(4),
  fac VARCHAR(1),
  msa VARCHAR(4),
  pmsa VARCHAR(4),
  modifieduser VARCHAR(128) NOT NULL,
  modifieddate DATE NOT NULL,
  CONSTRAINT rebate_zipcodes_pkey PRIMARY KEY(id)
) 
WITH (oids = false);
Not too shabby... Looks just like the table we used as our template. Its time to start building out that model!

Setting up a django project

In powershell, navigate to directory where your apps reside
PS C:\Python27\home\jtarlecki\djcode> virtualenv .\django-jay

Then change directory and activate the virtualenv:
PS C:\Python27\home\jtarlecki\djcode> cd django-jay
PS C:\Python27\home\jtarlecki\djcode\django-jay> Scripts\activate.ps1

Afterwords, your cursor should look like this:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay>

Now, install an instance of django inside this virtual environment:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> pip install django

 After this downloads and unpacks, start your django project
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> django-admin.py startproject rebates

Then, navigate to newly created directory, and start your app within that project
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> cd rebates(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> python manage.py startapp rebate

Now you are ready to begin programming!

If at any point, you want to kill the virtual environment, do the following:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> deactivate

Now you have a new project and app started contained within its own virtual environment. Time to start coding!