## Wednesday, May 21, 2014

### Version control with git

I've used UberSVN and TortoiseSVN for version control before, but I figured this time I will join the rest of the world and start using git.

I've played around with git before, but never started using it for real.  Why? I don't know. I'm sort of an idiot that doesn't version control a lot of my software. I would use subversion for work stuff, but for my own personal projects, I just build the software outright.  Generally, I don't really take on giant projects to pursue in my free time.

Anyway, here are my notes for setting up git.

Unpack and then navigate over to your powershell window. It doesn't matter if you are inside your virutalenv.

Initialize git:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> git init
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> git add rebates
Commit the project with a message (-m):
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> git commit -m 'Initial commit of rebate_info api'
If you don't already have a git repository online, make one.  When you have one, create an online 'origin' for the local project.
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> git remote add origin https://github.com/jtarlecki/rebate_info.git
Check on the origin:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> git remote -v
origin  https://github.com/jtarlecki/rebate_info.git (fetch)
origin  https://github.com/jtarlecki/rebate_info.git (push)
Push the code to your git repository.
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> git push -u origin master
Magic.

## Wednesday, May 14, 2014

### Windows issues with using django inside a virtualenv

Following up on a post with starting from scratch with a Python install, I found that starting a django project (without any dependencies -- truly in a virtual environment) was a little more difficult than I expected.

Turns out that when I was another machine, I was using the global install of django rather than running the isolated one contained in my virtual environment.

When did I realize this? I tried to run the standard django-admin.py
(django-jay) PS C:\Python27\home\jtarlecki\djcode> django-admin.py startprjoect rebates
And I got this:

django-admin.py : The term 'django-admin.py' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ ~~~~~~~~~~~~~~~
+ CategoryInfo          : ObjectNotFound: (django-admin.py:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
What's the issue? Well, windows really doesn't know about the Python executable created in the Scripts folder when the virtual environment is created, so instead we have to tell it explicitly about the module ((django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> python -m django-admin startproject rebates) in which it should look to fire up python.  That and, we need to invoke the command with a leading python.  We also drop the .py from django-admin.py  Observe:

(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> python -m django-admin startproject rebates

### Setting up python

This is pretty trivial, but I just booted up a brand new computer and have been in the process of downloading things for hours.

2. From powershell, run this to make sure you can run python from anywhere:
(django-jay) PS C:\Python27\> [Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\Python27", "User") 3. Install PIP. Download get-pip.py, then run it from your powershell window: (django-jay) PS C:\Python27\> python get-pip.py 4. Set the PATH environment variable to include the C:\Python27\Scripts directory (django-jay) PS C:\Python27\> [Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\Python27\Scripts", "User")
5. Quit out of powershell, and reboot it.  Now run the following command:
PS C:\Python27> pip install virtualenv

That's it for the stuff I need! I'll update this as it evolves.

## Monday, May 5, 2014

### Webservices with TastyPie

We're going to turn our Django model into a web service using the python package "tastypie"

What is tastypie? Honestly, I don't really know. But, I do know how to make it return .json and xml API-like resources for a properly set up database schema!

Let's install tastypie.  From inside our django project directory use pip:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> pip install django-tastypie
Let's also pip some additional xml support.
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> pip install defusedxml "lxml>=3"

Now, let's create a new file inside our app (rebate) called api.py
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> start notepad++ rebate\api.py

Inside api.py, append the following code
from tastypie.resources import ModelResource
from tastypie.constants import ALL
# constant ALL sets the types of query types we can perform on our models.
from models import Zipcodes

class ZipcodesResource(ModelResource):
class Meta:
queryset = Zipcodes.objects.all() #this is everything in the zipcodes table
resource_name = 'zipcodes'
# basically, this give the url path
# localhost:8000/<project-level-url>/<app-level-url>/resource_name/
When we first ran django-admin.py startapp rebate , django created a collection of files inside our rebate app:
+---rebate
¦       models.py
¦       tests.py
¦       views.py
¦       __init__.py
So, we just added api.py, now we need to add urls.py to our app to help give this thing some specificity within the app. It doesn't really matter where we put it (we could put this at the project level) but for now, let's leave it here:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> start notepad++ rebate\urls.py
Inside this app-level urls.py file, append the following code:
from django.conf.urls import patterns, include, url
from rebate.api import ZipcodesResource

#now that ArticleResource is imported, create an instance of it
zipcode_resource = ZipcodesResource()

urlpatterns = patterns('',
url(r'^api/',include(zipcode_resource.urls)),
# these urls are automagically created by our ModelResource class from api.py
)
To explain some of the comments above: we're going to pull in the ZipcodesResource() class from api.py then create an instance of it at runtime.  Then using some tastypie wizardry, the ModelResource class imported in api.py automagically creates some urls for us to use. Le'ts promise ourselves to read more about it here later.

At this point we are officially daisy-chaining some urls. Make sure your entire django project has some clue about its enclosed apps. Fire up the urls.py from the project directory  (rebates):
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> start notepad++ rebates\urls.py
Inside rebates\urls.py, append the following url pattern:
url(r'', include('rebate.urls')),
Basically, this is just a pass-through that allows the /api/ url we defined at the app-level to be the top-level url at the project-level. Confused? Yes, it is confusing.  If you mess around with it, you will get it.

OK great! Now lets run the server
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> python manage.py runserver
And navigate our web browser to the following :

http://localhost:8000/<project-level-url>/<app-level-url>/resource_name/
or
http://localhost:8000/api/zipcodes/
as our <project-level-url> is simply a pass-through as described above.

Oops! Tastypie isn't the cleanest with urls.  You should see the following message prompting you to put a little additional info into your url
OK. Let's append that to the end of our url.

http://localhost:8000/api/zipcodes/?format=json
Awesome! We get a response, along with some metadata to help give context for what the response actually is.

 Its a big table, so tastypie by default gives responses in chunks of twenty (20) records.  Screen snippet above.
By default, tastypie also allows you to put a primary key number as the next-level-down url. Say we wanted to see the detail on the first object pictured above (id=45143), we could call it with the following url:
http://localhost:8000/api/zipcodes/45134/?format=json
Which give the following resultant:

 response
Some of you probably noticed the recouse_uri field annotated in the previous picture. All I did was copy that in to my browser. Notice that tastypie has a built in url pattern for primary keys of tables you hook into its built-in resources, which has the form:

http://localhost:8000/api/zipcodes/<pk>/?format=json
where <pk> is the primary key number from your table.
I'm still getting xml errors because of something missing in my installed lxml.  I tried re-installing from pip, but still get the error. For now, I'm only concerned with .json responses anyway, so I'm just going to skip for now.

If we want to add a filter to our urls (say, we want a response based on a specific query), we can add a filter to the api.py file.  Let's say we want to search for zipcode information by zipcode, since someone who is using this api probably has no idea what our primary keys are in our database.

Thus, let's put a zipcode filter in our api.py file.. append the following inside the meta class of the  ZipcodesResource class:

filtering = {
"zipcode": ('exact'),
}
There are many different filters you can add, but right now we will look at the exact match.

Access the following url

http://localhost:8000/api/zipcodes/?zipcode=19104&format=json

Voila! Now we have a list of zipcode info for 19104!  Read up on the tastypie docs for more info.

### Automating table copying

In the last post, I discussed the miserable way I was copying over an entire database.  It sucks.  So I wrote a little python code to print out all the sql statements to sequentially paste into the psql shell.

At first, I tried to automate the entire project from python, but I failed. I believe its because my PostgreSQL database is remote (and shared on my hosting) so only a true superuser can call the \COPY command from a remote script.  Basically, my point of failure hinged around the cursor class of psycopg, specifically the copy_expert command.  Now, that I am typing this, file size might have been an issue as well.  I'll come back to this at some point.... maybe.

I tried for about an hour or two, but then realized that my attempts and full automation were taking longer than me copying over the entire database with partial-automation.  These are tough decisions to make.  I decided to push forward with some text outputs that would allow me to run psql scripts and quickly achieve some table updates.

Anyway, here is the python code.

from sys import argv
import os

script, table_name = argv
csv_full_path = os.path.dirname(os.path.realpath(__file__)) + '\%s.csv' % (table_name)
sp = '''

###

'''

def printq(q, i):
i+=1
print 'QUERY (' + str(i) + '):\n'
print q + '\n'
return i

def main():
n=0
print sp

#1) drop serialization of primary key
query = "ALTER TABLE rebate_%s ALTER COLUMN id DROP DEFAULT;" % (table_name)
n=printq(query, n)

#2) copy in records
query = "\COPY rebate_%s FROM %s WITH CSV HEADER;" % (table_name, csv_full_path)
n=printq(query, n)

#3) add contstaint back to database
query = "DROP SEQUENCE rebate_%s_id_seq; CREATE SEQUENCE rebate_%s_id_seq MAXVALUE 2147483647; ALTER TABLE rebate_%s ALTER COLUMN id TYPE INTEGER; ALTER TABLE rebate_%s ALTER COLUMN id SET DEFAULT nextval('public.rebate_%s_id_seq'::text);" % (table_name, table_name, table_name, table_name, table_name)
n=printq(query, n)

print sp

if __name__ == "__main__":
main()

As you can see, there are the three (3) main operations that were followed in the previous post:
1. De-serialization of the primary key
2. Copy in records
3. Add constraint back to database

Run this from the command line with your "table name" as the first argument
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> python C:\Python27\psql\query.py zipcodes

You should get the following commands back:

###

QUERY (1):
ALTER TABLE rebate_zipcodes ALTER COLUMN id DROP DEFAULT;
QUERY (2):
\COPY rebate_zipcodes FROM C:\Python27\psql\zipcodes.csv WITH CSV HEADER;
QUERY (3):
DROP SEQUENCE rebate_zipcodes_id_seq; CREATE SEQUENCE rebate_zipcodes_id_seq MAXVALUE 2147483647; ALTER TABLE rebate_zipcodes ALTER COLUMN id TYPE INTEGER; ALTER TABLE rebate_zipcodes ALTER COLUMN id SET DEFAULT nextval('public.rebate_zipcodes_id_seq'::text);

###
In the directory where this python script resides, be sure to save your csv files there as well with correctly corresponding table names.  If they aren't, just make some tweaks to the code and you will be all set.

Copy those output queries into your psql shell and you are on your way to semi-automating a miserable task!

## Saturday, May 3, 2014

### Copying an entire database into a django-based project using psql

This title will be a bit misleading.  I know what you are thinking...

Great this guy has a way for me to define a django model then copy in an entire database--- constraints, defaults, foreign keys and all!

Well, no, I don't have a great way to do that.  Is it possible? Probably. But I don't know how to do it.

I have some tables that I want to preserve the relationship structure. So when I add tables in my django project (models.py) then sync up to the database, I need to to disable the automatically generated primary key field, so I can insert the records into the database via the \COPY command.  Finally, I need to reinstate the serialization of the id field as a constraint.

So, first, connect to the database using the method described in a previous post.

Then create the database model in models.py. In a complex data model situation, I like to bring the tables in one at a time. That means, creating a class in models.py, then syncing to the database one table (or collection of related tables) at a time.

Let's add the States class, which is simply a listing of the United States. These will be used in conjunction with the previously created Zipcodes class:

class States(models.Model):
abbr = models.CharField(max_length=2)
name = models.CharField(max_length=35)
modifieddate = models.DateField(auto_now=True)
In the powershell window, let's sync this up to the database.
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> python manage.py syncdb
Which will hopefully result in some confirmation text saying that the table has been created.

Creating tables ...
Creating table rebate_states
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
Now, connect to that database with psql and drop the constraint of the id field. Basically, the idea here is to change it from a SERIAL field to a simple INTEGER field.  The de-serialization drops the field's ability to be primary key in postgreSQL.

rebates=> ALTER TABLE rebate_states ALTER COLUMN id DROP DEFAULT;
Great-- with the default dropped, now let's copy in all the data.
jaytarle_rebates=> \COPY rebate_states FROM C:\Users\Dorothy\Desktop\states.csv
Now, we change the id field back into a true primary key by setting its default as nextval sequence.  First, let's look at the syntax nicely formatted.
DROP SEQUENCE rebate_states_id_seq;

CREATE SEQUENCE rebate_states_id_seq
MAXVALUE 2147483647;

ALTER TABLE rebate_states
ALTER COLUMN id TYPE INTEGER;

ALTER TABLE rebate_states
ALTER COLUMN id SET DEFAULT nextval('public.rebate_states_id_seq'::text);
Now, just squish all that into one line so you can execute from the psql shell:
jaytarle_rebates=> DROP SEQUENCE rebate_states_id_seq; CREATE SEQUENCE rebate_st
ates_id_seq MAXVALUE 2147483647; ALTER TABLE rebate_states ALTER COLUMN id TYPE
INTEGER; ALTER TABLE rebate_states ALTER COLUMN id SET DEFAULT nextval('public.r
ebate_states_id_seq'::text);
From which we get the following confirmation:
DROP SEQUENCE
CREATE SEQUENCE
ALTER TABLE
ALTER TABLE
And, voila... our records are in the database and re-synced up with our django project.

Did I copy in a whole database? Well, no. I need to do this process for every table. I know, it sucks.  Perhaps there is a better way...

## Monday, April 28, 2014

### Coping tables in and out using psql

I have some tables that exist in another database (which is also remote).   Instead of doing cross-database queries, which a lot of RDBMS do not allow with remote queries, let's just copy the tables out into a CSV and then copy them back in.

I know there are a million ways to do this, but I'm going to use psql because its pretty easy and fast.  If you have a better way, please feel free to clue me in.

First, boot up psql (I just type psql into windows run), then we log in using our credentials

Server [localhost]: example.com
Database [postgres]: example_database_name
Port [5432]:
psql (9.2.3, server 8.4.20)
WARNING: psql version 9.2, server version 8.4.
Some psql features might not work.
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

example_database_name=>
Hopefully, you know where your destination is and what tables exist inside that database.  If you need a quick refresher (like I always do), type \d for quick listing of tables

example_database_name=> \d

In this example, I'm going to copy out the zipcodes table and move it to another database. I do this by running the \COPY command and plunking the output into a .csv file in destination of my choice.

example_database_name=> \COPY (SELECT * FROM zipcodes) TO 'C:\local\path\to\destination\zipcodes.csv' WITH CSV HEADER;

Now lets exit the example database

example_database_name=> \q

OK--now let's connect to a different database using psql. Fire up psql again, with your other database's credentials (use the same methodology as the first code block above)

example_database_name2=> \COPY rebate_zipcodes FROM C:\Users\Dorothy\Desktop\zipcodes.csv WITH CSV HEADER;

Obviously, in both the \COPY statements you can omit the WITH CSV HEADER statement.  However, I like to have it so I know what the hell the table looks like in case I need to investigate the .csv, which I often do.

Great! That actually took about a minute. But regardless, this is not such a bad way to handle cross database table copying.

http://www.postgresql.org/docs/9.2/static/sql-copy.html

## Sunday, April 27, 2014

### Connect to a database

If you are connecting to a remote database, create the database with your remote host first.  Go to the cpanel or whatever you have, and add a new PostgreSQL database (In this example, I will be using PostgreSQL).  Every hosting account is probably different, so I'm not going to bother with the specifics.

I like to also use a database management tool locally. So, at this stage, I like to add my IP address so that I can connect locally to my remote database server. (This is fairly standard for authentication with most hosting sites, I believe). Generally, you will need a specific username, password, and database at this point for authentication purposes.

Now, locally I open up my database management tool (I'm using SQL Manager Lite for PostgreSQL) and register my newly created database.

Then, simply connect to the newly registered database.  Success!

Great-- but that has nothing to do with our django app. Let's configure the database in the settings.py, which exists in the project directory file. Our project is called "rebates". Inside the "rebates" directory, there is a project specific directory (automatically created by django when you start the project),which also has the same name of "rebates".   Inside this directory, you will find your settings.py file. I realize this is a little confusing.

The tree looks like this, where:
rebates is the project and
rebate is the app
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> tree /f
Folder PATH listing for volume OS
Volume serial number is BA99-C83E
C:.
¦   manage.py
¦
+---rebate
¦       models.py
¦       tests.py
¦       views.py
¦       __init__.py
¦
+---rebates
settings.py
settings.pyc
urls.py
wsgi.py
__init__.py
__init__.pyc

I like to use notepad++ as my default editor.  I'm going to open my settings.py file with notepad++

(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> start notepad++ rebates\settings.py

Inside settings.py, there is a default area for DATABASE which looks like this.
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': 'mydatabase',
}
}
You can find out more about these configurations for each type of database inside the django documentation:
https://docs.djangoproject.com/en/1.6/ref/settings/#databases
Let's configure our database with the remote PostgreSQL database
DATABASES = {
'default': {
'ENGINE':'django.db.backends.postgresql_psycopg2',
'NAME': 'jaytarle_rebates',
'HOST': 'jaytarlecki.com',
'PORT': '5432',
}
}

That should do the trick for the database, but before we close the settings.py file let's add rebate to our list of INSTALLED_APPS
INSTALLED_APPS = (
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'rebate',
)

Now that we have linked up the rebate app to our django project, let's go into the rebate directory and launch the models.py file:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> start notepad++ rebate\models.py

Time to create our database model! So, I already have an idea of how I want to structure my first test table, which is called zipcodes.  As you may be able to guess, its a table of zipcodes and other associated info.  I already have the data in another database.  For the sake of my own sanity, I'm just going to copy the CREATE statement here, so I can translate it into the django database model language, as needed.

In PostgreSQL, our zipcodes table:
CREATE TABLE zipcodes (
id SERIAL,
zipcode VARCHAR(5) NOT NULL,
state CHAR(2) NOT NULL,
city VARCHAR(28) NOT NULL,
type CHAR(1) NOT NULL,
countyfips CHAR(5) NOT NULL,
latitude DOUBLE PRECISION NOT NULL,
longitude DOUBLE PRECISION NOT NULL,
areacode CHAR(3) NOT NULL,
financecode CHAR(6) NOT NULL,
lastline CHAR(4),
fac CHAR(1),
msa CHAR(4),
pmsa CHAR(4),
modifieduser VARCHAR(128) NOT NULL,
modifieddate TIMESTAMP WITHOUT TIME ZONE NOT NULL
)
So, we can skip the id field, as django's backend will handle that for us.

from django.db import models

class Zipcodes(models.Model):
zipcode = models.CharField(max_length=5)
state = models.CharField(max_length=2)
city = models.CharField(max_length=100)
type = models.CharField(max_length=1)
countyfips = models.CharField(max_length=5)
latitude = models.FloatField()
longitude = models.FloatField()
areacode = models.CharField(max_length=3)
financecode = models.CharField(max_length=6)
lastline = models.CharField(max_length=4, null=True)
fac = models.CharField(max_length=1, null=True)
msa = models.CharField(max_length=4, null=True)
pmsa = models.CharField(max_length=4, null=True)
modifieddate = models.DateField(auto_now=True)

def __unicode__(self):
return self.zipcode

Inside a virtual environment, you will need to get the windows binary that allows proper syncing with a postgre database (don't ask, just do it)
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> easy_install http://www.stickpeople.com/projects/python/win-psycopg/psycopg2-2.4.win32-pyx.x-pg9.0.3-release.exe

Now, create the database by syncing it in powershell
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay\rebates> python manage.py syncdb

Create a superuser if prompted, and viola! You have just synced the database. Django will create a whole slew of tables which are very useful, but I won't explain in this post.

What we care about right now is the zipcodes table. Refresh your local rebates database inside your SQL Manager and lets see how we did.  Let's script out a CREATE TABLE statement and view how our model in Django created the table in PostgreSQL

CREATE TABLE public.rebate_zipcodes (
id SERIAL,
zipcode VARCHAR(5) NOT NULL,
state VARCHAR(2) NOT NULL,
city VARCHAR(100) NOT NULL,
type VARCHAR(1) NOT NULL,
countyfips VARCHAR(5) NOT NULL,
latitude DOUBLE PRECISION NOT NULL,
longitude DOUBLE PRECISION NOT NULL,
areacode VARCHAR(3) NOT NULL,
financecode VARCHAR(6) NOT NULL,
lastline VARCHAR(4),
fac VARCHAR(1),
msa VARCHAR(4),
pmsa VARCHAR(4),
modifieduser VARCHAR(128) NOT NULL,
modifieddate DATE NOT NULL,
CONSTRAINT rebate_zipcodes_pkey PRIMARY KEY(id)
)
WITH (oids = false);
Not too shabby... Looks just like the table we used as our template. Its time to start building out that model!

### Setting up a django project

In powershell, navigate to directory where your apps reside
PS C:\Python27\home\jtarlecki\djcode> virtualenv .\django-jay

Then change directory and activate the virtualenv:
PS C:\Python27\home\jtarlecki\djcode> cd django-jay
PS C:\Python27\home\jtarlecki\djcode\django-jay> Scripts\activate.ps1

Afterwords, your cursor should look like this:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay>

Now, install an instance of django inside this virtual environment:
(django-jay) PS C:\Python27\home\jtarlecki\djcode\django-jay> pip install django