Feed on
Posts
Comments

In this tutorial I’ll show you how to create a simple, yet powerful, tagging system using SQLAlchemy with TurboGears. As the concept of tags, and social tagging in particular, have become very popular, clients now demand “tagging-enabled” applications. So, here’s a simple way to get you started.

Our application will associate sites with tags (many to many relationship), like delicious does, but in a much simplified manner. For instance, delicious keeps tracks of which user gave which tag to which URL. We will only associate sites with tags. But it will be very easy to add this functionality later.

We’ll quickstart a new project (the -s argument tells tg-admin that the project will use SQLAlchemy and not SQLObject)

$ tg-admin quickstart -s tags
Enter package name [tags]:
Do you need Identity (usernames/passwords) in this project? [no] yes

Defining The Model

We are going to have a table for the sites, a table for the tags and a table that associates sites with tags (many-to-many). Here’s the code which defines the tables (which goes in model.py):

sites_table = Table(’sites’, metadata,
        Column(’site_id’, Integer, primary_key=True),
        Column(‘title’, Unicode(256)),
        Column(‘url’, Unicode(1024)),
        )

tags_table = Table(‘tags’, metadata,
        Column(‘tag_id’, Integer, primary_key=True),
        Column(‘name’, Unicode(32), index=‘tag_idx’))

sites_tags_table = Table(’sites_tags’, metadata,
        Column(’site_id’, Integer,
              ForeignKey(’sites.site_id’),
              primary_key=True),
        Column("tag_id", Integer,
              ForeignKey(‘tags.tag_id’),
              primary_key=True))

We will now create the Python classes that correspond to these tables:

class Tag(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def link(self):
        return "/tags/"+self.name

class Site(object):
    def __init__(self, url, title):
        self.url, self.title = url, title

Note the link() method in the Tag class. You might wonder what it does there. It just a little habbit that I wanted to share with you. I’ve found myself many times hard-coding URLs inside my templates. Then, if you want to make a tag linkable in many different places in your app, you have to hard-code the link every time. In this way, you can just pass the tag object to your template and do something like:

<a href="${tag.link()}">${tag.name}</a>

Ok, now we continue with mapping the classes to the tables:

mapper(Tag, tags_table)
mapper(Site, sites_table, properties = {
    ‘tags’: relation(Tag, secondary=sites_tags_table, lazy=False),
    })

Great. Now we can construct the database and start populating it:

$ tg-admin sql create
$ tg-admin shell

>>> g = Site(‘http://www.google.com’, ‘Search engine’)
>>> g.tags
[]
>>> g.tags = [Tag(’search’), Tag(‘google’)]
>>> session.save(g)
>>> session.flush()
(here SQLAlchemy echos the SQL statements it executes)

Handling tags

So we got the model right. The next step is to allow the users to provide tags for the site. The easiest way (for you and your users) is to ask them to enter the tags in a space-separated list. Suppose you are given this kind of space-seperated string of tags from a user, then you have to:

  • convert all tags to lower case, in order to avoid case senstivity issues
  • check if the string contains the same tag twice
  • find which tags are already in the database and which are new
  • recover from some nonsense that users might throw at you

and then get a list of Tag objects that you can assign to a site. So here’s a function that does just that:

def get_tag_list(tags):
    """Get a string of space sperated tag,
    and returns a list of tag objects"
""
    result = []
    tags = tags.replace(‘;’,‘ ‘).replace(‘,’,‘ ‘)

    tags = [tag.lower() for tag in tags.split()]
    tags = set(tags)        # no duplicates!
    if in tags:
        tags.remove()

    for tag in tags:
        tag = tag.lower()
        tagobj = session.query(Tag).selectfirst_by(name=tag)
        if tagobj is None:
            tagobj = Tag(name=tag)
        result.append(tagobj)
    return result

So you can now easily do something like:

>>> f = Site(‘http://www.flickr.com’, ‘Flickr!’)
>>> f.tags = get_tag_list(‘photo sharing photograpy’)
>>> f.tags
[photo, sharing, photograpy]
>>> f.tags[0].link()
‘/tags/photo’

Tag Search

It is straightforward to just list a site together with its tags:

<h3 class="site-title"><a href="${site.url}" target="_blank">${site.title}</a></h3>
        <p class="site-tags">Tags:
        <a py:for="tag in site.tags[:5]" href="${tag.link()}" class="tag">${tag.name}</a>
        </p>

Search is a bit more tricky. It took me few attempts until I got the search queries right. Here’s how to fetch all sites that are tagged by ‘google’:

    q = session.query(Site)
    sites = q.select((Tag.c.name==‘google’) &amp; q.join_to(‘tags’))

the magic is mostly inside the join_to method – it stands for the SQL statements that makes sure that the Tag clause is associated to the sites. Without it, the query runs over the entire cartesian product of Sites x Tags.

You can make the query simpler (for MySQL; not you), if you fetch the tag_id of ‘google’ first. Then, the query uses only 2 of the 3 tables:

    tagobj = session.query(Tag).get_by(name=‘google’)
    if not tagobj:
        raise cherrypy.InternalRedirect(‘/notfound’)
    sites = session.query(Site).select((sites_tags_table.c.tag_id == tagobj.tag_id) &amp;
                 (sites_tags_table.c.site_id == Site.c.site_id))

To search for google|photo:

q = session.query(Site)
sites = q.select(
    Tag.c.name.in_(‘google’, ‘photo’) &amp;
        q.join_to(‘tags’))

To search for sharing+photos:

q = session.query(Site)
sites = q.select(
    Tag.c.name.in_(’sharing’,‘photos’) &amp;
        q.join_to(‘tags’),
    group_by=[Site.c.site_id],
    having=(func.count(Site.c.site_id)==2))

The idea is that sites that are tagged both with ’sharing’ and ‘photos’ will appear twice in the select, then after grouping by site_id and getting all which appear twice, we get the desired result.

There are many more things that can be done from this point, like: associating with the tag-site relationship which user added the tag, rendering a tag cloud and so on. Feel free to leave comments!

RSS feed | Trackback URI

8 Comments »

Comment by Timur Izhbulatov
2007-01-16 09:04:56

Thanks! Very helpful.

 
Comment by damjan
2007-01-23 05:35:38

You say “To search for google+photo” but then you do a search for ’sharing’ and ‘photos’…

 
Comment by thesamet
2007-01-23 06:10:03

Thanks Damjan. Post updated.

 
Comment by damjan
2007-01-25 19:07:38

Also, you use `session.query(Site)’ often but also you use `q.’ which is assumed to be from the first `q = session.query(Site)’ … it’s a bit confusing. It’s possible to just use `q’ always right?

 
Comment by damjan
2007-01-26 17:19:00

Can’t the page.tags be a list of normal strings? Because currently it’s a list of Tag instances.

For ex. if I wish for ‘photos’ in page.tags to work I had to implement this method in the Tag class:

def __cmp__(self, other):
if isinstance(other, Tag):
return cmp(self.name, other.name)
elif isinstance(other, basestring):
return cmp(self.name, other)
else:
return cmp(self, other)

Comment by thesamet
2007-01-28 06:21:39

Damjan, you can add to the Site class a property that will give you the tags as a list of strings:

class Site(object):
    def __init__(self, url, title):
        self.url, self.title = url, title
    @property
    def tag_strings(self):
        return [tag.name for tag in self.tags]
 
 
Comment by Steve Thomas Subscribed to comments via email
2007-03-09 21:16:59

One comment and a question:
Comment: The quotes you use in the examples seem to be forward quote and backward quote, which when I paste into TextMate on my Macintosh give a syntax error and I need to replace the quotes.

Example:
I change: ‘google’
to: ‘google’

Question:
When I do the:
$ tg-admin sql create

I get an error:
File “build/bdist.macosx-10.3-fat/egg/sqlalchemy/schema.py”, line 149, in __call__
File “build/bdist.macosx-10.3-fat/egg/sqlalchemy/schema.py”, line 31, in _init_items
File “build/bdist.macosx-10.3-fat/egg/sqlalchemy/schema.py”, line 449, in _set_parent
sqlalchemy.exceptions.ArgumentError: The ‘index’ keyword argument on Column is boolean only. To create indexes with a specific name, append an explicit Index object to the Table’s list of elements.

it is caused by the 3rd line below:
tags_table = Table(“tags”, metadata,
Column(‘tag_id’, Integer, primary_key=True),
Column(‘name’, Unicode(32), index=’tag_idx’))

I am running with MySQL database.

I can use the tutorial without specifying the index or index name, but it would be useful.

Thanks.

 
Trackback by sites poker en ligne
2008-04-02 07:55:38

poker gratuites pas en ligne…

 
Name (required)
E-mail (required - never shown publicly)
URI
How much is two plus four? (required, to check if you are human)
Subscribe to comments via email
Your Comment (smaller size | larger size)
You may use <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> in your comment.