Friday, July 16, 2010

PostgreSQL: Performance Tuning

"Need is the mother of discovery" -Harpreet Singh
I wrote this line just few minutes before writing this blog, as my need of optimizing PostgreSQL's performance lead me to search/discover for some cool facts and features of postgres and tools related to it.

For any postgres user thinking about 100 or more concurrent users is like a nightmare. I will admit that some time back I was also a bit scared on thinking about 100 concurrent users with postgres, but with the end of my search I am happy that I found a usable way to achieve that.
"Knowledge increases by sharing"
So I thought I will pass it to everyone who is searching for it on the internet.
The need that triggered me to search for this was to recommend Hardware as well as Software configuration to support 100-200 concurrent users on Openbravo ERP and postgres/Oracle as the database.
For me as I am a postgres supporter I believed that postgres will be able to handle it. And yippee I was right.
Coming back to the main point:
Postgres doesn't support too many users (concurrent) by default, it comes with very solid configuration aimed at everyone's best guess as to how an "average" database on "average" hardware should be.
Postgres has some default configuration options to fine tune it, like:
- max_connections
- shared_buffers
- effective_cache_size
- etc etc.
But these are not enough for postgres to support 100+ (concurrent) users.
In a reply of my query to postgres performance mailing list, I came to know about connection pooling.
One and the only con that I saw in this is that it is external, I mean we have to configure an external tool to do connection pooling.
There are tools like pgpool to make the job easy for us (pgpool is a middleware that works between PostgreSQL servers and a PostgreSQL database client).
Connection pooling tools provide us features like:
- Connection Pooling: It reduces connection overhead, and improves system's overall throughput.
- Replication: Using the replication function enables creating a real-time backup on 2 or more physical disks.
- Load Balance: As the name suggests it distributes the queries on two or more replicated servers.
- Limiting Exceeding Connections: With the use of this extra connections are queued instead of returning an error immediately.
- Parallel Query: Using the parallel query function, data can be divided among the multiple (replicated) servers.
Configuring these properly can fine tune postgres' performance to handle 100-200 concurrent users.

Happy *postgresing*

To read more about performance tuning in postgreSQL read this.
For more on pgpool click here.

No comments:

Post a Comment