Tycoon Talk
Become a Big fish!
The number 1 forum for online business!
Post topics, ask questions, share your knowledge.
Tycoon Talk is part of Freelancer.com - find skilled workers online at a fraction of the cost.

The Database Forum


You are currently viewing our The Database Forum as a guest. Please register to participate.
Login



Reply
Massive amount of data
Old 11-29-2007, 03:42 AM Massive amount of data
Average Talker

Posts: 15
Name: Edwin Boiten
Trades: 0
I am playing with some ideas and wonder how you would do it.

I have about 2,500,000 products from various datafeeds. I want to include it into 1 site but am thinking about the database.
I haven't dealt with these amounts of data before and wonder if it's wise to put it all into 1 database having performance in mind.

I can split the data logicly based upon categories and build seperate databases. but that would mean if i do a search i would have to connect to mulipal databases.

What would you do and why.
edynas is offline
Reply With Quote
View Public Profile
 
 
Register now for full access!
Old 11-29-2007, 06:11 PM Re: Massive amount of data
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
Building a connection to a database is a very expensive operation. I think I'd use one large database, but partition the rows into multiple tables by category, or something else along those lines.

But I would also consider the amount of programming effort it will take to query across all these tables, and how that might hurt the amount of functionality you're able to provide.

I'd also use SQL Server or Oracle instead of MySQL for high data volumes.
__________________

Please login or register to view this content. Registration is FREE


Please login or register to view this content. Registration is FREE
Learning Newbie is offline
Reply With Quote
View Public Profile
 
Old 11-29-2007, 09:08 PM Re: Massive amount of data
chrishirst's Avatar
Missing! presumed drunk.

Posts: 41,517
Name: Chris Hirst
Location: Blackpool. UK
Trades: 0
only 2 1/2 million rows

not a big deal for a single database on any db server really.
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- I SEO the only industry where all the cowboys are Indians?
chrishirst is offline
Reply With Quote
View Public Profile Visit chrishirst's homepage!
 
Old 11-30-2007, 01:32 AM Re: Massive amount of data
joder's Avatar
Flipotron

Posts: 6,442
Name: James
Location: In the ocean.
Trades: 0
Any RDBMS can handle it. It's all according to how it is managed.

I would break it up into multiple tables. You will want to read up on Normalization to create the tables and subjects like Joins to get the information out.

If you go with MySQL, use the InnoDB storage engine. NOT MyISAM.
joder is offline
Reply With Quote
View Public Profile
 
Old 11-30-2007, 01:56 PM Re: Massive amount of data
Average Talker

Posts: 15
Name: Edwin Boiten
Trades: 0
Thanks all
Get the idea it's not such a big problem.
Will read into InnoDB storage engine and use one database but split the categories over the tables

Took a dedicated box with 2GB ram that i mainly will use for the site using the database. I'll translate the site in Euro languages so this time next year there will be 10 more sites having about the same amount of data.
Should i look into allocating memory to the database or should i use default settings.

Cheers,
Edwin
edynas is offline
Reply With Quote
View Public Profile
 
Old 11-30-2007, 03:24 PM Re: Massive amount of data
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
Database = mainly disc based. Any database management system will use RAM for caching and whatnot, but they have a design goal of being consistent. Like if there's an earthquake and the building looses power, everything in RAM is obviously lost. So a RDBMS uses RAM to speed things up but it doesn't rely on it.

You'd be much much much better off buying faster RAID 5 input output subsystems than RAM for database use. It's not that RAM won't help (of course it will, and so will more and faster CPUs), but the biggest bottle neck is disc based IO.
__________________

Please login or register to view this content. Registration is FREE


Please login or register to view this content. Registration is FREE
Learning Newbie is offline
Reply With Quote
View Public Profile
 
Old 11-30-2007, 04:06 PM Re: Massive amount of data
Ultra Talker

Posts: 310
Trades: 0
you don't need to split them across many table, just use proper indices and you'll be fine, 2.5 million rows are not much.
dman_2007 is offline
Reply With Quote
View Public Profile
 
Old 12-02-2007, 12:23 AM Re: Massive amount of data
ForrestCroce's Avatar
Half Man, Half Amazing

Posts: 3,023
Name: Forrest Croce
Location: Seattle, WA
Trades: 0
Like others have said, 2.5 m rows is nothing to worry about. I've worked with a quarter billion rows, broken vertically instead of horizontally ( ie so you pull them back together with a join instead of a union ) on a mid range Dell server.

The main question you need to find an answer to is how will the data be used? Divide it up in a way that will let you get at what you need, but only what you need.

Don't have too few indexes, but don't have too many, either. They can make it thousands of times faster to find data, and take a great deal of load off a server by letting it read in the specific records it needs, instead of doing a table scan, which will dramatically slow any other transactions that are running at the time. But nothing is free: indexes make it slower to add, edit, or delete data. When you change anything, the index needs to be kept up to date so it can be trusted to find the data.

A good rule of thumb is that any column that's used in where or join clauses ought to be indexed, unless it's used, for example as select * from table where column1 like '%something%'. They're best at exact match lookups, good at range lookups, and in many db systems they're good at where column1 like 'something%' ... but if you need to find records based on text in the middle of a column, you're forcing an index scan instead of an index seek ... in some situations the database will choose to ignore the index and go straight to the tables.

Also use compound indexes when they're appropriate. Older databases could only use one index per query; that's not the case any more, but some are better than others at using five indexes to speed up a particular query. If you're always running queries that find or join based on a group of columns, say doctype and docnumber, make one index with both columns. In that example you'd put them in the opposite order; you can exploit a compound index if you only want its first column. Finally, if all the columns you want are covered by a compound index, the database can run your query without ever going to the base table. But again there's no free lunch, and these indexes are less efficient in other ways, like disc space and maintenance time when you change the underlying data.
__________________

Please login or register to view this content. Registration is FREE
|
Please login or register to view this content. Registration is FREE
|
Please login or register to view this content. Registration is FREE
ForrestCroce is offline
Reply With Quote
View Public Profile Visit ForrestCroce's homepage!
 
Reply     « Reply to Massive amount of data
 

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off





   
RSS Feed  Feeds: RSS   JS   XML
RSS Feed  Feeds for this forum: RSS   JS   XML



Page generated in 0.27083 seconds with 12 queries