Tycoon Talk
PHP Forum


Export data from other website
Old 01-25-2010, 08:55 PM Export data from other website
Super Talker

Posts: 128
Name: Jose daSilva
Trades: 0
Hi,

I've just finished my website.

I have a client whose website holds a lot of products with descriptions, photos and other related information. He wants to export all of it automatically to my database so it can be shown on my website, without uploading the products one by one, which would take him months.

Is there a script I can add to my website that would let him upload the data automatically from his website to mine?

Thank you
Old 01-26-2010, 04:08 AM Re: Export data from other website
tripy
Do not try this at home!

Posts: 3,621
Name: Thierry
Location: I'm the uber Spaminator !
Trades: 0
If you can get your hands on a DB dump or an XML file, the parsing is easy.
Otherwise, no: you will need to write a program that parses every page of the target site to extract the information from the HTML.
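
To give an idea of the XML route, here is a minimal Python sketch. The file layout (a <products> root with one <product> element per item) and the field names are assumptions made up for the illustration; a real export will have its own structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format: one <product> element per item.
SAMPLE_EXPORT = """\
<products>
  <product id="1">
    <name>Red mug</name>
    <description>Ceramic, 30 cl</description>
    <photo>pics/red-mug.jpg</photo>
  </product>
  <product id="2">
    <name>Blue plate</name>
    <description>Stoneware, 25 cm</description>
    <photo>pics/blue-plate.jpg</photo>
  </product>
</products>
"""


def parse_products(xml_text):
    """Turn the XML export into a list of dicts, one per product."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("product"):
        rows.append({
            "id": int(node.get("id")),
            "name": node.findtext("name", default=""),
            "description": node.findtext("description", default=""),
            "photo": node.findtext("photo", default=""),
        })
    return rows


if __name__ == "__main__":
    for row in parse_products(SAMPLE_EXPORT):
        print(row)
```

Each dict can then be written to the target database with an ordinary insert.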

It is a slow, tedious and imperfect job.
Also, the TOS of the site you intend to take data from often has a clause denying automated access, i.e. crawlers fetching their data.
If they see one host fetching many pages in quick succession, you might get blocked from accessing the site.
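
If crawling really is the only option, spacing the requests out lowers the risk of being blocked. A minimal Python sketch, assuming a fixed pause between requests is acceptable; the function names and the 2-second default are illustrative choices, not taken from any particular library.

```python
import time
import urllib.request


def fetch_url(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_politely(urls, delay_seconds=2.0, fetch=fetch_url, sleep=time.sleep):
    """Fetch the URLs one at a time, pausing between requests.

    `fetch` and `sleep` are parameters so they can be swapped out, e.g. in tests.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages
```

Calling fetch_politely(list_of_urls) returns a dict mapping each URL to the bytes of its page.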
__________________
Only a biker knows why a dog sticks his head out the window.
Old 01-26-2010, 11:15 AM Re: Export data from other website
Super Talker

Posts: 128
Name: Jose daSilva
Trades: 0
Thanks for your reply.

The intention is to export, WITH PERMISSION, all the data and photos of a person who wants to move them to my website. Nothing illegal. Do I have to give him a script to do that, or must I have the script on my side in order to let him do so?
Do you know where I can look for such a script? Of course, I will have to make some alterations to it.
Old 01-26-2010, 02:50 PM Re: Export data from other website
tripy
Do not try this at home!

Posts: 3,621
Name: Thierry
Location: I'm the uber Spaminator !
Trades: 0
I know of nothing ready-made, no.
But I'll attach a Python script I use to replicate a gallery of pictures, and you will get the idea, I think.

If you can have an export, the simplest way is to ask for a database dump (or export, or backup, depending on the database's terminology) and import it into an instance running on your servers.

From there, the migration to your database will be much less difficult.

So, for example, as we are in the PHP forum, I'll assume your DB is MySQL.
To keep things simple, I'll assume their DB is MySQL too.

1) They create a dump of their database.
2) You get that dump, and restore it on your server
3) That's all !

If they use another database, then you will need to install an instance of that database.
Then you (re)create the tables in your DB and write a program that reads the data from the source system and writes the rows to your target system.

Something like this (pseudocode!):
PHP Code:
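// pseudocode: otherSystem_*() stands for the source database's own PHP API
$dbSrc=otherSystem_connect();
// target database (MySQL) through PDO; DSN and credentials are placeholders
$dbTrg=new PDO('mysql:host=localhost;dbname=we', 'user', 'password');

$qSrc="select field1, field2, field3 from table1";
$rSrc=otherSystem_query($qSrc);

// prepare the insert once; the ? placeholders let the driver quote the values
$stTrg=$dbTrg->prepare("insert into table1 (field1, field2, field3) values (?, ?, ?)");
while($oSrc=otherSystem_fetch_object($rSrc)){
  $stTrg->execute(array($oSrc->field1, $oSrc->field2, $oSrc->field3));
}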
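
The same read-then-insert loop can be tried out end to end in Python. This is only a sketch: it uses two in-memory SQLite databases as stand-ins for the source and target systems, and table1 and its fields are the placeholder names from the pseudocode, with column types that are assumed.

```python
import sqlite3


def copy_rows(src, trg):
    """Read every row of table1 from src and insert it into trg."""
    rows = src.execute("SELECT field1, field2, field3 FROM table1")
    # The ? placeholders let the driver quote the values safely.
    trg.executemany(
        "INSERT INTO table1 (field1, field2, field3) VALUES (?, ?, ?)", rows
    )
    trg.commit()


def make_db(rows=()):
    """Create an in-memory database holding table1 (demo helper)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE table1 (field1 INTEGER, field2 TEXT, field3 TEXT)")
    db.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    db.commit()
    return db


if __name__ == "__main__":
    source = make_db([(1, "O'Brien", "mug"), (2, "Ana", "plate")])
    target = make_db()
    copy_rows(source, target)
    print(target.execute("SELECT * FROM table1").fetchall())
```

Passing the values as parameters rather than pasting them into the SQL string is what keeps a value such as O'Brien from breaking the insert.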

Once you have the original data in your DB, converting it to your structure will still need a bit of work, but it will be easier.
For instance, imagine they keep everything about a user in one table, but you use two tables (one for the mandatory info and another for the optional info).
You would transfer it like this:
Code:
/*
source system database name: their
your database name: we
*/

/*
To migrate the users, we first insert what we need into the base table.
The we.userBase.id and we.userDetail.id fields are not auto_increment,
which allows us to keep the relation with the original data.
*/

insert into we.userBase (id, name, username, email)
select id, name, username, email
from their.users;

/*
Now we do the same for the other elements.
The id column must be in the list too, so both tables share the same key.
*/
insert into we.userDetail (id, birthDate, interest, dogsName)
select id, birthDate, interest, dogsName
from their.users;
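
To check that kind of split before running it on real data, it can be rehearsed in Python with SQLite. This is a sketch under assumptions: SQLite's "main" schema plays the role of "we", an attached in-memory database plays "their", and the column types and the sample row are invented for the demo.

```python
import sqlite3


def migrate_users(con):
    """Split their.users into main.userBase and main.userDetail, keeping ids."""
    con.execute(
        "INSERT INTO main.userBase (id, name, username, email) "
        "SELECT id, name, username, email FROM their.users"
    )
    con.execute(
        "INSERT INTO main.userDetail (id, birthDate, interest, dogsName) "
        "SELECT id, birthDate, interest, dogsName FROM their.users"
    )
    con.commit()


def demo_connection():
    """Build both schemas in memory; 'main' stands in for 'we'."""
    con = sqlite3.connect(":memory:")
    con.execute("ATTACH DATABASE ':memory:' AS their")
    con.execute(
        "CREATE TABLE their.users (id INTEGER PRIMARY KEY, name TEXT,"
        " username TEXT, email TEXT, birthDate TEXT, interest TEXT, dogsName TEXT)"
    )
    con.execute(
        "CREATE TABLE main.userBase (id INTEGER PRIMARY KEY, name TEXT,"
        " username TEXT, email TEXT)"
    )
    con.execute(
        "CREATE TABLE main.userDetail (id INTEGER PRIMARY KEY, birthDate TEXT,"
        " interest TEXT, dogsName TEXT)"
    )
    con.execute(
        "INSERT INTO their.users VALUES"
        " (7, 'Ana', 'ana', 'ana@example.com', '1980-02-01', 'bikes', 'Rex')"
    )
    return con


if __name__ == "__main__":
    con = demo_connection()
    migrate_users(con)
    print(con.execute("SELECT * FROM userBase").fetchall())
    print(con.execute("SELECT * FROM userDetail").fetchall())
```

Because the id is copied into both tables, a row in userDetail can always be joined back to its row in userBase.
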
And finally, this is a little script I used a long time ago to fetch a batch of pages from a gallery.
It's in Python, but the logic should be clear.
There is a "Parser" class that opens each page, one after the other, and looks for "<img>" tags in it.
For each tag it finds that contains "pics/" in the path, it hands that URL to a "Downloader" object that fetches the file and stores it locally.

Code:
from __future__ import division
import BeautifulSoup, os, sys, random
import threading, time, urllib2

class Downloader(threading.Thread):
  def __init__(self, parent, url, dest, origin):
    threading.Thread.__init__(self)
    self.url=url
    self.origin=origin
    self.dest=dest
    self.parent=parent
    if not os.path.exists(dest):
      os.mkdir(dest)
  
  def run(self):
    ret=False
    cpt=0
    self.parent.running+=1
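    file=os.path.basename(self.url)
    part=file[0].lower()
    # files are sorted into sub-folders named after their first letter
    partDir=os.path.join(self.dest, part)
    if not os.path.isdir(partDir):
      try:
        os.makedirs(partDir)
      except OSError:
        pass  # another thread created it in the meantime
    locFile=os.path.join(partDir, file)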
    worked=True
    if os.path.exists(locFile) and os.path.isfile(locFile):
      ret=True
      worked=False
      log('Downloader :: File %s exists in %s'%(file, part))
    while ret==False and cpt<10:
      try:
        self.parent.lock.acquire()
        log('Downloader :: start fetch (%d) >%s<'%(cpt,self.url))
        self.parent.lock.release()
        con=urllib2.Request(self.url, headers={'User-Agent':'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'
                                                 ,'Accept-Charset':'utf-8;q=0.7,*'},origin_req_host=self.origin
                            )
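        hRemote=urllib2.urlopen(con)
        data=hRemote.read()
        hRemote.close()
        # write the file only once the download succeeded, in binary mode,
        # so a failed fetch does not leave an empty file behind
        hLocal=open(locFile,'wb')
        hLocal.write(data)
        hLocal.close()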
        ret=True
      except urllib2.URLError, msg:
        log('Downloader :: ERROR::%s'%msg)
        ret=False
      except Exception, msg:
        log('Downloader :: UNKNOWN ERROR::%s'%msg)
      finally:
        cpt+=1
    self.parent.lock.acquire()
    if worked==True:
      log('Downloader :: Finished url %s'%self.url)
    self.parent.lock.release()
    self.parent.running-=1
    return ret
  
class Parser():
  def __init__(self, url):
    self.url=str(url)
    self.base='http://someplace.com/'
    self.dest=os.path.abspath(os.path.join(os.path.dirname(__file__),'someplace_img'))
    self.page=None
    self.max=6581    #Total nbr of pages to check for new files
    self.pattern=str('@@')
    self.parsed=[]
    self.running=0
    self.maxThreads=10
    self.lock=threading.Lock()
    self.Parse()
    
  def Parse(self):
    while len(self.parsed)<self.max:
      while self.page==None or self.page in self.parsed:
        self.page=random.randint(0,self.max)
      log('randomly chosen page %d'%(int(self.page)))
      url=self.url
      url=url.replace(self.pattern,str(self.page))
      con=urllib2.Request(url, headers={'User-Agent':'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'
                                             ,'Accept-Charset':'utf-8;q=0.7,*'}
                        )
      perc=float(1*(len(self.parsed)/self.max))*100
      log('completed %f%% (%d/%d)'%(perc,len(self.parsed),self.max))
      log('Parsing page at %s'%(url))
      gotHandle=False
      while gotHandle==False:
        try:
          handle=urllib2.urlopen(con)
          gotHandle=True
        except urllib2.URLError, msg:
          time.sleep(1)
          gotHandle=False
      source=''
      for lines in handle:
        source+=lines
      try:
        soup = BeautifulSoup.BeautifulSoup(source,fromEncoding="ascii")
        for img in soup.findAll('img'):
          src=img.get('src','')  # some <img> tags have no src attribute
          if src.find('pics/')>-1:
            imgUrl='%s%s'%(self.base,src)
            dwn=Downloader(self, imgUrl,self.dest, url)
            while self.running>=self.maxThreads:
              #log('Too many threads: %d. Sleeping'%self.running)
              time.sleep(1)
            dwn.start()
      except UnicodeDecodeError:
        log('Beautifulsoup could not  parse %s because of invalid utf-8 chars in the source'%(url))
      while self.running>0:
        time.sleep(3)
      self.parsed.append(self.page)

def log(string):
  print string
  
if __name__=='__main__':
  url='http://someplace.com/index.php?pageno=@@&sort=ever'
  parser=Parser(url)
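
The script above targets Python 2 (urllib2 and the old BeautifulSoup module). Its core step, picking out the <img> tags whose path contains "pics/", can be sketched in Python 3 with only the standard library. The class and function names below are made up for the illustration, and the downloading part is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect the src of every <img> tag whose path contains a marker."""

    def __init__(self, base_url, marker="pics/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""  # tolerate <img> without src
        if self.marker in src:
            # urljoin resolves relative paths against the page address
            self.urls.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Return the absolute URLs of the matching images found in html."""
    collector = ImgCollector(base_url)
    collector.feed(html)
    return collector.urls


if __name__ == "__main__":
    page = '<p><img src="pics/a.jpg"><img src="logo.png"><img alt="x"></p>'
    print(find_images(page, "http://someplace.com/index.php?pageno=1"))
```

Each URL returned by find_images can then be handed to whatever does the download.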
Old 01-28-2010, 03:33 AM Re: Export data from other website
Super Talker

Posts: 128
Name: Jose daSilva
Trades: 0
Thanks a lot.

I will try doing something like what you describe.

Last edited by chrishirst; 01-28-2010 at 06:30 AM..