Tycoon Talk
PHP Forum


cURL - pause transfer
Old 11-17-2009, 08:20 AM cURL - pause transfer
Junior Talker

Posts: 4
Trades: 0
Hi,

I have a question about downloading a web page using cURL.
I am trying to parse a web page that contains this JavaScript:

<script language=javascript>
function updateprogress(nPercent){document.getElementById('imgprogress').width=nPercent*3;document.getElementById('txtprogress').innerHTML=nPercent+" %";if( nPercent < 100 )setTimeout( 'updateprogress('+(nPercent+1)+')', 75 );}</script>

<script language=javascript>updateprogress(1);</script>

When a browser reads the page, it runs the JavaScript function updateprogress.
This seems to pause the transfer for some time.
After that script there is a table containing the data that I want to get.

When the page is displayed in a web browser, everything is fine.
But when I read it using cURL (no delay in the transfer), the table is empty.

It looks as if the server is still preparing the data and it is not ready yet.

My question: is there a way to suspend a cURL transfer?
I have already tried write callbacks, but they only pause the PHP script, not the transfer. It looks as if cURL keeps fetching the data in the background, and without a wait in the middle the data are not ready yet.

I would appreciate any suggestions.

Piotr
Old 11-17-2009, 03:56 PM Re: cURL - pause transfer
tripy's Avatar
Do not try this at home!

Posts: 3,621
Name: Thierry
Location: I'm the uber Spaminator !
Trades: 0
Quote:
When a browser reads the page, it runs the JavaScript function updateprogress.
This seems to pause the transfer for some time.
Yep, but cURL won't do that...
cURL just sends requests and reads the responses. It does not include a JavaScript engine, so JavaScript (which is a client-side language, as opposed to server-side languages) will not be interpreted.

Your problem is probably that the table is fetched or populated via JavaScript.
I have only one hint for you: use Firefox, install Firebug, activate the network tab, and look at the HTTP requests that are made.
If an Ajax call is made to fetch the data, you will see it.

Otherwise, you will have to find your way through the JavaScript to work out how this code alters the table.

But seeing that there is an element named "txtprogress", and that the JavaScript waits for it to reach 100%, there is probably an Ajax call being made in the background...
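
If Firebug is not at hand, the scripts can also be pulled out of the string cURL returns and read one by one. A minimal sketch, assuming the DOM extension is available; extractScripts() and looksLikeAjax() are made-up names for illustration, and the regular expression is only a rough hint that will miss anything loaded from an external file:

```php
<?php
// Hypothetical helper: returns the src attribute and inline source of every <script> element.
function extractScripts(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $node) {
        $scripts[] = [
            'src'  => $node->getAttribute('src'),   // empty string for inline scripts
            'code' => trim($node->textContent),     // raw, unexecuted JavaScript
        ];
    }
    return $scripts;
}

// Hypothetical heuristic: does this piece of JavaScript appear to issue an Ajax request?
function looksLikeAjax(string $code): bool
{
    return (bool) preg_match('/XMLHttpRequest|ActiveXObject|\.ajax\s*\(/i', $code);
}
```

Running each 'code' entry through looksLikeAjax() gives a quick first answer. Scripts that only have a 'src' would need to be downloaded and checked the same way.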
__________________
Only a biker knows why a dog sticks his head out the window.
Old 11-18-2009, 03:14 AM Re: cURL - pause transfer
Thanks.

I will try that. So far I have not found any JavaScript on the page that loads anything.

This is the page that I want to parse:
http://www.gettextbooks.com/search/?isbn=8374954515

And this is my PHP script, which just reads the page and prints it:
http://jbl.gofreeserve.com/justcopy.php?ISBN=8374954515

PHP code
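<?php
// Build the search URL from the ISBN passed in the query string.
$address = "http://www.gettextbooks.com/search/?isbn=";
$isbn = isset($_GET['ISBN']) ? $_GET['ISBN'] : '';
$url = $address . urlencode($isbn);
echo htmlspecialchars($url);
echo "starting<br>";

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);           // leave the response headers out of the result
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the body as a string instead of printing it
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120); // seconds allowed for connecting
curl_setopt($ch, CURLOPT_TIMEOUT, 120);        // seconds allowed for the whole transfer

// Fetch the page and print it unchanged.
$data = curl_exec($ch);
echo $data;
curl_close($ch);
?>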
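
The script above prints whatever curl_exec() returns, so a failed or partial request looks the same as an empty table. A minimal sketch of the same request with basic checks, assuming PHP 7 or later; buildSearchUrl() and fetchPage() are made-up names, and the ISBN pattern is only a loose sanity check, not a checksum validation:

```php
<?php
// Hypothetical helper: builds the search URL after a loose ISBN format check.
function buildSearchUrl(string $isbn): string
{
    // Digits and hyphens, ending in a digit or an X check digit.
    if (!preg_match('/^[0-9-]{9,16}[0-9Xx]$/', $isbn)) {
        throw new InvalidArgumentException('Unexpected ISBN format');
    }
    return 'http://www.gettextbooks.com/search/?isbn=' . urlencode($isbn);
}

// Hypothetical helper: fetches a URL and fails loudly instead of returning false.
function fetchPage(string $url, int $timeout = 120): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException("cURL error: $error");
    }
    if ($status !== 200) {
        throw new RuntimeException("Unexpected HTTP status: $status");
    }
    return $body;
}
```

Usage would be along the lines of `$data = fetchPage(buildSearchUrl($_GET['ISBN']));`, wrapped in a try/catch that prints the exception message.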

I will try to look at all the requests sent to the server, as you suggested.
Thanks
Piotr is offline
Old 11-18-2009, 03:53 AM Re: cURL - pause transfer
tripy's Avatar
I copy/pasted your code into a PHP file and ran it from the command line, and I can see the table correctly populated in the returned string:
HTML Code:
....
<table border=0 width=100%><tr valign=top><td width=100% class=regularText><font size=-1><br/><i></i></font><font size="-2"><br/>ISBN-10: 83-7495-451-5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color=#666666>(8374954515)</font></font><font size="-2"><br/>ISBN-13: 978-83-7495-451-8 <font color=#666666>(9788374954518)</font></font></td><td align=right><img src="http://images.amazon.com/images/P/.01.MZZZZZZZ.jpg"/><font size=-2><br/><a href="/bookbag/add/9788374954518"><font color=#7777CC>Add&nbsp;to&nbsp;Wish&nbsp;List</font></a>&nbsp;<br/><a href="/pricealert/9788374954518"><font color=#7777CC>Set&nbsp;Price&nbsp;Alert</font></a>&nbsp;</font></td></tr></table>
....
After that, you'll have to parse the result string to extract what you want, but it's there.
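
For the parsing step, one option is DOMDocument with an XPath query rather than string functions. A minimal sketch, assuming the DOM extension is available; findTablesByClass() is a made-up name, and the class name passed in is assumed to contain no quote characters:

```php
<?php
// Hypothetical helper: returns the HTML of every <table> that carries the given class.
function findTablesByClass(string $html, string $class): array
{
    $previous = libxml_use_internal_errors(true); // the page is not valid HTML, so silence warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, even if the attribute lists several classes.
    $query = sprintf(
        "//table[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $tables = [];
    foreach ($xpath->query($query) as $table) {
        $tables[] = $doc->saveHTML($table); // serialize just this table
    }
    return $tables;
}
```

Calling `findTablesByClass($data, 'StoreTable')` on the string returned by curl_exec() should give the table in question as the only element, ready for further queries on its rows.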

I ran PHP from the console to avoid what you may have run into if you output the string in your browser.
As the browser reads the HTML, it interprets the <script> tag in the returned string, and that results in the delay.
But that is just a side effect of echoing the string in your browser; it is not the transfer that stalls.
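
If the script has to run through a web server rather than the console, the same effect can be had by escaping the markup before echoing it, so the browser shows it as text and never runs the embedded scripts. A minimal sketch; renderAsSource() and printFetched() are made-up names, and the UTF-8 charset is an assumption about the page:

```php
<?php
// Hypothetical helper: escapes fetched HTML so a browser displays it instead of interpreting it.
function renderAsSource(string $html): string
{
    // ENT_SUBSTITUTE keeps the output non-empty even if the page is not valid UTF-8.
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

// Hypothetical helper: prints raw HTML on the console, escaped HTML in a browser.
function printFetched(string $html): void
{
    echo PHP_SAPI === 'cli' ? $html : renderAsSource($html);
}
```

With `printFetched($data);` in place of `echo $data;`, the progress bar script is shown as source and no longer animates, which makes it easier to see what cURL really received.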
Last edited by tripy; 11-18-2009 at 03:56 AM..
Old 11-18-2009, 07:24 AM Re: cURL - pause transfer
Not everything. I need to get <table border=0 cellpadding=0 cellspacing=0 width=500 class=StoreTable>.

You will see the difference if you also open the original page in a browser.
StoreTable is empty when I read it using cURL, and it is filled with data when the page is opened in a browser.

It is strange, but for some search pages it works, e.g.:
http://jbl.gofreeserve.com/justcopy....=2-246-65641-9
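
Since the table is filled for some searches and not for others, one thing that could be tried is to request the page again after a short wait and stop once the expected content shows up. This is only a sketch under the unconfirmed assumption that the server needs time to gather the data and serves a complete page on a later request; fetchUntilComplete() is a made-up name, and both callbacks are supplied by the caller:

```php
<?php
// Hypothetical helper: calls $fetch repeatedly until $isComplete accepts the result.
// Returns the accepted HTML, or null if every attempt came back incomplete.
function fetchUntilComplete(
    callable $fetch,
    callable $isComplete,
    int $maxAttempts = 5,
    int $delaySeconds = 3
): ?string {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $html = $fetch();
        if (is_string($html) && $isComplete($html)) {
            return $html;
        }
        // Wait before the next request, but not after the last one.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return null;
}
```

Here $fetch would wrap the curl_exec() call from the script, and $isComplete could be as simple as checking with strpos() for a piece of text that only appears in the filled table.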
Piotr is offline
Old 11-18-2009, 07:40 AM Re: cURL - pause transfer
tripy's Avatar
Not for me...
The HTML that cURL returns to me is this:
HTML Code:
<html>
<head>
<title>Era zawirowan by Alan Greenspan - 9788374954518 - Compare Prices on New & Used Textbooks, Cheap Textbooks
- GetTextbooks.com</title>
<meta name="keywords" content="8374954515, 83-7495-451-5, 9788374954518, 978-83-7495-451-8" />
<LINK REL=STYLESHEET HREF="/styles/default.css" TYPE="text/css">
</head>
<body bgcolor=#FFFFFF topmargin=0 leftmargin=0 marginwidth=0 marginheight=0>
<table width="100%" style="color: black; background-color: #000000;" ID="Table1">
    <tr>
        <td align="center">
        <table width="100%" border="0">
            <tr>
                <td><img src="/images/space.gif" width="5" height="1" /></td>
                <td valign="top" width="100%"><a href="/" target="_top" style="text-decoration: none;"><font
                    size="+2" color="green"
                ><b>GetTextbooks.com</b></font></a><br />
                <b style="font-size: 12px;">&nbsp;<font color="#FFFFFF">&nbsp;Compare Prices & Save up to 90%</font></b></td>
                <td><img src="/images/space.gif" width="30" height="1" /></td>
                <form name=frm action="/search/" target="_parent" ID="Form1">
                <td align=right width=500>
                <table border=0 cellpadding=0 cellspacing=0>
                    <tr>
                        <td valign=bottom><font color="#FFFFFF">Search by multiple ISBN, single ISBN, title,
                        author, etc ...</font><br />
                        <img src="/images/space.gif" width="1" height="3" /><br />
                        <input type="text" size="60" name="isbn" value="8374954515" ID="Text1" /><script
                            language="javascript"
                        >document.frm.isbn.focus()</script></td>
                        <td>&nbsp;</td>
                        <td valign=bottom><input type="submit" value="Go" /></td>
                    </tr>
                </table>
                </td>
                </form>
                <td><img src="/images/space.gif" width="5" height="1" /></td>
            </tr>
        </table>
        </td>
    </tr>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
    <tr>
        <td align="right" class=""><a href="/user/login/">Login</a> | <a href="/user/signup/">Sign up</a> | <a
            href="/settings/"
        >Settings</a> | <a href="/bookbag">My Wish List</a> | <a href="/ibundle/">My iBundle</a>&nbsp;</td>
    </tr>
</table>
<center>
<div id=progress name=progress class=regularText>
<center><br />
<table border=0>
    <tr>
        <td id=txtmsgprogress align=center class=regularText><b>Searching the web for the best textbook prices</b><br />
        just be a few seconds ...</td>
    </tr>
</table>
<table border=0 width=420>
    <tr>
        <td width=60></td>
        <td>
        <table cellpadding=0 cellspacing=0 width=300 class=progressBar>
            <tr>
                <td width=300><img id=imgprogress src="/images/line.gif" width=0 height=5 /></td>
            </tr>
        </table>
        </td>
        <td id=txtprogress width=60 align=right class=regularText>0 %</td>
    </tr>
</table>
</center>
</div>
<script language=javascript>function updateprogress(nPercent){document.getElementById('imgprogress').width=nPercent*3;document.getElementById('txtprogress').innerHTML=nPercent+" %";if( nPercent < 100 )setTimeout( 'updateprogress('+(nPercent+1)+')', 75 );}</script><script
    language=javascript
>updateprogress(1);</script><script language=javascript>document.getElementById('progress').style.visibility="hidden";document.getElementById('progress').style.height="0";document.getElementById('progress').style.position="absolute";</script><br />
<a name="9788374954518" />
<style>
A.plainBlk {
  color: #000000;
  text-decoration: none;
}
</style>
<script language="javascript" type="text/javascript">function ov(r,c){r.style.background=c;r.style.cursor='pointer';}function ot(r,c){r.style.background=c;r.style.cursor='';window.status='';}function bot(){window.status='';}function u(mouseEvent,sUrl){var elementID;if(!mouseEvent)mouseEvent = window.event;if(mouseEvent.target)elementID = mouseEvent.target.id;else if(mouseEvent.srcElement)elementID = mouseEvent.srcElement.id;var noActionIDs = new Array();noActionIDs[0] = "cl";noActionIDs[1] = "hl";for(var i = 0; i < noActionIDs.length; i++) {if(elementID == noActionIDs[i])return;}performAction(sUrl);}function performAction(sUrl) {window.open(sUrl,'_blank');}</script>
<table border=0 cellpadding=0 cellspacing=0 width=500 class=StoreTable>
    <tr>
        <td colspan="10" align=right>
        <table border=0 width=100%>
            <tr valign=top>
                <td width=100% class=regularText><font size=-1><br />
                <i></i></font><font size="-2"><br />
                ISBN-10: 83-7495-451-5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color=#666666>(8374954515)</font></font><font
                    size="-2"
                ><br />
                ISBN-13: 978-83-7495-451-8 <font color=#666666>(9788374954518)</font></font></td>
                <td align=right><img src="http://images.amazon.com/images/P/.01.MZZZZZZZ.jpg" /><font size=-2><br />
                <a href="/bookbag/add/9788374954518"><font color=#7777CC>Add&nbsp;to&nbsp;Wish&nbsp;List</font></a>&nbsp;<br />
                <a href="/pricealert/9788374954518"><font color=#7777CC>Set&nbsp;Price&nbsp;Alert</font></a>&nbsp;</font></td>
            </tr>
        </table>
        </td>
    </tr>
    <tr>
        <td colspan="10"><br />
        <font color=#666666><i>Shipping to Switzerland <a href="/preferences/" target=_top><font
            color=#7777CC
        ><img src="/images/flags/CH.png" width=18 height=12 border=0 /></font></a>, Total Price in Swiss Francs (<a
            href="/preferences/" target=_top
        ><font color=#7777CC>change</font></a>)<br />
        Coupons included, Displaying Stores in Any Language (<a href="/preferences/" target=_top><font color=#7777CC>change</font></a>)</i></font><br />
        <br />
        <blockquote><br />
        <b>No copies of this book were found in stock from<br />
        over 100 online book stores and marketplaces.</b><br />
        <br />
        <ul>
            <li><a href="/pricealert/9788374954518/"><b>Alert me when this book becomes available.</b></a><br />
            <br />
            </li>
        </ul>
        </blockquote>
    </tr>
    </td>
    <tr>
        <td></td>
        <td colspan=10 class=regularText></td>
        <td></td>
    </tr>
</table></center>
<br />
<br />
<center><span class=logoLink><font color=#aaaaaa><a href="/" target=_top>Home</a> | <a
    href="/browse/" target=_top
>Browse</a> | <a href="/professors/" target=_top>Professors</a> | <a href="/webmasters/" target=_top>Webmasters</a> | <a
    href="/contact/" target=_top
>Contact Us</a><br />
<img src="/images/space.gif" width=1 height=6 /><br />
[ <a href="http://www.gettextbooks.ca">Canada</a> | <a href="http://www.gettextbooks.co.uk">United Kingdom</a> ]<br />
<img src="/images/space.gif" width=1 height=6 /><br />
[ <a href="http://www.getcdprices.com/?src=733">CDs</a> | <a href="http://www.getdvdprices.com/?src=733">DVDs</a> ]<br />
<img src="/images/space.gif" width=1 height=6 /><br />
Copyright &copy; 2003-2009 <a href="/" target=_top>GetTextbooks.com</a></font><br />
<br />
</span></center>
</body>
</html>
If you look closely, you will find the table on line 84, and it is a one-column table that holds what is displayed in the browser.

I based my tests on the url:
http://www.gettextbooks.com/search/?isbn=8374954515
Old 11-18-2009, 08:23 AM Re: cURL - pause transfer
That is strange. I do not get the same results.
I need to investigate it more.

Thanks for the help.

Actually, here is where the problem is.

What you got is:
HTML Code:
<table border=0 cellpadding=0 cellspacing=0 width=500 class=StoreTable>
    <tr>
        <td colspan="10" align=right>
        <table border=0 width=100%>
            <tr valign=top>
                <td width=100% class=regularText><font size=-1><br />
                <i></i></font><font size="-2"><br />
                ISBN-10: 83-7495-451-5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color=#666666>(8374954515)</font></font><font
                    size="-2"
                ><br />
                ISBN-13: 978-83-7495-451-8 <font color=#666666>(9788374954518)</font></font></td>
                <td align=right><img src="http://images.amazon.com/images/P/.01.MZZZZZZZ.jpg" /><font size=-2><br />
                <a href="/bookbag/add/9788374954518"><font color=#7777CC>Add&nbsp;to&nbsp;Wish&nbsp;List</font></a>&nbsp;<br />
                <a href="/pricealert/9788374954518"><font color=#7777CC>Set&nbsp;Price&nbsp;Alert</font></a>&nbsp;</font></td>
            </tr>
        </table>
        </td>
    </tr>
    <tr>
 
The actual page downloaded by the web browser is:
HTML Code:
<table border=0 cellpadding=0 cellspacing=0 width=500 class=StoreTable>
<tr>
<td colspan="10" align=right>
<table border=0 width=100%>
<tr valign=top>
<td width=100% class=regularText>
<font size=+1>
<b>Era zawirowan</b>
</font><br/>
<b>by <font class="nav_tab">
<a href="/author/Alan_Greenspan">Alan Greenspan</a>
</font>
</b><font size=-1><br/><i>Published 2008</i></font><font size="-2"><br/>
ISBN-10: 83-7495-451-5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <font color=#666666>(8374954515)</font>
</font><font size="-2"><br/>ISBN-13: 978-83-7495-451-8 

So part of the HTML is missing when the page is downloaded using cURL.
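
One way to narrow this down is to make the cURL request look more like the browser's, in case the server answers differently depending on the client. Nothing in this thread confirms that it does, so treat the following as a sketch for ruling that out; browserLikeOptions() and fetchLikeBrowser() are made-up names, and the User-Agent string, the Accept-Language value and the cookie file path are placeholders:

```php
<?php
// Hypothetical helper: cURL options that mimic a browser more closely than the defaults.
function browserLikeOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,                           // follow redirects like a browser
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-fetcher)', // placeholder value
        CURLOPT_HTTPHEADER     => ['Accept-Language: en-US,en;q=0.8'],
        CURLOPT_COOKIEJAR      => $cookieFile,                    // save cookies the server sets
        CURLOPT_COOKIEFILE     => $cookieFile,                    // send them back on later requests
        CURLOPT_TIMEOUT        => 120,
    ];
}

// Hypothetical helper: performs the request with the options above.
function fetchLikeBrowser(string $url, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, browserLikeOptions($url, $cookieFile));
    $body  = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    return $body;
}
```

If the title and author still do not appear with these options, the difference is more likely on the server side (for example timing or caching) than in the request itself.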

Last edited by Piotr; 11-18-2009 at 08:29 AM..
Piotr is offline