PHP Forum


Extracting content from div
09-15-2009, 08:39 PM Extracting content from div
Brian07002
Can someone show me an example of how I would go about extracting just the text contained inside a div tag on a site and saving it to a text file?

For example, extracting the content of every div.text1 element throughout a particular website.
09-15-2009, 08:59 PM Re: Extracting content from div
bunchjesse
You can't do it using PHP, but you can with JavaScript and then pass that info to PHP using AJAX. In JavaScript, you just use the .innerHTML property of the div.
 
09-15-2009, 09:35 PM Re: Extracting content from div
NullPointer
Name: Matt
Location: Irvine, CA
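Using preg_match_all (preg_match by itself stops at the first match):
PHP Code:
$content = '<div class="text1">My content</div>'; // your html content
// this may be wrong; someone a little better
// with regular expressions will probably know the correct way
// (the slash in the closing tag is escaped, and .*? keeps each match lazy)
$expression = '/<div(.*?)>(.*?)<\/div>/s';

$matches = array(); // this will store all of the matches
preg_match_all($expression, $content, $matches);

$divContent = array();

// $matches[0] holds each full match, including the div tags
foreach ($matches[0] as $match)
{
    // strip the opening tag, then the closing tag
    $match = preg_replace('/<div(.*?)>/s', '', $match);
    $divContent[] = str_replace('</div>', '', $match);
}

// $divContent will now contain an array of strings extracted
// from the html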
You'll need to double check the regular expressions, but that should do it.
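The original question also asked about saving the extracted text to a file. A minimal sketch of that last step, assuming $divContent is an array of extracted fragments like the one built above; the function name save_div_content, the sample data, and the output path are made up for illustration:

```php
<?php
// Hypothetical helper (not from the thread): reduces each extracted
// fragment to plain text and writes one fragment per line.
function save_div_content(array $divContent, $path)
{
    $lines = array();
    foreach ($divContent as $fragment) {
        // Drop nested markup, decode entities such as &amp;, trim whitespace
        $text = trim(html_entity_decode(strip_tags($fragment), ENT_QUOTES, 'UTF-8'));
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    // file_put_contents returns the number of bytes written, or false on failure
    return file_put_contents($path, implode(PHP_EOL, $lines) . PHP_EOL);
}

// Example usage with made-up data and an assumed output path
$divContent = array('My <b>content</b>', '  Tom &amp; Jerry ', '');
$path = sys_get_temp_dir() . '/div_content.txt';
save_div_content($divContent, $path);
```

Empty fragments are skipped so the file does not fill up with blank lines; whether that is desirable depends on what the text file is for.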
 
09-16-2009, 01:52 PM Re: Extracting content from div
JeremyMiller
Name: Jeremy Miller
Location: Las Vegas, NV
Your question doesn't make sense to me unless you're talking about crawling a site and extracting content from a div on each page. Regexes are powerful, but getting them to match correctly can be challenging in a lot of situations. Since IDs are supposed to be unique, we can use this to our advantage by searching for the ID directly (this WILL NOT WORK if the id is not unique):
PHP Code:
<?php
$content =
'<!DOCTYPE html>
            <html lang="en">
              <head>
                <meta charset="UTF-8">
                <title></title>
                <script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"><!-- Makes HTML 5 elements available for styling --></script>
              </head>
              <body>
                <nav id="header_nav">
                  <ul>
                    <li><a href=""></a></li>
                    <li><a href=""></a></li>
                    <li><a href=""></a></li>
                  </ul>
                </nav>
                <div id="text1">My content</div>
              </body>
            </html>'
;
//Find id
if (($id_pos = strpos($content, 'text1')) !== false) {
  //Position of the ">" that ends the opening tag
  $end_of_tag = strpos($content, '>', $id_pos);
  //Position of the matching closing tag
  $close_tag = strpos($content, '</div>', $end_of_tag);
  //Everything between the two is the div content
  echo substr($content, $end_of_tag + 1, $close_tag - $end_of_tag - 1);
}
?>
@Matt: When writing a regex to apply to HTML tags, I usually use [^>]* for matching the inside of a tag, as the > character should not show up (if the markup is done correctly) in the rest of a tag, so the match stops at the end of that tag without needing a non-greedy modifier. When I use the dot character, I always end up playing with the ? greedy toggle to figure out if I have it right. Others more versed in regexes may have a different philosophy or an easier way of understanding it, but I thought you might appreciate knowing someone else's way (with reasoning).
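To make the difference concrete, here is a small sketch comparing a greedy dot with the negated character class on a made-up snippet. The sample HTML and variable names are assumptions for illustration only. Note that [^>]* only bounds the opening tag; the inner capture still needs to be lazy to stop at the first closing tag.

```php
<?php
// Made-up sample: two divs on one line
$html = '<div id="a">first</div><div id="b">second</div>';

// Greedy dot: (.*) swallows as much as it can, so the "opening tag"
// group runs all the way into the second div and only one match is found
preg_match_all('/<div(.*)>(.*)<\/div>/', $html, $greedy);

// Negated class: [^>]* cannot cross the ">" that ends the opening tag,
// and the lazy (.*?) stops at the first </div>
preg_match_all('/<div[^>]*>(.*?)<\/div>/s', $html, $bounded);

print_r($greedy[2]);  // a single entry: "second"
print_r($bounded[1]); // two entries: "first" and "second"
```

Neither pattern copes with nested divs, which is one reason a real parser is often the safer choice for anything beyond simple pages.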
Last edited by JeremyMiller; 09-16-2009 at 01:55 PM.
 
09-16-2009, 02:10 PM Re: Extracting content from div
chrishirst
Name: Chris Hirst
Location: Blackpool, UK
Quote:
Originally Posted by bunchjesse
You can't do it using PHP, but you can with JavaScript and then pass that info to PHP using AJAX. In JavaScript, you just use the .innerHTML property of the div.
Nope, you can't, as that would be blocked as cross-site scripting (the browser's same-origin policy stops JavaScript from reading another site's pages).
 
09-17-2009, 01:25 AM Re: Extracting content from div
Brian07002
Thanks for all the advice, will try it eventually.
 
09-18-2009, 10:12 AM Re: Extracting content from div
weker
You can try http://simplehtmldom.sourceforge.net/ for manipulating HTML elements; it is a very powerful DOM analyser.
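For readers who would rather not add a library, PHP's bundled DOM extension can do a similar job. The following is a minimal sketch using DOMDocument and DOMXPath, which is a different tool from the simplehtmldom library linked above. The sample HTML, the function name extract_div_text, and the output file name are assumptions for illustration; text1 is taken from the div.text1 in the original question.

```php
<?php
// Hypothetical helper: returns the text of every <div> whose class
// attribute contains the given class name. Assumes $class is a plain,
// trusted class name, since it is placed into the XPath query as-is.
function extract_div_text($html, $class)
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect parse warnings quietly
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Matches class="text1" as well as class="foo text1 bar"
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = array();
    foreach ($xpath->query($query) as $node) {
        // textContent drops nested tags and keeps only the text
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample page
$html = '<html><body>'
      . '<div class="text1">First <b>block</b></div>'
      . '<div class="other">Skipped</div>'
      . '<div class="text1 wide">Second block</div>'
      . '</body></html>';

$texts = extract_div_text($html, 'text1');

// Save one entry per line, as the original question asked
$path = sys_get_temp_dir() . '/text1.txt';
file_put_contents($path, implode(PHP_EOL, $texts) . PHP_EOL);
```

In practice $html would come from the crawled page rather than a literal string, for example via file_get_contents() on the page URL where allow_url_fopen is enabled.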
 
09-20-2009, 07:21 PM Re: Extracting content from div
dweebsonduty
Name: Shane Burgess
Quote:
Originally Posted by chrishirst
Nope, you can't as that would be blocked as cross site scripting.
But what if you used PHP to read and print the page, then JavaScript to do the rest...

Inefficient, I know, but possible.

The preg_match approach is the best way to go.